-
The Scandinavian Embedding Benchmarks: Comprehensive Assessment of Multilingual and Monolingual Text Embedding
Authors:
Kenneth Enevoldsen,
Márton Kardos,
Niklas Muennighoff,
Kristoffer Laigaard Nielbo
Abstract:
The evaluation of English text embeddings has transitioned from evaluating a handful of datasets to broad coverage across many tasks through benchmarks such as MTEB. However, this is not the case for multilingual text embeddings due to a lack of available benchmarks. To address this problem, we introduce the Scandinavian Embedding Benchmark (SEB). SEB is a comprehensive framework that enables text embedding evaluation for Scandinavian languages across 24 tasks, 10 subtasks, and 4 task categories. Building on SEB, we evaluate more than 26 models, uncovering significant performance disparities between public and commercial solutions not previously captured by MTEB. We open-source SEB and integrate it with MTEB, thus bridging the text embedding evaluation gap for Scandinavian languages.
Submitted 4 June, 2024;
originally announced June 2024.
-
Reviewer Preferences and Gender Disparities in Aesthetic Judgments
Authors:
Ida Marie Schytt Lassen,
Yuri Bizzoni,
Telma Peura,
Mads Rosendahl Thomsen,
Kristoffer Laigaard Nielbo
Abstract:
Aesthetic preferences are considered highly subjective, resulting in inherently noisy judgements of aesthetic objects, yet certain aspects of aesthetic judgement display convergent trends over time. This paper presents a study that uses literary reviews as a proxy for aesthetic judgement in order to identify systematic components that can be attributed to bias. Specifically, we find that judgement of literary quality in newspapers displays a gender bias in favor of male writers. Male reviewers show a same-gender preference, while female reviewers show an opposite-gender preference. While alternative accounts of this apparent gender disparity exist, we argue that it reflects a cultural gender antagonism.
Submitted 21 June, 2022; v1 submitted 17 June, 2022;
originally announced June 2022.
-
Event Flow -- How Events Shaped the Flow of the News, 1950-1995
Authors:
Melvin Wevers,
Jan Kostkan,
Kristoffer L. Nielbo
Abstract:
This article relies on information-theoretic measures to examine how events impacted the news for the period 1950-1995. Moreover, we present a method for event characterization in (unstructured) textual sources, offering a taxonomy of events based on the different ways they impacted the flow of news information. The results give us a better understanding of the relationship between events and their impact on news sources with varying ideological backgrounds.
Submitted 17 September, 2021;
originally announced September 2021.
-
When no news is bad news -- Detection of negative events from news media content
Authors:
Kristoffer L. Nielbo,
Frida Haestrup,
Kenneth C. Enevoldsen,
Peter B. Vahlstrup,
Rebekah B. Baglini,
Andreas Roepstorff
Abstract:
During the first wave of Covid-19, information decoupling could be observed in the flow of news media content. The corollary of the content alignment within and between news sources experienced by readers (i.e., all news transformed into Corona-news) was that the novelty of news content went down as media focused monotonically on the pandemic event. This all-important Covid-19 news theme turned out to be quite persistent as the pandemic continued, resulting in a situation that is paradoxical from a news media perspective: the same news was repeated over and over. This information phenomenon, where novelty decreases and persistence increases, has previously been used to track change in news media, but in this study we specifically test the claim that new information decoupling behavior of media can be used to reliably detect change in news media content originating in a negative event, using a Bayesian approach to change point detection.
Submitted 12 February, 2021;
originally announced February 2021.
-
News Information Decoupling: An Information Signature of Catastrophes in Legacy News Media
Authors:
Kristoffer L. Nielbo,
Rebekah B. Baglini,
Peter B. Vahlstrup,
Kenneth C. Enevoldsen,
Anja Bechmann,
Andreas Roepstorff
Abstract:
Content alignment in news media was an observable information effect of Covid-19's initial phase. During the first half of 2020, legacy news media became "corona news" following national outbreak and crisis management patterns. While news media are neither unbiased nor infallible as sources of events, they do provide a window into socio-cultural responses to events. In this paper, we use legacy print media to empirically derive the principle of News Information Decoupling (NID), which functions as an information signature of culturally significant catastrophic events. Formally, NID can provide input to change detection algorithms and points to several unsolved research problems at the intersection of information theory and media studies.
Submitted 8 January, 2021;
originally announced January 2021.
-
Tracking the Consumption Junction: Temporal Dependencies between Articles and Advertisements in Dutch Newspapers
Authors:
Melvin Wevers,
Jianbo Gao,
Kristoffer L. Nielbo
Abstract:
Historians have regularly debated whether advertisements can be used as a viable source to study the past. Their main concern centered on the question of agency. Were advertisements a reflection of historical events and societal debates, or were ad makers instrumental in shaping society and the ways people interacted with consumer goods? Using techniques from econometrics (Granger causality test) and complexity science (Adaptive Fractal Analysis), this paper analyzes to what extent advertisements shaped or reflected society. We found evidence that indicates a fundamental difference between the dynamic behavior of word use in articles and advertisements published in a century of Dutch newspapers. Articles exhibit persistent trends that are likely to be reflective of communicative memory. Contrary to this, advertisements have a more irregular behavior characterized by short bursts and fast decay, which, in part, mirrors the dynamic through which advertisers introduced terms into public discourse. On the issue of whether advertisements shaped or reflected society, we found particular product types that seemed to be collectively driven by a causality going from advertisements to articles. Generally, we found support for a complex interaction pattern dubbed the consumption junction. Finally, we discovered noteworthy patterns in terms of causality and long-range dependencies for specific product groups. All in all, this study shows how methods from econometrics and complexity science can be applied to humanities data to improve our understanding of complex cultural-historical phenomena such as the role of advertising in society.
Submitted 27 March, 2019;
originally announced March 2019.