Showing 1–29 of 29 results for author: Dietze, S

  1. arXiv:2407.10321

    cs.CY cs.SI

    Public Discourse about COVID-19 Vaccinations: A Computational Analysis of the Relationship between Public Concerns and Policies

    Authors: Katarina Boland, Christopher Starke, Felix Bensmann, Frank Marcinkowski, Stefan Dietze

    Abstract: Societies worldwide have witnessed growing rifts separating advocates and opponents of vaccinations and other COVID-19 countermeasures. With the rollout of vaccination campaigns, German-speaking regions exhibited much lower vaccination uptake than other European regions. While Austria, Germany, and Switzerland (the DACH region) caught up over time, it remains unclear which factors contributed to t…

    Submitted 7 May, 2024; originally announced July 2024.

    Comments: 34 pages, 9 figures

  2. arXiv:2404.08443

    cs.DL cs.IR

    Toward FAIR Semantic Publishing of Research Dataset Metadata in the Open Research Knowledge Graph

    Authors: Raia Abu Ahmad, Jennifer D'Souza, Matthäus Zloch, Wolfgang Otto, Georg Rehm, Allard Oelen, Stefan Dietze, Sören Auer

    Abstract: Search engines these days can serve datasets as search results. Datasets get picked up by search technologies based on structured descriptions on their official web pages, informed by metadata ontologies such as the Dataset content type of schema.org. Despite this promotion of the content type dataset as a first-class citizen of search results, a vast proportion of datasets, particularly research…

    Submitted 12 April, 2024; originally announced April 2024.

    Comments: 8 pages, 1 figure, published in the Joint Proceedings of the Onto4FAIR 2023 Workshops

    Journal ref: In Joint Proceedings of the Onto4FAIR 2023 Workshops: Collocated with FOIS 2023 and SEMANTICS 2023. pp.23-31. https://hal.science/hal-04312604

  3. arXiv:2404.05587

    cs.CL

    Enhancing Software-Related Information Extraction via Single-Choice Question Answering with Large Language Models

    Authors: Wolfgang Otto, Sharmila Upadhyaya, Stefan Dietze

    Abstract: This paper describes our participation in the Shared Task on Software Mentions Disambiguation (SOMD), with a focus on improving relation extraction in scholarly texts through generative Large Language Models (LLMs) using single-choice question-answering. The methodology prioritises the use of in-context learning capabilities of GLMs to extract software-related entities and their descriptive attrib…

    Submitted 19 April, 2024; v1 submitted 8 April, 2024; originally announced April 2024.

    Comments: Accepted at: 1st Workshop on Natural Scientific Language Processing and Research Knowledge Graphs (NSLP 2024) Co-located with Extended Semantic Web Conference (ESWC 2024)

    ACM Class: I.2.7

  4. arXiv:2404.01992

    cs.CL

    Dissecting Paraphrases: The Impact of Prompt Syntax and supplementary Information on Knowledge Retrieval from Pretrained Language Models

    Authors: Stephan Linzbach, Dimitar Dimitrov, Laura Kallmeyer, Kilian Evang, Hajira Jabeen, Stefan Dietze

    Abstract: Pre-trained Language Models (PLMs) are known to contain various kinds of knowledge. One method to infer relational knowledge is through the use of cloze-style prompts, where a model is tasked to predict missing subjects or objects. Typically, designing these prompts is a tedious task because small differences in syntax or semantics can have a substantial impact on knowledge retrieval performance.…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted for NAACL 2024

  5. arXiv:2404.00406

    cs.CL cs.AI

    TACO -- Twitter Arguments from COnversations

    Authors: Marc Feger, Stefan Dietze

    Abstract: Twitter has emerged as a global hub for engaging in online conversations and as a research corpus for various disciplines that have recognized the significance of its user-generated content. Argument mining is an important analytical task for processing and understanding online discourse. Specifically, it aims to identify the structural elements of arguments, denoted as information and inference.…

    Submitted 30 March, 2024; originally announced April 2024.

  6. nuScenes Knowledge Graph -- A comprehensive semantic representation of traffic scenes for trajectory prediction

    Authors: Leon Mlodzian, Zhigang Sun, Hendrik Berkemeyer, Sebastian Monka, Zixu Wang, Stefan Dietze, Lavdim Halilaj, Juergen Luettin

    Abstract: Trajectory prediction in traffic scenes involves accurately forecasting the behaviour of surrounding vehicles. To achieve this objective, it is crucial to consider contextual information, including the driving path of vehicles, road topology, lane dividers, and traffic rules. Although studies demonstrated the potential of leveraging heterogeneous context for improving trajectory prediction, state-o…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: Accepted to the 2023 IEEE/CVF International Conference on Computer Vision (ICCV) workshop on Scene Graphs and Graph Representation Learning (SG2RL)

    ACM Class: I.2.4; I.2.6; I.2.10

  7. arXiv:2311.09860

    cs.CL

    GSAP-NER: A Novel Task, Corpus, and Baseline for Scholarly Entity Extraction Focused on Machine Learning Models and Datasets

    Authors: Wolfgang Otto, Matthäus Zloch, Lu Gan, Saurav Karmakar, Stefan Dietze

    Abstract: Named Entity Recognition (NER) models play a crucial role in various NLP tasks, including information extraction (IE) and text understanding. In academic writing, references to machine learning models and datasets are fundamental components of various computer science publications and necessitate accurate models for identification. Despite the advancements in NER, existing ground truth datasets do…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: 10 pages, 1 figure, Accepted at EMNLP2023-Findings

  8. arXiv:2308.06374

    cs.AI cs.CL

    Large Language Models and Knowledge Graphs: Opportunities and Challenges

    Authors: Jeff Z. Pan, Simon Razniewski, Jan-Christoph Kalo, Sneha Singhania, Jiaoyan Chen, Stefan Dietze, Hajira Jabeen, Janna Omeliyanenko, Wen Zhang, Matteo Lissandrini, Russa Biswas, Gerard de Melo, Angela Bonifati, Edlira Vakaj, Mauro Dragoni, Damien Graux

    Abstract: Large Language Models (LLMs) have taken Knowledge Representation -- and the world -- by storm. This inflection point marks a shift from explicit knowledge representation to a renewed focus on the hybrid representation of both explicit knowledge and parametric knowledge. In this position paper, we will discuss some of the common debate points within the community on LLMs (parametric knowledge) and…

    Submitted 11 August, 2023; originally announced August 2023.

    Comments: 30 pages

  9. Which Factors are associated with Open Access Publishing? A Springer Nature Case Study

    Authors: Fakhri Momeni, Stefan Dietze, Philipp Mayr, Kristin Biesenbender, Isabella Peters

    Abstract: Open Access (OA) facilitates access to articles. However, authors or funders often must pay the publishing costs, which prevents authors without financial support from participating in OA publishing and from benefiting from the citation advantage of OA articles. OA may exacerbate existing inequalities in the publication system rather than overcome them. To investigate this, we studied 522,411 articles published by Spri…

    Submitted 25 April, 2023; v1 submitted 17 August, 2022; originally announced August 2022.

    Journal ref: Quantitative Science Studies 2023

  10. Investigating the contribution of author- and publication-specific features to scholars' h-index prediction

    Authors: Fakhri Momeni, Philipp Mayr, Stefan Dietze

    Abstract: Evaluation of researchers' output is vital for hiring committees and funding bodies, and it is usually measured via their scientific productivity, citations, or a combined metric such as the h-index. Assessing young researchers is more critical because it takes time to accumulate citations and raise one's h-index. Hence, predicting the h-index can help to discover the researchers' scientific impact. In a…

    Submitted 9 August, 2023; v1 submitted 20 July, 2022; originally announced July 2022.

    Comments: 14 pages, 1 figure

    Journal ref: EPJ Data Science 2023

  11. arXiv:2207.01256

    cs.IR cs.HC

    Still Haven't Found What You're Looking For -- Detecting the Intent of Web Search Missions from User Interaction Features

    Authors: Ran Yu, Limock, Stefan Dietze

    Abstract: Web search is among the most frequent online activities. Whereas traditional information retrieval techniques focus on the information need behind a user query, previous work has shown that user behaviour and interaction can provide important signals for understanding the underlying intent of a search mission. An established taxonomy distinguishes between transactional, navigational and informatio…

    Submitted 4 July, 2022; originally announced July 2022.

  12. arXiv:2206.07360

    cs.CL cs.CY cs.SI

    SciTweets -- A Dataset and Annotation Framework for Detecting Scientific Online Discourse

    Authors: Salim Hafid, Sebastian Schellhammer, Sandra Bringay, Konstantin Todorov, Stefan Dietze

    Abstract: Scientific topics, claims and resources are increasingly debated as part of online discourse, where prominent examples include discourse related to COVID-19 or climate change. This has led to both significant societal impact and increased interest in scientific online discourse from various disciplines. For instance, communication studies aim at a deeper understanding of biases, quality or spreadi…

    Submitted 6 July, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: submitted to CIKM 2022

  13. The many facets of academic mobility and its impact on scholars' career

    Authors: Fakhri Momeni, Fariba Karimi, Philipp Mayr, Isabella Peters, Stefan Dietze

    Abstract: International mobility in academia can enhance the human and social capital of researchers and consequently their scientific outcome. However, there is still a very limited understanding of the different mobility patterns among scholars with various socio-demographic characteristics. The aim of this study is twofold. First, we investigate to what extent individual factors are associated with the mobili…

    Submitted 29 March, 2022; v1 submitted 14 March, 2022; originally announced March 2022.

    Comments: 27 pages

    Report number: Journal of Informetrics, Volume 16, Issue 2, May 2022

  14. SaL-Lightning Dataset: Search and Eye Gaze Behavior, Resource Interactions and Knowledge Gain during Web Search

    Authors: Christian Otto, Markus Rokicki, Georg Pardi, Wolfgang Gritz, Daniel Hienert, Ran Yu, Johannes von Hoyer, Anett Hoppe, Stefan Dietze, Peter Holtz, Yvonne Kammerer, Ralph Ewerth

    Abstract: The emerging research field Search as Learning (SAL) investigates how the Web facilitates learning through modern information retrieval systems. SAL research requires significant amounts of data that capture both search behavior of users and their acquired knowledge in order to obtain conclusive insights or train supervised machine learning models. However, the creation of such datasets is costly and re…

    Submitted 7 January, 2022; originally announced January 2022.

    Comments: To be published at the 2022 ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR '22)

  15. arXiv:2108.09070

    cs.IR cs.CL

    SoMeSci- A 5 Star Open Data Gold Standard Knowledge Graph of Software Mentions in Scientific Articles

    Authors: David Schindler, Felix Bensmann, Stefan Dietze, Frank Krüger

    Abstract: Knowledge about software used in scientific investigations is important for several reasons, for instance, to enable an understanding of provenance and methods involved in data handling. However, software is usually not formally cited, but rather mentioned informally within the scholarly description of the investigation, raising the need for automatic information extraction and disambiguation. Giv…

    Submitted 20 August, 2021; originally announced August 2021.

    Comments: Preprint of CIKM 2021 Resource Paper, 10 pages

  16. arXiv:2106.06244

    cs.IR

    Predicting Knowledge Gain during Web Search based on Multimedia Resource Consumption

    Authors: Christian Otto, Ran Yu, Georg Pardi, Johannes von Hoyer, Markus Rokicki, Anett Hoppe, Peter Holtz, Yvonne Kammerer, Stefan Dietze, Ralph Ewerth

    Abstract: In informal learning scenarios, the popularity of multimedia content, such as video tutorials or lectures, has significantly increased. Yet, the users' interactions, navigation behavior, and consequently learning outcome, have not been researched extensively. Related work in this field, also called search as learning, has focused on behavioral or text resource features to predict learning outcome a…

    Submitted 11 June, 2021; originally announced June 2021.

    Comments: 13 pages, 2 figures, 2 tables

  17. Better Together -- An Ensemble Learner for Combining the Results of Ready-made Entity Linking Systems

    Authors: Renato Stoffalette João, Pavlos Fafalios, Stefan Dietze

    Abstract: Entity linking (EL) is the task of automatically identifying entity mentions in text and resolving them to a corresponding entity in a reference knowledge base like Wikipedia. Throughout the past decade, a plethora of EL systems and pipelines have become available, where performance of individual systems varies heavily across corpora, languages or domains. Linking performance varies even between d…

    Submitted 14 January, 2021; originally announced January 2021.

    Comments: SAC '20: Proceedings of the 35th Annual ACM Symposium on Applied Computing

  18. The Role of Word-Eye-Fixations for Query Term Prediction

    Authors: Masoud Davari, Daniel Hienert, Dagmar Kern, Stefan Dietze

    Abstract: Throughout the search process, the user's gaze on inspected SERPs and websites can reveal his or her search interests. Gaze behavior can be captured with eye tracking and described with word-eye-fixations. Word-eye-fixations contain the user's accumulated gaze fixation duration on each individual word of a web page. In this work, we analyze the role of word-eye-fixations for predicting query terms…

    Submitted 5 August, 2020; originally announced August 2020.

    Journal ref: In CHIIR 2020, Proceedings of the 2020 Conference on Human Information Interaction and Retrieval, March 2020, Pages 422-426

  19. Exploiting stance hierarchies for cost-sensitive stance detection of Web documents

    Authors: Arjun Roy, Pavlos Fafalios, Asif Ekbal, Xiaofei Zhu, Stefan Dietze

    Abstract: Fact checking is an essential challenge when combating fake news. Identifying documents that agree or disagree with a particular statement (claim) is a core task in this process. In this context, stance detection aims at identifying the position (stance) of a document towards a claim. Most approaches address this task through a 4-class classification model where the class distribution is highly im…

    Submitted 17 May, 2021; v1 submitted 29 July, 2020; originally announced July 2020.

    Comments: This is a preprint version of the journal paper published in J Intell Inf Syst (2021) (Springer). https://rdcu.be/ckLiC

  20. TweetsCOV19 -- A Knowledge Base of Semantically Annotated Tweets about the COVID-19 Pandemic

    Authors: Dimitar Dimitrov, Erdal Baran, Pavlos Fafalios, Ran Yu, Xiaofei Zhu, Matthäus Zloch, Stefan Dietze

    Abstract: Publicly available social media archives facilitate research in the social sciences and provide corpora for training and testing a wide range of machine learning and natural language processing methods. With respect to the recent outbreak of the Coronavirus disease 2019 (COVID-19), online discourse on Twitter reflects public opinion and perception related to the pandemic itself as well as mitigati…

    Submitted 15 August, 2020; v1 submitted 25 June, 2020; originally announced June 2020.

  21. A Software Framework and Datasets for the Analysis of Graph Measures on RDF Graphs

    Authors: Matthäus Zloch, Maribel Acosta, Daniel Hienert, Stefan Dietze, Stefan Conrad

    Abstract: As the availability and the inter-connectivity of RDF datasets grow, so does the necessity to understand the structure of the data. Understanding the topology of RDF graphs can guide and inform the development of, e.g. synthetic dataset generators, sampling methods, index structures, or query optimizers. In this work, we propose two resources: (i) a software framework able to acquire, prepare, and…

    Submitted 3 July, 2019; originally announced July 2019.

    Comments: Submitted to ESWC 2019, Resources Track. 15 pages, 5 figures, 2 tables

  22. Data4UrbanMobility: Towards Holistic Data Analytics for Mobility Applications in Urban Regions

    Authors: Nicolas Tempelmeier, Yannick Rietz, Iryna Lishchuk, Tina Kruegel, Olaf Mumm, Vanessa Miriam Carlow, Stefan Dietze, Elena Demidova

    Abstract: With the increasing availability of mobility-related data, such as GPS-traces, Web queries and climate conditions, there is a growing demand to utilize this data to better understand and support urban mobility needs. However, data available from the individual actors, such as providers of information, navigation and transportation systems, is mostly restricted to isolated mobility modes, whereas h…

    Submitted 26 March, 2019; originally announced March 2019.

    Journal ref: Companion Proceedings of The Web Conference 2019

  23. arXiv:1812.10387

    cs.CL cs.IR cs.LG stat.ML

    Same but Different: Distant Supervision for Predicting and Understanding Entity Linking Difficulty

    Authors: Renato Stoffalette João, Pavlos Fafalios, Stefan Dietze

    Abstract: Entity Linking (EL) is the task of automatically identifying entity mentions in a piece of text and resolving them to a corresponding entity in a reference knowledge base like Wikipedia. There is a large number of EL tools available for different types of documents and domains, yet EL remains a challenging task where the lack of precision on particularly ambiguous mentions often spoils the usefuln…

    Submitted 13 December, 2018; originally announced December 2018.

    Comments: Preprint of paper accepted for publication in the 34th ACM/SIGAPP Symposium On Applied Computing (SAC 2019)

  24. TweetsKB: A Public and Large-Scale RDF Corpus of Annotated Tweets

    Authors: Pavlos Fafalios, Vasileios Iosifidis, Eirini Ntoutsi, Stefan Dietze

    Abstract: Publicly available social media archives facilitate research in a variety of fields, such as data science, sociology or the digital humanities, where Twitter has emerged as one of the most prominent sources. However, obtaining, archiving and annotating large amounts of tweets is costly. In this paper, we describe TweetsKB, a publicly available corpus of currently more than 1.5 billion tweets, span…

    Submitted 23 October, 2018; originally announced October 2018.

  25. arXiv:1810.10004

    cs.IR cs.LG stat.ML

    Time-Aware and Corpus-Specific Entity Relatedness

    Authors: Nilamadhaba Mohapatra, Vasileios Iosifidis, Asif Ekbal, Stefan Dietze, Pavlos Fafalios

    Abstract: Entity relatedness has emerged as an important feature in a plethora of applications such as information retrieval, entity recommendation and entity linking. Given an entity, for instance a person or an organization, entity relatedness measures can be exploited for generating a list of highly-related entities. However, the relation of an entity to some other entity depends on several factors, with…

    Submitted 23 October, 2018; originally announced October 2018.

  26. arXiv:1806.11046

    cs.HC

    Detecting, Understanding and Supporting Everyday Learning in Web Search

    Authors: Ran Yu, Ujwal Gadiraju, Stefan Dietze

    Abstract: Web search is among the most ubiquitous online activities, commonly used to acquire new knowledge and to satisfy learning-related objectives through informational search sessions. The importance of learning as an outcome of web search has been recognized widely, leading to a variety of research at the intersection of information retrieval, human computer interaction and learning-oriented sciences.…

    Submitted 28 June, 2018; originally announced June 2018.

    Comments: 6 pages, LILE workshop at ACM WebSci conference 2018

  27. Predicting User Knowledge Gain in Informational Search Sessions

    Authors: Ran Yu, Ujwal Gadiraju, Peter Holtz, Markus Rokicki, Philipp Kemkes, Stefan Dietze

    Abstract: Web search is frequently used by people to acquire new knowledge and to satisfy learning-related objectives. In this context, informational search missions with an intention to obtain knowledge pertaining to a topic are prominent. The importance of learning as an outcome of web search has been recognized. Yet, there is a lack of understanding of the impact of web search on a user's knowledge state…

    Submitted 2 May, 2018; originally announced May 2018.

    Comments: 10 pages, 2 figures, SIGIR18

  28. Inferring Missing Categorical Information in Noisy and Sparse Web Markup

    Authors: Nicolas Tempelmeier, Elena Demidova, Stefan Dietze

    Abstract: Embedded markup of Web pages has seen widespread adoption over the past years, driven by standards such as RDFa and Microdata and initiatives such as schema.org, where recent studies show an adoption by 39% of all Web pages already in 2016. While this constitutes an important information source for tasks such as Web search, Web page classification or knowledge graph augmentation, individual m…

    Submitted 1 March, 2018; originally announced March 2018.

    Journal ref: Proceedings of The Web Conference 2018, 27th edition of the former WWW conference

  29. Improving Entity Retrieval on Structured Data

    Authors: Besnik Fetahu, Ujwal Gadiraju, Stefan Dietze

    Abstract: The increasing amount of data on the Web, in particular of Linked Data, has led to a diverse landscape of datasets, which makes entity retrieval a challenging task. Explicit cross-dataset links, for instance to indicate co-references or related entities, can significantly improve entity retrieval. However, only a small fraction of entities are interlinked through explicit statements. In this paper,…

    Submitted 30 March, 2017; originally announced March 2017.