Showing 1–20 of 20 results for author: Kassner, N

  1. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  2. arXiv:2402.16837  [pdf, other]

    cs.CL

    Do Large Language Models Latently Perform Multi-Hop Reasoning?

    Authors: Sohee Yang, Elena Gribovskaya, Nora Kassner, Mor Geva, Sebastian Riedel

    Abstract: We study whether Large Language Models (LLMs) latently perform multi-hop reasoning with complex prompts such as "The mother of the singer of 'Superstition' is". We look for evidence of a latent reasoning pathway where an LLM (1) latently identifies "the singer of 'Superstition'" as Stevie Wonder, the bridge entity, and (2) uses its knowledge of Stevie Wonder's mother to complete the prompt. We ana…

    Submitted 26 February, 2024; originally announced February 2024.

  3. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  4. arXiv:2306.08896  [pdf, other]

    cs.CL

    Multilingual End to End Entity Linking

    Authors: Mikhail Plekhanov, Nora Kassner, Kashyap Popat, Louis Martin, Simone Merello, Borislav Kozlovskii, Frédéric A. Dreyer, Nicola Cancedda

    Abstract: Entity Linking is one of the most common Natural Language Processing tasks in practical applications, but so far efficient end-to-end solutions with multilingual coverage have been lacking, leading to complex model stacks. To fill this gap, we release and open source BELA, the first fully end-to-end multilingual entity linking model that efficiently detects and links entities in texts in any of 97…

    Submitted 15 June, 2023; originally announced June 2023.

  5. arXiv:2305.14250  [pdf, other]

    cs.CL cs.AI

    Language Models with Rationality

    Authors: Nora Kassner, Oyvind Tafjord, Ashish Sabharwal, Kyle Richardson, Hinrich Schuetze, Peter Clark

    Abstract: While large language models (LLMs) are proficient at question-answering (QA), it is not always clear how (or even if) an answer follows from their latent "beliefs". This lack of interpretability is a growing impediment to widespread use of LLMs. To address this, our goals are to make model beliefs and their inferential relationships explicit, and to resolve inconsistencies that may exist, so that…

    Submitted 29 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

  6. Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages

    Authors: Ayyoob Imani, Peiqin Lin, Amir Hossein Kargaran, Silvia Severini, Masoud Jalili Sabet, Nora Kassner, Chunlan Ma, Helmut Schmid, André F. T. Martins, François Yvon, Hinrich Schütze

    Abstract: The NLP community has mainly focused on scaling Large Language Models (LLMs) vertically, i.e., making them better for about 100 languages. We instead scale LLMs horizontally: we create, through continued pretraining, Glot500-m, an LLM that covers 511 predominantly low-resource languages. An important part of this effort is to collect and clean Glot500-c, a corpus that covers these 511 languages an…

    Submitted 26 May, 2023; v1 submitted 20 May, 2023; originally announced May 2023.

    Comments: ACL 2023

  7. arXiv:2305.12027  [pdf, other]

    cs.CL cs.AI

    Polar Ducks and Where to Find Them: Enhancing Entity Linking with Duck Typing and Polar Box Embeddings

    Authors: Mattia Atzeni, Mikhail Plekhanov, Frédéric A. Dreyer, Nora Kassner, Simone Merello, Louis Martin, Nicola Cancedda

    Abstract: Entity linking methods based on dense retrieval are an efficient and widely used solution in large-scale applications, but they fall short of the performance of generative models, as they are sensitive to the structure of the embedding space. In order to address this issue, this paper introduces DUCK, an approach to infusing structural information in the space of entity representations, using prio…

    Submitted 20 October, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

    Comments: Accepted at EMNLP 2023

  8. arXiv:2211.05100  [pdf, other]

    cs.CL

    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    Authors: BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, et al. (369 additional authors not shown)

    Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access…

    Submitted 27 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

  9. arXiv:2207.14251  [pdf, other]

    cs.CL

    Measuring Causal Effects of Data Statistics on Language Model's `Factual' Predictions

    Authors: Yanai Elazar, Nora Kassner, Shauli Ravfogel, Amir Feder, Abhilasha Ravichander, Marius Mosbach, Yonatan Belinkov, Hinrich Schütze, Yoav Goldberg

    Abstract: Large amounts of training data are one of the major reasons for the high performance of state-of-the-art NLP models. But what exactly in the training data causes a model to make a certain prediction? We seek to answer this question by providing a language for describing how training data influences predictions, through a causal framework. Importantly, our framework bypasses the need to retrain exp…

    Submitted 24 March, 2023; v1 submitted 28 July, 2022; originally announced July 2022.

    Comments: We received criticism regarding the validity of the causal formulation in this paper. We will address it in an upcoming version.

  10. arXiv:2205.12570  [pdf, other]

    cs.CL

    EDIN: An End-to-end Benchmark and Pipeline for Unknown Entity Discovery and Indexing

    Authors: Nora Kassner, Fabio Petroni, Mikhail Plekhanov, Sebastian Riedel, Nicola Cancedda

    Abstract: Existing work on Entity Linking mostly assumes that the reference knowledge base is complete, and therefore all mentions can be linked. In practice this is hardly ever the case, as knowledge bases are incomplete and because novel concepts arise constantly. This paper created the Unknown Entity Discovery and Indexing (EDIN) benchmark where unknown entities, that is entities without a description in…

    Submitted 25 May, 2022; originally announced May 2022.

  11. arXiv:2110.04888  [pdf, other]

    cs.CL cs.AI cs.DB

    Language Models As or For Knowledge Bases

    Authors: Simon Razniewski, Andrew Yates, Nora Kassner, Gerhard Weikum

    Abstract: Pre-trained language models (LMs) have recently gained attention for their potential as an alternative to (or proxy for) explicit knowledge bases (KBs). In this position paper, we examine this hypothesis, identify strengths and limitations of both LMs and KBs, and discuss the complementary nature of the two paradigms. In particular, we offer qualitative arguments that latent LMs are not suitable a…

    Submitted 10 October, 2021; originally announced October 2021.

    Journal ref: DL4KG 2021

  12. arXiv:2109.14723  [pdf, other]

    cs.CL

    BeliefBank: Adding Memory to a Pre-Trained Language Model for a Systematic Notion of Belief

    Authors: Nora Kassner, Oyvind Tafjord, Hinrich Schütze, Peter Clark

    Abstract: Although pretrained language models (PTLMs) contain significant amounts of world knowledge, they can still produce inconsistent answers to questions when probed, even after specialized training. As a result, it can be hard to identify what the model actually "believes" about the world, making it susceptible to inconsistent behavior and simple errors. Our goal is to reduce these problems. Our appro…

    Submitted 29 September, 2021; originally announced September 2021.

    Comments: EMNLP 2021 Camera Ready. arXiv admin note: substantial text overlap with arXiv:2104.08401

  13. arXiv:2104.08401  [pdf, ps, other]

    cs.CL cs.AI

    Enriching a Model's Notion of Belief using a Persistent Memory

    Authors: Nora Kassner, Oyvind Tafjord, Hinrich Schutze, Peter Clark

    Abstract: Although pretrained language models (PTLMs) have been shown to contain significant amounts of world knowledge, they can still produce inconsistent answers to questions when probed, even after using specialized training techniques to reduce inconsistency. As a result, it can be hard to identify what the model actually "believes" about the world. Our goal is to reduce this problem, so systems are mo…

    Submitted 7 October, 2021; v1 submitted 16 April, 2021; originally announced April 2021.

    Comments: This is an old and now obsolete draft. See arXiv:2109.14723 ("BeliefBank: Adding Memory to a Pre-Trained Language Model for a Systematic Notion of Belief") for the final paper

  14. arXiv:2104.07094  [pdf, other]

    cs.CL

    Static Embeddings as Efficient Knowledge Bases?

    Authors: Philipp Dufter, Nora Kassner, Hinrich Schütze

    Abstract: Recent research investigates factual knowledge stored in large pretrained language models (PLMs). Instead of structural knowledge base (KB) queries, masked sentences such as "Paris is the capital of [MASK]" are used as probes. The good performance on this analysis task has been interpreted as PLMs becoming potential repositories of factual knowledge. In experiments across ten linguistically divers…

    Submitted 14 April, 2021; originally announced April 2021.

    Comments: NAACL2021 CRV; first two authors contributed equally

  15. arXiv:2102.01017  [pdf, other]

    cs.CL

    Measuring and Improving Consistency in Pretrained Language Models

    Authors: Yanai Elazar, Nora Kassner, Shauli Ravfogel, Abhilasha Ravichander, Eduard Hovy, Hinrich Schütze, Yoav Goldberg

    Abstract: Consistency of a model -- that is, the invariance of its behavior under meaning-preserving alternations in its input -- is a highly desirable property in natural language processing. In this paper we study the question: Are Pretrained Language Models (PLMs) consistent with respect to factual knowledge? To this end, we create ParaRel, a high-quality resource of cloze-style query English paraphrases…

    Submitted 29 May, 2021; v1 submitted 1 February, 2021; originally announced February 2021.

    Comments: Accepted to the TACL journal, pre-MIT Press publication version

  16. arXiv:2102.00894  [pdf, other]

    cs.CL

    Multilingual LAMA: Investigating Knowledge in Multilingual Pretrained Language Models

    Authors: Nora Kassner, Philipp Dufter, Hinrich Schütze

    Abstract: Recently, it has been found that monolingual English language models can be used as knowledge bases. Instead of structural knowledge base queries, masked sentences such as "Paris is the capital of [MASK]" are used as probes. We translate the established benchmarks TREx and GoogleRE into 53 languages. Working with mBERT, we investigate three questions. (i) Can mBERT be used as a multilingual knowle…

    Submitted 1 February, 2021; originally announced February 2021.

    Comments: Accepted to EACL 2021

  17. arXiv:2006.12414  [pdf, ps, other]

    cs.CL

    Dirichlet-Smoothed Word Embeddings for Low-Resource Settings

    Authors: Jakob Jungmaier, Nora Kassner, Benjamin Roth

    Abstract: Nowadays, classical count-based word embeddings using positive pointwise mutual information (PPMI) weighted co-occurrence matrices have been widely superseded by machine-learning-based methods like word2vec and GloVe. But these methods are usually applied using very large amounts of text data. In many cases, however, there is not much text data available, for example for specific domains or low-re…

    Submitted 22 June, 2020; originally announced June 2020.

    Journal ref: LREC 2020

  18. arXiv:2006.10413  [pdf, other]

    cs.CL

    Are Pretrained Language Models Symbolic Reasoners Over Knowledge?

    Authors: Nora Kassner, Benno Krojer, Hinrich Schütze

    Abstract: How can pretrained language models (PLMs) learn factual knowledge from the training set? We investigate the two most important mechanisms: reasoning and memorization. Prior work has attempted to quantify the number of facts PLMs learn, but we present, using synthetic data, the first study that investigates the causal relation between facts present in training and facts learned by the PLM. For reas…

    Submitted 10 October, 2020; v1 submitted 18 June, 2020; originally announced June 2020.

    Comments: Accepted to CoNLL 2020

  19. arXiv:2005.00766  [pdf, other]

    cs.CL

    BERT-kNN: Adding a kNN Search Component to Pretrained Language Models for Better QA

    Authors: Nora Kassner, Hinrich Schütze

    Abstract: Khandelwal et al. (2020) use a k-nearest-neighbor (kNN) component to improve language model performance. We show that this idea is beneficial for open-domain question answering (QA). To improve the recall of facts encountered during training, we combine BERT (Devlin et al., 2019) with a traditional information retrieval step (IR) and a kNN search over a large datastore of an embedded text collecti…

    Submitted 12 October, 2020; v1 submitted 2 May, 2020; originally announced May 2020.

    Comments: to appear in EMNLP Findings

  20. arXiv:1911.03343  [pdf, other]

    cs.CL

    Negated and Misprimed Probes for Pretrained Language Models: Birds Can Talk, But Cannot Fly

    Authors: Nora Kassner, Hinrich Schütze

    Abstract: Building on Petroni et al. (2019), we propose two new probing tasks analyzing factual knowledge stored in Pretrained Language Models (PLMs). (1) Negation. We find that PLMs do not distinguish between negated ("Birds cannot [MASK]") and non-negated ("Birds can [MASK]") cloze questions. (2) Mispriming. Inspired by priming methods in human psychology, we add "misprimes" to cloze questions ("Talk? Bir…

    Submitted 15 May, 2020; v1 submitted 8 November, 2019; originally announced November 2019.

    Comments: ACL 2020