Showing 1–50 of 127 results for author: Riedel, S

  1. arXiv:2406.13121  [pdf, other]

    cs.CL cs.AI cs.IR

    Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?

    Authors: Jinhyuk Lee, Anthony Chen, Zhuyun Dai, Dheeru Dua, Devendra Singh Sachan, Michael Boratko, Yi Luan, Sébastien M. R. Arnold, Vincent Perot, Siddharth Dalmia, Hexiang Hu, Xudong Lin, Panupong Pasupat, Aida Amini, Jeremy R. Cole, Sebastian Riedel, Iftekhar Naim, Ming-Wei Chang, Kelvin Guu

    Abstract: Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases. Leveraging LCLMs' ability to natively ingest and process entire corpora of information offers numerous advantages. It enhances user-friendliness by eliminating the need for specialized knowledge of tools, provides robust end-to-…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 29 pages. Dataset available at https://github.com/google-deepmind/loft

  2. TeraPool-SDR: An 1.89TOPS 1024 RV-Cores 4MiB Shared-L1 Cluster for Next-Generation Open-Source Software-Defined Radios

    Authors: Yichao Zhang, Marco Bertuletti, Samuel Riedel, Matheus Cavalcante, Alessandro Vanelli-Coralli, Luca Benini

    Abstract: Radio Access Networks (RAN) workloads are rapidly scaling up in data processing intensity and throughput as the 5G (and beyond) standards grow in number of antennas and sub-carriers. Offering flexible Processing Elements (PEs), efficient memory access, and a productive parallel programming model, many-core clusters are a well-matched architecture for next-generation software-defined RANs, but stag…

    Submitted 8 May, 2024; originally announced May 2024.

    Comments: 6 pages, 6 figures and 3 tables

  3. arXiv:2403.05530  [pdf, other]

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February…

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  4. arXiv:2402.16837  [pdf, other]

    cs.CL

    Do Large Language Models Latently Perform Multi-Hop Reasoning?

    Authors: Sohee Yang, Elena Gribovskaya, Nora Kassner, Mor Geva, Sebastian Riedel

    Abstract: We study whether Large Language Models (LLMs) latently perform multi-hop reasoning with complex prompts such as "The mother of the singer of 'Superstition' is". We look for evidence of a latent reasoning pathway where an LLM (1) latently identifies "the singer of 'Superstition'" as Stevie Wonder, the bridge entity, and (2) uses its knowledge of Stevie Wonder's mother to complete the prompt. We ana…

    Submitted 26 February, 2024; originally announced February 2024.

  5. arXiv:2402.12986  [pdf, other]

    cs.AR

    Enabling Efficient Hybrid Systolic Computation in Shared L1-Memory Manycore Clusters

    Authors: Sergio Mazzola, Samuel Riedel, Luca Benini

    Abstract: Systolic arrays and shared-L1-memory manycore clusters are commonly used architectural paradigms that offer different trade-offs to accelerate parallel workloads. While the first excel with regular dataflow at the cost of rigid architectures and complex programming models, the second are versatile and easy to program but require explicit dataflow management and synchronization. This work aims at e…

    Submitted 24 April, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

  6. arXiv:2402.11330  [pdf, other]

    eess.AS cs.SD

    Diffuse Sound Field Synthesis

    Authors: Franz Zotter, Stefan Riedel, Lukas Gölles, Matthias Frank

    Abstract: Can uncorrelated surrounding sound sources be used to generate extended diffuse sound fields? By definition, targets are a constant sound pressure level, a vanishing average sound intensity, uncorrelated sound waves arriving isotropically from all directions. Does this require specific sources and geometries for surrounding 2D and 3D source layouts? As methods, we employ numeric simulations and…

    Submitted 21 February, 2024; v1 submitted 17 February, 2024; originally announced February 2024.

    Comments: 27 pages, 17 figures, submitted to acta acustica, including jan/feb 2024 upgrades while awaiting the reviews

  7. arXiv:2402.00851  [pdf, other]

    cs.LG q-bio.QM

    Data Augmentation Scheme for Raman Spectra with Highly Correlated Annotations

    Authors: Christoph Lange, Isabel Thiele, Lara Santolin, Sebastian L. Riedel, Maxim Borisyak, Peter Neubauer, M. Nicolas Cruz Bournazou

    Abstract: In biotechnology Raman Spectroscopy is rapidly gaining popularity as a process analytical technology (PAT) that measures cell densities, substrate- and product concentrations. As it records vibrational modes of molecules it provides that information non-invasively in a single spectrum. Typically, partial least squares (PLS) is the model of choice to infer information about variables of interest fr…

    Submitted 1 February, 2024; originally announced February 2024.

  8. arXiv:2401.09359  [pdf, other]

    cs.AR

    LRSCwait: Enabling Scalable and Efficient Synchronization in Manycore Systems through Polling-Free and Retry-Free Operation

    Authors: Samuel Riedel, Marc Gantenbein, Alessandro Ottaviano, Torsten Hoefler, Luca Benini

    Abstract: Extensive polling in shared-memory manycore systems can lead to contention, decreased throughput, and poor energy efficiency. Both lock implementations and the general-purpose atomic operation, load-reserved/store-conditional (LRSC), cause polling due to serialization and retries. To alleviate this overhead, we propose LRwait and SCwait, a synchronization pair that eliminates polling by allowing c…

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: 6 pages, 6 figures, 2 tables, accepted as a regular paper at DATE24

  9. arXiv:2312.11805  [pdf, other]

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee, et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr…

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  10. arXiv:2311.09569  [pdf, other]

    cs.CL cs.AI

    Strings from the Library of Babel: Random Sampling as a Strong Baseline for Prompt Optimisation

    Authors: Yao Lu, Jiayi Wang, Raphael Tang, Sebastian Riedel, Pontus Stenetorp

    Abstract: Recent prompt optimisation approaches use the generative nature of language models to produce prompts -- even rivaling the performance of human-curated prompts. In this paper, we demonstrate that randomly sampling tokens from the model vocabulary as "separators" can be as effective as language models for prompt-style text classification. Our experiments show that random separators are competitiv…

    Submitted 17 April, 2024; v1 submitted 16 November, 2023; originally announced November 2023.

    Comments: Accepted to NAACL 2024. The code is publicly available at https://github.com/yaolu/random-prompt

  11. arXiv:2309.10137  [pdf, other]

    cs.AR

    Spatz: Clustering Compact RISC-V-Based Vector Units to Maximize Computing Efficiency

    Authors: Matheus Cavalcante, Matteo Perotti, Samuel Riedel, Luca Benini

    Abstract: The ever-increasing computational and storage requirements of modern applications and the slowdown of technology scaling pose major challenges to designing and implementing efficient computer architectures. In this paper, we leverage the architectural balance principle to alleviate the bandwidth bottleneck at the L1 data memory boundary of a tightly-coupled cluster of processing elements (PEs). We…

    Submitted 18 September, 2023; originally announced September 2023.

    Comments: 14 pages

  12. arXiv:2307.10248  [pdf, ps, other]

    cs.DC

    Fast Shared-Memory Barrier Synchronization for a 1024-Cores RISC-V Many-Core Cluster

    Authors: Marco Bertuletti, Samuel Riedel, Yichao Zhang, Alessandro Vanelli-Coralli, Luca Benini

    Abstract: Synchronization is likely the most critical performance killer in shared-memory parallel programs. With the rise of multi-core and many-core processors, the relative impact on performance and energy overhead of synchronization is bound to grow. This paper focuses on barrier synchronization for TeraPool, a cluster of 1024 RISC-V processors with non-uniform memory access to a tightly coupled 4MB sha…

    Submitted 17 July, 2023; originally announced July 2023.

    Comments: 15 pages, 7 figures

  13. arXiv:2307.01163  [pdf, other]

    cs.CL cs.LG cs.NE

    Improving Language Plasticity via Pretraining with Active Forgetting

    Authors: Yihong Chen, Kelly Marchisio, Roberta Raileanu, David Ifeoluwa Adelani, Pontus Stenetorp, Sebastian Riedel, Mikel Artetxe

    Abstract: Pretrained language models (PLMs) are today the primary model for natural language processing. Despite their impressive downstream performance, it can be difficult to apply PLMs to new languages, a barrier to making their capabilities universally accessible. While prior work has shown it possible to address this issue by learning a new embedding layer for the new language, doing so is both data an…

    Submitted 12 January, 2024; v1 submitted 3 July, 2023; originally announced July 2023.

    Comments: NeurIPS 2023 Final Version

  14. arXiv:2305.05240  [pdf, other]

    cs.AR

    A High-performance, Energy-efficient Modular DMA Engine Architecture

    Authors: Thomas Benz, Michael Rogenmoser, Paul Scheffler, Samuel Riedel, Alessandro Ottaviano, Andreas Kurth, Torsten Hoefler, Luca Benini

    Abstract: Data transfers are essential in today's computing systems as latency and complex memory access patterns are increasingly challenging to manage. Direct memory access engines (DMAEs) are critically needed to transfer data independently of the processing elements, hiding latency and achieving high throughput even for complex access patterns to high-latency memory. With the prevalence of heterogeneous…

    Submitted 14 November, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

    Comments: 14 pages, 14 figures, accepted by an IEEE journal for publication

  15. MemPool: A Scalable Manycore Architecture with a Low-Latency Shared L1 Memory

    Authors: Samuel Riedel, Matheus Cavalcante, Renzo Andri, Luca Benini

    Abstract: Shared L1 memory clusters are a common architectural pattern (e.g., in GPGPUs) for building efficient and flexible multi-processing-element (PE) engines. However, it is a common belief that these tightly-coupled clusters would not scale beyond a few tens of PEs. In this work, we tackle scaling shared L1 clusters to hundreds of PEs while supporting a flexible and productive programming model and ma…

    Submitted 28 November, 2023; v1 submitted 30 March, 2023; originally announced March 2023.

    Comments: 14 pages, 17 figures, 2 tables, Published in IEEE Transactions on Computers

    Journal ref: IEEE Transactions on Computers, vol. 72, no. 12, pp. 3561-3575, Dec. 2023

  16. arXiv:2302.09865  [pdf, other]

    cs.CL cs.AI cs.LG

    Can discrete information extraction prompts generalize across language models?

    Authors: Nathanaël Carraz Rakotonirina, Roberto Dessì, Fabio Petroni, Sebastian Riedel, Marco Baroni

    Abstract: We study whether automatically-induced prompts that effectively extract information from a language model can also be used, out-of-the-box, to probe other language models for the same information. After confirming that discrete prompts induced with the AutoPrompt algorithm outperform manual and semi-manual prompts on the slot-filling task, we demonstrate a drop in performance for AutoPrompt prompt…

    Submitted 7 March, 2023; v1 submitted 20 February, 2023; originally announced February 2023.

    Comments: Published as conference paper at ICLR 2023

  17. Perceptual evaluation of listener envelopment using spatial granular synthesis

    Authors: Stefan Riedel, Matthias Frank, Franz Zotter

    Abstract: Listener envelopment refers to the sensation of being surrounded by sound, either by multiple direct sound events or by a diffuse reverberant sound field. More recently, a specific attribute for the sensation of being covered by sound from elevated directions has been proposed by Sazdov et al. and was termed listener engulfment. This contribution investigates the effect of the temporal and directi…

    Submitted 30 January, 2023; v1 submitted 24 January, 2023; originally announced January 2023.

    Comments: Submitted to the Journal of the Audio Engineering Society (JAES)

  18. arXiv:2211.09260  [pdf, other]

    cs.CL

    Task-aware Retrieval with Instructions

    Authors: Akari Asai, Timo Schick, Patrick Lewis, Xilun Chen, Gautier Izacard, Sebastian Riedel, Hannaneh Hajishirzi, Wen-tau Yih

    Abstract: We study the problem of retrieval with instructions, where users of a retrieval system explicitly describe their intent along with their queries. We aim to develop a general-purpose task-aware retrieval system using multi-task instruction tuning, which can follow human-written instructions to find the best documents for a given query. We introduce the first large-scale collection of approximately…

    Submitted 19 December, 2022; v1 submitted 16 November, 2022; originally announced November 2022.

    Comments: Code, data and pretrained model checkpoints are available at https://github.com/facebookresearch/tart

  19. arXiv:2210.16773  [pdf, other]

    cs.CL cs.AI cs.LG

    An Efficient Memory-Augmented Transformer for Knowledge-Intensive NLP Tasks

    Authors: Yuxiang Wu, Yu Zhao, Baotian Hu, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel

    Abstract: Access to external knowledge is essential for many natural language processing tasks, such as question answering and dialogue. Existing methods often rely on a parametric model that stores knowledge in its parameters, or use a retrieval-augmented model that has access to an external knowledge source. Parametric and retrieval-augmented models have complementary strengths in terms of computational e…

    Submitted 30 October, 2022; originally announced October 2022.

    Comments: EMNLP 2022 main conference long paper. 8 pages, 6 figures

  20. arXiv:2210.07093  [pdf, other]

    cs.CL

    Query Expansion Using Contextual Clue Sampling with Language Models

    Authors: Linqing Liu, Minghan Li, Jimmy Lin, Sebastian Riedel, Pontus Stenetorp

    Abstract: Query expansion is an effective approach for mitigating vocabulary mismatch between queries and documents in information retrieval. One recent line of research uses language models to generate query-related contexts for expansion. Along this line, we argue that expansion terms from these contexts should balance two key aspects: diversity and relevance. The obvious way to increase diversity is to s…

    Submitted 13 October, 2022; originally announced October 2022.

  21. arXiv:2209.13331  [pdf, other]

    cs.CL cs.LG

    EditEval: An Instruction-Based Benchmark for Text Improvements

    Authors: Jane Dwivedi-Yu, Timo Schick, Zhengbao Jiang, Maria Lomeli, Patrick Lewis, Gautier Izacard, Edouard Grave, Sebastian Riedel, Fabio Petroni

    Abstract: Evaluation of text generation to date has primarily focused on content created sequentially, rather than improvements on a piece of text. Writing, however, is naturally an iterative and incremental process that requires expertise in different modular skills such as fixing outdated information or making the style more consistent. Even so, comprehensive evaluation of a model's capacity to perform th…

    Submitted 27 September, 2022; originally announced September 2022.

  22. arXiv:2208.11663  [pdf, other]

    cs.CL

    PEER: A Collaborative Language Model

    Authors: Timo Schick, Jane Dwivedi-Yu, Zhengbao Jiang, Fabio Petroni, Patrick Lewis, Gautier Izacard, Qingfei You, Christoforos Nalmpantis, Edouard Grave, Sebastian Riedel

    Abstract: Textual content is often the output of a collaborative writing process: We start with an initial draft, ask for suggestions, and repeatedly make changes. Agnostic of this process, today's language models are trained to generate only the final result. As a consequence, they lack several abilities crucial for collaborative writing: They are unable to update existing texts, difficult to control and i…

    Submitted 24 August, 2022; originally announced August 2022.

  23. arXiv:2208.03299  [pdf, other]

    cs.CL

    Atlas: Few-shot Learning with Retrieval Augmented Language Models

    Authors: Gautier Izacard, Patrick Lewis, Maria Lomeli, Lucas Hosseini, Fabio Petroni, Timo Schick, Jane Dwivedi-Yu, Armand Joulin, Sebastian Riedel, Edouard Grave

    Abstract: Large language models have shown impressive few-shot results on a wide range of tasks. However, when knowledge is key for such results, as is the case for tasks such as question answering and fact checking, massive parameter counts to store knowledge seem to be needed. Retrieval augmented models are known to excel at knowledge intensive tasks without the need for as many parameters, but it is uncl…

    Submitted 16 November, 2022; v1 submitted 5 August, 2022; originally announced August 2022.

  24. arXiv:2207.09980  [pdf, other]

    cs.LG cs.AI cs.CL

    ReFactor GNNs: Revisiting Factorisation-based Models from a Message-Passing Perspective

    Authors: Yihong Chen, Pushkar Mishra, Luca Franceschi, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel

    Abstract: Factorisation-based Models (FMs), such as DistMult, have enjoyed enduring success for Knowledge Graph Completion (KGC) tasks, often outperforming Graph Neural Networks (GNNs). However, unlike GNNs, FMs struggle to incorporate node features and generalise to unseen nodes in inductive settings. Our work bridges the gap between FMs and GNNs by proposing ReFactor GNNs. This new architecture draws upon…

    Submitted 27 October, 2022; v1 submitted 20 July, 2022; originally announced July 2022.

    Comments: 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

    MSC Class: 68T05; 68T07; 68T50 ACM Class: I.2.7; I.2.6

  25. Spatz: A Compact Vector Processing Unit for High-Performance and Energy-Efficient Shared-L1 Clusters

    Authors: Matheus Cavalcante, Domenic Wüthrich, Matteo Perotti, Samuel Riedel, Luca Benini

    Abstract: While parallel architectures based on clusters of Processing Elements (PEs) sharing L1 memory are widespread, there is no consensus on how lean their PE should be. Architecting PEs as vector processors holds the promise to greatly reduce their instruction fetch bandwidth, mitigating the Von Neumann Bottleneck (VNB). However, due to their historical association with supercomputers, classical vector…

    Submitted 16 July, 2022; originally announced July 2022.

    Comments: 9 pages. Accepted for publication in the 2022 International Conference on Computer-Aided Design (ICCAD 2022)

    ACM Class: C.1.3; C.1.2

  26. arXiv:2207.06220  [pdf, other]

    cs.IR cs.AI

    Improving Wikipedia Verifiability with AI

    Authors: Fabio Petroni, Samuel Broscheit, Aleksandra Piktus, Patrick Lewis, Gautier Izacard, Lucas Hosseini, Jane Dwivedi-Yu, Maria Lomeli, Timo Schick, Pierre-Emmanuel Mazaré, Armand Joulin, Edouard Grave, Sebastian Riedel

    Abstract: Verifiability is a core content policy of Wikipedia: claims that are likely to be challenged need to be backed by citations. There are millions of articles available online and thousands of new articles are released each month. For this reason, finding relevant sources is a difficult task: many claims do not have any references that support them. Furthermore, even existing citations might not supp…

    Submitted 8 July, 2022; originally announced July 2022.

  27. arXiv:2205.12570  [pdf, other]

    cs.CL

    EDIN: An End-to-end Benchmark and Pipeline for Unknown Entity Discovery and Indexing

    Authors: Nora Kassner, Fabio Petroni, Mikhail Plekhanov, Sebastian Riedel, Nicola Cancedda

    Abstract: Existing work on Entity Linking mostly assumes that the reference knowledge base is complete, and therefore all mentions can be linked. In practice this is hardly ever the case, as knowledge bases are incomplete and because novel concepts arise constantly. This paper created the Unknown Entity Discovery and Indexing (EDIN) benchmark where unknown entities, that is entities without a description in…

    Submitted 25 May, 2022; originally announced May 2022.

  28. arXiv:2205.06266  [pdf, other]

    cs.CL

    Lifting the Curse of Multilinguality by Pre-training Modular Transformers

    Authors: Jonas Pfeiffer, Naman Goyal, Xi Victoria Lin, Xian Li, James Cross, Sebastian Riedel, Mikel Artetxe

    Abstract: Multilingual pre-trained models are known to suffer from the curse of multilinguality, which causes per-language performance to drop as they cover more languages. We address this issue by introducing language-specific modules, which allows us to grow the total capacity of the model, while keeping the total number of trainable parameters per language constant. In contrast with prior work that learn…

    Submitted 12 May, 2022; originally announced May 2022.

    Comments: NAACL 2022

  29. arXiv:2205.05812  [pdf, other]

    cs.CL cs.LG

    Open Vocabulary Extreme Classification Using Generative Models

    Authors: Daniel Simig, Fabio Petroni, Pouya Yanki, Kashyap Popat, Christina Du, Sebastian Riedel, Majid Yazdani

    Abstract: The extreme multi-label classification (XMC) task aims at tagging content with a subset of labels from an extremely large label set. The label vocabulary is typically defined in advance by domain experts and assumed to capture all necessary tags. However in real world scenarios this label set, although large, is often incomplete and experts frequently need to refine it. To develop systems that sim…

    Submitted 11 May, 2022; originally announced May 2022.

  30. arXiv:2204.10628  [pdf, other]

    cs.CL cs.IR

    Autoregressive Search Engines: Generating Substrings as Document Identifiers

    Authors: Michele Bevilacqua, Giuseppe Ottaviano, Patrick Lewis, Wen-tau Yih, Sebastian Riedel, Fabio Petroni

    Abstract: Knowledge-intensive language tasks require NLP systems to both provide the correct answer and retrieve supporting evidence for it in a given corpus. Autoregressive language models are emerging as the de-facto standard for generating answers, with newer and more powerful systems emerging at an astonishing pace. In this paper we argue that all this (and future) progress can be directly applied to th…

    Submitted 22 April, 2022; originally announced April 2022.

    Comments: 9 pages

  31. arXiv:2112.09924  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    The Web Is Your Oyster - Knowledge-Intensive NLP against a Very Large Web Corpus

    Authors: Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Dmytro Okhonko, Samuel Broscheit, Gautier Izacard, Patrick Lewis, Barlas Oğuz, Edouard Grave, Wen-tau Yih, Sebastian Riedel

    Abstract: In order to address increasing demands of real-world applications, the research for knowledge-intensive NLP (KI-NLP) should advance by capturing the challenges of a truly open-domain environment: web-scale knowledge, lack of structure, inconsistent quality and noise. To this end, we propose a new setup for evaluating existing knowledge intensive tasks in which we generalize the background corpus t…

    Submitted 24 May, 2022; v1 submitted 18 December, 2021; originally announced December 2021.

  32. arXiv:2112.09118  [pdf, other]

    cs.IR cs.AI cs.CL

    Unsupervised Dense Information Retrieval with Contrastive Learning

    Authors: Gautier Izacard, Mathilde Caron, Lucas Hosseini, Sebastian Riedel, Piotr Bojanowski, Armand Joulin, Edouard Grave

    Abstract: Recently, information retrieval has seen the emergence of dense retrievers, using neural networks, as an alternative to classical sparse methods based on term-frequency. These models have obtained state-of-the-art results on datasets and tasks where large training sets are available. However, they do not transfer well to new applications with no training data, and are outperformed by unsupervised…

    Submitted 29 August, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

  33. arXiv:2112.09062  [pdf, other]

    cs.CL

    Models in the Loop: Aiding Crowdworkers with Generative Annotation Assistants

    Authors: Max Bartolo, Tristan Thrush, Sebastian Riedel, Pontus Stenetorp, Robin Jia, Douwe Kiela

    Abstract: In Dynamic Adversarial Data Collection (DADC), human annotators are tasked with finding examples that models struggle to predict correctly. Models trained on DADC-collected training data have been shown to be more robust in adversarial and out-of-domain settings, and are considerably harder for humans to fool. However, DADC is more time-consuming than traditional data collection and thus more cost…

    Submitted 17 May, 2022; v1 submitted 16 December, 2021; originally announced December 2021.

  34. arXiv:2112.07771  [pdf, other]

    cs.CL cs.IR

    Boosted Dense Retriever

    Authors: Patrick Lewis, Barlas Oğuz, Wenhan Xiong, Fabio Petroni, Wen-tau Yih, Sebastian Riedel

    Abstract: We propose DrBoost, a dense retrieval ensemble inspired by boosting. DrBoost is trained in stages: each component model is learned sequentially and specialized by focusing only on retrieval mistakes made by the current ensemble. The final representation is the concatenation of the output vectors of all the component models, making it a drop-in replacement for standard dense retrievers at test time…

    Submitted 14 December, 2021; originally announced December 2021.

  35. MemPool-3D: Boosting Performance and Efficiency of Shared-L1 Memory Many-Core Clusters with 3D Integration

    Authors: Matheus Cavalcante, Anthony Agnesina, Samuel Riedel, Moritz Brunion, Alberto Garcia-Ortiz, Dragomir Milojevic, Francky Catthoor, Sung Kyu Lim, Luca Benini

    Abstract: Three-dimensional integrated circuits promise power, performance, and footprint gains compared to their 2D counterparts, thanks to drastic reductions in the interconnects' length through their smaller form factor. We can leverage the potential of 3D integration by enhancing MemPool, an open-source many-core design with 256 cores and a shared pool of L1 scratchpad memory connected with a low-latenc…

    Submitted 2 December, 2021; originally announced December 2021.

    Comments: Accepted for publication in DATE 2022 -- Design, Automation and Test in Europe Conference

  36. arXiv:2110.04374  [pdf, other]

    cs.CL

    A Few More Examples May Be Worth Billions of Parameters

    Authors: Yuval Kirstain, Patrick Lewis, Sebastian Riedel, Omer Levy

    Abstract: We investigate the dynamics of increasing the number of model parameters versus the number of labeled examples across a wide variety of tasks. Our exploration reveals that while scaling parameters consistently yields performance improvements, the contribution of additional examples highly depends on the task's format. Specifically, in open question answering tasks, enlarging the training set does…

    Submitted 8 October, 2021; originally announced October 2021.

  37. arXiv:2110.02834  [pdf, other]

    cs.CL cs.AI

    Relation Prediction as an Auxiliary Training Objective for Improving Multi-Relational Graph Representations

    Authors: Yihong Chen, Pasquale Minervini, Sebastian Riedel, Pontus Stenetorp

    Abstract: Learning good representations on multi-relational graphs is essential to knowledge base completion (KBC). In this paper, we propose a new self-supervised training objective for multi-relational graph representation learning, via simply incorporating relation prediction into the commonly used 1vsAll objective. The new training objective contains not only terms for predicting the subject and object…

    Submitted 6 October, 2021; originally announced October 2021.

    Comments: AKBC 2021

  38. arXiv:2109.01156  [pdf, other]

    cs.CL cs.AI

    Challenges in Generalization in Open Domain Question Answering

    Authors: Linqing Liu, Patrick Lewis, Sebastian Riedel, Pontus Stenetorp

    Abstract: Recent work on Open Domain Question Answering has shown that there is a large discrepancy in model performance between novel test questions and those that largely overlap with training questions. However, it is unclear which aspects of novel questions make them challenging. Drawing upon studies on systematic generalization, we introduce and annotate questions according to three categories that mea…

    Submitted 15 May, 2022; v1 submitted 2 September, 2021; originally announced September 2021.

    Comments: NAACL 2022 Findings

  39. arXiv:2108.11357  [pdf, other]

    cs.CL

    ProoFVer: Natural Logic Theorem Proving for Fact Verification

    Authors: Amrith Krishna, Sebastian Riedel, Andreas Vlachos

    Abstract: Fact verification systems typically rely on neural network classifiers for veracity prediction which lack explainability. This paper proposes ProoFVer, which uses a seq2seq model to generate natural logic-based inferences as proofs. These proofs consist of lexical mutations between spans in the claim and the evidence retrieved, each marked with a natural logic operator. Claim veracity is determine…

    Submitted 3 July, 2022; v1 submitted 25 August, 2021; originally announced August 2021.

    Comments: Accepted to TACL

  40. arXiv:2107.13602  [pdf, other]

    cs.CL cs.IR

    Domain-matched Pre-training Tasks for Dense Retrieval

    Authors: Barlas Oğuz, Kushal Lakhotia, Anchit Gupta, Patrick Lewis, Vladimir Karpukhin, Aleksandra Piktus, Xilun Chen, Sebastian Riedel, Wen-tau Yih, Sonal Gupta, Yashar Mehdad

    Abstract: Pre-training on larger datasets with ever increasing model size is now a proven recipe for increased performance across almost all NLP tasks. A notable exception is information retrieval, where additional pre-training has so far failed to produce convincing results. We show that, with the right pre-training setup, this barrier can be overcome. We demonstrate this by pre-training large bi-encoder m…

    Submitted 28 July, 2021; originally announced July 2021.

  41. arXiv:2107.02102  [pdf, other]

    cs.CL cs.AI

    Training Adaptive Computation for Open-Domain Question Answering with Computational Constraints

    Authors: Yuxiang Wu, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel

    Abstract: Adaptive Computation (AC) has been shown to be effective in improving the efficiency of Open-Domain Question Answering (ODQA) systems. However, current AC approaches require tuning of all model parameters, and training state-of-the-art ODQA models requires significant computational resources that may not be available for most researchers. We propose Adaptive Passage Encoder, an AC method that can…

    Submitted 5 July, 2021; originally announced July 2021.

    Comments: 7 pages, 1 figure, to be published in ACL-IJCNLP 2021

  42. arXiv:2106.13353  [pdf, other]

    cs.CL cs.LG

    Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models

    Authors: Robert L. Logan IV, Ivana Balažević, Eric Wallace, Fabio Petroni, Sameer Singh, Sebastian Riedel

    Abstract: Prompting language models (LMs) with training examples and task descriptions has been seen as critical to recent successes in few-shot learning. In this work, we show that finetuning LMs in the few-shot setting can considerably reduce the need for prompt engineering. In fact, one can use null prompts, prompts that contain neither task-specific templates nor training examples, and achieve competiti…

    Submitted 1 July, 2021; v1 submitted 24 June, 2021; originally announced June 2021.

  43. arXiv:2106.01074  [pdf, other]

    cs.CL cs.AI cs.DB

    Database Reasoning Over Text

    Authors: James Thorne, Majid Yazdani, Marzieh Saeidi, Fabrizio Silvestri, Sebastian Riedel, Alon Halevy

    Abstract: Neural models have shown impressive performance gains in answering queries from natural language text. However, existing works are unable to support database queries, such as "List/Count all female athletes who were born in 20th century", which require reasoning over sets of relevant facts with operations such as join, filtering and aggregation. We show that while state-of-the-art transformer mode…

    Submitted 2 June, 2021; originally announced June 2021.

    Comments: To appear at ACL 2021

  44. arXiv:2104.14337  [pdf, other]

    cs.CL cs.AI

    Dynabench: Rethinking Benchmarking in NLP

    Authors: Douwe Kiela, Max Bartolo, Yixin Nie, Divyansh Kaushik, Atticus Geiger, Zhengxuan Wu, Bertie Vidgen, Grusha Prasad, Amanpreet Singh, Pratik Ringshia, Zhiyi Ma, Tristan Thrush, Sebastian Riedel, Zeerak Waseem, Pontus Stenetorp, Robin Jia, Mohit Bansal, Christopher Potts, Adina Williams

    Abstract: We introduce Dynabench, an open-source platform for dynamic dataset creation and model benchmarking. Dynabench runs in a web browser and supports human-and-model-in-the-loop dataset creation: annotators seek to create examples that a target model will misclassify, but that another person will not. In this paper, we argue that Dynabench addresses a critical need in our community: contemporary model…

    Submitted 7 April, 2021; originally announced April 2021.

    Comments: NAACL 2021

  45. arXiv:2104.08786  [pdf, other]

    cs.CL cs.AI

    Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity

    Authors: Yao Lu, Max Bartolo, Alastair Moore, Sebastian Riedel, Pontus Stenetorp

    Abstract: When primed with only a handful of training samples, very large, pretrained language models such as GPT-3 have shown competitive results when compared to fully-supervised, fine-tuned, large, pretrained language models. We demonstrate that the order in which the samples are provided can make the difference between near state-of-the-art and random guess performance: essentially some permutations are…

    Submitted 3 March, 2022; v1 submitted 18 April, 2021; originally announced April 2021.

    Comments: ACL 2022

  46. Improving Question Answering Model Robustness with Synthetic Adversarial Data Generation

    Authors: Max Bartolo, Tristan Thrush, Robin Jia, Sebastian Riedel, Pontus Stenetorp, Douwe Kiela

    Abstract: Despite recent progress, state-of-the-art question answering models remain vulnerable to a variety of adversarial attacks. While dynamic adversarial data collection, in which a human annotator tries to write examples that fool a model-in-the-loop, can improve model robustness, this process is expensive which limits the scale of the collected data. In this work, we are the first to use synthetic ad…

    Submitted 15 March, 2022; v1 submitted 17 April, 2021; originally announced April 2021.

    Comments: EMNLP 2021

    Journal ref: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, p.8830-8848. Association for Computational Linguistics

  47. arXiv:2103.12528  [pdf, other]

    cs.CL cs.AI stat.ML

    Multilingual Autoregressive Entity Linking

    Authors: Nicola De Cao, Ledell Wu, Kashyap Popat, Mikel Artetxe, Naman Goyal, Mikhail Plekhanov, Luke Zettlemoyer, Nicola Cancedda, Sebastian Riedel, Fabio Petroni

    Abstract: We present mGENRE, a sequence-to-sequence system for the Multilingual Entity Linking (MEL) problem -- the task of resolving language-specific mentions to a multilingual Knowledge Base (KB). For a mention in a given language, mGENRE predicts the name of the target entity left-to-right, token-by-token in an autoregressive fashion. The autoregressive formulation allows us to effectively cross-encode…

    Submitted 23 March, 2021; originally announced March 2021.

    Comments: 20 pages, 8 figures, and 11 tables

  48. arXiv:2103.05904  [pdf, other]

    cs.RO

    Combining Learning from Demonstration with Learning by Exploration to Facilitate Contact-Rich Tasks

    Authors: Yunlei Shi, Zhaopeng Chen, Yansong Wu, Dimitri Henkel, Sebastian Riedel, Hongxu Liu, Qian Feng, Jianwei Zhang

    Abstract: Collaborative robots are expected to be able to work alongside humans and in some cases directly replace existing human workers, thus effectively responding to rapid assembly line changes. Current methods for programming contact-rich tasks, especially in heavily constrained space, tend to be fairly inefficient. Therefore, faster and more intuitive approaches to robot teaching are urgently required…

    Submitted 26 October, 2021; v1 submitted 10 March, 2021; originally announced March 2021.

    Comments: Accepted by the 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2021)

  49. arXiv:2102.07033  [pdf, other]

    cs.CL cs.AI cs.LG

    PAQ: 65 Million Probably-Asked Questions and What You Can Do With Them

    Authors: Patrick Lewis, Yuxiang Wu, Linqing Liu, Pasquale Minervini, Heinrich Küttler, Aleksandra Piktus, Pontus Stenetorp, Sebastian Riedel

    Abstract: Open-domain Question Answering models which directly leverage question-answer (QA) pairs, such as closed-book QA (CBQA) models and QA-pair retrievers, show promise in terms of speed and memory compared to conventional models which retrieve and read from text corpora. QA-pair retrievers also offer interpretable answers, a high degree of control, and are trivial to update at test time with new knowl…

    Submitted 13 February, 2021; originally announced February 2021.

  50. arXiv:2101.00133  [pdf, other]

    cs.CL cs.AI

    NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned

    Authors: Sewon Min, Jordan Boyd-Graber, Chris Alberti, Danqi Chen, Eunsol Choi, Michael Collins, Kelvin Guu, Hannaneh Hajishirzi, Kenton Lee, Jennimaria Palomaki, Colin Raffel, Adam Roberts, Tom Kwiatkowski, Patrick Lewis, Yuxiang Wu, Heinrich Küttler, Linqing Liu, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel, Sohee Yang, Minjoon Seo, Gautier Izacard, Fabio Petroni, Lucas Hosseini, et al. (28 additional authors not shown)

    Abstract: We review the EfficientQA competition from NeurIPS 2020. The competition focused on open-domain question answering (QA), where systems take natural language questions as input and return natural language answers. The aim of the competition was to build systems that can predict correct answers while also satisfying strict on-disk memory budgets. These memory budgets were designed to encourage conte…

    Submitted 19 September, 2021; v1 submitted 31 December, 2020; originally announced January 2021.

    Comments: 26 pages; Published in Proceedings of Machine Learning Research (PMLR), NeurIPS 2020 Competition and Demonstration Track