Showing 1–32 of 32 results for author: Schick, T

  1. arXiv:2404.06619  [pdf, other]

    cs.CL cs.CY cs.LG

    FairPair: A Robust Evaluation of Biases in Language Models through Paired Perturbations

    Authors: Jane Dwivedi-Yu, Raaz Dwivedi, Timo Schick

    Abstract: The accurate evaluation of differential treatment in language models to specific groups is critical to ensuring a positive and safe user experience. An ideal evaluation should have the properties of being robust, extendable to new groups or attributes, and being able to capture biases that appear in typical usage (rather than just extreme, rare cases). Relatedly, bias evaluation should surface not…

    Submitted 9 April, 2024; originally announced April 2024.

  2. arXiv:2308.12157  [pdf, other]

    cs.CL cs.AI

    Evaluation of Faithfulness Using the Longest Supported Subsequence

    Authors: Anirudh Mittal, Timo Schick, Mikel Artetxe, Jane Dwivedi-Yu

    Abstract: As increasingly sophisticated language models emerge, their trustworthiness becomes a pivotal issue, especially in tasks such as summarization and question-answering. Ensuring their responses are contextually grounded and faithful is challenging due to the linguistic diversity and the myriad of possible answers. In this paper, we introduce a novel approach to evaluate faithfulness of machine-gener…

    Submitted 23 August, 2023; originally announced August 2023.

  3. arXiv:2308.06259  [pdf, other]

    cs.CL

    Self-Alignment with Instruction Backtranslation

    Authors: Xian Li, Ping Yu, Chunting Zhou, Timo Schick, Omer Levy, Luke Zettlemoyer, Jason Weston, Mike Lewis

    Abstract: We present a scalable method to build a high quality instruction following language model by automatically labelling human-written text with corresponding instructions. Our approach, named instruction backtranslation, starts with a language model finetuned on a small amount of seed data, and a given web corpus. The seed model is used to construct training examples by generating instruction prompts…

    Submitted 12 March, 2024; v1 submitted 11 August, 2023; originally announced August 2023.

    Comments: ICLR2024 camera ready

  4. arXiv:2305.14264  [pdf, other]

    cs.CL cs.AI

    Active Learning Principles for In-Context Learning with Large Language Models

    Authors: Katerina Margatina, Timo Schick, Nikolaos Aletras, Jane Dwivedi-Yu

    Abstract: The remarkable advancements in large language models (LLMs) have significantly enhanced the performance in few-shot learning settings. By using only a small number of labeled examples, referred to as demonstrations, LLMs can effectively grasp the task at hand through in-context learning. However, the process of selecting appropriate demonstrations has received limited attention in prior work. This…

    Submitted 22 November, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: To appear at Findings of EMNLP (Camera Ready version)

  5. arXiv:2304.08460  [pdf, other]

    cs.CL cs.AI cs.LG

    LongForm: Effective Instruction Tuning with Reverse Instructions

    Authors: Abdullatif Köksal, Timo Schick, Anna Korhonen, Hinrich Schütze

    Abstract: Instruction tuning enables language models to more effectively generalize and better follow user intent. However, obtaining instruction data is costly and challenging. Prior work employs methods such as expensive human annotation, crowd-sourced datasets with alignment issues, and generating noisy examples via LLMs. We introduce the LongForm-C dataset, which is created by reverse instructions. We g…

    Submitted 14 February, 2024; v1 submitted 17 April, 2023; originally announced April 2023.

    Comments: This version extends the evaluation with new metrics and NLU tasks

  6. arXiv:2302.07842  [pdf, ps, other]

    cs.CL

    Augmented Language Models: a Survey

    Authors: Grégoire Mialon, Roberto Dessì, Maria Lomeli, Christoforos Nalmpantis, Ram Pasunuru, Roberta Raileanu, Baptiste Rozière, Timo Schick, Jane Dwivedi-Yu, Asli Celikyilmaz, Edouard Grave, Yann LeCun, Thomas Scialom

    Abstract: This survey reviews works in which language models (LMs) are augmented with reasoning skills and the ability to use tools. The former is defined as decomposing a potentially complex task into simpler subtasks while the latter consists in calling external modules such as a code interpreter. LMs can leverage these augmentations separately or in combination via heuristics, or learn to do so from demo…

    Submitted 15 February, 2023; originally announced February 2023.

  7. arXiv:2302.04761  [pdf, other]

    cs.CL

    Toolformer: Language Models Can Teach Themselves to Use Tools

    Authors: Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, Thomas Scialom

    Abstract: Language models (LMs) exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. They also, paradoxically, struggle with basic functionality, such as arithmetic or factual lookup, where much simpler and smaller models excel. In this paper, we show that LMs can teach themselves to use external tools via simple APIs and achieve the best of…

    Submitted 9 February, 2023; originally announced February 2023.

  8. arXiv:2212.09689  [pdf, other]

    cs.CL cs.AI cs.LG

    Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor

    Authors: Or Honovich, Thomas Scialom, Omer Levy, Timo Schick

    Abstract: Instruction tuning enables pretrained language models to perform new tasks from inference-time natural language descriptions. These approaches rely on vast amounts of human supervision in the form of crowdsourced datasets or user interactions. In this work, we introduce Unnatural Instructions: a large dataset of creative and diverse instructions, collected with virtually no human labor. We collect…

    Submitted 19 December, 2022; originally announced December 2022.

    Comments: 18 pages, 7 figures

  9. arXiv:2211.09260  [pdf, other]

    cs.CL

    Task-aware Retrieval with Instructions

    Authors: Akari Asai, Timo Schick, Patrick Lewis, Xilun Chen, Gautier Izacard, Sebastian Riedel, Hannaneh Hajishirzi, Wen-tau Yih

    Abstract: We study the problem of retrieval with instructions, where users of a retrieval system explicitly describe their intent along with their queries. We aim to develop a general-purpose task-aware retrieval system using multi-task instruction tuning, which can follow human-written instructions to find the best documents for a given query. We introduce the first large-scale collection of approximately…

    Submitted 19 December, 2022; v1 submitted 16 November, 2022; originally announced November 2022.

    Comments: Code, data and pretrained model checkpoints are available at https://github.com/facebookresearch/tart

  10. arXiv:2211.08358  [pdf, other]

    cs.CL

    MEAL: Stable and Active Learning for Few-Shot Prompting

    Authors: Abdullatif Köksal, Timo Schick, Hinrich Schütze

    Abstract: Few-shot classification has made great strides due to foundation models that, through priming and prompting, are highly effective few-shot learners. However, this approach has high variance both across different sets of few shots (data selection) and across different finetuning runs (run variability). This is problematic not only because it impedes the fair comparison of different approaches, but…

    Submitted 20 November, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

    Comments: EMNLP 2023 Findings

  11. arXiv:2211.05100  [pdf, other]

    cs.CL

    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    Authors: BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, et al. (369 additional authors not shown)

    Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access…

    Submitted 27 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

  12. arXiv:2209.13331  [pdf, other]

    cs.CL cs.LG

    EditEval: An Instruction-Based Benchmark for Text Improvements

    Authors: Jane Dwivedi-Yu, Timo Schick, Zhengbao Jiang, Maria Lomeli, Patrick Lewis, Gautier Izacard, Edouard Grave, Sebastian Riedel, Fabio Petroni

    Abstract: Evaluation of text generation to date has primarily focused on content created sequentially, rather than improvements on a piece of text. Writing, however, is naturally an iterative and incremental process that requires expertise in different modular skills such as fixing outdated information or making the style more consistent. Even so, comprehensive evaluation of a model's capacity to perform th…

    Submitted 27 September, 2022; originally announced September 2022.

  13. arXiv:2208.11663  [pdf, other]

    cs.CL

    PEER: A Collaborative Language Model

    Authors: Timo Schick, Jane Dwivedi-Yu, Zhengbao Jiang, Fabio Petroni, Patrick Lewis, Gautier Izacard, Qingfei You, Christoforos Nalmpantis, Edouard Grave, Sebastian Riedel

    Abstract: Textual content is often the output of a collaborative writing process: We start with an initial draft, ask for suggestions, and repeatedly make changes. Agnostic of this process, today's language models are trained to generate only the final result. As a consequence, they lack several abilities crucial for collaborative writing: They are unable to update existing texts, difficult to control and i…

    Submitted 24 August, 2022; originally announced August 2022.

  14. arXiv:2208.03299  [pdf, other]

    cs.CL

    Atlas: Few-shot Learning with Retrieval Augmented Language Models

    Authors: Gautier Izacard, Patrick Lewis, Maria Lomeli, Lucas Hosseini, Fabio Petroni, Timo Schick, Jane Dwivedi-Yu, Armand Joulin, Sebastian Riedel, Edouard Grave

    Abstract: Large language models have shown impressive few-shot results on a wide range of tasks. However, when knowledge is key for such results, as is the case for tasks such as question answering and fact checking, massive parameter counts to store knowledge seem to be needed. Retrieval augmented models are known to excel at knowledge intensive tasks without the need for as many parameters, but it is uncl…

    Submitted 16 November, 2022; v1 submitted 5 August, 2022; originally announced August 2022.

  15. arXiv:2207.06220  [pdf, other]

    cs.IR cs.AI

    Improving Wikipedia Verifiability with AI

    Authors: Fabio Petroni, Samuel Broscheit, Aleksandra Piktus, Patrick Lewis, Gautier Izacard, Lucas Hosseini, Jane Dwivedi-Yu, Maria Lomeli, Timo Schick, Pierre-Emmanuel Mazaré, Armand Joulin, Edouard Grave, Sebastian Riedel

    Abstract: Verifiability is a core content policy of Wikipedia: claims that are likely to be challenged need to be backed by citations. There are millions of articles available online and thousands of new articles are released each month. For this reason, finding relevant sources is a difficult task: many claims do not have any references that support them. Furthermore, even existing citations might not supp…

    Submitted 8 July, 2022; originally announced July 2022.

  16. arXiv:2206.04615  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG stat.ML

    Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

    Authors: Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, et al. (426 additional authors not shown)

    Abstract: Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-futur…

    Submitted 12 June, 2023; v1 submitted 9 June, 2022; originally announced June 2022.

    Comments: 27 pages, 17 figures + references and appendices, repo: https://github.com/google/BIG-bench

    Journal ref: Transactions on Machine Learning Research, May/2022, https://openreview.net/forum?id=uyTL5Bvosj

  17. arXiv:2205.12604  [pdf, other]

    cs.CL

    Leveraging QA Datasets to Improve Generative Data Augmentation

    Authors: Dheeraj Mekala, Tu Vu, Timo Schick, Jingbo Shang

    Abstract: The ability of generative language models (GLMs) to generate text has improved considerably in the last few years, enabling their use for generative data augmentation. In this work, we propose CONDA, an approach to further improve GLMs' ability to generate synthetic data by reformulating data generation as context generation for a given question-answer (QA) pair and leveraging QA datasets for trai…

    Submitted 25 October, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

  18. arXiv:2203.06228  [pdf, other]

    cs.CL cs.AI

    CoDA21: Evaluating Language Understanding Capabilities of NLP Models With Context-Definition Alignment

    Authors: Lütfi Kerem Senel, Timo Schick, Hinrich Schütze

    Abstract: Pretrained language models (PLMs) have achieved superhuman performance on many benchmarks, creating a need for harder tasks. We introduce CoDA21 (Context Definition Alignment), a challenging benchmark that measures natural language understanding (NLU) capabilities of PLMs: Given a definition and a context each for k words, but not the words themselves, the task is to align the k definitions with t…

    Submitted 11 March, 2022; originally announced March 2022.

    Comments: To appear in ACL 2022, 5 pages, 2 figures

  19. arXiv:2202.06133  [pdf, other]

    cs.CL

    Semantic-Oriented Unlabeled Priming for Large-Scale Language Models

    Authors: Yanchen Liu, Timo Schick, Hinrich Schütze

    Abstract: Due to the high costs associated with finetuning large language models, various recent works propose to adapt them to specific tasks without any parameter updates through in-context learning. Unfortunately, for in-context learning there is currently no way to leverage unlabeled data, which is often much easier to obtain in large quantities than labeled examples. In this work, we therefore investig…

    Submitted 12 February, 2022; originally announced February 2022.

  20. arXiv:2111.13440  [pdf, other]

    cs.CL

    True Few-Shot Learning with Prompts -- A Real-World Perspective

    Authors: Timo Schick, Hinrich Schütze

    Abstract: Prompt-based approaches are strong at few-shot learning. However, Perez et al. (2021) have recently cast doubt on their performance because they had difficulty getting good results in a "true" few-shot setting in which prompts and hyperparameters cannot be tuned on a dev set. In view of this, we conduct an extensive study of PET, a method that combines textual instructions with example-based finet…

    Submitted 26 November, 2021; originally announced November 2021.

  21. arXiv:2104.07540  [pdf, other]

    cs.CL cs.LG

    Generating Datasets with Pretrained Language Models

    Authors: Timo Schick, Hinrich Schütze

    Abstract: To obtain high-quality sentence embeddings from pretrained language models (PLMs), they must either be augmented with additional pretraining objectives or finetuned on a large set of labeled text pairs. While the latter approach typically outperforms the former, it requires great human effort to generate suitable datasets of sufficient size. In this paper, we show how PLMs can be leveraged to obta…

    Submitted 4 October, 2021; v1 submitted 15 April, 2021; originally announced April 2021.

    Comments: Accepted at EMNLP2021

  22. arXiv:2103.00453  [pdf, other]

    cs.CL

    Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP

    Authors: Timo Schick, Sahana Udupa, Hinrich Schütze

    Abstract: When trained on large, unfiltered crawls from the internet, language models pick up and reproduce all kinds of undesirable biases that can be found in the data: they often generate racist, sexist, violent or otherwise toxic language. As large models require millions of training examples to achieve good performance, it is difficult to completely prevent them from being exposed to such content. In t…

    Submitted 9 September, 2021; v1 submitted 28 February, 2021; originally announced March 2021.

    Comments: Accepted at TACL

  23. arXiv:2012.11926  [pdf, other]

    cs.CL cs.LG

    Few-Shot Text Generation with Pattern-Exploiting Training

    Authors: Timo Schick, Hinrich Schütze

    Abstract: Providing pretrained language models with simple task descriptions in natural language enables them to solve some tasks in a fully unsupervised fashion. Moreover, when combined with regular learning from examples, this idea yields impressive few-shot results for a wide range of text classification tasks. It is also a promising direction to improve data efficiency in generative settings, but there…

    Submitted 4 October, 2021; v1 submitted 22 December, 2020; originally announced December 2020.

    Comments: Accepted at EMNLP2021

  24. arXiv:2010.13641  [pdf, other]

    cs.CL cs.AI cs.LG

    Automatically Identifying Words That Can Serve as Labels for Few-Shot Text Classification

    Authors: Timo Schick, Helmut Schmid, Hinrich Schütze

    Abstract: A recent approach for few-shot text classification is to convert textual inputs to cloze questions that contain some form of task description, process them with a pretrained language model and map the predicted words to labels. Manually defining this mapping between words and labels requires both domain expertise and an understanding of the language model's abilities. To mitigate this issue, we de…

    Submitted 26 October, 2020; originally announced October 2020.

    Comments: To appear at COLING 2020

  25. arXiv:2009.07118  [pdf, other]

    cs.CL cs.AI cs.LG

    It's Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners

    Authors: Timo Schick, Hinrich Schütze

    Abstract: When scaled to hundreds of billions of parameters, pretrained language models such as GPT-3 (Brown et al., 2020) achieve remarkable few-shot performance. However, enormous amounts of compute are required for training and applying such big models, resulting in a large carbon footprint and making it difficult for researchers and practitioners to use them. We show that performance similar to GPT-3 ca…

    Submitted 12 April, 2021; v1 submitted 15 September, 2020; originally announced September 2020.

    Comments: Accepted at NAACL2021

  26. arXiv:2001.07676  [pdf, other]

    cs.CL

    Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference

    Authors: Timo Schick, Hinrich Schütze

    Abstract: Some NLP tasks can be solved in a fully unsupervised fashion by providing a pretrained language model with "task descriptions" in natural language (e.g., Radford et al., 2019). While this approach underperforms its supervised counterpart, we show in this work that the two ideas can be combined: We introduce Pattern-Exploiting Training (PET), a semi-supervised training procedure that reformulates i…

    Submitted 25 January, 2021; v1 submitted 21 January, 2020; originally announced January 2020.

    Comments: Accepted at EACL2021

  27. arXiv:1910.07181  [pdf, other]

    cs.CL

    BERTRAM: Improved Word Embeddings Have Big Impact on Contextualized Model Performance

    Authors: Timo Schick, Hinrich Schütze

    Abstract: Pretraining deep language models has led to large performance gains in NLP. Despite this success, Schick and Schütze (2020) recently showed that these models struggle to understand rare words. For static word embeddings, this problem has been addressed by separately learning representations for rare words. In this work, we transfer this idea to pretrained language models: We introduce BERTRAM, a p…

    Submitted 29 April, 2020; v1 submitted 16 October, 2019; originally announced October 2019.

    Comments: Accepted at ACL2020

  28. arXiv:1904.06707  [pdf, ps, other]

    cs.CL cs.LG

    Rare Words: A Major Problem for Contextualized Embeddings And How to Fix it by Attentive Mimicking

    Authors: Timo Schick, Hinrich Schütze

    Abstract: Pretraining deep neural network architectures with a language modeling objective has brought large improvements for many natural language processing tasks. Exemplified by BERT, a recently proposed such architecture, we demonstrate that despite being trained on huge amounts of data, deep language models still struggle to understand rare words. To fix this problem, we adapt Attentive Mimicking, a me…

    Submitted 4 December, 2019; v1 submitted 14 April, 2019; originally announced April 2019.

    Comments: To appear at AAAI 2020

  29. arXiv:1904.01617  [pdf, other]

    cs.CL

    Attentive Mimicking: Better Word Embeddings by Attending to Informative Contexts

    Authors: Timo Schick, Hinrich Schütze

    Abstract: Learning high-quality embeddings for rare words is a hard problem because of sparse context information. Mimicking (Pinter et al., 2017) has been proposed as a solution: given embeddings learned by a standard algorithm, a model is first trained to reproduce embeddings of frequent words from their surface form and then used to compute embeddings for rare words. In this paper, we introduce attentive…

    Submitted 5 April, 2019; v1 submitted 2 April, 2019; originally announced April 2019.

    Comments: Accepted at NAACL2019

  30. arXiv:1811.03866  [pdf, ps, other]

    cs.CL cs.AI

    Learning Semantic Representations for Novel Words: Leveraging Both Form and Context

    Authors: Timo Schick, Hinrich Schütze

    Abstract: Word embeddings are a key component of high-performing natural language processing (NLP) systems, but it remains a challenge to learn good representations for novel words on the fly, i.e., for words that did not occur in the training data. The general problem setting is that word embeddings are induced on an unlabeled training corpus and then a model is trained that embeds novel words into this in…

    Submitted 9 November, 2018; originally announced November 2018.

    Comments: AAAI 2019

  31. arXiv:1707.07591  [pdf, other]

    cs.CL

    Transition-Based Generation from Abstract Meaning Representations

    Authors: Timo Schick

    Abstract: This work addresses the task of generating English sentences from Abstract Meaning Representation (AMR) graphs. To cope with this task, we transform each input AMR graph into a structure similar to a dependency tree and annotate it with syntactic information by applying various predefined actions to it. Subsequently, a sentence is obtained from this tree structure by visiting its nodes in a specif…

    Submitted 24 July, 2017; originally announced July 2017.

  32. arXiv:0912.0284  [pdf, ps, other]

    math.KT cs.CG math.GT math.NA stat.ML

    Hodge Theory on Metric Spaces

    Authors: Laurent Bartholdi, Thomas Schick, Nat Smale, Steve Smale, Anthony W. Baker

    Abstract: Hodge theory is a beautiful synthesis of geometry, topology, and analysis, which has been developed in the setting of Riemannian manifolds. On the other hand, spaces of images, which are important in the mathematical foundations of vision and pattern recognition, do not fit this framework. This motivates us to develop a version of Hodge theory on metric spaces with a probability measure. We believ…

    Submitted 24 November, 2011; v1 submitted 1 December, 2009; originally announced December 2009.

    Comments: appendix by Anthony W. Baker, 48 pages, AMS-LaTeX. v2: final version, to appear in Foundations of Computational Mathematics. Minor changes and additions

    MSC Class: 58A14; 54E05; 55P55; 57M50

    Journal ref: Foundations of Computational Mathematics 12:1 (2012) 1-48