
Showing 1–50 of 78 results for author: Yih, W

  1. arXiv:2406.04744

    cs.CL

    CRAG -- Comprehensive RAG Benchmark

    Authors: Xiao Yang, Kai Sun, Hao Xin, Yushi Sun, Nikita Bhalla, Xiangsen Chen, Sajal Choudhary, Rongze Daniel Gui, Ziran Will Jiang, Ziyu Jiang, Lingkun Kong, Brian Moran, Jiaqi Wang, Yifan Ethan Xu, An Yan, Chenyu Yang, Eting Yuan, Hanwen Zha, Nan Tang, Lei Chen, Nicolas Scheffer, Yue Liu, Nirav Shah, Rakesh Wanga, Anuj Kumar, et al. (2 additional authors not shown)

    Abstract: Retrieval-Augmented Generation (RAG) has recently emerged as a promising solution to alleviate the lack of knowledge in Large Language Models (LLMs). Existing RAG datasets, however, do not adequately represent the diverse and dynamic nature of real-world Question Answering (QA) tasks. To bridge this gap, we introduce the Comprehensive RAG Benchmark (CRAG), a factual question answering bench…

    Submitted 7 June, 2024; originally announced June 2024.

  2. arXiv:2405.19325

    cs.CL

    Nearest Neighbor Speculative Decoding for LLM Generation and Attribution

    Authors: Minghan Li, Xilun Chen, Ari Holtzman, Beidi Chen, Jimmy Lin, Wen-tau Yih, Xi Victoria Lin

    Abstract: Large language models (LLMs) often hallucinate and lack the ability to provide attribution for their generations. Semi-parametric LMs, such as kNN-LM, approach these limitations by refining the output of an LM for a given prompt using its nearest neighbor matches in a non-parametric data store. However, these models often exhibit slow inference speeds and produce non-fluent texts. In this paper, w…

    Submitted 30 May, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  3. arXiv:2405.01525

    cs.CL cs.AI

    FLAME: Factuality-Aware Alignment for Large Language Models

    Authors: Sheng-Chieh Lin, Luyu Gao, Barlas Oguz, Wenhan Xiong, Jimmy Lin, Wen-tau Yih, Xilun Chen

    Abstract: Alignment is a standard procedure to fine-tune pre-trained large language models (LLMs) to follow natural language instructions and serve as helpful AI assistants. We have observed, however, that the conventional alignment process fails to enhance the factual accuracy of LLMs, and often leads to the generation of more false facts (i.e. hallucination). In this paper, we study how to make the LLM al…

    Submitted 2 May, 2024; originally announced May 2024.

  4. arXiv:2404.16030

    cs.CV cs.AI cs.CL cs.LG

    MoDE: CLIP Data Experts via Clustering

    Authors: Jiawei Ma, Po-Yao Huang, Saining Xie, Shang-Wen Li, Luke Zettlemoyer, Shih-Fu Chang, Wen-Tau Yih, Hu Xu

    Abstract: The success of contrastive language-image pretraining (CLIP) relies on the supervision from the pairing between images and captions, which tends to be noisy in web-crawled data. We present Mixture of Data Experts (MoDE) and learn a system of CLIP data experts via clustering. Each data expert is trained on one data cluster, being less sensitive to false-negative noise in other clusters. At inferen…

    Submitted 24 April, 2024; originally announced April 2024.

    Comments: IEEE CVPR 2024 Camera Ready. Code Link: https://github.com/facebookresearch/MetaCLIP/tree/main/mode

  5. arXiv:2403.07816

    cs.CL cs.AI

    Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM

    Authors: Sainbayar Sukhbaatar, Olga Golovneva, Vasu Sharma, Hu Xu, Xi Victoria Lin, Baptiste Rozière, Jacob Kahn, Daniel Li, Wen-tau Yih, Jason Weston, Xian Li

    Abstract: We investigate efficient methods for training Large Language Models (LLMs) to possess capabilities in multiple specialized domains, such as coding, math reasoning and world knowledge. Our method, named Branch-Train-MiX (BTX), starts from a seed model, which is branched to train experts in embarrassingly parallel fashion with high throughput and reduced communication cost. After individual experts…

    Submitted 12 March, 2024; originally announced March 2024.

  6. arXiv:2403.03187

    cs.CL cs.AI cs.LG

    Reliable, Adaptable, and Attributable Language Models with Retrieval

    Authors: Akari Asai, Zexuan Zhong, Danqi Chen, Pang Wei Koh, Luke Zettlemoyer, Hannaneh Hajishirzi, Wen-tau Yih

    Abstract: Parametric language models (LMs), which are trained on vast amounts of web data, exhibit remarkable flexibility and capability. However, they still face practical challenges such as hallucinations, difficulty in adapting to new data distributions, and a lack of verifiability. In this position paper, we advocate for retrieval-augmented LMs to replace parametric LMs as the next generation of LMs. By…

    Submitted 5 March, 2024; originally announced March 2024.

  7. arXiv:2402.12847

    cs.CL cs.AI cs.LG

    Instruction-tuned Language Models are Better Knowledge Learners

    Authors: Zhengbao Jiang, Zhiqing Sun, Weijia Shi, Pedro Rodriguez, Chunting Zhou, Graham Neubig, Xi Victoria Lin, Wen-tau Yih, Srinivasan Iyer

    Abstract: In order for large language model (LLM)-based assistants to effectively adapt to evolving information needs, it must be possible to update their factual knowledge through continued training on new data. The standard recipe for doing so involves continued pre-training on new documents followed by instruction-tuning on question-answer (QA) pairs. However, we find that LLMs trained with this recipe s…

    Submitted 25 May, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: ACL 2024. The reproduced data for this paper is available at https://github.com/Edward-Sun/PIT

  8. arXiv:2305.17080

    cs.CL

    Expand, Rerank, and Retrieve: Query Reranking for Open-Domain Question Answering

    Authors: Yung-Sung Chuang, Wei Fang, Shang-Wen Li, Wen-tau Yih, James Glass

    Abstract: We propose EAR, a query Expansion And Reranking approach for improving passage retrieval, with the application to open-domain question answering. EAR first applies a query expansion model to generate a diverse set of queries, and then uses a query reranker to select the ones that could lead to better retrieval results. Motivated by the observation that the best query expansion often is not picked…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: ACL 2023 long paper (Findings)

  9. arXiv:2305.14739

    cs.CL

    Trusting Your Evidence: Hallucinate Less with Context-aware Decoding

    Authors: Weijia Shi, Xiaochuang Han, Mike Lewis, Yulia Tsvetkov, Luke Zettlemoyer, Scott Wen-tau Yih

    Abstract: Language models (LMs) often struggle to pay enough attention to the input context, and generate texts that are unfaithful or contain hallucinations. To mitigate this issue, we present context-aware decoding (CAD), which follows a contrastive output distribution that amplifies the difference between the output probabilities when a model is used with and without context. Our experiments show that CA…

    Submitted 24 May, 2023; originally announced May 2023.

  10. arXiv:2305.14251

    cs.CL cs.AI cs.LG

    FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation

    Authors: Sewon Min, Kalpesh Krishna, Xinxi Lyu, Mike Lewis, Wen-tau Yih, Pang Wei Koh, Mohit Iyyer, Luke Zettlemoyer, Hannaneh Hajishirzi

    Abstract: Evaluating the factuality of long-form text generated by large language models (LMs) is non-trivial because (1) generations often contain a mixture of supported and unsupported pieces of information, making binary judgments of quality inadequate, and (2) human evaluation is time-consuming and costly. In this paper, we introduce FACTSCORE, a new evaluation that breaks a generation into a series of…

    Submitted 11 October, 2023; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: 25 pages; 7 figures. Published as a main conference paper at EMNLP 2023. Code available at https://github.com/shmsw25/FActScore

  11. arXiv:2305.13691

    cs.CL

    Few-Shot Data Synthesis for Open Domain Multi-Hop Question Answering

    Authors: Mingda Chen, Xilun Chen, Wen-tau Yih

    Abstract: Few-shot learning for open domain multi-hop question answering typically relies on the in-context learning capability of large language models (LLMs). While powerful, these LLMs usually contain tens or hundreds of billions of parameters, making them rather inefficient at inference time. To improve performance of smaller language models, we propose a data synthesis framework for multi-hop question a…

    Submitted 12 February, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: EACL 2024 Camera Ready

  12. arXiv:2305.08195

    cs.CL

    Learning to Simulate Natural Language Feedback for Interactive Semantic Parsing

    Authors: Hao Yan, Saurabh Srivastava, Yintao Tai, Sida I. Wang, Wen-tau Yih, Ziyu Yao

    Abstract: Interactive semantic parsing based on natural language (NL) feedback, where users provide feedback to correct the parser's mistakes, has emerged as a more practical scenario than the traditional one-shot semantic parsing. However, prior work has heavily relied on human-annotated feedback data to train the interactive semantic parser, which is prohibitively expensive and not scalable. In this work, w…

    Submitted 4 June, 2023; v1 submitted 14 May, 2023; originally announced May 2023.

    Comments: Accepted to ACL 2023. 18 pages, 6 figures

  13. arXiv:2305.05364

    cs.LG cs.AI cs.CL

    Large Language Model Programs

    Authors: Imanol Schlag, Sainbayar Sukhbaatar, Asli Celikyilmaz, Wen-tau Yih, Jason Weston, Jürgen Schmidhuber, Xian Li

    Abstract: In recent years, large pre-trained language models (LLMs) have demonstrated the ability to follow instructions and perform novel tasks from a few examples. The possibility to parameterise an LLM through such in-context examples widens their capability at a much lower cost than finetuning. We extend this line of reasoning and present a method which further expands the capabilities of an LLM by embe…

    Submitted 9 May, 2023; originally announced May 2023.

  14. arXiv:2305.03204

    cs.CV cs.CL

    VideoOFA: Two-Stage Pre-Training for Video-to-Text Generation

    Authors: Xilun Chen, Lili Yu, Wenhan Xiong, Barlas Oğuz, Yashar Mehdad, Wen-tau Yih

    Abstract: We propose a new two-stage pre-training framework for video-to-text generation tasks such as video captioning and video question answering: A generative encoder-decoder model is first jointly pre-trained on massive image-text data to learn fundamental vision-language concepts, and then adapted to video data in an intermediate video-text pre-training stage to learn video-specific skills such as spa…

    Submitted 4 May, 2023; originally announced May 2023.

  15. arXiv:2302.08468

    cs.LG cs.CL cs.PL cs.SE

    LEVER: Learning to Verify Language-to-Code Generation with Execution

    Authors: Ansong Ni, Srini Iyer, Dragomir Radev, Ves Stoyanov, Wen-tau Yih, Sida I. Wang, Xi Victoria Lin

    Abstract: The advent of large language models trained on code (code LLMs) has led to significant progress in language-to-code generation. State-of-the-art approaches in this area combine LLM decoding with sample pruning and reranking using test cases or heuristics based on the execution results. However, it is challenging to obtain test cases for many real-world language-to-code applications, and heuristics…

    Submitted 1 September, 2023; v1 submitted 16 February, 2023; originally announced February 2023.

    Comments: ICML'23; code available at https://github.com/niansong1996/lever

  16. arXiv:2302.07452

    cs.IR cs.CL

    How to Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense Retrieval

    Authors: Sheng-Chieh Lin, Akari Asai, Minghan Li, Barlas Oguz, Jimmy Lin, Yashar Mehdad, Wen-tau Yih, Xilun Chen

    Abstract: Various techniques have been developed in recent years to improve dense retrieval (DR), such as unsupervised contrastive learning and pseudo-query generation. Existing DRs, however, often suffer from effectiveness tradeoffs between supervised and zero-shot retrieval, which some argue is due to limited model capacity. We contradict this hypothesis and show that a generalizable DR can be traine…

    Submitted 14 February, 2023; originally announced February 2023.

  17. arXiv:2301.12652

    cs.CL

    REPLUG: Retrieval-Augmented Black-Box Language Models

    Authors: Weijia Shi, Sewon Min, Michihiro Yasunaga, Minjoon Seo, Rich James, Mike Lewis, Luke Zettlemoyer, Wen-tau Yih

    Abstract: We introduce REPLUG, a retrieval-augmented language modeling framework that treats the language model (LM) as a black box and augments it with a tuneable retrieval model. Unlike prior retrieval-augmented LMs that train language models with special cross attention mechanisms to encode the retrieved text, REPLUG simply prepends retrieved documents to the input for the frozen black-box LM. This simpl…

    Submitted 24 May, 2023; v1 submitted 29 January, 2023; originally announced January 2023.

  18. arXiv:2212.09741

    cs.CL

    One Embedder, Any Task: Instruction-Finetuned Text Embeddings

    Authors: Hongjin Su, Weijia Shi, Jungo Kasai, Yizhong Wang, Yushi Hu, Mari Ostendorf, Wen-tau Yih, Noah A. Smith, Luke Zettlemoyer, Tao Yu

    Abstract: We introduce INSTRUCTOR, a new method for computing text embeddings given task instructions: every text input is embedded together with instructions explaining the use case (e.g., task and domain descriptions). Unlike encoders from prior work that are more specialized, INSTRUCTOR is a single embedder that can generate text embeddings tailored to different downstream tasks and domains, without any…

    Submitted 30 May, 2023; v1 submitted 19 December, 2022; originally announced December 2022.

    Comments: Accepted to ACL 2023 Findings

  19. arXiv:2212.09726

    cs.CL cs.AI cs.LG

    Improving Faithfulness of Abstractive Summarization by Controlling Confounding Effect of Irrelevant Sentences

    Authors: Asish Ghoshal, Arash Einolghozati, Ankit Arun, Haoran Li, Lili Yu, Vera Gor, Yashar Mehdad, Scott Wen-tau Yih, Asli Celikyilmaz

    Abstract: Lack of factual correctness is an issue that still plagues state-of-the-art summarization systems despite their impressive progress on generating seemingly fluent summaries. In this paper, we show that factual inconsistency can be caused by irrelevant parts of the input text, which act as confounders. To that end, we leverage information-theoretic measures of causal effects to quantify the amount…

    Submitted 18 January, 2024; v1 submitted 19 December, 2022; originally announced December 2022.

  20. arXiv:2212.01349

    cs.CL cs.AI cs.LG

    Nonparametric Masked Language Modeling

    Authors: Sewon Min, Weijia Shi, Mike Lewis, Xilun Chen, Wen-tau Yih, Hannaneh Hajishirzi, Luke Zettlemoyer

    Abstract: Existing language models (LMs) predict tokens with a softmax over a finite vocabulary, which can make it difficult to predict rare tokens or phrases. We introduce NPM, the first nonparametric masked language model that replaces this softmax with a nonparametric distribution over every phrase in a reference corpus. NPM fills in the [MASK] solely by retrieving a token from a text corpus. We show t…

    Submitted 25 May, 2023; v1 submitted 2 December, 2022; originally announced December 2022.

    Comments: 20 pages; 9 figures. Published at ACL 2023 Findings. Code available at https://github.com/facebookresearch/NPM

  21. arXiv:2211.16490

    cs.LG cs.CL cs.PL cs.SE

    Coder Reviewer Reranking for Code Generation

    Authors: Tianyi Zhang, Tao Yu, Tatsunori B. Hashimoto, Mike Lewis, Wen-tau Yih, Daniel Fried, Sida I. Wang

    Abstract: Sampling diverse programs from a code language model and reranking with model likelihood is a popular method for code generation but it is prone to preferring degenerate solutions. Inspired by collaborative programming, we propose Coder-Reviewer reranking. We augment Coder language models from past work, which generate programs given language instructions, with Reviewer models, which evaluate the…

    Submitted 29 November, 2022; originally announced November 2022.

  22. arXiv:2211.12561

    cs.CV cs.CL cs.LG

    Retrieval-Augmented Multimodal Language Modeling

    Authors: Michihiro Yasunaga, Armen Aghajanyan, Weijia Shi, Rich James, Jure Leskovec, Percy Liang, Mike Lewis, Luke Zettlemoyer, Wen-tau Yih

    Abstract: Recent multimodal models such as DALL-E and CM3 have achieved remarkable progress in text-to-image and image-to-text generation. However, these models store all learned knowledge (e.g., the appearance of the Eiffel Tower) in the model parameters, requiring increasingly larger models and training data to capture more knowledge. To integrate knowledge in a more scalable and modular way, we propose a…

    Submitted 5 June, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

    Comments: Published at ICML 2023. Blog post available at https://cs.stanford.edu/~myasu/blog/racm3/

  23. arXiv:2211.11501

    cs.SE cs.CL

    DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation

    Authors: Yuhang Lai, Chengxi Li, Yiming Wang, Tianyi Zhang, Ruiqi Zhong, Luke Zettlemoyer, Scott Wen-tau Yih, Daniel Fried, Sida Wang, Tao Yu

    Abstract: We introduce DS-1000, a code generation benchmark with a thousand data science problems spanning seven Python libraries, such as NumPy and Pandas. Compared to prior works, DS-1000 incorporates three core features. First, our problems reflect diverse, realistic, and practical use cases since we collected them from StackOverflow. Second, our automatic evaluation is highly specific (reliable) -- acro…

    Submitted 18 November, 2022; originally announced November 2022.

  24. arXiv:2211.10411

    cs.IR cs.CL

    CITADEL: Conditional Token Interaction via Dynamic Lexical Routing for Efficient and Effective Multi-Vector Retrieval

    Authors: Minghan Li, Sheng-Chieh Lin, Barlas Oguz, Asish Ghoshal, Jimmy Lin, Yashar Mehdad, Wen-tau Yih, Xilun Chen

    Abstract: Multi-vector retrieval methods combine the merits of sparse (e.g. BM25) and dense (e.g. DPR) retrievers and have achieved state-of-the-art performance on various retrieval tasks. These methods, however, are orders of magnitude slower and need much more space to store their indices compared to their single-vector counterparts. In this paper, we unify different multi-vector retrieval models from a t…

    Submitted 18 November, 2022; originally announced November 2022.

  25. arXiv:2211.09260

    cs.CL

    Task-aware Retrieval with Instructions

    Authors: Akari Asai, Timo Schick, Patrick Lewis, Xilun Chen, Gautier Izacard, Sebastian Riedel, Hannaneh Hajishirzi, Wen-tau Yih

    Abstract: We study the problem of retrieval with instructions, where users of a retrieval system explicitly describe their intent along with their queries. We aim to develop a general-purpose task-aware retrieval system using multi-task instruction tuning, which can follow human-written instructions to find the best documents for a given query. We introduce the first large-scale collection of approximately…

    Submitted 19 December, 2022; v1 submitted 16 November, 2022; originally announced November 2022.

    Comments: Code, data and pretrained model checkpoints are available at https://github.com/facebookresearch/tart

  26. arXiv:2210.14353

    cs.CL

    RoMQA: A Benchmark for Robust, Multi-evidence, Multi-answer Question Answering

    Authors: Victor Zhong, Weijia Shi, Wen-tau Yih, Luke Zettlemoyer

    Abstract: We introduce RoMQA, the first benchmark for robust, multi-evidence, multi-answer question answering (QA). RoMQA contains clusters of questions that are derived from related constraints mined from the Wikidata knowledge graph. RoMQA evaluates robustness of QA models to varying constraints by measuring worst-case performance within each question cluster. Compared to prior QA datasets, RoMQA has more…

    Submitted 15 November, 2022; v1 submitted 25 October, 2022; originally announced October 2022.

    Comments: The source code and evaluation for RoMQA are at https://github.com/facebookresearch/romqa

  27. arXiv:2209.10052

    cs.CL

    Adapting Pretrained Text-to-Text Models for Long Text Sequences

    Authors: Wenhan Xiong, Anchit Gupta, Shubham Toshniwal, Yashar Mehdad, Wen-tau Yih

    Abstract: We present an empirical study of adapting an existing pretrained text-to-text model for long-sequence inputs. Through a comprehensive study along three axes of the pretraining pipeline -- model architecture, optimization objective, and pretraining corpus, we propose an effective recipe to build long-context models from existing short-context models. Specifically, we replace the full attention in t…

    Submitted 16 November, 2022; v1 submitted 20 September, 2022; originally announced September 2022.

  28. arXiv:2205.12309

    cs.CL

    Structured Prompt Tuning

    Authors: Chi-Liang Liu, Hung-yi Lee, Wen-tau Yih

    Abstract: We propose structured prompt tuning, a simple and effective method to improve prompt tuning. Instead of prepending a sequence of tunable embeddings to the input, we generate the soft prompt embeddings through a hypernetwork. Our approach subsumes the standard prompt tuning, allows more flexibility in model design and can be applied to both single-task and multi-task training settings. Empirically,…

    Submitted 24 May, 2022; originally announced May 2022.

  29. arXiv:2205.02014

    cs.CL cs.AI cs.LG

    On Continual Model Refinement in Out-of-Distribution Data Streams

    Authors: Bill Yuchen Lin, Sida Wang, Xi Victoria Lin, Robin Jia, Lin Xiao, Xiang Ren, Wen-tau Yih

    Abstract: Real-world natural language processing (NLP) models need to be continually updated to fix the prediction errors in out-of-distribution (OOD) data streams while overcoming catastrophic forgetting. However, existing continual learning (CL) problem setups cannot cover such a realistic and complex scenario. In response to this, we propose a new CL problem formulation dubbed continual model refinement…

    Submitted 4 May, 2022; originally announced May 2022.

    Comments: Accepted to ACL 2022; Project website: https://cmr-nlp.github.io/

  30. arXiv:2204.10628

    cs.CL cs.IR

    Autoregressive Search Engines: Generating Substrings as Document Identifiers

    Authors: Michele Bevilacqua, Giuseppe Ottaviano, Patrick Lewis, Wen-tau Yih, Sebastian Riedel, Fabio Petroni

    Abstract: Knowledge-intensive language tasks require NLP systems to both provide the correct answer and retrieve supporting evidence for it in a given corpus. Autoregressive language models are emerging as the de facto standard for generating answers, with newer and more powerful systems emerging at an astonishing pace. In this paper we argue that all this (and future) progress can be directly applied to th…

    Submitted 22 April, 2022; originally announced April 2022.

    Comments: 9 pages

  31. arXiv:2204.10298

    cs.CL

    DiffCSE: Difference-based Contrastive Learning for Sentence Embeddings

    Authors: Yung-Sung Chuang, Rumen Dangovski, Hongyin Luo, Yang Zhang, Shiyu Chang, Marin Soljačić, Shang-Wen Li, Wen-tau Yih, Yoon Kim, James Glass

    Abstract: We propose DiffCSE, an unsupervised contrastive learning framework for learning sentence embeddings. DiffCSE learns sentence embeddings that are sensitive to the difference between the original sentence and an edited sentence, where the edited sentence is obtained by stochastically masking out the original sentence and then sampling from a masked language model. We show that DiffCSE is an instance…

    Submitted 21 April, 2022; originally announced April 2022.

    Comments: NAACL 2022 main conference (Long paper). Pretrained models and code are available at https://github.com/voidism/DiffCSE

  32. arXiv:2204.07496

    cs.CL cs.IR

    Improving Passage Retrieval with Zero-Shot Question Generation

    Authors: Devendra Singh Sachan, Mike Lewis, Mandar Joshi, Armen Aghajanyan, Wen-tau Yih, Joelle Pineau, Luke Zettlemoyer

    Abstract: We propose a simple and effective re-ranking method for improving passage retrieval in open question answering. The re-ranker re-scores retrieved passages with a zero-shot question generation model, which uses a pre-trained language model to compute the probability of the input question conditioned on a retrieved passage. This approach can be applied on top of any retrieval method (e.g. neural or…

    Submitted 2 April, 2023; v1 submitted 15 April, 2022; originally announced April 2022.

    Comments: EMNLP 2022 camera-ready version. Code is available at: https://github.com/DevSinghSachan/unsupervised-passage-reranking

  33. arXiv:2204.05999

    cs.SE cs.CL cs.LG

    InCoder: A Generative Model for Code Infilling and Synthesis

    Authors: Daniel Fried, Armen Aghajanyan, Jessy Lin, Sida Wang, Eric Wallace, Freda Shi, Ruiqi Zhong, Wen-tau Yih, Luke Zettlemoyer, Mike Lewis

    Abstract: Code is seldom written in a single left-to-right pass and is instead repeatedly edited and refined. We introduce InCoder, a unified generative model that can perform program synthesis (via left-to-right generation) as well as editing (via infilling). InCoder is trained to generate code files from a large corpus of permissively licensed code, where regions of code have been randomly masked and move…

    Submitted 9 April, 2023; v1 submitted 12 April, 2022; originally announced April 2022.

    Comments: ICLR 2023. v3: camera-ready that includes PLBART and OpenAI baselines

  34. arXiv:2112.09924

    cs.CL cs.AI cs.IR cs.LG

    The Web Is Your Oyster - Knowledge-Intensive NLP against a Very Large Web Corpus

    Authors: Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Dmytro Okhonko, Samuel Broscheit, Gautier Izacard, Patrick Lewis, Barlas Oğuz, Edouard Grave, Wen-tau Yih, Sebastian Riedel

    Abstract: In order to address increasing demands of real-world applications, research on knowledge-intensive NLP (KI-NLP) should advance by capturing the challenges of a truly open-domain environment: web-scale knowledge, lack of structure, inconsistent quality and noise. To this end, we propose a new setup for evaluating existing knowledge-intensive tasks in which we generalize the background corpus t…

    Submitted 24 May, 2022; v1 submitted 18 December, 2021; originally announced December 2021.

  35. arXiv:2112.07771

    cs.CL cs.IR

    Boosted Dense Retriever

    Authors: Patrick Lewis, Barlas Oğuz, Wenhan Xiong, Fabio Petroni, Wen-tau Yih, Sebastian Riedel

    Abstract: We propose DrBoost, a dense retrieval ensemble inspired by boosting. DrBoost is trained in stages: each component model is learned sequentially and specialized by focusing only on retrieval mistakes made by the current ensemble. The final representation is the concatenation of the output vectors of all the component models, making it a drop-in replacement for standard dense retrievers at test time…

    Submitted 14 December, 2021; originally announced December 2021.

  36. arXiv:2112.07210

    cs.CL

    Simple Local Attentions Remain Competitive for Long-Context Tasks

    Authors: Wenhan Xiong, Barlas Oğuz, Anchit Gupta, Xilun Chen, Diana Liskovich, Omer Levy, Wen-tau Yih, Yashar Mehdad

    Abstract: Many NLP tasks require processing long contexts beyond the length limit of pretrained models. In order to scale these models to longer text sequences, many efficient long-range attention variants have been proposed. Despite the abundance of research along this direction, it is still difficult to gauge the relative effectiveness of these models in practical use cases, e.g., if we apply these models…

    Submitted 3 May, 2022; v1 submitted 14 December, 2021; originally announced December 2021.

    Comments: NAACL 2022 Main Conference

  37. arXiv:2110.07731

    cs.CL cs.LG

    CCQA: A New Web-Scale Question Answering Dataset for Model Pre-Training

    Authors: Patrick Huber, Armen Aghajanyan, Barlas Oğuz, Dmytro Okhonko, Wen-tau Yih, Sonal Gupta, Xilun Chen

    Abstract: With the rise of large-scale pre-trained language models, open-domain question-answering (ODQA) has become an important research topic in NLP. Based on the popular pre-training fine-tuning approach, we posit that an additional in-domain pre-training stage using a large-scale, natural, and diverse question-answering (QA) dataset can be beneficial for ODQA. Consequently, we propose a novel QA datase…

    Submitted 2 May, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: 9 pages, Findings of NAACL 2022

  38. arXiv:2110.07577

    cs.CL cs.AI cs.LG

    UniPELT: A Unified Framework for Parameter-Efficient Language Model Tuning

    Authors: Yuning Mao, Lambert Mathias, Rui Hou, Amjad Almahairi, Hao Ma, Jiawei Han, Wen-tau Yih, Madian Khabsa

    Abstract: Recent parameter-efficient language model tuning (PELT) methods manage to match the performance of fine-tuning with far fewer trainable parameters and perform especially well when training data is limited. However, different PELT methods may perform rather differently on the same task, making it nontrivial to select the most appropriate method for a specific task, especially considering the fast-…

    Submitted 4 September, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: ACL 2022 (w. typo fixes)

  39. arXiv:2110.06918

    cs.CL cs.IR cs.LG

    Salient Phrase Aware Dense Retrieval: Can a Dense Retriever Imitate a Sparse One?

    Authors: Xilun Chen, Kushal Lakhotia, Barlas Oğuz, Anchit Gupta, Patrick Lewis, Stan Peshterliev, Yashar Mehdad, Sonal Gupta, Wen-tau Yih

    Abstract: Despite their recent popularity and well-known advantages, dense retrievers still lag behind sparse methods such as BM25 in their ability to reliably match salient phrases and rare entities in the query and to generalize to out-of-domain data. It has been argued that this is an inherent limitation of dense models. We rebut this claim by introducing the Salient Phrase Aware Retriever (SPAR), a dens…

    Submitted 11 November, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

  40. arXiv:2107.13602

    cs.CL cs.IR

    Domain-matched Pre-training Tasks for Dense Retrieval

    Authors: Barlas Oğuz, Kushal Lakhotia, Anchit Gupta, Patrick Lewis, Vladimir Karpukhin, Aleksandra Piktus, Xilun Chen, Sebastian Riedel, Wen-tau Yih, Sonal Gupta, Yashar Mehdad

    Abstract: Pre-training on larger datasets with ever increasing model size is now a proven recipe for increased performance across almost all NLP tasks. A notable exception is information retrieval, where additional pre-training has so far failed to produce convincing results. We show that, with the right pre-training setup, this barrier can be overcome. We demonstrate this by pre-training large bi-encoder m…

    Submitted 28 July, 2021; originally announced July 2021.

  41. arXiv:2106.00872

    cs.CL cs.AI cs.LG

    On the Efficacy of Adversarial Data Collection for Question Answering: Results from a Large-Scale Randomized Study

    Authors: Divyansh Kaushik, Douwe Kiela, Zachary C. Lipton, Wen-tau Yih

    Abstract: In adversarial data collection (ADC), a human workforce interacts with a model in real time, attempting to produce examples that elicit incorrect predictions. Researchers hope that models trained on these more challenging datasets will rely less on superficial patterns, and thus be less brittle. However, despite ADC's intuitive appeal, it remains unclear when training on adversarial datasets produ…

    Submitted 1 June, 2021; originally announced June 2021.

    Comments: Accepted at ACL-IJCNLP 2021

  42. arXiv:2104.08840  [pdf, other]

    cs.CL cs.LG

    On the Influence of Masking Policies in Intermediate Pre-training

    Authors: Qinyuan Ye, Belinda Z. Li, Sinong Wang, Benjamin Bolte, Hao Ma, Wen-tau Yih, Xiang Ren, Madian Khabsa

    Abstract: Current NLP models are predominantly trained through a two-stage "pre-train then fine-tune" pipeline. Prior work has shown that inserting an intermediate pre-training stage, using heuristic masking policies for masked language modeling (MLM), can significantly improve final performance. However, it is still unclear (1) in what cases such intermediate pre-training is helpful, (2) whether hand-craft…

    Submitted 30 September, 2021; v1 submitted 18 April, 2021; originally announced April 2021.

    Comments: Accepted to EMNLP 2021. Camera-ready version

  43. arXiv:2104.05243  [pdf, other]

    cs.AI cs.CL

    On Unifying Misinformation Detection

    Authors: Nayeon Lee, Belinda Z. Li, Sinong Wang, Pascale Fung, Hao Ma, Wen-tau Yih, Madian Khabsa

    Abstract: In this paper, we introduce UnifiedM2, a general-purpose misinformation model that jointly models multiple domains of misinformation with a single, unified setup. The model is trained to handle four tasks: detecting news bias, clickbait, fake news, and verifying rumors. By grouping these tasks together, UnifiedM2 learns a richer representation of misinformation, which leads to state-of-the-art or c…

    Submitted 12 April, 2021; originally announced April 2021.

    Comments: Accepted to NAACL 2021

  44. arXiv:2101.00133  [pdf, other]

    cs.CL cs.AI

    NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned

    Authors: Sewon Min, Jordan Boyd-Graber, Chris Alberti, Danqi Chen, Eunsol Choi, Michael Collins, Kelvin Guu, Hannaneh Hajishirzi, Kenton Lee, Jennimaria Palomaki, Colin Raffel, Adam Roberts, Tom Kwiatkowski, Patrick Lewis, Yuxiang Wu, Heinrich Küttler, Linqing Liu, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel, Sohee Yang, Minjoon Seo, Gautier Izacard, Fabio Petroni, Lucas Hosseini, et al. (28 additional authors not shown)

    Abstract: We review the EfficientQA competition from NeurIPS 2020. The competition focused on open-domain question answering (QA), where systems take natural language questions as input and return natural language answers. The aim of the competition was to build systems that can predict correct answers while also satisfying strict on-disk memory budgets. These memory budgets were designed to encourage conte…

    Submitted 19 September, 2021; v1 submitted 31 December, 2020; originally announced January 2021.

    Comments: 26 pages; Published in Proceedings of Machine Learning Research (PMLR), NeurIPS 2020 Competition and Demonstration Track

  45. arXiv:2101.00117  [pdf, other]

    cs.CL

    Multi-task Retrieval for Knowledge-Intensive Tasks

    Authors: Jean Maillard, Vladimir Karpukhin, Fabio Petroni, Wen-tau Yih, Barlas Oğuz, Veselin Stoyanov, Gargi Ghosh

    Abstract: Retrieving relevant contexts from a large corpus is a crucial step for tasks such as open-domain question answering and fact checking. Although neural retrieval outperforms traditional methods like tf-idf and BM25, its performance degrades considerably when applied to out-of-domain data. Driven by the question of whether a neural retrieval model can be universal and perform robustly on a wide va…

    Submitted 31 December, 2020; originally announced January 2021.

  46. arXiv:2012.15856  [pdf, other]

    cs.CL cs.AI

    Studying Strategically: Learning to Mask for Closed-book QA

    Authors: Qinyuan Ye, Belinda Z. Li, Sinong Wang, Benjamin Bolte, Hao Ma, Wen-tau Yih, Xiang Ren, Madian Khabsa

    Abstract: Closed-book question-answering (QA) is a challenging task that requires a model to directly answer questions without access to external knowledge. It has been shown that directly fine-tuning pre-trained language models with (question, answer) examples yields surprisingly competitive performance, which is further improved upon through adding an intermediate pre-training stage between general pre-tr…

    Submitted 1 January, 2021; v1 submitted 31 December, 2020; originally announced December 2020.

  47. arXiv:2012.15482  [pdf, other]

    cs.CL

    FiD-Ex: Improving Sequence-to-Sequence Models for Extractive Rationale Generation

    Authors: Kushal Lakhotia, Bhargavi Paranjape, Asish Ghoshal, Wen-tau Yih, Yashar Mehdad, Srinivasan Iyer

    Abstract: Natural language (NL) explanations of model predictions are gaining popularity as a means to understand and verify decisions made by large black-box pre-trained models, for NLP tasks such as Question Answering (QA) and Fact Verification. Recently, pre-trained sequence to sequence (seq2seq) models have proven to be very effective in jointly making predictions, as well as generating NL explanations.…

    Submitted 31 December, 2020; originally announced December 2020.

  48. Joint Verification and Reranking for Open Fact Checking Over Tables

    Authors: Michael Schlichtkrull, Vladimir Karpukhin, Barlas Oğuz, Mike Lewis, Wen-tau Yih, Sebastian Riedel

    Abstract: Structured information is an important knowledge source for automatic verification of factual claims. Nevertheless, the majority of existing research into this task has focused on textual data, and the few recent inquiries into structured data have been for the closed-domain setting where appropriate evidence for each claim is assumed to have already been retrieved. In this paper, we investigate v…

    Submitted 20 August, 2021; v1 submitted 30 December, 2020; originally announced December 2020.

  49. arXiv:2010.10757  [pdf, other]

    cs.CL

    RECONSIDER: Re-Ranking using Span-Focused Cross-Attention for Open Domain Question Answering

    Authors: Srinivasan Iyer, Sewon Min, Yashar Mehdad, Wen-tau Yih

    Abstract: State-of-the-art Machine Reading Comprehension (MRC) models for Open-domain Question Answering (QA) are typically trained for span selection using distantly supervised positive examples and heuristically retrieved negative examples. This training scheme possibly explains empirical observations that these models achieve a high recall amongst their top few predictions, but a low overall accuracy, mo…

    Submitted 21 October, 2020; originally announced October 2020.

  50. arXiv:2010.02413  [pdf, other]

    cs.CL cs.AI

    Efficient One-Pass End-to-End Entity Linking for Questions

    Authors: Belinda Z. Li, Sewon Min, Srinivasan Iyer, Yashar Mehdad, Wen-tau Yih

    Abstract: We present ELQ, a fast end-to-end entity linking model for questions, which uses a biencoder to jointly perform mention detection and linking in one pass. Evaluated on WebQSP and GraphQuestions with extended annotations that cover multiple entities per question, ELQ outperforms the previous state of the art by a large margin of +12.7% and +19.6% F1, respectively. With a very fast inference time (1…

    Submitted 5 October, 2020; originally announced October 2020.

    Comments: 9 pages, EMNLP 2020