Skip to main content

Showing 1–10 of 10 results for author: Yih, S

  1. arXiv:2310.10638  [pdf, other

    cs.CL cs.AI cs.LG

    In-context Pretraining: Language Modeling Beyond Document Boundaries

    Authors: Weijia Shi, Sewon Min, Maria Lomeli, Chunting Zhou, Margaret Li, Gergely Szilvasy, Rich James, Xi Victoria Lin, Noah A. Smith, Luke Zettlemoyer, Scott Yih, Mike Lewis

    Abstract: Large language models (LMs) are currently trained to predict tokens given document prefixes, enabling them to directly perform long-form generation and prompting-style tasks which can be reduced to document completion. Existing pretraining pipelines train LMs by concatenating random sets of short documents to create input contexts but the prior documents provide no signal for predicting the next d… ▽ More

    Submitted 24 June, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

  2. arXiv:2310.01352  [pdf, other

    cs.CL cs.AI

    RA-DIT: Retrieval-Augmented Dual Instruction Tuning

    Authors: Xi Victoria Lin, Xilun Chen, Mingda Chen, Weijia Shi, Maria Lomeli, Rich James, Pedro Rodriguez, Jacob Kahn, Gergely Szilvasy, Mike Lewis, Luke Zettlemoyer, Scott Yih

    Abstract: Retrieval-augmented language models (RALMs) improve performance by accessing long-tail and up-to-date knowledge from external data stores, but are challenging to build. Existing approaches require either expensive retrieval-specific modifications to LM pre-training or use post-hoc integration of the data store that leads to suboptimal performance. We introduce Retrieval-Augmented Dual Instruction… ▽ More

    Submitted 6 May, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: v4: ICLR 2024 camera-ready version

  3. arXiv:2306.01061  [pdf, other

    cs.CL cs.DB

    Reimagining Retrieval Augmented Language Models for Answering Queries

    Authors: Wang-Chiew Tan, Yuliang Li, Pedro Rodriguez, Richard James, Xi Victoria Lin, Alon Halevy, Scott Yih

    Abstract: We present a reality check on large language models and inspect the promise of retrieval augmented language models in comparison. Such language models are semi-parametric, where models integrate model parameters and knowledge from external data sources to make their predictions, as opposed to the parametric nature of vanilla large language models. We give initial experimental findings that semi-pa… ▽ More

    Submitted 1 June, 2023; originally announced June 2023.

  4. arXiv:2305.14739  [pdf, other

    cs.CL

    Trusting Your Evidence: Hallucinate Less with Context-aware Decoding

    Authors: Weijia Shi, Xiaochuang Han, Mike Lewis, Yulia Tsvetkov, Luke Zettlemoyer, Scott Wen-tau Yih

    Abstract: Language models (LMs) often struggle to pay enough attention to the input context, and generate texts that are unfaithful or contain hallucinations. To mitigate this issue, we present context-aware decoding (CAD), which follows a contrastive output distribution that amplifies the difference between the output probabilities when a model is used with and without context. Our experiments show that CA… ▽ More

    Submitted 24 May, 2023; originally announced May 2023.

  5. arXiv:2212.09726  [pdf, other

    cs.CL cs.AI cs.LG

    Improving Faithfulness of Abstractive Summarization by Controlling Confounding Effect of Irrelevant Sentences

    Authors: Asish Ghoshal, Arash Einolghozati, Ankit Arun, Haoran Li, Lili Yu, Vera Gor, Yashar Mehdad, Scott Wen-tau Yih, Asli Celikyilmaz

    Abstract: Lack of factual correctness is an issue that still plagues state-of-the-art summarization systems despite their impressive progress on generating seemingly fluent summaries. In this paper, we show that factual inconsistency can be caused by irrelevant parts of the input text, which act as confounders. To that end, we leverage information-theoretic measures of causal effects to quantify the amount… ▽ More

    Submitted 18 January, 2024; v1 submitted 19 December, 2022; originally announced December 2022.

  6. arXiv:2211.11501  [pdf, other

    cs.SE cs.CL

    DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation

    Authors: Yuhang Lai, Chengxi Li, Yiming Wang, Tianyi Zhang, Ruiqi Zhong, Luke Zettlemoyer, Scott Wen-tau Yih, Daniel Fried, Sida Wang, Tao Yu

    Abstract: We introduce DS-1000, a code generation benchmark with a thousand data science problems spanning seven Python libraries, such as NumPy and Pandas. Compared to prior works, DS-1000 incorporates three core features. First, our problems reflect diverse, realistic, and practical use cases since we collected them from StackOverflow. Second, our automatic evaluation is highly specific (reliable) -- acro… ▽ More

    Submitted 18 November, 2022; originally announced November 2022.

  7. arXiv:2205.13016  [pdf, other

    cs.LG cs.CL

    BiT: Robustly Binarized Multi-distilled Transformer

    Authors: Zechun Liu, Barlas Oguz, Aasish Pappu, Lin Xiao, Scott Yih, Meng Li, Raghuraman Krishnamoorthi, Yashar Mehdad

    Abstract: Modern pre-trained transformers have rapidly advanced the state-of-the-art in machine learning, but have also grown in parameters and computational complexity, making them increasingly difficult to deploy in resource-constrained environments. Binarization of the weights and activations of the network can significantly alleviate these issues, however, is technically challenging from an optimization… ▽ More

    Submitted 2 October, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

    Comments: NeurIPS 2022

  8. arXiv:2012.14610  [pdf, other

    cs.CL

    UniK-QA: Unified Representations of Structured and Unstructured Knowledge for Open-Domain Question Answering

    Authors: Barlas Oguz, Xilun Chen, Vladimir Karpukhin, Stan Peshterliev, Dmytro Okhonko, Michael Schlichtkrull, Sonal Gupta, Yashar Mehdad, Scott Yih

    Abstract: We study open-domain question answering with structured, unstructured and semi-structured knowledge sources, including text, tables, lists and knowledge bases. Departing from prior work, we propose a unifying approach that homogenizes all sources by reducing them to text and applies the retriever-reader model which has so far been limited to text sources only. Our approach greatly improves the res… ▽ More

    Submitted 3 May, 2022; v1 submitted 29 December, 2020; originally announced December 2020.

    Comments: NAACL-HLT 2022 Findings

  9. arXiv:1911.02972  [pdf, other

    cs.CL cs.LG

    Blockwise Self-Attention for Long Document Understanding

    Authors: Jiezhong Qiu, Hao Ma, Omer Levy, Scott Wen-tau Yih, Sinong Wang, Jie Tang

    Abstract: We present BlockBERT, a lightweight and efficient BERT model for better modeling long-distance dependencies. Our model extends BERT by introducing sparse block structures into the attention matrix to reduce both memory consumption and training/inference time, which also enables attention heads to capture either short- or long-range contextual information. We conduct experiments on language model p… ▽ More

    Submitted 1 November, 2020; v1 submitted 7 November, 2019; originally announced November 2019.

    Comments: Accepted at Findings of EMNLP'20 and SustaiNLP 2020 at EMNLP'20, 12 pages

  10. arXiv:1908.05739  [pdf, other

    cs.CL

    Abductive Commonsense Reasoning

    Authors: Chandra Bhagavatula, Ronan Le Bras, Chaitanya Malaviya, Keisuke Sakaguchi, Ari Holtzman, Hannah Rashkin, Doug Downey, Scott Wen-tau Yih, Yejin Choi

    Abstract: Abductive reasoning is inference to the most plausible explanation. For example, if Jenny finds her house in a mess when she returns from work, and remembers that she left a window open, she can hypothesize that a thief broke into her house and caused the mess, as the most plausible explanation. While abduction has long been considered to be at the core of how people interpret and read between the… ▽ More

    Submitted 13 February, 2020; v1 submitted 15 August, 2019; originally announced August 2019.

    Comments: ICLR 2020 Camera Ready