
Showing 1–32 of 32 results for author: Khabsa, M

  1. arXiv:2312.06674  [pdf, other]

    cs.CL cs.AI

    Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations

    Authors: Hakan Inan, Kartikeya Upasani, Jianfeng Chi, Rashi Rungta, Krithika Iyer, Yuning Mao, Michael Tontchev, Qing Hu, Brian Fuller, Davide Testuggine, Madian Khabsa

    Abstract: We introduce Llama Guard, an LLM-based input-output safeguard model geared towards Human-AI conversation use cases. Our model incorporates a safety risk taxonomy, a valuable tool for categorizing a specific set of safety risks found in LLM prompts (i.e., prompt classification). This taxonomy is also instrumental in classifying the responses generated by LLMs to these prompts, a process we refer to…

    Submitted 7 December, 2023; originally announced December 2023.

  2. arXiv:2312.04032  [pdf, other]

    cs.CL cs.LG

    RoAST: Robustifying Language Models via Adversarial Perturbation with Selective Training

    Authors: Jaehyung Kim, Yuning Mao, Rui Hou, Hanchao Yu, Davis Liang, Pascale Fung, Qifan Wang, Fuli Feng, Lifu Huang, Madian Khabsa

    Abstract: Fine-tuning pre-trained language models (LMs) has become the de facto standard in many NLP tasks. Nevertheless, fine-tuned LMs are still prone to robustness issues, such as adversarial robustness and model calibration. Several perspectives of robustness for LMs have been studied independently, but a unified consideration across multiple perspectives is lacking. In this paper, we propose Robustifying LMs…

    Submitted 6 December, 2023; originally announced December 2023.

    Comments: 33 pages, accepted at EMNLP 2023 Findings

  3. arXiv:2311.07689  [pdf, other]

    cs.CL

    MART: Improving LLM Safety with Multi-round Automatic Red-Teaming

    Authors: Suyu Ge, Chunting Zhou, Rui Hou, Madian Khabsa, Yi-Chia Wang, Qifan Wang, Jiawei Han, Yuning Mao

    Abstract: Red-teaming is a common practice for mitigating unsafe behaviors in Large Language Models (LLMs), which involves thoroughly assessing LLMs to identify potential flaws and addressing them with responsible and accurate responses. While effective, manual red-teaming is costly, and existing automatic red-teaming typically discovers safety risks without addressing them. In this paper, we propose a Mult…

    Submitted 13 November, 2023; originally announced November 2023.

  4. arXiv:2310.00183  [pdf, other]

    cs.LG cs.AI

    On the Equivalence of Graph Convolution and Mixup

    Authors: Xiaotian Han, Hanqing Zeng, Yu Chen, Shaoliang Nie, Jingzhou Liu, Kanika Narang, Zahra Shakeri, Karthik Abinav Sankararaman, Song Jiang, Madian Khabsa, Qifan Wang, Xia Hu

    Abstract: This paper investigates the relationship between graph convolution and Mixup techniques. Graph convolution in a graph neural network involves aggregating features from neighboring samples to learn representative features for a specific node or sample. On the other hand, Mixup is a data augmentation technique that generates new examples by averaging features and one-hot labels from multiple samples…

    Submitted 29 September, 2023; originally announced October 2023.

  5. arXiv:2309.16039  [pdf, other]

    cs.CL

    Effective Long-Context Scaling of Foundation Models

    Authors: Wenhan Xiong, Jingyu Liu, Igor Molybog, Hejia Zhang, Prajjwal Bhargava, Rui Hou, Louis Martin, Rashi Rungta, Karthik Abinav Sankararaman, Barlas Oguz, Madian Khabsa, Han Fang, Yashar Mehdad, Sharan Narang, Kshitiz Malik, Angela Fan, Shruti Bhosale, Sergey Edunov, Mike Lewis, Sinong Wang, Hao Ma

    Abstract: We present a series of long-context LLMs that support effective context windows of up to 32,768 tokens. Our model series are built through continual pretraining from Llama 2 with longer training sequences and on a dataset where long texts are upsampled. We perform extensive evaluation on language modeling, synthetic context probing tasks, and a wide range of research benchmarks. On research benchm…

    Submitted 13 November, 2023; v1 submitted 27 September, 2023; originally announced September 2023.

  6. arXiv:2308.16884  [pdf, other]

    cs.CL cs.AI cs.LG

    The Belebele Benchmark: a Parallel Reading Comprehension Dataset in 122 Language Variants

    Authors: Lucas Bandarkar, Davis Liang, Benjamin Muller, Mikel Artetxe, Satya Narayan Shukla, Donald Husa, Naman Goyal, Abhinandan Krishnan, Luke Zettlemoyer, Madian Khabsa

    Abstract: We present Belebele, a multiple-choice machine reading comprehension (MRC) dataset spanning 122 language variants. Significantly expanding the language coverage of natural language understanding (NLU) benchmarks, this dataset enables the evaluation of text models in high-, medium-, and low-resource languages. Each question is based on a short passage from the Flores-200 dataset and has four multip…

    Submitted 31 August, 2023; originally announced August 2023.

    Comments: 27 pages, 13 figures

    ACM Class: I.2.7

  7. arXiv:2307.09288  [pdf, other]

    cs.CL cs.AI

    Llama 2: Open Foundation and Fine-Tuned Chat Models

    Authors: Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, et al. (43 additional authors not shown)

    Abstract: In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be…

    Submitted 19 July, 2023; v1 submitted 18 July, 2023; originally announced July 2023.

  8. arXiv:2305.03937  [pdf, other]

    cs.CL cs.AI

    Residual Prompt Tuning: Improving Prompt Tuning with Residual Reparameterization

    Authors: Anastasia Razdaibiedina, Yuning Mao, Rui Hou, Madian Khabsa, Mike Lewis, Jimmy Ba, Amjad Almahairi

    Abstract: Prompt tuning is one of the successful approaches for parameter-efficient tuning of pre-trained language models. Despite being arguably the most parameter-efficient (tuned soft prompts constitute <0.1% of total parameters), it typically performs worse than other efficient tuning methods and is quite sensitive to hyper-parameters. In this work, we introduce Residual Prompt Tuning - a simple and eff…

    Submitted 6 May, 2023; originally announced May 2023.

    Comments: ACL Findings 2023

  9. arXiv:2305.00104  [pdf, other]

    cs.CV eess.AS eess.IV

    MMViT: Multiscale Multiview Vision Transformers

    Authors: Yuchen Liu, Natasha Ong, Kaiyan Peng, Bo Xiong, Qifan Wang, Rui Hou, Madian Khabsa, Kaiyue Yang, David Liu, Donald S. Williamson, Hanchao Yu

    Abstract: We present Multiscale Multiview Vision Transformers (MMViT), which introduces multiscale feature maps and multiview encodings to transformer models. Our model encodes different views of the input signal and builds several channel-resolution feature stages to process the multiple views of the input at different resolutions in parallel. At each scale stage, we use a cross-attention block to fuse inf…

    Submitted 28 April, 2023; originally announced May 2023.

  10. arXiv:2304.00325  [pdf, other]

    cs.CV

    SVT: Supertoken Video Transformer for Efficient Video Understanding

    Authors: Chenbin Pan, Rui Hou, Hanchao Yu, Qifan Wang, Senem Velipasalar, Madian Khabsa

    Abstract: Whether by processing videos with fixed resolution from start to end or incorporating pooling and down-scaling strategies, existing video transformers process the whole video content throughout the network without specially handling the large portions of redundant information. In this paper, we present a Supertoken Video Transformer (SVT) that incorporates a Semantic Pooling Module (SPM) to aggreg…

    Submitted 23 April, 2023; v1 submitted 1 April, 2023; originally announced April 2023.

  11. arXiv:2301.12314  [pdf, other]

    cs.CL cs.AI cs.LG

    Progressive Prompts: Continual Learning for Language Models

    Authors: Anastasia Razdaibiedina, Yuning Mao, Rui Hou, Madian Khabsa, Mike Lewis, Amjad Almahairi

    Abstract: We introduce Progressive Prompts - a simple and efficient approach for continual learning in language models. Our method allows forward transfer and resists catastrophic forgetting, without relying on data replay or a large number of task-specific parameters. Progressive Prompts learns a new soft prompt for each task and sequentially concatenates it with the previously learned prompts, while keepi…

    Submitted 28 January, 2023; originally announced January 2023.

  12. arXiv:2301.10472  [pdf, other]

    cs.CL cs.LG

    XLM-V: Overcoming the Vocabulary Bottleneck in Multilingual Masked Language Models

    Authors: Davis Liang, Hila Gonen, Yuning Mao, Rui Hou, Naman Goyal, Marjan Ghazvininejad, Luke Zettlemoyer, Madian Khabsa

    Abstract: Large multilingual language models typically rely on a single vocabulary shared across 100+ languages. As these models have increased in parameter count and depth, vocabulary size has remained largely unchanged. This vocabulary bottleneck limits the representational capabilities of multilingual models like XLM-R. In this paper, we introduce a new approach for scaling to very large multili…

    Submitted 13 October, 2023; v1 submitted 25 January, 2023; originally announced January 2023.

    Comments: EMNLP 2023

  13. arXiv:2212.05195  [pdf, other]

    cs.LG

    Uniform Masking Prevails in Vision-Language Pretraining

    Authors: Siddharth Verma, Yuchen Lu, Rui Hou, Hanchao Yu, Nicolas Ballas, Madian Khabsa, Amjad Almahairi

    Abstract: Masked Language Modeling (MLM) has proven to be an essential component of Vision-Language (VL) pretraining. To implement MLM, the researcher must make two design choices: the masking strategy, which determines which tokens to mask, and the masking rate, which determines how many tokens to mask. Previous work has focused primarily on the masking strategy while setting the masking rate at a default…

    Submitted 9 December, 2022; originally announced December 2022.

  14. arXiv:2205.12469  [pdf, other]

    cs.CL

    Logical Satisfiability of Counterfactuals for Faithful Explanations in NLI

    Authors: Suzanna Sia, Anton Belyy, Amjad Almahairi, Madian Khabsa, Luke Zettlemoyer, Lambert Mathias

    Abstract: Evaluating an explanation's faithfulness is desired for many reasons such as trust, interpretability and diagnosing the sources of a model's errors. In this work, which focuses on the NLI task, we introduce the methodology of Faithfulness-through-Counterfactuals, which first generates a counterfactual hypothesis based on the logical predicates expressed in the explanation, and then evaluates if the…

    Submitted 24 May, 2022; originally announced May 2022.

    Comments: Under Review

  15. arXiv:2112.13884  [pdf, other]

    cs.CV

    A Fistful of Words: Learning Transferable Visual Models from Bag-of-Words Supervision

    Authors: Ajinkya Tejankar, Maziar Sanjabi, Bichen Wu, Saining Xie, Madian Khabsa, Hamed Pirsiavash, Hamed Firooz

    Abstract: Using natural language as supervision for training visual recognition models holds great promise. Recent works have shown that if such supervision is used in the form of alignment between images and captions in large training datasets, then the resulting aligned models perform well on zero-shot classification as downstream tasks. In this paper, we focus on teasing out what parts of the language…

    Submitted 5 January, 2022; v1 submitted 27 December, 2021; originally announced December 2021.

  16. arXiv:2112.03204  [pdf, other]

    cs.CL cs.LG

    Quantifying Adaptability in Pre-trained Language Models with 500 Tasks

    Authors: Belinda Z. Li, Jane Yu, Madian Khabsa, Luke Zettlemoyer, Alon Halevy, Jacob Andreas

    Abstract: When a neural language model (LM) is adapted to perform a new task, what aspects of the task predict the eventual performance of the model? In NLP, systematic features of LM generalization to individual examples are well characterized, but systematic aspects of LM adaptability to new tasks are not nearly as well understood. We present a large-scale empirical study of the features and limits of LM…

    Submitted 4 May, 2022; v1 submitted 6 December, 2021; originally announced December 2021.

    Comments: NAACL 2022; 20 pages, 6 figures, 8 tables

  17. arXiv:2110.08536  [pdf, other]

    cs.CL cs.LG

    Sparse Distillation: Speeding Up Text Classification by Using Bigger Student Models

    Authors: Qinyuan Ye, Madian Khabsa, Mike Lewis, Sinong Wang, Xiang Ren, Aaron Jaech

    Abstract: Distilling state-of-the-art transformer models into lightweight student models is an effective way to reduce computation cost at inference time. The student models are typically compact transformers with fewer parameters, while expensive operations such as self-attention persist. Therefore, the improved inference speed may still be unsatisfactory for real-time or high-volume use cases. In this pap…

    Submitted 25 July, 2022; v1 submitted 16 October, 2021; originally announced October 2021.

    Comments: NAACL 2022 camera-ready version. Code: https://github.com/ink-usc/sparse-distillation. In v2, we updated the performance of KD-BiLSTM baselines after fixing a bug

  18. arXiv:2110.07577  [pdf, other]

    cs.CL cs.AI cs.LG

    UniPELT: A Unified Framework for Parameter-Efficient Language Model Tuning

    Authors: Yuning Mao, Lambert Mathias, Rui Hou, Amjad Almahairi, Hao Ma, Jiawei Han, Wen-tau Yih, Madian Khabsa

    Abstract: Recent parameter-efficient language model tuning (PELT) methods manage to match the performance of fine-tuning with far fewer trainable parameters and perform especially well when training data is limited. However, different PELT methods may perform rather differently on the same task, making it nontrivial to select the most appropriate method for a specific task, especially considering the fast-…

    Submitted 4 September, 2022; v1 submitted 14 October, 2021; originally announced October 2021.

    Comments: ACL 2022 (w. typo fixes)

  19. arXiv:2104.14690  [pdf, other]

    cs.CL cs.AI

    Entailment as Few-Shot Learner

    Authors: Sinong Wang, Han Fang, Madian Khabsa, Hanzi Mao, Hao Ma

    Abstract: Large pre-trained language models (LMs) have demonstrated remarkable ability as few-shot learners. However, their success hinges largely on scaling model parameters to a degree that makes them challenging to train and serve. In this paper, we propose a new approach, named EFL, that can turn small LMs into better few-shot learners. The key idea of this approach is to reformulate potential NLP task…

    Submitted 29 April, 2021; originally announced April 2021.

  20. arXiv:2104.08840  [pdf, other]

    cs.CL cs.LG

    On the Influence of Masking Policies in Intermediate Pre-training

    Authors: Qinyuan Ye, Belinda Z. Li, Sinong Wang, Benjamin Bolte, Hao Ma, Wen-tau Yih, Xiang Ren, Madian Khabsa

    Abstract: Current NLP models are predominantly trained through a two-stage "pre-train then fine-tune" pipeline. Prior work has shown that inserting an intermediate pre-training stage, using heuristic masking policies for masked language modeling (MLM), can significantly improve final performance. However, it is still unclear (1) in what cases such intermediate pre-training is helpful, (2) whether hand-craft…

    Submitted 30 September, 2021; v1 submitted 18 April, 2021; originally announced April 2021.

    Comments: Accepted to EMNLP 2021. Camera-ready version

  21. arXiv:2104.05243  [pdf, other]

    cs.AI cs.CL

    On Unifying Misinformation Detection

    Authors: Nayeon Lee, Belinda Z. Li, Sinong Wang, Pascale Fung, Hao Ma, Wen-tau Yih, Madian Khabsa

    Abstract: In this paper, we introduce UnifiedM2, a general-purpose misinformation model that jointly models multiple domains of misinformation with a single, unified setup. The model is trained to handle four tasks: detecting news bias, clickbait, fake news, and verifying rumors. By grouping these tasks together, UnifiedM2 learns a richer representation of misinformation, which leads to state-of-the-art or c…

    Submitted 12 April, 2021; originally announced April 2021.

    Comments: Accepted to NAACL2021

  22. arXiv:2103.09535  [pdf, other]

    cs.CL cs.LG

    Towards Few-Shot Fact-Checking via Perplexity

    Authors: Nayeon Lee, Yejin Bang, Andrea Madotto, Madian Khabsa, Pascale Fung

    Abstract: Few-shot learning has drawn researchers' attention to overcome the problem of data scarcity. Recently, large pre-trained language models have shown great performance in few-shot learning for various downstream tasks, such as question answering and machine translation. Nevertheless, little exploration has been made to achieve few-shot learning for the fact-checking task. However, fact-checking is a…

    Submitted 17 March, 2021; originally announced March 2021.

    Comments: Accepted to NAACL'21

  23. arXiv:2012.15856  [pdf, other]

    cs.CL cs.AI

    Studying Strategically: Learning to Mask for Closed-book QA

    Authors: Qinyuan Ye, Belinda Z. Li, Sinong Wang, Benjamin Bolte, Hao Ma, Wen-tau Yih, Xiang Ren, Madian Khabsa

    Abstract: Closed-book question-answering (QA) is a challenging task that requires a model to directly answer questions without access to external knowledge. It has been shown that directly fine-tuning pre-trained language models with (question, answer) examples yields surprisingly competitive performance, which is further improved upon through adding an intermediate pre-training stage between general pre-tr…

    Submitted 1 January, 2021; v1 submitted 31 December, 2020; originally announced December 2020.

  24. arXiv:2012.15466  [pdf, ps, other]

    cs.CL

    CLEAR: Contrastive Learning for Sentence Representation

    Authors: Zhuofeng Wu, Sinong Wang, Jiatao Gu, Madian Khabsa, Fei Sun, Hao Ma

    Abstract: Pre-trained language models have proven their unique powers in capturing implicit language features. However, most pre-training approaches focus on the word-level training objective, while sentence-level objectives are rarely studied. In this paper, we propose Contrastive LEArning for sentence Representation (CLEAR), which employs multiple sentence-level augmentation strategies in order to learn a…

    Submitted 31 December, 2020; originally announced December 2020.

    Comments: 10 pages, 2 figures

  25. arXiv:2006.08671  [pdf, other]

    cs.CL cs.LG stat.ML

    To Pretrain or Not to Pretrain: Examining the Benefits of Pretraining on Resource Rich Tasks

    Authors: Sinong Wang, Madian Khabsa, Hao Ma

    Abstract: Pretraining NLP models with variants of Masked Language Model (MLM) objectives has recently led to significant improvements on many tasks. This paper examines the benefits of pretrained models as a function of the number of training samples used in the downstream task. On several text classification tasks, we show that as the number of training examples grows into the millions, the accuracy gap b…

    Submitted 15 June, 2020; originally announced June 2020.

    Comments: Accepted in ACL2020

  26. arXiv:2006.04768  [pdf, other]

    cs.LG stat.ML

    Linformer: Self-Attention with Linear Complexity

    Authors: Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma

    Abstract: Large transformer models have shown extraordinary success in achieving state-of-the-art results in many natural language processing applications. However, training and deploying these models can be prohibitively costly for long sequences, as the standard self-attention mechanism of the Transformer uses $O(n^2)$ time and space with respect to sequence length. In this paper, we demonstrate that the…

    Submitted 14 June, 2020; v1 submitted 8 June, 2020; originally announced June 2020.

  27. arXiv:2006.04102  [pdf, other]

    cs.CL cs.AI

    Language Models as Fact Checkers?

    Authors: Nayeon Lee, Belinda Z. Li, Sinong Wang, Wen-tau Yih, Hao Ma, Madian Khabsa

    Abstract: Recent work has suggested that language models (LMs) store both common-sense and factual knowledge learned from pre-training data. In this paper, we leverage this implicit knowledge to create an effective end-to-end fact checker using solely a language model, without any external knowledge or explicit retrieval components. While previous work on extracting knowledge from LMs has focused on the…

    Submitted 24 July, 2020; v1 submitted 7 June, 2020; originally announced June 2020.

    Comments: Accepted in FEVER Workshop (ACL2020)

  28. arXiv:1906.05275  [pdf, other]

    cs.CL cs.LG

    Keeping Notes: Conditional Natural Language Generation with a Scratchpad Mechanism

    Authors: Ryan Y. Benmalek, Madian Khabsa, Suma Desu, Claire Cardie, Michele Banko

    Abstract: We introduce the Scratchpad Mechanism, a novel addition to the sequence-to-sequence (seq2seq) neural network architecture and demonstrate its effectiveness in improving the overall fluency of seq2seq models for natural language generation tasks. By enabling the decoder at each time step to write to all of the encoder output layers, Scratchpad can employ the encoder as a "scratchpad" memory to keep…

    Submitted 13 June, 2019; v1 submitted 12 June, 2019; originally announced June 2019.

    Comments: Accepted to ACL 2019

  29. arXiv:1804.08058  [pdf, other]

    cs.CL

    Adversarial Training for Community Question Answer Selection Based on Multi-scale Matching

    Authors: Xiao Yang, Madian Khabsa, Miaosen Wang, Wei Wang, Ahmed Awadallah, Daniel Kifer, C. Lee Giles

    Abstract: Community-based question answering (CQA) websites represent an important source of information. As a result, the problem of matching the most valuable answers to their corresponding questions has become an increasingly popular research topic. We frame this task as a binary (relevant/irrelevant) classification problem, and present an adversarial training framework to alleviate label imbalance issue…

    Submitted 16 November, 2018; v1 submitted 21 April, 2018; originally announced April 2018.

  30. arXiv:1712.09185  [pdf, other]

    cs.CL

    Actionable Email Intent Modeling with Reparametrized RNNs

    Authors: Chu-Cheng Lin, Dongyeop Kang, Michael Gamon, Madian Khabsa, Ahmed Hassan Awadallah, Patrick Pantel

    Abstract: Emails in the workplace are often intentional calls to action for their recipients. We propose to annotate these emails for what action the recipient will take. We argue that our approach of action-based annotation is more scalable and theory-agnostic than traditional speech-act-based email intent annotation, while still carrying important semantic and pragmatic information. We show that our action-…

    Submitted 26 December, 2017; originally announced December 2017.

    Comments: AAAI 2018

  31. arXiv:1602.01792  [pdf, other]

    cs.IR

    Random Forest DBSCAN for USPTO Inventor Name Disambiguation

    Authors: Kunho Kim, Madian Khabsa, C. Lee Giles

    Abstract: Name disambiguation and the subsequent name conflation are essential for the correct processing of person name queries in a digital library or other database. It distinguishes each unique person from all other records in the database. We study inventor name disambiguation for a patent database using methods and features from earlier work on author name disambiguation and propose a feature set appr…

    Submitted 14 September, 2017; v1 submitted 4 February, 2016; originally announced February 2016.

  32. arXiv:1307.1718  [pdf, ps, other]

    cs.IR

    Graph-based Approach to Automatic Taxonomy Generation (GraBTax)

    Authors: Pucktada Treeratpituk, Madian Khabsa, C. Lee Giles

    Abstract: We propose a novel graph-based approach for constructing a concept hierarchy from a large text corpus. Our algorithm, GraBTax, incorporates both statistical co-occurrences and lexical similarity in optimizing the structure of the taxonomy. To automatically generate topic-dependent taxonomies from a large text corpus, GraBTax first extracts topical terms and their relationships from the corpus. The a…

    Submitted 28 April, 2014; v1 submitted 5 July, 2013; originally announced July 2013.

    ACM Class: H.3.1