Showing 1–50 of 157 results for author: Joty, S

  1. arXiv:2407.04172

    cs.AI cs.CL cs.CV

    ChartGemma: Visual Instruction-tuning for Chart Reasoning in the Wild

    Authors: Ahmed Masry, Megh Thakkar, Aayush Bajaj, Aaryaman Kartha, Enamul Hoque, Shafiq Joty

    Abstract: Given the ubiquity of charts as a data analysis, visualization, and decision-making tool across industries and sciences, there has been a growing interest in developing pre-trained foundation models as well as general purpose instruction-tuned models for chart understanding and reasoning. However, existing methods suffer crucial drawbacks across two critical axes affecting the performance of chart…

    Submitted 4 July, 2024; originally announced July 2024.

  2. arXiv:2407.04069

    cs.CL cs.AI cs.LG

    A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations

    Authors: Md Tahmid Rahman Laskar, Sawsan Alqahtani, M Saiful Bari, Mizanur Rahman, Mohammad Abdullah Matin Khan, Haidar Khan, Israt Jahan, Amran Bhuiyan, Chee Wei Tan, Md Rizwan Parvez, Enamul Hoque, Shafiq Joty, Jimmy Huang

    Abstract: Large Language Models (LLMs) have recently gained significant attention due to their remarkable capabilities in performing diverse tasks across various domains. However, a thorough evaluation of these models is crucial before deploying them in real-world applications to ensure they produce reliable performance. Despite the well-established importance of evaluating LLMs in the community, the comple…

    Submitted 4 July, 2024; originally announced July 2024.

  3. arXiv:2406.03776

    cs.CL cs.AI cs.CV cs.IR

    XL-HeadTags: Leveraging Multimodal Retrieval Augmentation for the Multilingual Generation of News Headlines and Tags

    Authors: Faisal Tareque Shohan, Mir Tafseer Nayeem, Samsul Islam, Abu Ubaida Akash, Shafiq Joty

    Abstract: Millions of news articles published online daily can overwhelm readers. Headlines and entity (topic) tags are essential for guiding readers to decide if the content is worth their time. While headline generation has been extensively studied, tag generation remains largely unexplored, yet it offers readers better access to topics of interest. The need for conciseness in capturing readers' attention…

    Submitted 7 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: ACL 2024 camera ready. The first two authors contributed equally

  4. arXiv:2405.15329

    cs.CL

    Decompose and Aggregate: A Step-by-Step Interpretable Evaluation Framework

    Authors: Minzhi Li, Zhengyuan Liu, Shumin Deng, Shafiq Joty, Nancy F. Chen, Min-Yen Kan

    Abstract: The acceleration of Large Language Models (LLMs) research has opened up new possibilities for evaluating generated texts. They serve as scalable and economical evaluators, but the question of how reliable these evaluators are has emerged as a crucial research question. Prior research efforts in the meta-evaluation of LLMs as judges limit the prompting of an LLM to a single use to obtain a final ev…

    Submitted 14 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  5. arXiv:2404.16251

    cs.CR cs.AI cs.CL

    Investigating the prompt leakage effect and black-box defenses for multi-turn LLM interactions

    Authors: Divyansh Agarwal, Alexander R. Fabbri, Philippe Laban, Ben Risher, Shafiq Joty, Caiming Xiong, Chien-Sheng Wu

    Abstract: Prompt leakage in large language models (LLMs) poses a significant security and privacy threat, particularly in retrieval-augmented generation (RAG) systems. However, leakage in multi-turn LLM interactions along with mitigation strategies has not been studied in a standardized manner. This paper investigates LLM vulnerabilities against prompt leakage across 4 diverse domains and 10 closed- and ope…

    Submitted 26 April, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

  6. arXiv:2404.12728

    cs.CL

    Relevant or Random: Can LLMs Truly Perform Analogical Reasoning?

    Authors: Chengwei Qin, Wenhan Xia, Tan Wang, Fangkai Jiao, Yuchen Hu, Bosheng Ding, Ruirui Chen, Shafiq Joty

    Abstract: Analogical reasoning is a unique ability of humans to address unfamiliar challenges by transferring strategies from relevant past experiences. One key finding in psychology is that compared with irrelevant past experiences, recalling relevant ones can help humans better handle new tasks. Coincidentally, the NLP community has also recently found that self-generating relevant examples in the context…

    Submitted 23 June, 2024; v1 submitted 19 April, 2024; originally announced April 2024.

  7. arXiv:2404.02507

    cs.CL

    Lifelong Event Detection with Embedding Space Separation and Compaction

    Authors: Chengwei Qin, Ruirui Chen, Ruochen Zhao, Wenhan Xia, Shafiq Joty

    Abstract: To mitigate forgetting, existing lifelong event detection methods typically maintain a memory module and replay the stored memory data during the learning of a new task. However, the simple combination of memory data and new-task samples can still result in substantial forgetting of previously acquired knowledge, which may occur due to the potential overlap between the feature distribution of new…

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: NAACL 2024 main conference

  8. arXiv:2404.00699

    cs.CL

    How Much are LLMs Contaminated? A Comprehensive Survey and the LLMSanitize Library

    Authors: Mathieu Ravaut, Bosheng Ding, Fangkai Jiao, Hailin Chen, Xingxuan Li, Ruochen Zhao, Chengwei Qin, Caiming Xiong, Shafiq Joty

    Abstract: With the rise of Large Language Models (LLMs) in recent years, new opportunities are emerging, but also new challenges, and contamination is quickly becoming critical. Business applications and fundraising in AI have reached a scale at which a few percentage points gained on popular question-answering benchmarks could translate into dozens of millions of dollars, placing high pressure on model int…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: 10 pages, 1 figure, 3 tables

  9. arXiv:2404.00570

    cs.CL

    ParaICL: Towards Robust Parallel In-Context Learning

    Authors: Xingxuan Li, Xuan-Phi Nguyen, Shafiq Joty, Lidong Bing

    Abstract: Large language models (LLMs) have become the norm in natural language processing (NLP), excelling in few-shot in-context learning (ICL) with their remarkable abilities. Nonetheless, the success of ICL largely hinges on the choice of few-shot demonstration examples, making the selection process increasingly crucial. Existing methods have delved into optimizing the quantity and semantic similarity o…

    Submitted 31 March, 2024; originally announced April 2024.

    Comments: Work in progress

  10. arXiv:2403.12027

    cs.CL cs.AI cs.CV

    From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models

    Authors: Kung-Hsiang Huang, Hou Pong Chan, Yi R. Fung, Haoyi Qiu, Mingyang Zhou, Shafiq Joty, Shih-Fu Chang, Heng Ji

    Abstract: Data visualization in the form of charts plays a pivotal role in data analysis, offering critical insights and aiding in informed decision-making. Automatic chart understanding has witnessed significant advancements with the rise of large foundation models in recent years. Foundation models, such as large language models, have revolutionized various natural language processing tasks and are increa…

    Submitted 25 March, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

  11. arXiv:2403.09028

    cs.CL

    ChartInstruct: Instruction Tuning for Chart Comprehension and Reasoning

    Authors: Ahmed Masry, Mehrad Shahmohammadi, Md Rizwan Parvez, Enamul Hoque, Shafiq Joty

    Abstract: Charts provide visual representations of data and are widely used for analyzing information, addressing queries, and conveying insights to others. Various chart-related downstream tasks have emerged recently, such as question-answering and summarization. A common strategy to solve these tasks is to fine-tune various models originally trained on vision-language tasks. However, such task-specific mo…

    Submitted 13 March, 2024; originally announced March 2024.

  12. arXiv:2403.02990

    cs.CL cs.AI

    Data Augmentation using Large Language Models: Data Perspectives, Learning Paradigms and Challenges

    Authors: Bosheng Ding, Chengwei Qin, Ruochen Zhao, Tianze Luo, Xinze Li, Guizhen Chen, Wenhan Xia, Junjie Hu, Anh Tuan Luu, Shafiq Joty

    Abstract: In the rapidly evolving field of large language models (LLMs), data augmentation (DA) has emerged as a pivotal technique for enhancing model performance by diversifying training examples without the need for additional data collection. This survey explores the transformative impact of LLMs on DA, particularly addressing the unique challenges and opportunities they present in the context of natural…

    Submitted 2 July, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

  13. arXiv:2402.00658

    cs.AI cs.CL

    Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing

    Authors: Fangkai Jiao, Chengwei Qin, Zhengyuan Liu, Nancy F. Chen, Shafiq Joty

    Abstract: Large Language Models (LLMs) have demonstrated significant potential in handling complex reasoning tasks through step-by-step rationale generation. However, recent studies have raised concerns regarding the hallucination and flaws in their reasoning process. Substantial efforts are being made to improve the reliability and faithfulness of the generated rationales. Some approaches model reasoning a…

    Submitted 15 April, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: 17 pages, 9 figures

  14. arXiv:2401.13974

    cs.CV cs.AI cs.GR

    BootPIG: Bootstrapping Zero-shot Personalized Image Generation Capabilities in Pretrained Diffusion Models

    Authors: Senthil Purushwalkam, Akash Gokul, Shafiq Joty, Nikhil Naik

    Abstract: Recent text-to-image generation models have demonstrated incredible success in generating images that faithfully follow input prompts. However, the requirement of using words to describe a desired concept provides limited control over the appearance of the generated concepts. In this work, we address this shortcoming by proposing an approach to enable personalization capabilities in existing text-…

    Submitted 25 January, 2024; originally announced January 2024.

  15. arXiv:2312.17055

    cs.CL

    Improving In-context Learning via Bidirectional Alignment

    Authors: Chengwei Qin, Wenhan Xia, Fangkai Jiao, Chen Chen, Yuchen Hu, Bosheng Ding, Shafiq Joty

    Abstract: Large language models (LLMs) have shown impressive few-shot generalization on many tasks via in-context learning (ICL). Despite their success in showing such emergent abilities, the scale and complexity of larger models also lead to unprecedentedly high computational demands and deployment challenges. In reaction, researchers explore transferring the powerful capabilities of larger models to more…

    Submitted 24 June, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

  16. arXiv:2312.10610

    cs.CL

    Do LLMs Work on Charts? Designing Few-Shot Prompts for Chart Question Answering and Summarization

    Authors: Xuan Long Do, Mohammad Hassanpour, Ahmed Masry, Parsa Kavehzadeh, Enamul Hoque, Shafiq Joty

    Abstract: A number of tasks have been proposed recently to facilitate easy access to charts such as chart QA and summarization. The dominant paradigm to solve these tasks has been to fine-tune a pretrained model on the task data. However, this approach is not only expensive but also not generalizable to unseen tasks. On the other hand, large language models (LLMs) have shown impressive generalization capabi…

    Submitted 17 December, 2023; originally announced December 2023.

    Comments: 23 pages

  17. arXiv:2311.18799

    cs.CV cs.CL

    X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning

    Authors: Artemis Panagopoulou, Le Xue, Ning Yu, Junnan Li, Dongxu Li, Shafiq Joty, Ran Xu, Silvio Savarese, Caiming Xiong, Juan Carlos Niebles

    Abstract: Vision-language pre-training and instruction tuning have demonstrated general-purpose capabilities in 2D visual reasoning tasks by aligning visual encoders with state-of-the-art large language models (LLMs). In this paper, we introduce a simple, yet effective, cross-modality framework built atop frozen LLMs that allows the integration of various modalities without extensive modality-specific custo…

    Submitted 30 November, 2023; originally announced November 2023.

  18. arXiv:2311.16989

    cs.CL

    ChatGPT's One-year Anniversary: Are Open-Source Large Language Models Catching up?

    Authors: Hailin Chen, Fangkai Jiao, Xingxuan Li, Chengwei Qin, Mathieu Ravaut, Ruochen Zhao, Caiming Xiong, Shafiq Joty

    Abstract: Upon its release in late 2022, ChatGPT has brought a seismic shift in the entire landscape of AI, both in research and commerce. Through instruction-tuning a large language model (LLM) with supervised fine-tuning and reinforcement learning from human feedback, it showed that a model could answer human questions and follow instructions on a broad panel of tasks. Following this success, interests in…

    Submitted 15 January, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: version v4, included latest top-performing open-sourced LLMs

  19. arXiv:2311.12908

    cs.CV cs.AI cs.GR cs.LG

    Diffusion Model Alignment Using Direct Preference Optimization

    Authors: Bram Wallace, Meihua Dang, Rafael Rafailov, Linqi Zhou, Aaron Lou, Senthil Purushwalkam, Stefano Ermon, Caiming Xiong, Shafiq Joty, Nikhil Naik

    Abstract: Large language models (LLMs) are fine-tuned using human comparison data with Reinforcement Learning from Human Feedback (RLHF) methods to make them better aligned with users' preferences. In contrast to LLMs, human preference learning has not been widely explored in text-to-image diffusion models; the best existing approach is to fine-tune a pretrained model using carefully curated high quality im…

    Submitted 21 November, 2023; originally announced November 2023.

  20. arXiv:2311.09184

    cs.CL cs.LG

    Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization

    Authors: Yixin Liu, Alexander R. Fabbri, Jiawen Chen, Yilun Zhao, Simeng Han, Shafiq Joty, Pengfei Liu, Dragomir Radev, Chien-Sheng Wu, Arman Cohan

    Abstract: While large language models (LLMs) can already achieve strong performance on standard generic summarization benchmarks, their performance on more complex summarization task settings is less studied. Therefore, we benchmark LLMs on instruction controllable text summarization, where the model input consists of both a source article and a natural language requirement for desired summary characteristi…

    Submitted 12 July, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

    Comments: NAACL 2024 Findings, GitHub Repo: https://github.com/yale-nlp/InstruSum, LLM-evaluators Leaderboard: https://huggingface.co/spaces/yale-nlp/InstruSumEval

  21. arXiv:2310.20170

    cs.CL

    DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text

    Authors: Wenting Zhao, Ye Liu, Tong Niu, Yao Wan, Philip S. Yu, Shafiq Joty, Yingbo Zhou, Semih Yavuz

    Abstract: Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when solely relying on their internal knowledge, especially when answering questions that require less commonly known information. Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge. Nonetheless, recent approaches have primarily emphasi…

    Submitted 31 October, 2023; originally announced October 2023.

  22. arXiv:2310.18628

    cs.CL cs.LG

    Personalised Distillation: Empowering Open-Sourced LLMs with Adaptive Learning for Code Generation

    Authors: Hailin Chen, Amrita Saha, Steven Hoi, Shafiq Joty

    Abstract: With the rise of powerful closed-sourced LLMs (ChatGPT, GPT-4), there are increasing interests in distilling the capabilities of closed-sourced LLMs to smaller open-sourced LLMs. Previous distillation methods usually prompt ChatGPT to generate a set of instructions and answers, for the student model to learn. However, such standard distillation approach neglects the merits and conditions of the stude…

    Submitted 26 January, 2024; v1 submitted 28 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023; Codes at: https://github.com/SalesforceAIResearch/PersDistill

  23. arXiv:2310.10570

    cs.CL

    On Context Utilization in Summarization with Large Language Models

    Authors: Mathieu Ravaut, Aixin Sun, Nancy F. Chen, Shafiq Joty

    Abstract: Large language models (LLMs) excel in abstractive summarization tasks, delivering fluent and pertinent summaries. Recent advancements have extended their capabilities to handle long-input contexts, exceeding 100k tokens. However, in question answering, language models exhibit uneven utilization of their input context. They tend to favor the initial and final segments, resulting in a U-shaped perfo…

    Submitted 14 June, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: ACL 2024. 9 pages, 7 figures, 3 tables

  24. arXiv:2310.09886

    cs.CL cs.AI

    Lifelong Sequence Generation with Dynamic Module Expansion and Adaptation

    Authors: Chengwei Qin, Chen Chen, Shafiq Joty

    Abstract: Lifelong sequence generation (LSG), a problem in continual learning, aims to continually train a model on a sequence of generation tasks to learn constantly emerging new generation patterns while avoiding the forgetting of previous knowledge. Existing LSG methods mainly focus on maintaining old knowledge while paying little attention to knowledge transfer across tasks. In contrast, humans can bett…

    Submitted 22 November, 2023; v1 submitted 15 October, 2023; originally announced October 2023.

  25. arXiv:2310.08992

    cs.AI cs.CL cs.PL

    CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules

    Authors: Hung Le, Hailin Chen, Amrita Saha, Akash Gokul, Doyen Sahoo, Shafiq Joty

    Abstract: Large Language Models (LLMs) have already become quite proficient at solving simpler programming tasks like those in HumanEval or MBPP benchmarks. However, solving more complex and competitive programming tasks is still quite challenging for these models - possibly due to their tendency to generate solutions as monolithic code blocks instead of decomposing them into logical sub-tasks and sub-modul…

    Submitted 13 March, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

    Comments: Accepted to ICLR 2024

  26. arXiv:2310.01917

    cs.CL cs.HC

    Hierarchical Evaluation Framework: Best Practices for Human Evaluation

    Authors: Iva Bojic, Jessica Chen, Si Yuan Chang, Qi Chwen Ong, Shafiq Joty, Josip Car

    Abstract: Human evaluation plays a crucial role in Natural Language Processing (NLP) as it assesses the quality and relevance of developed systems, thereby facilitating their enhancement. However, the absence of widely accepted human evaluation metrics in NLP hampers fair comparisons among different systems and the establishment of universal assessment standards. Through an extensive analysis of existing li…

    Submitted 12 October, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

  27. arXiv:2309.17446

    cs.CL cs.LG cs.PL cs.SE

    L2CEval: Evaluating Language-to-Code Generation Capabilities of Large Language Models

    Authors: Ansong Ni, Pengcheng Yin, Yilun Zhao, Martin Riddell, Troy Feng, Rui Shen, Stephen Yin, Ye Liu, Semih Yavuz, Caiming Xiong, Shafiq Joty, Yingbo Zhou, Dragomir Radev, Arman Cohan

    Abstract: Recently, large language models (LLMs), especially those that are pretrained on code, have demonstrated strong capabilities in generating programs from natural language inputs in a few-shot or even zero-shot manner. Despite promising results, there is a notable lack of a comprehensive evaluation of these models' language-to-code generation capabilities. Existing studies often focus on specific task…

    Submitted 2 October, 2023; v1 submitted 29 September, 2023; originally announced September 2023.

    Comments: Project Website: https://l2c-eval.github.io/

  28. arXiv:2309.09369

    cs.CL

    Embrace Divergence for Richer Insights: A Multi-document Summarization Benchmark and a Case Study on Summarizing Diverse Information from News Articles

    Authors: Kung-Hsiang Huang, Philippe Laban, Alexander R. Fabbri, Prafulla Kumar Choubey, Shafiq Joty, Caiming Xiong, Chien-Sheng Wu

    Abstract: Previous research in multi-document news summarization has typically concentrated on collating information that all sources agree upon. However, the summarization of diverse information dispersed across multiple articles about an event remains underexplored. In this paper, we propose a new task of summarizing diverse information encountered in multiple news articles encompassing the same event. To…

    Submitted 22 March, 2024; v1 submitted 17 September, 2023; originally announced September 2023.

    Comments: NAACL 2024

  29. arXiv:2309.06057

    cs.SE cs.CL

    RAP-Gen: Retrieval-Augmented Patch Generation with CodeT5 for Automatic Program Repair

    Authors: Weishi Wang, Yue Wang, Shafiq Joty, Steven C. H. Hoi

    Abstract: Automatic program repair (APR) is crucial to reduce manual debugging efforts for developers and improve software reliability. While conventional search-based techniques typically rely on heuristic rules or a redundancy assumption to mine fix patterns, recent years have witnessed the surge of deep learning (DL) based approaches to automate the program repair process in a data-driven manner. However…

    Submitted 12 September, 2023; originally announced September 2023.

    Comments: FSE 2023, Long paper

  30. arXiv:2309.03450

    cs.CL cs.AI cs.LG

    XGen-7B Technical Report

    Authors: Erik Nijkamp, Tian Xie, Hiroaki Hayashi, Bo Pang, Congying Xia, Chen Xing, Jesse Vig, Semih Yavuz, Philippe Laban, Ben Krause, Senthil Purushwalkam, Tong Niu, Wojciech Kryściński, Lidiya Murakhovs'ka, Prafulla Kumar Choubey, Alex Fabbri, Ye Liu, Rui Meng, Lifu Tu, Meghana Bhat, Chien-Sheng Wu, Silvio Savarese, Yingbo Zhou, Shafiq Joty, Caiming Xiong

    Abstract: Large Language Models (LLMs) have become ubiquitous across various domains, transforming the way we interact with information and conduct research. However, most high-performing LLMs remain confined behind proprietary walls, hindering scientific progress. Most open-source LLMs, on the other hand, are limited in their ability to support longer sequence lengths, which is a key requirement for many t…

    Submitted 6 September, 2023; originally announced September 2023.

  31. arXiv:2308.12574

    cs.IR cs.AI

    Modeling Uncertainty and Using Post-fusion as Fallback Improves Retrieval Augmented Generation with LLMs

    Authors: Ye Liu, Semih Yavuz, Rui Meng, Meghana Moorthy, Shafiq Joty, Caiming Xiong, Yingbo Zhou

    Abstract: The integration of retrieved passages and large language models (LLMs), such as ChatGPT, has significantly contributed to improving open-domain question answering. However, there is still a lack of exploration regarding the optimal approach for incorporating retrieved passages into the answer generation process. This paper aims to fill this gap by investigating different methods of combining retr…

    Submitted 7 April, 2024; v1 submitted 24 August, 2023; originally announced August 2023.

  32. arXiv:2308.03117

    cs.CL

    PromptSum: Parameter-Efficient Controllable Abstractive Summarization

    Authors: Mathieu Ravaut, Hailin Chen, Ruochen Zhao, Chengwei Qin, Shafiq Joty, Nancy Chen

    Abstract: Prompt tuning (PT), a parameter-efficient technique that only tunes the additional prompt embeddings while keeping the backbone pre-trained language model (PLM) frozen, has shown promising results in language understanding tasks, especially in low-resource scenarios. However, effective prompt design methods suitable for generation tasks such as summarization are still lacking. At the same time, su…

    Submitted 6 August, 2023; originally announced August 2023.

  33. arXiv:2306.11372

    cs.CL cs.AI

    Democratizing LLMs for Low-Resource Languages by Leveraging their English Dominant Abilities with Linguistically-Diverse Prompts

    Authors: Xuan-Phi Nguyen, Sharifah Mahani Aljunied, Shafiq Joty, Lidong Bing

    Abstract: Large language models (LLMs) are known to effectively perform tasks by simply observing few exemplars. However, in low-resource languages, obtaining such hand-picked exemplars can still be challenging, where unsupervised techniques may be necessary. Moreover, competent generative capabilities of LLMs are observed only in high-resource languages, while their performances among under-represented lan…

    Submitted 20 June, 2023; originally announced June 2023.

    Comments: Pre-print

  34. arXiv:2306.01150

    cs.CL cs.AI

    Did You Read the Instructions? Rethinking the Effectiveness of Task Definitions in Instruction Learning

    Authors: Fan Yin, Jesse Vig, Philippe Laban, Shafiq Joty, Caiming Xiong, Chien-Sheng Jason Wu

    Abstract: Large language models (LLMs) have shown impressive performance in following natural language instructions to solve unseen tasks. However, it remains unclear whether models truly understand task definitions and whether the human-written definitions are optimal. In this paper, we systematically study the role of task definitions in instruction learning. We first conduct an ablation analysis informed…

    Submitted 1 June, 2023; originally announced June 2023.

    Comments: ACL 2023, camera-ready; 10 pages

  35. arXiv:2305.19707

    cs.CL

    Building Extractive Question Answering System to Support Human-AI Health Coaching Model for Sleep Domain

    Authors: Iva Bojic, Qi Chwen Ong, Shafiq Joty, Josip Car

    Abstract: Non-communicable diseases (NCDs) are a leading cause of global deaths, necessitating a focus on primary prevention and lifestyle behavior change. Health coaching, coupled with Question Answering (QA) systems, has the potential to transform preventive healthcare. This paper presents a human-Artificial Intelligence (AI) health coaching model incorporating a domain-specific extractive QA system. A sl…

    Submitted 31 May, 2023; originally announced May 2023.

    Comments: 2 pages, 1 figure

  36. arXiv:2305.19204

    cs.CL

    SWiPE: A Dataset for Document-Level Simplification of Wikipedia Pages

    Authors: Philippe Laban, Jesse Vig, Wojciech Kryscinski, Shafiq Joty, Caiming Xiong, Chien-Sheng Wu

    Abstract: Text simplification research has mostly focused on sentence-level simplification, even though many desirable edits - such as adding relevant background information or reordering content - may require document-level context. Prior work has also predominantly framed simplification as a single-step, input-to-output task, only implicitly modeling the fine-grained, span-level edits that elucidate the s…

    Submitted 30 May, 2023; originally announced May 2023.

    Comments: ACL 2023, Long Paper

  37. arXiv:2305.18486

    cs.CL cs.AI cs.LG

    A Systematic Study and Comprehensive Evaluation of ChatGPT on Benchmark Datasets

    Authors: Md Tahmid Rahman Laskar, M Saiful Bari, Mizanur Rahman, Md Amran Hossen Bhuiyan, Shafiq Joty, Jimmy Xiangji Huang

    Abstract: The development of large language models (LLMs) such as ChatGPT has brought a lot of attention recently. However, their evaluation in the benchmark academic datasets remains under-explored due to the difficulty of evaluating the generative outputs produced by this model against the ground truth. In this paper, we aim to present a thorough evaluation of ChatGPT's performance on diverse academic dat…

    Submitted 5 July, 2023; v1 submitted 29 May, 2023; originally announced May 2023.

    Comments: Accepted by ACL 2023 Findings. The first three authors contributed equally

  38. arXiv:2305.15014

    cs.CL

    Unlocking Temporal Question Answering for Large Language Models Using Code Execution

    Authors: Xingxuan Li, Liying Cheng, Qingyu Tan, Hwee Tou Ng, Shafiq Joty, Lidong Bing

    Abstract: Large language models (LLMs) have made significant progress in natural language processing (NLP), and are utilized extensively in various applications. Recent works, such as chain-of-thought (CoT), have shown that intermediate reasoning steps can improve the performance of LLMs for complex reasoning tasks, such as math problems and symbolic question-answering tasks. However, we notice the challeng…

    Submitted 24 May, 2023; originally announced May 2023.

  39. arXiv:2305.14761

    cs.CL

    UniChart: A Universal Vision-language Pretrained Model for Chart Comprehension and Reasoning

    Authors: Ahmed Masry, Parsa Kavehzadeh, Xuan Long Do, Enamul Hoque, Shafiq Joty

    Abstract: Charts are very popular for analyzing data, visualizing key insights and answering complex reasoning questions about data. To facilitate chart-based data analysis using natural language, several downstream tasks have been introduced recently such as chart question answering and chart summarization. However, most of the methods that solve these tasks use pretraining on language or vision-language t…

    Submitted 10 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

  40. arXiv:2305.14540

    cs.CL

    LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond

    Authors: Philippe Laban, Wojciech Kryściński, Divyansh Agarwal, Alexander R. Fabbri, Caiming Xiong, Shafiq Joty, Chien-Sheng Wu

    Abstract: With the recent appearance of LLMs in practical settings, having methods that can effectively detect factual inconsistencies is crucial to reduce the propagation of misinformation and improve trust in model outputs. When testing on existing factual consistency benchmarks, we find that a few large language models (LLMs) perform competitively on classification benchmarks for factual inconsistency de…

    Submitted 23 May, 2023; originally announced May 2023.

  41. arXiv:2305.13718

    cs.CL

    Exploring Self-supervised Logic-enhanced Training for Large Language Models

    Authors: Fangkai Jiao, Zhiyang Teng, Bosheng Ding, Zhengyuan Liu, Nancy F. Chen, Shafiq Joty

    Abstract: Existing efforts to improve logical reasoning ability of language models have predominantly relied on supervised fine-tuning, hindering generalization to new domains and/or tasks. The development of Large Language Models (LLMs) has demonstrated the capacity of compressing abundant knowledge into a single proxy, enabling them to tackle multiple tasks effectively. Our preliminary experiments, nevert…

    Submitted 16 June, 2024; v1 submitted 23 May, 2023; originally announced May 2023.

    Comments: 16 pages, NAACL 2024

  42. arXiv:2305.13269  [pdf, other]

    cs.CL

    Chain-of-Knowledge: Grounding Large Language Models via Dynamic Knowledge Adapting over Heterogeneous Sources

    Authors: Xingxuan Li, Ruochen Zhao, Yew Ken Chia, Bosheng Ding, Shafiq Joty, Soujanya Poria, Lidong Bing

    Abstract: We present chain-of-knowledge (CoK), a novel framework that augments large language models (LLMs) by dynamically incorporating grounding information from heterogeneous sources. It results in more factual rationales and reduced hallucination in generation. Specifically, CoK consists of three stages: reasoning preparation, dynamic knowledge adapting, and answer consolidation. Given a knowledge-inten…

    Submitted 21 February, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted by ICLR 2024

  43. arXiv:2305.06522  [pdf, other]

    cs.CL cs.AI

    Randomized Smoothing with Masked Inference for Adversarially Robust Text Classifications

    Authors: Han Cheol Moon, Shafiq Joty, Ruochen Zhao, Megh Thakkar, Xu Chi

    Abstract: Large-scale pre-trained language models have shown outstanding performance in a variety of NLP tasks. However, they are also known to be significantly brittle against specifically crafted adversarial examples, leading to increasing interest in probing the adversarial robustness of NLP systems. We introduce RSMI, a novel two-stage framework that combines randomized smoothing (RS) with masked infere…

    Submitted 10 May, 2023; originally announced May 2023.

    Comments: 19 pages, 4 figures, ACL23

  44. arXiv:2305.03268  [pdf, other]

    cs.CL

    Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework

    Authors: Ruochen Zhao, Xingxuan Li, Shafiq Joty, Chengwei Qin, Lidong Bing

    Abstract: As large language models (LLMs) have become the norm in NLP, demonstrating good performance in generation and reasoning tasks, one of its most fatal disadvantages is the lack of factual correctness. Generating unfactual texts not only leads to lower performances but also degrades the trust and validity of their applications. Chain-of-Thought (CoT) prompting improves trust and model performance on…

    Submitted 4 May, 2023; originally announced May 2023.

  45. arXiv:2305.03088  [pdf, other]

    cs.CL cs.AI

    Modeling What-to-ask and How-to-ask for Answer-unaware Conversational Question Generation

    Authors: Xuan Long Do, Bowei Zou, Shafiq Joty, Anh Tai Tran, Liangming Pan, Nancy F. Chen, Ai Ti Aw

    Abstract: Conversational Question Generation (CQG) is a critical task for machines to assist humans in fulfilling their information needs through conversations. The task is generally cast into two different settings: answer-aware and answer-unaware. While the former facilitates the models by exposing the expected answer, the latter is more realistic and has recently received growing attention. What-to-ask and…

    Submitted 4 May, 2023; originally announced May 2023.

    Comments: 17 pages, ACL 2023

  46. arXiv:2305.02160  [pdf, other]

    cs.CL

    Explaining Language Models' Predictions with High-Impact Concepts

    Authors: Ruochen Zhao, Shafiq Joty, Yongjie Wang, Tan Wang

    Abstract: The emergence of large-scale pretrained language models has posed unprecedented challenges in deriving explanations of why the model has made some predictions. Stemming from the compositional nature of languages, spurious correlations have further undermined the trustworthiness of NLP systems, leading to unreliable model explanations that are merely correlated with the output predictions. To encour…

    Submitted 3 May, 2023; originally announced May 2023.

  47. arXiv:2304.01295  [pdf, other]

    cs.CL cs.AI

    Efficiently Aligned Cross-Lingual Transfer Learning for Conversational Tasks using Prompt-Tuning

    Authors: Lifu Tu, Jin Qu, Semih Yavuz, Shafiq Joty, Wenhao Liu, Caiming Xiong, Yingbo Zhou

    Abstract: Cross-lingual transfer of language models trained on high-resource languages like English has been widely studied for many NLP tasks, but focus on conversational tasks has been rather limited. This is partly due to the high cost of obtaining non-English conversational data, which results in limited coverage. In this work, we introduce XSGD for cross-lingual alignment pretraining, a parallel and la…

    Submitted 26 January, 2024; v1 submitted 3 April, 2023; originally announced April 2023.

    Comments: Accepted to the Findings of the ACL: EACL 2024

  48. A Data-centric Framework for Improving Domain-specific Machine Reading Comprehension Datasets

    Authors: Iva Bojic, Josef Halim, Verena Suharman, Sreeja Tar, Qi Chwen Ong, Duy Phung, Mathieu Ravaut, Shafiq Joty, Josip Car

    Abstract: Low-quality data can cause downstream problems in high-stakes applications. A data-centric approach emphasizes improving dataset quality to enhance model performance. High-quality datasets are needed for general-purpose Large Language Models (LLMs) training, as well as for domain-specific models, which are usually small in size as it is costly to engage a large number of domain experts for their…

    Submitted 26 May, 2023; v1 submitted 2 April, 2023; originally announced April 2023.

    Journal ref: 2023. In The Fourth Workshop on Insights from Negative Results in NLP, pages 19-32, Dubrovnik, Croatia. Association for Computational Linguistics

  49. arXiv:2303.10868  [pdf, other]

    cs.CL

    Retrieving Multimodal Information for Augmented Generation: A Survey

    Authors: Ruochen Zhao, Hailin Chen, Weishi Wang, Fangkai Jiao, Xuan Long Do, Chengwei Qin, Bosheng Ding, Xiaobao Guo, Minzhi Li, Xingxuan Li, Shafiq Joty

    Abstract: As Large Language Models (LLMs) become popular, there emerged an important trend of using multimodality to augment the LLMs' generation ability, which enables LLMs to better interact with the world. However, there lacks a unified perception of at which stage and how to incorporate different modalities. In this survey, we review methods that assist and augment generative models by retrieving multim…

    Submitted 30 November, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

  50. arXiv:2303.03608  [pdf, other]

    cs.CL

    Towards Interpretable and Efficient Automatic Reference-Based Summarization Evaluation

    Authors: Yixin Liu, Alexander R. Fabbri, Yilun Zhao, Pengfei Liu, Shafiq Joty, Chien-Sheng Wu, Caiming Xiong, Dragomir Radev

    Abstract: Interpretability and efficiency are two important considerations for the adoption of neural automatic metrics. In this work, we develop strong-performing automatic metrics for reference-based summarization evaluation, based on a two-stage evaluation pipeline that first extracts basic information units from one text sequence and then checks the extracted units in another sequence. The metrics we de…

    Submitted 16 November, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

    Comments: EMNLP 2023 Camera Ready Version