Showing 1–19 of 19 results for author: Mekala, D

  1. arXiv:2407.05778  [pdf, other]

    cs.CL cs.AI

    When is the consistent prediction likely to be a correct prediction?

    Authors: Alex Nguyen, Dheeraj Mekala, Chengyu Dong, Jingbo Shang

    Abstract: Self-consistency (Wang et al., 2023) suggests that the most consistent answer obtained through large language models (LLMs) is more likely to be correct. In this paper, we challenge this argument and propose a nuanced correction. Our observations indicate that consistent answers derived through more computation, i.e., longer reasoning texts, rather than simply the most consistent answer across all o…

    Submitted 8 July, 2024; originally announced July 2024.

  2. arXiv:2404.00439  [pdf, other]

    cs.CL

    DOCMASTER: A Unified Platform for Annotation, Training, & Inference in Document Question-Answering

    Authors: Alex Nguyen, Zilong Wang, Jingbo Shang, Dheeraj Mekala

    Abstract: The application of natural language processing models to PDF documents is pivotal for various business applications, yet the challenge of training models for this purpose persists in businesses due to specific hurdles. These include the complexity of working with PDF formats that necessitate parsing text and layout information for curating training data and the lack of privacy-preserving annotation…

    Submitted 30 March, 2024; originally announced April 2024.

  3. arXiv:2402.14158  [pdf, other]

    cs.CL

    TOOLVERIFIER: Generalization to New Tools via Self-Verification

    Authors: Dheeraj Mekala, Jason Weston, Jack Lanchantin, Roberta Raileanu, Maria Lomeli, Jingbo Shang, Jane Dwivedi-Yu

    Abstract: Teaching language models to use tools is an important milestone towards building general assistants, but remains an open problem. While there has been significant progress on learning to use specific tools via fine-tuning, language models still struggle with learning how to robustly use new tools from only a few demonstrations. In this work we introduce a self-verification method which distinguish…

    Submitted 13 March, 2024; v1 submitted 21 February, 2024; originally announced February 2024.

  4. arXiv:2402.11711  [pdf, other]

    cs.CL

    MORL-Prompt: An Empirical Analysis of Multi-Objective Reinforcement Learning for Discrete Prompt Optimization

    Authors: Yasaman Jafari, Dheeraj Mekala, Rose Yu, Taylor Berg-Kirkpatrick

    Abstract: RL-based techniques can be used to search for prompts that, when fed into a target language model, maximize a set of user-specified reward functions. However, in many target applications, the natural reward functions are in tension with one another -- for example, content preservation vs. style matching in style transfer tasks. Current techniques focus on maximizing the average of reward functions,…

    Submitted 18 February, 2024; originally announced February 2024.

  5. arXiv:2402.10430  [pdf, other]

    cs.CL

    Smaller Language Models are capable of selecting Instruction-Tuning Training Data for Larger Language Models

    Authors: Dheeraj Mekala, Alex Nguyen, Jingbo Shang

    Abstract: Instruction-tuning language models has become a crucial step in aligning them for general use. Typically, this process involves extensive training on large datasets, incurring high training costs. In this paper, we introduce a novel training data selection based on the learning percentage of the samples. We assert that current language models possess the capability to autonomously select high-qual…

    Submitted 15 February, 2024; originally announced February 2024.

  6. arXiv:2311.03319  [pdf, other]

    cs.CL cs.AI

    DAIL: Data Augmentation for In-Context Learning via Self-Paraphrase

    Authors: Dawei Li, Yaxuan Li, Dheeraj Mekala, Shuyao Li, Yulin Wang, Xueqi Wang, William Hogan, Jingbo Shang

    Abstract: In-Context Learning (ICL) combined with pre-trained large language models has achieved promising results on various NLP tasks. However, ICL requires high-quality annotated demonstrations which might not be available in real-world scenarios. To overcome this limitation, we propose Data Augmentation for In-Context Learning (DAIL). DAIL leverages the intui…

    Submitted 6 November, 2023; originally announced November 2023.

    Comments: Course project for DSC 253 (Advanced Data-Driven Text Mining) at UCSD

  7. arXiv:2305.14696  [pdf, other]

    cs.CL

    SELFOOD: Self-Supervised Out-Of-Distribution Detection via Learning to Rank

    Authors: Dheeraj Mekala, Adithya Samavedhi, Chengyu Dong, Jingbo Shang

    Abstract: Deep neural classifiers trained with cross-entropy loss (CE loss) often suffer from poor calibration, necessitating the task of out-of-distribution (OOD) detection. Traditional supervised OOD detection methods require expensive manual annotation of in-distribution and OOD samples. To address the annotation bottleneck, we introduce SELFOOD, a self-supervised OOD detection method that requires only…

    Submitted 24 May, 2023; originally announced May 2023.

  8. arXiv:2305.12749  [pdf, other]

    cs.CL

    A Benchmark on Extremely Weakly Supervised Text Classification: Reconcile Seed Matching and Prompting Approaches

    Authors: Zihan Wang, Tianle Wang, Dheeraj Mekala, Jingbo Shang

    Abstract: Extremely Weakly Supervised Text Classification (XWS-TC) refers to text classification based on minimal high-level human guidance, such as a few label-indicative seed words or classification instructions. There are two mainstream approaches for XWS-TC that have, however, never been rigorously compared: (1) training classifiers based on pseudo-labels generated by (softly) matching seed words (SEED) and (2) p…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: ACL 2023 Findings

  9. arXiv:2212.10815  [pdf, other]

    cs.CL

    ZEROTOP: Zero-Shot Task-Oriented Semantic Parsing using Large Language Models

    Authors: Dheeraj Mekala, Jason Wolfe, Subhro Roy

    Abstract: We explore the use of large language models (LLMs) for zero-shot semantic parsing. Semantic parsing involves mapping natural language utterances to task-specific meaning representations. Language models are generally trained on the publicly available text and code and cannot be expected to directly generalize to domain-specific parsing tasks in a zero-shot setting. In this work, we propose ZEROTOP…

    Submitted 21 December, 2022; originally announced December 2022.

  10. arXiv:2210.14380  [pdf, other]

    cs.CL

    Progressive Sentiment Analysis for Code-Switched Text Data

    Authors: Sudhanshu Ranjan, Dheeraj Mekala, Jingbo Shang

    Abstract: Multilingual transformer language models have recently attracted much attention from researchers and are used in cross-lingual transfer learning for many NLP tasks such as text classification and named entity recognition. However, similar methods for transfer learning from monolingual text to code-switched text have not been extensively explored, mainly due to the following challenges: (1) Code-swi…

    Submitted 25 October, 2022; originally announced October 2022.

    Comments: To appear in Findings of EMNLP 2022

  11. arXiv:2205.12604  [pdf, other]

    cs.CL

    Leveraging QA Datasets to Improve Generative Data Augmentation

    Authors: Dheeraj Mekala, Tu Vu, Timo Schick, Jingbo Shang

    Abstract: The ability of generative language models (GLMs) to generate text has improved considerably in the last few years, enabling their use for generative data augmentation. In this work, we propose CONDA, an approach to further improve GLMs' ability to generate synthetic data by reformulating data generation as context generation for a given question-answer (QA) pair and leveraging QA datasets for trai…

    Submitted 25 October, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

  12. arXiv:2205.12528  [pdf, other]

    cs.CL

    LOPS: Learning Order Inspired Pseudo-Label Selection for Weakly Supervised Text Classification

    Authors: Dheeraj Mekala, Chengyu Dong, Jingbo Shang

    Abstract: Weakly supervised text classification methods typically train a deep neural classifier based on pseudo-labels. The quality of pseudo-labels is crucial to final performance but they are inevitably noisy due to their heuristic nature, so selecting the correct ones has a huge potential for performance boost. One straightforward solution is to select samples based on the softmax probability scores in…

    Submitted 25 October, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

  13. arXiv:2109.10856  [pdf, other]

    cs.CL cs.LG

    Coarse2Fine: Fine-grained Text Classification on Coarsely-grained Annotated Data

    Authors: Dheeraj Mekala, Varun Gangal, Jingbo Shang

    Abstract: Existing text classification methods mainly focus on a fixed label set, whereas many real-world applications require extending to new fine-grained classes as the number of samples per label increases. To accommodate such requirements, we introduce a new problem called coarse-to-fine grained classification, which aims to perform fine-grained classification on coarsely annotated data. Instead of ask…

    Submitted 22 September, 2021; originally announced September 2021.

    Comments: Accepted to appear in EMNLP 2021

  14. arXiv:2109.10855  [pdf, other]

    cs.CL cs.LG

    BFClass: A Backdoor-free Text Classification Framework

    Authors: Zichao Li, Dheeraj Mekala, Chengyu Dong, Jingbo Shang

    Abstract: A backdoor attack introduces artificial vulnerabilities into the model by poisoning a subset of the training data via injecting triggers and modifying labels. Various trigger design strategies have been explored to attack text classifiers; however, defending against such attacks remains an open problem. In this work, we propose BFClass, a novel efficient backdoor-free training framework for text classificat…

    Submitted 22 September, 2021; originally announced September 2021.

    Comments: Accepted to appear in Findings of EMNLP 2021

  15. arXiv:2104.08723  [pdf, other]

    cs.CL

    News Meets Microblog: Hashtag Annotation via Retriever-Generator

    Authors: Xiuwen Zheng, Dheeraj Mekala, Amarnath Gupta, Jingbo Shang

    Abstract: Hashtag annotation for microblog posts has been recently formulated as a sequence generation problem to handle emerging hashtags that are unseen in the training set. The state-of-the-art method leverages conversations initiated by posts to enrich contextual information for the short posts. However, it is unrealistic to assume the existence of conversations before the hashtag annotation itself. The…

    Submitted 18 April, 2021; originally announced April 2021.

  16. arXiv:2010.12794  [pdf, other]

    cs.CL cs.IR cs.LG

    X-Class: Text Classification with Extremely Weak Supervision

    Authors: Zihan Wang, Dheeraj Mekala, Jingbo Shang

    Abstract: In this paper, we explore text classification with extremely weak supervision, i.e., only relying on the surface text of class names. This is a more challenging setting than the seed-driven weak supervision, which allows a few seed words per class. We opt to attack this problem from a representation learning perspective -- ideal document representations should lead to nearly the same results betwe…

    Submitted 7 February, 2022; v1 submitted 24 October, 2020; originally announced October 2020.

  17. arXiv:1802.06771  [pdf, other]

    cs.LG cs.AI

    Bayes-optimal Hierarchical Classification over Asymmetric Tree-Distance Loss

    Authors: Dheeraj Mekala, Vivek Gupta, Purushottam Kar, Harish Karnick

    Abstract: Hierarchical classification is a supervised multi-class classification problem over a set of class labels organized according to a hierarchy. In this report, we study the work by Ramaswamy et al. on hierarchical classification over symmetric tree distance loss. We extend the consistency of the hierarchical classification algorithm to asymmetric tree distance loss. We design a…

    Submitted 17 February, 2018; originally announced February 2018.

    Comments: CS 396 Undergraduate Project Report, 17 Pages, 3 Figures

  18. arXiv:1612.06821  [pdf, other]

    cs.CL

    User Bias Removal in Review Score Prediction

    Authors: Rahul Wadbude, Vivek Gupta, Dheeraj Mekala, Harish Karnick

    Abstract: Review score prediction of text reviews has recently gained a lot of attention in recommendation systems. A major problem in models for review score prediction is the presence of noise due to user-bias in review scores. We propose two simple statistical methods to remove such noise and improve review score prediction. Compared to other methods that use multiple classifiers, one for each user, our…

    Submitted 12 May, 2017; v1 submitted 20 December, 2016; originally announced December 2016.

    Comments: 6 Pages, 3 Figures, Under Review. Update: Added more baseline results

  19. arXiv:1612.06778  [pdf, other]

    cs.CL

    SCDV : Sparse Composite Document Vectors using soft clustering over distributional representations

    Authors: Dheeraj Mekala, Vivek Gupta, Bhargavi Paranjape, Harish Karnick

    Abstract: We present a feature vector formation technique for documents - Sparse Composite Document Vector (SCDV) - which overcomes several shortcomings of the current distributional paragraph vector representations that are widely used for text representation. In SCDV, word embeddings are clustered to capture multiple semantic contexts in which words occur. They are then chained together to form document…

    Submitted 12 May, 2017; v1 submitted 20 December, 2016; originally announced December 2016.

    Comments: 10 pages, 5 figures. Update: Added results on Information Retrieval and Topic Coherence with Discussion