Showing 1–50 of 299 results for author: Shah, S

  1. arXiv:2407.07370  [pdf, other]

    cs.CL

    LokiLM: Technical Report

    Authors: Justin Kiefel, Shrey Shah

    Abstract: In this work, we introduce LokiLM, a 1.4B parameter large language model trained on 500B tokens. Our model performs strongly in natural language reasoning tasks and achieves state-of-the-art performance among models with 1.5B parameters or less. LokiLM is trained using multi-teacher knowledge distillation and high-quality training data to achieve benchmark results competitive with larger models tr…

    Submitted 10 July, 2024; originally announced July 2024.

  2. arXiv:2407.05106  [pdf, other]

    cs.CV

    DailyDVS-200: A Comprehensive Benchmark Dataset for Event-Based Action Recognition

    Authors: Qi Wang, Zhou Xu, Yuming Lin, Jingtao Ye, Hongsheng Li, Guangming Zhu, Syed Afaq Ali Shah, Mohammed Bennamoun, Liang Zhang

    Abstract: Neuromorphic sensors, specifically event cameras, revolutionize visual data acquisition by capturing pixel intensity changes with exceptional dynamic range, minimal latency, and energy efficiency, setting them apart from conventional frame-based cameras. The distinctive capabilities of event cameras have ignited significant interest in the domain of event-based action recognition, recognizing thei…

    Submitted 13 July, 2024; v1 submitted 6 July, 2024; originally announced July 2024.

    Comments: Accepted to ECCV 2024

  3. arXiv:2407.01047  [pdf, other]

    cs.CL

    Development of Cognitive Intelligence in Pre-trained Language Models

    Authors: Raj Sanjay Shah, Khushi Bhardwaj, Sashank Varma

    Abstract: Recent studies show evidence for emergent cognitive abilities in Large Pre-trained Language Models (PLMs). The increasing cognitive alignment of these models has made them candidates for cognitive science theories. Prior research into the emergent cognitive abilities of PLMs has largely been path-independent to model training, i.e., has focused on the final model weights and not the intermediate s…

    Submitted 12 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

  4. arXiv:2406.16253  [pdf, other]

    cs.CL

    LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing

    Authors: Jiangshu Du, Yibo Wang, Wenting Zhao, Zhongfen Deng, Shuaiqi Liu, Renze Lou, Henry Peng Zou, Pranav Narayanan Venkit, Nan Zhang, Mukund Srinath, Haoran Ranran Zhang, Vipul Gupta, Yinghui Li, Tao Li, Fei Wang, Qin Liu, Tianlin Liu, Pengzhi Gao, Congying Xia, Chen Xing, Jiayang Cheng, Zhaowei Wang, Ying Su, Raj Sanjay Shah, Ruohao Guo, et al. (15 additional authors not shown)

    Abstract: This work is motivated by two key trends. On one hand, large language models (LLMs) have shown remarkable versatility in various generative tasks such as writing, drawing, and question answering, significantly reducing the time required for many routine tasks. On the other hand, researchers, whose work is not only time-consuming but also highly expertise-demanding, face increasing challenges as th…

    Submitted 25 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  5. arXiv:2406.14237  [pdf, other]

    cs.IT

    Finite Alphabet Fast List Decoders for Polar Codes

    Authors: Syed Aizaz Ali Shah, Gerhard Bauch

    Abstract: The so-called fast polar decoding schedules are meant to improve the decoding speed of the sequential-natured successive cancellation list decoders. The decoding speedup is achieved by replacing various parts of the serial decoding process with efficient special-purpose decoder nodes. This work incorporates the fast decoding schedules for polar codes into their quantized finite alphabet decoding.…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 6 pages, 7 figures, submitted to IEEE GLOBECOM 2024

  6. arXiv:2406.11106  [pdf, other]

    cs.CL cs.AI

    From Intentions to Techniques: A Comprehensive Taxonomy and Challenges in Text Watermarking for Large Language Models

    Authors: Harsh Nishant Lalai, Aashish Anantha Ramakrishnan, Raj Sanjay Shah, Dongwon Lee

    Abstract: With the rapid growth of Large Language Models (LLMs), safeguarding textual content against unauthorized use is crucial. Text watermarking offers a vital solution, protecting both - LLM-generated and plain text sources. This paper presents a unified overview of different perspectives behind designing watermarking techniques, through a comprehensive survey of the research literature. Our work has t…

    Submitted 16 June, 2024; originally announced June 2024.

  7. arXiv:2406.10723   

    cs.CV

    Eye in the Sky: Detection and Compliance Monitoring of Brick Kilns using Satellite Imagery

    Authors: Rishabh Mondal, Shataxi Dubey, Vannsh Jani, Shrimay Shah, Suraj Jaiswal, Zeel B Patel, Nipun Batra

    Abstract: Air pollution kills 7 million people annually. The brick manufacturing industry accounts for 8%-14% of air pollution in the densely populated Indo-Gangetic plain. Due to the unorganized nature of brick kilns, policy violation detection, such as proximity to human habitats, remains challenging. While previous studies have utilized computer vision-based machine learning methods for brick kiln detect…

    Submitted 23 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: The PI was not in favour of making the work public on arXiv as the content is not yet ready to be released

  8. arXiv:2406.10085  [pdf, other]

    cs.CL

    Enhancing Question Answering on Charts Through Effective Pre-training Tasks

    Authors: Ashim Gupta, Vivek Gupta, Shuo Zhang, Yujie He, Ning Zhang, Shalin Shah

    Abstract: To completely understand a document, the use of textual information is not enough. Understanding visual cues, such as layouts and charts, is also required. While the current state-of-the-art approaches for document understanding (both OCR-based and OCR-free) work well, a thorough analysis of their capabilities and limitations has not yet been performed. Therefore, in this work, we address the li…

    Submitted 14 June, 2024; originally announced June 2024.

  9. arXiv:2406.09409  [pdf, other]

    cs.CV eess.IV

    CodedEvents: Optimal Point-Spread-Function Engineering for 3D-Tracking with Event Cameras

    Authors: Sachin Shah, Matthew Albert Chan, Haoming Cai, Jingxi Chen, Sakshum Kulshrestha, Chahat Deep Singh, Yiannis Aloimonos, Christopher Metzler

    Abstract: Point-spread-function (PSF) engineering is a well-established computational imaging technique that uses phase masks and other optical elements to embed extra information (e.g., depth) into the images captured by conventional CMOS image sensors. To date, however, PSF-engineering has not been applied to neuromorphic event cameras; a powerful new image sensing technology that responds to changes in t…

    Submitted 13 June, 2024; originally announced June 2024.

  10. arXiv:2405.16128  [pdf, other]

    cs.AI cs.CL

    How Well Do Deep Learning Models Capture Human Concepts? The Case of the Typicality Effect

    Authors: Siddhartha K. Vemuri, Raj Sanjay Shah, Sashank Varma

    Abstract: How well do representations learned by ML models align with those of humans? Here, we consider concept representations learned by deep learning models and evaluate whether they show a fundamental behavioral signature of human concepts, the typicality effect. This is the finding that people judge some instances (e.g., robin) of a category (e.g., Bird) to be more typical than others (e.g., penguin).…

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: To appear at CogSci 2024

  11. arXiv:2405.16042  [pdf, other]

    cs.CL

    Incremental Comprehension of Garden-Path Sentences by Large Language Models: Semantic Interpretation, Syntactic Re-Analysis, and Attention

    Authors: Andrew Li, Xianle Feng, Siddhant Narang, Austin Peng, Tianle Cai, Raj Sanjay Shah, Sashank Varma

    Abstract: When reading temporarily ambiguous garden-path sentences, misinterpretations sometimes linger past the point of disambiguation. This phenomenon has traditionally been studied in psycholinguistic experiments using online measures such as reading times and offline measures such as comprehension questions. Here, we investigate the processing of garden-path sentences and the fate of lingering misinter…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: Accepted by CogSci-24

  12. arXiv:2404.14219  [pdf, other]

    cs.CL cs.AI

    Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

    Authors: Marah Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Jianmin Bao, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Qin Cai, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Dong Chen, Dongdong Chen, Yen-Chun Chen, Yi-Ling Chen, Parul Chopra, et al. (90 additional authors not shown)

    Abstract: We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset…

    Submitted 23 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: 19 pages

  13. arXiv:2404.13008  [pdf, other]

    cs.SD eess.AS

    Enhancing Generalization in Audio Deepfake Detection: A Neural Collapse based Sampling and Training Approach

    Authors: Mohammed Yousif, Jonat John Mathew, Huzaifa Pallan, Agamjeet Singh Padda, Syed Daniyal Shah, Sara Adamski, Madhu Reddiboina, Arjun Pankajakshan

    Abstract: Generalization in audio deepfake detection presents a significant challenge, with models trained on specific datasets often struggling to detect deepfakes generated under varying conditions and unknown algorithms. While collectively training a model using diverse datasets can enhance its generalization ability, it comes with high computational costs. To address this, we propose a neural collapse-b…

    Submitted 19 April, 2024; originally announced April 2024.

  14. arXiv:2404.04003  [pdf, other]

    cs.CL

    BuDDIE: A Business Document Dataset for Multi-task Information Extraction

    Authors: Ran Zmigrod, Dongsheng Wang, Mathieu Sibue, Yulong Pei, Petr Babkin, Ivan Brugere, Xiaomo Liu, Nacho Navarro, Antony Papadimitriou, William Watson, Zhiqiang Ma, Armineh Nourbakhsh, Sameena Shah

    Abstract: The field of visually rich document understanding (VRDU) aims to solve a multitude of well-researched NLP tasks in a multi-modal domain. Several datasets exist for research on specific tasks of VRDU such as document classification (DC), key entity extraction (KEE), entity linking, visual question answering (VQA), inter alia. These datasets cover documents like invoices and receipts with sparse ann…

    Submitted 5 April, 2024; originally announced April 2024.

  15. arXiv:2404.01591  [pdf, other]

    cs.CV

    Language Model Guided Interpretable Video Action Reasoning

    Authors: Ning Wang, Guangming Zhu, HS Li, Liang Zhang, Syed Afaq Ali Shah, Mohammed Bennamoun

    Abstract: While neural networks have excelled in video action recognition tasks, their black-box nature often obscures the understanding of their decision-making processes. Recent approaches used inherently interpretable models to analyze video actions in a manner akin to human reasoning. These models, however, usually fall short in performance compared to their black-box counterparts. In this work, we pres…

    Submitted 1 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  16. arXiv:2403.18152  [pdf, other]

    cs.CL

    Large Language Models as Financial Data Annotators: A Study on Effectiveness and Efficiency

    Authors: Toyin Aguda, Suchetha Siddagangappa, Elena Kochkina, Simerjot Kaur, Dongsheng Wang, Charese Smiley, Sameena Shah

    Abstract: Collecting labeled datasets in finance is challenging due to scarcity of domain experts and higher cost of employing them. While Large Language Models (LLMs) have demonstrated remarkable performance in data annotation tasks on general domain datasets, their effectiveness on domain specific datasets remains underexplored. To address this gap, we investigate the potential of LLMs as efficient data a…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted to LREC-COLING 2024

  17. arXiv:2403.15482  [pdf, other]

    cs.CL cs.HC cs.LG

    Multi-Level Feedback Generation with Large Language Models for Empowering Novice Peer Counselors

    Authors: Alicja Chaszczewicz, Raj Sanjay Shah, Ryan Louie, Bruce A Arnow, Robert Kraut, Diyi Yang

    Abstract: Realistic practice and tailored feedback are key processes for training peer counselors with clinical skills. However, existing mechanisms of providing feedback largely rely on human supervision. Peer counselors often lack mechanisms to receive detailed feedback from experienced mentors, making it difficult for them to support the large number of people with mental health issues who use peer couns…

    Submitted 21 March, 2024; originally announced March 2024.

  18. arXiv:2403.14645  [pdf]

    cs.CY cs.AI

    Designing Multi-Step Action Models for Enterprise AI Adoption

    Authors: Shreyash Mishra, Shrey Shah, Rex Pereira

    Abstract: This paper introduces the Multi-Step Action Model (MSAM), a closed-source AI model designed by Empsing to address challenges hindering AI adoption in enterprises. Through a holistic examination, this paper explores MSAM's foundational principles, design architecture, and future trajectory. It evaluates MSAM's performance via rigorous testing methodologies and envisions its potential impact on adva…

    Submitted 21 February, 2024; originally announced March 2024.

    Comments: 8 pages, 5 figures

    Report number: EMP-202401 MSC Class: 68T42 ACM Class: I.2.1; I.2.8

  19. arXiv:2403.11021  [pdf, other]

    cs.CV cs.AI

    Towards Neuro-Symbolic Video Understanding

    Authors: Minkyu Choi, Harsh Goel, Mohammad Omama, Yunhao Yang, Sahil Shah, Sandeep Chinchali

    Abstract: The unprecedented surge in video data production in recent years necessitates efficient tools to extract meaningful frames from videos for downstream tasks. Long-term temporal reasoning is a key desideratum for frame retrieval systems. While state-of-the-art foundation models, like VideoLLaMA and ViCLIP, are proficient in short-term semantic understanding, they surprisingly fail at long-term reaso…

    Submitted 15 July, 2024; v1 submitted 16 March, 2024; originally announced March 2024.

    Comments: Accepted by The European Conference on Computer Vision (ECCV) 2024

  20. Code Revert Prediction with Graph Neural Networks: A Case Study at J.P. Morgan Chase

    Authors: Yulong Pei, Salwa Alamir, Rares Dolga, Sameena Shah

    Abstract: Code revert prediction, a specialized form of software defect detection, aims to forecast or predict the likelihood of code changes being reverted or rolled back in software development. This task is very important in practice because by identifying code changes that are more prone to being reverted, developers and project managers can proactively take measures to prevent issues, improve code qual…

    Submitted 14 March, 2024; originally announced March 2024.

    Comments: SDD'23: the 1st International Workshop on Software Defect Datasets

  21. arXiv:2403.09260  [pdf, other]

    cs.SI physics.soc-ph

    Belief and Persuasion in Scientific Discourse on Social Media: A Study of the COVID-19 Pandemic

    Authors: Salwa Alamir, Armineh Nourbakhsh, Cecilia Tilli, Sameena Shah, Manuela Veloso

    Abstract: Research into COVID-19 has been rapidly evolving since the onset of the pandemic. This occasionally results in contradictory recommendations by credible sources of scientific opinion, public health authorities, and medical professionals. In this study, we examine whether this has resulted in a lack of trust in scientific opinion, by examining the belief patterns of social media users and their rea…

    Submitted 14 March, 2024; originally announced March 2024.

  22. Log Summarisation for Defect Evolution Analysis

    Authors: Rares Dolga, Ran Zmigrod, Rui Silva, Salwa Alamir, Sameena Shah

    Abstract: Log analysis and monitoring are essential aspects in software maintenance and identifying defects. In particular, the temporal nature and vast size of log data leads to an interesting and important research question: How can logs be summarised and monitored over time? While this has been a fundamental topic of research in the software engineering community, work has typically focused on heuristic-…

    Submitted 13 March, 2024; originally announced March 2024.

  23. arXiv:2403.05683  [pdf, other]

    cs.AI cs.LG

    Efficient Public Health Intervention Planning Using Decomposition-Based Decision-Focused Learning

    Authors: Sanket Shah, Arun Suggala, Milind Tambe, Aparna Taneja

    Abstract: The declining participation of beneficiaries over time is a key concern in public health programs. A popular strategy for improving retention is to have health workers `intervene' on beneficiaries at risk of dropping out. However, the availability and time of these health workers are limited resources. As a result, there has been a line of research on optimizing these limited intervention resource…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: 12 pages, 3 figures, 2 tables

  24. arXiv:2402.14454  [pdf, other]

    cs.CV

    CCPA: Long-term Person Re-Identification via Contrastive Clothing and Pose Augmentation

    Authors: Vuong D. Nguyen, Shishir K. Shah

    Abstract: Long-term Person Re-Identification (LRe-ID) aims at matching an individual across cameras after a long period of time, presenting variations in clothing, pose, and viewpoint. In this work, we propose CCPA: Contrastive Clothing and Pose Augmentation framework for LRe-ID. Beyond appearance, CCPA captures body shape information which is cloth-invariant using a Relation Graph Attention Network. Traini…

    Submitted 22 February, 2024; originally announced February 2024.

  25. arXiv:2402.12867  [pdf]

    cs.SE cs.LG

    Towards MLOps: A DevOps Tools Recommender System for Machine Learning System

    Authors: Pir Sami Ullah Shah, Naveed Ahmad, Mirza Omer Beg

    Abstract: Applying DevOps practices to machine learning system is termed as MLOps and machine learning systems evolve on new data unlike traditional systems on requirements. The objective of MLOps is to establish a connection between different open-source tools to construct a pipeline that can automatically perform steps to construct a dataset, train the machine learning model and deploy the model to the pr…

    Submitted 20 February, 2024; originally announced February 2024.

  26. arXiv:2402.11771  [pdf, other]

    cs.LG cs.AI stat.ME stat.ML

    Evaluating the Effectiveness of Index-Based Treatment Allocation

    Authors: Niclas Boehmer, Yash Nair, Sanket Shah, Lucas Janson, Aparna Taneja, Milind Tambe

    Abstract: When resources are scarce, an allocation policy is needed to decide who receives a resource. This problem occurs, for instance, when allocating scarce medical resources and is often solved using modern ML methods. This paper introduces methods to evaluate index-based allocation policies -- that allocate a fixed number of resources to those who need them the most -- by using data from a randomized…

    Submitted 18 February, 2024; originally announced February 2024.

  27. arXiv:2402.07258  [pdf, other]

    cs.CV

    Data Quality Aware Approaches for Addressing Model Drift of Semantic Segmentation Models

    Authors: Samiha Mirza, Vuong D. Nguyen, Pranav Mantini, Shishir K. Shah

    Abstract: In the midst of the rapid integration of artificial intelligence (AI) into real world applications, one pressing challenge we confront is the phenomenon of model drift, wherein the performance of AI models gradually degrades over time, compromising their effectiveness in real-world, dynamic environments. Once identified, we need techniques for handling this drift to preserve the model performance…

    Submitted 11 February, 2024; originally announced February 2024.

  28. arXiv:2402.05282  [pdf, other]

    cs.CL

    TreeForm: End-to-end Annotation and Evaluation for Form Document Parsing

    Authors: Ran Zmigrod, Zhiqiang Ma, Armineh Nourbakhsh, Sameena Shah

    Abstract: Visually Rich Form Understanding (VRFU) poses a complex research problem due to the documents' highly structured nature and yet highly variable style and content. Current annotation schemes decompose form understanding and omit key hierarchical structure, making development and evaluation of end-to-end models difficult. In this paper, we propose a novel F1 metric to evaluate form parsers and descr…

    Submitted 7 February, 2024; originally announced February 2024.

  29. arXiv:2402.03716  [pdf, other]

    cs.CV

    Attention-based Shape and Gait Representations Learning for Video-based Cloth-Changing Person Re-Identification

    Authors: Vuong D. Nguyen, Samiha Mirza, Pranav Mantini, Shishir K. Shah

    Abstract: Current state-of-the-art Video-based Person Re-Identification (Re-ID) primarily relies on appearance features extracted by deep learning models. These methods are not applicable for long-term analysis in real-world scenarios where persons have changed clothes, making appearance information unreliable. In this work, we deal with the practical problem of Video-based Cloth-Changing Person Re-ID (VCCR…

    Submitted 6 February, 2024; originally announced February 2024.

  30. arXiv:2402.00838  [pdf, other]

    cs.CL

    OLMo: Accelerating the Science of Language Models

    Authors: Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Kinney, Oyvind Tafjord, Ananya Harsh Jha, Hamish Ivison, Ian Magnusson, Yizhong Wang, Shane Arora, David Atkinson, Russell Authur, Khyathi Raghavi Chandu, Arman Cohan, Jennifer Dumas, Yanai Elazar, Yuling Gu, Jack Hessel, Tushar Khot, William Merrill, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, et al. (18 additional authors not shown)

    Abstract: Language models (LMs) have become ubiquitous in both NLP research and in commercial product offerings. As their commercial importance has surged, the most powerful models have become closed off, gated behind proprietary interfaces, with important details of their training data, architectures, and development undisclosed. Given the importance of these details in scientifically studying these models…

    Submitted 7 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  31. arXiv:2401.13218  [pdf, other]

    cs.CL

    ULTRA: Unleash LLMs' Potential for Event Argument Extraction through Hierarchical Modeling and Pair-wise Refinement

    Authors: Xinliang Frederick Zhang, Carter Blum, Temma Choji, Shalin Shah, Alakananda Vempala

    Abstract: Structural extraction of events within discourse is critical since it avails a deeper understanding of communication patterns and behavior trends. Event argument extraction (EAE), at the core of event-centric understanding, is the task of identifying role-specific text spans (i.e., arguments) for a given event. Document-level EAE (DocEAE) focuses on arguments that are scattered across an entire do…

    Submitted 23 January, 2024; originally announced January 2024.

  32. arXiv:2401.10393  [pdf, other]

    cs.LG cs.AI

    Catastrophic Interference is Mitigated in Naturalistic Power-Law Learning Environments

    Authors: Atith Gandhi, Raj Sanjay Shah, Vijay Marupudi, Sashank Varma

    Abstract: Neural networks often suffer from catastrophic interference (CI): performance on previously learned tasks drops off significantly when learning a new task. This contrasts strongly with humans, who can sequentially learn new tasks without appreciably forgetting previous tasks. Prior work has explored various techniques for mitigating CI such as regularization, rehearsal, generative replay, and dist…

    Submitted 22 January, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

  33. arXiv:2401.03742  [pdf, other]

    cs.CV

    Flowmind2Digital: The First Comprehensive Flowmind Recognition and Conversion Approach

    Authors: Huanyu Liu, Jianfeng Cai, Tingjia Zhang, Hongsheng Li, Siyuan Wang, Guangming Zhu, Syed Afaq Ali Shah, Mohammed Bennamoun, Liang Zhang

    Abstract: Flowcharts and mind maps, collectively known as flowmind, are vital in daily activities, with hand-drawn versions facilitating real-time collaboration. However, there's a growing need to digitize them for efficient processing. Automated conversion methods are essential to overcome manual conversion challenges. Existing sketch recognition methods face limitations in practical situations, being fiel…

    Submitted 8 January, 2024; originally announced January 2024.

  34. arXiv:2401.02823  [pdf, other]

    cs.CL cs.IR

    DocGraphLM: Documental Graph Language Model for Information Extraction

    Authors: Dongsheng Wang, Zhiqiang Ma, Armineh Nourbakhsh, Kang Gu, Sameena Shah

    Abstract: Advances in Visually Rich Document Understanding (VrDU) have enabled information extraction and question answering over documents with complex layouts. Two tropes of architectures have emerged -- transformer-based models inspired by LLMs, and Graph Neural Networks. In this paper, we introduce DocGraphLM, a novel framework that combines pre-trained language models with graph semantics. To achieve t…

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: Published at SIGIR'23 (repost for easier access)

  35. arXiv:2312.10775  [pdf, other]

    cs.HC

    What Makes Digital Support Effective? How Therapeutic Skills Affect Clinical Well-Being

    Authors: Anna Fang, Wenjie Yang, Raj Sanjay Shah, Yash Mathur, Diyi Yang, Haiyi Zhu, Robert Kraut

    Abstract: Online mental health support communities have grown in recent years for providing accessible mental and emotional health support through volunteer counselors. Despite millions of people participating in chat support on these platforms, the clinical effectiveness of these communities on mental health symptoms remains unknown. Furthermore, although volunteers receive some training based on establish…

    Submitted 17 December, 2023; originally announced December 2023.

  36. arXiv:2312.05003  [pdf, other]

    cs.IT

    On the Regret of Online Coded Caching

    Authors: Anupam Nayak, Sheel Shah, Nikhil Karamchandani

    Abstract: We consider the widely studied problem of coded caching under non-uniform requests where users independently request files according to some underlying popularity distribution in each slot. This work is a first step towards analyzing this framework through the lens of online learning. We consider the case where the underlying request distribution is apriori unknown and propose an online policy as…

    Submitted 8 December, 2023; originally announced December 2023.

  37. arXiv:2312.04073  [pdf, other]

    cs.GT cs.MA

    Information Design for Hybrid Work under Infectious Disease Transmission Risk

    Authors: Sohil Shah, Saurabh Amin, Patrick Jaillet

    Abstract: We study a planner's provision of information to manage workplace occupancy when strategic workers (agents) face risk of infectious disease transmission. The planner implements an information mechanism to signal information about the underlying risk of infection at the workplace. Agents update their belief over the risk parameter using this information and choose to work in-person or remotely. We…

    Submitted 7 December, 2023; originally announced December 2023.

  38. arXiv:2312.00634  [pdf]

    eess.IV cs.CV

    A Recent Survey of Vision Transformers for Medical Image Segmentation

    Authors: Asifullah Khan, Zunaira Rauf, Abdul Rehman Khan, Saima Rathore, Saddam Hussain Khan, Najmus Saher Shah, Umair Farooq, Hifsa Asif, Aqsa Asif, Umme Zahoora, Rafi Ullah Khalil, Suleman Qamar, Umme Hani Asif, Faiza Babar Khan, Abdul Majid, Jeonghwan Gwak

    Abstract: Medical image segmentation plays a crucial role in various healthcare applications, enabling accurate diagnosis, treatment planning, and disease monitoring. Traditionally, convolutional neural networks (CNNs) dominated this domain, excelling at local feature extraction. However, their limitations in capturing long-range dependencies across image regions pose challenges for segmenting complex, inte…

    Submitted 18 December, 2023; v1 submitted 1 December, 2023; originally announced December 2023.

  39. arXiv:2311.04666  [pdf, other]

    cs.CL cs.AI

    Pre-training LLMs using human-like development data corpus

    Authors: Khushi Bhardwaj, Raj Sanjay Shah, Sashank Varma

    Abstract: Pre-trained Large Language Models (LLMs) have shown success in a diverse set of language inference and understanding tasks. The pre-training stage of LLMs looks at a large corpus of raw textual data. The BabyLM shared task compares LLM pre-training to human language acquisition, where the number of tokens seen by 13-year-old kids is magnitudes smaller than the number of tokens seen by LLMs. In thi…

    Submitted 10 January, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: Proceedings of the BabyLM Challenge at the 27th Conference on Computational Natural Language Learning

  40. arXiv:2310.18930  [pdf, other]

    cs.CL

    Retrofitting Light-weight Language Models for Emotions using Supervised Contrastive Learning

    Authors: Sapan Shah, Sreedhar Reddy, Pushpak Bhattacharyya

    Abstract: We present a novel retrofitting method to induce emotion aspects into pre-trained language models (PLMs) such as BERT and RoBERTa. Our method updates pre-trained network weights using contrastive learning so that the text fragments exhibiting similar emotions are encoded nearby in the representation space, and the fragments with different emotion content are pushed apart. While doing so, it also e…

    Submitted 29 October, 2023; originally announced October 2023.

    Comments: EMNLP 2023 Camera Ready Version

  41. arXiv:2310.13263  [pdf, other]

    cs.CV

    UE4-NeRF:Neural Radiance Field for Real-Time Rendering of Large-Scale Scene

    Authors: Jiaming Gu, Minchao Jiang, Hongsheng Li, Xiaoyuan Lu, Guangming Zhu, Syed Afaq Ali Shah, Liang Zhang, Mohammed Bennamoun

    Abstract: Neural Radiance Fields (NeRF) is a novel implicit 3D reconstruction method that shows immense potential and has been gaining increasing attention. It enables the reconstruction of 3D scenes solely from a set of photographs. However, its real-time rendering capability, especially for interactive real-time rendering of large-scale scenes, still has significant limitations. To address these challenge…

    Submitted 20 October, 2023; originally announced October 2023.

    Comments: Accepted by NeurIPS2023

  42. arXiv:2310.08678  [pdf, other]

    cs.CL cs.AI q-fin.GN

    Can GPT models be Financial Analysts? An Evaluation of ChatGPT and GPT-4 on mock CFA Exams

    Authors: Ethan Callanan, Amarachi Mbakwe, Antony Papadimitriou, Yulong Pei, Mathieu Sibue, Xiaodan Zhu, Zhiqiang Ma, Xiaomo Liu, Sameena Shah

    Abstract: Large Language Models (LLMs) have demonstrated remarkable performance on a wide range of Natural Language Processing (NLP) tasks, often matching or even beating state-of-the-art task-specific models. This study aims at assessing the financial reasoning capabilities of LLMs. We leverage mock exam questions of the Chartered Financial Analyst (CFA) Program to conduct a comprehensive evaluation of Cha…

    Submitted 12 October, 2023; originally announced October 2023.

  43. arXiv:2310.07886  [pdf, other]

    cs.CV

    A Survey of Feature Types and Their Contributions for Camera Tampering Detection

    Authors: Pranav Mantini, Shishir K. Shah

    Abstract: Camera tamper detection is the ability to detect unauthorized and unintentional alterations in surveillance cameras by analyzing the video. Camera tampering can occur due to natural events or it can be caused intentionally to disrupt surveillance. We cast tampering detection as a change detection problem, and perform a review of the existing literature with emphasis on feature types. We formulate…

    Submitted 11 October, 2023; originally announced October 2023.

  44. arXiv:2310.02486  [pdf, other]

    eess.IV cs.CV cs.LG

    OCU-Net: A Novel U-Net Architecture for Enhanced Oral Cancer Segmentation

    Authors: Ahmed Albishri, Syed Jawad Hussain Shah, Yugyung Lee, Rong Wang

    Abstract: Accurate detection of oral cancer is crucial for improving patient outcomes. However, the field faces two key challenges: the scarcity of deep learning-based image segmentation research specifically targeting oral cancer and the lack of annotated data. Our study proposes OCU-Net, a pioneering U-Net image segmentation architecture exclusively designed to detect oral cancer in hematoxylin and eosin…

    Submitted 3 October, 2023; originally announced October 2023.

  45. arXiv:2310.00010  [pdf]

    cs.RO cs.AI cs.LG

    Artificial Empathy Classification: A Survey of Deep Learning Techniques, Datasets, and Evaluation Scales

    Authors: Sharjeel Tahir, Syed Afaq Shah, Jumana Abu-Khalaf

    Abstract: From the last decade, researchers in the field of machine learning (ML) and assistive developmental robotics (ADR) have taken an interest in artificial empathy (AE) as a possible future paradigm for human-robot interaction (HRI). Humans learn empathy since birth, therefore, it is challenging to instill this sense in robots and intelligent machines. Nevertheless, by training over a vast amount of d…

    Submitted 4 September, 2023; originally announced October 2023.

    MSC Class: 68T40

  46. arXiv:2309.06550  [pdf, other]

    cs.CL cs.AI

    Synthetic Text Generation using Hypergraph Representations

    Authors: Natraj Raman, Sameena Shah

    Abstract: Generating synthetic variants of a document is often posed as text-to-text transformation. We propose an alternate LLM based method that first decomposes a document into semantic frames and then generates text using this interim sparse format. The frames are modeled using a hypergraph, which allows perturbing the frame contents in a principled manner. Specifically, new hyperedges are mined through…

    Submitted 2 December, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

  47. arXiv:2309.06195  [pdf, other]

    cs.LG eess.SP

    Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth Soft-Thresholding

    Authors: Shaik Basheeruddin Shah, Pradyumna Pradhan, Wei Pu, Ramunaidu Randhi, Miguel R. D. Rodrigues, Yonina C. Eldar

    Abstract: Solving linear inverse problems plays a crucial role in numerous applications. Algorithm unfolding based, model-aware data-driven approaches have gained significant attention for effectively addressing these problems. Learned iterative soft-thresholding algorithm (LISTA) and alternating direction method of multipliers compressive sensing network (ADMM-CSNet) are two widely used such approaches, ba…

    Submitted 12 September, 2023; originally announced September 2023.

  48. arXiv:2309.03231  [pdf]

    quant-ph cs.AI cs.LG

    Quantum-AI empowered Intelligent Surveillance: Advancing Public Safety Through Innovative Contraband Detection

    Authors: Syed Atif Ali Shah, Nasir Algeelani, Najeeb Al-Sammarraie

    Abstract: Surveillance systems have emerged as crucial elements in upholding peace and security in the modern world. Their ubiquity aids in monitoring suspicious activities effectively. However, in densely populated environments, continuous active monitoring becomes impractical, necessitating the development of intelligent surveillance systems. AI integration in the surveillance domain was a big revolution,…

    Submitted 5 September, 2023; originally announced September 2023.

  49. arXiv:2308.14089  [pdf, other]

    cs.CL cs.AI cs.LG

    MedAlign: A Clinician-Generated Dataset for Instruction Following with Electronic Medical Records

    Authors: Scott L. Fleming, Alejandro Lozano, William J. Haberkorn, Jenelle A. Jindal, Eduardo P. Reis, Rahul Thapa, Louis Blankemeier, Julian Z. Genkins, Ethan Steinberg, Ashwin Nayak, Birju S. Patel, Chia-Chun Chiang, Alison Callahan, Zepeng Huo, Sergios Gatidis, Scott J. Adams, Oluseyi Fayanju, Shreya J. Shah, Thomas Savage, Ethan Goh, Akshay S. Chaudhari, Nima Aghaeepour, Christopher Sharp, Michael A. Pfeffer, Percy Liang, et al. (5 additional authors not shown)

    Abstract: The ability of large language models (LLMs) to follow natural language instructions with human-level fluency suggests many opportunities in healthcare to reduce administrative burden and improve quality of care. However, evaluating LLMs on realistic text generation tasks for healthcare remains challenging. Existing question answering datasets for electronic health record (EHR) data fail to capture…

    Submitted 24 December, 2023; v1 submitted 27 August, 2023; originally announced August 2023.

  50. Software Startups -- A Research Agenda

    Authors: Michael Unterkalmsteiner, Pekka Abrahamsson, Xiaofeng Wang, Anh Nguyen-Duc, Syed M. Ali Shah, Sohaib Shahid Bajwa, Guido H. Baltes, Kieran Conboy, Eoin Cullina, Denis Dennehy, Henry Edison, Carlos Fernández-Sánchez, Juan Garbajosa, Tony Gorschek, Eriks Klotins, Laura Hokkanen, Fabio Kon, Ilaria Lunesu, Michele Marchesi, Lorraine Morgan, Markku Oivo, Christoph Selig, Pertti Seppänen, Roger Sweetman, Pasi Tyrväinen, et al. (2 additional authors not shown)

    Abstract: Software startup companies develop innovative, software-intensive products within limited time frames and with few resources, searching for sustainable and scalable business models. Software startups are quite distinct from traditional mature software companies, but also from micro-, small-, and medium-sized enterprises, introducing new challenges relevant for software engineering research. This p…

    Submitted 24 August, 2023; originally announced August 2023.

    Journal ref: e-Informatica Softw. Eng. J. 10(1): 89-124 (2016)