Showing 1–19 of 19 results for author: Zi, Y

  1. arXiv:2406.01983  [pdf, other]

    cs.CL

    RKLD: Reverse KL-Divergence-based Knowledge Distillation for Unlearning Personal Information in Large Language Models

    Authors: Bichen Wang, Yuzhe Zi, Yixin Sun, Yanyan Zhao, Bing Qin

    Abstract: With the passage of the Right to Be Forgotten (RTBF) regulations and the scaling up of language model training datasets, research on model unlearning in large language models (LLMs) has become more crucial. Before the era of LLMs, machine unlearning research focused mainly on classification tasks in models with small parameters. In these tasks, the content to be forgotten or retained is clear and…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Work is in progress

  2. arXiv:2406.01605  [pdf, other]

    eess.IV cs.CV

    An Enhanced Encoder-Decoder Network Architecture for Reducing Information Loss in Image Semantic Segmentation

    Authors: Zijun Gao, Qi Wang, Taiyuan Mei, Xiaohan Cheng, Yun Zi, Haowei Yang

    Abstract: The traditional SegNet architecture commonly encounters significant information loss during the sampling process, which detrimentally affects its accuracy in image semantic segmentation tasks. To counter this challenge, we introduce an innovative encoder-decoder network structure enhanced with residual connections. Our approach employs a multi-residual connection strategy designed to preserve the…

    Submitted 26 May, 2024; originally announced June 2024.

  3. arXiv:2405.11704  [pdf]

    cs.LG cs.AI

    Efficiency optimization of large-scale language models based on deep learning in natural language processing tasks

    Authors: Taiyuan Mei, Yun Zi, Xiaohan Cheng, Zijun Gao, Qi Wang, Haowei Yang

    Abstract: The internal structure and operation mechanism of large-scale language models are analyzed theoretically, especially how Transformer and its derivative architectures can restrict computing efficiency while capturing long-term dependencies. Further, we dig deep into the efficiency bottleneck of the training phase, and evaluate in detail the contribution of adaptive optimization algorithms (such as…

    Submitted 19 May, 2024; originally announced May 2024.

  4. arXiv:2401.15232  [pdf, other]

    cs.HC

    How Beginning Programmers and Code LLMs (Mis)read Each Other

    Authors: Sydney Nguyen, Hannah McLean Babe, Yangtian Zi, Arjun Guha, Carolyn Jane Anderson, Molly Q Feldman

    Abstract: Generative AI models, specifically large language models (LLMs), have made strides towards the long-standing goal of text-to-code generation. This progress has invited numerous studies of user interaction. However, less is known about the struggles and strategies of non-experts, for whom each step of the text-to-code problem presents challenges: describing their intent in natural language, evaluat…

    Submitted 26 January, 2024; originally announced January 2024.

    Comments: Conditionally Accepted to CHI 2024

  5. arXiv:2312.09932  [pdf, other]

    cs.CL cs.AI

    RDR: the Recap, Deliberate, and Respond Method for Enhanced Language Understanding

    Authors: Yuxin Zi, Hariram Veeramani, Kaushik Roy, Amit Sheth

    Abstract: Natural language understanding (NLU) using neural network pipelines often requires additional context that is not solely present in the input data. Through prior research, it has been evident that NLU benchmarks are susceptible to manipulation by neural models, wherein these models exploit statistical artifacts within the encoded external knowledge to artificially inflate performance metrics for d…

    Submitted 5 March, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

  6. arXiv:2306.13865  [pdf, other]

    cs.CL

    IERL: Interpretable Ensemble Representation Learning -- Combining CrowdSourced Knowledge and Distributed Semantic Representations

    Authors: Yuxin Zi, Kaushik Roy, Vignesh Narayanan, Manas Gaur, Amit Sheth

    Abstract: Large Language Models (LLMs) encode meanings of words in the form of distributed semantics. Distributed semantics capture common statistical patterns among language tokens (words, phrases, and sentences) from large amounts of data. LLMs perform exceedingly well across General Language Understanding Evaluation (GLUE) tasks designed to test a model's understanding of the meanings of the input tokens…

    Submitted 24 June, 2023; originally announced June 2023.

    Comments: Accepted for publication at the KDD workshop on Knowledge-infused Machine Learning, 2023

  7. arXiv:2306.13501  [pdf, other]

    cs.CL

    Knowledge-Infused Self Attention Transformers

    Authors: Kaushik Roy, Yuxin Zi, Vignesh Narayanan, Manas Gaur, Amit Sheth

    Abstract: Transformer-based language models have achieved impressive success in various natural language processing tasks due to their ability to capture complex dependencies and contextual information using self-attention mechanisms. However, they are not without limitations. These limitations include hallucinations, where they produce incorrect outputs with high confidence, and alignment issues, where the…

    Submitted 23 June, 2023; originally announced June 2023.

    Comments: Accepted for publication at the Second Workshop on Knowledge Augmented Methods for NLP, colocated with KDD 2023

  8. arXiv:2306.09824  [pdf, other]

    cs.CL cs.AI

    Process Knowledge-infused Learning for Clinician-friendly Explanations

    Authors: Kaushik Roy, Yuxin Zi, Manas Gaur, Jinendra Malekar, Qi Zhang, Vignesh Narayanan, Amit Sheth

    Abstract: Language models have the potential to assess mental health using social media data. By analyzing online posts and conversations, these models can detect patterns indicating mental health conditions like depression, anxiety, or suicidal thoughts. They examine keywords, language markers, and sentiment to gain insights into an individual's mental well-being. This information is crucial for early dete…

    Submitted 16 June, 2023; originally announced June 2023.

    Comments: Accepted for Publication at AAAI Second Symposium on Human Partnership with Medical Artificial Intelligence (HUMAN.AI Summer 2023): Design, Operationalization, and Ethics. July 17-19, 2023

  9. arXiv:2306.04556  [pdf, other]

    cs.LG cs.HC cs.SE

    StudentEval: A Benchmark of Student-Written Prompts for Large Language Models of Code

    Authors: Hannah McLean Babe, Sydney Nguyen, Yangtian Zi, Arjun Guha, Molly Q Feldman, Carolyn Jane Anderson

    Abstract: Code LLMs are being rapidly deployed and there is evidence that they can make professional programmers more productive. Current benchmarks for code generation measure whether models generate correct programs given an expert prompt. In this paper, we present a new benchmark containing multiple prompts per problem, written by a specific population of non-expert prompters: beginning programmers. Stud…

    Submitted 7 June, 2023; originally announced June 2023.

  10. arXiv:2306.01805  [pdf, other]

    cs.CL cs.AI cs.IR

    Cook-Gen: Robust Generative Modeling of Cooking Actions from Recipes

    Authors: Revathy Venkataramanan, Kaushik Roy, Kanak Raj, Renjith Prasad, Yuxin Zi, Vignesh Narayanan, Amit Sheth

    Abstract: As people become more aware of their food choices, food computation models have become increasingly popular in assisting people in maintaining healthy eating habits. For example, food recommendation systems analyze recipe instructions to assess nutritional contents and provide recipe recommendations. The recent and remarkable successes of generative AI methods, such as auto-regressive large langua…

    Submitted 1 June, 2023; originally announced June 2023.

  11. arXiv:2305.06161  [pdf, other]

    cs.CL cs.AI cs.PL cs.SE

    StarCoder: may the source be with you!

    Authors: Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, Terry Yue Zhuo, Thomas Wang, Olivier Dehaene, Mishig Davaadorj, Joel Lamy-Poirier, João Monteiro, Oleh Shliazhko, Nicolas Gontier, Nicholas Meade, Armel Zebaze, Ming-Ho Yee, Logesh Kumar Umapathi, Jian Zhu, et al. (42 additional authors not shown)

    Abstract: The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention. StarCoderBase is trained on 1 trillion tokens sourced from The Stack, a large colle…

    Submitted 13 December, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

  12. arXiv:2305.04989  [pdf, other]

    cs.CL cs.AI

    Knowledge Graph Guided Semantic Evaluation of Language Models For User Trust

    Authors: Kaushik Roy, Tarun Garg, Vedant Palit, Yuxin Zi, Vignesh Narayanan, Amit Sheth

    Abstract: A fundamental question in natural language processing is - what kind of language structure and semantics is the language model capturing? Graph formats such as knowledge graphs are easy to evaluate as they explicitly express language semantics and structure. This study evaluates the semantics encoded in the self-attention transformers by leveraging explicit knowledge graph structures. We propose n…

    Submitted 8 May, 2023; originally announced May 2023.

  13. arXiv:2301.03988  [pdf, other]

    cs.SE cs.AI cs.LG

    SantaCoder: don't reach for the stars!

    Authors: Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, Mayank Mishra, Alex Gu, Manan Dey, Logesh Kumar Umapathi, Carolyn Jane Anderson, Yangtian Zi, Joel Lamy Poirier, Hailey Schoelkopf, Sergey Troshin, Dmitry Abulkhanov, Manuel Romero, Michael Lappert, Francesco De Toni, Bernardo García del Río, Qian Liu, Shamik Bose, Urvashi Bhattacharyya, Terry Yue Zhuo, et al. (16 additional authors not shown)

    Abstract: The BigCode project is an open-scientific collaboration working on the responsible development of large language models for code. This tech report describes the progress of the collaboration until December 2022, outlining the current state of the Personally Identifiable Information (PII) redaction pipeline, the experiments conducted to de-risk the model architecture, and the experiments investigat…

    Submitted 24 February, 2023; v1 submitted 9 January, 2023; originally announced January 2023.

  14. arXiv:2210.04307  [pdf, other]

    cs.CL cs.AI

    KSAT: Knowledge-infused Self Attention Transformer -- Integrating Multiple Domain-Specific Contexts

    Authors: Kaushik Roy, Yuxin Zi, Vignesh Narayanan, Manas Gaur, Amit Sheth

    Abstract: Domain-specific language understanding requires integrating multiple pieces of relevant contextual information. For example, we see both suicide and depression-related behavior (multiple contexts) in the text "I have a gun and feel pretty bad about my life, and it wouldn't be the worst thing if I didn't wake up tomorrow". Domain specificity in self-attention architectures is handled by fine-tuni…

    Submitted 24 June, 2023; v1 submitted 9 October, 2022; originally announced October 2022.

    Comments: Preprint version of paper accepted for publication at KDD workshop on Knowledge Augmented Methods for NLP, 2023

  15. arXiv:2208.08227  [pdf, other]

    cs.LG cs.PL

    MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation

    Authors: Federico Cassano, John Gouwar, Daniel Nguyen, Sydney Nguyen, Luna Phipps-Costin, Donald Pinckney, Ming-Ho Yee, Yangtian Zi, Carolyn Jane Anderson, Molly Q Feldman, Arjun Guha, Michael Greenberg, Abhinav Jangda

    Abstract: Large language models have demonstrated the ability to generate both natural language and programming language text. Such models open up the possibility of multi-language code generation: could code generation models generalize knowledge from one language to another? Although contemporary code generation models can generate semantically correct Python code, little is known about their abilities wi…

    Submitted 19 December, 2022; v1 submitted 17 August, 2022; originally announced August 2022.

  16. arXiv:2107.09351  [pdf, ps, other]

    cs.DB cs.PF

    IoTDataBench: Extending TPCx-IoT for Compression and Scalability

    Authors: Yuqing Zhu, Yanzhe An, Yuan Zi, Yu Feng, Jianmin Wang

    Abstract: We present a record-breaking result and lessons learned in practicing TPCx-IoT benchmarking for a real-world use case. We find that more system characteristics need to be benchmarked for its application to real-world use cases. We introduce an extension to the TPCx-IoT benchmark, covering fundamental requirements of time-series data management for IoT infrastructure. We characterize them as data c…

    Submitted 28 December, 2021; v1 submitted 20 July, 2021; originally announced July 2021.

    Comments: 16 pages, 7 figures, accepted by Thirteenth TPC Technology Conference on Performance Evaluation & Benchmarking

  17. Repetitive Transients Extraction Algorithm for Detecting Bearing Faults

    Authors: Wangpeng He, Yin Ding, Yanyang Zi, Ivan W. Selesnick

    Abstract: This paper addresses the problem of noise reduction with simultaneous components extraction in vibration signals for faults diagnosis of bearing. The observed vibration signal is modeled as a summation of two components contaminated by noise, and each component composes of repetitive transients. To extract the two components simultaneously, an approach by solving an optimization problem is propo…

    Submitted 6 August, 2016; v1 submitted 11 January, 2016; originally announced January 2016.

  18. Detection of Faults in Rotating Machinery Using Periodic Time-Frequency Sparsity

    Authors: Yin Ding, Wangpeng He, Binqiang Chen, Yanyang Zi, Ivan W. Selesnick

    Abstract: This paper addresses the problem of extracting periodic oscillatory features in vibration signals for detecting faults in rotating machinery. To extract the feature, we propose an approach in the short-time Fourier transform (STFT) domain where the periodic oscillatory feature manifests itself as a relatively sparse grid. To estimate the sparse grid, we formulate an optimization problem using…

    Submitted 30 July, 2016; v1 submitted 2 November, 2015; originally announced November 2015.

  19. Sparsity-based Algorithm for Detecting Faults in Rotating Machines

    Authors: Wangpeng He, Yin Ding, Yanyang Zi, Ivan W. Selesnick

    Abstract: This paper addresses the detection of periodic transients in vibration signals for detecting faults in rotating machines. For this purpose, we present a method to estimate periodic-group-sparse signals in noise. The method is based on the formulation of a convex optimization problem. A fast iterative algorithm is given for its solution. A simulated signal is formulated to verify the performance of…

    Submitted 30 October, 2015; originally announced November 2015.