
Showing 1–28 of 28 results for author: Phan, L

  1. arXiv:2406.04313  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV cs.CY

    Improving Alignment and Robustness with Circuit Breakers

    Authors: Andy Zou, Long Phan, Justin Wang, Derek Duenas, Maxwell Lin, Maksym Andriushchenko, Rowan Wang, Zico Kolter, Matt Fredrikson, Dan Hendrycks

    Abstract: AI systems can take harmful actions and are highly vulnerable to adversarial attacks. We present an approach, inspired by recent advances in representation engineering, that interrupts the models as they respond with harmful outputs with "circuit breakers." Existing techniques aimed at improving alignment, such as refusal training, are often bypassed. Techniques such as adversarial training try to…

    Submitted 12 July, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: Code and models are available at https://github.com/GraySwanAI/circuit-breakers

  2. arXiv:2405.12463  [pdf, other]

    math.OC cs.AI cs.LG eess.SY stat.ML

    Stochastic Learning of Computational Resource Usage as Graph Structured Multimarginal Schrödinger Bridge

    Authors: Georgiy A. Bondar, Robert Gifford, Linh Thi Xuan Phan, Abhishek Halder

    Abstract: We propose to learn the time-varying stochastic computational resource usage of software as a graph structured Schrödinger bridge problem. In general, learning the computational resource usage from data is challenging because resources such as the number of CPU instructions and the number of last level cache requests are both time-varying and statistically correlated. Our proposed method enables l…

    Submitted 20 May, 2024; originally announced May 2024.

  3. arXiv:2403.03218  [pdf, other]

    cs.LG cs.AI cs.CL cs.CY

    The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning

    Authors: Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin D. Li, Ann-Kathrin Dombrowski, Shashwat Goel, Long Phan, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew B. Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Adam Khoja, Zhenqi Zhao, Ariel Herbert-Voss, Cort B. Breuer, et al. (32 additional authors not shown)

    Abstract: The White House Executive Order on Artificial Intelligence highlights the risks of large language models (LLMs) empowering malicious actors in developing biological, cyber, and chemical weapons. To measure these risks of malicious use, government institutions and major AI labs are developing evaluations for hazardous capabilities in LLMs. However, current evaluations are private, preventing furthe…

    Submitted 15 May, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

    Comments: See the project page at https://wmdp.ai

  4. arXiv:2402.14874  [pdf, other]

    cs.CL cs.AI cs.LG

    Distillation Contrastive Decoding: Improving LLMs Reasoning with Contrastive Decoding and Distillation

    Authors: Phuc Phan, Hieu Tran, Long Phan

    Abstract: We propose a straightforward approach called Distillation Contrastive Decoding (DCD) to enhance the reasoning capabilities of Large Language Models (LLMs) during inference. In contrast to previous approaches that relied on smaller amateur models or analysis of hidden state differences, DCD employs Contrastive Chain-of-thought Prompting and advanced distillation techniques, including Dropout and Qu…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: Under Review

  5. arXiv:2402.04249  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal

    Authors: Mantas Mazeika, Long Phan, Xuwang Yin, Andy Zou, Zifan Wang, Norman Mu, Elham Sakhaee, Nathaniel Li, Steven Basart, Bo Li, David Forsyth, Dan Hendrycks

    Abstract: Automated red teaming holds substantial promise for uncovering and mitigating the risks associated with the malicious use of large language models (LLMs), yet the field lacks a standardized evaluation framework to rigorously assess new methods. To address this issue, we introduce HarmBench, a standardized evaluation framework for automated red teaming. We identify several desirable properties prev…

    Submitted 26 February, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: Website: https://www.harmbench.org

  6. arXiv:2310.05350  [pdf]

    cs.DC cs.LG

    Scaling Studies for Efficient Parameter Search and Parallelism for Large Language Model Pre-training

    Authors: Michael Benington, Leo Phan, Chris Pierre Paul, Evan Shoemaker, Priyanka Ranade, Torstein Collett, Grant Hodgson Perez, Christopher Krieger

    Abstract: AI accelerator processing capabilities and memory constraints largely dictate the scale at which machine learning workloads (e.g., training and inference) can be executed within a desirable time frame. Training a state-of-the-art, transformer-based model today requires the use of GPU-accelerated high-performance computers with high-speed interconnects. As datasets and models continue to increase in si…

    Submitted 10 October, 2023; v1 submitted 8 October, 2023; originally announced October 2023.

    Journal ref: Supercomputing 2023 (SC23) Student Research Poster Track

  7. arXiv:2310.01405  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV cs.CY

    Representation Engineering: A Top-Down Approach to AI Transparency

    Authors: Andy Zou, Long Phan, Sarah Chen, James Campbell, Phillip Guo, Richard Ren, Alexander Pan, Xuwang Yin, Mantas Mazeika, Ann-Kathrin Dombrowski, Shashwat Goel, Nathaniel Li, Michael J. Byun, Zifan Wang, Alex Mallen, Steven Basart, Sanmi Koyejo, Dawn Song, Matt Fredrikson, J. Zico Kolter, Dan Hendrycks

    Abstract: In this paper, we identify and characterize the emerging area of representation engineering (RepE), an approach to enhancing the transparency of AI systems that draws on insights from cognitive neuroscience. RepE places population-level representations, rather than neurons or circuits, at the center of analysis, equipping us with novel methods for monitoring and manipulating high-level cognitive p…

    Submitted 10 October, 2023; v1 submitted 2 October, 2023; originally announced October 2023.

    Comments: Code is available at https://github.com/andyzoujm/representation-engineering

  8. arXiv:2310.00604  [pdf, other]

    eess.SY cs.LG math.OC stat.ML

    Path Structured Multimarginal Schrödinger Bridge for Probabilistic Learning of Hardware Resource Usage by Control Software

    Authors: Georgiy A. Bondar, Robert Gifford, Linh Thi Xuan Phan, Abhishek Halder

    Abstract: The solution of the path structured multimarginal Schrödinger bridge problem (MSBP) is the most-likely measure-valued trajectory consistent with a sequence of observed probability measures or distributional snapshots. We leverage recent algorithmic advances in solving such structured MSBPs for learning stochastic hardware resource usage by control software. The solution enables predicting the time…

    Submitted 3 October, 2023; v1 submitted 1 October, 2023; originally announced October 2023.

    Comments: 8 pages, 6 figures. Submitted to American Control Conference (ACC) 2024

  9. arXiv:2303.13592  [pdf, other]

    cs.CL cs.AI

    Prompting Multilingual Large Language Models to Generate Code-Mixed Texts: The Case of South East Asian Languages

    Authors: Zheng-Xin Yong, Ruochen Zhang, Jessica Zosa Forde, Skyler Wang, Arjun Subramonian, Holy Lovenia, Samuel Cahyawijaya, Genta Indra Winata, Lintang Sutawika, Jan Christian Blaise Cruz, Yin Lin Tan, Long Phan, Rowena Garcia, Thamar Solorio, Alham Fikri Aji

    Abstract: While code-mixing is a common linguistic practice in many parts of the world, collecting high-quality and low-cost code-mixed data remains a challenge for natural language processing (NLP) research. The recent proliferation of Large Language Models (LLMs) compels one to ask: how capable are these systems in generating code-mixed data? In this paper, we explore prompting multilingual LLMs in a zero…

    Submitted 12 September, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

    Comments: Updating Authors

  10. arXiv:2303.03915  [pdf, other]

    cs.CL cs.AI

    The BigScience ROOTS Corpus: A 1.6TB Composite Multilingual Dataset

    Authors: Hugo Laurençon, Lucile Saulnier, Thomas Wang, Christopher Akiki, Albert Villanova del Moral, Teven Le Scao, Leandro Von Werra, Chenghao Mou, Eduardo González Ponferrada, Huu Nguyen, Jörg Frohberg, Mario Šaško, Quentin Lhoest, Angelina McMillan-Major, Gerard Dupont, Stella Biderman, Anna Rogers, Loubna Ben allal, Francesco De Toni, Giada Pistilli, Olivier Nguyen, Somaieh Nikpoor, Maraim Masoud, Pierre Colombo, Javier de la Rosa, et al. (29 additional authors not shown)

    Abstract: As language models grow ever larger, the need for large-scale high-quality text datasets has never been more pressing, especially in multilingual settings. The BigScience workshop, a 1-year international and multidisciplinary initiative, was formed with the goal of researching and training large language models as a values-driven undertaking, putting issues of ethics, harm, and governance in the f…

    Submitted 7 March, 2023; originally announced March 2023.

    Comments: NeurIPS 2022, Datasets and Benchmarks Track

    ACM Class: I.2.7

  11. arXiv:2211.05100  [pdf, other]

    cs.CL

    BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

    Authors: BigScience Workshop: Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé, Jonathan Tow, Alexander M. Rush, Stella Biderman, Albert Webson, Pawan Sasanka Ammanamanchi, Thomas Wang, Benoît Sagot, Niklas Muennighoff, Albert Villanova del Moral, Olatunji Ruwase, Rachel Bawden, Stas Bekman, Angelina McMillan-Major, et al. (369 additional authors not shown)

    Abstract: Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access…

    Submitted 27 June, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

  12. arXiv:2210.05610  [pdf, other]

    cs.CL cs.AI

    MTet: Multi-domain Translation for English and Vietnamese

    Authors: Chinh Ngo, Trieu H. Trinh, Long Phan, Hieu Tran, Tai Dang, Hieu Nguyen, Minh Nguyen, Minh-Thang Luong

    Abstract: We introduce MTet, the largest publicly available parallel corpus for English-Vietnamese translation. MTet consists of 4.2M high-quality training sentence pairs and a multi-domain test set refined by the Vietnamese research community. Combining with previous works on English-Vietnamese translation, we grow the existing parallel dataset to 6.2M sentence pairs. We also release the first pretrained m…

    Submitted 19 October, 2022; v1 submitted 11 October, 2022; originally announced October 2022.

  13. arXiv:2210.05598  [pdf, other]

    cs.CL cs.AI

    Enriching Biomedical Knowledge for Low-resource Language Through Large-Scale Translation

    Authors: Long Phan, Tai Dang, Hieu Tran, Trieu H. Trinh, Vy Phan, Lam D. Chau, Minh-Thang Luong

    Abstract: Biomedical data and benchmarks are highly valuable yet very limited in low-resource languages other than English such as Vietnamese. In this paper, we make use of a state-of-the-art translation model in English-Vietnamese to translate and produce both pretrained as well as supervised data in the biomedical domains. Thanks to such large-scale translation, we introduce ViPubmedT5, a pretrained Encod…

    Submitted 29 January, 2023; v1 submitted 11 October, 2022; originally announced October 2022.

  14. arXiv:2207.08391  [pdf, other]

    cs.LG cs.DC

    Federated Learning for Non-IID Data via Client Variance Reduction and Adaptive Server Update

    Authors: Hiep Nguyen, Lam Phan, Harikrishna Warrier, Yogesh Gupta

    Abstract: Federated learning (FL) is an emerging technique used to collaboratively train a global machine learning model while keeping the data localized on the user devices. The main obstacle to FL's practical implementation is the Non-Independent and Identical (Non-IID) data distribution across users, which slows convergence and degrades performance. To tackle this fundamental issue, we propose a method (…

    Submitted 29 July, 2022; v1 submitted 18 July, 2022; originally announced July 2022.

  15. arXiv:2207.05422  [pdf, other]

    cs.CV

    Improving Domain Generalization by Learning without Forgetting: Application in Retail Checkout

    Authors: Thuy C. Nguyen, Nam LH. Phan, Son T. Nguyen

    Abstract: Designing an automatic checkout system for retail stores with human-level accuracy is challenging because products look similar and appear in various poses. This paper addresses the problem by proposing a method with a two-stage pipeline. The first stage detects class-agnostic items, and the second classifies product categories. We also track the objects across video frames to…

    Submitted 12 July, 2022; originally announced July 2022.

  16. arXiv:2205.06457  [pdf, ps, other]

    cs.CL cs.AI

    ViT5: Pretrained Text-to-Text Transformer for Vietnamese Language Generation

    Authors: Long Phan, Hieu Tran, Hieu Nguyen, Trieu H. Trinh

    Abstract: We present ViT5, a pretrained Transformer-based encoder-decoder model for the Vietnamese language. With T5-style self-supervised pretraining, ViT5 is trained on a large corpus of high-quality and diverse Vietnamese texts. We benchmark ViT5 on two downstream text generation tasks, Abstractive Text Summarization and Named Entity Recognition. Although Abstractive Text Summarization has been widely st…

    Submitted 26 May, 2022; v1 submitted 13 May, 2022; originally announced May 2022.

    Comments: NAACL SRW 2022. arXiv admin note: text overlap with arXiv:2110.04257

  17. arXiv:2110.07833  [pdf, other]

    cs.CL

    Span Detection for Aspect-Based Sentiment Analysis in Vietnamese

    Authors: Kim Thi-Thanh Nguyen, Sieu Khai Huynh, Luong Luc Phan, Phuc Huynh Pham, Duc-Vu Nguyen, Kiet Van Nguyen

    Abstract: Aspect-based sentiment analysis plays an essential role in natural language processing and artificial intelligence. Recent research has focused only on aspect detection and sentiment classification, ignoring the sub-task of detecting user opinion spans, which has enormous potential in practical applications. In this paper, we present a new Vietnamese dataset (UIT-ViSD4SA) consisting of 35,396…

    Submitted 14 October, 2021; originally announced October 2021.

  18. arXiv:2110.04257  [pdf, other]

    cs.CL

    VieSum: How Robust Are Transformer-based Models on Vietnamese Summarization?

    Authors: Hieu Nguyen, Long Phan, James Anibal, Alec Peltekian, Hieu Tran

    Abstract: Text summarization is a challenging task within natural language processing that involves text generation from lengthy input sequences. While this task has been widely studied in English, there is very limited research on summarization for Vietnamese text. In this paper, we investigate the robustness of transformer-based encoder-decoder architectures for Vietnamese abstractive summarization. Lever…

    Submitted 8 October, 2021; originally announced October 2021.

  19. arXiv:2108.10520  [pdf, other]

    cs.CV

    Improving Object Detection by Label Assignment Distillation

    Authors: Chuong H. Nguyen, Thuy C. Nguyen, Tuan N. Tang, Nam L. H. Phan

    Abstract: Label assignment in object detection aims to assign targets, foreground or background, to sampled regions in an image. Unlike labeling for image classification, this problem is not well defined due to the object's bounding box. In this paper, we investigate the problem from the perspective of distillation, which we call Label Assignment Distillation (LAD). Our initial motivation is very simple: we u…

    Submitted 19 October, 2021; v1 submitted 24 August, 2021; originally announced August 2021.

    Comments: To appear in WACV 2022

  20. arXiv:2106.09997  [pdf, other]

    cs.CL

    SPBERT: An Efficient Pre-training BERT on SPARQL Queries for Question Answering over Knowledge Graphs

    Authors: Hieu Tran, Long Phan, James Anibal, Binh T. Nguyen, Truong-Son Nguyen

    Abstract: In this paper, we propose SPBERT, a transformer-based language model pre-trained on massive SPARQL query logs. By incorporating masked language modeling objectives and the word structural objective, SPBERT can learn general-purpose representations in both natural language and SPARQL query language. We investigate how SPBERT and encoder-decoder architecture can be adapted for Knowledge-based QA cor…

    Submitted 30 June, 2021; v1 submitted 18 June, 2021; originally announced June 2021.

  21. arXiv:2106.06649  [pdf, other]

    cs.CV

    1st Place Solution for YouTubeVOS Challenge 2021: Video Instance Segmentation

    Authors: Thuy C. Nguyen, Tuan N. Tang, Nam LH. Phan, Chuong H. Nguyen, Masayuki Yamazaki, Masao Yamanaka

    Abstract: Video Instance Segmentation (VIS) is a multi-task problem performing detection, segmentation, and tracking simultaneously. Extended from image set applications, video data additionally induces the temporal information, which, if handled appropriately, is very useful to identify and predict object motions. In this work, we design a unified model to mutually learn these tasks. Specifically, we propo…

    Submitted 8 July, 2021; v1 submitted 11 June, 2021; originally announced June 2021.

    Comments: Accepted to CVPR 2021 Workshop

  22. arXiv:2106.03598  [pdf, other]

    cs.CL cs.AI cs.LG

    SciFive: a text-to-text transformer model for biomedical literature

    Authors: Long N. Phan, James T. Anibal, Hieu Tran, Shaurya Chanana, Erol Bahadroglu, Alec Peltekian, Grégoire Altan-Bonnet

    Abstract: In this report, we introduce SciFive, a domain-specific T5 model that has been pre-trained on large biomedical corpora. Our model outperforms the current SOTA methods (i.e. BERT, BioBERT, Base T5) on tasks in named entity relation, relation extraction, natural language inference, and question-answering. We show that text-generation methods have significant potential in a broad array of biomedical…

    Submitted 28 May, 2021; originally announced June 2021.

  23. arXiv:2105.15079  [pdf, other]

    cs.CL

    SA2SL: From Aspect-Based Sentiment Analysis to Social Listening System for Business Intelligence

    Authors: Luong Luc Phan, Phuc Huynh Pham, Kim Thi-Thanh Nguyen, Tham Thi Nguyen, Sieu Khai Huynh, Luan Thanh Nguyen, Tin Van Huynh, Kiet Van Nguyen

    Abstract: In this paper, we present a process for building a social listening system based on aspect-based sentiment analysis in Vietnamese, from creating a dataset to building a real application. First, we create UIT-ViSFD, a Vietnamese Smartphone Feedback Dataset, as a new benchmark corpus built on a strict annotation scheme for evaluating aspect-based sentiment analysis, consisting of 11,122 human-…

    Submitted 10 June, 2021; v1 submitted 31 May, 2021; originally announced May 2021.

  24. arXiv:2105.13578  [pdf, other]

    cs.CL

    Hierarchical Transformer Encoders for Vietnamese Spelling Correction

    Authors: Hieu Tran, Cuong V. Dinh, Long Phan, Son T. Nguyen

    Abstract: In this paper, we propose a Hierarchical Transformer model for the Vietnamese spelling correction problem. The model consists of multiple Transformer encoders and uses both character-level and word-level representations to detect errors and make corrections. In addition, to facilitate future work in Vietnamese spelling correction tasks, we propose a realistic dataset collected from real-life texts for the problem…

    Submitted 28 May, 2021; originally announced May 2021.

    Comments: Accepted by the 34th International Conference on Industrial, Engineering & Other Applications of Applied Intelligent Systems (IEA/AIE 2021)

  25. arXiv:2105.08645  [pdf, other]

    cs.AI cs.PL

    CoTexT: Multi-task Learning with Code-Text Transformer

    Authors: Long Phan, Hieu Tran, Daniel Le, Hieu Nguyen, James Anibal, Alec Peltekian, Yanfang Ye

    Abstract: We present CoTexT, a pre-trained, transformer-based encoder-decoder model that learns the representative context between natural language (NL) and programming language (PL). Using self-supervision, CoTexT is pre-trained on large programming language corpora to learn a general understanding of language and code. CoTexT supports downstream NL-PL tasks such as code summarizing/documentation, code gen…

    Submitted 21 June, 2021; v1 submitted 18 May, 2021; originally announced May 2021.

  26. arXiv:2012.07557  [pdf, other]

    cs.CL cs.AI cs.LG

    Leveraging Transfer Learning for Reliable Intelligence Identification on Vietnamese SNSs (ReINTEL)

    Authors: Trung-Hieu Tran, Long Phan, Truong-Son Nguyen, Tien-Huy Nguyen

    Abstract: This paper proposes several transformer-based approaches for Reliable Intelligence Identification on Vietnamese social network sites at the VLSP 2020 evaluation campaign. We exploit both monolingual and multilingual pre-trained models. In addition, we use an ensemble method to improve the robustness of the different approaches. Our team achieved a ROC-AUC score of 0.9378 on the private test…

    Submitted 16 December, 2020; v1 submitted 10 December, 2020; originally announced December 2020.

  27. arXiv:2007.02096  [pdf]

    eess.IV cs.CV cs.LG

    Multi-Site Infant Brain Segmentation Algorithms: The iSeg-2019 Challenge

    Authors: Yue Sun, Kun Gao, Zhengwang Wu, Zhihao Lei, Ying Wei, Jun Ma, Xiaoping Yang, Xue Feng, Li Zhao, Trung Le Phan, Jitae Shin, Tao Zhong, Yu Zhang, Lequan Yu, Caizi Li, Ramesh Basnet, M. Omair Ahmad, M. N. S. Swamy, Wenao Ma, Qi Dou, Toan Duc Bui, Camilo Bermudez Noguera, Bennett Landman, Ian H. Gotlib, Kathryn L. Humphreys, et al. (8 additional authors not shown)

    Abstract: To better understand early brain growth patterns in health and disorder, it is critical to accurately segment infant brain magnetic resonance (MR) images into white matter (WM), gray matter (GM), and cerebrospinal fluid (CSF). Deep learning-based methods have achieved state-of-the-art performance; however, one of the major limitations is that the learning-based methods may suffer from the multi-site i…

    Submitted 11 July, 2020; v1 submitted 4 July, 2020; originally announced July 2020.

    Journal ref: IEEE Transactions on Medical Imaging, 40(5), 1363-1376, 2021

  28. arXiv:1509.08834  [pdf, other]

    cs.GR q-bio.TO

    Visualization techniques for the developing chicken heart

    Authors: Ly Phan, Sandra Rugonyi, Cindy Grimm

    Abstract: We present a geometric surface parameterization algorithm and several visualization techniques adapted to the problem of understanding the 4D peristaltic-like motion of the outflow tract (OFT) in an embryonic chick heart. We illustrated the techniques using data from hearts under normal conditions (four embryos), and hearts in which blood flow conditions are altered through OFT banding (four embry…

    Submitted 29 September, 2015; originally announced September 2015.

    Comments: Longer version of conference paper published in 11th International Symposium on Visual Computing (December 2015)

    ACM Class: I.3.5, J.3