Showing 1–50 of 371 results for author: Ji, S

  1. arXiv:2407.02886  [pdf, other]

    cs.CR

    A Wolf in Sheep's Clothing: Practical Black-box Adversarial Attacks for Evading Learning-based Windows Malware Detection in the Wild

    Authors: Xiang Ling, Zhiyu Wu, Bin Wang, Wei Deng, Jingzheng Wu, Shouling Ji, Tianyue Luo, Yanjun Wu

    Abstract: Given the remarkable achievements of existing learning-based malware detection in both academia and industry, this paper presents MalGuise, a practical black-box adversarial attack framework that evaluates the security risks of existing learning-based Windows malware detection systems under the black-box setting. MalGuise first employs a novel semantics-preserving transformation of call-based redi…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: This paper has been accepted by 33rd USENIX Security Symposium 2024

  2. arXiv:2407.02775  [pdf, other]

    cs.CL cs.LG

    MLKD-BERT: Multi-level Knowledge Distillation for Pre-trained Language Models

    Authors: Ying Zhang, Ziheng Yang, Shufan Ji

    Abstract: Knowledge distillation is an effective technique for pre-trained language model compression. Although existing knowledge distillation methods perform well for the most typical model BERT, they could be further improved in two aspects: the relation-level knowledge could be further explored to improve model performance; and the setting of student attention head number could be more flexible to decre…

    Submitted 2 July, 2024; originally announced July 2024.

  3. arXiv:2407.01100  [pdf, other]

    cs.CL cs.LG

    Eliminating Position Bias of Language Models: A Mechanistic Approach

    Authors: Ziqi Wang, Hanlin Zhang, Xiner Li, Kuan-Hao Huang, Chi Han, Shuiwang Ji, Sham M. Kakade, Hao Peng, Heng Ji

    Abstract: Position bias has proven to be a prevalent issue of modern language models (LMs), where the models prioritize content based on its position within the given context. This bias often leads to unexpected model failures and hurts performance, robustness, and reliability across various applications. Our mechanistic analysis attributes the position bias to two components employed in nearly all state-of…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 18 pages, 5 figures

  4. arXiv:2406.19389  [pdf, other]

    cs.CV

    OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding

    Authors: Tao Zhang, Xiangtai Li, Hao Fei, Haobo Yuan, Shengqiong Wu, Shunping Ji, Chen Change Loy, Shuicheng Yan

    Abstract: Current universal segmentation methods demonstrate strong capabilities in pixel-level image and video understanding. However, they lack reasoning abilities and cannot be controlled via text instructions. In contrast, large vision-language multimodal models exhibit powerful vision-based conversation and reasoning capabilities but lack pixel-level understanding and have difficulty accepting visual p…

    Submitted 27 June, 2024; originally announced June 2024.

  5. arXiv:2406.17507  [pdf, other]

    cs.IR

    ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling

    Authors: Minghui Fang, Shengpeng Ji, Jialong Zuo, Hai Huang, Yan Xia, Jieming Zhu, Xize Cheng, Xiaoda Yang, Wenrui Liu, Gang Wang, Zhenhua Dong, Zhou Zhao

    Abstract: Generative retrieval, which has demonstrated effectiveness in text-to-text retrieval, utilizes a sequence-to-sequence model to directly generate candidate identifiers based on natural language queries. Without explicitly computing the similarity between queries and candidates, generative retrieval surpasses dual-tower models in both speed and accuracy on large-scale corpora, providing new insights…

    Submitted 25 June, 2024; originally announced June 2024.

  6. arXiv:2406.12888  [pdf, other]

    cond-mat.mtrl-sci cs.AI physics.atom-ph

    A Space Group Symmetry Informed Network for O(3) Equivariant Crystal Tensor Prediction

    Authors: Keqiang Yan, Alexandra Saxton, Xiaofeng Qian, Xiaoning Qian, Shuiwang Ji

    Abstract: We consider the prediction of general tensor properties of crystalline materials, including dielectric, piezoelectric, and elastic tensors. A key challenge here is how to make the predictions satisfy the unique tensor equivariance to O(3) group and invariance to crystal space groups. To this end, we propose a General Materials Tensor Network (GMTNet), which is carefully designed to satisfy the req…

    Submitted 3 June, 2024; originally announced June 2024.

    Comments: This paper has been accepted to ICML 24 as a poster. You are encouraged to cite the conference version of this paper

  7. arXiv:2406.11935  [pdf, other]

    cs.PL cs.AI cs.SE

    Iterative or Innovative? A Problem-Oriented Perspective for Code Optimization

    Authors: Tong Ye, Tengfei Ma, Lingfei Wu, Xuhong Zhang, Shouling Ji, Wenhai Wang

    Abstract: Large language models (LLMs) have demonstrated strong capabilities in solving a wide range of programming tasks. However, LLMs have rarely been explored for code optimization. In this paper, we explore code optimization with a focus on performance enhancement, specifically aiming to optimize code for minimal execution time. The recently proposed first PIE dataset for performance optimization const…

    Submitted 17 June, 2024; originally announced June 2024.

  8. arXiv:2406.10833  [pdf, other]

    cs.CL

    A Comprehensive Survey of Scientific Large Language Models and Their Applications in Scientific Discovery

    Authors: Yu Zhang, Xiusi Chen, Bowen Jin, Sheng Wang, Shuiwang Ji, Wei Wang, Jiawei Han

    Abstract: In many scientific fields, large language models (LLMs) have revolutionized the way with which text and other modalities of data (e.g., molecules and proteins) are dealt, achieving superior performance in various applications and augmenting the scientific discovery process. Nevertheless, previous surveys on scientific LLMs often concentrate on one to two fields or a single modality. In this paper,…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 33 pages (GitHub: https://github.com/yuzhimanhua/Awesome-Scientific-Language-Models)

  9. arXiv:2406.09669  [pdf, other]

    cs.CR

    Watch the Watcher! Backdoor Attacks on Security-Enhancing Diffusion Models

    Authors: Changjiang Li, Ren Pang, Bochuan Cao, Jinghui Chen, Fenglong Ma, Shouling Ji, Ting Wang

    Abstract: Thanks to their remarkable denoising capabilities, diffusion models are increasingly being employed as defensive tools to reinforce the security of other models, notably in purifying adversarial examples and certifying adversarial robustness. However, the security risks of these practices themselves remain largely unexplored, which is highly concerning. To bridge this gap, this work investigates t…

    Submitted 13 June, 2024; originally announced June 2024.

  10. arXiv:2406.07598  [pdf, other]

    cs.LG

    Equivariance via Minimal Frame Averaging for More Symmetries and Efficiency

    Authors: Yuchao Lin, Jacob Helwig, Shurui Gui, Shuiwang Ji

    Abstract: We consider achieving equivariance in machine learning systems via frame averaging. Current frame averaging methods involve a costly sum over large frames or rely on sampling-based approaches that only yield approximate equivariance. Here, we propose Minimal Frame Averaging (MFA), a mathematical framework for constructing provably minimal frames that are exactly equivariant. The general foundation…

    Submitted 21 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  11. arXiv:2406.02930  [pdf, other]

    cs.CV

    P2PFormer: A Primitive-to-polygon Method for Regular Building Contour Extraction from Remote Sensing Images

    Authors: Tao Zhang, Shiqing Wei, Yikang Zhou, Muying Luo, Wenling You, Shunping Ji

    Abstract: Extracting building contours from remote sensing imagery is a significant challenge due to buildings' complex and diverse shapes, occlusions, and noise. Existing methods often struggle with irregular contours, rounded corners, and redundancy points, necessitating extensive post-processing to produce regular polygonal building contours. To address these challenges, we introduce a novel, streamlined…

    Submitted 5 June, 2024; originally announced June 2024.

  12. arXiv:2406.01205  [pdf, other]

    eess.AS cs.LG cs.SD

    ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec

    Authors: Shengpeng Ji, Jialong Zuo, Minghui Fang, Siqi Zheng, Qian Chen, Wen Wang, Ziyue Jiang, Hai Huang, Xize Cheng, Rongjie Huang, Zhou Zhao

    Abstract: In this paper, we present ControlSpeech, a text-to-speech (TTS) system capable of fully cloning the speaker's voice and enabling arbitrary control and adjustment of speaking style, merely based on a few seconds of audio prompt and a simple textual style description prompt. Prior zero-shot TTS models and controllable TTS models either could only mimic the speaker's voice without further control and…

    Submitted 3 June, 2024; originally announced June 2024.

  13. arXiv:2405.16133  [pdf, other]

    cs.SE cs.AI

    Uncovering LLM-Generated Code: A Zero-Shot Synthetic Code Detector via Code Rewriting

    Authors: Tong Ye, Yangkai Du, Tengfei Ma, Lingfei Wu, Xuhong Zhang, Shouling Ji, Wenhai Wang

    Abstract: Large Language Models (LLMs) have exhibited remarkable proficiency in generating code. However, the misuse of LLM-generated (Synthetic) code has prompted concerns within both educational and industrial domains, highlighting the imperative need for the development of synthetic code detectors. Existing methods for detecting LLM-generated content are primarily tailored for general text and often stru…

    Submitted 29 May, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

    Comments: Previously submitted to EMNLP2023

  14. arXiv:2405.15179  [pdf, other]

    cs.CL

    VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks

    Authors: Yang Li, Shaobo Han, Shihao Ji

    Abstract: As the adoption of large language models increases and the need for per-user or per-task model customization grows, the parameter-efficient fine-tuning (PEFT) methods, such as low-rank adaptation (LoRA) and its variants, incur substantial storage and transmission costs. To further reduce stored parameters, we introduce a "divide-and-share" paradigm that breaks the barriers of low-rank decompositio…

    Submitted 27 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  15. arXiv:2405.14024  [pdf, other]

    cs.CV cs.AI

    Two Heads are Better Than One: Neural Networks Quantization with 2D Hilbert Curve-based Output Representation

    Authors: Mykhailo Uss, Ruslan Yermolenko, Olena Kolodiazhna, Oleksii Shashko, Ivan Safonov, Volodymyr Savin, Yoonjae Yeo, Seowon Ji, Jaeyun Jeong

    Abstract: Quantization is widely used to increase deep neural networks' (DNN) memory, computation, and power efficiency. Various techniques, such as post-training quantization and quantization-aware training, have been proposed to improve quantization quality. We introduce a novel approach for DNN quantization that uses a redundant representation of DNN's output. We represent the target quantity as a point…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 18 pages, 10 figures

  16. arXiv:2405.13584  [pdf, other]

    cs.LG cs.DC

    Emulating Full Client Participation: A Long-Term Client Selection Strategy for Federated Learning

    Authors: Qingming Li, Juzheng Miao, Puning Zhao, Li Zhou, Shouling Ji, Bowen Zhou, Furui Liu

    Abstract: Client selection significantly affects the system convergence efficiency and is a crucial problem in federated learning. Existing methods often select clients by evaluating each round individually and overlook the necessity for long-term optimization, resulting in suboptimal performance and potential fairness issues. In this study, we propose a novel client selection strategy designed to emulate t…

    Submitted 22 May, 2024; originally announced May 2024.

  17. arXiv:2405.12786  [pdf, other]

    cs.CR

    Rethinking the Vulnerabilities of Face Recognition Systems: From a Practical Perspective

    Authors: Jiahao Chen, Zhiqiang Shen, Yuwen Pu, Chunyi Zhou, Changjiang Li, Jiliang Li, Ting Wang, Shouling Ji

    Abstract: Face Recognition Systems (FRS) have increasingly integrated into critical applications, including surveillance and user authentication, highlighting their pivotal role in modern security systems. Recent studies have revealed vulnerabilities in FRS to adversarial (e.g., adversarial patch attacks) and backdoor attacks (e.g., training data poisoning), raising significant concerns about their reliabil…

    Submitted 8 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

    Comments: 19 pages, version 3

  18. arXiv:2405.12751  [pdf, other]

    cs.CR

    A Stealthy Backdoor Attack for Without-Label-Sharing Split Learning

    Authors: Yuwen Pu, Zhuoyuan Ding, Jiahao Chen, Chunyi Zhou, Qingming Li, Chunqiang Hu, Shouling Ji

    Abstract: As a novel privacy-preserving paradigm aimed at reducing client computational costs and achieving data utility, split learning has garnered extensive attention and proliferated widespread applications across various fields, including smart health and smart transportation, among others. While recent studies have primarily concentrated on addressing privacy leakage concerns in split learning, such a…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 15 pages

  19. arXiv:2405.12719  [pdf, other]

    cs.CR

    How to Train a Backdoor-Robust Model on a Poisoned Dataset without Auxiliary Data?

    Authors: Yuwen Pu, Jiahao Chen, Chunyi Zhou, Zhou Feng, Qingming Li, Chunqiang Hu, Shouling Ji

    Abstract: Backdoor attacks have attracted wide attention from academia and industry due to their great security threat to deep neural networks (DNN). Most of the existing methods propose to conduct backdoor attacks by poisoning the training dataset with different strategies, so it's critical to identify the poisoned samples and then train a clean model on the unreliable dataset in the context of defending b…

    Submitted 21 May, 2024; originally announced May 2024.

    Comments: 13 pages, under review

  20. arXiv:2405.12663  [pdf, other]

    cs.GR cs.CV

    LAGA: Layered 3D Avatar Generation and Customization via Gaussian Splatting

    Authors: Jia Gong, Shenyu Ji, Lin Geng Foo, Kang Chen, Hossein Rahmani, Jun Liu

    Abstract: Creating and customizing a 3D clothed avatar from textual descriptions is a critical and challenging task. Traditional methods often treat the human body and clothing as inseparable, limiting users' ability to freely mix and match garments. In response to this limitation, we present LAyered Gaussian Avatar (LAGA), a carefully designed framework enabling the creation of high-fidelity decomposable a…

    Submitted 21 May, 2024; originally announced May 2024.

  21. arXiv:2405.12094  [pdf, other]

    cs.LG

    Is Mamba Compatible with Trajectory Optimization in Offline Reinforcement Learning?

    Authors: Yang Dai, Oubo Ma, Longfei Zhang, Xingxing Liang, Shengchao Hu, Mengzhu Wang, Shouling Ji, Jincai Huang, Li Shen

    Abstract: Transformer-based trajectory optimization methods have demonstrated exceptional performance in offline Reinforcement Learning (offline RL), yet it poses challenges due to substantial parameter size and limited scalability, which is particularly critical in sequential decision-making scenarios where resources are constrained such as in robots and drones with limited computational power. Mamba, a pr…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 20 pages, 8 figures

  22. arXiv:2405.05846  [pdf, other]

    cs.CR cs.CV

    Could It Be Generated? Towards Practical Analysis of Memorization in Text-To-Image Diffusion Models

    Authors: Zhe Ma, Xuhong Zhang, Qingming Li, Tianyu Du, Wenzhi Chen, Zonghui Wang, Shouling Ji

    Abstract: The past few years have witnessed substantial advancement in text-guided image generation powered by diffusion models. However, it was shown that text-to-image diffusion models are vulnerable to training image memorization, raising concerns on copyright infringement and privacy invasion. In this work, we perform practical analysis of memorization in text-to-image diffusion models. Targeting a set…

    Submitted 9 May, 2024; originally announced May 2024.

  23. arXiv:2404.12925  [pdf, other]

    cs.CV

    A Hybrid Generative and Discriminative PointNet on Unordered Point Sets

    Authors: Yang Ye, Shihao Ji

    Abstract: As point cloud provides a natural and flexible representation usable in myriad applications (e.g., robotics and self-driving cars), the ability to synthesize point clouds for analysis becomes crucial. Recently, Xie et al. propose a generative model for unordered point sets in the form of an energy-based model (EBM). Despite the model achieving an impressive performance for point cloud generation,…

    Submitted 19 April, 2024; originally announced April 2024.

  24. arXiv:2404.08939  [pdf, other]

    cs.RO cs.AI cs.HC

    NeurIT: Pushing the Limit of Neural Inertial Tracking for Indoor Robotic IoT

    Authors: Xinzhe Zheng, Sijie Ji, Yipeng Pan, Kaiwen Zhang, Chenshu Wu

    Abstract: Inertial tracking is vital for robotic IoT and has gained popularity thanks to the ubiquity of low-cost Inertial Measurement Units (IMUs) and deep learning-powered tracking algorithms. Existing works, however, have not fully utilized IMU measurements, particularly magnetometers, nor maximized the potential of deep learning to achieve the desired accuracy. To enhance the tracking accuracy for indoo…

    Submitted 13 April, 2024; originally announced April 2024.

  25. arXiv:2404.07577  [pdf, other]

    cs.LG eess.SP

    Generating Comprehensive Lithium Battery Charging Data with Generative AI

    Authors: Lidang Jiang, Changyan Hu, Sibei Ji, Hang Zhao, Junxiong Chen, Ge He

    Abstract: In optimizing performance and extending the lifespan of lithium batteries, accurate state prediction is pivotal. Traditional regression and classification methods have achieved some success in battery state prediction. However, the efficacy of these data-driven approaches heavily relies on the availability and quality of public datasets. Additionally, generating electrochemical data predominantly…

    Submitted 11 April, 2024; originally announced April 2024.

  26. arXiv:2404.05094  [pdf, other]

    cs.LG cs.AI

    Active Test-Time Adaptation: Theoretical Analyses and An Algorithm

    Authors: Shurui Gui, Xiner Li, Shuiwang Ji

    Abstract: Test-time adaptation (TTA) addresses distribution shifts for streaming test data in unsupervised settings. Currently, most TTA methods can only deal with minor shifts and rely heavily on heuristic and empirical studies. To advance TTA under domain shifts, we propose the novel problem setting of active test-time adaptation (ATTA) that integrates active learning within the fully TTA setting. We…

    Submitted 7 April, 2024; originally announced April 2024.

  27. arXiv:2404.04850  [pdf, other]

    cs.CL

    Lucky 52: How Many Languages Are Needed to Instruction Fine-Tune Large Language Models?

    Authors: Shaoxiong Ji, Pinzhen Chen

    Abstract: Fine-tuning large language models for multilingual downstream tasks requires a diverse set of languages to capture the nuances and structures of different linguistic contexts effectively. While the specific number varies depending on the desired scope and target languages, we argue that the number of languages, language exposure, and similarity that incorporate the selection of languages for fine-…

    Submitted 7 April, 2024; originally announced April 2024.

  28. arXiv:2404.00086  [pdf, other]

    cs.CV

    DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries

    Authors: Yikang Zhou, Tao Zhang, Shunping Ji, Shuicheng Yan, Xiangtai Li

    Abstract: Modern video segmentation methods adopt object queries to perform inter-frame association and demonstrate satisfactory performance in tracking continuously appearing objects despite large-scale motion and transient occlusion. However, they all underperform on newly emerging and disappearing objects that are common in the real world because they attempt to model object emergence and disappearance t…

    Submitted 14 June, 2024; v1 submitted 29 March, 2024; originally announced April 2024.

    Comments: Update more results and analysis

  29. arXiv:2403.19507  [pdf, other]

    cs.LG

    SineNet: Learning Temporal Dynamics in Time-Dependent Partial Differential Equations

    Authors: Xuan Zhang, Jacob Helwig, Yuchao Lin, Yaochen Xie, Cong Fu, Stephan Wojtowytsch, Shuiwang Ji

    Abstract: We consider using deep neural networks to solve time-dependent partial differential equations (PDEs), where multi-scale processing is crucial for modeling complex, time-evolving dynamics. While the U-Net architecture with skip connections is commonly used by prior studies to enable multi-scale processing, our analysis shows that the need for features to evolve across layers results in temporally m…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: The Twelfth International Conference on Learning Representations

  30. arXiv:2403.16777  [pdf, other]

    cs.CL

    Can Machine Translation Bridge Multilingual Pretraining and Cross-lingual Transfer Learning?

    Authors: Shaoxiong Ji, Timothee Mickus, Vincent Segonne, Jörg Tiedemann

    Abstract: Multilingual pretraining and fine-tuning have remarkably succeeded in various natural language processing tasks. Transferring representations from one language to another is especially crucial for cross-lingual learning. One can expect machine translation objectives to be well suited to fostering such capabilities, as they involve the explicit alignment of semantically equivalent sentences from di…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: LREC-COLING 2024

  31. arXiv:2403.14009  [pdf, other]

    cs.CL

    A New Massive Multilingual Dataset for High-Performance Language Technologies

    Authors: Ona de Gibert, Graeme Nail, Nikolay Arefyev, Marta Bañón, Jelmer van der Linde, Shaoxiong Ji, Jaume Zaragoza-Bernabeu, Mikko Aulamo, Gema Ramírez-Sánchez, Andrey Kutuzov, Sampo Pyysalo, Stephan Oepen, Jörg Tiedemann

    Abstract: We present the HPLT (High Performance Language Technologies) language resources, a new massive multilingual dataset including both monolingual and bilingual corpora extracted from CommonCrawl and previously unused web crawls from the Internet Archive. We describe our methods for data acquisition, management and processing of large corpora, which rely on open-source software tools and high-performa…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: LREC-COLING 2024

  32. arXiv:2403.12541  [pdf, other]

    cs.CR

    TAGS: Real-time Intrusion Detection with Tag-Propagation-based Provenance Graph Alignment on Streaming Events

    Authors: Zhenyuan Li, Yangyang Wei, Xiangmin Shen, Lingzhi Wang, Yan Chen, Haitao Xu, Shouling Ji, Fan Zhang

    Abstract: The evolution and advancement of cyberattacks pose challenges to existing security products. Recent concentrated research on provenance graph-based detection has proved its effectiveness in attack detection and investigation. However, implementing these approaches in practice encounters challenges such as high overhead, slow responsiveness, and low interpretability and extensibility. Towards pra…

    Submitted 19 March, 2024; originally announced March 2024.

  33. arXiv:2403.11857  [pdf, other]

    cs.LG cond-mat.mtrl-sci

    Complete and Efficient Graph Transformers for Crystal Material Property Prediction

    Authors: Keqiang Yan, Cong Fu, Xiaofeng Qian, Xiaoning Qian, Shuiwang Ji

    Abstract: Crystal structures are characterized by atomic bases within a primitive unit cell that repeats along a regular lattice throughout 3D space. The periodic and infinite nature of crystals poses unique challenges for geometric graph representation learning. Specifically, constructing graphs that effectively capture the complete geometric information of crystals and handle chiral crystals remains an un…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: This paper has been accepted by ICLR 2024

  34. Stylized Face Sketch Extraction via Generative Prior with Limited Data

    Authors: Kwan Yun, Kwanggyoon Seo, Chang Wook Seo, Soyeon Yoon, Seongcheol Kim, Soohyun Ji, Amirsaman Ashtari, Junyong Noh

    Abstract: Facial sketches are both a concise way of showing the identity of a person and a means to express artistic intention. While a few techniques have recently emerged that allow sketches to be extracted in different styles, they typically rely on a large amount of data that is difficult to obtain. Here, we propose StyleSketch, a method for extracting high-resolution stylized sketches from a face image…

    Submitted 17 March, 2024; originally announced March 2024.

    Comments: 14 pages

    MSC Class: 68T45; ACM Class: I.4.9

  35. arXiv:2403.07544  [pdf, other]

    cs.CL

    MAMMOTH: Massively Multilingual Modular Open Translation @ Helsinki

    Authors: Timothee Mickus, Stig-Arne Grönroos, Joseph Attieh, Michele Boggia, Ona De Gibert, Shaoxiong Ji, Niki Andreas Lopi, Alessandro Raganato, Raúl Vázquez, Jörg Tiedemann

    Abstract: NLP in the age of monolithic large language models is approaching its limits in terms of size and information that can be handled. The trend goes to modularization, a necessary step into the direction of designing smaller sub-networks and components with specialized functionality. In this paper, we present the MAMMOTH toolkit: a framework designed for training massively multilingual modular machin…

    Submitted 12 March, 2024; originally announced March 2024.

    Comments: Presented as a demo at EACL 2024

  36. arXiv:2403.06201  [pdf, other]

    cs.CL cs.AI cs.HC cs.LG

    Are You Being Tracked? Discover the Power of Zero-Shot Trajectory Tracing with LLMs!

    Authors: Huanqi Yang, Sijie Ji, Rucheng Wu, Weitao Xu

    Abstract: There is a burgeoning discussion around the capabilities of Large Language Models (LLMs) in acting as fundamental components that can be seamlessly incorporated into Artificial Intelligence of Things (AIoT) to interpret complex trajectories. This study introduces LLMTrack, a model that illustrates how LLMs can be leveraged for Zero-Shot Trajectory Recognition by employing a novel single-prompt tec…

    Submitted 10 March, 2024; originally announced March 2024.

  37. arXiv:2403.05168  [pdf, other]

    cs.CV cs.AI

    Unlocking the Potential of Multimodal Unified Discrete Representation through Training-Free Codebook Optimization and Hierarchical Alignment

    Authors: Hai Huang, Yan Xia, Shengpeng Ji, Shulei Wang, Hanting Wang, Jieming Zhu, Zhenhua Dong, Zhou Zhao

    Abstract: Recent advances in representation learning have demonstrated the significance of multimodal alignment. The Dual Cross-modal Information Disentanglement (DCID) model, utilizing a unified codebook, shows promising results in achieving fine-grained representation and cross-modal generalization. However, it is still hindered by equal treatment of all channels and neglect of minor event information, re…

    Submitted 8 March, 2024; originally announced March 2024.

  38. arXiv:2403.04976  [pdf, other]

    cs.DC

    Towards Data-center Level Carbon Modeling and Optimization for Deep Learning Inference

    Authors: Shixin Ji, Zhuoping Yang, Xingzhen Chen, Jingtong Hu, Yiyu Shi, Alex K. Jones, Peipei Zhou

    Abstract: Recently, the increasing need for computing resources has led to the prosperity of data centers, which poses challenges to the environmental impacts and calls for improvements in data center provisioning strategies. In this work, we show a comprehensive analysis based on profiling a variety of deep-learning inference applications on different generations of GPU servers. Our analysis reveals severa…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: 12 pages, 9 figures

  39. arXiv:2403.04929  [pdf, other]

    cs.LG cs.AI cs.NE

    On the Markov Property of Neural Algorithmic Reasoning: Analyses and Methods

    Authors: Montgomery Bohde, Meng Liu, Alexandra Saxton, Shuiwang Ji

    Abstract: Neural algorithmic reasoning is an emerging research direction that endows neural networks with the ability to mimic algorithmic executions step-by-step. A common paradigm in existing designs involves the use of historical embeddings in predicting the results of future execution steps. Our observation in this work is that such historical dependence intrinsically contradicts the Markov nature of al…

    Submitted 7 March, 2024; originally announced March 2024.

    Comments: To appear at ICLR 2024 (Spotlight paper). 17 pages, 10 figures

  40. arXiv:2403.02727  [pdf, other]

    cs.CL cs.AI cs.HC

    HARGPT: Are LLMs Zero-Shot Human Activity Recognizers?

    Authors: Sijie Ji, Xinzhe Zheng, Chenshu Wu

    Abstract: There is an ongoing debate regarding the potential of Large Language Models (LLMs) as foundational models seamlessly integrated with Cyber-Physical Systems (CPS) for interpreting the physical world. In this paper, we carry out a case study to answer the following question: Are LLMs capable of zero-shot human activity recognition (HAR). Our study, HARGPT, presents an affirmative answer by demonstra…

    Submitted 5 March, 2024; originally announced March 2024.

  41. arXiv:2403.00762  [pdf, other]

    cs.CV

    Point Cloud Mamba: Point Cloud Learning via State Space Model

    Authors: Tao Zhang, Xiangtai Li, Haobo Yuan, Shunping Ji, Shuicheng Yan

    Abstract: Recently, state space models have exhibited strong global modeling capabilities and linear computational complexity in contrast to transformers. This research focuses on applying such architecture in point cloud analysis. In particular, for the first time, we demonstrate that Mamba-based point cloud methods can outperform previous methods based on transformer or multi-layer perceptrons (MLPs). To…

    Submitted 29 May, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

    Comments: Update more results on S3DIS dataset

  42. arXiv:2402.19267  [pdf, other]

    cs.CL cs.AI

    Robust Guidance for Unsupervised Data Selection: Capturing Perplexing Named Entities for Domain-Specific Machine Translation

    Authors: Seunghyun Ji, Hagai Raja Sinulingga, Darongsae Kwon

    Abstract: Low-resourced data presents a significant challenge for neural machine translation. In most cases, the low-resourced environment is caused by high costs due to the need for domain experts or the lack of language experts. Therefore, identifying the most training-efficient data within an unsupervised setting emerges as a practical strategy. Recent research suggests that such effective data can be id…

    Submitted 21 May, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: 11 pages, 3 figures, 5 tables. Oral presentation was given in SIGUL 2024, a satellite workshop of LREC-COLING 2024 (https://sigul-2024.ilc.cnr.it/wp-content/uploads/2024/05/Ji-et-al.pdf)

  43. arXiv:2402.19200  [pdf, other]

    cs.CR cs.CL

    PRSA: PRompt Stealing Attacks against Large Language Models

    Authors: Yong Yang, Changjiang Li, Yi Jiang, Xi Chen, Haoyu Wang, Xuhong Zhang, Zonghui Wang, Shouling Ji

    Abstract: In recent years, "prompt as a service" has greatly enhanced the utility of large language models (LLMs) by enabling them to perform various downstream tasks efficiently without fine-tuning. This has also increased the commercial value of prompts. However, the potential risk of leakage in these commercialized prompts remains largely underexplored. In this paper, we introduce a novel attack framewor…

    Submitted 7 June, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

  44. arXiv:2402.13518  [pdf, other]

    cs.SE cs.CL

    RITFIS: Robust input testing framework for LLMs-based intelligent software

    Authors: Mingxuan Xiao, Yan Xiao, Hai Dong, Shunhui Ji, Pengcheng Zhang

    Abstract: The dependence of Natural Language Processing (NLP) intelligent software on Large Language Models (LLMs) is increasingly prominent, underscoring the necessity for robustness testing. Current testing methods focus solely on the robustness of LLM-based software to prompts. Given the complexity and diversity of real-world inputs, studying the robustness of LLM-based software in handling comprehensive…

    Submitted 20 February, 2024; originally announced February 2024.

  45. arXiv:2402.12208  [pdf, other]

    eess.AS cs.SD

    Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models

    Authors: Shengpeng Ji, Minghui Fang, Ziyue Jiang, Siqi Zheng, Qian Chen, Rongjie Huang, Jialung Zuo, Shulei Wang, Zhou Zhao

    Abstract: In recent years, large language models have achieved significant success in generative tasks (e.g., speech cloning and audio generation) related to speech, audio, music, and other signal domains. A crucial element of these models is the discrete acoustic codecs, which serve as an intermediate representation replacing the mel-spectrogram. However, there exist several gaps between discrete codecs a…

    Submitted 27 April, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

    Comments: We release a more powerful checkpoint in Language-Codec v3

  46. arXiv:2402.09378  [pdf, other]

    eess.AS cs.SD

    MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech

    Authors: Shengpeng Ji, Ziyue Jiang, Hanting Wang, Jialong Zuo, Zhou Zhao

    Abstract: Zero-shot text-to-speech (TTS) has gained significant attention due to its powerful voice cloning capabilities, requiring only a few seconds of unseen speaker voice prompts. However, all previous work has been developed for cloud-based systems. Taking autoregressive models as an example, although these approaches achieve high-fidelity voice cloning, they fall short in terms of inference speed, mod…

    Submitted 2 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: Accepted by ACL 2024 (Main Conference)

  47. arXiv:2402.03741  [pdf, other]

    cs.LG cs.AI cs.CR

    SUB-PLAY: Adversarial Policies against Partially Observed Multi-Agent Reinforcement Learning Systems

    Authors: Oubo Ma, Yuwen Pu, Linkang Du, Yang Dai, Ruo Wang, Xiaolei Liu, Yingcai Wu, Shouling Ji

    Abstract: Recent advancements in multi-agent reinforcement learning (MARL) have opened up vast application prospects, such as swarm control of drones, collaborative manipulation by robotic arms, and multi-target encirclement. However, potential security threats during the MARL deployment need more attention and thorough investigation. Recent research reveals that attackers can rapidly exploit the victim's v…

    Submitted 26 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: To appear in the ACM Conference on Computer and Communications Security (CCS'24), October 14-18, 2024, Salt Lake City, UT, USA

  48. arXiv:2401.14027  [pdf, other]

    cs.LG

    The Risk of Federated Learning to Skew Fine-Tuning Features and Underperform Out-of-Distribution Robustness

    Authors: Mengyao Du, Miao Zhang, Yuwen Pu, Kai Xu, Shouling Ji, Quanjun Yin

    Abstract: To tackle the scarcity and privacy issues associated with domain-specific datasets, the integration of federated learning in conjunction with fine-tuning has emerged as a practical solution. However, our findings reveal that federated learning has the risk of skewing fine-tuning features and compromising the out-of-distribution robustness of the model. By introducing three robustness indicators an…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: 12 pages, 10 figures

  49. arXiv:2401.13303  [pdf, other]

    cs.CL

    MaLA-500: Massive Language Adaptation of Large Language Models

    Authors: Peiqin Lin, Shaoxiong Ji, Jörg Tiedemann, André F. T. Martins, Hinrich Schütze

    Abstract: Large language models (LLMs) have advanced the state of the art in natural language processing. However, their predominant design for English or a limited set of languages creates a substantial gap in their effectiveness for low-resource languages. To bridge this gap, we introduce MaLA-500, a novel large language model designed to cover an extensive range of 534 languages. To train MaLA-500, we em…

    Submitted 3 April, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

  50. SSR: Spatial Sequential Hybrid Architecture for Latency Throughput Tradeoff in Transformer Acceleration

    Authors: Jinming Zhuang, Zhuoping Yang, Shixin Ji, Heng Huang, Alex K. Jones, Jingtong Hu, Yiyu Shi, Peipei Zhou

    Abstract: With the increase in the computation intensity of the chip, the mismatch between computation layer shapes and the available computation resource significantly limits the utilization of the chip. Driven by this observation, prior works discuss spatial accelerators or dataflow architecture to maximize the throughput. However, using spatial accelerators could potentially increase the execution latenc…

    Submitted 18 February, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

    Journal ref: 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA '24)