subscribe to arXiv mailings

Point Cloud Mamba: Point Cloud Learning via State Space Model

Authors: Tao Zhang, Xiangtai Li, Haobo Yuan, Shunping Ji, Shuicheng Yan

Abstract: Recently, state space models have exhibited strong global modeling capabilities and linear computational complexity in contrast to transformers. This research focuses on applying such architecture in point cloud analysis. In particular, for the first time, we demonstrate that Mamba-based point cloud methods can outperform previous methods based on transformer or multi-layer perceptrons (MLPs). To… ▽ More Recently, state space models have exhibited strong global modeling capabilities and linear computational complexity in contrast to transformers. This research focuses on applying such architecture in point cloud analysis. In particular, for the first time, we demonstrate that Mamba-based point cloud methods can outperform previous methods based on transformer or multi-layer perceptrons (MLPs). To enable Mamba to process 3-D point cloud data more effectively, we propose a novel Consistent Traverse Serialization method to convert point clouds into 1-D point sequences while ensuring that neighboring points in the sequence are also spatially adjacent. Consistent Traverse Serialization yields six variants by permuting the order of x, y, and z coordinates, and the synergistic use of these variants aids Mamba in comprehensively observing point cloud data. Furthermore, to assist Mamba in handling point sequences with different orders more effectively, we introduce point prompts to inform Mamba of the sequence's arrangement rules. Finally, we propose positional encoding based on spatial coordinate mapping to inject positional information into point cloud sequences better. Point Cloud Mamba surpasses the state-of-the-art (SOTA) point-based method PointNeXt and achieves new SOTA performance on the ScanObjectNN, ModelNet40, ShapeNetPart, and S3DIS datasets. It is worth mentioning that when using a more powerful local feature extraction module, our PCM achieves 82.6 mIoU on S3DIS, significantly surpassing the previous SOTA models, DeLA and PTv3, by 8.5 mIoU and 7.9 mIoU, respectively. Code and model are available at https://github.com/SkyworkAI/PointCloudMamba. △ Less

Submitted 29 May, 2024; v1 submitted 1 March, 2024; originally announced March 2024.

Comments: Update more results on S3DIS dataset

arXiv:2403.00479 [pdf, other]

Observational Evidence for Hot Wind Impact on pc-scale in Low-luminosity Active Galactic Nucleus

Authors: Fangzheng Shi, Feng Yuan, Zhiyuan Li, Zhao Su, Suoqing Ji

Abstract: Supermassive black holes in galaxies spend majority of their lifetime in the low-luminosity regime, powered by hot accretion flow. Strong winds launched from the hot accretion flow have the potential to play an important role in active galactic nuclei (AGN) feedback. Direct observational evidence for these hot winds with temperature around 10 keV, has been obtained through the detection of highly… ▽ More Supermassive black holes in galaxies spend majority of their lifetime in the low-luminosity regime, powered by hot accretion flow. Strong winds launched from the hot accretion flow have the potential to play an important role in active galactic nuclei (AGN) feedback. Direct observational evidence for these hot winds with temperature around 10 keV, has been obtained through the detection of highly ionized iron emission lines with Doppler shifts in two prototypical low-luminosity AGNs, namely M81* and NGC 7213. In this work, we further identify blueshifted H-like O/Ne emission lines in the soft X-ray spectra of these two sources. These lines are interpreted to be associated with additional outflowing components possessing velocity around several $10^3$ km/s and lower temperature (~0.2-0.4 keV). Blue-shifted velocity and the X-ray intensity of these additional outflowing components are hard to be explained by previously detected hot wind freely propagating to larger radii. Through detailed numerical simulations, we find the newly detected blue-shifted emission lines would come from circumnuclear gas shock-heated by the hot wind instead. Hot wind can provide larger ram pressure force on the clumpy circumnuclear gas than the gravitational force from central black hole, effectively impeding the black hole accretion of gas. Our results provide strong evidences for the energy and momentum feedback by the hot AGN wind. △ Less

Submitted 1 March, 2024; originally announced March 2024.

Comments: 19 pages, 9 figures, submitted to ApJ

arXiv:2402.19267 [pdf, other]

Robust Guidance for Unsupervised Data Selection: Capturing Perplexing Named Entities for Domain-Specific Machine Translation

Authors: Seunghyun Ji, Hagai Raja Sinulingga, Darongsae Kwon

Abstract: Low-resourced data presents a significant challenge for neural machine translation. In most cases, the low-resourced environment is caused by high costs due to the need for domain experts or the lack of language experts. Therefore, identifying the most training-efficient data within an unsupervised setting emerges as a practical strategy. Recent research suggests that such effective data can be id… ▽ More Low-resourced data presents a significant challenge for neural machine translation. In most cases, the low-resourced environment is caused by high costs due to the need for domain experts or the lack of language experts. Therefore, identifying the most training-efficient data within an unsupervised setting emerges as a practical strategy. Recent research suggests that such effective data can be identified by selecting 'appropriately complex data' based on its volume, providing strong intuition for unsupervised data selection. However, we have discovered that establishing criteria for unsupervised data selection remains a challenge, as the 'appropriate level of difficulty' may vary depending on the data domain. We introduce a novel unsupervised data selection method named 'Capturing Perplexing Named Entities,' which leverages the maximum inference entropy in translated named entities as a metric for selection. When tested with the 'Korean-English Parallel Corpus of Specialized Domains,' our method served as robust guidance for identifying training-efficient data across different domains, in contrast to existing methods. △ Less

Submitted 21 May, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

Comments: 11 pages, 3 figures, 5 tables. Oral presentation was given in SIGUL 2024, a satellite workshop of LREC-COLING 2024 (https://sigul-2024.ilc.cnr.it/wp-content/uploads/2024/05/Ji-et-al.pdf)

arXiv:2402.19200 [pdf, other]

PRSA: PRompt Stealing Attacks against Large Language Models

Authors: Yong Yang, Changjiang Li, Yi Jiang, Xi Chen, Haoyu Wang, Xuhong Zhang, Zonghui Wang, Shouling Ji

Abstract: In recent years, "prompt as a service" has greatly enhanced the utility of large language models (LLMs) by enabling them to perform various downstream tasks efficiently without fine-tuning. This has also increased the commercial value of prompts. However, the potential risk of leakage in these commercialized prompts remains largely underexplored. In this paper, we introduce a novel attack framewor… ▽ More In recent years, "prompt as a service" has greatly enhanced the utility of large language models (LLMs) by enabling them to perform various downstream tasks efficiently without fine-tuning. This has also increased the commercial value of prompts. However, the potential risk of leakage in these commercialized prompts remains largely underexplored. In this paper, we introduce a novel attack framework, PRSA, designed for prompt stealing attacks against LLMs. The main idea of PRSA is to infer the intent behind a prompt by analyzing its input-output content, enabling the generation of a surrogate prompt that replicates the original's functionality. Specifically, PRSA mainly consists of two key phases: prompt mutation and prompt pruning. In the mutation phase, we propose a prompt attention algorithm based on output difference. The algorithm facilitates the generation of effective surrogate prompts by learning key factors that influence the accurate inference of prompt intent. During the pruning phase, we employ a two-step related word identification strategy to detect and mask words that are highly related to the input, thus improving the generalizability of the surrogate prompts. We verify the actual threat of PRSA through evaluation in both real-world settings, non-interactive and interactive prompt services. The results strongly confirm the PRSA's effectiveness and generalizability. We have reported these findings to prompt service providers and actively collaborate with them to implement defensive measures. △ Less

Submitted 7 June, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

arXiv:2402.13518 [pdf, other]

RITFIS: Robust input testing framework for LLMs-based intelligent software

Authors: Mingxuan Xiao, Yan Xiao, Hai Dong, Shunhui Ji, Pengcheng Zhang

Abstract: The dependence of Natural Language Processing (NLP) intelligent software on Large Language Models (LLMs) is increasingly prominent, underscoring the necessity for robustness testing. Current testing methods focus solely on the robustness of LLM-based software to prompts. Given the complexity and diversity of real-world inputs, studying the robustness of LLMbased software in handling comprehensive… ▽ More The dependence of Natural Language Processing (NLP) intelligent software on Large Language Models (LLMs) is increasingly prominent, underscoring the necessity for robustness testing. Current testing methods focus solely on the robustness of LLM-based software to prompts. Given the complexity and diversity of real-world inputs, studying the robustness of LLMbased software in handling comprehensive inputs (including prompts and examples) is crucial for a thorough understanding of its performance. To this end, this paper introduces RITFIS, a Robust Input Testing Framework for LLM-based Intelligent Software. To our knowledge, RITFIS is the first framework designed to assess the robustness of LLM-based intelligent software against natural language inputs. This framework, based on given threat models and prompts, primarily defines the testing process as a combinatorial optimization problem. Successful test cases are determined by a goal function, creating a transformation space for the original examples through perturbation means, and employing a series of search methods to filter cases that meet both the testing objectives and language constraints. RITFIS, with its modular design, offers a comprehensive method for evaluating the robustness of LLMbased intelligent software. RITFIS adapts 17 automated testing methods, originally designed for Deep Neural Network (DNN)-based intelligent software, to the LLM-based software testing scenario. It demonstrates the effectiveness of RITFIS in evaluating LLM-based intelligent software through empirical validation. However, existing methods generally have limitations, especially when dealing with lengthy texts and structurally complex threat models. Therefore, we conducted a comprehensive analysis based on five metrics and provided insightful testing method optimization strategies, benefiting both researchers and everyday users. △ Less

Submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.12208 [pdf, other]

Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models

Authors: Shengpeng Ji, Minghui Fang, Ziyue Jiang, Siqi Zheng, Qian Chen, Rongjie Huang, Jialung Zuo, Shulei Wang, Zhou Zhao

Abstract: In recent years, large language models have achieved significant success in generative tasks (e.g., speech cloning and audio generation) related to speech, audio, music, and other signal domains. A crucial element of these models is the discrete acoustic codecs, which serves as an intermediate representation replacing the mel-spectrogram. However, there exist several gaps between discrete codecs a… ▽ More In recent years, large language models have achieved significant success in generative tasks (e.g., speech cloning and audio generation) related to speech, audio, music, and other signal domains. A crucial element of these models is the discrete acoustic codecs, which serves as an intermediate representation replacing the mel-spectrogram. However, there exist several gaps between discrete codecs and downstream speech language models. Specifically, 1) most codec models are trained on only 1,000 hours of data, whereas most speech language models are trained on 60,000 hours; 2) Achieving good reconstruction performance requires the utilization of numerous codebooks, which increases the burden on downstream speech language models; 3) The initial channel of the codebooks contains excessive information, making it challenging to directly generate acoustic tokens from weakly supervised signals such as text in downstream tasks. Consequently, leveraging the characteristics of speech language models, we propose Language-Codec. In the Language-Codec, we introduce a Mask Channel Residual Vector Quantization (MCRVQ) mechanism along with improved Fourier transform structures and larger training datasets to address the aforementioned gaps. We compare our method with competing audio compression algorithms and observe significant outperformance across extensive evaluations. Furthermore, we also validate the efficiency of the Language-Codec on downstream speech language models. The source code and pre-trained models can be accessed at https://github.com/jishengpeng/languagecodec . △ Less

Submitted 27 April, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: We release a more powerful checkpoint in Language-Codec v3

arXiv:2402.09378 [pdf, other]

MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech

Authors: Shengpeng Ji, Ziyue Jiang, Hanting Wang, Jialong Zuo, Zhou Zhao

Abstract: Zero-shot text-to-speech (TTS) has gained significant attention due to its powerful voice cloning capabilities, requiring only a few seconds of unseen speaker voice prompts. However, all previous work has been developed for cloud-based systems. Taking autoregressive models as an example, although these approaches achieve high-fidelity voice cloning, they fall short in terms of inference speed, mod… ▽ More Zero-shot text-to-speech (TTS) has gained significant attention due to its powerful voice cloning capabilities, requiring only a few seconds of unseen speaker voice prompts. However, all previous work has been developed for cloud-based systems. Taking autoregressive models as an example, although these approaches achieve high-fidelity voice cloning, they fall short in terms of inference speed, model size, and robustness. Therefore, we propose MobileSpeech, which is a fast, lightweight, and robust zero-shot text-to-speech system based on mobile devices for the first time. Specifically: 1) leveraging discrete codec, we design a parallel speech mask decoder module called SMD, which incorporates hierarchical information from the speech codec and weight mechanisms across different codec layers during the generation process. Moreover, to bridge the gap between text and speech, we introduce a high-level probabilistic mask that simulates the progression of information flow from less to more during speech generation. 2) For speaker prompts, we extract fine-grained prompt duration from the prompt speech and incorporate text, prompt speech by cross attention in SMD. We demonstrate the effectiveness of MobileSpeech on multilingual datasets at different levels, achieving state-of-the-art results in terms of generating speed and speech quality. MobileSpeech achieves RTF of 0.09 on a single A100 GPU and we have successfully deployed MobileSpeech on mobile devices. Audio samples are available at \url{https://mobilespeech.github.io/} . △ Less

Submitted 2 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

Comments: Accepted by ACL 2024 (Main Conference)

arXiv:2402.03741 [pdf, other]

SUB-PLAY: Adversarial Policies against Partially Observed Multi-Agent Reinforcement Learning Systems

Authors: Oubo Ma, Yuwen Pu, Linkang Du, Yang Dai, Ruo Wang, Xiaolei Liu, Yingcai Wu, Shouling Ji

Abstract: Recent advancements in multi-agent reinforcement learning (MARL) have opened up vast application prospects, such as swarm control of drones, collaborative manipulation by robotic arms, and multi-target encirclement. However, potential security threats during the MARL deployment need more attention and thorough investigation. Recent research reveals that attackers can rapidly exploit the victim's v… ▽ More Recent advancements in multi-agent reinforcement learning (MARL) have opened up vast application prospects, such as swarm control of drones, collaborative manipulation by robotic arms, and multi-target encirclement. However, potential security threats during the MARL deployment need more attention and thorough investigation. Recent research reveals that attackers can rapidly exploit the victim's vulnerabilities, generating adversarial policies that result in the failure of specific tasks. For instance, reducing the winning rate of a superhuman-level Go AI to around 20%. Existing studies predominantly focus on two-player competitive environments, assuming attackers possess complete global state observation. In this study, we unveil, for the first time, the capability of attackers to generate adversarial policies even when restricted to partial observations of the victims in multi-agent competitive environments. Specifically, we propose a novel black-box attack (SUB-PLAY) that incorporates the concept of constructing multiple subgames to mitigate the impact of partial observability and suggests sharing transitions among subpolicies to improve attackers' exploitative ability. Extensive evaluations demonstrate the effectiveness of SUB-PLAY under three typical partial observability limitations. Visualization results indicate that adversarial policies induce significantly different activations of the victims' policy networks. Furthermore, we evaluate three potential defenses aimed at exploring ways to mitigate security threats posed by adversarial policies, providing constructive recommendations for deploying MARL in competitive environments. △ Less

Submitted 26 June, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

Comments: To appear in the ACM Conference on Computer and Communications Security (CCS'24), October 14-18, 2024, Salt Lake City, UT, USA

arXiv:2401.14027 [pdf, other]

The Risk of Federated Learning to Skew Fine-Tuning Features and Underperform Out-of-Distribution Robustness

Authors: Mengyao Du, Miao Zhang, Yuwen Pu, Kai Xu, Shouling Ji, Quanjun Yin

Abstract: To tackle the scarcity and privacy issues associated with domain-specific datasets, the integration of federated learning in conjunction with fine-tuning has emerged as a practical solution. However, our findings reveal that federated learning has the risk of skewing fine-tuning features and compromising the out-of-distribution robustness of the model. By introducing three robustness indicators an… ▽ More To tackle the scarcity and privacy issues associated with domain-specific datasets, the integration of federated learning in conjunction with fine-tuning has emerged as a practical solution. However, our findings reveal that federated learning has the risk of skewing fine-tuning features and compromising the out-of-distribution robustness of the model. By introducing three robustness indicators and conducting experiments across diverse robust datasets, we elucidate these phenomena by scrutinizing the diversity, transferability, and deviation within the model feature space. To mitigate the negative impact of federated learning on model robustness, we introduce GNP, a \underline{G}eneral \underline{N}oisy \underline{P}rojection-based robust algorithm, ensuring no deterioration of accuracy on the target distribution. Specifically, the key strategy for enhancing model robustness entails the transfer of robustness from the pre-trained model to the fine-tuned model, coupled with adding a small amount of Gaussian noise to augment the representative capacity of the model. Comprehensive experimental results demonstrate that our approach markedly enhances the robustness across diverse scenarios, encompassing various parameter-efficient fine-tuning methods and confronting different levels of data heterogeneity. △ Less

Submitted 25 January, 2024; originally announced January 2024.

Comments: 12 pages, 10 figures

arXiv:2401.13303 [pdf, other]

MaLA-500: Massive Language Adaptation of Large Language Models

Authors: Peiqin Lin, Shaoxiong Ji, Jörg Tiedemann, André F. T. Martins, Hinrich Schütze

Abstract: Large language models (LLMs) have advanced the state of the art in natural language processing. However, their predominant design for English or a limited set of languages creates a substantial gap in their effectiveness for low-resource languages. To bridge this gap, we introduce MaLA-500, a novel large language model designed to cover an extensive range of 534 languages. To train MaLA-500, we em… ▽ More Large language models (LLMs) have advanced the state of the art in natural language processing. However, their predominant design for English or a limited set of languages creates a substantial gap in their effectiveness for low-resource languages. To bridge this gap, we introduce MaLA-500, a novel large language model designed to cover an extensive range of 534 languages. To train MaLA-500, we employ vocabulary extension and continued pretraining on LLaMA 2 with Glot500-c. Our intrinsic evaluation demonstrates that MaLA-500 is better at predicting the given texts of low-resource languages than existing multilingual LLMs. Moreover, the extrinsic evaluation of in-context learning shows that MaLA-500 outperforms previous LLMs on SIB200 and Taxi1500 by a significant margin, i.e., 11.68% and 4.82% marco-average accuracy across languages. We release MaLA-500 at https://huggingface.co/MaLA-LM △ Less

Submitted 3 April, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

arXiv:2401.13281 [pdf, ps, other]

On the metric of the jet bundle and similarity on Dirichlet spaces

Authors: Kui Ji, Shanshan Ji, Hyun-Kyoung Kwon, Xiaoceng Liu, Jing Xu

Abstract: In general, it is more difficult to formulate a sufficient condition for similarity than a necessary condition. We give a sufficient condition for a Cowen-Douglas operator with a positivity condition to be similar to the backward shift operator on weighted Dirichlet space. This condition involves the holomorphic jet bundle of the eigenvector bundle of the operator. In general, it is more difficult to formulate a sufficient condition for similarity than a necessary condition. We give a sufficient condition for a Cowen-Douglas operator with a positivity condition to be similar to the backward shift operator on weighted Dirichlet space. This condition involves the holomorphic jet bundle of the eigenvector bundle of the operator. △ Less

Submitted 24 January, 2024; originally announced January 2024.

arXiv:2401.10417 [pdf, other]

doi 10.1145/3626202.3637569

SSR: Spatial Sequential Hybrid Architecture for Latency Throughput Tradeoff in Transformer Acceleration

Authors: Jinming Zhuang, Zhuoping Yang, Shixin Ji, Heng Huang, Alex K. Jones, Jingtong Hu, Yiyu Shi, Peipei Zhou

Abstract: With the increase in the computation intensity of the chip, the mismatch between computation layer shapes and the available computation resource significantly limits the utilization of the chip. Driven by this observation, prior works discuss spatial accelerators or dataflow architecture to maximize the throughput. However, using spatial accelerators could potentially increase the execution latenc… ▽ More With the increase in the computation intensity of the chip, the mismatch between computation layer shapes and the available computation resource significantly limits the utilization of the chip. Driven by this observation, prior works discuss spatial accelerators or dataflow architecture to maximize the throughput. However, using spatial accelerators could potentially increase the execution latency. In this work, we first systematically investigate two execution models: (1) sequentially (temporally) launch one monolithic accelerator, and (2) spatially launch multiple accelerators. From the observations, we find that there is a latency throughput tradeoff between these two execution models, and combining these two strategies together can give us a more efficient latency throughput Pareto front. To achieve this, we propose spatial sequential architecture (SSR) and SSR design automation framework to explore both strategies together when deploying deep learning inference. We use the 7nm AMD Versal ACAP VCK190 board to implement SSR accelerators for four end-to-end transformer-based deep learning models. SSR achieves average throughput gains of 2.53x, 35.71x, and 14.20x under different batch sizes compared to the 8nm Nvidia GPU A10G, 16nm AMD FPGAs ZCU102, and U250. The average energy efficiency gains are 8.51x, 6.75x, and 21.22x, respectively. Compared with the sequential-only solution and spatial-only solution on VCK190, our spatial-sequential-hybrid solutions achieve higher throughput under the same latency requirement and lower latency under the same throughput requirement. We also use SSR analytical models to demonstrate how to use SSR to optimize solutions on other computing platforms, e.g., 14nm Intel Stratix 10 NX. △ Less

Submitted 18 February, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

Journal ref: 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA '24)

arXiv:2401.06270 [pdf, other]

SCARIF: Towards Carbon Modeling of Cloud Servers with Accelerators

Authors: Shixin Ji, Zhuoping Yang, Xingzhen Chen, Stephen Cahoon, Jingtong Hu, Yiyu Shi, Alex K. Jones, Peipei Zhou

Abstract: Embodied carbon has been widely reported as a significant component in the full system lifecycle of various computing systems' green house gas emissions. Many efforts have been undertaken to quantify the elements that comprise this embodied carbon, from tools that evaluate semiconductor manufacturing to those that can quantify different elements of the computing system from commercial and academic… ▽ More Embodied carbon has been widely reported as a significant component in the full system lifecycle of various computing systems' green house gas emissions. Many efforts have been undertaken to quantify the elements that comprise this embodied carbon, from tools that evaluate semiconductor manufacturing to those that can quantify different elements of the computing system from commercial and academic sources. However, these tools cannot easily reproduce results reported by server vendors' product carbon reports and the accuracy can vary substantially due to various assumptions. Furthermore, attempts to determine green house gas contributions using bottom-up methodologies often do not agree with system-level studies and are hard to rectify. Nonetheless, given there is a need to consider all contributions to green house gas emissions in datacenters, we propose SCARIF, the Server Carbon including Accelerator Reporter with Intelligence-based Formulation tool. SCARIF has three main contributions: (1) We first collect reported carbon cost data from server vendors and design statistic models to predict the embodied carbon cost so that users can get the embodied carbon cost for their server configurations. (2) We provide embodied carbon cost if users configure servers with accelerators including GPUs, and FPGAs. (3) By using case studies, we show that certain design choices of data center management might flip by the insight and observation from using SCARIF. Thus, SCARIF provides an opportunity for large-scale datacenter and hyperscaler design. We release SCARIF as an open-source tool at https://github.com/arc-research-lab/SCARIF. △ Less

Submitted 22 May, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

Comments: 6 pages; 6 figures; 3 tables. Accepted by ISVLSI' 24

arXiv:2401.05561 [pdf, other]

TrustLLM: Trustworthiness in Large Language Models

Authors: Lichao Sun, Yue Huang, Haoran Wang, Siyuan Wu, Qihui Zhang, Yuan Li, Chujie Gao, Yixin Huang, Wenhan Lyu, Yixuan Zhang, Xiner Li, Zhengliang Liu, Yixin Liu, Yijue Wang, Zhikun Zhang, Bertie Vidgen, Bhavya Kailkhura, Caiming Xiong, Chaowei Xiao, Chunyuan Li, Eric Xing, Furong Huang, Hao Liu, Heng Ji, Hongyi Wang , et al. (45 additional authors not shown)

Abstract: Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Therefore, ensuring the trustworthiness of LLMs emerges as an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in… ▽ More Large language models (LLMs), exemplified by ChatGPT, have gained considerable attention for their excellent natural language processing capabilities. Nonetheless, these LLMs present many challenges, particularly in the realm of trustworthiness. Therefore, ensuring the trustworthiness of LLMs emerges as an important topic. This paper introduces TrustLLM, a comprehensive study of trustworthiness in LLMs, including principles for different dimensions of trustworthiness, established benchmark, evaluation, and analysis of trustworthiness for mainstream LLMs, and discussion of open challenges and future directions. Specifically, we first propose a set of principles for trustworthy LLMs that span eight different dimensions. Based on these principles, we further establish a benchmark across six dimensions including truthfulness, safety, fairness, robustness, privacy, and machine ethics. We then present a study evaluating 16 mainstream LLMs in TrustLLM, consisting of over 30 datasets. Our findings firstly show that in general trustworthiness and utility (i.e., functional effectiveness) are positively related. Secondly, our observations reveal that proprietary LLMs generally outperform most open-source counterparts in terms of trustworthiness, raising concerns about the potential risks of widely accessible open-source LLMs. However, a few open-source LLMs come very close to proprietary ones. Thirdly, it is important to note that some LLMs may be overly calibrated towards exhibiting trustworthiness, to the extent that they compromise their utility by mistakenly treating benign prompts as harmful and consequently not responding. Finally, we emphasize the importance of ensuring transparency not only in the models themselves but also in the technologies that underpin trustworthiness. Knowing the specific trustworthy technologies that have been employed is crucial for analyzing their effectiveness. △ Less

Submitted 17 March, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

Comments: This work is still under work and we welcome your contribution

arXiv:2401.03690 [pdf]

So You Want to Image Myelin Using MRI: Magnetic Susceptibility Source Separation for Myelin Imaging

Authors: Jongho Lee, Sooyeon Ji, Se-Hong Oh

Abstract: In MRI, researchers have long endeavored to effectively visualize myelin distribution in the brain, a pursuit with significant implications for both scientific research and clinical applications. Over time, various methods such as myelin water imaging, magnetization transfer imaging, and relaxometric imaging have been developed, each carrying distinct advantages and limitations. Recently, an innov… ▽ More In MRI, researchers have long endeavored to effectively visualize myelin distribution in the brain, a pursuit with significant implications for both scientific research and clinical applications. Over time, various methods such as myelin water imaging, magnetization transfer imaging, and relaxometric imaging have been developed, each carrying distinct advantages and limitations. Recently, an innovative technique named as magnetic susceptibility source separation has emerged, introducing a novel surrogate biomarker for myelin in the form of a diamagnetic susceptibility map. This paper comprehensively reviews this cutting-edge method, providing the fundamental concepts of magnetic susceptibility, susceptibility imaging, and the validation of the diamagnetic susceptibility map as a myelin biomarker that indirectly measure myelin content. Additionally, the paper explores essential aspects of data acquisition and processing, offering practical insights for readers. A comparison with established myelin imaging methods is also presented, and both current and prospective clinical and scientific applications are discussed to provide a holistic understanding of the technique. This work aims to serve as a foundational resource for newcomers entering this dynamic and rapidly expanding field. △ Less

Submitted 28 March, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

Comments: Accepted to Magnetic Resonance in Medical Sciences

arXiv:2401.02615 [pdf, other]

doi 10.1109/TIFS.2024.3350911

AdvSQLi: Generating Adversarial SQL Injections against Real-world WAF-as-a-service

Authors: Zhenqing Qu, Xiang Ling, Ting Wang, Xiang Chen, Shouling Ji, Chunming Wu

Abstract: As the first defensive layer that attacks would hit, the web application firewall (WAF) plays an indispensable role in defending against malicious web attacks like SQL injection (SQLi). With the development of cloud computing, WAF-as-a-service, as one kind of Security-as-a-service, has been proposed to facilitate the deployment, configuration, and update of WAFs in the cloud. Despite its tremendou… ▽ More As the first defensive layer that attacks would hit, the web application firewall (WAF) plays an indispensable role in defending against malicious web attacks like SQL injection (SQLi). With the development of cloud computing, WAF-as-a-service, as one kind of Security-as-a-service, has been proposed to facilitate the deployment, configuration, and update of WAFs in the cloud. Despite its tremendous popularity, the security vulnerabilities of WAF-as-a-service are still largely unknown, which is highly concerning given its massive usage. In this paper, we propose a general and extendable attack framework, namely AdvSQLi, in which a minimal series of transformations are performed on the hierarchical tree representation of the original SQLi payload, such that the generated SQLi payloads can not only bypass WAF-as-a-service under black-box settings but also keep the same functionality and maliciousness as the original payload. With AdvSQLi, we make it feasible to inspect and understand the security vulnerabilities of WAFs automatically, helping vendors make products more secure. To evaluate the attack effectiveness and efficiency of AdvSQLi, we first employ two public datasets to generate adversarial SQLi payloads, leading to a maximum attack success rate of 100% against state-of-the-art ML-based SQLi detectors. Furthermore, to demonstrate the immediate security threats caused by AdvSQLi, we evaluate the attack effectiveness against 7 WAF-as-a-service solutions from mainstream vendors and find all of them are vulnerable to AdvSQLi. For instance, AdvSQLi achieves an attack success rate of over 79% against the F5 WAF. Through in-depth analysis of the evaluation results, we further condense out several general yet severe flaws of these vendors that cannot be easily patched. △ Less

Submitted 9 January, 2024; v1 submitted 4 January, 2024; originally announced January 2024.

Comments: Accepted by IEEE Transactions on Information Forensics and Security (IEEE TIFS)

arXiv:2312.15507 [pdf, other]

doi 10.1145/3625687.3625812

Construct 3D Hand Skeleton with Commercial WiFi

Authors: Sijie Ji, Xuanye Zhang, Yuanqing Zheng, Mo Li

Abstract: This paper presents HandFi, which constructs hand skeletons with practical WiFi devices. Unlike previous WiFi hand sensing systems that primarily employ predefined gestures for pattern matching, by constructing the hand skeleton, HandFi can enable a variety of downstream WiFi-based hand sensing applications in gaming, healthcare, and smart homes. Deriving the skeleton from WiFi signals is challeng… ▽ More This paper presents HandFi, which constructs hand skeletons with practical WiFi devices. Unlike previous WiFi hand sensing systems that primarily employ predefined gestures for pattern matching, by constructing the hand skeleton, HandFi can enable a variety of downstream WiFi-based hand sensing applications in gaming, healthcare, and smart homes. Deriving the skeleton from WiFi signals is challenging, especially because the palm is a dominant reflector compared with fingers. HandFi develops a novel multi-task learning neural network with a series of customized loss functions to capture the low-level hand information from WiFi signals. During offline training, HandFi takes raw WiFi signals as input and uses the leap motion to provide supervision. During online use, only with commercial WiFi, HandFi is capable of producing 2D hand masks as well as 3D hand poses. We demonstrate that HandFi can serve as a foundation model to enable developers to build various applications such as finger tracking and sign language recognition, and outperform existing WiFi-based solutions. Artifacts can be found: https://github.com/SIJIEJI/HandFi △ Less

Submitted 24 December, 2023; originally announced December 2023.

Journal ref: ACM SenSys 2023

arXiv:2312.14677 [pdf, other]

MEAOD: Model Extraction Attack against Object Detectors

Authors: Zeyu Li, Chenghui Shi, Yuwen Pu, Xuhong Zhang, Yu Li, Jinbao Li, Shouling Ji

Abstract: The widespread use of deep learning technology across various industries has made deep neural network models highly valuable and, as a result, attractive targets for potential attackers. Model extraction attacks, particularly query-based model extraction attacks, allow attackers to replicate a substitute model with comparable functionality to the victim model and present a significant threat to th… ▽ More The widespread use of deep learning technology across various industries has made deep neural network models highly valuable and, as a result, attractive targets for potential attackers. Model extraction attacks, particularly query-based model extraction attacks, allow attackers to replicate a substitute model with comparable functionality to the victim model and present a significant threat to the confidentiality and security of MLaaS platforms. While many studies have explored threats of model extraction attacks against classification models in recent years, object detection models, which are more frequently used in real-world scenarios, have received less attention. In this paper, we investigate the challenges and feasibility of query-based model extraction attacks against object detection models and propose an effective attack method called MEAOD. It selects samples from the attacker-possessed dataset to construct an efficient query dataset using active learning and enhances the categories with insufficient objects. We additionally improve the extraction effectiveness by updating the annotations of the query dataset. According to our gray-box and black-box scenarios experiments, we achieve an extraction performance of over 70% under the given condition of a 10k query budget. △ Less

Submitted 22 December, 2023; originally announced December 2023.

arXiv:2312.14181 [pdf, other]

doi 10.1103/PhysRevB.109.155407

Reversal of Orbital Hall Conductivity and Emergence of Tunable Topological Quantum States in Orbital Hall Insulator

Authors: Shilei Ji, Chuye Quan, Ruijia Yao, Jianping Yang, Xing'ao Li

Abstract: Recent findings indicate that orbital angular momentum (OAM) has the capability to induce the intrinsic orbital Hall effect (OHE), which is characterized by orbital Chern number in the orbital Hall insulator. Unlike the spin-polarized channel in Quantum anomalous Hall insulator, the OAM is valley-locked, posing challenges in manipulating the corresponding edge state. Here we demonstrate the sign-r… ▽ More Recent findings indicate that orbital angular momentum (OAM) has the capability to induce the intrinsic orbital Hall effect (OHE), which is characterized by orbital Chern number in the orbital Hall insulator. Unlike the spin-polarized channel in Quantum anomalous Hall insulator, the OAM is valley-locked, posing challenges in manipulating the corresponding edge state. Here we demonstrate the sign-reversal orbital Chern number through strain engineering by combing the $k \cdot p$ model and first-principles calculation. Under the manipulation of strain, we observe the transfer of non-zero OAM from the valence band to the conduction band, aligning with the orbital contribution in the electronic structure. Our investigation reveals that electrons and holes with OAM exhibit opposing trajectories, resulting in a reversal of the orbital Hall conductivity. Furthermore, we explore the topological quantum state between the sign-reversible OHE. △ Less

Submitted 21 February, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

arXiv:2312.13305 [pdf, other]

DVIS++: Improved Decoupled Framework for Universal Video Segmentation

Authors: Tao Zhang, Xingye Tian, Yikang Zhou, Shunping Ji, Xuebo Wang, Xin Tao, Yuan Zhang, Pengfei Wan, Zhongyuan Wang, Yu Wu

Abstract: We present the \textbf{D}ecoupled \textbf{VI}deo \textbf{S}egmentation (DVIS) framework, a novel approach for the challenging task of universal video segmentation, including video instance segmentation (VIS), video semantic segmentation (VSS), and video panoptic segmentation (VPS). Unlike previous methods that model video segmentation in an end-to-end manner, our approach decouples video segmentat… ▽ More We present the \textbf{D}ecoupled \textbf{VI}deo \textbf{S}egmentation (DVIS) framework, a novel approach for the challenging task of universal video segmentation, including video instance segmentation (VIS), video semantic segmentation (VSS), and video panoptic segmentation (VPS). Unlike previous methods that model video segmentation in an end-to-end manner, our approach decouples video segmentation into three cascaded sub-tasks: segmentation, tracking, and refinement. This decoupling design allows for simpler and more effective modeling of the spatio-temporal representations of objects, especially in complex scenes and long videos. Accordingly, we introduce two novel components: the referring tracker and the temporal refiner. These components track objects frame by frame and model spatio-temporal representations based on pre-aligned features. To improve the tracking capability of DVIS, we propose a denoising training strategy and introduce contrastive learning, resulting in a more robust framework named DVIS++. Furthermore, we evaluate DVIS++ in various settings, including open vocabulary and using a frozen pre-trained backbone. By integrating CLIP with DVIS++, we present OV-DVIS++, the first open-vocabulary universal video segmentation framework. We conduct extensive experiments on six mainstream benchmarks, including the VIS, VSS, and VPS datasets. Using a unified architecture, DVIS++ significantly outperforms state-of-the-art specialized methods on these benchmarks in both close- and open-vocabulary settings. Code:~\url{https://github.com/zhang-tao-whu/DVIS_Plus}. △ Less

Submitted 19 December, 2023; originally announced December 2023.

arXiv:2312.10307 [pdf, other]

MusER: Musical Element-Based Regularization for Generating Symbolic Music with Emotion

Authors: Shulei Ji, Xinyu Yang

Abstract: Generating music with emotion is an important task in automatic music generation, in which emotion is evoked through a variety of musical elements (such as pitch and duration) that change over time and collaborate with each other. However, prior research on deep learning-based emotional music generation has rarely explored the contribution of different musical elements to emotions, let alone the d… ▽ More Generating music with emotion is an important task in automatic music generation, in which emotion is evoked through a variety of musical elements (such as pitch and duration) that change over time and collaborate with each other. However, prior research on deep learning-based emotional music generation has rarely explored the contribution of different musical elements to emotions, let alone the deliberate manipulation of these elements to alter the emotion of music, which is not conducive to fine-grained element-level control over emotions. To address this gap, we present a novel approach employing musical element-based regularization in the latent space to disentangle distinct elements, investigate their roles in distinguishing emotions, and further manipulate elements to alter musical emotions. Specifically, we propose a novel VQ-VAE-based model named MusER. MusER incorporates a regularization loss to enforce the correspondence between the musical element sequences and the specific dimensions of latent variable sequences, providing a new solution for disentangling discrete sequences. Taking advantage of the disentangled latent vectors, a two-level decoding strategy that includes multiple decoders attending to latent vectors with different semantics is devised to better predict the elements. By visualizing latent space, we conclude that MusER yields a disentangled and interpretable latent space and gain insights into the contribution of distinct elements to the emotional dimensions (i.e., arousal and valence). Experimental results demonstrate that MusER outperforms the state-of-the-art models for generating emotional music in both objective and subjective evaluation. Besides, we rearrange music through element transfer and attempt to alter the emotion of music by transferring emotion-distinguishable elements. △ Less

Submitted 1 January, 2024; v1 submitted 15 December, 2023; originally announced December 2023.

Comments: Accepted by AAAI 2024

arXiv:2312.09716 [pdf, other]

Let All be Whitened: Multi-teacher Distillation for Efficient Visual Retrieval

Authors: Zhe Ma, Jianfeng Dong, Shouling Ji, Zhenguang Liu, Xuhong Zhang, Zonghui Wang, Sifeng He, Feng Qian, Xiaobo Zhang, Lei Yang

Abstract: Visual retrieval aims to search for the most relevant visual items, e.g., images and videos, from a candidate gallery with a given query item. Accuracy and efficiency are two competing objectives in retrieval tasks. Instead of crafting a new method pursuing further improvement on accuracy, in this paper we propose a multi-teacher distillation framework Whiten-MTD, which is able to transfer knowled… ▽ More Visual retrieval aims to search for the most relevant visual items, e.g., images and videos, from a candidate gallery with a given query item. Accuracy and efficiency are two competing objectives in retrieval tasks. Instead of crafting a new method pursuing further improvement on accuracy, in this paper we propose a multi-teacher distillation framework Whiten-MTD, which is able to transfer knowledge from off-the-shelf pre-trained retrieval models to a lightweight student model for efficient visual retrieval. Furthermore, we discover that the similarities obtained by different retrieval models are diversified and incommensurable, which makes it challenging to jointly distill knowledge from multiple models. Therefore, we propose to whiten the output of teacher models before fusion, which enables effective multi-teacher distillation for retrieval models. Whiten-MTD is conceptually simple and practically effective. Extensive experiments on two landmark image retrieval datasets and one video retrieval dataset demonstrate the effectiveness of our proposed method, and its good balance of retrieval performance and efficiency. Our source code is released at https://github.com/Maryeon/whiten_mtd. △ Less

Submitted 15 December, 2023; originally announced December 2023.

Comments: Accepted by AAAI 2024

arXiv:2312.09057 [pdf, other]

On the Difficulty of Defending Contrastive Learning against Backdoor Attacks

Authors: Changjiang Li, Ren Pang, Bochuan Cao, Zhaohan Xi, Jinghui Chen, Shouling Ji, Ting Wang

Abstract: Recent studies have shown that contrastive learning, like supervised learning, is highly vulnerable to backdoor attacks wherein malicious functions are injected into target models, only to be activated by specific triggers. However, thus far it remains under-explored how contrastive backdoor attacks fundamentally differ from their supervised counterparts, which impedes the development of effective… ▽ More Recent studies have shown that contrastive learning, like supervised learning, is highly vulnerable to backdoor attacks wherein malicious functions are injected into target models, only to be activated by specific triggers. However, thus far it remains under-explored how contrastive backdoor attacks fundamentally differ from their supervised counterparts, which impedes the development of effective defenses against the emerging threat. This work represents a solid step toward answering this critical question. Specifically, we define TRL, a unified framework that encompasses both supervised and contrastive backdoor attacks. Through the lens of TRL, we uncover that the two types of attacks operate through distinctive mechanisms: in supervised attacks, the learning of benign and backdoor tasks tends to occur independently, while in contrastive attacks, the two tasks are deeply intertwined both in their representations and throughout their learning processes. This distinction leads to the disparate learning dynamics and feature distributions of supervised and contrastive attacks. More importantly, we reveal that the specificities of contrastive backdoor attacks entail important implications from a defense perspective: existing defenses for supervised attacks are often inadequate and not easily retrofitted to contrastive attacks. We also explore several alternative defenses and discuss their potential challenges. Our findings highlight the need for defenses tailored to the specificities of contrastive backdoor attacks, pointing to promising directions for future research. △ Less

Submitted 14 December, 2023; originally announced December 2023.

Comments: USENIX Security 24

arXiv:2312.09030 [pdf, other]

Dual Branch Network Towards Accurate Printed Mathematical Expression Recognition

Authors: Yuqing Wang, Zhenyu Weng, Zhaokun Zhou, Shuaijian Ji, Zhongjie Ye, Yuesheng Zhu

Abstract: Over the past years, Printed Mathematical Expression Recognition (PMER) has progressed rapidly. However, due to the insufficient context information captured by Convolutional Neural Networks, some mathematical symbols might be incorrectly recognized or missed. To tackle this problem, in this paper, a Dual Branch transformer-based Network (DBN) is proposed to learn both local and global context inf… ▽ More Over the past years, Printed Mathematical Expression Recognition (PMER) has progressed rapidly. However, due to the insufficient context information captured by Convolutional Neural Networks, some mathematical symbols might be incorrectly recognized or missed. To tackle this problem, in this paper, a Dual Branch transformer-based Network (DBN) is proposed to learn both local and global context information for accurate PMER. In our DBN, local and global features are extracted simultaneously, and a Context Coupling Module (CCM) is developed to complement the features between the global and local contexts. CCM adopts an interactive manner so that the coupled context clues are highly correlated to each expression symbol. Additionally, we design a Dynamic Soft Target (DST) strategy to utilize the similarities among symbol categories for reasonable label generation. Our experimental results have demonstrated that DBN can accurately recognize mathematical expressions and has achieved state-of-the-art performance. △ Less

Submitted 14 December, 2023; originally announced December 2023.

Comments: Published at ICANN 2022

arXiv:2312.06983 [pdf, other]

doi 10.1016/j.dsp.2023.104336

Robust Target Detection of Intelligent Integrated Optical Camera and mmWave Radar System

Authors: Chen Zhu, Zhouxiang Zhao, Zejing Shan, Lijie Yang, Sijie Ji, Zhaohui Yang, Zhaoyang Zhang

Abstract: Target detection is pivotal for modern urban computing applications. While image-based techniques are widely adopted, they falter under challenging environmental conditions such as adverse weather, poor lighting, and occlusion. To improve the target detection performance under complex real-world scenarios, this paper proposes an intelligent integrated optical camera and millimeter-wave (mmWave) ra… ▽ More Target detection is pivotal for modern urban computing applications. While image-based techniques are widely adopted, they falter under challenging environmental conditions such as adverse weather, poor lighting, and occlusion. To improve the target detection performance under complex real-world scenarios, this paper proposes an intelligent integrated optical camera and millimeter-wave (mmWave) radar system. Utilizing both physical knowledge and data-driven methods, a long-term robust radar-camera fusion algorithm is proposed to solve the heterogeneous data fusion problem for detection improvement. For the occlusion scenarios, the proposed algorithm can effectively detect occluded targets with the help of memory through performing long-term detection. For dark scenarios with low-light conditions, the proposed algorithm can effectively mark the target in the dark picture as well as provide rough stickman imaging. The above two innovative functions of the hybrid optical camera and mmWave radar system are tested in real-world scenarios. The results demonstrate the robustness and significant enhancement in the target detection performance of our integrated system. △ Less

Submitted 12 December, 2023; originally announced December 2023.

arXiv:2312.03378 [pdf, other]

Riemannian Complex Matrix Convolution Network for PolSAR Image Classification

Authors: Junfei Shi, Wei Wang, Haiyan Jin, Mengmeng Nie, Shanshan Ji

Abstract: Recently, deep learning methods have achieved superior performance for Polarimetric Synthetic Aperture Radar(PolSAR) image classification. Existing deep learning methods learn PolSAR data by converting the covariance matrix into a feature vector or complex-valued vector as the input. However, all these methods cannot learn the structure of complex matrix directly and destroy the channel correlatio… ▽ More Recently, deep learning methods have achieved superior performance for Polarimetric Synthetic Aperture Radar(PolSAR) image classification. Existing deep learning methods learn PolSAR data by converting the covariance matrix into a feature vector or complex-valued vector as the input. However, all these methods cannot learn the structure of complex matrix directly and destroy the channel correlation. To learn geometric structure of complex matrix, we propose a Riemannian complex matrix convolution network for PolSAR image classification in Riemannian space for the first time, which directly utilizes the complex matrix as the network input and defines the Riemannian operations to learn complex matrix's features. The proposed Riemannian complex matrix convolution network considers PolSAR complex matrix endowed in Riemannian manifold, and defines a series of new Riemannian convolution, ReLu and LogEig operations in Riemannian space, which breaks through the Euclidean constraint of conventional networks. Then, a CNN module is appended to enhance contextual Riemannian features. Besides, a fast kernel learning method is developed for the proposed method to learn class-specific features and reduce the computation time effectively. Experiments are conducted on three sets of real PolSAR data with different bands and sensors. Experiments results demonstrates the proposed method can obtain superior performance than the state-of-the-art methods. △ Less

Submitted 6 December, 2023; originally announced December 2023.

arXiv:2311.17400 [pdf, other]

doi 10.14722/ndss.2024.24115

Improving the Robustness of Transformer-based Large Language Models with Dynamic Attention

Authors: Lujia Shen, Yuwen Pu, Shouling Ji, Changjiang Li, Xuhong Zhang, Chunpeng Ge, Ting Wang

Abstract: Transformer-based models, such as BERT and GPT, have been widely adopted in natural language processing (NLP) due to their exceptional performance. However, recent studies show their vulnerability to textual adversarial attacks where the model's output can be misled by intentionally manipulating the text inputs. Despite various methods that have been proposed to enhance the model's robustness and… ▽ More Transformer-based models, such as BERT and GPT, have been widely adopted in natural language processing (NLP) due to their exceptional performance. However, recent studies show their vulnerability to textual adversarial attacks where the model's output can be misled by intentionally manipulating the text inputs. Despite various methods that have been proposed to enhance the model's robustness and mitigate this vulnerability, many require heavy consumption resources (e.g., adversarial training) or only provide limited protection (e.g., defensive dropout). In this paper, we propose a novel method called dynamic attention, tailored for the transformer architecture, to enhance the inherent robustness of the model itself against various adversarial attacks. Our method requires no downstream task knowledge and does not incur additional costs. The proposed dynamic attention consists of two modules: (I) attention rectification, which masks or weakens the attention value of the chosen tokens, and (ii) dynamic modeling, which dynamically builds the set of candidate tokens. Extensive experiments demonstrate that dynamic attention significantly mitigates the impact of adversarial attacks, improving up to 33\% better performance than previous methods against widely-used adversarial attacks. The model-level design of dynamic attention enables it to be easily combined with other defense methods (e.g., adversarial training) to further enhance the model's robustness. Furthermore, we demonstrate that dynamic attention preserves the state-of-the-art robustness space of the original model compared to other dynamic modeling methods. △ Less

Submitted 29 November, 2023; v1 submitted 29 November, 2023; originally announced November 2023.

arXiv:2311.16417 [pdf, other]

Challenges and Opportunities to Enable Large-Scale Computing via Heterogeneous Chiplets

Authors: Zhuoping Yang, Shixin Ji, Xingzhen Chen, Jinming Zhuang, Weifeng Zhang, Dharmesh Jani, Peipei Zhou

Abstract: Fast-evolving artificial intelligence (AI) algorithms such as large language models have been driving the ever-increasing computing demands in today's data centers. Heterogeneous computing with domain-specific architectures (DSAs) brings many opportunities when scaling up and scaling out the computing system. In particular, heterogeneous chiplet architecture is favored to keep scaling up and scali… ▽ More Fast-evolving artificial intelligence (AI) algorithms such as large language models have been driving the ever-increasing computing demands in today's data centers. Heterogeneous computing with domain-specific architectures (DSAs) brings many opportunities when scaling up and scaling out the computing system. In particular, heterogeneous chiplet architecture is favored to keep scaling up and scaling out the system as well as to reduce the design complexity and the cost stemming from the traditional monolithic chip design. However, how to interconnect computing resources and orchestrate heterogeneous chiplets is the key to success. In this paper, we first discuss the diversity and evolving demands of different AI workloads. We discuss how chiplet brings better cost efficiency and shorter time to market. Then we discuss the challenges in establishing chiplet interface standards, packaging, and security issues. We further discuss the software programming challenges in chiplet systems. △ Less

Submitted 4 March, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

arXiv:2311.11267 [pdf, other]

Rethinking Large Language Models in Mental Health Applications

Authors: Shaoxiong Ji, Tianlin Zhang, Kailai Yang, Sophia Ananiadou, Erik Cambria

Abstract: Large Language Models (LLMs) have become valuable assets in mental health, showing promise in both classification tasks and counseling applications. This paper offers a perspective on using LLMs in mental health applications. It discusses the instability of generative models for prediction and the potential for generating hallucinatory outputs, underscoring the need for ongoing audits and evaluati… ▽ More Large Language Models (LLMs) have become valuable assets in mental health, showing promise in both classification tasks and counseling applications. This paper offers a perspective on using LLMs in mental health applications. It discusses the instability of generative models for prediction and the potential for generating hallucinatory outputs, underscoring the need for ongoing audits and evaluations to maintain their reliability and dependability. The paper also distinguishes between the often interchangeable terms ``explainability'' and ``interpretability'', advocating for developing inherently interpretable methods instead of relying on potentially hallucinated self-explanations generated by LLMs. Despite the advancements in LLMs, human counselors' empathetic understanding, nuanced interpretation, and contextual awareness remain irreplaceable in the sensitive and complex realm of mental health counseling. The use of LLMs should be approached with a judicious and considerate mindset, viewing them as tools that complement human expertise rather than seeking to replace it. △ Less

Submitted 17 December, 2023; v1 submitted 19 November, 2023; originally announced November 2023.

arXiv:2311.07277 [pdf, other]

AdaCCD: Adaptive Semantic Contrasts Discovery Based Cross Lingual Adaptation for Code Clone Detection

Authors: Yangkai Du, Tengfei Ma, Lingfei Wu, Xuhong Zhang, Shouling Ji

Abstract: Code Clone Detection, which aims to retrieve functionally similar programs from large code bases, has been attracting increasing attention. Modern software often involves a diverse range of programming languages. However, current code clone detection methods are generally limited to only a few popular programming languages due to insufficient annotated data as well as their own model design constr… ▽ More Code Clone Detection, which aims to retrieve functionally similar programs from large code bases, has been attracting increasing attention. Modern software often involves a diverse range of programming languages. However, current code clone detection methods are generally limited to only a few popular programming languages due to insufficient annotated data as well as their own model design constraints. To address these issues, we present AdaCCD, a novel cross-lingual adaptation method that can detect cloned codes in a new language without annotations in that language. AdaCCD leverages language-agnostic code representations from pre-trained programming language models and propose an Adaptively Refined Contrastive Learning framework to transfer knowledge from resource-rich languages to resource-poor languages. We evaluate the cross-lingual adaptation results of AdaCCD by constructing a multilingual code clone detection benchmark consisting of 5 programming languages. AdaCCD achieves significant improvements over other baselines, and achieve comparable performance to supervised fine-tuning. △ Less

Submitted 6 March, 2024; v1 submitted 13 November, 2023; originally announced November 2023.

arXiv:2311.06530 [pdf, other]

Exploring ChatGPT's Capabilities on Vulnerability Management

Authors: Peiyu Liu, Junming Liu, Lirong Fu, Kangjie Lu, Yifan Xia, Xuhong Zhang, Wenzhi Chen, Haiqin Weng, Shouling Ji, Wenhai Wang

Abstract: Recently, ChatGPT has attracted great attention from the code analysis domain. Prior works show that ChatGPT has the capabilities of processing foundational code analysis tasks, such as abstract syntax tree generation, which indicates the potential of using ChatGPT to comprehend code syntax and static behaviors. However, it is unclear whether ChatGPT can complete more complicated real-world vulner… ▽ More Recently, ChatGPT has attracted great attention from the code analysis domain. Prior works show that ChatGPT has the capabilities of processing foundational code analysis tasks, such as abstract syntax tree generation, which indicates the potential of using ChatGPT to comprehend code syntax and static behaviors. However, it is unclear whether ChatGPT can complete more complicated real-world vulnerability management tasks, such as the prediction of security relevance and patch correctness, which require an all-encompassing understanding of various aspects, including code syntax, program semantics, and related manual comments. In this paper, we explore ChatGPT's capabilities on 6 tasks involving the complete vulnerability management process with a large-scale dataset containing 70,346 samples. For each task, we compare ChatGPT against SOTA approaches, investigate the impact of different prompts, and explore the difficulties. The results suggest promising potential in leveraging ChatGPT to assist vulnerability management. One notable example is ChatGPT's proficiency in tasks like generating titles for software bug reports. Furthermore, our findings reveal the difficulties encountered by ChatGPT and shed light on promising future directions. For instance, directly providing random demonstration examples in the prompt cannot consistently guarantee good performance in vulnerability management. By contrast, leveraging ChatGPT in a self-heuristic way -- extracting expertise from demonstration examples itself and integrating the extracted expertise in the prompt is a promising research direction. Besides, ChatGPT may misunderstand and misuse the information in the prompt. Consequently, effectively guiding ChatGPT to focus on helpful information rather than the irrelevant content is still an open problem. △ Less

Submitted 20 June, 2024; v1 submitted 11 November, 2023; originally announced November 2023.

Comments: Accepted by USENIX Security 2024

arXiv:2311.02295 [pdf, ps, other]

On the irreducibility and weakly homogeneity of a class of operators

Authors: Shanshan Ji, Xiaomeng Wei

Abstract: To construct more homogeneous operators, B. Bagchi and G. Misra in \cite{d} introduced the operator $\left(\begin{smallmatrix} T_0 & T_0-T_1 \\ 0 & T_1\\ \end{smallmatrix}\right)$ and proved that when $T_0$ and $T_1$ are homogeneous operators with the same unitary representation $U(g)$, it is homogeneous with associated representation $U(g)\oplus U(g)$. At the same time, they asked an open questio… ▽ More To construct more homogeneous operators, B. Bagchi and G. Misra in \cite{d} introduced the operator $\left(\begin{smallmatrix} T_0 & T_0-T_1 \\ 0 & T_1\\ \end{smallmatrix}\right)$ and proved that when $T_0$ and $T_1$ are homogeneous operators with the same unitary representation $U(g)$, it is homogeneous with associated representation $U(g)\oplus U(g)$. At the same time, they asked an open question, is the constructed operator irreducible? A. Kor$\acute{a}$nyi in \cite{e} showed that when the (1,2)-entry of the matrix is $α(T_0-T_1)$, $α\in\mathbb{C}$ the above result is also valid, and their unitary equivalence class depends only on $|α|$. In this case, he and S. Hazra \cite{f} gave a large class of irreducible homogeneous bilateral $2\times2$ block shifts, respectively, which are mutually unitarily inequivalent for $α>0$. In this note, we generalize the construction to $T=\left(\begin{smallmatrix} T_0 & XT_1-T_0X \\ 0 & T_1\\ \end{smallmatrix}\right)$ and provide some sufficient conditions for its irreducibility. We also find that for the above-mentioned $T_0,T_1$ and non-scalar operator $X$, $T$ is weakly homogeneous rather than homogeneous. So the weak homogeneity problem related to $T$ is investigated. △ Less

Submitted 3 November, 2023; originally announced November 2023.

arXiv:2310.17304 [pdf, other]

Static Semantics Reconstruction for Enhancing JavaScript-WebAssembly Multilingual Malware Detection

Authors: Yifan Xia, Ping He, Xuhong Zhang, Peiyu Liu, Shouling Ji, Wenhai Wang

Abstract: The emergence of WebAssembly allows attackers to hide the malicious functionalities of JavaScript malware in cross-language interoperations, termed JavaScript-WebAssembly multilingual malware (JWMM). However, existing anti-virus solutions based on static program analysis are still limited to monolingual code. As a result, their detection effectiveness decreases significantly against JWMM. The dete… ▽ More The emergence of WebAssembly allows attackers to hide the malicious functionalities of JavaScript malware in cross-language interoperations, termed JavaScript-WebAssembly multilingual malware (JWMM). However, existing anti-virus solutions based on static program analysis are still limited to monolingual code. As a result, their detection effectiveness decreases significantly against JWMM. The detection of JWMM is challenging due to the complex interoperations and semantic diversity between JavaScript and WebAssembly. To bridge this gap, we present JWBinder, the first technique aimed at enhancing the static detection of JWMM. JWBinder performs a language-specific data-flow analysis to capture the cross-language interoperations and then characterizes the functionalities of JWMM through a unified high-level structure called Inter-language Program Dependency Graph. The extensive evaluation on one of the most representative real-world anti-virus platforms, VirusTotal, shows that \system effectively enhances anti-virus systems from various vendors and increases the overall successful detection rate against JWMM from 49.1\% to 86.2\%. Additionally, we assess the side effects and runtime overhead of JWBinder, corroborating its practical viability in real-world applications. △ Less

Submitted 19 April, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

Comments: Accepted to ESORICS 2023

arXiv:2310.16853 [pdf, other]

CP-BCS: Binary Code Summarization Guided by Control Flow Graph and Pseudo Code

Authors: Tong Ye, Lingfei Wu, Tengfei Ma, Xuhong Zhang, Yangkai Du, Peiyu Liu, Shouling Ji, Wenhai Wang

Abstract: Automatically generating function summaries for binaries is an extremely valuable but challenging task, since it involves translating the execution behavior and semantics of the low-level language (assembly code) into human-readable natural language. However, most current works on understanding assembly code are oriented towards generating function names, which involve numerous abbreviations that… ▽ More Automatically generating function summaries for binaries is an extremely valuable but challenging task, since it involves translating the execution behavior and semantics of the low-level language (assembly code) into human-readable natural language. However, most current works on understanding assembly code are oriented towards generating function names, which involve numerous abbreviations that make them still confusing. To bridge this gap, we focus on generating complete summaries for binary functions, especially for stripped binary (no symbol table and debug information in reality). To fully exploit the semantics of assembly code, we present a control flow graph and pseudo code guided binary code summarization framework called CP-BCS. CP-BCS utilizes a bidirectional instruction-level control flow graph and pseudo code that incorporates expert knowledge to learn the comprehensive binary function execution behavior and logic semantics. We evaluate CP-BCS on 3 different binary optimization levels (O1, O2, and O3) for 3 different computer architectures (X86, X64, and ARM). The evaluation results demonstrate CP-BCS is superior and significantly improves the efficiency of reverse engineering. △ Less

Submitted 24 October, 2023; originally announced October 2023.

Comments: EMNLP 2023 Main Conference

arXiv:2310.15590 [pdf, other]

Facial Data Minimization: Shallow Model as Your Privacy Filter

Authors: Yuwen Pu, Jiahao Chen, Jiayu Pan, Hao li, Diqun Yan, Xuhong Zhang, Shouling Ji

Abstract: Face recognition service has been used in many fields and brings much convenience to people. However, once the user's facial data is transmitted to a service provider, the user will lose control of his/her private data. In recent years, there exist various security and privacy issues due to the leakage of facial data. Although many privacy-preserving methods have been proposed, they usually fail w… ▽ More Face recognition service has been used in many fields and brings much convenience to people. However, once the user's facial data is transmitted to a service provider, the user will lose control of his/her private data. In recent years, there exist various security and privacy issues due to the leakage of facial data. Although many privacy-preserving methods have been proposed, they usually fail when they are not accessible to adversaries' strategies or auxiliary data. Hence, in this paper, by fully considering two cases of uploading facial images and facial features, which are very typical in face recognition service systems, we proposed a data privacy minimization transformation (PMT) method. This method can process the original facial data based on the shallow model of authorized services to obtain the obfuscated data. The obfuscated data can not only maintain satisfactory performance on authorized models and restrict the performance on other unauthorized models but also prevent original privacy data from leaking by AI methods and human visual theft. Additionally, since a service provider may execute preprocessing operations on the received data, we also propose an enhanced perturbation method to improve the robustness of PMT. Besides, to authorize one facial image to multiple service models simultaneously, a multiple restriction mechanism is proposed to improve the scalability of PMT. Finally, we conduct extensive experiments and evaluate the effectiveness of the proposed PMT in defending against face reconstruction, data abuse, and face attribute estimation attacks. These experimental results demonstrate that PMT performs well in preventing facial data abuse and privacy leakage while maintaining face recognition accuracy. △ Less

Submitted 12 November, 2023; v1 submitted 24 October, 2023; originally announced October 2023.

Comments: 14 pages, 11 figures

arXiv:2310.14561 [pdf, other]

F$^2$AT: Feature-Focusing Adversarial Training via Disentanglement of Natural and Perturbed Patterns

Authors: Yaguan Qian, Chenyu Zhao, Zhaoquan Gu, Bin Wang, Shouling Ji, Wei Wang, Boyang Zhou, Pan Zhou

Abstract: Deep neural networks (DNNs) are vulnerable to adversarial examples crafted by well-designed perturbations. This could lead to disastrous results on critical applications such as self-driving cars, surveillance security, and medical diagnosis. At present, adversarial training is one of the most effective defenses against adversarial examples. However, traditional adversarial training makes it diffi… ▽ More Deep neural networks (DNNs) are vulnerable to adversarial examples crafted by well-designed perturbations. This could lead to disastrous results on critical applications such as self-driving cars, surveillance security, and medical diagnosis. At present, adversarial training is one of the most effective defenses against adversarial examples. However, traditional adversarial training makes it difficult to achieve a good trade-off between clean accuracy and robustness since spurious features are still learned by DNNs. The intrinsic reason is that traditional adversarial training makes it difficult to fully learn core features from adversarial examples when adversarial noise and clean examples cannot be disentangled. In this paper, we disentangle the adversarial examples into natural and perturbed patterns by bit-plane slicing. We assume the higher bit-planes represent natural patterns and the lower bit-planes represent perturbed patterns, respectively. We propose a Feature-Focusing Adversarial Training (F$^2$AT), which differs from previous work in that it enforces the model to focus on the core features from natural patterns and reduce the impact of spurious features from perturbed patterns. The experimental results demonstrated that F$^2$AT outperforms state-of-the-art methods in clean accuracy and adversarial robustness. △ Less

Submitted 23 October, 2023; originally announced October 2023.

arXiv:2310.11092 [pdf, other]

DORec: Decomposed Object Reconstruction Utilizing 2D Self-Supervised Features

Authors: Jun Wu, Sicheng Li, Sihui Ji, Yue Wang, Rong Xiong, Yiyi Liao

Abstract: Decomposing a target object from a complex background while reconstructing is challenging. Most approaches acquire the perception for object instances through the use of manual labels, but the annotation procedure is costly. The recent advancements in 2D self-supervised learning have brought new prospects to object-aware representation, yet it remains unclear how to leverage such noisy 2D features… ▽ More Decomposing a target object from a complex background while reconstructing is challenging. Most approaches acquire the perception for object instances through the use of manual labels, but the annotation procedure is costly. The recent advancements in 2D self-supervised learning have brought new prospects to object-aware representation, yet it remains unclear how to leverage such noisy 2D features for clean decomposition. In this paper, we propose a Decomposed Object Reconstruction (DORec) network based on neural implicit representations. Our key idea is to transfer 2D self-supervised features into masks of two levels of granularity to supervise the decomposition, including a binary mask to indicate the foreground regions and a K-cluster mask to indicate the semantically similar regions. These two masks are complementary to each other and lead to robust decomposition. Experimental results show the superiority of DORec in segmenting and reconstructing the foreground object on various datasets. △ Less

Submitted 19 October, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

arXiv:2310.04722 [pdf, other]

A Holistic Evaluation of Piano Sound Quality

Authors: Monan Zhou, Shangda Wu, Shaohua Ji, Zijin Li, Wei Li

Abstract: This paper aims to develop a holistic evaluation method for piano sound quality to assist in purchasing decisions. Unlike previous studies that focused on the effect of piano performance techniques on sound quality, this study evaluates the inherent sound quality of different pianos. To derive quality evaluation systems, the study uses subjective questionnaires based on a piano sound quality datas… ▽ More This paper aims to develop a holistic evaluation method for piano sound quality to assist in purchasing decisions. Unlike previous studies that focused on the effect of piano performance techniques on sound quality, this study evaluates the inherent sound quality of different pianos. To derive quality evaluation systems, the study uses subjective questionnaires based on a piano sound quality dataset. The method selects the optimal piano classification models by comparing the fine-tuning results of different pre-training models of Convolutional Neural Networks (CNN). To improve the interpretability of the models, the study applies Equivalent Rectangular Bandwidth (ERB) analysis. The results reveal that musically trained individuals are better able to distinguish between the sound quality differences of different pianos. The best fine-tuned CNN pre-trained backbone achieves a high accuracy of 98.3\% as the piano classifier. However, the dataset is limited, and the audio is sliced to increase its quantity, resulting in a lack of diversity and balance, so we use focal loss to reduce the impact of data imbalance. To optimize the method, the dataset will be expanded, or few-shot learning techniques will be employed in future research. △ Less

Submitted 7 October, 2023; originally announced October 2023.

arXiv:2310.01580 [pdf, other]

Active Learning on Neural Networks through Interactive Generation of Digit Patterns and Visual Representation

Authors: Dong H. Jeong, Jin-Hee Cho, Feng Chen, Audun Josang, Soo-Yeon Ji

Abstract: Artificial neural networks (ANNs) have been broadly utilized to analyze various data and solve different domain problems. However, neural networks (NNs) have been considered a black box operation for years because their underlying computation and meaning are hidden. Due to this nature, users often face difficulties in interpreting the underlying mechanism of the NNs and the benefits of using them.… ▽ More Artificial neural networks (ANNs) have been broadly utilized to analyze various data and solve different domain problems. However, neural networks (NNs) have been considered a black box operation for years because their underlying computation and meaning are hidden. Due to this nature, users often face difficulties in interpreting the underlying mechanism of the NNs and the benefits of using them. In this paper, to improve users' learning and understanding of NNs, an interactive learning system is designed to create digit patterns and recognize them in real time. To help users clearly understand the visual differences of digit patterns (i.e., 0 ~ 9) and their results with an NN, integrating visualization is considered to present all digit patterns in a two-dimensional display space with supporting multiple user interactions. An evaluation with multiple datasets is conducted to determine its usability for active learning. In addition, informal user testing is managed during a summer workshop by asking the workshop participants to use the system. △ Less

Submitted 2 October, 2023; originally announced October 2023.

arXiv:2309.15132 [pdf, other]

Genetic InfoMax: Exploring Mutual Information Maximization in High-Dimensional Imaging Genetics Studies

Authors: Yaochen Xie, Ziqian Xie, Sheikh Muhammad Saiful Islam, Degui Zhi, Shuiwang Ji

Abstract: Genome-wide association studies (GWAS) are used to identify relationships between genetic variations and specific traits. When applied to high-dimensional medical imaging data, a key step is to extract lower-dimensional, yet informative representations of the data as traits. Representation learning for imaging genetics is largely under-explored due to the unique challenges posed by GWAS in compari… ▽ More Genome-wide association studies (GWAS) are used to identify relationships between genetic variations and specific traits. When applied to high-dimensional medical imaging data, a key step is to extract lower-dimensional, yet informative representations of the data as traits. Representation learning for imaging genetics is largely under-explored due to the unique challenges posed by GWAS in comparison to typical visual representation learning. In this study, we tackle this problem from the mutual information (MI) perspective by identifying key limitations of existing methods. We introduce a trans-modal learning framework Genetic InfoMax (GIM), including a regularized MI estimator and a novel genetics-informed transformer to address the specific challenges of GWAS. We evaluate GIM on human brain 3D MRI data and establish standardized evaluation protocols to compare it to existing approaches. Our results demonstrate the effectiveness of GIM and a significantly improved performance on GWAS. △ Less

Submitted 25 September, 2023; originally announced September 2023.

Comments: 17 pages, 7 figures

arXiv:2309.14742 [pdf, other]

SyzTrust: State-aware Fuzzing on Trusted OS Designed for IoT Devices

Authors: Qinying Wang, Boyu Chang, Shouling Ji, Yuan Tian, Xuhong Zhang, Binbin Zhao, Gaoning Pan, Chenyang Lyu, Mathias Payer, Wenhai Wang, Raheem Beyah

Abstract: Trusted Execution Environments (TEEs) embedded in IoT devices provide a deployable solution to secure IoT applications at the hardware level. By design, in TEEs, the Trusted Operating System (Trusted OS) is the primary component. It enables the TEE to use security-based design techniques, such as data encryption and identity authentication. Once a Trusted OS has been exploited, the TEE can no long… ▽ More Trusted Execution Environments (TEEs) embedded in IoT devices provide a deployable solution to secure IoT applications at the hardware level. By design, in TEEs, the Trusted Operating System (Trusted OS) is the primary component. It enables the TEE to use security-based design techniques, such as data encryption and identity authentication. Once a Trusted OS has been exploited, the TEE can no longer ensure security. However, Trusted OSes for IoT devices have received little security analysis, which is challenging from several perspectives: (1) Trusted OSes are closed-source and have an unfavorable environment for sending test cases and collecting feedback. (2) Trusted OSes have complex data structures and require a stateful workflow, which limits existing vulnerability detection tools. To address the challenges, we present SyzTrust, the first state-aware fuzzing framework for vetting the security of resource-limited Trusted OSes. SyzTrust adopts a hardware-assisted framework to enable fuzzing Trusted OSes directly on IoT devices as well as tracking state and code coverage non-invasively. SyzTrust utilizes composite feedback to guide the fuzzer to effectively explore more states as well as to increase the code coverage. We evaluate SyzTrust on Trusted OSes from three major vendors: Samsung, Tsinglink Cloud, and Ali Cloud. These systems run on Cortex M23/33 MCUs, which provide the necessary abstraction for embedded TEEs. We discovered 70 previously unknown vulnerabilities in their Trusted OSes, receiving 10 new CVEs so far. Furthermore, compared to the baseline, SyzTrust has demonstrated significant improvements, including 66% higher code coverage, 651% higher state coverage, and 31% improved vulnerability-finding capability. We report all discovered new vulnerabilities to vendors and open source SyzTrust. △ Less

Submitted 26 September, 2023; originally announced September 2023.

Comments: To appear in the IEEE Symposium on Security and Privacy (IEEE S&P) 2024, San Francisco, CA, USA

arXiv:2309.13446 [pdf, other]

Video Timeline Modeling For News Story Understanding

Authors: Meng Liu, Mingda Zhang, Jialu Liu, Hanjun Dai, Ming-Hsuan Yang, Shuiwang Ji, Zheyun Feng, Boqing Gong

Abstract: In this paper, we present a novel problem, namely video timeline modeling. Our objective is to create a video-associated timeline from a set of videos related to a specific topic, thereby facilitating the content and structure understanding of the story being told. This problem has significant potential in various real-world applications, for instance, news story summarization. To bootstrap resear… ▽ More In this paper, we present a novel problem, namely video timeline modeling. Our objective is to create a video-associated timeline from a set of videos related to a specific topic, thereby facilitating the content and structure understanding of the story being told. This problem has significant potential in various real-world applications, for instance, news story summarization. To bootstrap research in this area, we curate a realistic benchmark dataset, YouTube-News-Timeline, consisting of over $12$k timelines and $300$k YouTube news videos. Additionally, we propose a set of quantitative metrics to comprehensively evaluate and compare methodologies. With such a testbed, we further develop and benchmark several deep learning approaches to tackling this problem. We anticipate that this exploratory work will pave the way for further research in video timeline modeling. The assets are available via https://github.com/google-research/google-research/tree/master/video_timeline_modeling. △ Less

Submitted 27 October, 2023; v1 submitted 23 September, 2023; originally announced September 2023.

Comments: Accepted as a spotlight by NeurIPS 2023, Track on Datasets and Benchmarks

arXiv:2309.13256 [pdf, other]

Defending Pre-trained Language Models as Few-shot Learners against Backdoor Attacks

Authors: Zhaohan Xi, Tianyu Du, Changjiang Li, Ren Pang, Shouling Ji, Jinghui Chen, Fenglong Ma, Ting Wang

Abstract: Pre-trained language models (PLMs) have demonstrated remarkable performance as few-shot learners. However, their security risks under such settings are largely unexplored. In this work, we conduct a pilot study showing that PLMs as few-shot learners are highly vulnerable to backdoor attacks while existing defenses are inadequate due to the unique challenges of few-shot scenarios. To address such c… ▽ More Pre-trained language models (PLMs) have demonstrated remarkable performance as few-shot learners. However, their security risks under such settings are largely unexplored. In this work, we conduct a pilot study showing that PLMs as few-shot learners are highly vulnerable to backdoor attacks while existing defenses are inadequate due to the unique challenges of few-shot scenarios. To address such challenges, we advocate MDP, a novel lightweight, pluggable, and effective defense for PLMs as few-shot learners. Specifically, MDP leverages the gap between the masking-sensitivity of poisoned and clean samples: with reference to the limited few-shot data as distributional anchors, it compares the representations of given samples under varying masking and identifies poisoned samples as ones with significant variations. We show analytically that MDP creates an interesting dilemma for the attacker to choose between attack effectiveness and detection evasiveness. The empirical evaluation using benchmark datasets and representative attacks validates the efficacy of MDP. △ Less

Submitted 23 September, 2023; originally announced September 2023.

Comments: Accepted by NeurIPS'23

arXiv:2309.08958 [pdf, other]

Monolingual or Multilingual Instruction Tuning: Which Makes a Better Alpaca

Authors: Pinzhen Chen, Shaoxiong Ji, Nikolay Bogoychev, Andrey Kutuzov, Barry Haddow, Kenneth Heafield

Abstract: Foundational large language models (LLMs) can be instruction-tuned to perform open-domain question answering, facilitating applications like chat assistants. While such efforts are often carried out in a single language, we empirically analyze cost-efficient strategies for multilingual scenarios. Our study employs the Alpaca dataset and machine translations of it to form multilingual data, which i… ▽ More Foundational large language models (LLMs) can be instruction-tuned to perform open-domain question answering, facilitating applications like chat assistants. While such efforts are often carried out in a single language, we empirically analyze cost-efficient strategies for multilingual scenarios. Our study employs the Alpaca dataset and machine translations of it to form multilingual data, which is then used to tune LLMs through either low-rank adaptation or full-parameter training. Under a controlled computation budget, comparisons show that multilingual tuning is on par or better than tuning a model for each language. Furthermore, multilingual tuning with downsampled data can be as powerful and more robust. Our findings serve as a guide for expanding language support through instruction tuning. △ Less

Submitted 30 January, 2024; v1 submitted 16 September, 2023; originally announced September 2023.

Comments: Accepted to Findings of ACL: EACL 2024. Added human evaluation and shortened writing

arXiv:2309.06009 [pdf, other]

Content Reduction, Surprisal and Information Density Estimation for Long Documents

Authors: Shaoxiong Ji, Wei Sun, Pekka Marttinen

Abstract: Many computational linguistic methods have been proposed to study the information content of languages. We consider two interesting research questions: 1) how is information distributed over long documents, and 2) how does content reduction, such as token selection and text summarization, affect the information density in long documents. We present four criteria for information density estimation… ▽ More Many computational linguistic methods have been proposed to study the information content of languages. We consider two interesting research questions: 1) how is information distributed over long documents, and 2) how does content reduction, such as token selection and text summarization, affect the information density in long documents. We present four criteria for information density estimation for long documents, including surprisal, entropy, uniform information density, and lexical density. Among those criteria, the first three adopt the measures from information theory. We propose an attention-based word selection method for clinical notes and study machine summarization for multiple-domain documents. Our findings reveal the systematic difference in information density of long text in various domains. Empirical results on automated medical coding from long clinical notes show the effectiveness of the attention-based word selection method. △ Less

Submitted 12 September, 2023; originally announced September 2023.

arXiv:2309.04225 [pdf]

Long-Range Correlation Supervision for Land-Cover Classification from Remote Sensing Images

Authors: Dawen Yu, Shunping Ji

Abstract: Long-range dependency modeling has been widely considered in modern deep learning based semantic segmentation methods, especially those designed for large-size remote sensing images, to compensate the intrinsic locality of standard convolutions. However, in previous studies, the long-range dependency, modeled with an attention mechanism or transformer model, has been based on unsupervised learning… ▽ More Long-range dependency modeling has been widely considered in modern deep learning based semantic segmentation methods, especially those designed for large-size remote sensing images, to compensate the intrinsic locality of standard convolutions. However, in previous studies, the long-range dependency, modeled with an attention mechanism or transformer model, has been based on unsupervised learning, instead of explicit supervision from the objective ground truth. In this paper, we propose a novel supervised long-range correlation method for land-cover classification, called the supervised long-range correlation network (SLCNet), which is shown to be superior to the currently used unsupervised strategies. In SLCNet, pixels sharing the same category are considered highly correlated and those having different categories are less relevant, which can be easily supervised by the category consistency information available in the ground truth semantic segmentation map. Under such supervision, the recalibrated features are more consistent for pixels of the same category and more discriminative for pixels of other categories, regardless of their proximity. To complement the detailed information lacking in the global long-range correlation, we introduce an auxiliary adaptive receptive field feature extraction module, parallel to the long-range correlation module in the encoder, to capture finely detailed feature representations for multi-size objects in multi-scale remote sensing images. In addition, we apply multi-scale side-output supervision and a hybrid loss function as local and global constraints to further boost the segmentation accuracy. Experiments were conducted on three remote sensing datasets. Compared with the advanced segmentation methods from the computer vision, medicine, and remote sensing communities, the SLCNet achieved a state-of-the-art performance on all the datasets. △ Less

Submitted 8 September, 2023; originally announced September 2023.

Comments: 14 pages, 11 figures

arXiv:2309.03463 [pdf, ps, other]

Lower dimensional invariant tori for multi-scale Hamiltonian systems

Authors: Weichao Qian, Shuguan Ji, Yong Li

Abstract: The ``Fundamental Theorem" given by Arnold in [2] asserts the persistence of full dimensional invariant tori for 2-scale Hamiltonian systems. However, persistence in multi-scale systems is much more complicated and difficult. In this paper, we explore the persistence of lower dimensional invariant tori for multi-scale Hamiltonian systems, which play an important role in dynamics of resonant Hamilt… ▽ More The ``Fundamental Theorem" given by Arnold in [2] asserts the persistence of full dimensional invariant tori for 2-scale Hamiltonian systems. However, persistence in multi-scale systems is much more complicated and difficult. In this paper, we explore the persistence of lower dimensional invariant tori for multi-scale Hamiltonian systems, which play an important role in dynamics of resonant Hamiltonian systems. Moreover, using the corresponding results we give a quasi-periodic Poincaré Theorem for multi-scale Hamiltonian systems, i.e., at least $2^{m_0}$ families resonant tori survive small perturbations, where the integer $m_0$ is the multiplicity of resonance. △ Less

Submitted 6 September, 2023; originally announced September 2023.

arXiv:2309.03081 [pdf, other]

doi 10.14722/ndss.2024.23184

ORL-AUDITOR: Dataset Auditing in Offline Deep Reinforcement Learning

Authors: Linkang Du, Min Chen, Mingyang Sun, Shouling Ji, Peng Cheng, Jiming Chen, Zhikun Zhang

Abstract: Data is a critical asset in AI, as high-quality datasets can significantly improve the performance of machine learning models. In safety-critical domains such as autonomous vehicles, offline deep reinforcement learning (offline DRL) is frequently used to train models on pre-collected datasets, as opposed to training these models by interacting with the real-world environment as the online DRL. To… ▽ More Data is a critical asset in AI, as high-quality datasets can significantly improve the performance of machine learning models. In safety-critical domains such as autonomous vehicles, offline deep reinforcement learning (offline DRL) is frequently used to train models on pre-collected datasets, as opposed to training these models by interacting with the real-world environment as the online DRL. To support the development of these models, many institutions make datasets publicly available with opensource licenses, but these datasets are at risk of potential misuse or infringement. Injecting watermarks to the dataset may protect the intellectual property of the data, but it cannot handle datasets that have already been published and is infeasible to be altered afterward. Other existing solutions, such as dataset inference and membership inference, do not work well in the offline DRL scenario due to the diverse model behavior characteristics and offline setting constraints. In this paper, we advocate a new paradigm by leveraging the fact that cumulative rewards can act as a unique identifier that distinguishes DRL models trained on a specific dataset. To this end, we propose ORL-AUDITOR, which is the first trajectory-level dataset auditing mechanism for offline RL scenarios. Our experiments on multiple offline DRL models and tasks reveal the efficacy of ORL-AUDITOR, with auditing accuracy over 95% and false positive rates less than 2.88%. We also provide valuable insights into the practical implementation of ORL-AUDITOR by studying various parameter settings. Furthermore, we demonstrate the auditing capability of ORL-AUDITOR on open-source datasets from Google and DeepMind, highlighting its effectiveness in auditing published datasets. ORL-AUDITOR is open-sourced at https://github.com/link-zju/ORL-Auditor. △ Less

Submitted 6 September, 2023; originally announced September 2023.

Comments: To appear in the Network and Distributed System Security Symposium (NDSS) 2024, San Diego, CA, USA

arXiv:2309.01866 [pdf, other]

Efficient Query-Based Attack against ML-Based Android Malware Detection under Zero Knowledge Setting

Authors: Ping He, Yifan Xia, Xuhong Zhang, Shouling Ji

Abstract: The widespread adoption of the Android operating system has made malicious Android applications an appealing target for attackers. Machine learning-based (ML-based) Android malware detection (AMD) methods are crucial in addressing this problem; however, their vulnerability to adversarial examples raises concerns. Current attacks against ML-based AMD methods demonstrate remarkable performance but r… ▽ More The widespread adoption of the Android operating system has made malicious Android applications an appealing target for attackers. Machine learning-based (ML-based) Android malware detection (AMD) methods are crucial in addressing this problem; however, their vulnerability to adversarial examples raises concerns. Current attacks against ML-based AMD methods demonstrate remarkable performance but rely on strong assumptions that may not be realistic in real-world scenarios, e.g., the knowledge requirements about feature space, model parameters, and training dataset. To address this limitation, we introduce AdvDroidZero, an efficient query-based attack framework against ML-based AMD methods that operates under the zero knowledge setting. Our extensive evaluation shows that AdvDroidZero is effective against various mainstream ML-based AMD methods, in particular, state-of-the-art such methods and real-world antivirus solutions. △ Less

Submitted 6 September, 2023; v1 submitted 4 September, 2023; originally announced September 2023.

Comments: To Appear in the ACM Conference on Computer and Communications Security, November, 2023

arXiv:2309.01182 [pdf, other]

Direct visualization of electric current induced dipoles of atomic impurities

Authors: Yaowu Liu, Zichun Zhang, Sidan Chen, Shengnan Xu, Lichen Ji, Wei Chen, Xinyu Zhou, Jiaxin Luo, Xiaopen Hu, Wenhui Duan, Xi Chen, Qi-Kun Xue, Shuai-Hua Ji

Abstract: Learning the electron scattering around atomic impurities is a fundamental step to fully understand the basic electronic transport properties of realistic conducting materials. Although many efforts have been made in this field for several decades, atomic scale transport around single point-like impurities has yet been achieved. Here, we report the direct visualization of the electric current indu… ▽ More Learning the electron scattering around atomic impurities is a fundamental step to fully understand the basic electronic transport properties of realistic conducting materials. Although many efforts have been made in this field for several decades, atomic scale transport around single point-like impurities has yet been achieved. Here, we report the direct visualization of the electric current induced dipoles around single atomic impurities in epitaxial bilayer graphene by multi-probe low temperature scanning tunneling potentiometry as the local current density is raised up to around 25 A/m, which is considerably higher than that in previous studies. We find the directions of these dipoles which are parallel or anti-parallel to local current are determined by the charge polarity of the impurities, revealing the direct evidence for the existence of the carrier density modulation effect proposed by Landauer in 1976. Furthermore, by $in$ $situ$ tuning local current direction with contact probes, these dipoles are redirected correspondingly. Our work paves the way to explore the electronic quantum transport phenomena at single atomic impurity level and the potential future electronics toward or beyond the end of Moore's Law. △ Less

Submitted 3 September, 2023; originally announced September 2023.

Showing 51–100 of 703 results for author: Ji, S