arXiv:2407.11866 [pdf, other]

FitteR for Accretion ProPErties of T Tauri stars (FRAPPE): A new approach to use Class III spectra to derive stellar and accretion properties

Authors: R. A. B. Claes, J. Campbell-White, C. F. Manara, A. Frasca, A. Natta, J. M. Alcalá, A. Armeni, M. Fang, J. B. Lovell, B. Stelzer, L. Venuti, M. Wyatt, A. Queitsch

Abstract: Studies of the stellar and accretion properties of classical T Tauri stars (CTTS) require comparison with photospheric spectral templates. Here we aim at expanding the currently available grid of wide-wavelength coverage observed spectra of non-accreting stars with additional new spectra and an interpolation method that allows us to obtain a continuous grid of low resolution spectra ranging from spectral type G8 to M9.5, while also mitigating observational uncertainties. This interpolated grid is then implemented in the self-consistent method to derive stellar and accretion properties of CTTS. With the new templates, we aim to estimate a lower limit on the accretion luminosities that can be obtained through a study of the UV excess emission using observed templates. We analyse the molecular photospheric features present in the VLT/X-Shooter spectra of the targets to perform a spectral classification, including estimates of their extinction. We apply a non-parametric fitting method to the full grid of observed templates to obtain an interpolated grid of templates. We use the uncertainties on our interpolated grid to estimate a lower limit on the accretion luminosity that we can measure with this method. We find that the measurable accretion luminosities range from $\sim 2.7$ dex lower than the stellar luminosity in M5.5 stars to $\sim 1.3$ dex lower for G8 stars. For young stars with masses of $\sim 1M_{\odot}$ and ages of 3-6 Myr this limit translates into an observational limit of mass accretion rate on the order of $10^{-10} \rm M_{\odot}/yr$.
The implementation of an interpolated grid of observed templates allows us to better disentangle degenerate solutions, leading to a more reliable estimate of accretion rates in young accreting stars.
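
As a reading aid for the "dex" limits quoted in this abstract, the short sketch below converts "N dex lower than the stellar luminosity" into a linear ratio. The function name `dex_to_ratio` is hypothetical and introduced only for illustration; the only inputs taken from the abstract are the values 2.7 and 1.3.

```python
def dex_to_ratio(dex_below: float) -> float:
    """Convert 'dex_below dex lower' into a linear ratio L_acc / L_star.

    One dex is a factor of ten, so N dex lower means a factor 10**(-N).
    """
    return 10.0 ** (-dex_below)


# Limits quoted in the abstract: ~2.7 dex (M5.5 stars) and ~1.3 dex (G8 stars).
for spectral_type, dex in [("M5.5", 2.7), ("G8", 1.3)]:
    ratio = dex_to_ratio(dex)
    print(f"{spectral_type}: minimum measurable L_acc/L_star ~ {ratio:.4f}")
```

In linear terms the quoted limits correspond to roughly 0.2% of the stellar luminosity for M5.5 stars and roughly 5% for G8 stars.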

Submitted 16 July, 2024; originally announced July 2024.

Comments: Accepted to A&A, Version before changes by the language editor

arXiv:2407.09721 [pdf, other]

Purrfect Pitch: Exploring Musical Interval Learning through Multisensory Interfaces

Authors: Sam Chin, Cathy Mengying Fang, Nikhil Singh, Ibrahim Ibrahim, Joe Paradiso, Pattie Maes

Abstract: We introduce Purrfect Pitch, a system consisting of a wearable haptic device and a custom-designed learning interface for musical ear training. We focus on the ability to identify musical intervals (sequences of two musical notes), which is a perceptually ambiguous task that usually requires strenuous rote training. With our system, the user would hear a sequence of two tones while simultaneously receiving two corresponding vibrotactile stimuli on the back. Providing haptic feedback along the back makes the auditory distance between the two tones more salient, and the back-worn design is comfortable and unobtrusive. During training, the user receives multi-sensory feedback from our system and inputs their guessed interval value on our web-based learning interface. They see a green (otherwise red) screen for a correct guess with the correct interval value. Our study with 18 participants shows that our system enables novice learners to identify intervals more accurately and consistently than those who only received audio feedback, even after the haptic feedback is removed. We also share further insights on how to design a multisensory learning system.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.07930 [pdf]

Token-Mol 1.0: Tokenized drug design with large language model

Authors: Jike Wang, Rui Qin, Mingyang Wang, Meijing Fang, Yangyang Zhang, Yuchen Zhu, Qun Su, Qiaolin Gou, Chao Shen, Odin Zhang, Zhenxing Wu, Dejun Jiang, Xujun Zhang, Huifeng Zhao, Xiaozhe Wan, Zhourui Wu, Liwei Liu, Yu Kang, Chang-Yu Hsieh, Tingjun Hou

Abstract: Significant interest has recently arisen in leveraging sequence-based large language models (LLMs) for drug design. However, most current applications of LLMs in drug discovery lack the ability to comprehend three-dimensional (3D) structures, thereby limiting their effectiveness in tasks that explicitly involve molecular conformations. In this study, we introduced Token-Mol, a token-only 3D drug design model. This model encodes all molecular information, including 2D and 3D structures, as well as molecular property data, into tokens, which transforms classification and regression tasks in drug discovery into probabilistic prediction problems, thereby enabling learning through a unified paradigm. Token-Mol is built on the transformer decoder architecture and trained using random causal masking techniques. Additionally, we proposed the Gaussian cross-entropy (GCE) loss function to overcome the challenges in regression tasks, significantly enhancing the capacity of LLMs to learn continuous numerical values. Through a combination of fine-tuning and reinforcement learning (RL), Token-Mol achieves performance comparable to or surpassing existing task-specific methods across various downstream tasks, including pocket-based molecular generation, conformation generation, and molecular property prediction. Compared to existing molecular pre-trained models, Token-Mol exhibits superior proficiency in handling a wider range of downstream tasks essential for drug design.
Notably, our approach improves regression task accuracy by approximately 30% compared to similar token-only methods. Token-Mol overcomes the precision limitations of token-only models and has the potential to integrate seamlessly with general models such as ChatGPT, paving the way for the development of a universal artificial intelligence drug design model that facilitates rapid and high-quality drug design by experts.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07221 [pdf, other]

Tracing Back the Malicious Clients in Poisoning Attacks to Federated Learning

Authors: Yuqi Jia, Minghong Fang, Hongbin Liu, Jinghuai Zhang, Neil Zhenqiang Gong

Abstract: Poisoning attacks compromise the training phase of federated learning (FL) such that the learned global model misclassifies attacker-chosen inputs called target inputs. Existing defenses mainly focus on protecting the training phase of FL such that the learnt global model is poison free. However, these defenses often achieve limited effectiveness when the clients' local training data is highly non-iid or the number of malicious clients is large, as confirmed in our experiments. In this work, we propose FLForensics, the first poison-forensics method for FL. FLForensics complements existing training-phase defenses. In particular, when training-phase defenses fail and a poisoned global model is deployed, FLForensics aims to trace back the malicious clients that performed the poisoning attack after a misclassified target input is identified. We theoretically show that FLForensics can accurately distinguish between benign and malicious clients under a formal definition of poisoning attack. Moreover, we empirically show the effectiveness of FLForensics at tracing back both existing and adaptive poisoning attacks on five benchmark datasets.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.04285 [pdf, other]

Robust Decision Transformer: Tackling Data Corruption in Offline RL via Sequence Modeling

Authors: Jiawei Xu, Rui Yang, Feng Luo, Meng Fang, Baoxiang Wang, Lei Han

Abstract: Learning policies from offline datasets through offline reinforcement learning (RL) holds promise for scaling data-driven decision-making and avoiding unsafe and costly online interactions. However, real-world data collected from sensors or humans often contains noise and errors, posing a significant challenge for existing offline RL methods. Our study indicates that traditional offline RL methods based on temporal difference learning tend to underperform Decision Transformer (DT) under data corruption, especially when the amount of data is limited. This suggests the potential of sequential modeling for tackling data corruption in offline RL. To further unleash the potential of sequence modeling methods, we propose Robust Decision Transformer (RDT) by incorporating several robust techniques. Specifically, we introduce Gaussian weighted learning and iterative data correction to reduce the effect of corrupted data. Additionally, we leverage embedding dropout to enhance the model's resistance to erroneous inputs. Extensive experiments on MuJoCo, Kitchen, and Adroit tasks demonstrate RDT's superior performance under diverse data corruption compared to previous methods. Moreover, RDT exhibits remarkable robustness in a challenging setting that combines training-time data corruption with testing-time observation perturbations. These results highlight the potential of robust sequence modeling for learning from noisy or corrupted offline datasets, thereby promoting the reliable application of offline RL in real-world tasks.

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.01917 [pdf, other]

Securing Distributed Network Digital Twin Systems Against Model Poisoning Attacks

Authors: Zifan Zhang, Minghong Fang, Mingzhe Chen, Gaolei Li, Xi Lin, Yuchen Liu

Abstract: In the era of 5G and beyond, the increasing complexity of wireless networks necessitates innovative frameworks for efficient management and deployment. Digital twins (DTs), embodying real-time monitoring, predictive configurations, and enhanced decision-making capabilities, stand out as a promising solution in this context. Within a time-series data-driven framework that effectively maps wireless networks into digital counterparts, encapsulated by integrated vertical and horizontal twinning phases, this study investigates the security challenges in distributed network DT systems, which potentially undermine the reliability of subsequent network applications such as wireless traffic forecasting. Specifically, we consider a minimal-knowledge scenario for all attackers, in that they do not have access to network data and other specialized knowledge, yet can interact with previous iterations of server-level models. In this context, we spotlight a novel fake traffic injection attack designed to compromise a distributed network DT system for wireless traffic prediction. In response, we then propose a defense mechanism, termed global-local inconsistency detection (GLID), to counteract various model poisoning threats. GLID strategically removes abnormal model parameters that deviate beyond a particular percentile range, thereby fortifying the security of the network twinning process.
Through extensive experiments on real-world wireless traffic datasets, our experimental evaluations show that both our attack and defense strategies significantly outperform existing baselines, highlighting the importance of security measures in the design and implementation of DTs for 5G and beyond network systems.
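
The abstract describes GLID only at a high level (removing model parameters that fall outside a percentile range). The sketch below illustrates that general idea with a coordinate-wise percentile filter followed by averaging. It is not the authors' algorithm: the function names, the 10th/90th percentile bounds, and the fallback for an empty selection are all assumptions made for illustration.

```python
from typing import List


def percentile(sorted_vals: List[float], q: float) -> float:
    """Linearly interpolated percentile (q in [0, 100]) of a sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def percentile_filtered_mean(
    updates: List[List[float]], low_q: float = 10.0, high_q: float = 90.0
) -> List[float]:
    """Aggregate client updates coordinate by coordinate.

    For each coordinate, values outside [low_q, high_q] percentiles
    (assumed bounds) are discarded and the remaining values are averaged.
    """
    dim = len(updates[0])
    aggregated = []
    for j in range(dim):
        column = sorted(u[j] for u in updates)
        lower = percentile(column, low_q)
        upper = percentile(column, high_q)
        # Fall back to all values if the filter would leave nothing.
        kept = [v for v in column if lower <= v <= upper] or column
        aggregated.append(sum(kept) / len(kept))
    return aggregated


# Four similar updates and one outlier standing in for a poisoned model.
client_updates = [[1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [1.0, 2.0], [100.0, -50.0]]
print(percentile_filtered_mean(client_updates))
```

With these toy inputs the outlier values 100.0 and -50.0 are dropped in their respective coordinates, so the aggregate stays close to the benign updates.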

Submitted 1 July, 2024; originally announced July 2024.

Comments: Accepted by Internet of Things Journal (IoT-J). arXiv admin note: substantial text overlap with arXiv:2404.14389

arXiv:2406.19283 [pdf, other]

PhysioLLM: Supporting Personalized Health Insights with Wearables and Large Language Models

Authors: Cathy Mengying Fang, Valdemar Danry, Nathan Whitmore, Andria Bao, Andrew Hutchison, Cayden Pierce, Pattie Maes

Abstract: We present PhysioLLM, an interactive system that leverages large language models (LLMs) to provide personalized health understanding and exploration by integrating physiological data from wearables with contextual information. Unlike commercial health apps for wearables, our system offers a comprehensive statistical analysis component that discovers correlations and trends in user data, allowing users to ask questions in natural language and receive generated personalized insights, and guides them to develop actionable goals. As a case study, we focus on improving sleep quality, given its measurability through physiological data and its importance to general well-being. Through a user study with 24 Fitbit watch users, we demonstrate that PhysioLLM outperforms both the Fitbit App alone and a generic LLM chatbot in facilitating a deeper, personalized understanding of health data and supporting actionable steps toward personal health goals.

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.18984 [pdf, other]

Amplify Graph Learning for Recommendation via Sparsity Completion

Authors: Peng Yuan, Haojie Li, Minying Fang, Xu Yu, Yongjing Hao, Junwei Du

Abstract: Graph learning models have been widely deployed in collaborative filtering (CF) based recommendation systems. Due to the issue of data sparsity, the graph structure of the original input lacks potential positive preference edges, which significantly reduces the performance of recommendations. In this paper, we study how to enhance the graph structure for CF more effectively, thereby optimizing the representation of graph nodes. Previous works introduced matrix completion techniques into CF, proposing the use of either stochastic completion methods or superficial structure completion to address this issue. However, most of these approaches employ random numerical filling that lacks control over noise perturbations and limits the in-depth exploration of higher-order interaction features of nodes, resulting in biased graph representations. In this paper, we propose an Amplify Graph Learning framework based on Sparsity Completion (called AGL-SC). First, we utilize a graph neural network to mine direct interaction features between user and item nodes, which are used as the inputs of the encoder. Second, we design a factorization-based method to mine higher-order interaction features. These features serve as perturbation factors in the latent space of the hidden layer to facilitate generative enhancement. Finally, by employing variational inference, the above multi-order features are integrated to implement the completion and enhancement of missing graph structures.
We conducted benchmark and strategy experiments on four real-world datasets related to recommendation tasks. The experimental results demonstrate that AGL-SC significantly outperforms the state-of-the-art methods.

Submitted 1 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.18321 [pdf, other]

MathOdyssey: Benchmarking Mathematical Problem-Solving Skills in Large Language Models Using Odyssey Math Data

Authors: Meng Fang, Xiangpeng Wan, Fei Lu, Fei Xing, Kai Zou

Abstract: Large language models (LLMs) have significantly advanced natural language understanding and demonstrated strong problem-solving abilities. Despite these successes, most LLMs still struggle with solving mathematical problems due to the intricate reasoning required. This paper investigates the mathematical problem-solving capabilities of LLMs using the newly developed "MathOdyssey" dataset. The dataset includes diverse mathematical problems at high school and university levels, created by experts from notable institutions to rigorously test LLMs in advanced problem-solving scenarios and cover a wider range of subject areas. By providing the MathOdyssey dataset as a resource to the AI community, we aim to contribute to the understanding and improvement of AI capabilities in complex mathematical problem-solving. We conduct benchmarking on open-source models, such as Llama-3 and DBRX-Instruct, and closed-source models from the GPT series and Gemini models. Our results indicate that while LLMs perform well on routine and moderately difficult tasks, they face significant challenges with Olympiad-level problems and complex university-level questions. Our analysis shows a narrowing performance gap between open-source and closed-source models, yet substantial challenges remain, particularly with the most demanding problems. This study highlights the ongoing need for research to enhance the mathematical reasoning of LLMs. The dataset, results, and code are publicly available.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.17507 [pdf, other]

ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling

Authors: Minghui Fang, Shengpeng Ji, Jialong Zuo, Hai Huang, Yan Xia, Jieming Zhu, Xize Cheng, Xiaoda Yang, Wenrui Liu, Gang Wang, Zhenhua Dong, Zhou Zhao

Abstract: Generative retrieval, which has demonstrated effectiveness in text-to-text retrieval, utilizes a sequence-to-sequence model to directly generate candidate identifiers based on natural language queries. Without explicitly computing the similarity between queries and candidates, generative retrieval surpasses dual-tower models in both speed and accuracy on large-scale corpora, providing new insights for cross-modal retrieval. However, constructing identifiers for multimodal data remains an untapped problem, and the modality gap between natural language queries and multimodal candidates hinders retrieval performance due to the absence of additional encoders. To this end, we propose a pioneering generAtive Cross-modal rEtrieval framework (ACE), which is a comprehensive framework for end-to-end cross-modal retrieval based on coarse-to-fine semantic modeling. We propose combining K-Means and RQ-VAE to construct coarse and fine tokens, serving as identifiers for multimodal data. Correspondingly, we design the coarse-to-fine feature fusion strategy to efficiently align natural language queries and candidate identifiers. ACE is the first work to comprehensively demonstrate the feasibility of a generative approach on text-to-image/audio/video retrieval, challenging the dominance of the embedding-based dual-tower architecture. Extensive experiments show that ACE achieves state-of-the-art performance in cross-modal retrieval and outperforms the strong baselines on Recall@1 by 15.27% on average.

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.16702 [pdf, other]

North-PHASE: Studying Periodicity, Hot Spots, Accretion Stability and Early Evolution in young stars in the northern hemisphere

Authors: A. Sicilia-Aguilar, R. S. Kahar, M. E. Pelayo-Baldárrago, V. Roccatagliata, D. Froebrich, F. J. Galindo-Guil, J. Campbell-White, J. S. Kim, I. Mendigutía, L. Schlueter, P. S. Teixeira, S. Matsumura, M. Fang, A. Scholz, P. Ábrahám, A. Frasca, A. Garufi, C. Herbert, Á. Kóspál, C. F. Manara

Abstract: We present the overview and first results from the North-PHASE Legacy Survey, which follows six young clusters for five years, using the 2 deg$^2$ FoV of the JAST80 telescope from the Javalambre Observatory (Spain). North-PHASE investigates stellar variability on timescales from days to years for thousands of young stars distributed over entire clusters. This allows us to find new YSO, characterise accretion and study inner disk evolution within the cluster context. Each region (Tr37, CepOB3, IC5070, IC348, NGC2264, and NGC1333) is observed in six filters (SDSS griz, u band, and J0660, which covers H$α$), detecting cluster members as well as field variable stars. Tr37 is used to prove feasibility and optimise the variability analysis techniques. In Tr37, variability reveals 50 new YSO, most of them proper motion outliers. North-PHASE independently confirms the youth of astrometric members, efficiently distinguishes accreting and non-accreting stars, reveals the extent of the cluster populations along Tr37/IC1396 bright rims, and detects variability resulting from rotation, dips, and irregular bursts. The proper motion outliers unveil a more complex star formation history than inferred from Gaia alone, and variability highlights previously hidden proper motion deviations in the surrounding clouds. We also find that non-YSO variables identified by North-PHASE cover a different variability parameter space and include long-period variables, eclipsing binaries, RR Lyr, and $δ$ Scuti stars.
These early results also emphasize the power of variability to complete the picture of star formation where it is missed by astrometry.

Submitted 24 June, 2024; originally announced June 2024.

Comments: Accepted by MNRAS

arXiv:2406.16253 [pdf, other]

LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing

Authors: Jiangshu Du, Yibo Wang, Wenting Zhao, Zhongfen Deng, Shuaiqi Liu, Renze Lou, Henry Peng Zou, Pranav Narayanan Venkit, Nan Zhang, Mukund Srinath, Haoran Ranran Zhang, Vipul Gupta, Yinghui Li, Tao Li, Fei Wang, Qin Liu, Tianlin Liu, Pengzhi Gao, Congying Xia, Chen Xing, Jiayang Cheng, Zhaowei Wang, Ying Su, Raj Sanjay Shah, Ruohao Guo, et al. (15 additional authors not shown)

Abstract: This work is motivated by two key trends. On one hand, large language models (LLMs) have shown remarkable versatility in various generative tasks such as writing, drawing, and question answering, significantly reducing the time required for many routine tasks. On the other hand, researchers, whose work is not only time-consuming but also highly expertise-demanding, face increasing challenges as they have to spend more time reading, writing, and reviewing papers. This raises the question: how can LLMs potentially assist researchers in alleviating their heavy workload? This study focuses on the topic of LLMs assist NLP Researchers, particularly examining the effectiveness of LLM in assisting paper (meta-)reviewing and its recognizability. To address this, we constructed the ReviewCritique dataset, which includes two types of information: (i) NLP papers (initial submissions rather than camera-ready) with both human-written and LLM-generated reviews, and (ii) each review comes with "deficiency" labels and corresponding explanations for individual segments, annotated by experts. Using ReviewCritique, this study explores two threads of research questions: (i) "LLMs as Reviewers", how do reviews generated by LLMs compare with those written by humans in terms of quality and distinguishability? (ii) "LLMs as Metareviewers", how effectively can LLMs identify potential issues, such as Deficient or unprofessional review segments, within individual paper reviews? To our knowledge, this is the first work to provide such a comprehensive analysis.

Submitted 25 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

arXiv:2406.12844 [pdf, other]

Synergizing Foundation Models and Federated Learning: A Survey

Authors: Shenghui Li, Fanghua Ye, Meng Fang, Jiaxu Zhao, Yun-Hin Chan, Edith C. -H. Ngai, Thiemo Voigt

Abstract: The recent development of Foundation Models (FMs), represented by large language models, vision transformers, and multimodal models, has been making a significant impact on both academia and industry. Compared with small-scale models, FMs have a much stronger demand for high-volume data during the pre-training phase. Although general FMs can be pre-trained on data collected from open sources such as the Internet, domain-specific FMs need proprietary data, posing a practical challenge regarding the amount of data available due to privacy concerns. Federated Learning (FL) is a collaborative learning paradigm that breaks the barrier of data availability from different participants. Therefore, it provides a promising solution to customize and adapt FMs to a wide range of domain-specific tasks using distributed datasets whilst preserving privacy. This survey paper discusses the potentials and challenges of synergizing FL and FMs and summarizes core techniques, future directions, and applications. A periodically updated paper collection on FM-FL is available at https://github.com/lishenghui/awesome-fm-fl.

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.10671 [pdf]

Augmenting Biomedical Named Entity Recognition with General-domain Resources

Authors: Yu Yin, Hyunjae Kim, Xiao Xiao, Chih Hsuan Wei, Jaewoo Kang, Zhiyong Lu, Hua Xu, Meng Fang, Qingyu Chen

Abstract: Training a neural network-based biomedical named entity recognition (BioNER) model usually requires extensive and costly human annotations. While several studies have employed multi-task learning with multiple BioNER datasets to reduce human effort, this approach does not consistently yield performance improvements and may introduce label ambiguity in different biomedical corpora. We aim to tackle those challenges through transfer learning from easily accessible resources with fewer concept overlaps with biomedical datasets. In this paper, we proposed GERBERA, a simple-yet-effective method that utilized a general-domain NER dataset for training. Specifically, we performed multi-task learning to train a pre-trained biomedical language model with both the target BioNER dataset and the general-domain dataset. Subsequently, we fine-tuned the models specifically for the BioNER dataset. We systematically evaluated GERBERA on five datasets of eight entity types, collectively consisting of 81,410 instances. Despite using fewer biomedical resources, our models demonstrated superior performance compared to baseline models trained with multiple additional BioNER datasets. Specifically, our models consistently outperformed the baselines in six out of eight entity types, achieving an average improvement of 0.9% over the best baseline performance across eight biomedical entity types sourced from five different corpora.
Our method was especially effective in amplifying performance on BioNER datasets characterized by limited data, with a 4.7% improvement in F1 scores on the JNLPBA-RNA dataset.

Submitted 18 June, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

Comments: We make data, codes, and models publicly available via https://github.com/qingyu-qc/bioner_gerbera

arXiv:2406.10416 [pdf, other]

Byzantine-Robust Decentralized Federated Learning

Authors: Minghong Fang, Zifan Zhang, Hairi, Prashant Khanduri, Jia Liu, Songtao Lu, Yuchen Liu, Neil Gong

Abstract: Federated learning (FL) enables multiple clients to collaboratively train machine learning models without revealing their private training data. In conventional FL, the system follows the server-assisted architecture (server-assisted FL), where the training process is coordinated by a central server. However, the server-assisted FL framework suffers from poor scalability due to a communication bottleneck at the server, and from trust dependency issues. To address these challenges, the decentralized federated learning (DFL) architecture has been proposed to allow clients to train models collaboratively in a serverless and peer-to-peer manner. However, due to its fully decentralized nature, DFL is highly vulnerable to poisoning attacks, where malicious clients could manipulate the system by sending carefully-crafted local models to their neighboring clients. To date, only a limited number of Byzantine-robust DFL methods have been proposed, most of which are either communication-inefficient or remain vulnerable to advanced poisoning attacks. In this paper, we propose a new algorithm called BALANCE (Byzantine-robust averaging through local similarity in decentralization) to defend against poisoning attacks in DFL. In BALANCE, each client leverages its own local model as a similarity reference to determine if the received model is malicious or benign. We establish the theoretical convergence guarantee for BALANCE under poisoning attacks in both strongly convex and non-convex settings. Furthermore, the convergence rate of BALANCE under poisoning attacks matches those of the state-of-the-art counterparts in Byzantine-free settings. Extensive experiments also demonstrate that BALANCE outperforms existing DFL methods and effectively defends against poisoning attacks.

Submitted 13 July, 2024; v1 submitted 14 June, 2024; originally announced June 2024.

Comments: To appear in ACM Conference on Computer and Communications Security 2024 (CCS '24)

arXiv:2406.09304 [pdf]

Self-reconfigurable Multifunctional Memristive Nociceptor for Intelligent Robotics

Authors: Shengbo Wang, Mingchao Fang, Lekai Song, Cong Li, Jian Zhang, Arokia Nathan, Guohua Hu, Shuo Gao

Abstract: Artificial nociceptors, mimicking human-like stimuli perception, are of significance for intelligent robotics to work in hazardous and dynamic scenarios. One of the most essential characteristics of the human nociceptor is its self-adjustable attribute, which indicates that the threshold of determination of a potentially hazardous stimulus relies on environmental knowledge. This critical attribute has been currently omitted, but it is highly desired for artificial nociceptors. Inspired by these shortcomings, this article presents, for the first time, a Self-Directed Channel (SDC) memristor-based self-reconfigurable nociceptor, capable of perceiving hazardous pressure stimuli under different temperatures and demonstrates key features of tactile nociceptors, including 'threshold,' 'no-adaptation,' and 'sensitization.' The maximum amplification of hazardous external stimuli is 1000%, and its response characteristics dynamically adapt to current temperature conditions by automatically altering the generated modulation schemes for the memristor. The maximum difference ratio of the response of memristors at different temperatures is 500%, and this adaptability closely mimics the functions of biological tactile nociceptors, resulting in accurate danger perception in various conditions. Beyond temperature adaptation, this memristor-based nociceptor has the potential to integrate different sensory modalities by applying various sensors, thereby achieving human-like perception capabilities in real-world environments.

Submitted 13 June, 2024; originally announced June 2024.

Comments: 14 pages, 4 figures

arXiv:2406.08835 [pdf, other]

A Single-Step Non-Autoregressive Automatic Speech Recognition Architecture with High Accuracy and Inference Speed

Authors: Ziyang Zhuang, Chenfeng Miao, Kun Zou, Shuai Gong, Ming Fang, Tao Wei, Zijian Li, Wei Hu, Shaojun Wang, Jing Xiao

Abstract: Non-autoregressive (NAR) automatic speech recognition (ASR) models predict tokens independently and simultaneously, bringing high inference speed. However, there is still a gap in the accuracy of the NAR models compared to the autoregressive (AR) models. To further narrow the gap between the NAR and AR models, we propose a single-step NAR ASR architecture with high accuracy and inference speed, called EfficientASR. It uses an Index Mapping Vector (IMV) based alignment generator to generate alignments during training, and an alignment predictor to learn the alignments for inference. It can be trained end-to-end (E2E) with cross-entropy loss combined with alignment loss. The proposed EfficientASR achieves competitive results on the AISHELL-1 and AISHELL-2 benchmarks compared to the state-of-the-art (SOTA) models. Specifically, it achieves character error rates (CER) of 4.26%/4.62% on the AISHELL-1 dev/test dataset, which outperforms the SOTA AR Conformer with about 30x inference speedup.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.04836 [pdf, other]

Revisiting Catastrophic Forgetting in Large Language Model Tuning

Authors: Hongyu Li, Liang Ding, Meng Fang, Dacheng Tao

Abstract: Catastrophic Forgetting (CF) means models forgetting previously acquired knowledge when learning new data. It compromises the effectiveness of large language models (LLMs) during fine-tuning, yet the underlying causes have not been thoroughly investigated. This paper takes the first step to reveal the direct link between the flatness of the model loss landscape and the extent of CF in the field of LLMs. Based on this, we introduce the sharpness-aware minimization to mitigate CF by flattening the loss landscape. Experiments on three widely-used fine-tuning datasets, spanning different model scales, demonstrate the effectiveness of our method in alleviating CF. Analyses show that we nicely complement the existing anti-forgetting strategies, further enhancing the resistance of LLMs to CF.

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.01205 [pdf, other]

ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control With Decoupled Codec

Authors: Shengpeng Ji, Jialong Zuo, Minghui Fang, Siqi Zheng, Qian Chen, Wen Wang, Ziyue Jiang, Hai Huang, Xize Cheng, Rongjie Huang, Zhou Zhao

Abstract: In this paper, we present ControlSpeech, a text-to-speech (TTS) system capable of fully cloning the speaker's voice and enabling arbitrary control and adjustment of speaking style, merely based on a few seconds of audio prompt and a simple textual style description prompt. Prior zero-shot TTS models and controllable TTS models either could only mimic the speaker's voice without further control and adjustment capabilities or were unrelated to speaker-specific voice generation. Therefore, ControlSpeech focuses on a more challenging new task: a TTS system with controllable timbre, content, and style at the same time. ControlSpeech takes speech prompts, content prompts, and style prompts as inputs and utilizes bidirectional attention and mask-based parallel decoding to capture corresponding codec representations in a discrete decoupling codec space. Moreover, we discovered the issue of text style controllability in a many-to-many mapping fashion and proposed the Style Mixture Semantic Density (SMSD) model to resolve this problem. The SMSD module, which is based on Gaussian mixture density networks, is designed to enhance the fine-grained partitioning and sampling capabilities of style semantic information and generate speech with more diverse styles. In terms of experiments, we make available a controllable model toolkit called ControlToolkit with a new style controllable dataset and some replicated baseline models, and propose new metrics to evaluate both the control capability and the quality of generated audio in ControlSpeech. The relevant ablation studies validate that each component in ControlSpeech is necessary. We hope that ControlSpeech can establish the next foundation paradigm of controllable speech synthesis. The relevant code and demo are available at https://github.com/jishengpeng/ControlSpeech .

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.20018 [pdf, other]

Safe Multi-agent Reinforcement Learning with Natural Language Constraints

Authors: Ziyan Wang, Meng Fang, Tristan Tomilin, Fei Fang, Yali Du

Abstract: The role of natural language constraints in Safe Multi-agent Reinforcement Learning (MARL) is crucial, yet often overlooked. While Safe MARL has vast potential, especially in fields like robotics and autonomous vehicles, its full potential is limited by the need to define constraints in pre-designed mathematical terms, which requires extensive domain expertise and reinforcement learning knowledge, hindering its broader adoption. To address this limitation and make Safe MARL more accessible and adaptable, we propose a novel approach named Safe Multi-agent Reinforcement Learning with Natural Language constraints (SMALL). Our method leverages fine-tuned language models to interpret and process free-form textual constraints, converting them into semantic embeddings that capture the essence of prohibited states and behaviours. These embeddings are then integrated into the multi-agent policy learning process, enabling agents to learn policies that minimize constraint violations while optimizing rewards. To evaluate the effectiveness of SMALL, we introduce LaMaSafe, a multi-task benchmark designed to assess the performance of multiple agents in adhering to natural language constraints. Empirical evaluations across various environments demonstrate that SMALL achieves comparable rewards and significantly fewer constraint violations, highlighting its effectiveness in understanding and enforcing natural language constraints.

Submitted 30 May, 2024; originally announced May 2024.

Comments: 23 pages, 6 figures

arXiv:2405.19946 [pdf, other]

Learning to Discuss Strategically: A Case Study on One Night Ultimate Werewolf

Authors: Xuanfa Jin, Ziyan Wang, Yali Du, Meng Fang, Haifeng Zhang, Jun Wang

Abstract: Communication is a fundamental aspect of human society, facilitating the exchange of information and beliefs among people. Despite the advancements in large language models (LLMs), recent agents built with these often neglect the control over discussion tactics, which are essential in communication scenarios and games. As a variant of the famous communication game Werewolf, One Night Ultimate Werewolf (ONUW) requires players to develop strategic discussion policies due to the potential role changes that increase the uncertainty and complexity of the game. In this work, we first present the existence of the Perfect Bayesian Equilibria (PBEs) in two scenarios of the ONUW game: one with discussion and one without. The results showcase that the discussion greatly changes players' utilities by affecting their beliefs, emphasizing the significance of discussion tactics. Based on the insights obtained from the analyses, we propose an RL-instructed language agent framework, where a discussion policy trained by reinforcement learning (RL) is employed to determine appropriate discussion tactics to adopt. Our experimental results on several ONUW game settings demonstrate the effectiveness and generalizability of our proposed framework.

Submitted 30 May, 2024; originally announced May 2024.

Comments: 27 pages, 5 figures

arXiv:2405.12604 [pdf, other]

Tiny Refinements Elicit Resilience: Toward Efficient Prefix-Model Against LLM Red-Teaming

Authors: Jiaxu Liu, Xiangyu Yin, Sihao Wu, Jianhong Wang, Meng Fang, Xinping Yi, Xiaowei Huang

Abstract: With the proliferation of red-teaming strategies for Large Language Models (LLMs), the deficiency in the literature about improving the safety and robustness of LLM defense strategies is becoming increasingly pronounced. This paper introduces the LLM-based \textbf{sentinel} model as a plug-and-play prefix module designed to reconstruct the input prompt with just a few ($<30$) additional tokens, effectively reducing toxicity in responses from target LLMs. The sentinel model naturally overcomes the \textit{parameter inefficiency} and \textit{limited model accessibility} for fine-tuning large target models. We employ an interleaved training regimen using Proximal Policy Optimization (PPO) to optimize both red team and sentinel models dynamically, incorporating a value head-sharing mechanism inspired by the multi-agent centralized critic to manage the complex interplay between agents. Our extensive experiments across text-to-text and text-to-image demonstrate the effectiveness of our approach in mitigating toxic outputs, even when dealing with larger models like \texttt{Llama-2}, \texttt{GPT-3.5} and \texttt{Stable-Diffusion}, highlighting the potential of our framework in enhancing safety and robustness in various applications.

Submitted 17 June, 2024; v1 submitted 21 May, 2024; originally announced May 2024.

Comments: Preprint, 10 pages main with 10 pages appendix

arXiv:2405.11286 [pdf, other]

Motion Avatar: Generate Human and Animal Avatars with Arbitrary Motion

Authors: Zeyu Zhang, Yiran Wang, Biao Wu, Shuo Chen, Zhiyuan Zhang, Shiya Huang, Wenbo Zhang, Meng Fang, Ling Chen, Yang Zhao

Abstract: In recent years, there has been significant interest in creating 3D avatars and motions, driven by their diverse applications in areas like film-making, video games, AR/VR, and human-robot interaction. However, current efforts primarily concentrate on either generating the 3D avatar mesh alone or producing motion sequences, with integrating these two aspects proving to be a persistent challenge. Additionally, while avatar and motion generation predominantly target humans, extending these techniques to animals remains a significant challenge due to inadequate training data and methods. To bridge these gaps, our paper presents three key contributions. Firstly, we proposed a novel agent-based approach named Motion Avatar, which allows for the automatic generation of high-quality customizable human and animal avatars with motions through text queries. The method significantly advanced the progress in dynamic 3D character generation. Secondly, we introduced an LLM planner that coordinates both motion and avatar generation, which transforms a discriminative planning into a customizable Q&A fashion. Lastly, we presented an animal motion dataset named Zoo-300K, comprising approximately 300,000 text-motion pairs across 65 animal categories, and its building pipeline ZooGen, which serves as a valuable resource for the community. See project website https://steve-zeyu-zhang.github.io/MotionAvatar/

Submitted 18 May, 2024; originally announced May 2024.

arXiv:2405.02745 [pdf, other]

Understanding Server-Assisted Federated Learning in the Presence of Incomplete Client Participation

Authors: Haibo Yang, Peiwen Qiu, Prashant Khanduri, Minghong Fang, Jia Liu

Abstract: Existing works in federated learning (FL) often assume an ideal system with either full client or uniformly distributed client participation. However, in practice, it has been observed that some clients may never participate in FL training (aka incomplete client participation) due to a myriad of system heterogeneity factors. A popular approach to mitigate impacts of incomplete client participation is the server-assisted federated learning (SA-FL) framework, where the server is equipped with an auxiliary dataset. However, although SA-FL has been empirically shown to be effective in addressing the incomplete client participation problem, there remains a lack of theoretical understanding for SA-FL. Meanwhile, the ramifications of incomplete client participation in conventional FL are also poorly understood. These theoretical gaps motivate us to rigorously investigate SA-FL. Toward this end, we first show that conventional FL is {\em not} PAC-learnable under incomplete client participation in the worst case. Then, we show that the PAC-learnability of FL with incomplete client participation can indeed be revived by SA-FL, which theoretically justifies the use of SA-FL for the first time. Lastly, to provide practical guidance for SA-FL training under {\em incomplete client participation}, we propose the $\mathsf{SAFARI}$ (server-assisted federated averaging) algorithm that enjoys the same linear convergence speedup guarantees as classic FL with ideal client participation assumptions, offering the first SA-FL algorithm with convergence guarantee. Extensive experiments on different datasets show $\mathsf{SAFARI}$ significantly improves the performance under incomplete client participation.

Submitted 25 May, 2024; v1 submitted 4 May, 2024; originally announced May 2024.

Comments: Accepted in ICML2024

arXiv:2404.18074 [pdf, other]

MMAC-Copilot: Multi-modal Agent Collaboration Operating System Copilot

Authors: Zirui Song, Yaohang Li, Meng Fang, Zhenhao Chen, Zecheng Shi, Yuan Huang, Ling Chen

Abstract: Autonomous virtual agents are often limited by their singular mode of interaction with real-world environments, restricting their versatility. To address this, we propose the Multi-Modal Agent Collaboration framework (MMAC-Copilot), a framework that utilizes the collective expertise of diverse agents to enhance interaction ability with operating systems. The framework introduces a team collaboration chain, enabling each participating agent to contribute insights based on their specific domain knowledge, effectively reducing the hallucination associated with knowledge domain gaps. To evaluate the performance of MMAC-Copilot, we conducted experiments using both the GAIA benchmark and our newly introduced Visual Interaction Benchmark (VIBench). VIBench focuses on non-API-interactable applications across various domains, including 3D gaming, recreation, and office scenarios. MMAC-Copilot achieved exceptional performance on GAIA, with an average improvement of 6.8% over existing leading systems. Furthermore, it demonstrated remarkable capability on VIBench, particularly in managing various methods of interaction within systems and applications. These results underscore MMAC-Copilot's potential in advancing the field of autonomous virtual agents through its innovative approach to agent collaboration.

Submitted 4 May, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

Comments: In processing

arXiv:2404.15611 [pdf, other]

Model Poisoning Attacks to Federated Learning via Multi-Round Consistency

Authors: Yueqi Xie, Minghong Fang, Neil Zhenqiang Gong

Abstract: Model poisoning attacks are critical security threats to Federated Learning (FL). Existing model poisoning attacks suffer from two key limitations: 1) they achieve suboptimal effectiveness when defenses are deployed, and/or 2) they require knowledge of the model updates or local training data on genuine clients. In this work, we make a key observation that their suboptimal effectiveness arises from only leveraging model-update consistency among malicious clients within individual training rounds, making the attack effect self-cancel across training rounds. In light of this observation, we propose PoisonedFL, which enforces multi-round consistency among the malicious clients' model updates while not requiring any knowledge about the genuine clients. Our empirical evaluation on five benchmark datasets shows that PoisonedFL breaks eight state-of-the-art defenses and outperforms seven existing model poisoning attacks. Moreover, we also explore new defenses that are tailored to PoisonedFL, but our results show that we can still adapt PoisonedFL to break them. Our study shows that FL systems are considerably less robust than previously thought, underlining the urgency for the development of new defense mechanisms.

Submitted 6 June, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

arXiv:2404.14389 [pdf, other]

Poisoning Attacks on Federated Learning-based Wireless Traffic Prediction

Authors: Zifan Zhang, Minghong Fang, Jiayuan Huang, Yuchen Liu

Abstract: Federated Learning (FL) offers a distributed framework to train a global control model across multiple base stations without compromising the privacy of their local network data. This makes it ideal for applications like wireless traffic prediction (WTP), which plays a crucial role in optimizing network resources, enabling proactive traffic flow management, and enhancing the reliability of downstream communication-aided applications, such as IoT devices, autonomous vehicles, and industrial automation systems. Despite its promise, the security aspects of FL-based distributed wireless systems, particularly in regression-based WTP problems, remain inadequately investigated. In this paper, we introduce a novel fake traffic injection (FTI) attack, designed to undermine the FL-based WTP system by injecting fabricated traffic distributions with minimal knowledge. We further propose a defense mechanism, termed global-local inconsistency detection (GLID), which strategically removes abnormal model parameters that deviate beyond a specific percentile range estimated through statistical methods in each dimension. Extensive experimental evaluations, performed on real-world wireless traffic datasets, demonstrate that both our attack and defense strategies significantly outperform existing baselines.

Submitted 22 April, 2024; originally announced April 2024.

Comments: Accepted by IFIP/IEEE Networking 2024

ACM Class: C.2.1

arXiv:2404.12754 [pdf, other]

Adaptive Regularization of Representation Rank as an Implicit Constraint of Bellman Equation

Authors: Qiang He, Tianyi Zhou, Meng Fang, Setareh Maghsudi

Abstract: Representation rank is an important concept for understanding the role of Neural Networks (NNs) in Deep Reinforcement learning (DRL), which measures the expressive capacity of value networks. Existing studies focus on unboundedly maximizing this rank; nevertheless, that approach would introduce overly complex models in the learning, thus undermining performance. Hence, fine-tuning representation rank presents a challenging and crucial optimization problem. To address this issue, we find a guiding principle for adaptive control of the representation rank. We employ the Bellman equation as a theoretical foundation and derive an upper bound on the cosine similarity of consecutive state-action pairs representations of value networks. We then leverage this upper bound to propose a novel regularizer, namely BEllman Equation-based automatic rank Regularizer (BEER). This regularizer adaptively regularizes the representation rank, thus improving the DRL agent's performance. We first validate the effectiveness of automatic control of rank on illustrative experiments. Then, we scale up BEER to complex continuous control tasks by combining it with the deterministic policy gradient method. Among 12 challenging DeepMind control tasks, BEER outperforms the baselines by a large margin. Besides, BEER demonstrates significant advantages in Q-value approximation. Our code is available at https://github.com/sweetice/BEER-ICLR2024.

Submitted 19 April, 2024; originally announced April 2024.

Comments: Accepted to CVPR23; Code: https://github.com/sweetice/BEER-ICLR2024

arXiv:2404.09975 [pdf, other]

doi 10.1051/0004-6361/202449476

Stellar population astrophysics (SPA) with the TNG: Measurement of the He I 10830Å line in the open cluster Stock 2

Authors: Mingjie Jian, Xiaoting Fu, Noriyuki Matsunaga, Valentina D'Orazi, Angela Bragaglia, Daisuke Taniguchi, Min Fang, Nicoletta Sanna, Sara Lucatello, Antonio Frasca, Javier Alonso-Santiago, Giovanni Catanzaro, Ernesto Oliva

Abstract: The precise measurement of stellar abundances plays a pivotal role in providing constraints on the chemical evolution of the Galaxy. However, before spectral lines can be employed as reliable abundance indicators, particularly for challenging elements such as helium, they must undergo thorough scrutiny. Galactic open clusters, representing well-defined single stellar populations, offer an ideal setting for unfolding the information stored in the helium spectral line feature. In this study, we characterize the profile and strength of the helium transition at around 10830Å (He 10830) in nine giant stars in the Galactic open cluster Stock 2. To remove the influence of weak blending lines near the helium feature, we calibrated their oscillator strengths ($\log gf$) by employing corresponding abundances obtained from simultaneously observed optical spectra. Our observations reveal that He 10830 in all the targets is observed in absorption, with line strengths categorized into two groups. Three stars exhibit strong absorption, including a discernible secondary component, while the remaining stars exhibit weaker absorption. The lines are symmetric and align with or around their rest wavelengths, suggesting a stable upper chromosphere without a significant systematic mass motion. We found a correlation between He 10830 strength and Ca II $\log{R'_\mathrm{HK}}$ index, with a slope similar to that reported in previous studies on dwarf stars. This correlation underscores the necessity of accounting for stellar chromosphere structure when employing He 10830 as a probe for stellar helium abundance. The procedure of measuring the He 10830 we developed in this study is applicable not only to other Galactic open clusters but also to field stars, with the aim of mapping helium abundance across various types of stars in the future.

Submitted 15 April, 2024; originally announced April 2024.

Comments: 15 pages, 10 figures, 4 tables, accepted for publication in A&A

Journal ref: A&A 687, A189 (2024)

arXiv:2403.16927 [pdf, other]

Enabling pulse shape discrimination with commercial ASICs

Authors: John Leland, Ming Fang, Satwik Pani, Yuri Venturini, Marco Locatelli, Angela Di Fulvio

Abstract: Fast electronic readout for high-channel density scintillator-based systems is needed for radiation tracking and imaging in a wide range of applications, including nuclear physics, nuclear security and nonproliferation. Programmable electronics, like FPGAs and ASICs, provide a fast way of conditioning and processing the signal in real time. In this paper, we present a pulse shape discrimination (PSD) method based on the shaping circuit of a commercially available ASIC, the Citiroc1A by CAEN Technologies. We used two different shaping times per detector channel to calculate a shaping parameter that enables PSD. Using our new method, neutron and gamma-ray pulses detected by a d$_{12}$-stilbene scintillator can be effectively discriminated at light output values greater than 0.15 MeVee. While not achieving the PSD performance of traditional offline charge integration, our method does not require the transfer of data to a separate system for further processing and enables the direct deployment of high-channel density multi-particle detection systems. Moreover, the availability of a wider range of shaping times than those on the Citiroc1A can potentially further improve the PSD performance.

Submitted 25 March, 2024; originally announced March 2024.

Comments: 20 pages, 17 figures

arXiv:2403.15347 [pdf, other]

doi 10.1103/PhysRevB.109.155426

Exciton-activated effective phonon magnetic moment in monolayer MoS2

Authors: Chunli Tang, Gaihua Ye, Cynthia Nnokwe, Mengqi Fang, Li Xiang, Masoud Mahjouri-Samani, Dmitry Smirnov, Eui-Hyeok Yang, Tingting Wang, Lifa Zhang, Rui He, Wencan Jin

Abstract: Optical excitation of chiral phonons plays a vital role in studying the phonon-driven magnetic phenomena in solids. Transition metal dichalcogenides host chiral phonons at high symmetry points of the Brillouin zone, providing an ideal platform to explore the interplay between chiral phonons and valley degree of freedom. Here, we investigate the helicity-resolved magneto-Raman response of monolayer MoS2 and identify a doubly degenerate Brillouin-zone-center chiral phonon mode at ~270 cm$^{-1}$. Our wavelength- and temperature-dependent measurements show that this chiral phonon is activated through the resonant excitation of the A exciton. Under an out-of-plane magnetic field, the chiral phonon exhibits giant Zeeman splitting, which corresponds to an effective magnetic moment of ~2.5 $\mu_B$. Moreover, we carry out theoretical calculations based on the morphic effects in nonmagnetic crystals, which reproduce the linear Zeeman splitting and Raman cross-section of the chiral phonon. Our study provides important insights into lifting the chiral phonon degeneracy in an achiral covalent material, paving a new route to excite and control chiral phonons.

Submitted 7 April, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

Journal ref: Phys. Rev. B 109, 155426 (2024)

arXiv:2403.12771 [pdf, other]

TYC 3340-2437-1: A Quadruple System with A Massive Star

Authors: Jiao Li, Chao Liu, Changqing Luo, Bo Zhang, Jiang-Dan Li, Jia-Dong Li, Zhan-Wen Han, Xue-Fei Chen, Lu-Qian Wang, Min Fang, Li-Feng Xing, Xi-Liang Zhang, Chichuan Jin

Abstract: Hierarchical massive quadruple systems are ideal laboratories for examining the theories of star formation, dynamical evolution, and stellar evolution. The successive mergers of hierarchical quadruple systems might explain the mass gap between neutron stars and black holes. Looking for light curves of O-type binaries identified by LAMOST, we find a (2+2) quadruple system: TYC 3340-2437-1, located in the stellar bow-shock nebula (SBN). It has a probability of over 99.99\% of being a quadruple system, derived from the surface density of nearby stars. Its inner orbital periods are 3.390602(89) days and 2.4378(16) days, respectively, and the total mass is about (11.47 + 5.79) + (5.2 + 2.02) = 24.48 $M_{\odot}$. The line-of-sight inclinations of the inner binaries, B$_1$ and B$_2$, are 55.94 and 78.2 degrees, respectively, indicating that they are not co-planar. Based on observations spanning 34 months and the significance of the astrometric excess noise ($D>2$) in Gaia DR3 data, we infer that its outer orbital period might be a few years. If so, the quadruple system might have formed through the disk fragmentation mechanism with an outer eccentricity greater than zero. This eccentricity could be the cause of both the arc-like feature of the SBN and the noncoplanarity of the inner orbits. The outer orbital period and outer eccentricity could be determined with the release of future epoch astrometric data of Gaia.

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2403.10529 [pdf, ps, other]

Closed Form for Half-Area Overlap Offset of 2 Unit Disks

Authors: Max Chicky Fang

Abstract: The separation between the centers of two unit circles such that their overlapping area is exactly half of each circle's area is known to be around $0.8079455\dots$ (OEIS A133741). However, no closed form of this number has been known. Here, we determine its closed form representation in terms of the inverse regularized beta function.

Submitted 15 January, 2024; originally announced March 2024.
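
The numerical value quoted in this abstract can be checked with a short sketch. It does not reproduce the paper's closed form; it only assumes the standard lens-area formula for two unit circles whose centers are a distance $d$ apart, $A(d) = 2\arccos(d/2) - (d/2)\sqrt{4 - d^2}$, and solves $A(d) = \pi/2$ by bisection. The function names are illustrative and not taken from the paper.

```python
import math

def overlap_area(d):
    """Area of the lens shared by two unit circles with center distance d (0 <= d <= 2)."""
    return 2.0 * math.acos(d / 2.0) - (d / 2.0) * math.sqrt(4.0 - d * d)

def half_area_offset(tol=1e-12):
    """Bisection for the d at which the overlap is half of one circle's area (pi/2)."""
    target = math.pi / 2.0
    lo, hi = 0.0, 2.0  # overlap_area decreases from pi at d=0 to 0 at d=2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if overlap_area(mid) > target:
            lo = mid  # overlap still too large, so the circles must move apart
        else:
            hi = mid
    return 0.5 * (lo + hi)

d_half = half_area_offset()
print(f"{d_half:.7f}")
```

Bisection is used here only because the overlap area is monotonic in $d$, which makes the root bracket trivial to choose.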

arXiv:2403.09308 [pdf, other]

Enabling Waypoint Generation for Collaborative Robots using LLMs and Mixed Reality

Authors: Cathy Mengying Fang, Krzysztof Zieliński, Pattie Maes, Joe Paradiso, Bruce Blumberg, Mikkel Baun Kjærgaard

Abstract: Programming a robot is a complex task, as it requires the user to have a good command of specific programming languages and awareness of the robot's physical constraints. We propose a framework that simplifies robot deployment by allowing direct communication using natural language. It uses large language models (LLMs) for prompt processing, workspace understanding, and waypoint generation. It also employs Augmented Reality (AR) to provide visual feedback of the planned outcome. We showcase the effectiveness of our framework with a simple pick-and-place task, which we implement on a real robot. Moreover, we present an early concept of expressive robot behavior and skill generation that can be used to communicate with the user and learn new skills (e.g., object grasping).

Submitted 14 March, 2024; originally announced March 2024.

Comments: Submitted to VLMNM 2024 - Workshop, ICRA 2024. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2403.06475 [pdf, other]

Relative velocities between $^{13}$CO structures within $^{12}$CO Molecular clouds

Authors: Lixia Yuan, Ji Yang, Xuepeng Chen, Yang Su, Shaobo Zhang, Xin Zhou, Zhiwei Chen, Qing-Zeng Yan, Min Fang, Fujun Du, Yan Sun, Hongchi Wang, Ye Xu

Abstract: Velocity fields of molecular clouds (MCs) can provide crucial information on the merger and split between clouds, as well as their internal kinematics and maintenance, energy injection and redistribution, even star formation within clouds. Using the CO spectral lines data from the Milky Way Imaging Scroll Painting (MWISP) survey, we measure the relative velocities along the line of sight ($Δ$V$_{\rm LOS}$) between $^{13}$CO structures within $^{12}$CO MCs. Emphasizing MCs with double and triple $^{13}$CO structures, we find that approximately 70$\%$ of $Δ$V$_{\rm LOS}$ values are less than $\sim$ 1 km s$^{-1}$, and roughly 10$\%$ of values exceed 2 km s$^{-1}$, with a maximum of $\sim$ 5 km s$^{-1}$. Additionally, we compare $Δ$V$_{\rm LOS}$ with the internal velocity dispersion of $^{13}$CO structures ($σ_{\rm ^{13}CO,in}$) and find that about 40$\%$ of samples in either double or triple regime display distinct velocity discontinuities, i.e. the relative velocities between $^{13}$CO structures are larger than the internal linewidths of $^{13}$CO structures. Among these 40$\%$ samples in the triple regime, 33$\%$ exhibit signatures of combinations through the two-body motion, whereas the remaining 7$\%$ show features of configurations through the multiple-body motion. The $Δ$V$_{\rm LOS}$ distributions for MCs with double and triple $^{13}$CO structures are similar, as well as their $Δ$V$_{\rm LOS}$/$σ_{\rm ^{13}CO,in}$ distributions. This suggests that relative motions of $^{13}$CO structures within MCs are random and independent of cloud complexities and scales.

Submitted 11 March, 2024; originally announced March 2024.

Comments: 16 pages, 11 figures, accepted for publication in AJ

arXiv:2403.03149 [pdf, other]

Robust Federated Learning Mitigates Client-side Training Data Distribution Inference Attacks

Authors: Yichang Xu, Ming Yin, Minghong Fang, Neil Zhenqiang Gong

Abstract: Recent studies have revealed that federated learning (FL), once considered secure due to clients not sharing their private data with the server, is vulnerable to attacks such as client-side training data distribution inference, where a malicious client can recreate the victim's data. While various countermeasures exist, they are not practical, often assuming server access to some training data or knowledge of label distribution before the attack. In this work, we bridge the gap by proposing InferGuard, a novel Byzantine-robust aggregation rule aimed at defending against client-side training data distribution inference attacks. In our proposed InferGuard, the server first calculates the coordinate-wise median of all the model updates it receives. A client's model update is considered malicious if it significantly deviates from the computed median update. We conduct a thorough evaluation of our proposed InferGuard on five benchmark datasets and perform a comparison with ten baseline methods. The results of our experiments indicate that our defense mechanism is highly effective in protecting against client-side training data distribution inference attacks, even against strong adaptive attacks. Furthermore, our method substantially outperforms the baseline methods in various practical FL scenarios.

Submitted 4 April, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

Comments: To appear in The Web Conference 2024 (WWW '24)
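
The aggregation idea described in this abstract (take the coordinate-wise median of client updates, then treat updates far from it as suspicious) can be illustrated with a minimal sketch. This is not the InferGuard implementation: the abstract does not say how "significantly deviates" is measured, so the Euclidean distance, the cutoff rule, and the `factor` parameter below are all assumptions made for illustration, as are the function names and the toy data.

```python
import math
from statistics import median

def coordinate_wise_median(updates):
    """Median of each coordinate across all client updates."""
    return [median(coords) for coords in zip(*updates)]

def flag_deviating_updates(updates, factor=3.0):
    """Flag updates whose distance to the median update exceeds a cutoff.

    ASSUMPTION: the cutoff is `factor` times the median distance; the
    abstract does not specify the actual deviation criterion.
    """
    med = coordinate_wise_median(updates)
    dists = [math.dist(u, med) for u in updates]
    cutoff = factor * median(dists)
    return med, [d > cutoff for d in dists]

# Toy data (hypothetical): three similar updates and one far-off update.
updates = [
    [0.10, -0.20, 0.05],
    [0.12, -0.18, 0.04],
    [0.09, -0.22, 0.06],
    [5.00, 4.00, -3.00],
]
med, flags = flag_deviating_updates(updates)
print(med, flags)
```

The median is used as the reference point because a minority of extreme updates cannot move it far, unlike a coordinate-wise mean.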

arXiv:2403.02385 [pdf, other]

Magnetically Aligned Striations in the L914 Filamentary Cloud

Authors: Li Sun, Xuepeng Chen, Min Fang, Shaobo Zhang, Yan Gong, Jiancheng Feng, Xuefu Li, Qing-Zeng Yan, Ji Yang

Abstract: We present CO ($J = 1-0$) multi-line observations toward the L914 dark cloud in the vicinity of the Cygnus X region, using the 13.7 m millimeter telescope of the Purple Mountain Observatory (PMO). The CO observations reveal in the L914 cloud a long filament with an angular length of $\sim 3.\!\!^\circ 6$, corresponding to approximately $\rm 50~pc$ at the measured distance of $\sim\rm 760~pc$. Furthermore, a group of hair-like striations are discovered in the two subregions of the L914 cloud, which are connected with the dense ridge of the filament. These striations display quasi-periodic characteristics in both the CO intensity images and position-velocity diagrams. Two of the striations also show increasing velocity gradients and dispersions toward the dense ridge, which could be fitted by accretion flows under gravity. Based on the $Planck$ 353 GHz dust polarization data, we find that the striations are well aligned with the magnetic fields. Moreover, both the striations and magnetic fields are perpendicular to the dense ridge, which constructs a bimodal configuration. Using the classic method, we estimate the strength of magnetic field, and further evaluate the relative importance of gravity, turbulence and magnetic field, and find that the L914 cloud is strongly magnetized. Our results suggest that magnetic fields play an important role in the formation of filamentary structures by channelling the material along the striations toward the dense ridge. The comparison between the observations and simulations suggests that striations could be a product of the magnetohydrodynamic (MHD) process.

Submitted 4 March, 2024; originally announced March 2024.

Comments: 25 pages, 17 figures, 2 tables. Accepted for publication in AJ
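
The filament length quoted in this abstract is consistent with the small-angle relation, projected length = distance × angle in radians. The sketch below only redoes that arithmetic with the two numbers given in the abstract (3.6 degrees and 760 pc); the function name is illustrative, and projection effects are ignored.

```python
import math

def projected_length_pc(angle_deg, distance_pc):
    """Small-angle estimate of the projected length subtended by an angle at a given distance."""
    return distance_pc * math.radians(angle_deg)

# Values taken from the abstract: angular length ~3.6 deg, distance ~760 pc.
length_pc = projected_length_pc(3.6, 760.0)
print(f"{length_pc:.1f} pc")
```

The result is a little under 48 pc, which the abstract rounds to approximately 50 pc.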

arXiv:2403.00061 [pdf, other]

The Multilayer Nature of Molecular Gas toward the Cygnus Region

Authors: Shiyu Zhang, Yang Su, Xuepeng Chen, Min Fang, Qingzeng Yan, Shaobo Zhang, Yan Sun, Xiaolong Wang, Haoran Feng, Yuehui Ma, Miaomiao Zhang, Zi Zhuang, Xin Zhou, Zhiwei Chen, Ji Yang

Abstract: We study the physical properties and 3D distribution of molecular clouds (MCs) toward the Cygnus region using the MWISP CO survey and Gaia DR3 data. Based on Gaussian decomposition and clustering for $\rm ^{13}CO$ lines, over 70% of the fluxes are recovered. With the identification result of $\rm ^{13}CO$ structures, two models are designed to measure the distances of the molecular gas in velocity crowding regions. The distances of more than 200 large $\rm ^{13}CO$ structures are obtained toward the 150 square degree region. Additionally, tens of the identified MC structures coincide well with masers and/or intense mid-IR emission. We find multiple gas layers toward the region: (1) the extensive gas structures composing the Cygnus Rift from 700 pc to 1 kpc across the whole region; (2) the $\sim$ 1.3 kpc gas layer mainly in the Cygnus X South region; and (3) the 1.5 kpc dense filament at the Cygnus X North region and many cometary clouds shaped by Cygnus OB2. We also note that the spatial distribution of YSO candidates is generally consistent with the molecular gas structures. The total molecular mass of the Cygnus region is estimated to be $\sim 2.7\times10^{6}~M_{\odot}$ assuming an X-factor ratio $X_{\rm CO} = 2 \times 10^{20} \rm cm^{-2} (K\cdot km\cdot s^{-1})^{-1}$. The foreground Cygnus Rift contributes $\sim$25% of the molecular mass in the whole region. Our work presents a new 3D view of the MCs' distribution toward the Cygnus X region, as well as the exact molecular gas mass distribution in the foreground Cygnus Rift.

Submitted 23 April, 2024; v1 submitted 29 February, 2024; originally announced March 2024.

Comments: 51 pages, 26 figures, 4 tables, to match the AJ version (2024 AJ 167 220Z). The data can be found at doi: 10.57760/sciencedb.16716

arXiv:2402.17333 [pdf, other]

Unsupervised multiple choices question answering via universal corpus

Authors: Qin Zhang, Hao Ge, Xiaojun Chen, Meng Fang

Abstract: Unsupervised question answering is a promising yet challenging task, which alleviates the burden of building large-scale annotated data in a new domain. It motivates us to study the unsupervised multiple-choice question answering (MCQA) problem. In this paper, we propose a novel framework designed to generate synthetic MCQA data based solely on contexts from the universal domain without relying on any form of manual annotation. Possible answers are extracted and used to produce related questions, then we leverage both named entities (NE) and knowledge graphs to discover plausible distractors to form complete synthetic samples. Experiments on multiple MCQA datasets demonstrate the effectiveness of our method.

Submitted 27 February, 2024; originally announced February 2024.

Comments: 5 pages, 1 figure, published at ICASSP 2024

arXiv:2402.16457 [pdf, other]

RetrievalQA: Assessing Adaptive Retrieval-Augmented Generation for Short-form Open-Domain Question Answering

Authors: Zihan Zhang, Meng Fang, Ling Chen

Abstract: Adaptive retrieval-augmented generation (ARAG) aims to dynamically determine the necessity of retrieval for queries instead of retrieving indiscriminately to enhance the efficiency and relevance of the sourced information. However, previous works largely overlook the evaluation of ARAG approaches, leading to their effectiveness being understudied. This work presents a benchmark, RetrievalQA, comprising 1,271 short-form questions covering new world and long-tail knowledge. The knowledge necessary to answer the questions is absent from LLMs; therefore, external information must be retrieved to answer correctly. This makes RetrievalQA a suitable testbed to evaluate existing ARAG methods. We observe that calibration-based methods heavily rely on threshold tuning, while vanilla prompting is inadequate for guiding LLMs to make reliable retrieval decisions. Based on our findings, we propose Time-Aware Adaptive Retrieval (TA-ARE), a simple yet effective method that helps LLMs assess the necessity of retrieval without calibration or additional training. The dataset and code will be available at https://github.com/hyintell/RetrievalQA

Submitted 5 June, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

Comments: Findings of ACL 2024

arXiv:2402.15346 [pdf, other]

Updated kinematics of the Radcliffe Wave: non-synchronous, dipole-like vertical oscillations

Authors: Zhi-Kai Zhu, Min Fang, Zu-Jia Lu, Junzhi Wang, Guang-Xing Li, Shiyu Zhang, Veli-Matti Pelkonen, Paolo Padoan, En-Wei Liang

Abstract: The kinematic structure of the Radcliffe Wave (RW) is crucial for understanding its origin and evolution. In this work, we present an accurate measurement of the vertical velocity $V_Z$ in which radial velocity (RV) measurements are taken into account. This is achieved in two ways. First, the velocities are measured towards Young Stellar Objects (YSOs), using their RV and proper motion measurements from APOGEE-2 and Gaia DR3. Second, we combine RV measurements toward clouds with proper motion measurements of associated YSOs to determine the vertical velocities of the clouds. The results reveal that the oscillations in $V_Z$ are not synchronous with the vertical coordinate. The difference is caused by a combination of the effect of the radial velocity, which we include in this paper, and the difference in models. By supplementing our analysis with additional young star samples, we find a consistent dipole pattern in $V_Z$. The fact that no significant amplitude differences are found among the analyzed samples indicates that there is no apparent age gradient within the dipole. We propose that RW evolves at a relatively slow rate. The fact that it will take a much longer time for RW to complete a full period compared to the cloud lifetimes challenges its classification as a traditional "wave". This age discrepancy should explain the phase difference and the non-synchronous oscillation found in kinematic studies.

Submitted 23 February, 2024; originally announced February 2024.

Comments: 13 pages, 9 figures, submitted on 6 Feb 2024

arXiv:2402.14849 [pdf]

Asynchronous and Segmented Bidirectional Encoding for NMT

Authors: Jingpu Yang, Zehua Han, Mengyu Xiang, Helin Wang, Yuxiao Huang, Miao Fang

Abstract: With the rapid advancement of Neural Machine Translation (NMT), enhancing translation efficiency and quality has become a focal point of research. Despite the commendable performance of general models such as the Transformer in various aspects, they still fall short in processing long sentences and fully leveraging bidirectional contextual information. This paper introduces an improved model based on the Transformer, implementing an asynchronous and segmented bidirectional decoding strategy aimed at elevating translation efficiency and accuracy. Compared to traditional unidirectional translations from left-to-right or right-to-left, our method demonstrates heightened efficiency and improved translation quality, particularly in handling long sentences. Experimental results on the IWSLT2017 dataset confirm the effectiveness of our approach in accelerating translation and increasing accuracy, especially surpassing traditional unidirectional strategies in long sentence translation. Furthermore, this study analyzes the impact of sentence length on decoding outcomes and explores the model's performance in various scenarios. The findings of this research not only provide an effective encoding strategy for the NMT field but also pave new avenues and directions for future studies.

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.13740 [pdf, other]

From Text to CQL: Bridging Natural Language and Corpus Search Engine

Authors: Luming Lu, Jiyuan An, Yujie Wang, Liner yang, Cunliang Kong, Zhenghao Liu, Shuo Wang, Haozhe Lin, Mingwei Fang, Yaping Huang, Erhong Yang

Abstract: Natural Language Processing (NLP) technologies have revolutionized the way we interact with information systems, with a significant focus on converting natural language queries into formal query languages such as SQL. However, less emphasis has been placed on the Corpus Query Language (CQL), a critical tool for linguistic research and detailed analysis within text corpora. The manual construction of CQL queries is a complex and time-intensive task that requires a great deal of expertise, which presents a notable challenge for both researchers and practitioners. This paper presents the first text-to-CQL task that aims to automate the translation of natural language into CQL. We present a comprehensive framework for this task, including a specifically curated large-scale dataset and methodologies leveraging large language models (LLMs) for the text-to-CQL task. In addition, we established advanced evaluation metrics to assess the syntactic and semantic accuracy of the generated queries. We created innovative LLM-based conversion approaches and detailed experiments. The results demonstrate the efficacy of our methods and provide insights into the complexities of the text-to-CQL task.

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.13494 [pdf, other]

GradSafe: Detecting Jailbreak Prompts for LLMs via Safety-Critical Gradient Analysis

Authors: Yueqi Xie, Minghong Fang, Renjie Pi, Neil Gong

Abstract: Large Language Models (LLMs) face threats from jailbreak prompts. Existing methods for detecting jailbreak prompts are primarily online moderation APIs or finetuned LLMs. These strategies, however, often require extensive and resource-intensive data collection and training processes. In this study, we propose GradSafe, which effectively detects jailbreak prompts by scrutinizing the gradients of safety-critical parameters in LLMs. Our method is grounded in a pivotal observation: the gradients of an LLM's loss for jailbreak prompts paired with a compliance response exhibit similar patterns on certain safety-critical parameters. In contrast, safe prompts lead to different gradient patterns. Building on this observation, GradSafe analyzes the gradients from prompts (paired with compliance responses) to accurately detect jailbreak prompts. We show that GradSafe, applied to Llama-2 without further training, outperforms Llama Guard, despite its extensive finetuning with a large dataset, in detecting jailbreak prompts. This superior performance is consistent across both zero-shot and adaptation scenarios, as evidenced by our evaluations on ToxicChat and XSTest. The source code is available at https://github.com/xyq7/GradSafe.

Submitted 29 May, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

Comments: Accepted to ACL 2024 Main

arXiv:2402.12208 [pdf, other]

Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models

Authors: Shengpeng Ji, Minghui Fang, Ziyue Jiang, Siqi Zheng, Qian Chen, Rongjie Huang, Jialung Zuo, Shulei Wang, Zhou Zhao

Abstract: In recent years, large language models have achieved significant success in generative tasks (e.g., speech cloning and audio generation) related to speech, audio, music, and other signal domains. A crucial element of these models is the discrete acoustic codec, which serves as an intermediate representation replacing the mel-spectrogram. However, there exist several gaps between discrete codecs and downstream speech language models. Specifically, 1) most codec models are trained on only 1,000 hours of data, whereas most speech language models are trained on 60,000 hours; 2) achieving good reconstruction performance requires the utilization of numerous codebooks, which increases the burden on downstream speech language models; 3) the initial channel of the codebooks contains excessive information, making it challenging to directly generate acoustic tokens from weakly supervised signals such as text in downstream tasks. Consequently, leveraging the characteristics of speech language models, we propose Language-Codec. In the Language-Codec, we introduce a Mask Channel Residual Vector Quantization (MCRVQ) mechanism along with improved Fourier transform structures and larger training datasets to address the aforementioned gaps. We compare our method with competing audio compression algorithms and observe significant outperformance across extensive evaluations. Furthermore, we also validate the efficiency of the Language-Codec on downstream speech language models. The source code and pre-trained models can be accessed at https://github.com/jishengpeng/languagecodec.

Submitted 27 April, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: We release a more powerful checkpoint in Language-Codec v3

arXiv:2402.11637 [pdf, other]

Poisoning Federated Recommender Systems with Fake Users

Authors: Ming Yin, Yichang Xu, Minghong Fang, Neil Zhenqiang Gong

Abstract: Federated recommendation is a prominent use case within federated learning, yet it remains susceptible to various attacks, from user to server-side vulnerabilities. Poisoning attacks are particularly notable among user-side attacks, as participants upload malicious model updates to deceive the global model, often intending to promote or demote specific targeted items. This study investigates strategies for executing promotion attacks in federated recommender systems. Current poisoning attacks on federated recommender systems often rely on additional information, such as the local training data of genuine users or item popularity. However, such information is challenging for the potential attacker to obtain. Thus, there is a need to develop an attack that requires no extra information apart from item embeddings obtained from the server. In this paper, we introduce a novel fake user based poisoning attack named PoisonFRS to promote the attacker-chosen targeted item in federated recommender systems without requiring knowledge about user-item rating data, user attributes, or the aggregation rule used by the server. Extensive experiments on multiple real-world datasets demonstrate that PoisonFRS can effectively promote the attacker-chosen targeted item to a large portion of genuine users and outperform current benchmarks that rely on additional information about the system. We further observe that the model updates from both genuine and fake users are indistinguishable within the latent space.

Submitted 18 February, 2024; originally announced February 2024.

Comments: To appear in The Web Conference 2024 (WWW '24)

arXiv:2402.07676 [pdf, other]

Statistical modelling and Bayesian inversion for a Compton imaging system: application to radioactive source localisation

Authors: Cecilia Tarpau, Ming Fang, Konstantinos C. Zygalakis, Marcelo Pereyra, Angela Di Fulvio, Yoann Altmann

Abstract: This paper presents a statistical forward model for a Compton imaging system, called a Compton imager. This system, under development at the University of Illinois Urbana-Champaign, is a variant of Compton cameras with a single type of sensor that can simultaneously act as scatterer and absorber. This imager is convenient for imaging situations requiring a wide field of view. The proposed statistical forward model is then used to solve the inverse problem of estimating the location and energy of point-like sources from observed data. This inverse problem is formulated and solved in a Bayesian framework by using a Metropolis-within-Gibbs algorithm for the estimation of the location, and an expectation-maximization algorithm for the estimation of the energy. This approach leads to more accurate estimation when compared with the deterministic standard back-projection approach, with the additional benefit of uncertainty quantification in the low photon imaging setting.

Submitted 16 February, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

arXiv:2401.15419 [pdf, other]

Photoevaporation from Inner Protoplanetary Disks Confronted with Observations

Authors: Yiren Lin, Lile Wang, Min Fang, Ahmad Nemer, Jeremy Goodman

Abstract: The decades-long explorations on the dispersal of protoplanetary disks involve many debates about photoevaporation versus magnetized wind launching mechanisms. This letter argues that the observed winds originating from the inner disk ($R\lesssim 0.3$ AU) cannot be explained by the photoevaporative mechanism. Energy conservation requires the presumed photoevaporative winds to be heated to $\gtrsim 10^5$ K when launched from inner disks. However, due to efficient thermal accommodation with dust grains and cooling processes at high densities, X-ray irradiation at energies above 1 keV cannot efficiently launch winds in the first place because of its high penetration. Some studies claiming X-ray wind launching have oversimplified the thermochemical couplings. Furthermore, heating the gas to escape velocity will over-ionize it, suppressing the species responsible for observed forbidden lines (e.g., [OI] 6300 Å). Confirmed by semi-analytic integrations of thermochemical fluid structures, such high ionizations contradict the observed emission of neutral and singly-ionized atoms from the winds originating from the inner disks.

Submitted 16 July, 2024; v1 submitted 27 January, 2024; originally announced January 2024.

Comments: 15 pages, 6 figures, re-submitted the revised version to the Astrophysical Journal

arXiv:2401.14665 [pdf, other]

PepGB: Facilitating peptide drug discovery via graph neural networks

Authors: Yipin Lei, Xu Wang, Meng Fang, Han Li, Xiang Li, Jianyang Zeng

Abstract: Peptides offer great biomedical potential and serve as promising drug candidates. Currently, the majority of approved peptide drugs are directly derived from well-explored natural human peptides. It is quite necessary to utilize advanced deep learning techniques to identify novel peptide drugs in the vast, unexplored biochemical space. Despite various in silico methods having been developed to accelerate peptide early drug discovery, existing models face challenges of overfitting and lacking generalizability due to the limited size, imbalanced distribution and inconsistent quality of experimental data. In this study, we propose PepGB, a deep learning framework to facilitate peptide early drug discovery by predicting peptide-protein interactions (PepPIs). Employing graph neural networks, PepGB incorporates a fine-grained perturbation module and a dual-view objective with contrastive learning-based peptide pre-trained representation to predict PepPIs. Through rigorous evaluations, we demonstrated that PepGB greatly outperforms baselines and can accurately identify PepPIs for novel targets and peptide hits, thereby contributing to the target identification and hit discovery processes. Next, we derive an extended version, diPepGB, to tackle the bottleneck of modeling highly imbalanced data prevalent in lead generation and optimization processes. Utilizing directed edges to represent relative binding strength between two peptide nodes, diPepGB achieves superior performance in real-world assays. In summary, our proposed frameworks can serve as potent tools to facilitate peptide early drug discovery.

Submitted 26 January, 2024; originally announced January 2024.

arXiv:2401.09334 [pdf, other]

Large Language Models Are Neurosymbolic Reasoners

Authors: Meng Fang, Shilong Deng, Yudi Zhang, Zijing Shi, Ling Chen, Mykola Pechenizkiy, Jun Wang

Abstract: A wide range of real-world applications are characterized by their symbolic nature, necessitating a strong capability for symbolic reasoning. This paper investigates the potential application of Large Language Models (LLMs) as symbolic reasoners. We focus on text-based games, significant benchmarks for agents with natural language capabilities, particularly in symbolic tasks like math, map reading, sorting, and applying common sense in text-based worlds. To facilitate these agents, we propose an LLM agent designed to tackle symbolic challenges and achieve in-game objectives. We begin by initializing the LLM agent and informing it of its role. The agent then receives observations and a set of valid actions from the text-based games, along with a specific symbolic module. With these inputs, the LLM agent chooses an action and interacts with the game environments. Our experimental results demonstrate that our method significantly enhances the capability of LLMs as automated agents for symbolic reasoning, and our LLM agent is effective in text-based games involving symbolic tasks, achieving an average performance of 88% across all tasks.

Submitted 17 January, 2024; originally announced January 2024.

Comments: Accepted by AAAI 2024
