doi 10.1145/3613904.3642740

Data Cubes in Hand: A Design Space of Tangible Cubes for Visualizing 3D Spatio-Temporal Data in Mixed Reality

Authors: Shuqi He, Haonan Yao, Luyan Jiang, Kaiwen Li, Nan Xiang, Yue Li, Hai-Ning Liang, Lingyun Yu

Abstract: Tangible interfaces in mixed reality (MR) environments allow for intuitive data interactions. Tangible cubes, with their rich interaction affordances, high maneuverability, and stable structure, are particularly well-suited for exploring multi-dimensional data types. However, the design potential of these cubes is underexplored. This study introduces a design space for tangible cubes in MR, focusing on interaction space, visualization space, sizes, and multiplicity. Using spatio-temporal data, we explored the interaction affordances of these cubes in a workshop (N=24). We identified unique interactions like rotating, tapping, and stacking, which are linked to augmented reality (AR) visualization commands. Integrating user-identified interactions, we created a design space for tangible-cube interactions and visualization. A prototype visualizing global health spending with small cubes was developed and evaluated, supporting both individual and combined cube manipulation. This research enhances our grasp of tangible interaction in MR, offering insights for future design and application in diverse data contexts.

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2403.05875 [pdf, other]

Detecting quantum chaos via pseudo-entropy and negativity

Authors: Song He, Pak Hang Chris Lau, Long Zhao

Abstract: Quantum informatic quantities such as entanglement entropy are useful in detecting quantum phase transitions. Recently, a new entanglement measure called pseudo-entropy was proposed, which is a generalization of the more well-known entanglement entropy. It has many nice properties and is useful in the study of post-selection measurements. In this paper, one of our goals is to explore the properties of pseudo-entropy and study its effectiveness as a quantum chaos diagnostic, i.e., as a tool to distinguish between chaotic and integrable systems. Using several variants of the SYK model, we study the signal of quantum chaos captured in the pseudo-entropy and relate it to the spectral form factor (SFF) and local operator entanglement (LOE). We also explore another quantity called the negativity of entanglement, which is a useful entanglement measure for a mixed state. We generalized it to accommodate the transition matrix and called it pseudo-negativity in analogy to pseudo-entropy. We found that it also nicely captures the spectral properties of a chaotic system and hence can also serve as a quantum chaos diagnostic.

Submitted 9 March, 2024; originally announced March 2024.

Comments: 31 pages, 12 figures

arXiv:2403.05789 [pdf, other]

ItD: Large Language Models Can Teach Themselves Induction through Deduction

Authors: Wangtao Sun, Haotian Xu, Xuanqing Yu, Pei Chen, Shizhu He, Jun Zhao, Kang Liu

Abstract: Although Large Language Models (LLMs) are showing impressive performance on a wide range of Natural Language Processing tasks, researchers have found that they still have limited ability to conduct induction. Recent works mainly adopt "post processes" paradigms to improve the performance of LLMs on induction (e.g., the hypothesis search & refinement methods), but their performance is still constrained by the inherent inductive capability of the LLMs. In this paper, we propose a novel framework, Induction through Deduction (ItD), to enable the LLMs to teach themselves induction through deduction. The ItD framework is composed of two main components: a Deductive Data Generation module to generate induction data and a Naive Bayesian Induction module to optimize the fine-tuning and decoding of LLMs. Our empirical results showcase the effectiveness of ItD on two induction benchmarks, achieving relative performance improvement of 36% and 10% compared with previous state-of-the-art, respectively. Our ablation study verifies the effectiveness of two key modules of ItD. We also verify the effectiveness of ItD across different LLMs and deductors. The data and code of this paper can be found at https://anonymous.4open.science/r/ItD-E844.

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.05130 [pdf, other]

From Chain to Tree: Refining Chain-like Rules into Tree-like Rules on Knowledge Graphs

Authors: Wangtao Sun, Shizhu He, Jun Zhao, Kang Liu

Abstract: With good explanatory power and controllability, rule-based methods play an important role in many tasks such as knowledge reasoning and decision support. However, existing studies have primarily focused on learning chain-like rules, which limits their semantic expressiveness and prediction accuracy. As a result, chain-like rules usually fire on the incorrect grounding values, producing inaccurate or even erroneous reasoning results. In this paper, we propose the concept of tree-like rules on knowledge graphs to expand the application scope and improve the reasoning ability of rule-based methods. Meanwhile, we propose an effective framework for refining chain-like rules into tree-like rules. Experimental comparisons on four public datasets show that the proposed framework can easily adapt to other chain-like rule induction methods and the refined tree-like rules consistently achieve better performance than chain-like rules on link prediction. The data and code of this paper are available at https://anonymous.4open.science/r/tree-rule-E3CD/.

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.04651 [pdf, other]

Cedar: A New Language for Expressive, Fast, Safe, and Analyzable Authorization (Extended Version)

Authors: Joseph W. Cutler, Craig Disselkoen, Aaron Eline, Shaobo He, Kyle Headley, Michael Hicks, Kesha Hietala, Eleftherios Ioannidis, John Kastner, Anwar Mamat, Darin McAdams, Matt McCutchen, Neha Rungta, Emina Torlak, Andrew Wells

Abstract: Cedar is a new authorization policy language designed to be ergonomic, fast, safe, and analyzable. Rather than embed authorization logic in an application's code, developers can write that logic as Cedar policies and delegate access decisions to Cedar's evaluation engine. Cedar's simple and intuitive syntax supports common authorization use-cases with readable policies, naturally leveraging concepts from role-based, attribute-based, and relation-based access control models. Cedar's policy structure enables access requests to be decided quickly. Cedar's policy validator leverages optional typing to help policy writers avoid mistakes, but not get in their way. Cedar's design has been finely balanced to allow for a sound and complete logical encoding, which enables precise policy analysis, e.g., to ensure that when refactoring a set of policies, the authorized permissions do not change. We have modeled Cedar in the Lean programming language, and used Lean's proof assistant to prove important properties of Cedar's design. We have implemented Cedar in Rust, and released it open-source. Comparing Cedar to two open-source languages, OpenFGA and Rego, we find (subjectively) that Cedar has equally or more readable policies, but (objectively) performs far better.

Submitted 8 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

arXiv:2403.00127 [pdf]

Prompting ChatGPT for Translation: A Comparative Analysis of Translation Brief and Persona Prompts

Authors: Sui He

Abstract: Prompt engineering has shown potential for improving translation quality in LLMs. However, the possibility of using translation concepts in prompt design remains largely underexplored. Against this backdrop, the current paper discusses the effectiveness of incorporating the conceptual tool of translation brief and the personas of translator and author into prompt design for translation tasks in ChatGPT. Findings suggest that, although certain elements are constructive in facilitating human-to-human communication for translation tasks, their effectiveness is limited for improving translation quality in ChatGPT. This accentuates the need for explorative research on how translation theorists and practitioners can develop the current set of conceptual tools rooted in the human-to-human communication paradigm for translation purposes in this emerging workflow involving human-machine interaction, and how translation concepts developed in translation studies can inform the training of GPT models for translation tasks.

Submitted 28 April, 2024; v1 submitted 29 February, 2024; originally announced March 2024.

arXiv:2402.15986 [pdf, other]

Photoassociation of multiple cold molecules in a dipole trap

Authors: Li Li, Yi-Jia Liu, Xiao-Long Zhou, Ze-Min Shen, Si-Jian He, Zhao-Di Liu, Jian Wang

Abstract: The generation of cold molecules is a core topic in the field of cold atoms and molecules, which has advanced relevant research like ultracold chemistry, quantum computation, and quantum metrology. With its high atomic phase space density, the optical dipole trap has been widely used to prepare and trap cold molecules, and can also be further developed for multiple cold molecule formation and dynamics study. In this work, Rb2 molecules are photoassociated in the magneto-optical trap to obtain precise rovibrational spectroscopy, which provides accurate numerical references for multiple photoassociations. By meeting the harsh requirements of photoassociation in the optical dipole trap, the cold molecule photoassociation process is well explored, and different rovibrational cold molecules are formed in the optical dipole trap for the first time. This method can be universally extended to simultaneously photoassociate various molecules with different internal states or atomic species in just one optical dipole trap, and then advance generous cold molecule research such as cold molecule collision dynamics.

Submitted 24 February, 2024; originally announced February 2024.

Comments: 6 pages, 5 figures

arXiv:2402.15627 [pdf, other]

MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs

Authors: Ziheng Jiang, Haibin Lin, Yinmin Zhong, Qi Huang, Yangrui Chen, Zhi Zhang, Yanghua Peng, Xiang Li, Cong Xie, Shibiao Nong, Yulu Jia, Sun He, Hongmin Chen, Zhihao Bai, Qi Hou, Shipeng Yan, Ding Zhou, Yiyao Sheng, Zhuo Jiang, Haohan Xu, Haoran Wei, Zhang Zhang, Pengfei Nie, Leqi Zou, Sida Zhao, et al. (7 additional authors not shown)

Abstract: We present the design, implementation and engineering experience in building and deploying MegaScale, a production system for training large language models (LLMs) at the scale of more than 10,000 GPUs. Training LLMs at this scale brings unprecedented challenges to training efficiency and stability. We take a full-stack approach that co-designs the algorithmic and system components across model block and optimizer design, computation and communication overlapping, operator optimization, data pipeline, and network performance tuning. Maintaining high efficiency throughout the training process (i.e., stability) is an important consideration in production given the long extent of LLM training jobs. Many hard stability issues only emerge at large scale, and in-depth observability is the key to address them. We develop a set of diagnosis tools to monitor system components and events deep in the stack, identify root causes, and derive effective techniques to achieve fault tolerance and mitigate stragglers. MegaScale achieves 55.2% Model FLOPs Utilization (MFU) when training a 175B LLM model on 12,288 GPUs, improving the MFU by 1.34x compared to Megatron-LM. We share our operational experience in identifying and fixing failures and stragglers. We hope by articulating the problems and sharing our experience from a systems perspective, this work can inspire future LLM systems research.

Submitted 23 February, 2024; originally announced February 2024.

arXiv:2402.14225 [pdf, other]

SICRN: Advancing Speech Enhancement through State Space Model and Inplace Convolution Techniques

Authors: Changjiang Zhao, Shulin He, Xueliang Zhang

Abstract: Speech enhancement aims to improve speech quality and intelligibility, especially in noisy environments where background noise degrades speech signals. Currently, deep learning methods achieve great success in speech enhancement, e.g., the representative convolutional recurrent neural network (CRN) and its variants. However, CRN typically employs consecutive downsampling and upsampling convolution for frequency modeling, which destroys the inherent structure of the signal over frequency. Additionally, convolutional layers lack temporal modelling abilities. To address these issues, we propose an innovative module combining a State space model and Inplace Convolution (SIC) to replace the conventional convolution in CRN; the resulting network is called SICRN. Specifically, a dual-path multidimensional State space model captures the global frequency dependencies and long-term temporal dependencies. Meanwhile, the 2D-inplace convolution is used to capture the local structure, which abandons the downsampling and upsampling. Systematic evaluations on the public INTERSPEECH 2020 DNS challenge dataset demonstrate SICRN's efficacy. Compared to strong baselines, SICRN achieves performance close to state-of-the-art while having advantages in model parameters, computations, and algorithmic delay. The proposed SICRN shows great promise for improved speech enhancement.

Submitted 21 February, 2024; originally announced February 2024.

arXiv:2402.13430 [pdf, other]

LinkSAGE: Optimizing Job Matching Using Graph Neural Networks

Authors: Ping Liu, Haichao Wei, Xiaochen Hou, Jianqiang Shen, Shihai He, Kay Qianqi Shen, Zhujun Chen, Fedor Borisyuk, Daniel Hewlett, Liang Wu, Srikant Veeraraghavan, Alex Tsun, Chengming Jiang, Wenjing Zhang

Abstract: We present LinkSAGE, an innovative framework that integrates Graph Neural Networks (GNNs) into large-scale personalized job matching systems, designed to address the complex dynamics of LinkedIn's extensive professional network. Our approach capitalizes on a novel job marketplace graph, the largest and most intricate of its kind in industry, with billions of nodes and edges. This graph is not merely extensive but also richly detailed, encompassing member and job nodes along with key attributes, thus creating an expansive and interwoven network. A key innovation in LinkSAGE is its training and serving methodology, which effectively combines inductive graph learning on a heterogeneous, evolving graph with an encoder-decoder GNN model. This methodology decouples the training of the GNN model from that of existing Deep Neural Net (DNN) models, eliminating the need for frequent GNN retraining while maintaining up-to-date graph signals in near real time, allowing for the effective integration of GNN insights through transfer learning. The subsequent nearline inference system serves the GNN encoder within a real-world setting, significantly reducing online latency and obviating the need for costly real-time GNN infrastructure. Validated across multiple online A/B tests in diverse product scenarios, LinkSAGE demonstrates marked improvements in member engagement, relevance matching, and member retention, confirming its generalizability and practical impact.

Submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.12851 [pdf, other]

MoELoRA: Contrastive Learning Guided Mixture of Experts on Parameter-Efficient Fine-Tuning for Large Language Models

Authors: Tongxu Luo, Jiahe Lei, Fangyu Lei, Weihao Liu, Shizhu He, Jun Zhao, Kang Liu

Abstract: Fine-tuning is often necessary to enhance the adaptability of Large Language Models (LLMs) to downstream tasks. Nonetheless, the process of updating billions of parameters demands significant computational resources and training time, which poses a substantial obstacle to the widespread application of large-scale models in various scenarios. To address this issue, Parameter-Efficient Fine-Tuning (PEFT) has emerged as a prominent paradigm in recent research. However, current PEFT approaches that employ a limited set of global parameters (such as LoRA, which adds low-rank approximation matrices to all weights) face challenges in flexibly combining different computational modules in downstream tasks. In this work, we introduce a novel PEFT method: MoELoRA. We consider LoRA as Mixture of Experts (MoE), and to mitigate the random routing phenomenon observed in MoE, we propose the utilization of contrastive learning to encourage experts to learn distinct features. We conducted experiments on 11 tasks in math reasoning and common-sense reasoning benchmarks. With the same number of parameters, our approach outperforms LoRA significantly. In math reasoning, MoELoRA achieved an average performance that was 4.2% higher than LoRA, and demonstrated competitive performance compared to the 175B GPT-3.5 on several benchmarks.

Submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.12271 [pdf, other]

Secure Federated Learning Across Heterogeneous Cloud and High-Performance Computing Resources -- A Case Study on Federated Fine-tuning of LLaMA 2

Authors: Zilinghan Li, Shilan He, Pranshu Chaturvedi, Volodymyr Kindratenko, Eliu A Huerta, Kibaek Kim, Ravi Madduri

Abstract: Federated learning enables multiple data owners to collaboratively train robust machine learning models without transferring large or sensitive local datasets by only sharing the parameters of the locally trained models. In this paper, we elaborate on the design of our Advanced Privacy-Preserving Federated Learning (APPFL) framework, which streamlines end-to-end secure and reliable federated learning experiments across cloud computing facilities and high-performance computing resources by leveraging Globus Compute, a distributed function-as-a-service platform, and Amazon Web Services. We further demonstrate the use case of APPFL in fine-tuning a LLaMA 2 7B model using several cloud resources and supercomputers.

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.12219 [pdf, other]

Reformatted Alignment

Authors: Run-Ze Fan, Xuefeng Li, Haoyang Zou, Junlong Li, Shwai He, Ethan Chern, Jiewen Hu, Pengfei Liu

Abstract: The quality of finetuning data is crucial for aligning large language models (LLMs) with human values. Current methods to improve data quality are either labor-intensive or prone to factual errors caused by LLM hallucinations. This paper explores elevating the quality of existing instruction data to better align with human values, introducing a simple and effective approach named ReAlign, which reformats the responses of instruction data into a format that better aligns with pre-established criteria and the collated evidence. This approach minimizes human annotation, hallucination, and the difficulty in scaling, remaining orthogonal to existing alignment techniques. Experimentally, ReAlign significantly boosts the general alignment ability, math reasoning, factuality, and readability of the LLMs. Encouragingly, without introducing any additional data or advanced training techniques, and merely by reformatting the response, LLaMA-2-13B's mathematical reasoning ability on GSM8K can be improved from 46.77% to 56.63% in accuracy. Additionally, a mere 5% of ReAlign data yields a 67% boost in general alignment ability measured by the Alpaca dataset. This work highlights the need for further research into the science and mechanistic interpretability of LLMs. We have made the associated code and data publicly accessible to support future studies at https://github.com/GAIR-NLP/ReAlign.

Submitted 17 April, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: Homepage: https://gair-nlp.github.io/ReAlign/

arXiv:2402.12099

Human Video Translation via Query Warping

Authors: Haiming Zhu, Yangyang Xu, Shengfeng He

Abstract: In this paper, we present QueryWarp, a novel framework for temporally coherent human motion video translation. Existing diffusion-based video editing approaches rely solely on key and value tokens to ensure temporal consistency, which sacrifices the preservation of local and structural regions. In contrast, we aim to consider complementary query priors by constructing the temporal correlations among query tokens from different frames. Initially, we extract appearance flows from source poses to capture continuous human foreground motion. Subsequently, during the denoising process of the diffusion model, we employ appearance flows to warp the previous frame's query token, aligning it with the current frame's query. This query warping imposes explicit constraints on the outputs of self-attention layers, effectively guaranteeing temporally coherent translation. We perform experiments on various human motion video translation tasks, and the results demonstrate that our QueryWarp framework surpasses state-of-the-art methods both qualitatively and quantitatively.

Submitted 21 May, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: This is not a complete paper and the methods and results have not been updated. We decided to withdraw and make further improvements

arXiv:2402.11139 [pdf, other]

LiGNN: Graph Neural Networks at LinkedIn

Authors: Fedor Borisyuk, Shihai He, Yunbo Ouyang, Morteza Ramezani, Peng Du, Xiaochen Hou, Chengming Jiang, Nitin Pasumarthy, Priya Bannur, Birjodh Tiwana, Ping Liu, Siddharth Dangi, Daqi Sun, Zhoutao Pei, Xiao Shi, Sirou Zhu, Qianqi Shen, Kuang-Hsuan Lee, David Stein, Baolei Li, Haichao Wei, Amol Ghoting, Souvik Ghosh

Abstract: In this paper, we present LiGNN, a deployed large-scale Graph Neural Networks (GNNs) Framework. We share our insight on the development and deployment of GNNs at large scale at LinkedIn. We present a set of algorithmic improvements to the quality of GNN representation learning including temporal graph architectures with long term losses, effective cold start solutions via graph densification, ID embeddings and multi-hop neighbor sampling. We explain how we built and sped up by 7x our large-scale training on LinkedIn graphs with adaptive sampling of neighbors, grouping and slicing of training data batches, specialized shared-memory queue and local gradient optimization. We summarize our deployment lessons and learnings gathered from A/B test experiments. The techniques presented in this work have contributed to approximate relative improvements of 1% in Job application hearing back rate, 2% Ads CTR lift, 0.5% in Feed engaged daily active users, 0.2% session lift and 0.1% weekly active user lift from people recommendation. We believe that this work can provide practical solutions and insights for engineers who are interested in applying Graph neural networks at large scale.

Submitted 16 February, 2024; originally announced February 2024.

arXiv:2402.10940 [pdf, ps, other]

Neural machine translation of clinical procedure codes for medical diagnosis and uncertainty quantification

Authors: Pei-Hung Chung, Shuhan He, Norawit Kijpaisalratana, Abdel-badih el Ariss, Byung-Jun Yoon

Abstract: A Clinical Decision Support System (CDSS) is designed to enhance clinician decision-making by combining system-generated recommendations with medical expertise. Given the high costs, intensive labor, and time-sensitive nature of medical treatments, there is a pressing need for efficient decision support, especially in complex emergency scenarios. In these scenarios, where information can be limited, an advanced CDSS framework that leverages AI (artificial intelligence) models to effectively reduce diagnostic uncertainty has utility. Such an AI-enabled CDSS framework with quantified uncertainty promises to be practical and beneficial in the demanding context of real-world medical care. In this study, we introduce the concept of Medical Entropy, quantifying uncertainties in patient outcomes predicted by neural machine translation based on the ICD-9 code of procedures. Our experimental results not only show strong correlations between procedure and diagnosis sequences based on the simple ICD-9 code but also demonstrate the promising capacity to model trends of uncertainties during hospitalizations through a data-driven approach.

Submitted 7 February, 2024; originally announced February 2024.

arXiv:2402.10464 [pdf, other]

FedKit: Enabling Cross-Platform Federated Learning for Android and iOS

Authors: Sichang He, Beilong Tang, Boyan Zhang, Jiaoqi Shao, Xiaomin Ouyang, Daniel Nata Nugraha, Bing Luo

Abstract: We present FedKit, a federated learning (FL) system tailored for cross-platform FL research on Android and iOS devices. FedKit pipelines cross-platform FL development by enabling model conversion, hardware-accelerated training, and cross-platform model aggregation. Our FL workflow supports flexible machine learning operations (MLOps) in production, facilitating continuous model delivery and training. We have deployed FedKit in a real-world use case for health data analysis on university campuses, demonstrating its effectiveness. FedKit is open-source at https://github.com/FedCampus/FedKit.

Submitted 16 February, 2024; originally announced February 2024.

Comments: This work has been accepted for demonstration at the IEEE International Conference on Computer Communications (INFOCOM) 2024

arXiv:2402.10151 [pdf, other]

ControlLM: Crafting Diverse Personalities for Language Models

Authors: Yixuan Weng, Shizhu He, Kang Liu, Shengping Liu, Jun Zhao

Abstract: As language models continue to scale in size and capability, they display an array of emerging behaviors, both beneficial and concerning. This heightens the need to control model behaviors. We hope to be able to control the personality traits of language models at inference time so as to have various character features, on top of which the requirements of different types of tasks can be met. Personality is a higher-level and more abstract behavioral representation for language models. We introduce ControlLM, which leverages differential activation patterns, derived from contrasting behavioral prompts in the model's latent space, to influence the model's personality traits at inference. This approach allows for the precise, real-time adjustment of model behavior. First, we demonstrate ControlLM's capacity to elicit diverse persona behaviors without any training, while precision control allows personality traits to closely match average human values. Subsequently, we showcase improved reasoning and question answering through selective amplification of beneficial attributes like conscientiousness and friendliness. We hope that this work will inspire research on controlling human-like behaviors of language models and provide insights for future research. Our code is publicly available at: https://github.com/wengsyx/ControlLM.

Submitted 15 February, 2024; originally announced February 2024.

Comments: 17 pages

arXiv:2402.10110 [pdf, other]

Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning

Authors: Ming Li, Lichang Chen, Jiuhai Chen, Shwai He, Jiuxiang Gu, Tianyi Zhou

Abstract: Instruction tuning is critical to large language models (LLMs) for achieving better instruction following and task adaptation capabilities, but its success heavily relies on the training data quality. Many recent methods focus on improving the data quality but often overlook the compatibility of the data with the student model being finetuned. This paper introduces Selective Reflection-Tuning, a novel paradigm that synergizes a teacher LLM's reflection and introspection for improving existing data quality with the data selection capability of the student LLM, to automatically refine existing instruction-tuning data. This teacher-student collaboration produces high-quality and student-compatible instruction-response pairs, resulting in sample-efficient instruction tuning and LLMs of superior performance. Selective Reflection-Tuning is a data augmentation and synthesis approach that generally improves LLM finetuning and self-improvement without collecting brand-new data. We apply our method to Alpaca and WizardLM data and achieve much stronger and top-tier 7B and 13B LLMs.

Submitted 7 June, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

Comments: ACL2024 (findings), Camera-ready

arXiv:2402.07939 [pdf, other]

UFO: A UI-Focused Agent for Windows OS Interaction

Authors: Chaoyun Zhang, Liqun Li, Shilin He, Xu Zhang, Bo Qiao, Si Qin, Minghua Ma, Yu Kang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang

Abstract: We introduce UFO, an innovative UI-Focused agent to fulfill user requests tailored to applications on Windows OS, harnessing the capabilities of GPT-Vision. UFO employs a dual-agent framework to meticulously observe and analyze the graphical user interface (GUI) and control information of Windows applications. This enables the agent to seamlessly navigate and operate within individual applications and across them to fulfill user requests, even when spanning multiple applications. The framework incorporates a control interaction module, facilitating action grounding without human intervention and enabling fully automated execution. Consequently, UFO transforms arduous and time-consuming processes into simple tasks achievable solely through natural language commands. We conducted testing of UFO across 9 popular Windows applications, encompassing a variety of scenarios reflective of users' daily usage. The results, derived from both quantitative metrics and real-case studies, underscore the superior effectiveness of UFO in fulfilling user requests. To the best of our knowledge, UFO stands as the first UI agent specifically tailored for task completion within the Windows OS environment. The open-source code for UFO is available on https://github.com/microsoft/UFO.

Submitted 23 May, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.05798 [pdf, other]

Visual Harmony: Text-Visual Interplay in Circular Infographics

Authors: Shuqi He, Yuqing Chen, Yuxin Xia, Yichun Li, Hai-Ning Liang, Lingyun Yu

Abstract: Infographics are visual representations designed for efficient and effective communication of data and knowledge. One crucial aspect of infographic design is the interplay between text and visual elements, particularly in circular visualizations where the textual descriptions can either be embedded within the graphics or placed adjacent to the visual representation. While several studies have examined text layout design in visualizations in general, the text-visual interplay in infographics and its subsequent perceptual effects remain underexplored. To address this, our study investigates how varying text placement and descriptiveness impact pleasantness, comprehension and overall memorability in the infographics viewing experience. We recruited 30 participants and presented them with a collection of 15 infographics across a diverse set of topics, including media and public events, health and nutrition, science and research, and sustainability. The text placement (embed, side-to-side) and descriptiveness (simplistic, normal, descriptive) were systematically manipulated, resulting in a total of six experimental conditions. Our key findings indicate that text placement can significantly influence the memorability of infographics, whereas descriptiveness can significantly impact the pleasantness of the viewing experience. Embedded text placement and simplistic text can potentially contribute to more effective infographic designs. These results offer valuable insights for infographic designers, contributing to the creation of more effective and memorable visual representations.

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.05438 [pdf, other]

Penalized spline estimation of principal components for sparse functional data: rates of convergence

Authors: Shiyuan He, Jianhua Z. Huang, Kejun He

Abstract: This paper gives a comprehensive treatment of the convergence rates of penalized spline estimators for simultaneously estimating several leading principal component functions, when the functional data is sparsely observed. The penalized spline estimators are defined as the solution of a penalized empirical risk minimization problem, where the loss function belongs to a general class of loss functions motivated by the matrix Bregman divergence, and the penalty term is the integrated squared derivative. The theory reveals that the asymptotic behavior of penalized spline estimators depends on the interesting interplay between several factors, i.e., the smoothness of the unknown functions, the spline degree, the spline knot number, the penalty order, and the penalty parameter. The theory also classifies the asymptotic behavior into seven scenarios and characterizes whether and how the minimax optimal rates of convergence are achievable in each scenario.

Submitted 8 February, 2024; originally announced February 2024.

arXiv:2402.01723 [pdf, other]

An Empirical Study on Large Language Models in Accuracy and Robustness under Chinese Industrial Scenarios

Authors: Zongjie Li, Wenying Qiu, Pingchuan Ma, Yichen Li, You Li, Sijia He, Baozheng Jiang, Shuai Wang, Weixi Gu

Abstract: Recent years have witnessed the rapid development of large language models (LLMs) in various domains. To better serve the large number of Chinese users, many commercial vendors in China have adopted localization strategies, training and providing local LLMs specifically customized for Chinese users. Furthermore, looking ahead, one of the key future applications of LLMs will be practical deployment in industrial production by enterprises and users in those sectors. However, the accuracy and robustness of LLMs in industrial scenarios have not been well studied. In this paper, we present a comprehensive empirical study on the accuracy and robustness of LLMs in the context of the Chinese industrial production area. We manually collected 1,200 domain-specific problems from 8 different industrial sectors to evaluate LLM accuracy. Furthermore, we designed a metamorphic testing framework containing four industrial-specific stability categories with eight abilities, totaling 13,631 questions with variants to evaluate LLM robustness. In total, we evaluated 9 different LLMs developed by Chinese vendors, as well as four different LLMs developed by global vendors. Our major findings include: (1) Current LLMs exhibit low accuracy in Chinese industrial contexts, with all LLMs scoring less than 0.6. (2) The robustness scores vary across industrial sectors, and local LLMs overall perform worse than global ones. (3) LLM robustness differs significantly across abilities. Global LLMs are more robust under logical-related variants, while advanced local LLMs perform better on problems related to understanding Chinese industrial terminology. Our study results provide valuable guidance for understanding and promoting the industrial domain capabilities of LLMs from both development and industrial enterprise perspectives. The results further motivate possible research directions and tooling support.

Submitted 26 January, 2024; originally announced February 2024.

arXiv:2402.00530 [pdf, other]

Superfiltering: Weak-to-Strong Data Filtering for Fast Instruction-Tuning

Authors: Ming Li, Yong Zhang, Shwai He, Zhitao Li, Hongyu Zhao, Jianzong Wang, Ning Cheng, Tianyi Zhou

Abstract: Instruction tuning is critical to improve LLMs but usually suffers from low-quality and redundant data. Data filtering for instruction tuning has proved important in improving both the efficiency and performance of the tuning process. But it also leads to extra cost and computation due to the involvement of LLMs in this process. To reduce the filtering cost, we study Superfiltering: Can we use a smaller and weaker model to select data for finetuning a larger and stronger model? Despite the performance gap between weak and strong language models, we find their highly consistent capability to perceive instruction difficulty and data selection results. This enables us to use a much smaller and more efficient model to filter the instruction data used to train a larger language model. Not only does it largely speed up the data filtering, but the filtered-data-finetuned LLM achieves even better performance on standard benchmarks. Extensive experiments validate the efficacy and efficiency of our approach.

Submitted 7 June, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

Comments: ACL2024 main, Camera-ready

arXiv:2402.00268 [pdf, other]

Relation between timelike and spacelike entanglement entropy

Authors: Wu-zhong Guo, Song He, Yu-Xuan Zhang

Abstract: In this study, we establish a connection between timelike and spacelike entanglement entropy. Specifically, for a diverse range of states, the timelike entanglement entropy is uniquely determined by a linear combination of the spacelike entanglement entropy and its first-order temporal derivative. This framework reveals that the imaginary component of the timelike entanglement entropy primarily originates from the non-commutativity between the twist operator and its first-order temporal derivative. Furthermore, we analyze the constraints of this relation and highlight the possible extension to accommodate more complex state configurations.

Submitted 31 January, 2024; originally announced February 2024.

Comments: 5+8 pages, 1 figure

arXiv:2401.15852 [pdf, ps, other]

The Spectral base and quotients of bounded symmetric domains

Authors: Siqi He, Jie Liu, Ngaiming Mok

Abstract: In this article, we explore Higgs bundles on a projective manifold $X$, focusing on their spectral bases, a concept introduced by T. Chen and B. Ngô. The spectral base is a specific closed subscheme within the space of symmetric differentials. We observe that if the spectral base vanishes, then any reductive representation $ρ: π_1(X) \to \text{GL}_r(\mathbb{C})$ is both rigid and integral. Additionally, we prove that for $X=Ω/Γ$, a quotient of a bounded symmetric domain $Ω$ of rank at least $2$ by a torsion-free cocompact irreducible lattice $Γ$, the spectral base indeed vanishes, which generalizes a result of B. Klingler.

Submitted 28 January, 2024; originally announced January 2024.

Comments: 21 pages

MSC Class: 14J60; 53C35

arXiv:2401.15123 [pdf, other]

Large Language Model Guided Knowledge Distillation for Time Series Anomaly Detection

Authors: Chen Liu, Shibo He, Qihang Zhou, Shizhong Li, Wenchao Meng

Abstract: Self-supervised methods have gained prominence in time series anomaly detection due to the scarcity of available annotations. Nevertheless, they typically demand extensive training data to acquire a generalizable representation map, which conflicts with scenarios of a few available samples, thereby limiting their performance. To overcome the limitation, we propose AnomalyLLM, a knowledge distillation-based time series anomaly detection approach where the student network is trained to mimic the features of the large language model (LLM)-based teacher network that is pretrained on large-scale datasets. During the testing phase, anomalies are detected when the discrepancy between the features of the teacher and student networks is large. To prevent the student network from learning the teacher network's features of anomalous samples, we devise two key strategies. 1) Prototypical signals are incorporated into the student network to consolidate the normal feature extraction. 2) We use synthetic anomalies to enlarge the representation gap between the two networks. AnomalyLLM demonstrates state-of-the-art performance on 15 datasets, improving accuracy by at least 14.5% in the UCR dataset.

Submitted 26 January, 2024; originally announced January 2024.

Comments: 12 pages, 5 figures

arXiv:2401.13714 [pdf, other]

Value-Driven Mixed-Precision Quantization for Patch-Based Inference on Microcontrollers

Authors: Wei Tao, Shenglin He, Kai Lu, Xiaoyang Qu, Guokuan Li, Jiguang Wan, Jianzong Wang, Jing Xiao

Abstract: Deploying neural networks on microcontroller units (MCUs) presents substantial challenges due to their constrained computation and memory resources. Previous research has explored patch-based inference as a strategy to conserve memory without sacrificing model accuracy. However, this technique suffers from severe redundant computation overhead, leading to a substantial increase in execution latency. A feasible solution to address this issue is mixed-precision quantization, but it faces the challenges of accuracy degradation and a time-consuming search. In this paper, we propose QuantMCU, a novel patch-based inference method that utilizes value-driven mixed-precision quantization to reduce redundant computation. We first utilize value-driven patch classification (VDPC) to maintain the model accuracy. VDPC classifies patches into two classes based on whether they contain outlier values. For patches containing outlier values, we apply 8-bit quantization to the feature maps on the dataflow branches that follow. In addition, for patches without outlier values, we utilize value-driven quantization search (VDQS) on the feature maps of their following dataflow branches to reduce search time. Specifically, VDQS introduces a novel quantization search metric that takes into account both computation and accuracy, and it employs entropy as an accuracy representation to avoid additional training. VDQS also adopts an iterative approach to determine the bitwidth of each feature map to further accelerate the search process. Experimental results on real-world MCU devices show that QuantMCU can reduce computation by 2.2x on average while maintaining comparable model accuracy compared to the state-of-the-art patch-based inference methods.

Submitted 23 January, 2024; originally announced January 2024.

Comments: Accepted by the 27th Design, Automation and Test in Europe Conference (DATE 2024)

arXiv:2401.11235 [pdf, other]

TreeMIL: A Multi-instance Learning Framework for Time Series Anomaly Detection with Inexact Supervision

Authors: Chen Liu, Shibo He, Haoyu Liu, Shizhong Li

Abstract: Time series anomaly detection (TSAD) plays a vital role in various domains such as healthcare, networks, and industry. Considering labels are crucial for detection but difficult to obtain, we turn to TSAD with inexact supervision: only series-level labels are provided during the training phase, while point-level anomalies are predicted during the testing phase. Previous works follow a traditional multi-instance learning (MIL) approach, which focuses on encouraging high anomaly scores at individual time steps. However, time series anomalies are not limited to individual point anomalies; they can also be collective anomalies, typically exhibiting abnormal patterns over subsequences. To address the challenge of collective anomalies, in this paper, we propose a tree-based MIL framework (TreeMIL). We first adopt an N-ary tree structure to divide the entire series into multiple nodes, where nodes at different levels represent subsequences with different lengths. Then, the subsequence features are extracted to determine the presence of collective anomalies. Finally, we calculate point-level anomaly scores by aggregating features from nodes at different levels. Experiments conducted on seven public datasets and eight baselines demonstrate that TreeMIL achieves an average 32.3% improvement in F1-score compared to previous state-of-the-art methods. The code is available at https://github.com/fly-orange/TreeMIL.

Submitted 20 January, 2024; originally announced January 2024.

Comments: This paper has been accepted by IEEE ICASSP 2024

arXiv:2401.09991 [pdf, ps, other]

doi 10.1007/JHEP04(2024)138

Irrelevant and marginal deformed BMS field theories

Authors: Song He, Xin-Cheng Mao

Abstract: In this study, we investigate various deformations within the framework of Bondi-van der Burg-Metzner-Sachs invariant field theory (BMSFT). Specifically, we explore the impact of Bondi-van der Burg-Metzner-Sachs (BMS) symmetry on the theory by introducing key deformations, namely, $T \overline{T}$, $JT_μ$, and $\sqrt{T \overline{T}}$ deformations. In the context of generic seed theories possessing BMS symmetry, we derive the first-order correction of correlation functions using the systematic application of BMS symmetry Ward identities. However, it is worth noting that higher-order corrections are intricately dependent on the specific characteristics of the seed theories. To illustrate our findings, we select the BMS free scalar and free fermion as representative seed theories. We then proceed to analytically determine the deformed action by solving the nontrivial flow equations. Additionally, we extend our analysis to include second-order deformations within these deformed theories.

Submitted 27 March, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

Comments: 54 pages, 0 figure

Journal ref: JHEP 04 (2024) 138

arXiv:2401.05483 [pdf, other]

NLSM $\subset$ Tr$(φ^3)$

Authors: Nima Arkani-Hamed, Qu Cao, Jin Dong, Carolina Figueiredo, Song He

Abstract: Scattering amplitudes for the simplest theory of colored scalar particles - the Tr($Φ^3$) theory - have recently been the subject of active investigations. In this letter we describe an unanticipated wider implication of this work: the Tr($Φ^3$) theory secretly contains Non-linear Sigma Model (NLSM) amplitudes to all loop orders. The NLSM amplitudes are obtained from Tr$(Φ^3)$ amplitudes by a unique shift of kinematic variables. We show that this shifted kinematics produces amplitudes for a cubic theory with a linear term in potential, with extrema spontaneously breaking $U(N) \to U(N-k) \times U(k)$. The Goldstone amplitudes for this theory coincide with those of pions in the $U(N) \times U(N) \to U(N)$ chiral Lagrangian to all orders in the planar limit. We also give a purely on-shell understanding of this correspondence, showing integrands defined by the kinematic shifts have the correct residues on poles and appropriately produce the Adler zero. Finally, we discuss how similar kinematic shifts produce certain infinite classes of mixed amplitudes of pions and Tr($Φ^3$) scalars, most of which are not interpretable from the Lagrangian description.

Submitted 15 April, 2024; v1 submitted 10 January, 2024; originally announced January 2024.

Comments: 10 pages, 13 figures. Addition of new material including a derivation of the results from a simple Lagrangian, identifying the symmetry breaking pattern, as well as some further discussions

arXiv:2401.04723 [pdf, other]

Spatio-temporal data fusion for the analysis of in situ and remote sensing data using the INLA-SPDE approach

Authors: Shiyu He, Samuel W. K. Wong

Abstract: We propose a Bayesian hierarchical model to address the challenge of spatial misalignment in spatio-temporal data obtained from in situ and satellite sources. The model is fit using the INLA-SPDE approach, which provides efficient computation. Our methodology combines the different data sources in a "fusion" model via the construction of projection matrices in both spatial and temporal domains. Through simulation studies, we demonstrate that the fusion model has superior performance in prediction accuracy across space and time compared to standalone "in situ" and "satellite" models based on only in situ or satellite data, respectively. The fusion model also generally outperforms the standalone models in terms of parameter inference. Such a modeling approach is motivated by environmental problems, and our specific focus is on the analysis and prediction of harmful algae bloom (HAB) events, where the convention is to conduct separate analyses based on either in situ samples or satellite images. A real data analysis shows that the proposed model is a necessary step towards a unified characterization of bloom dynamics and identifying the key drivers of HAB events.

Submitted 9 January, 2024; originally announced January 2024.

Comments: 23 pages, 7 figures

arXiv:2401.02880 [pdf, other]

Lotto: Secure Participant Selection against Adversarial Servers in Federated Learning

Authors: Zhifeng Jiang, Peng Ye, Shiqi He, Wei Wang, Ruichuan Chen, Bo Li

Abstract: In Federated Learning (FL), common privacy-enhancing techniques, such as secure aggregation and distributed differential privacy, rely on the critical assumption of an honest majority among participants to withstand various attacks. In practice, however, servers are not always trusted, and an adversarial server can strategically select compromised clients to create a dishonest majority, thereby undermining the system's security guarantees. In this paper, we present Lotto, an FL system that addresses this fundamental, yet underexplored issue by providing secure participant selection against an adversarial server. Lotto supports two selection algorithms: random and informed. To ensure random selection without a trusted server, Lotto enables each client to autonomously determine their participation using verifiable randomness. For informed selection, which is more vulnerable to manipulation, Lotto approximates the algorithm by employing random selection within a refined client pool. Our theoretical analysis shows that Lotto effectively aligns the proportion of server-selected compromised participants with the base rate of dishonest clients in the population. Large-scale experiments further reveal that Lotto achieves time-to-accuracy performance comparable to that of insecure selection methods, indicating a low computational overhead for secure selection.

Submitted 6 March, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

Comments: This article has been accepted to USENIX Security '24

arXiv:2401.01718 [pdf]

RHDLPP: A multigroup radiation hydrodynamics code for laser-produced plasmas

Authors: Qi Min, Ziyang Xu, Siqi He, Haidong Lu, Xingbang Liu, Ruizi Shen, Yanhong Wu, Qikun Pan, Chongxiao Zhao, Fei Chen, Maogen Su, Chenzhong Dong

Abstract: We introduce the RHDLPP, a flux-limited multigroup radiation hydrodynamics numerical code designed for simulating laser-produced plasmas in diverse environments. The code bifurcates into two packages: RHDLPP-LTP for low-temperature plasmas generated by moderate-intensity nanosecond lasers, and RHDLPP-HTP for high-temperature, high-density plasmas formed by high-intensity laser pulses. The core radiation hydrodynamic equations are resolved in the Eulerian frame, employing an operator-split method. This method decomposes the solution into two substeps: first, the explicit resolution of the hyperbolic subsystems integrating radiation and fluid dynamics, and second, the implicit treatment of the parabolic part comprising stiff radiation diffusion, heat conduction, and energy exchange. Laser propagation and energy deposition are modeled through a hybrid approach, combining geometrical optics ray-tracing in sub-critical plasma regions with a one-dimensional solution of the Helmholtz wave equation in super-critical areas. The thermodynamic states are ascertained using an equation of state, based on either the real gas approximation or the quotidian equation of state (QEOS). Additionally, RHDLPP includes RHDLPP-SpeIma3D, a three-dimensional spectral simulation post-processing module, for generating both temporally-spatially resolved and time-integrated spectra and imaging, facilitating direct comparisons with experimental data. The paper showcases a series of verification tests to establish the code's accuracy and efficiency, followed by application cases, including simulations of laser-produced aluminum (Al) plasmas, pre-pulse-induced target deformation of tin (Sn) microdroplets relevant to extreme ultraviolet lithography light sources, and varied imaging and spectroscopic simulations.

Submitted 3 January, 2024; originally announced January 2024.

arXiv:2401.00667 [pdf, other]

Channelling Multimodality Through a Unimodalizing Transport: Warp-U Sampler and Stochastic Bridge Sampling

Authors: Fei Ding, David E. Jones, Shiyuan He, Xiao-Li Meng

Abstract: Monte Carlo integration is fundamental in scientific and statistical computation, but requires reliable samples from the target distribution, which poses a substantial challenge in the case of multi-modal distributions. Existing methods often involve time-consuming tuning, and typically lack tailored estimators for efficient use of the samples. This paper adapts the Warp-U transformation [Wang et al., 2022] to form a multi-modal sampling strategy called Warp-U sampling. It constructs a stochastic map to transport a multi-modal density into a uni-modal one, and subsequently inverts the transport but with new stochasticity injected. For efficient use of the samples for normalising constant estimation, we propose (i) an unbiased estimation scheme based on coupled chains, where the Warp-U sampling is used to reduce the coupling time; and (ii) a stochastic Warp-U bridge sampling estimator, which improves its deterministic counterpart given in Wang et al. [2022]. Our overall approach requires less tuning and is easier to apply than common alternatives. Theoretically, we establish the ergodicity of our sampling algorithm and that our stochastic Warp-U bridge sampling estimator has greater (asymptotic) precision per CPU second compared to the Warp-U bridge estimator of Wang et al. [2022] under practical conditions. The advantages and current limitations of our approach are demonstrated through simulation studies and an application to exoplanet detection.

Submitted 1 January, 2024; originally announced January 2024.

arXiv:2401.00041 [pdf, other]

Scalar-Scaffolded Gluons and the Combinatorial Origins of Yang-Mills Theory

Authors: Nima Arkani-Hamed, Qu Cao, Jin Dong, Carolina Figueiredo, Song He

Abstract: We present a new formulation for Yang-Mills scattering amplitudes in any number of dimensions and at any loop order, based on the same combinatorial and binary-geometric ideas in kinematic space recently used to give an all-order description of Tr $φ^3$ theory. We propose that in a precise sense the amplitudes for a suitably "stringy" form of these two theories are identical, up to a simple shift of kinematic variables. This connection is made possible by describing the amplitudes for $n$ gluons via a "scalar scaffolding", arising from the scattering of $2n$ colored scalars coming in $n$ distinct pairs of flavors fusing to produce the gluons. Fundamental properties of the "$u$-variables", describing the "binary geometry" for surfaces appearing in the topological expansion, magically guarantee that the kinematically shifted Tr $φ^3$ amplitudes satisfy the physical properties needed to be interpreted as scaffolded gluons. These include multilinearity, gauge invariance, and factorization on tree- and loop-level gluon cuts. Our "stringy" scaffolded gluon amplitudes coincide with amplitudes in the bosonic string for extra-dimensional gluon polarizations at tree-level, but differ (and are simpler) at loop-level. We provide many checks on our proposal, including matching non-trivial leading singularities through two loops.
The simple counting problem underlying the $u$ variables autonomously "knows" about everything needed to convert colored scalar to gluon amplitudes, exposing a striking "discovery" of Yang-Mills amplitudes from elementary combinatorial ideas in kinematic space.

Submitted 29 December, 2023; originally announced January 2024.

Comments: 92 pages, 37 figures

arXiv:2312.17591 [pdf, other]

Towards Faithful Explanations for Text Classification with Robustness Improvement and Explanation Guided Training

Authors: Dongfang Li, Baotian Hu, Qingcai Chen, Shan He

Abstract: Feature attribution methods highlight the important input tokens as explanations to model predictions, which have been widely applied to deep neural networks towards trustworthy AI. However, recent works show that explanations provided by these methods face challenges of being faithful and robust. In this paper, we propose a method with Robustness improvement and Explanation Guided training towards more faithful EXplanations (REGEX) for text classification. First, we improve model robustness through input gradient regularization and virtual adversarial training. Second, we use salient ranking to mask noisy tokens and maximize the similarity between model attention and feature attribution, which can be seen as a self-training procedure without importing other external information. We conduct extensive experiments on six datasets with five attribution methods, and also evaluate the faithfulness in the out-of-domain setting. The results show that REGEX improves fidelity metrics of explanations in all settings and further achieves consistent gains based on two randomization tests. Moreover, we show that using highlight explanations produced by REGEX to train select-then-predict models results in comparable task performance to the end-to-end method.

Submitted 29 December, 2023; originally announced December 2023.

arXiv:2312.16282 [pdf, other]

Hidden zeros for particle/string amplitudes and the unity of colored scalars, pions and gluons

Authors: Nima Arkani-Hamed, Qu Cao, Jin Dong, Carolina Figueiredo, Song He

Abstract: Recent years have seen the emergence of a new understanding of scattering amplitudes in the simplest theory of colored scalar particles - the Tr$(φ^3)$ theory - based on combinatorial and geometric ideas in the kinematic space of scattering data. In this paper we report a surprise: far from the toy model it appears to be, the "stringy" Tr$(φ^3)$ amplitudes secretly contain the scattering amplitudes for pions, as well as non-supersymmetric gluons, in any number of dimensions. The amplitudes for the different theories are given by one and the same function, related by a simple shift of the kinematics. This discovery was spurred by another fundamental observation: the tree-level Tr$(φ^3)$ field theory amplitudes have a hidden pattern of zeros when a special set of non-planar Mandelstam invariants is set to zero. Furthermore, near these zeros, the amplitudes simplify, by factoring into a non-trivial product of smaller amplitudes. Remarkably the amplitudes for pions and gluons are observed to also vanish in the same kinematical locus. These properties further generalize to the "stringy" Tr$(φ^3)$ amplitudes. There is a unique shift of the kinematic data that preserves the zeros, and this shift is precisely the one that unifies colored scalars, pions, and gluons into a single object. We will focus in this paper on explaining the hidden zeros and factorization properties and the connection between all the colored theories, working for simplicity at tree-level.
Subsequent works will describe this new formulation for the Non-linear Sigma Model and non-supersymmetric Yang-Mills theory, at all loop orders.

Submitted 1 May, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

Comments: Added reference to early work of Gliozzi et al. giving a different derivation of zeros for string amplitudes from monodromy relations, corrected typos

arXiv:2312.16218 [pdf, other]

Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks

Authors: Christian Simon, Sen He, Juan-Manuel Perez-Rua, Mengmeng Xu, Amine Benhalloum, Tao Xiang

Abstract: Solving image-to-3D from a single view is an ill-posed problem, and current neural reconstruction methods addressing it through diffusion models still rely on scene-specific optimization, constraining their generalization capability. To overcome the limitations of existing approaches regarding generalization and consistency, we introduce a novel neural rendering technique. Our approach employs the signed distance function as the surface representation and incorporates generalizable priors through geometry-encoding volumes and HyperNetworks. Specifically, our method builds neural encoding volumes from generated multi-view inputs. We adjust the weights of the SDF network conditioned on an input image at test-time to allow model adaptation to novel scenes in a feed-forward manner via HyperNetworks. To mitigate artifacts derived from the synthesized views, we propose the use of a volume transformer module to improve the aggregation of image features instead of processing each viewpoint separately. Through our proposed method, dubbed Hyper-VolTran, we avoid the bottleneck of scene-specific optimization and maintain consistency across the images generated from multiple viewpoints. Our experiments show the advantages of our proposed approach with consistent results and rapid generation.

Submitted 5 January, 2024; v1 submitted 24 December, 2023; originally announced December 2023.

arXiv:2312.15633 [pdf, other]

MuLA-GAN: Multi-Level Attention GAN for Enhanced Underwater Visibility

Authors: Ahsan Baidar Bakht, Zikai Jia, Muhayy ud Din, Waseem Akram, Lyes Saad Soud, Lakmal Seneviratne, Defu Lin, Shaoming He, Irfan Hussain

Abstract: The underwater environment presents unique challenges, including color distortions, reduced contrast, and blurriness, hindering accurate analysis. In this work, we introduce MuLA-GAN, a novel approach that leverages the synergistic power of Generative Adversarial Networks (GANs) and Multi-Level Attention mechanisms for comprehensive underwater image enhancement. The integration of Multi-Level Attention within the GAN architecture significantly enhances the model's capacity to learn discriminative features crucial for precise image restoration. By selectively focusing on relevant spatial and multi-level features, our model excels in capturing and preserving intricate details in underwater imagery, essential for various applications. Extensive qualitative and quantitative analyses on diverse datasets, including UIEB test dataset, UIEB challenge dataset, U45, and UCCS dataset, highlight the superior performance of MuLA-GAN compared to existing state-of-the-art methods. Experimental evaluations on a specialized dataset tailored for bio-fouling and aquaculture applications demonstrate the model's robustness in challenging environmental conditions. On the UIEB test dataset, MuLA-GAN achieves exceptional PSNR (25.59) and SSIM (0.893) scores, surpassing Water-Net, the second-best model, with scores of 24.36 and 0.885, respectively.
This work not only addresses a significant research gap in underwater image enhancement but also underscores the pivotal role of Multi-Level Attention in enhancing GANs, providing a novel and comprehensive framework for restoring underwater image quality.

Submitted 25 December, 2023; originally announced December 2023.
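
For readers unfamiliar with the PSNR figures quoted in the MuLA-GAN abstract above, the following is a minimal sketch of how the metric is conventionally computed from a reference image and a restored image. It is not the authors' evaluation code: the function name `psnr`, the flat-list pixel representation, and the default peak value of 255 (8-bit images) are assumptions made for illustration.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB for two equally sized pixel sequences.

    Hypothetical helper for illustration; assumes 8-bit pixels by default.
    """
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error between corresponding pixels.
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        # Identical images: PSNR is unbounded.
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    # A uniform error of 10 grey levels gives an MSE of 100.
    print(round(psnr([100, 100, 100, 100], [110, 110, 110, 110]), 2))
```

Higher values mean the restored image is closer to the reference; because the scale is logarithmic, a gap of about one dB between two methods corresponds to a noticeable ratio in mean squared error.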

arXiv:2312.15484 [pdf, other]

On constructibility of AdS supergluon amplitudes

Authors: Qu Cao, Song He, Yichao Tang

Abstract: We prove that all tree-level $n$-point supergluon (scalar) amplitudes in AdS$_5$ can be recursively constructed, using factorization and flat-space limit. Our method is greatly facilitated by a natural R-symmetry basis for planar color-ordered amplitudes, which reduces the latter to "partial amplitudes" with simpler pole structures and factorization properties. Given the $n$-point scalar amplitude, we first extract spinning amplitudes with $n{-}2$ scalars and one gluon by imposing "gauge invariance", and then use a special "no-gluon kinematics" to determine the $(n{+}1)$-point scalar amplitude completely (which in turn contains the $n$-point single-gluon amplitude). Explicit results of up to 8-point scalar amplitudes and up to 6-point single-gluon amplitudes are included as supplemental materials.

Submitted 14 January, 2024; v1 submitted 24 December, 2023; originally announced December 2023.

Comments: 5 pages, 4 figures, major revision from v2 including new ancillary file

arXiv:2312.13875 [pdf, other]

Best Arm Identification in Batched Multi-armed Bandit Problems

Authors: Shengyu Cao, Simai He, Ruoqing Jiang, Jin Xu, Hongsong Yuan

Abstract: The multi-armed bandit problem has recently arisen in many real-life scenarios where arms must be sampled in batches, because the agent can wait only a limited time for feedback. Such applications include biological experimentation and online marketing. The problem is further complicated when the number of arms is large and the number of batches is small. We consider pure exploration in a batched multi-armed bandit problem. We introduce a general linear programming framework that can incorporate objectives of different theoretical settings in best arm identification. The linear program leads to a two-stage algorithm that can achieve good theoretical properties. We demonstrate by numerical studies that the algorithm also has good performance compared to certain UCB-type or Thompson sampling methods.

Submitted 21 December, 2023; originally announced December 2023.

arXiv:2312.11988 [pdf, other]

Xpert: Empowering Incident Management with Query Recommendations via Large Language Models

Authors: Yuxuan Jiang, Chaoyun Zhang, Shilin He, Zhihao Yang, Minghua Ma, Si Qin, Yu Kang, Yingnong Dang, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang

Abstract: Large-scale cloud systems play a pivotal role in modern IT infrastructure. However, incidents occurring within these systems can lead to service disruptions and adversely affect user experience. To swiftly resolve such incidents, on-call engineers depend on crafting domain-specific language (DSL) queries to analyze telemetry data. However, writing these queries can be challenging and time-consuming. This paper presents a thorough empirical study on the utilization of queries of KQL, a DSL employed for incident management in a large-scale cloud management system at Microsoft. The findings obtained underscore the importance and viability of KQL query recommendation to enhance incident management. Building upon these valuable insights, we introduce Xpert, an end-to-end machine learning framework that automates the KQL recommendation process. By leveraging historical incident data and large language models, Xpert generates customized KQL queries tailored to new incidents. Furthermore, Xpert incorporates a novel performance metric called Xcore, enabling a thorough evaluation of query quality from three comprehensive perspectives. We conduct extensive evaluations of Xpert, demonstrating its effectiveness in offline settings. Notably, we deploy Xpert in the real production environment of a large-scale incident management system at Microsoft, validating its efficiency in supporting incident management.
To the best of our knowledge, this paper represents the first empirical study of its kind, and Xpert stands as a pioneering DSL query recommendation framework designed for incident management.

Submitted 19 December, 2023; originally announced December 2023.

Comments: Accepted as a research paper at ICSE 2024

arXiv:2312.11549 [pdf, other]

Label-Free Multivariate Time Series Anomaly Detection

Authors: Qihang Zhou, Shibo He, Haoyu Liu, Jiming Chen, Wenchao Meng

Abstract: Anomaly detection in multivariate time series (MTS) has been widely studied in the one-class classification (OCC) setting. The training samples in OCC are assumed to be normal, which is difficult to guarantee in practical situations. Such a case may degrade the performance of OCC-based anomaly detection methods which fit the training distribution as the normal distribution. In this paper, we propose MTGFlow, an unsupervised anomaly detection approach for MTS anomaly detection via dynamic Graph and entity-aware normalizing Flow. MTGFlow first estimates the density of the entire training samples and then identifies anomalous instances based on the density of the test samples within the fitted distribution. This relies on a widely accepted assumption that anomalous instances exhibit more sparse densities than normal ones, with no reliance on a clean training dataset. However, it is intractable to directly estimate the density due to complex dependencies among entities and their diverse inherent characteristics. To mitigate this, we utilize the graph structure learning model to learn interdependent and evolving relations among entities, which effectively captures complex and accurate distribution patterns of MTS. In addition, our approach incorporates the unique characteristics of individual entities by employing an entity-aware normalizing flow. This enables us to represent each entity as a parameterized normal distribution.
Furthermore, considering that some entities present similar characteristics, we propose a cluster strategy that capitalizes on the commonalities of entities with similar characteristics, resulting in more precise and detailed density estimation. We refer to this cluster-aware extension as MTGFlow_cluster. Extensive experiments are conducted on six widely used benchmark datasets, in which MTGFlow and MTGFlow_cluster demonstrate their superior detection performance.

Submitted 6 February, 2024; v1 submitted 16 December, 2023; originally announced December 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2208.02108

arXiv:2312.10979 [pdf, ps, other]

3S-TSE: Efficient Three-Stage Target Speaker Extraction for Real-Time and Low-Resource Applications

Authors: Shulin He, Jinjiang liu, Hao Li, Yang Yang, Fei Chen, Xueliang Zhang

Abstract: Target speaker extraction (TSE) aims to isolate a specific voice from multiple mixed speakers relying on a registered sample. Since voiceprint features usually vary greatly, current end-to-end neural networks require large model parameters which are computationally intensive and impractical for real-time applications, especially on resource-constrained platforms. In this paper, we address the TSE task using a microphone array and introduce a novel three-stage solution that systematically decouples the process: First, a neural network is trained to estimate the direction of the target speaker. Second, with the direction determined, the Generalized Sidelobe Canceller (GSC) is used to extract the target speech. Third, an Inplace Convolutional Recurrent Neural Network (ICRN) acts as a denoising post-processor, refining the GSC output to yield the final separated speech. Our approach delivers superior performance while drastically reducing computational load, setting a new standard for efficient real-time target speaker extraction.

Submitted 4 January, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: Accepted to ICASSP 2024

arXiv:2312.09716 [pdf, other]

Let All be Whitened: Multi-teacher Distillation for Efficient Visual Retrieval

Authors: Zhe Ma, Jianfeng Dong, Shouling Ji, Zhenguang Liu, Xuhong Zhang, Zonghui Wang, Sifeng He, Feng Qian, Xiaobo Zhang, Lei Yang

Abstract: Visual retrieval aims to search for the most relevant visual items, e.g., images and videos, from a candidate gallery with a given query item. Accuracy and efficiency are two competing objectives in retrieval tasks. Instead of crafting a new method pursuing further improvement on accuracy, in this paper we propose a multi-teacher distillation framework Whiten-MTD, which is able to transfer knowledge from off-the-shelf pre-trained retrieval models to a lightweight student model for efficient visual retrieval. Furthermore, we discover that the similarities obtained by different retrieval models are diversified and incommensurable, which makes it challenging to jointly distill knowledge from multiple models. Therefore, we propose to whiten the output of teacher models before fusion, which enables effective multi-teacher distillation for retrieval models. Whiten-MTD is conceptually simple and practically effective. Extensive experiments on two landmark image retrieval datasets and one video retrieval dataset demonstrate the effectiveness of our proposed method, and its good balance of retrieval performance and efficiency. Our source code is released at https://github.com/Maryeon/whiten_mtd.

Submitted 15 December, 2023; originally announced December 2023.

Comments: Accepted by AAAI 2024
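
The Whiten-MTD abstract above says that teacher similarities are incommensurable and are whitened before fusion. The sketch below illustrates the general idea only, under a strong assumption: it uses per-teacher standardization (zero mean, unit variance over a candidate list) as a one-dimensional stand-in for whitening, followed by a plain average. The abstract does not specify the actual whitening or fusion rule, and the names `whiten` and `fuse_teachers` are hypothetical; the released repository is the authoritative reference.

```python
from statistics import mean, pstdev


def whiten(scores):
    """Standardize one teacher's similarity scores to zero mean and unit variance.

    Assumption: scalar standardization stands in for the paper's whitening step.
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores identical: no ranking information to preserve.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_teachers(teacher_scores):
    """Average whitened scores across teachers, one fused score per candidate."""
    if len({len(scores) for scores in teacher_scores}) != 1:
        raise ValueError("every teacher must score the same candidates")
    whitened = [whiten(scores) for scores in teacher_scores]
    return [mean(column) for column in zip(*whitened)]


if __name__ == "__main__":
    # Two hypothetical teachers scoring the same three candidates on very different scales.
    cosine_teacher = [0.1, 0.2, 0.3]
    distance_based_teacher = [10.0, 20.0, 30.0]
    print(fuse_teachers([cosine_teacher, distance_based_teacher]))
```

After standardization both teachers contribute on the same scale, so neither dominates the fused target merely because its raw similarities have a larger range.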

arXiv:2312.08672 [pdf, other]

doi 10.1016/j.ins.2024.120916

CAT: A Causally Graph Attention Network for Trimming Heterophilic Graph

Authors: Silu He, Qinyao Luo, Xinsha Fu, Ling Zhao, Ronghua Du, Haifeng Li

Abstract: Local Attention-guided Message Passing Mechanism (LAMP) adopted in Graph Attention Networks (GATs) is designed to adaptively learn the importance of neighboring nodes for better local aggregation on the graph, which can bring the representations of similar neighbors closer effectively, thus showing stronger discrimination ability. However, existing GATs suffer from a significant discrimination ability decline in heterophilic graphs because the high proportion of dissimilar neighbors can weaken the self-attention of the central node, jointly resulting in the deviation of the central node from similar nodes in the representation space. This kind of effect generated by neighboring nodes is called the Distraction Effect (DE) in this paper. To estimate and weaken the DE of neighboring nodes, we propose a Causally graph Attention network for Trimming heterophilic graph (CAT). To estimate the DE, since the DE is generated through two paths (grabbing the attention assigned to neighbors and reducing the self-attention of the central node), we use Total Effect to model DE, which is a kind of causal estimand and can be estimated from intervened data. To weaken the DE, we identify the neighbors with the highest DE (we call them Distraction Neighbors) and remove them. We adopt three representative GATs as the base model within the proposed CAT framework and conduct experiments on seven heterophilic datasets in three different sizes. Comparative experiments show that CAT can improve the node classification accuracy of all base GAT models.
Ablation experiments and visualization further validate the enhancement of discrimination ability brought by CAT. The source code is available at https://github.com/GeoX-Lab/CAT.

Submitted 17 June, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

Comments: 25 pages, 18 figures, 5 tables

Journal ref: Information Sciences 2024

arXiv:2312.05062 [pdf, ps, other]

Deep Learning Enabled Semantic Communication Systems for Video Transmission

Authors: Zhenguo Zhang, Qianqian Yang, Shibo He, Jiming Chen

Abstract: Semantic communication has emerged as a promising approach for improving transmission efficiency in the next generation of wireless networks. Inspired by the success of semantic communication in different areas, we aim to provide a new semantic communication scheme from the semantic level. In this paper, we propose a novel DL-based semantic communication system for video transmission, which compacts semantic-related information to improve transmission efficiency. In particular, we utilize the Bi-optical flow to estimate residual information of inter-frame details. We also propose a feature choice module and a feature fusion module to drop semantically redundant features while paying more attention to the important semantic-related content. We employ a frame prediction module to reconstruct semantic features of the prediction frame from the received signal at the receiver. To enhance the system's robustness, we propose a noise attention module that assigns different importance weights to the extracted features. Simulation results indicate that our proposed method outperforms existing approaches in terms of transmission efficiency, achieving about 33.3% reduction in the number of transmitted symbols while improving the peak signal-to-noise ratio (PSNR) performance by an average of 0.56 dB.

Submitted 8 December, 2023; originally announced December 2023.
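
As a rough aid to reading the 0.56 dB figure in the abstract above, a PSNR difference can be converted into a ratio of mean squared errors. This is a back-of-the-envelope illustration, not a result from the paper: it assumes both methods are measured against the same peak value, and it holds exactly for a single comparison rather than for an average of dB values.

```latex
\Delta\mathrm{PSNR}
  = 10\log_{10}\frac{\mathrm{MAX}^2}{\mathrm{MSE}_{\text{new}}}
  - 10\log_{10}\frac{\mathrm{MAX}^2}{\mathrm{MSE}_{\text{old}}}
  = 10\log_{10}\frac{\mathrm{MSE}_{\text{old}}}{\mathrm{MSE}_{\text{new}}},
\qquad
\Delta\mathrm{PSNR} = 0.56\ \mathrm{dB}
\;\Longrightarrow\;
\frac{\mathrm{MSE}_{\text{new}}}{\mathrm{MSE}_{\text{old}}}
  = 10^{-0.056} \approx 0.879 .
```

Under these assumptions, a 0.56 dB gain corresponds to roughly 12% lower mean squared error.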

arXiv:2312.04557 [pdf, other]

GenTron: Diffusion Transformers for Image and Video Generation

Authors: Shoufa Chen, Mengmeng Xu, Jiawei Ren, Yuren Cong, Sen He, Yanping Xie, Animesh Sinha, Ping Luo, Tao Xiang, Juan-Manuel Perez-Rua

Abstract: In this study, we explore Transformer-based diffusion models for image and video generation. Despite the dominance of Transformer architectures in various fields due to their flexibility and scalability, the visual generative domain primarily utilizes CNN-based U-Net architectures, particularly in diffusion-based models. We introduce GenTron, a family of Generative models employing Transformer-based diffusion, to address this gap. Our initial step was to adapt Diffusion Transformers (DiTs) from class to text conditioning, a process involving thorough empirical exploration of the conditioning mechanism. We then scale GenTron from approximately 900M to over 3B parameters, observing significant improvements in visual quality. Furthermore, we extend GenTron to text-to-video generation, incorporating novel motion-free guidance to enhance video quality. In human evaluations against SDXL, GenTron achieves a 51.1% win rate in visual quality (with a 19.8% draw rate), and a 42.3% win rate in text alignment (with a 42.9% draw rate). GenTron also excels in the T2I-CompBench, underscoring its strengths in compositional generation. We believe this work will provide meaningful insights and serve as a valuable reference for future research.

Submitted 2 June, 2024; v1 submitted 7 December, 2023; originally announced December 2023.

Comments: CVPR2024 Camera Ready. Website: https://www.shoufachen.com/gentron_website/

arXiv:2312.02679 [pdf, other]

Entanglement and Pseudo Entanglement Dynamics versus Fusion in CFT

Authors: Song He, Yu-Xuan Zhang, Long Zhao, Zi-Xuan Zhao

Abstract: The fusion rules and operator product expansion (OPE) serve as crucial tools in the study of operator algebras within conformal field theory (CFT). Building upon the vision of using entanglement to explore the connections between fusion coefficients and OPE coefficients, we employ the replica method and Schmidt decomposition method to investigate the time evolution of entanglement entropy (EE) and pseudo entropy (PE) for linear combinations of operators in rational conformal field theory (RCFT). We obtain a formula that links fusion coefficients, quantum dimensions, and OPE coefficients. We also identify two definition schemes for linear combination operators. Under one scheme, the EE captures information solely for the heaviest operators, while the PE retains information for all operators, reflecting the phenomenon of pseudo entropy amplification. Irrespective of the scheme employed, the EE demonstrates a step-like evolution, illustrating the effectiveness of the quasiparticle propagation picture for the general superposition of locally excited states in RCFT. From the perspective of quasiparticle propagation, we observe spontaneous block-diagonalization of the reduced density matrix of a subsystem when quasiparticles enter the subsystem.

Submitted 29 June, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

Comments: 29 pages, 4 figures, published version

Showing 101–150 of 1,185 results for author: He, S