subscribe to arXiv mailings

DeepGate3: Towards Scalable Circuit Representation Learning

Authors: Zhengyuan Shi, Ziyang Zheng, Sadaf Khan, Jianyuan Zhong, Min Li, Qiang Xu

Abstract: Circuit representation learning has shown promising results in advancing the field of Electronic Design Automation (EDA). Existing models, such as DeepGate Family, primarily utilize Graph Neural Networks (GNNs) to encode circuit netlists into gate-level embeddings. However, the scalability of GNN-based models is fundamentally constrained by architectural limitations, impacting their ability to gen… ▽ More Circuit representation learning has shown promising results in advancing the field of Electronic Design Automation (EDA). Existing models, such as DeepGate Family, primarily utilize Graph Neural Networks (GNNs) to encode circuit netlists into gate-level embeddings. However, the scalability of GNN-based models is fundamentally constrained by architectural limitations, impacting their ability to generalize across diverse and complex circuit designs. To address these challenges, we introduce DeepGate3, an enhanced architecture that integrates Transformer modules following the initial GNN processing. This novel architecture not only retains the robust gate-level representation capabilities of its predecessor, DeepGate2, but also enhances them with the ability to model subcircuits through a novel pooling transformer mechanism. DeepGate3 is further refined with multiple innovative supervision tasks, significantly enhancing its learning process and enabling superior representation of both gate-level and subcircuit structures. Our experiments demonstrate marked improvements in scalability and generalizability over traditional GNN-based approaches, establishing a significant step forward in circuit representation learning technology. △ Less

Submitted 14 July, 2024; originally announced July 2024.

arXiv:2407.10670 [pdf, other]

Enhancing Retrieval and Managing Retrieval: A Four-Module Synergy for Improved Quality and Efficiency in RAG Systems

Authors: Yunxiao Shi, Xing Zi, Zijing Shi, Haimin Zhang, Qiang Wu, Min Xu

Abstract: Retrieval-augmented generation (RAG) techniques leverage the in-context learning capabilities of large language models (LLMs) to produce more accurate and relevant responses. Originating from the simple 'retrieve-then-read' approach, the RAG framework has evolved into a highly flexible and modular paradigm. A critical component, the Query Rewriter module, enhances knowledge retrieval by generating… ▽ More Retrieval-augmented generation (RAG) techniques leverage the in-context learning capabilities of large language models (LLMs) to produce more accurate and relevant responses. Originating from the simple 'retrieve-then-read' approach, the RAG framework has evolved into a highly flexible and modular paradigm. A critical component, the Query Rewriter module, enhances knowledge retrieval by generating a search-friendly query. This method aligns input questions more closely with the knowledge base. Our research identifies opportunities to enhance the Query Rewriter module to Query Rewriter+ by generating multiple queries to overcome the Information Plateaus associated with a single query and by rewriting questions to eliminate Ambiguity, thereby clarifying the underlying intent. We also find that current RAG systems exhibit issues with Irrelevant Knowledge; to overcome this, we propose the Knowledge Filter. These two modules are both based on the instruction-tuned Gemma-2B model, which together enhance response quality. The final identified issue is Redundant Retrieval; we introduce the Memory Knowledge Reservoir and the Retriever Trigger to solve this. The former supports the dynamic expansion of the RAG system's knowledge base in a parameter-free manner, while the latter optimizes the cost for accessing external knowledge, thereby improving resource utilization and response efficiency. These four RAG modules synergistically improve the response quality and efficiency of the RAG system. The effectiveness of these modules has been validated through experiments and ablation studies across six common QA datasets. The source code can be accessed at https://github.com/Ancientshi/ERM4. △ Less

Submitted 15 July, 2024; originally announced July 2024.

Comments: ECAI2024 #1304

arXiv:2407.10427 [pdf, other]

Transformer for Multitemporal Hyperspectral Image Unmixing

Authors: Hang Li, Qiankun Dong, Xueshuo Xie, Xia Xu, Tao Li, Zhenwei Shi

Abstract: Multitemporal hyperspectral image unmixing (MTHU) holds significant importance in monitoring and analyzing the dynamic changes of surface. However, compared to single-temporal unmixing, the multitemporal approach demands comprehensive consideration of information across different phases, rendering it a greater challenge. To address this challenge, we propose the Multitemporal Hyperspectral Image U… ▽ More Multitemporal hyperspectral image unmixing (MTHU) holds significant importance in monitoring and analyzing the dynamic changes of surface. However, compared to single-temporal unmixing, the multitemporal approach demands comprehensive consideration of information across different phases, rendering it a greater challenge. To address this challenge, we propose the Multitemporal Hyperspectral Image Unmixing Transformer (MUFormer), an end-to-end unsupervised deep learning model. To effectively perform multitemporal hyperspectral image unmixing, we introduce two key modules: the Global Awareness Module (GAM) and the Change Enhancement Module (CEM). The Global Awareness Module computes self-attention across all phases, facilitating global weight allocation. On the other hand, the Change Enhancement Module dynamically learns local temporal changes by comparing endmember changes between adjacent phases. The synergy between these modules allows for capturing semantic information regarding endmember and abundance changes, thereby enhancing the effectiveness of multitemporal hyperspectral image unmixing. We conducted experiments on one real dataset and two synthetic datasets, demonstrating that our model significantly enhances the effect of multitemporal hyperspectral image unmixing. △ Less

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.08216 [pdf, other]

Multimodal contrastive learning for spatial gene expression prediction using histology images

Authors: Wenwen Min, Zhiceng Shi, Jun Zhang, Jun Wan, Changmiao Wang

Abstract: In recent years, the advent of spatial transcriptomics (ST) technology has unlocked unprecedented opportunities for delving into the complexities of gene expression patterns within intricate biological systems. Despite its transformative potential, the prohibitive cost of ST technology remains a significant barrier to its widespread adoption in large-scale studies. An alternative, more cost-effect… ▽ More In recent years, the advent of spatial transcriptomics (ST) technology has unlocked unprecedented opportunities for delving into the complexities of gene expression patterns within intricate biological systems. Despite its transformative potential, the prohibitive cost of ST technology remains a significant barrier to its widespread adoption in large-scale studies. An alternative, more cost-effective strategy involves employing artificial intelligence to predict gene expression levels using readily accessible whole-slide images (WSIs) stained with Hematoxylin and Eosin (H\&E). However, existing methods have yet to fully capitalize on multimodal information provided by H&E images and ST data with spatial location. In this paper, we propose \textbf{mclSTExp}, a multimodal contrastive learning with Transformer and Densenet-121 encoder for Spatial Transcriptomics Expression prediction. We conceptualize each spot as a "word", integrating its intrinsic features with spatial context through the self-attention mechanism of a Transformer encoder. This integration is further enriched by incorporating image features via contrastive learning, thereby enhancing the predictive capability of our model. Our extensive evaluation of \textbf{mclSTExp} on two breast cancer datasets and a skin squamous cell carcinoma dataset demonstrates its superior performance in predicting spatial gene expression. Moreover, mclSTExp has shown promise in interpreting cancer-specific overexpressed genes, elucidating immune-related genes, and identifying specialized spatial domains annotated by pathologists. Our source code is available at https://github.com/shizhiceng/mclSTExp. △ Less

Submitted 11 July, 2024; originally announced July 2024.

Comments: BIB, Code: https://github.com/shizhiceng/mclSTExp

arXiv:2407.07941 [pdf, other]

Analytic framework for self-dual criticality in $\mathbb{Z}_k$ gauge theory with matter

Authors: Zhengyan Darius Shi, Arkya Chatterjee

Abstract: We study the putative multicritical point in 2+1D $\mathbb{Z}_k$ gauge theory where the Higgs and confinement transitions meet. The presence of an $e$-$m$ duality symmetry at this critical point forces anyons with nontrivial braiding to close their gaps simultaneously, giving rise to a critical theory that mixes strong interactions with mutual statistics. An effective U(1) $\times$ U(1) gauge theo… ▽ More We study the putative multicritical point in 2+1D $\mathbb{Z}_k$ gauge theory where the Higgs and confinement transitions meet. The presence of an $e$-$m$ duality symmetry at this critical point forces anyons with nontrivial braiding to close their gaps simultaneously, giving rise to a critical theory that mixes strong interactions with mutual statistics. An effective U(1) $\times$ U(1) gauge theory with a mutual Chern-Simons term at level $k$ is proposed to describe the vicinity of the multicritical point for $k \geq 4$. We argue analytically that monopoles are irrelevant in the IR CFT and compute the scaling dimensions of the leading duality-symmetric/anti-symmetric operators. In the large $k$ limit, these scaling dimensions approach $3 - ν_{\rm XY}^{-1}$ as $1/k^2$, where $ν_{\rm XY}$ is the correlation length exponent of the 3D XY model. △ Less

Submitted 10 July, 2024; originally announced July 2024.

Comments: 4+$ε$ pages, 15 page appendix, (2+4) figures

arXiv:2407.05467 [pdf, other]

The infrastructure powering IBM's Gen AI model development

Authors: Talia Gershon, Seetharami Seelam, Brian Belgodere, Milton Bonilla, Lan Hoang, Danny Barnett, I-Hsin Chung, Apoorve Mohan, Ming-Hung Chen, Lixiang Luo, Robert Walkup, Constantinos Evangelinos, Shweta Salaria, Marc Dombrowa, Yoonho Park, Apo Kayi, Liran Schour, Alim Alim, Ali Sydney, Pavlos Maniotis, Laurent Schares, Bernard Metzler, Bengi Karacali-Akyamac, Sophia Wen, Tatsuhiro Chiba , et al. (121 additional authors not shown)

Abstract: AI Infrastructure plays a key role in the speed and cost-competitiveness of developing and deploying advanced AI models. The current demand for powerful AI infrastructure for model training is driven by the emergence of generative AI and foundational models, where on occasion thousands of GPUs must cooperate on a single training job for the model to be trained in a reasonable time. Delivering effi… ▽ More AI Infrastructure plays a key role in the speed and cost-competitiveness of developing and deploying advanced AI models. The current demand for powerful AI infrastructure for model training is driven by the emergence of generative AI and foundational models, where on occasion thousands of GPUs must cooperate on a single training job for the model to be trained in a reasonable time. Delivering efficient and high-performing AI training requires an end-to-end solution that combines hardware, software and holistic telemetry to cater for multiple types of AI workloads. In this report, we describe IBM's hybrid cloud infrastructure that powers our generative AI model development. This infrastructure includes (1) Vela: an AI-optimized supercomputing capability directly integrated into the IBM Cloud, delivering scalable, dynamic, multi-tenant and geographically distributed infrastructure for large-scale model training and other AI workflow steps and (2) Blue Vela: a large-scale, purpose-built, on-premises hosting environment that is optimized to support our largest and most ambitious AI model training tasks. Vela provides IBM with the dual benefit of high performance for internal use along with the flexibility to adapt to an evolving commercial landscape. Blue Vela provides us with the benefits of rapid development of our largest and most ambitious models, as well as future-proofing against the evolving model landscape in the industry. Taken together, they provide IBM with the ability to rapidly innovate in the development of both AI models and commercial offerings. △ Less

Submitted 7 July, 2024; originally announced July 2024.

Comments: Corresponding Authors: Talia Gershon, Seetharami Seelam,Brian Belgodere, Milton Bonilla

arXiv:2407.04100 [pdf, other]

C$^3$DG: Conditional Domain Generalization for Hyperspectral Imagery Classification with Convergence and Constrained-risk Theories

Authors: Zhe Gao, Bin Pan, Zhenwei Shi

Abstract: Hyperspectral imagery (HSI) classification may suffer the challenge of hyperspectral-monospectra, where different classes present similar spectra. Joint spatial-spectral feature extraction is a popular solution for the problem, but this strategy tends to inflate accuracy since test pixels may exist in training patches. Domain generalization methods show promising potential, but they still fail to… ▽ More Hyperspectral imagery (HSI) classification may suffer the challenge of hyperspectral-monospectra, where different classes present similar spectra. Joint spatial-spectral feature extraction is a popular solution for the problem, but this strategy tends to inflate accuracy since test pixels may exist in training patches. Domain generalization methods show promising potential, but they still fail to distinguish similar spectra across varying domains, in addition, the theoretical support is usually ignored. In this paper, we only rely on spectral information to solve the hyperspectral-monospectra problem, and propose a Convergence and Error-Constrained Conditional Domain Generalization method for Hyperspectral Imagery Classification (C$^3$DG). The major contributions of this paper include two aspects: the Conditional Revising Inference Block (CRIB), and the corresponding theories for model convergence and generalization errors. CRIB is the kernel structure of the proposed method, which employs a shared encoder and multi-branch decoders to fully leverage the conditional distribution during training, achieving a decoupling that aligns with the generation mechanisms of HSI. Moreover, to ensure model convergence and maintain controllable error, we propose the optimization convergence theorem and risk upper bound theorem. In the optimization convergence theorem, we ensure the model convergence by demonstrating that the gradients of the loss terms are not contradictory. In the risk upper bound theorem, our theoretical analysis explores the relationship between test-time training and recent related work to establish a concrete bound for error. Experimental results on three benchmark datasets indicate the superiority of C$^3$DG. △ Less

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.03007 [pdf, other]

What Affects the Stability of Tool Learning? An Empirical Study on the Robustness of Tool Learning Frameworks

Authors: Chengrui Huang, Zhengliang Shi, Yuntao Wen, Xiuying Chen, Peng Han, Shen Gao, Shuo Shang

Abstract: Tool learning methods have enhanced the ability of large language models (LLMs) to interact with real-world applications. Many existing works fine-tune LLMs or design prompts to enable LLMs to select appropriate tools and correctly invoke them to meet user requirements. However, it is observed in previous works that the performance of tool learning varies from tasks, datasets, training settings, a… ▽ More Tool learning methods have enhanced the ability of large language models (LLMs) to interact with real-world applications. Many existing works fine-tune LLMs or design prompts to enable LLMs to select appropriate tools and correctly invoke them to meet user requirements. However, it is observed in previous works that the performance of tool learning varies from tasks, datasets, training settings, and algorithms. Without understanding the impact of these factors, it can lead to inconsistent results, inefficient model deployment, and suboptimal tool utilization, ultimately hindering the practical integration and scalability of LLMs in real-world scenarios. Therefore, in this paper, we explore the impact of both internal and external factors on the performance of tool learning frameworks. Through extensive experiments on two benchmark datasets, we find several insightful conclusions for future work, including the observation that LLMs can benefit significantly from increased trial and exploration. We believe our empirical study provides a new perspective for future tool learning research. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: 19 pages, 9 figures

arXiv:2407.02146 [pdf, ps, other]

Coderivative-Based Newton Methods with Wolfe Linesearch for Nonsmooth Optimization

Authors: Miantao Chao, Boris S. Mordukhovich, Zijian Shi, Jin Zhang

Abstract: This paper introduces and develops novel coderivative-based Newton methods with Wolfe linesearch conditions to solve various classes of problems in nonsmooth optimization. We first propose a generalized regularized Newton method with Wolfe linesearch (GRNM-W) for unconstrained $C^{1,1}$ minimization problems (which are second-order nonsmooth) and establish global as well as local superlinear conve… ▽ More This paper introduces and develops novel coderivative-based Newton methods with Wolfe linesearch conditions to solve various classes of problems in nonsmooth optimization. We first propose a generalized regularized Newton method with Wolfe linesearch (GRNM-W) for unconstrained $C^{1,1}$ minimization problems (which are second-order nonsmooth) and establish global as well as local superlinear convergence of their iterates. To deal with convex composite minimization problems (which are first-order nonsmooth and can be constrained), we combine the proposed GRNM-W with two algorithmic frameworks: the forward-backward envelope and the augmented Lagrangian method resulting in the two new algorithms called CNFB and CNAL, respectively. Finally, we present numerical results to solve Lasso and support vector machine problems appearing in, e.g., machine learning and statistics, which demonstrate the efficiency of the proposed algorithms. △ Less

Submitted 3 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.01636 [pdf, other]

Learning Frequency-Aware Dynamic Transformers for All-In-One Image Restoration

Authors: Zenglin Shi, Tong Su, Pei Liu, Yunpeng Wu, Le Zhang, Meng Wang

Abstract: This work aims to tackle the all-in-one image restoration task, which seeks to handle multiple types of degradation with a single model. The primary challenge is to extract degradation representations from the input degraded images and use them to guide the model's adaptation to specific degradation types. Recognizing that various degradations affect image content differently across frequency band… ▽ More This work aims to tackle the all-in-one image restoration task, which seeks to handle multiple types of degradation with a single model. The primary challenge is to extract degradation representations from the input degraded images and use them to guide the model's adaptation to specific degradation types. Recognizing that various degradations affect image content differently across frequency bands, we propose a new all-in-one image restoration approach from a frequency perspective, leveraging advanced vision transformers. Our method consists of two main components: a frequency-aware Degradation prior learning transformer (Dformer) and a degradation-adaptive Restoration transformer (Rformer). The Dformer captures the essential characteristics of various degradations by decomposing inputs into different frequency components. By understanding how degradations affect these frequency components, the Dformer learns robust priors that effectively guide the restoration process. The Rformer then employs a degradation-adaptive self-attention module to selectively focus on the most affected frequency components, guided by the learned degradation representations. Extensive experimental results demonstrate that our approach outperforms the existing methods on four representative restoration tasks, including denoising, deraining, dehazing and deblurring. Additionally, our method offers benefits for handling spatially variant degradations and unseen degradation levels. △ Less

Submitted 30 June, 2024; originally announced July 2024.

Comments: 8 pages

arXiv:2407.00987 [pdf, other]

Exploiting Dependency-Aware Priority Adjustment for Mixed-Criticality TSN Flow Scheduling

Authors: Miao Guo, Yifei Sun, Chaojie Gu, Shibo He, Zhiguo Shi

Abstract: Time-Sensitive Networking (TSN) serves as a one-size-fits-all solution for mixed-criticality communication, in which flow scheduling is vital to guarantee real-time transmissions. Traditional approaches statically assign priorities to flows based on their associated applications, resulting in significant queuing delays. In this paper, we observe that assigning different priorities to a flow leads… ▽ More Time-Sensitive Networking (TSN) serves as a one-size-fits-all solution for mixed-criticality communication, in which flow scheduling is vital to guarantee real-time transmissions. Traditional approaches statically assign priorities to flows based on their associated applications, resulting in significant queuing delays. In this paper, we observe that assigning different priorities to a flow leads to varying delays due to different shaping mechanisms applied to different flow types. Leveraging this insight, we introduce a new scheduling method in mixed-criticality TSN that incorporates a priority adjustment scheme among diverse flow types to mitigate queuing delays and enhance schedulability. Specifically, we propose dependency-aware priority adjustment algorithms tailored to different link-overlapping conditions. Experiments in various settings validate the effectiveness of the proposed method, which enhances the schedulability by 20.57% compared with the SOTA method. △ Less

Submitted 1 July, 2024; originally announced July 2024.

Comments: This paper has been accepted by IWQoS'24

arXiv:2406.19043 [pdf]

CMRxRecon2024: A Multi-Modality, Multi-View K-Space Dataset Boosting Universal Machine Learning for Accelerated Cardiac MRI

Authors: Zi Wang, Fanwen Wang, Chen Qin, Jun Lyu, Ouyang Cheng, Shuo Wang, Yan Li, Mengyao Yu, Haoyu Zhang, Kunyuan Guo, Zhang Shi, Qirong Li, Ziqiang Xu, Yajing Zhang, Hao Li, Sha Hua, Binghua Chen, Longyu Sun, Mengting Sun, Qin Li, Ying-Hua Chu, Wenjia Bai, Jing Qin, Xiahai Zhuang, Claudia Prieto , et al. (7 additional authors not shown)

Abstract: Cardiac magnetic resonance imaging (MRI) has emerged as a clinically gold-standard technique for diagnosing cardiac diseases, thanks to its ability to provide diverse information with multiple modalities and anatomical views. Accelerated cardiac MRI is highly expected to achieve time-efficient and patient-friendly imaging, and then advanced image reconstruction approaches are required to recover h… ▽ More Cardiac magnetic resonance imaging (MRI) has emerged as a clinically gold-standard technique for diagnosing cardiac diseases, thanks to its ability to provide diverse information with multiple modalities and anatomical views. Accelerated cardiac MRI is highly expected to achieve time-efficient and patient-friendly imaging, and then advanced image reconstruction approaches are required to recover high-quality, clinically interpretable images from undersampled measurements. However, the lack of publicly available cardiac MRI k-space dataset in terms of both quantity and diversity has severely hindered substantial technological progress, particularly for data-driven artificial intelligence. Here, we provide a standardized, diverse, and high-quality CMRxRecon2024 dataset to facilitate the technical development, fair evaluation, and clinical transfer of cardiac MRI reconstruction approaches, towards promoting the universal frameworks that enable fast and robust reconstructions across different cardiac MRI protocols in clinical practice. To the best of our knowledge, the CMRxRecon2024 dataset is the largest and most diverse publicly available cardiac k-space dataset. It is acquired from 330 healthy volunteers, covering commonly used modalities, anatomical views, and acquisition trajectories in clinical cardiac MRI workflows. Besides, an open platform with tutorials, benchmarks, and data processing tools is provided to facilitate data usage, advanced method development, and fair performance evaluation. △ Less

Submitted 27 June, 2024; originally announced June 2024.

Comments: 19 pages, 3 figures, 2 tables

arXiv:2406.18993 [pdf, ps, other]

Interference Cancellation Based Neural Receiver for Superimposed Pilot in Multi-Layer Transmission

Authors: Han Xiao, Wenqiang Tian, Shi Jin, Wendong Liu, Jia Shen, Zhihua Shi, Zhi Zhang

Abstract: In this paper, an interference cancellation based neural receiver for superimposed pilot (SIP) in multi-layer transmission is proposed, where the data and pilot are non-orthogonally superimposed in the same time-frequency resource. Specifically, to deal with the intra-layer and inter-layer interference of SIP under multi-layer transmission, the interference cancellation with superimposed symbol ai… ▽ More In this paper, an interference cancellation based neural receiver for superimposed pilot (SIP) in multi-layer transmission is proposed, where the data and pilot are non-orthogonally superimposed in the same time-frequency resource. Specifically, to deal with the intra-layer and inter-layer interference of SIP under multi-layer transmission, the interference cancellation with superimposed symbol aided channel estimation is leveraged in the neural receiver, accompanied by the pre-design of pilot code-division orthogonal mechanism at transmitter. In addition, to address the complexity issue for inter-vendor collaboration and the generalization problem in practical deployments, respectively, this paper also provides a fixed SIP (F-SIP) design based on constant pilot power ratio and scalable mechanisms for different modulation and coding schemes (MCSs) and transmission layers. Simulation results demonstrate the superiority of the proposed schemes on the performance of block error rate and throughput compared with existing counterparts. △ Less

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.17803 [pdf, other]

Understanding the Role of User Profile in the Personalization of Large Language Models

Authors: Bin Wu, Zhengyan Shi, Hossein A. Rahmani, Varsha Ramineni, Emine Yilmaz

Abstract: Utilizing user profiles to personalize Large Language Models (LLMs) has been shown to enhance the performance on a wide range of tasks. However, the precise role of user profiles and their effect mechanism on LLMs remains unclear. This study first confirms that the effectiveness of user profiles is primarily due to personalization information rather than semantic information. Furthermore, we inves… ▽ More Utilizing user profiles to personalize Large Language Models (LLMs) has been shown to enhance the performance on a wide range of tasks. However, the precise role of user profiles and their effect mechanism on LLMs remains unclear. This study first confirms that the effectiveness of user profiles is primarily due to personalization information rather than semantic information. Furthermore, we investigate how user profiles affect the personalization of LLMs. Within the user profile, we reveal that it is the historical personalized response produced or approved by users that plays a pivotal role in personalizing LLMs. This discovery unlocks the potential of LLMs to incorporate a greater number of user profiles within the constraints of limited input length. As for the position of user profiles, we observe that user profiles integrated into different positions of the input context do not contribute equally to personalization. Instead, where the user profile that is closer to the beginning affects more on the personalization of LLMs. Our findings reveal the role of user profiles for the personalization of LLMs, and showcase how incorporating user profiles impacts performance providing insight to leverage user profiles effectively. △ Less

Submitted 22 June, 2024; originally announced June 2024.

arXiv:2406.16583 [pdf, other]

Personalized federated learning based on feature fusion

Authors: Wolong Xing, Zhenkui Shi, Hongyan Peng, Xiantao Hu, Xianxian Li

Abstract: Federated learning enables distributed clients to collaborate on training while storing their data locally to protect client privacy. However, due to the heterogeneity of data, models, and devices, the final global model may need to perform better for tasks on each client. Communication bottlenecks, data heterogeneity, and model heterogeneity have been common challenges in federated learning. In t… ▽ More Federated learning enables distributed clients to collaborate on training while storing their data locally to protect client privacy. However, due to the heterogeneity of data, models, and devices, the final global model may need to perform better for tasks on each client. Communication bottlenecks, data heterogeneity, and model heterogeneity have been common challenges in federated learning. In this work, we considered a label distribution skew problem, a type of data heterogeneity easily overlooked. In the context of classification, we propose a personalized federated learning approach called pFedPM. In our process, we replace traditional gradient uploading with feature uploading, which helps reduce communication costs and allows for heterogeneous client models. These feature representations play a role in preserving privacy to some extent. We use a hyperparameter $a$ to mix local and global features, which enables us to control the degree of personalization. We also introduced a relation network as an additional decision layer, which provides a non-linear learnable classifier to predict labels. Experimental results show that, with an appropriate setting of $a$, our scheme outperforms several recent FL methods on MNIST, FEMNIST, and CRIFAR10 datasets and achieves fewer communications. △ Less

Submitted 24 June, 2024; originally announced June 2024.

arXiv:2406.16285 [pdf, other]

Gradient enhanced ADMM Algorithm for dynamic optimal transport on surfaces

Authors: Guozhi Dong, Hailong Guo, Chengrun Jiang, Zuoqiang Shi

Abstract: A gradient enhanced ADMM algorithm for optimal transport on general surfaces is proposed in this paper. Based on Benamou and Brenier's dynamical formulation, we combine gradient recovery techniques on surfaces with the ADMM algorithm, not only improving the computational accuracy, but also providing a novel method to deal with dual variables in the algorithm. This method avoids the use of stagger… ▽ More A gradient enhanced ADMM algorithm for optimal transport on general surfaces is proposed in this paper. Based on Benamou and Brenier's dynamical formulation, we combine gradient recovery techniques on surfaces with the ADMM algorithm, not only improving the computational accuracy, but also providing a novel method to deal with dual variables in the algorithm. This method avoids the use of stagger grids, has better accuracy and is more robust comparing to other averaging techniques. △ Less

Submitted 23 June, 2024; originally announced June 2024.

MSC Class: 65M22; 49M41

arXiv:2406.14891 [pdf, other]

Generate-then-Ground in Retrieval-Augmented Generation for Multi-hop Question Answering

Authors: Zhengliang Shi, Shuo Zhang, Weiwei Sun, Shen Gao, Pengjie Ren, Zhumin Chen, Zhaochun Ren

Abstract: Multi-Hop Question Answering (MHQA) tasks present a significant challenge for large language models (LLMs) due to the intensive knowledge required. Current solutions, like Retrieval-Augmented Generation, typically retrieve potential documents from an external corpus to read an answer. However, the performance of this retrieve-then-read paradigm is constrained by the retriever and the inevitable no… ▽ More Multi-Hop Question Answering (MHQA) tasks present a significant challenge for large language models (LLMs) due to the intensive knowledge required. Current solutions, like Retrieval-Augmented Generation, typically retrieve potential documents from an external corpus to read an answer. However, the performance of this retrieve-then-read paradigm is constrained by the retriever and the inevitable noise in the retrieved documents. To mitigate these challenges, we introduce a novel generate-then-ground (GenGround) framework, synergizing the parametric knowledge of LLMs and external documents to solve a multi-hop question. GenGround empowers LLMs to alternate two phases until the final answer is derived: (1) formulate a simpler, single-hop question and directly generate the answer; (2) ground the question-answer pair in retrieved documents, amending any wrong predictions in the answer. We also propose an instructional grounding distillation method to generalize our method into smaller models. Extensive experiments conducted on four datasets illustrate the superiority of our method. △ Less

Submitted 21 June, 2024; originally announced June 2024.

Comments: ACL 2024 (main conference)

arXiv:2406.14852 [pdf, other]

Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models

Authors: Jiayu Wang, Yifei Ming, Zhenmei Shi, Vibhav Vineet, Xin Wang, Neel Joshi

Abstract: Large language models (LLMs) and vision-language models (VLMs) have demonstrated remarkable performance across a wide range of tasks and domains. Despite this promise, spatial understanding and reasoning -- a fundamental component of human cognition -- remains under-explored. We develop novel benchmarks that cover diverse aspects of spatial reasoning such as relationship understanding, navigation,… ▽ More Large language models (LLMs) and vision-language models (VLMs) have demonstrated remarkable performance across a wide range of tasks and domains. Despite this promise, spatial understanding and reasoning -- a fundamental component of human cognition -- remains under-explored. We develop novel benchmarks that cover diverse aspects of spatial reasoning such as relationship understanding, navigation, and counting. We conduct a comprehensive evaluation of competitive language and vision-language models. Our findings reveal several counter-intuitive insights that have been overlooked in the literature: (1) Spatial reasoning poses significant challenges where competitive models can fall behind random guessing; (2) Despite additional visual input, VLMs often under-perform compared to their LLM counterparts; (3) When both textual and visual information is available, multi-modal language models become less reliant on visual information if sufficient textual clues are provided. Additionally, we demonstrate that leveraging redundancy between vision and text can significantly enhance model performance. We hope our study will inform the development of multimodal models to improve spatial intelligence and further close the gap with human intelligence. △ Less

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.14036 [pdf, other]

Toward Infinite-Long Prefix in Transformer

Authors: Jiuxiang Gu, Yingyu Liang, Zhenmei Shi, Zhao Song, Chiwun Yang

Abstract: Prompting and contextual-based fine-tuning methods, which we call Prefix Learning, have been proposed to enhance the performance of language models on various downstream tasks that can match full parameter fine-tuning. There remains a limited theoretical understanding of how these methods work. In this paper, we aim to relieve this limitation by studying the learning ability of Prefix Learning fro… ▽ More Prompting and contextual-based fine-tuning methods, which we call Prefix Learning, have been proposed to enhance the performance of language models on various downstream tasks that can match full parameter fine-tuning. There remains a limited theoretical understanding of how these methods work. In this paper, we aim to relieve this limitation by studying the learning ability of Prefix Learning from the perspective of prefix length. In particular, we approximate the infinite-long Prefix Learning optimization process by the Neural Tangent Kernel (NTK) technique. We formulate and solve it as a learning problem of the infinite-long prefix in a one-layer attention network. Our results confirm the over-parameterization property and arbitrary small loss convergence guarantee of the infinite-long Prefix Learning in attention. To the implementation end, we propose our NTK-Attention method, which is "equivalent" to attention computation with arbitrary prefix length efficiently. Its time complexity mainly depends on the sub-quadratic of input length (without prefix), and our method only requires $d^2 + d$ extra parameters for representation, where $d$ is the feature dimension. In addition, we conducted experiments that compare our NTK-Attention with full parameters fine-tuning, LoRA, and P-Tuning V2 methods across vision or natural language datasets. The results indicate our approach may be a promising parameter-efficient-fine-tuning method since it has demonstrated superior performance in numerous scenarios. Our code can be found at \url{https://github.com/ChristianYang37/chiwun/tree/main/src/NTK-Attention}. △ Less

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.13975 [pdf, other]

MR-BEN: A Comprehensive Meta-Reasoning Benchmark for Large Language Models

Authors: Zhongshen Zeng, Yinhong Liu, Yingjia Wan, Jingyao Li, Pengguang Chen, Jianbo Dai, Yuxuan Yao, Rongwu Xu, Zehan Qi, Wanru Zhao, Linling Shen, Jianqiao Lu, Haochen Tan, Yukang Chen, Hao Zhang, Zhan Shi, Bailin Wang, Zhijiang Guo, Jiaya Jia

Abstract: Large language models (LLMs) have shown increasing capability in problem-solving and decision-making, largely based on the step-by-step chain-of-thought reasoning processes. However, it has been increasingly challenging to evaluate the reasoning capability of LLMs. Concretely, existing outcome-based benchmarks begin to saturate and become less sufficient to monitor the progress. To this end, we pr… ▽ More Large language models (LLMs) have shown increasing capability in problem-solving and decision-making, largely based on the step-by-step chain-of-thought reasoning processes. However, it has been increasingly challenging to evaluate the reasoning capability of LLMs. Concretely, existing outcome-based benchmarks begin to saturate and become less sufficient to monitor the progress. To this end, we present a process-based benchmark MR-BEN that demands a meta reasoning skill, where LMs are asked to locate and analyse potential errors in automatically generated reasoning steps. MR-BEN is a comprehensive benchmark comprising 5,975 questions collected from human experts, covering various subjects such as physics, chemistry, logic, coding, and more. Through our designed metrics for assessing meta-reasoning on this benchmark, we identify interesting limitations and weaknesses of current LLMs (open-source and closed-source models). For example, open-source models are seemingly comparable to GPT-4 on outcome-based benchmarks, but they lag far behind on our benchmark, revealing the underlying reasoning capability gap between them. Our dataset and codes are available on https://randolph-zeng.github.io/Mr-Ben.github.io/. △ Less

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13317 [pdf, other]

M4Fog: A Global Multi-Regional, Multi-Modal, and Multi-Stage Dataset for Marine Fog Detection and Forecasting to Bridge Ocean and Atmosphere

Authors: Mengqiu Xu, Ming Wu, Kaixin Chen, Yixiang Huang, Mingrui Xu, Yujia Yang, Yiqing Feng, Yiying Guo, Bin Huang, Dongliang Chang, Zhenwei Shi, Chuang Zhang, Zhanyu Ma, Jun Guo

Abstract: Marine fog poses a significant hazard to global shipping, necessitating effective detection and forecasting to reduce economic losses. In recent years, several machine learning (ML) methods have demonstrated superior detection accuracy compared to traditional meteorological methods. However, most of these works are developed on proprietary datasets, and the few publicly accessible datasets are oft… ▽ More Marine fog poses a significant hazard to global shipping, necessitating effective detection and forecasting to reduce economic losses. In recent years, several machine learning (ML) methods have demonstrated superior detection accuracy compared to traditional meteorological methods. However, most of these works are developed on proprietary datasets, and the few publicly accessible datasets are often limited to simplistic toy scenarios for research purposes. To advance the field, we have collected nearly a decade's worth of multi-modal data related to continuous marine fog stages from four series of geostationary meteorological satellites, along with meteorological observations and numerical analysis, covering 15 marine regions globally where maritime fog frequently occurs. Through pixel-level manual annotation by meteorological experts, we present the most comprehensive marine fog detection and forecasting dataset to date, named M4Fog, to bridge ocean and atmosphere. The dataset comprises 68,000 "super data cubes" along four dimensions: elements, latitude, longitude and time, with a temporal resolution of half an hour and a spatial resolution of 1 kilometer. Considering practical applications, we have defined and explored three meaningful tracks with multi-metric evaluation systems: static or dynamic marine fog detection, and spatio-temporal forecasting for cloud images. Extensive benchmarking and experiments demonstrate the rationality and effectiveness of the construction concept for proposed M4Fog. The data and codes are available to whole researchers through cloud platforms to develop ML-driven marine fog solutions and mitigate adverse impacts on human activities. △ Less

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.11596 [pdf, other]

A hybrid graphene-siliconnitride nanomembrane as a versatile and ultra-widely tunable mechanical device

Authors: Mengqi Fu, Bojan Bošnjak, Zhan Shi, Jannik Dornseiff, Robert H. Blick, Elke Scheer, Fan Yang

Abstract: Integration of 2D materials in nanoelectromechanical systems (NEMS) marries the robustness of silicon-based materials with exceptional electrical controllability in 2D materials, drastically enhancing system performance which now is the key for many advanced applications in nanotechnology. Here, we experimentally demonstrate and theoretically analyze a powerful on-chip graphene integrated NEMS dev… ▽ More Integration of 2D materials in nanoelectromechanical systems (NEMS) marries the robustness of silicon-based materials with exceptional electrical controllability in 2D materials, drastically enhancing system performance which now is the key for many advanced applications in nanotechnology. Here, we experimentally demonstrate and theoretically analyze a powerful on-chip graphene integrated NEMS device consisting of a hybrid graphene/silicon-nitride membrane with metallic leads that enables an extremely large static and dynamic parameter regulation. When a static voltage is applied to the leads, the force induced by the thermal expansion difference between the leads and the membrane results in ultra-wide frequency tuning, deformation (post-buckling transition) and regulation of mechanical properties. Moreover, by injecting an alternating voltage to the leads, we can excite the resonator vibrating even far beyond its linear regime without a complex and space consuming actuation system. Our results prove that the device is a compact integrated system possessing mechanical robustness, high controllability, and fast response. It not only expands the limit of the application range of NEMS devices but also pushes multidimensional nanomechanical resonators into working in the nonlinear regime. △ Less

Submitted 23 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.10678 [pdf, other]

A Late-Stage Bitemporal Feature Fusion Network for Semantic Change Detection

Authors: Chenyao Zhou, Haotian Zhang, Han Guo, Zhengxia Zou, Zhenwei Shi

Abstract: Semantic change detection is an important task in geoscience and earth observation. By producing a semantic change map for each temporal phase, both the land use land cover categories and change information can be interpreted. Recently some multi-task learning based semantic change detection methods have been proposed to decompose the task into semantic segmentation and binary change detection sub… ▽ More Semantic change detection is an important task in geoscience and earth observation. By producing a semantic change map for each temporal phase, both the land use land cover categories and change information can be interpreted. Recently some multi-task learning based semantic change detection methods have been proposed to decompose the task into semantic segmentation and binary change detection subtasks. However, previous works comprise triple branches in an entangled manner, which may not be optimal and hard to adopt foundation models. Besides, lacking explicit refinement of bitemporal features during fusion may cause low accuracy. In this letter, we propose a novel late-stage bitemporal feature fusion network to address the issue. Specifically, we propose local global attentional aggregation module to strengthen feature fusion, and propose local global context enhancement module to highlight pivotal semantics. Comprehensive experiments are conducted on two public datasets, including SECOND and Landsat-SCD. Quantitative and qualitative results show that our proposed model achieves new state-of-the-art performance on both datasets. △ Less

Submitted 15 June, 2024; originally announced June 2024.

arXiv:2406.09687 [pdf]

Interplay between topology and correlations in the second moiré band of twisted bilayer MoTe2

Authors: Fan Xu, Xumin Chang, Jiayong Xiao, Yixin Zhang, Feng Liu, Zheng Sun, Ning Mao, Nikolai Peshcherenko, Jiayi Li, Kenji Watanabe, Takashi Taniguchi, Bingbing Tong, Li Lu, Jinfeng Jia, Dong Qian, Zhiwen Shi, Yang Zhang, Xiaoxue Liu, Shengwei Jiang, Tingxin Li

Abstract: Topological flat bands formed in two-dimensional lattice systems offer unique opportunity to study the fractional phases of matter in the absence of an external magnetic field. Celebrated examples include fractional quantum anomalous Hall (FQAH) effects and fractional topological insulators. Recently, FQAH effects have been experimentally realized in both the twisted bilayer MoTe2 (tMoTe2) system… ▽ More Topological flat bands formed in two-dimensional lattice systems offer unique opportunity to study the fractional phases of matter in the absence of an external magnetic field. Celebrated examples include fractional quantum anomalous Hall (FQAH) effects and fractional topological insulators. Recently, FQAH effects have been experimentally realized in both the twisted bilayer MoTe2 (tMoTe2) system and the rhombohedral stacked multilayer graphene/hBN moiré systems. To date, experimental studies mainly focus on the first moiré flat band, except a very recent work that reported the evidence of the integer and fractional quantum spin Hall effects in higher moiré bands of a 2.1° tMoTe2 device. Here, we present the systematical transport study of approximately 3° tMoTe2 devices, especially for the second moiré band. At ν = -2 and -4, time-reversal-symmetric single and double quantum spin Hall states formed, consistent with the previous observation in 2.1° tMoTe2 device. On the other hand, we observed ferromagnetism in the second moiré band, and a Chern insulator state driven by out-of-plane magnetic fields at ν = -3. At ν = -2.5, nonmonotonic temperature dependence of resistivity and large out-of-plane negative magnetoresistance have been observed, which likely arises from the frustrated liquid-like ground state and weak effective Ruderman-Kittel-Kasuya-Yosida interactions. Applying out-of-plane electric field can induce quantum phase transitions at both integer and fractional filling factors. Our studies pave the way for realizing tunable topological states and other unexpected magnetic phases beyond the first moiré flat band based on twisted MoTe2 platform. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.07650 [pdf, ps, other]

A unique continuation property for $|\overline \partial u| \leq V |u|$

Authors: Ziming Shi

Abstract: Let $u: Ω\subset \mathbb C^n \to \mathbb C^m$, for $n \geq 2$ and $m \geq 1$. Let $1 \leq p \leq 2$, and $2(2n)^2 -1 \leq q < \infty$ such that $\displaystyle \frac{1}{p} + \frac{1}{p'} = 1$ and $\displaystyle \frac{1}{p} - \frac{1}{p'} = \frac{1}{q}$. Suppose $|\overline \partial u| \leq V |u|$, where $V \in L^q_{\operatorname{loc}}(Ω)$. Then $u$ has a unique continuation property in the followin… ▽ More Let $u: Ω\subset \mathbb C^n \to \mathbb C^m$, for $n \geq 2$ and $m \geq 1$. Let $1 \leq p \leq 2$, and $2(2n)^2 -1 \leq q < \infty$ such that $\displaystyle \frac{1}{p} + \frac{1}{p'} = 1$ and $\displaystyle \frac{1}{p} - \frac{1}{p'} = \frac{1}{q}$. Suppose $|\overline \partial u| \leq V |u|$, where $V \in L^q_{\operatorname{loc}}(Ω)$. Then $u$ has a unique continuation property in the following sense: if $u \in W^{1,p}_{\operatorname{loc}}(Ω)$ and for some $z_0 \in Ω$, $\| u \|_{L^{p'}(B(z_0,r))} $ decays faster than any powers of $r$ as $r \to 0$, then $u \equiv 0$. The same result holds for $q=\infty$ if $u$ is scalar-valued ($m=1$). △ Less

Submitted 11 June, 2024; originally announced June 2024.

Comments: 28 pages

arXiv:2406.05628 [pdf, other]

Domain Generalization Guided by Large-Scale Pre-Trained Priors

Authors: Zongbin Wang, Bin Pan, Shiyu Shen, Tianyang Shi, Zhenwei Shi

Abstract: Domain generalization (DG) aims to train a model from limited source domains, allowing it to generalize to unknown target domains. Typically, DG models only employ large-scale pre-trained models during the initialization of fine-tuning. However, large-scale pre-trained models already possess the ability to resist domain shift. If we reference pre-trained models continuously during fine-tuning to m… ▽ More Domain generalization (DG) aims to train a model from limited source domains, allowing it to generalize to unknown target domains. Typically, DG models only employ large-scale pre-trained models during the initialization of fine-tuning. However, large-scale pre-trained models already possess the ability to resist domain shift. If we reference pre-trained models continuously during fine-tuning to maintain this ability, it could further enhance the generalization ability of the DG model. For this purpose, we introduce a new method called Fine-Tune with Large-scale pre-trained Priors (FT-LP), which incorporates the pre-trained model as a prior into the DG fine-tuning process, ensuring that the model refers to its pre-trained model at each optimization step. FT-LP comprises a theoretical framework and a simple implementation strategy. In theory, we verify the rationality of FT-LP by introducing a generalization error bound with the pre-trained priors for DG. In implementation, we utilize an encoder to simulate the model distribution, enabling the use of FT-LP when only pre-trained weights are available. In summary, we offer a new fine-tuning method for DG algorithms to utilize pre-trained models throughout the fine-tuning process. Through experiments on various datasets and DG models, our proposed method exhibits significant improvements, indicating its effectiveness. △ Less

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.05616 [pdf, other]

Domain Agnostic Conditional Invariant Predictions for Domain Generalization

Authors: Zongbin Wang, Bin Pan, Zhenwei Shi

Abstract: Domain generalization aims to develop a model that can perform well on unseen target domains by learning from multiple source domains. However, recent-proposed domain generalization models usually rely on domain labels, which may not be available in many real-world scenarios. To address this challenge, we propose a Discriminant Risk Minimization (DRM) theory and the corresponding algorithm to capt… ▽ More Domain generalization aims to develop a model that can perform well on unseen target domains by learning from multiple source domains. However, recent-proposed domain generalization models usually rely on domain labels, which may not be available in many real-world scenarios. To address this challenge, we propose a Discriminant Risk Minimization (DRM) theory and the corresponding algorithm to capture the invariant features without domain labels. In DRM theory, we prove that reducing the discrepancy of prediction distribution between overall source domain and any subset of it can contribute to obtaining invariant features. To apply the DRM theory, we develop an algorithm which is composed of Bayesian inference and a new penalty termed as Categorical Discriminant Risk (CDR). In Bayesian inference, we transform the output of the model into a probability distribution to align with our theoretical assumptions. We adopt sliding update approach to approximate the overall prediction distribution of the model, which enables us to obtain CDR penalty. We also indicate the effectiveness of these components in finding invariant features. We evaluate our algorithm against various domain generalization methods on multiple real-world datasets, providing empirical support for our theory. △ Less

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.05112 [pdf]

Ohms law lost and regained: observation and impact of zeros and poles

Authors: Krishna Joshi, Israel Kurtz, Zhou Shi, Azriel Z. Genack

Abstract: The quantum conductance and its classical wave analogue, the transmittance, are given by the sum of the eigenvalues of the transmission matrix. The lowest transmission eigenvalue in diffusive media might be expected to play a negligible role in the conductance, and, in any case, to be too small to be observed. Here, we observe the lowest transmission eigenchannel in microwave waveguides, though it… ▽ More The quantum conductance and its classical wave analogue, the transmittance, are given by the sum of the eigenvalues of the transmission matrix. The lowest transmission eigenvalue in diffusive media might be expected to play a negligible role in the conductance, and, in any case, to be too small to be observed. Here, we observe the lowest transmission eigenchannel in microwave waveguides, though it is orders of magnitude below the nominal noise level, and show that the transmittance is pulled down by global correlation among transmission eigenvalues and among zeros and poles of the transmission matrix. Transmission vanishes either when the energy density on the sample output vanishes at topological transmission zeros or when the longitudinal velocity vanishes precisely at the crossover to a new channel. This lowers the conductance by an amount proportional to the modulation of the density of states. In accord with the correspondence principle, the conductance approaches Ohms law as the number of channels increases with sample width. The exploration of the transmission matrix opens the door to a new understanding of mesoscopic transport and ultrasensitive detection techniques. △ Less

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.04207 [pdf, other]

CDMamba: Remote Sensing Image Change Detection with Mamba

Authors: Haotian Zhang, Keyan Chen, Chenyang Liu, Hao Chen, Zhengxia Zou, Zhenwei Shi

Abstract: Recently, the Mamba architecture based on state space models has demonstrated remarkable performance in a series of natural language processing tasks and has been rapidly applied to remote sensing change detection (CD) tasks. However, most methods enhance the global receptive field by directly modifying the scanning mode of Mamba, neglecting the crucial role that local information plays in dense p… ▽ More Recently, the Mamba architecture based on state space models has demonstrated remarkable performance in a series of natural language processing tasks and has been rapidly applied to remote sensing change detection (CD) tasks. However, most methods enhance the global receptive field by directly modifying the scanning mode of Mamba, neglecting the crucial role that local information plays in dense prediction tasks (e.g., CD). In this article, we propose a model called CDMamba, which effectively combines global and local features for handling CD tasks. Specifically, the Scaled Residual ConvMamba (SRCM) block is proposed to utilize the ability of Mamba to extract global features and convolution to enhance the local details, to alleviate the issue that current Mamba-based methods lack detailed clues and are difficult to achieve fine detection in dense prediction tasks. Furthermore, considering the characteristics of bi-temporal feature interaction required for CD, the Adaptive Global Local Guided Fusion (AGLGF) block is proposed to dynamically facilitate the bi-temporal interaction guided by other temporal global/local features. Our intuition is that more discriminative change features can be acquired with the guidance of other temporal features. Extensive experiments on three datasets demonstrate that our proposed CDMamba outperforms the current state-of-the-art methods. Our code will be open-sourced at https://github.com/zmoka-zht/CDMamba. △ Less

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.02037 [pdf]

Multi-Scale Direction-Aware Network for Infrared Small Target Detection

Authors: Jinmiao Zhao, Zelin Shi, Chuang Yu, Yunpeng Liu

Abstract: Infrared small target detection faces the problem that it is difficult to effectively separate the background and the target. Existing deep learning-based methods focus on appearance features and ignore high-frequency directional features. Therefore, we propose a multi-scale direction-aware network (MSDA-Net), which is the first attempt to integrate the high-frequency directional features of infra… ▽ More Infrared small target detection faces the problem that it is difficult to effectively separate the background and the target. Existing deep learning-based methods focus on appearance features and ignore high-frequency directional features. Therefore, we propose a multi-scale direction-aware network (MSDA-Net), which is the first attempt to integrate the high-frequency directional features of infrared small targets as domain prior knowledge into neural networks. Specifically, an innovative multi-directional feature awareness (MDFA) module is constructed, which fully utilizes the prior knowledge of targets and emphasizes the focus on high-frequency directional features. On this basis, combined with the multi-scale local relation learning (MLRL) module, a multi-scale direction-aware (MSDA) module is further constructed. The MSDA module promotes the full extraction of local relations at different scales and the full perception of key features in different directions. Meanwhile, a high-frequency direction injection (HFDI) module without training parameters is constructed to inject the high-frequency directional information of the original image into the network. This helps guide the network to pay attention to detailed information such as target edges and shapes. In addition, we propose a feature aggregation (FA) structure that aggregates multi-level features to solve the problem of small targets disappearing in deep feature maps. Furthermore, a lightweight feature alignment fusion (FAF) module is constructed, which can effectively alleviate the pixel offset existing in multi-level feature map fusion. Extensive experimental results show that our MSDA-Net achieves state-of-the-art (SOTA) results on the public NUDT-SIRST, SIRST and IRSTD-1k datasets. △ Less

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.00738 [pdf, other]

Global Rewards in Restless Multi-Armed Bandits

Authors: Naveen Raman, Zheyuan Ryan Shi, Fei Fang

Abstract: Restless multi-armed bandits (RMAB) extend multi-armed bandits so pulling an arm impacts future states. Despite the success of RMABs, a key limiting assumption is the separability of rewards into a sum across arms. We address this deficiency by proposing restless-multi-armed bandit with global rewards (RMAB-G), a generalization of RMABs to global non-separable rewards. To solve RMAB-G, we develop… ▽ More Restless multi-armed bandits (RMAB) extend multi-armed bandits so pulling an arm impacts future states. Despite the success of RMABs, a key limiting assumption is the separability of rewards into a sum across arms. We address this deficiency by proposing restless-multi-armed bandit with global rewards (RMAB-G), a generalization of RMABs to global non-separable rewards. To solve RMAB-G, we develop the Linear- and Shapley-Whittle indices, which extend Whittle indices from RMABs to RMAB-Gs. We prove approximation bounds but also point out how these indices could fail when reward functions are highly non-linear. To overcome this, we propose two sets of adaptive policies: the first computes indices iteratively, and the second combines indices with Monte-Carlo Tree Search (MCTS). Empirically, we demonstrate that our proposed policies outperform baselines and index-based policies with synthetic data and real-world data from food rescue. △ Less

Submitted 7 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

Comments: 27 pages

arXiv:2405.21063 [pdf, other]

Neural Network Verification with Branch-and-Bound for General Nonlinearities

Authors: Zhouxing Shi, Qirui Jin, Zico Kolter, Suman Jana, Cho-Jui Hsieh, Huan Zhang

Abstract: Branch-and-bound (BaB) is among the most effective methods for neural network (NN) verification. However, existing works on BaB have mostly focused on NNs with piecewise linear activations, especially ReLU networks. In this paper, we develop a general framework, named GenBaB, to conduct BaB for general nonlinearities in general computational graphs based on linear bound propagation. To decide whic… ▽ More Branch-and-bound (BaB) is among the most effective methods for neural network (NN) verification. However, existing works on BaB have mostly focused on NNs with piecewise linear activations, especially ReLU networks. In this paper, we develop a general framework, named GenBaB, to conduct BaB for general nonlinearities in general computational graphs based on linear bound propagation. To decide which neuron to branch, we design a new branching heuristic which leverages linear bounds as shortcuts to efficiently estimate the potential improvement after branching. To decide nontrivial branching points for general nonlinear functions, we propose to optimize branching points offline, which can be efficiently leveraged during verification with a lookup table. We demonstrate the effectiveness of our GenBaB on verifying a wide range of NNs, including networks with activation functions such as Sigmoid, Tanh, Sine and GeLU, as well as networks involving multi-dimensional nonlinear operations such as multiplications in LSTMs and Vision Transformers. Our framework also allows the verification of general nonlinear computation graphs and enables verification applications beyond simple neural networks, particularly for AC Optimal Power Flow (ACOPF). GenBaB is part of the latest $α,\!β$-CROWN, the winner of the 4th International Verification of Neural Networks Competition (VNN-COMP 2023). △ Less

Submitted 31 May, 2024; originally announced May 2024.

Comments: Preprint

arXiv:2405.19592 [pdf, other]

Why Larger Language Models Do In-context Learning Differently?

Authors: Zhenmei Shi, Junyi Wei, Zhuoyan Xu, Yingyu Liang

Abstract: Large language models (LLM) have emerged as a powerful tool for AI, with the key ability of in-context learning (ICL), where they can perform well on unseen tasks based on a brief series of task examples without necessitating any adjustments to the model parameters. One recent interesting mysterious observation is that models of different scales may have different ICL behaviors: larger models tend… ▽ More Large language models (LLM) have emerged as a powerful tool for AI, with the key ability of in-context learning (ICL), where they can perform well on unseen tasks based on a brief series of task examples without necessitating any adjustments to the model parameters. One recent interesting mysterious observation is that models of different scales may have different ICL behaviors: larger models tend to be more sensitive to noise in the test context. This work studies this observation theoretically aiming to improve the understanding of LLM and ICL. We analyze two stylized settings: (1) linear regression with one-layer single-head linear transformers and (2) parity classification with two-layer multiple attention heads transformers (non-linear data and non-linear model). In both settings, we give closed-form optimal solutions and find that smaller models emphasize important hidden features while larger ones cover more hidden features; thus, smaller models are more robust to noise while larger ones are more easily distracted, leading to different ICL behaviors. This sheds light on where transformers pay attention to and how that affects ICL. Preliminary experimental results on large base and chat models provide positive support for our analysis. △ Less

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.17193 [pdf, other]

Anisotropic Gauss Reconstruction for Unoriented Point Clouds

Authors: Yueji Ma, Dong Xiao, Zuoqiang Shi, Bin Wang

Abstract: Unoriented surface reconstructions based on the Gauss formula have attracted much attention due to their elegant mathematical formulation and excellent performance. However, the isotropic characteristics of the formulation limit their capacity to leverage the anisotropic information within the point cloud. In this work, we propose a novel anisotropic formulation by introducing a convection term in… ▽ More Unoriented surface reconstructions based on the Gauss formula have attracted much attention due to their elegant mathematical formulation and excellent performance. However, the isotropic characteristics of the formulation limit their capacity to leverage the anisotropic information within the point cloud. In this work, we propose a novel anisotropic formulation by introducing a convection term in the original Laplace operator. By choosing different velocity vectors, the anisotropic feature can be exploited to construct more effective linear equations. Moreover, an adaptive selection strategy is introduced for the velocity vector to further enhance the orientation and reconstruction performance of thin structures. Extensive experiments demonstrate that our method achieves state-of-the-art performance and manages various challenging situations, especially for models with thin structures or small holes. The source code will be released on GitHub. △ Less

Submitted 27 May, 2024; originally announced May 2024.

Comments: 17pages;14figures

arXiv:2405.16634 [pdf, other]

Fast and Globally Consistent Normal Orientation based on the Winding Number Normal Consistency

Authors: Siyou Lin, Zuoqiang Shi, Yebin Liu

Abstract: Estimating a consistently oriented normal vector field for an unoriented point cloud enables a number of important downstream applications in computer graphics. While normal estimation for a small patch of points can be done with simple techniques like principal component analysis (PCA), orienting these normals to be globally consistent has been a notoriously difficult problem. Some recent methods… ▽ More Estimating a consistently oriented normal vector field for an unoriented point cloud enables a number of important downstream applications in computer graphics. While normal estimation for a small patch of points can be done with simple techniques like principal component analysis (PCA), orienting these normals to be globally consistent has been a notoriously difficult problem. Some recent methods exploit various properties of the winding number formula to achieve global consistency with state-of-the-art performance. Despite their exciting progress, these algorithms either have high space/time complexity, or do not produce accurate and consistently oriented normals for imperfect data. In this paper, we derive a novel property from the winding number formula to tackle this problem: the normal consistency property of the winding number formula. We refer to this property as the winding number normal consistency (WNNC). The derived property is based on the simple observation that the normals (negative gradients) sampled from the winding number field should be codirectional to the normals used to compute the winding number field. We further propose to turn the WNNC property into a normal update formula, which leads to an embarrassingly simple yet effective iterative algorithm that allows fast and high-quality convergence to a globally consistent normal vector field. Furthermore, our proposed algorithm only involves repeatedly evaluating the winding number formula and its derivatives, which can be accelerated and parallelized using treecode-based approximation algorithms due to their special structures. Exploiting this fact, we implement a GPU-accelerated treecode-based solver. Our GPU (and even CPU) implementation can be significantly faster than the recent state-of-the-art methods for normal orientation from raw points. △ Less

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.16533 [pdf, other]

Chain of Tools: Large Language Model is an Automatic Multi-tool Learner

Authors: Zhengliang Shi, Shen Gao, Xiuyi Chen, Yue Feng, Lingyong Yan, Haibo Shi, Dawei Yin, Zhumin Chen, Suzan Verberne, Zhaochun Ren

Abstract: Augmenting large language models (LLMs) with external tools has emerged as a promising approach to extend their utility, empowering them to solve practical tasks. Existing work typically empowers LLMs as tool users with a manually designed workflow, where the LLM plans a series of tools in a step-by-step manner, and sequentially executes each tool to obtain intermediate results until deriving the… ▽ More Augmenting large language models (LLMs) with external tools has emerged as a promising approach to extend their utility, empowering them to solve practical tasks. Existing work typically empowers LLMs as tool users with a manually designed workflow, where the LLM plans a series of tools in a step-by-step manner, and sequentially executes each tool to obtain intermediate results until deriving the final answer. However, they suffer from two challenges in realistic scenarios: (1) The handcrafted control flow is often ad-hoc and constraints the LLM to local planning; (2) The LLM is instructed to use only manually demonstrated tools or well-trained Python functions, which limits its generalization to new tools. In this work, we first propose Automatic Tool Chain (ATC), a framework that enables the LLM to act as a multi-tool user, which directly utilizes a chain of tools through programming. To scale up the scope of the tools, we next propose a black-box probing method. This further empowers the LLM as a tool learner that can actively discover and document tool usages, teaching themselves to properly master new tools. For a comprehensive evaluation, we build a challenging benchmark named ToolFlow, which diverges from previous benchmarks by its long-term planning scenarios and complex toolset. Experiments on both existing datasets and ToolFlow illustrate the superiority of our framework. Analysis on different settings also validates the effectiveness and the utility of our black-box probing algorithm. △ Less

Submitted 26 May, 2024; originally announced May 2024.

Comments: Work in progress

arXiv:2405.16418 [pdf, other]

Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective

Authors: Jiuxiang Gu, Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou

Abstract: Diffusion models have made rapid progress in generating high-quality samples across various domains. However, a theoretical understanding of the Lipschitz continuity and second momentum properties of the diffusion process is still lacking. In this paper, we bridge this gap by providing a detailed examination of these smoothness properties for the case where the target data distribution is a mixtur… ▽ More Diffusion models have made rapid progress in generating high-quality samples across various domains. However, a theoretical understanding of the Lipschitz continuity and second momentum properties of the diffusion process is still lacking. In this paper, we bridge this gap by providing a detailed examination of these smoothness properties for the case where the target data distribution is a mixture of Gaussians, which serves as a universal approximator for smooth densities such as image data. We prove that if the target distribution is a $k$-mixture of Gaussians, the density of the entire diffusion process will also be a $k$-mixture of Gaussians. We then derive tight upper bounds on the Lipschitz constant and second momentum that are independent of the number of mixture components $k$. Finally, we apply our analysis to various diffusion solvers, both SDE and ODE based, to establish concrete error guarantees in terms of the total variation distance and KL divergence between the target and learned distributions. Our results provide deeper theoretical insights into the dynamics of the diffusion process under common data distributions. △ Less

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.16411 [pdf, other]

Tensor Attention Training: Provably Efficient Learning of Higher-order Transformers

Authors: Jiuxiang Gu, Yingyu Liang, Zhenmei Shi, Zhao Song, Yufa Zhou

Abstract: Tensor Attention, a multi-view attention that is able to capture high-order correlations among multiple modalities, can overcome the representational limitations of classical matrix attention. However, the $Ω(n^3)$ time complexity of tensor attention poses a significant obstacle to its practical implementation in transformers, where $n$ is the input sequence length. In this work, we prove that the… ▽ More Tensor Attention, a multi-view attention that is able to capture high-order correlations among multiple modalities, can overcome the representational limitations of classical matrix attention. However, the $Ω(n^3)$ time complexity of tensor attention poses a significant obstacle to its practical implementation in transformers, where $n$ is the input sequence length. In this work, we prove that the backward gradient of tensor attention training can be computed in almost linear $n^{1+o(1)}$ time, the same complexity as its forward computation under a bounded entries assumption. We provide a closed-form solution for the gradient and propose a fast computation method utilizing polynomial approximation methods and tensor algebraic tricks. Furthermore, we prove the necessity and tightness of our assumption through hardness analysis, showing that slightly weakening it renders the gradient problem unsolvable in truly subcubic time. Our theoretical results establish the feasibility of efficient higher-order transformer training and may facilitate practical applications of tensor attention architectures. △ Less

Submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.14602 [pdf, other]

Controllable Continual Test-Time Adaptation

Authors: Ziqi Shi, Fan Lyu, Ye Liu, Fanhua Shang, Fuyuan Hu, Wei Feng, Zhang Zhang, Liang Wang

Abstract: Continual Test-Time Adaptation (CTTA) is an emerging and challenging task where a model trained in a source domain must adapt to continuously changing conditions during testing, without access to the original source data. CTTA is prone to error accumulation due to uncontrollable domain shifts, leading to blurred decision boundaries between categories. Existing CTTA methods primarily focus on suppr… ▽ More Continual Test-Time Adaptation (CTTA) is an emerging and challenging task where a model trained in a source domain must adapt to continuously changing conditions during testing, without access to the original source data. CTTA is prone to error accumulation due to uncontrollable domain shifts, leading to blurred decision boundaries between categories. Existing CTTA methods primarily focus on suppressing domain shifts, which proves inadequate during the unsupervised test phase. In contrast, we introduce a novel approach that guides rather than suppresses these shifts. Specifically, we propose $\textbf{C}$ontrollable $\textbf{Co}$ntinual $\textbf{T}$est-$\textbf{T}$ime $\textbf{A}$daptation (C-CoTTA), which explicitly prevents any single category from encroaching on others, thereby mitigating the mutual influence between categories caused by uncontrollable shifts. Moreover, our method reduces the sensitivity of model to domain transformations, thereby minimizing the magnitude of category shifts. Extensive quantitative experiments demonstrate the effectiveness of our method, while qualitative analyses, such as t-SNE plots, confirm the theoretical validity of our approach. △ Less

Submitted 28 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.14394 [pdf, other]

Instruction Tuning With Loss Over Instructions

Authors: Zhengyan Shi, Adam X. Yang, Bin Wu, Laurence Aitchison, Emine Yilmaz, Aldo Lipani

Abstract: Instruction tuning plays a crucial role in shaping the outputs of language models (LMs) to desired styles. In this work, we propose a simple yet effective method, Instruction Modelling (IM), which trains LMs by applying a loss function to the instruction and prompt part rather than solely to the output part. Through experiments across 21 diverse benchmarks, we show that, in many scenarios, IM can… ▽ More Instruction tuning plays a crucial role in shaping the outputs of language models (LMs) to desired styles. In this work, we propose a simple yet effective method, Instruction Modelling (IM), which trains LMs by applying a loss function to the instruction and prompt part rather than solely to the output part. Through experiments across 21 diverse benchmarks, we show that, in many scenarios, IM can effectively improve the LM performance on both NLP tasks (e.g., MMLU, TruthfulQA, and HumanEval) and open-ended generation benchmarks (e.g., MT-Bench and AlpacaEval). Remarkably, in the most advantageous case, IM boosts model performance on AlpacaEval 1.0 by over 100%. We identify two key factors influencing the effectiveness of IM: (1) The ratio between instruction length and output length in the training data; and (2) The number of training examples. We observe that IM is especially beneficial when trained on datasets with lengthy instructions paired with brief outputs, or under the Superficial Alignment Hypothesis (SAH) where a small amount of training examples are used for instruction tuning. Further analysis substantiates our hypothesis that the improvement can be attributed to reduced overfitting to instruction tuning datasets. Our work provides practical guidance for instruction tuning LMs, especially in low-resource scenarios. △ Less

Submitted 23 May, 2024; originally announced May 2024.

Comments: Code is available at https://github.com/ZhengxiangShi/InstructionModelling

arXiv:2405.13570 [pdf, other]

MetaEarth: A Generative Foundation Model for Global-Scale Remote Sensing Image Generation

Authors: Zhiping Yu, Chenyang Liu, Liqin Liu, Zhenwei Shi, Zhengxia Zou

Abstract: The recent advancement of generative foundational models has ushered in a new era of image generation in the realm of natural images, revolutionizing art design, entertainment, environment simulation, and beyond. Despite producing high-quality samples, existing methods are constrained to generating images of scenes at a limited scale. In this paper, we present MetaEarth, a generative foundation mo… ▽ More The recent advancement of generative foundational models has ushered in a new era of image generation in the realm of natural images, revolutionizing art design, entertainment, environment simulation, and beyond. Despite producing high-quality samples, existing methods are constrained to generating images of scenes at a limited scale. In this paper, we present MetaEarth, a generative foundation model that breaks the barrier by scaling image generation to a global level, exploring the creation of worldwide, multi-resolution, unbounded, and virtually limitless remote sensing images. In MetaEarth, we propose a resolution-guided self-cascading generative framework, which enables the generating of images at any region with a wide range of geographical resolutions. To achieve unbounded and arbitrary-sized image generation, we design a novel noise sampling strategy for denoising diffusion models by analyzing the generation conditions and initial noise. To train MetaEarth, we construct a large dataset comprising multi-resolution optical remote sensing images with geographical information. Experiments have demonstrated the powerful capabilities of our method in generating global-scale images. Additionally, the MetaEarth serves as a data engine that can provide high-quality and rich training data for downstream tasks. Our model opens up new possibilities for constructing generative world models by simulating Earth visuals from an innovative overhead perspective. △ Less

Submitted 28 May, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

Comments: Project page: https://jiupinjia.github.io/metaearth/

arXiv:2405.09964 [pdf, other]

KPNDepth: Depth Estimation of Lane Images under Complex Rainy Environment

Authors: Zhengxu Shi

Abstract: With the development of deep neural network generative models in recent years, significant progress has been made in the research of depth estimation in lane scenes. However, current research achievements are mainly focused on clear daytime scenarios. In complex rainy environments, the influence of rain streaks and local fog effects often leads to erroneous increases in the overall depth estimatio… ▽ More With the development of deep neural network generative models in recent years, significant progress has been made in the research of depth estimation in lane scenes. However, current research achievements are mainly focused on clear daytime scenarios. In complex rainy environments, the influence of rain streaks and local fog effects often leads to erroneous increases in the overall depth estimation values in images. Moreover, these natural factors can introduce disturbances to the accurate prediction of depth boundaries in images. In this paper, we investigate lane depth estimation in complex rainy environments. Based on the concept of convolutional kernel prediction, we propose a dual-layer pixel-wise convolutional kernel prediction network trained on offline data. By predicting two sets of independent convolutional kernels for the target image, we restore the depth information loss caused by complex environmental factors and address the issue of rain streak artifacts generated by a single convolutional kernel set. Furthermore, considering the lack of real rainy lane data currently available, we introduce an image synthesis algorithm, RCFLane, which comprehensively considers the darkening of the environment due to rainfall and local fog effects. We create a synthetic dataset containing 820 experimental images, which we refer to as RainKITTI, on the commonly used depth estimation dataset KITTI. Extensive experiments demonstrate that our proposed depth estimation framework achieves favorable results in highly complex lane rainy environments. △ Less

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2405.09820 [pdf, other]

Densely Distilling Cumulative Knowledge for Continual Learning

Authors: Zenglin Shi, Pei Liu, Tong Su, Yunpeng Wu, Kuien Liu, Yu Song, Meng Wang

Abstract: Continual learning, involving sequential training on diverse tasks, often faces catastrophic forgetting. While knowledge distillation-based approaches exhibit notable success in preventing forgetting, we pinpoint a limitation in their ability to distill the cumulative knowledge of all the previous tasks. To remedy this, we propose Dense Knowledge Distillation (DKD). DKD uses a task pool to track t… ▽ More Continual learning, involving sequential training on diverse tasks, often faces catastrophic forgetting. While knowledge distillation-based approaches exhibit notable success in preventing forgetting, we pinpoint a limitation in their ability to distill the cumulative knowledge of all the previous tasks. To remedy this, we propose Dense Knowledge Distillation (DKD). DKD uses a task pool to track the model's capabilities. It partitions the output logits of the model into dense groups, each corresponding to a task in the task pool. It then distills all tasks' knowledge using all groups. However, using all the groups can be computationally expensive, we also suggest random group selection in each optimization step. Moreover, we propose an adaptive weighting scheme, which balances the learning of new classes and the retention of old classes, based on the count and similarity of the classes. Our DKD outperforms recent state-of-the-art baselines across diverse benchmarks and scenarios. Empirical analysis underscores DKD's ability to enhance model stability, promote flatter minima for improved generalization, and remains robust across various memory budgets and task orders. Moreover, it seamlessly integrates with other CL methods to boost performance and proves versatile in offline scenarios like model compression. △ Less

Submitted 16 May, 2024; originally announced May 2024.

Comments: 12 pages; Continual Leanrning; Class-incremental Learning; Knowledge Distillation; Forgetting

arXiv:2405.09256 [pdf, other]

Drag prediction of rough-wall turbulent flow using data-driven regression

Authors: Zhaoyu Shi, Seyed Morteza Habibi Khorasani, Heesoo Shin, Jiasheng Yang, Sangseung Lee, Shervin Bagheri

Abstract: Efficient tools for predicting the drag of rough walls in turbulent flows would have a tremendous impact. However, methods for drag prediction rely on experiments or numerical simulations which are costly and time-consuming. Data-driven regression methods have the potential to provide a prediction that is accurate and fast. We assess the performance and limitations of linear regression, kernel met… ▽ More Efficient tools for predicting the drag of rough walls in turbulent flows would have a tremendous impact. However, methods for drag prediction rely on experiments or numerical simulations which are costly and time-consuming. Data-driven regression methods have the potential to provide a prediction that is accurate and fast. We assess the performance and limitations of linear regression, kernel methods and neural networks for drag prediction using a database of 1000 homogeneous rough surfaces. Model performance is evaluated using the roughness function obtained at friction-scaled Reynolds number 500. With two trainable parameters, the kernel method can fully account for nonlinear relations between $ΔU^+$ and surface statistics (roughness height, effective slope, skewness, etc). In contrast, linear regression cannot account for nonlinear correlations and display large errors and high uncertainty. Multilayer perceptron and convolutional neural networks demonstrate performance on par with the kernel method but have orders of magnitude more trainable parameters. For the current database size, the networks' capacity cannot be fully exploited, resulting in reduced generalizability and reliability. Our study provides insight into the appropriateness of different regression models for drag prediction. We also discuss the remaining steps before data-driven methods emerge as useful tools in applications. △ Less

Submitted 15 May, 2024; originally announced May 2024.

Comments: This manuscript consists of 18 pages, where 2 appendices, 11 figures and 3 tables are included. It is currently under review at FLOW journal. Dr. Zhaoyu Shi developed the machine learning models and conducted the data analysis as well as the draft writing. The direct numerical simulations (see Appendix A) were conducted by Dr S.M.H. Khorasani

arXiv:2405.09112 [pdf, other]

doi 10.1145/3660782

Enhancing Function Name Prediction using Votes-Based Name Tokenization and Multi-Task Learning

Authors: Xiaoling Zhang, Zhengzi Xu, Shouguo Yang, Zhi Li, Zhiqiang Shi, Limin Sun

Abstract: Reverse engineers would acquire valuable insights from descriptive function names, which are absent in publicly released binaries. Recent advances in binary function name prediction using data-driven machine learning show promise. However, existing approaches encounter difficulties in capturing function semantics in diverse optimized binaries and fail to reserve the meaning of labels in function n… ▽ More Reverse engineers would acquire valuable insights from descriptive function names, which are absent in publicly released binaries. Recent advances in binary function name prediction using data-driven machine learning show promise. However, existing approaches encounter difficulties in capturing function semantics in diverse optimized binaries and fail to reserve the meaning of labels in function names. We propose Epitome, a framework that enhances function name prediction using votes-based name tokenization and multi-task learning, specifically tailored for different compilation optimization binaries. Epitome learns comprehensive function semantics by pre-trained assembly language model and graph neural network, incorporating function semantics similarity prediction task, to maximize the similarity of function semantics in the context of different compilation optimization levels. In addition, we present two data preprocessing methods to improve the comprehensibility of function names. We evaluate the performance of Epitome using 2,597,346 functions extracted from binaries compiled with 5 optimizations (O0-Os) for 4 architectures (x64, x86, ARM, and MIPS). Epitome outperforms the state-of-the-art function name prediction tool by up to 44.34%, 64.16%, and 54.44% in precision, recall, and F1 score, while also exhibiting superior generalizability. △ Less

Submitted 15 May, 2024; originally announced May 2024.

Comments: 24 pages, 10 figures, ACM ESEC/FSE 2024

Journal ref: Proc. ACM Softw. Eng. 1,FSE, Article 75 (July 2024), 24 pages

arXiv:2405.09071 [pdf, other]

Data-driven discovery of drag-inducing elements on a rough surface through convolutional neural networks

Authors: Heesoo Shin, Seyed Morteza Habibi Khorasani, Zhaoyu Shi, Jiasheng Yang, Sangseung Lee, Shervin Bagheri

Abstract: Understanding the influence of surface roughness on drag forces remains a significant challenge in fluid dynamics. This paper presents a convolutional neural network (CNN) that predicts drag solely by the topography of rough surfaces and is capable of discovering spatial patterns linked to drag-inducing structures. A CNN model was developed to analyze spatial information from the topography of a r… ▽ More Understanding the influence of surface roughness on drag forces remains a significant challenge in fluid dynamics. This paper presents a convolutional neural network (CNN) that predicts drag solely by the topography of rough surfaces and is capable of discovering spatial patterns linked to drag-inducing structures. A CNN model was developed to analyze spatial information from the topography of a rough surface and predict the roughness function, $ΔU^+$, obtained from direct numerical simulation. This model enables the prediction of drag from rough surface data alone, which was not possible with previous methods owing to the large number of surface-derived parameters. Additionally, the retention of spatial information by the model enables the creation of a feature map that accentuates critical areas for drag prediction on rough surfaces. By interpreting the feature maps, we show that the developed CNN model is able to discover spatial patterns associated with drag distributions across rough surfaces, even without a direct training on drag distribution data. The analysis of the feature map indicates that, even without flow field information, the CNN model extracts the importance of the flow-directional slope and height of roughness elements as key factors in inducing pressure drag. This study demonstrates that CNN-based drag prediction is grounded in physical principles of fluid dynamics, underscoring the utility of CNNs in both predicting and understanding drag on rough surfaces. △ Less

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.06683 [pdf, other]

ERAGent: Enhancing Retrieval-Augmented Language Models with Improved Accuracy, Efficiency, and Personalization

Authors: Yunxiao Shi, Xing Zi, Zijing Shi, Haimin Zhang, Qiang Wu, Min Xu

Abstract: Retrieval-augmented generation (RAG) for language models significantly improves language understanding systems. The basic retrieval-then-read pipeline of response generation has evolved into a more extended process due to the integration of various components, sometimes even forming loop structures. Despite its advancements in improving response accuracy, challenges like poor retrieval quality for… ▽ More Retrieval-augmented generation (RAG) for language models significantly improves language understanding systems. The basic retrieval-then-read pipeline of response generation has evolved into a more extended process due to the integration of various components, sometimes even forming loop structures. Despite its advancements in improving response accuracy, challenges like poor retrieval quality for complex questions that require the search of multifaceted semantic information, inefficiencies in knowledge re-retrieval during long-term serving, and lack of personalized responses persist. Motivated by transcending these limitations, we introduce ERAGent, a cutting-edge framework that embodies an advancement in the RAG area. Our contribution is the introduction of the synergistically operated module: Enhanced Question Rewriter and Knowledge Filter, for better retrieval quality. Retrieval Trigger is incorporated to curtail extraneous external knowledge retrieval without sacrificing response quality. ERAGent also personalizes responses by incorporating a learned user profile. The efficiency and personalization characteristics of ERAGent are supported by the Experiential Learner module which makes the AI assistant being capable of expanding its knowledge and modeling user profile incrementally. Rigorous evaluations across six datasets and three question-answering tasks prove ERAGent's superior accuracy, efficiency, and personalization, emphasizing its potential to advance the RAG field and its applicability in practical systems. △ Less

Submitted 6 May, 2024; originally announced May 2024.

Comments: Draft Paper

arXiv:2405.05219 [pdf, other]

Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers

Authors: Jiuxiang Gu, Yingyu Liang, Heshan Liu, Zhenmei Shi, Zhao Song, Junze Yin

Abstract: Large Language Models (LLMs) have profoundly changed the world. Their self-attention mechanism is the key to the success of transformers in LLMs. However, the quadratic computational cost $O(n^2)$ to the length $n$ input sequence is the notorious obstacle for further improvement and scalability in the longer context. In this work, we leverage the convolution-like structure of attention matrices to… ▽ More Large Language Models (LLMs) have profoundly changed the world. Their self-attention mechanism is the key to the success of transformers in LLMs. However, the quadratic computational cost $O(n^2)$ to the length $n$ input sequence is the notorious obstacle for further improvement and scalability in the longer context. In this work, we leverage the convolution-like structure of attention matrices to develop an efficient approximation method for attention computation using convolution matrices. We propose a $\mathsf{conv}$ basis system, "similar" to the rank basis, and show that any lower triangular (attention) matrix can always be decomposed as a sum of $k$ structured convolution matrices in this basis system. We then design an algorithm to quickly decompose the attention matrix into $k$ convolution matrices. Thanks to Fast Fourier Transforms (FFT), the attention {\it inference} can be computed in $O(knd \log n)$ time, where $d$ is the hidden dimension. In practice, we have $ d \ll n$, i.e., $d=3,072$ and $n=1,000,000$ for Gemma. Thus, when $kd = n^{o(1)}$, our algorithm achieve almost linear time, i.e., $n^{1+o(1)}$. Furthermore, the attention {\it training forward} and {\it backward gradient} can be computed in $n^{1+o(1)}$ as well. Our approach can avoid explicitly computing the $n \times n$ attention matrix, which may largely alleviate the quadratic computational complexity. Furthermore, our algorithm works on any input matrices. This work provides a new paradigm for accelerating attention computation in transformers to enable their application to longer contexts. △ Less

Submitted 8 May, 2024; originally announced May 2024.

Comments: 55 pages

arXiv:2405.04122 [pdf, other]

Ranking-based Client Selection with Imitation Learning for Efficient Federated Learning

Authors: Chunlin Tian, Zhan Shi, Xinpeng Qin, Li Li, Chengzhong Xu

Abstract: Federated Learning (FL) enables multiple devices to collaboratively train a shared model while ensuring data privacy. The selection of participating devices in each training round critically affects both the model performance and training efficiency, especially given the vast heterogeneity in training capabilities and data distribution across devices. To address these challenges, we introduce a no… ▽ More Federated Learning (FL) enables multiple devices to collaboratively train a shared model while ensuring data privacy. The selection of participating devices in each training round critically affects both the model performance and training efficiency, especially given the vast heterogeneity in training capabilities and data distribution across devices. To address these challenges, we introduce a novel device selection solution called FedRank, which is an end-to-end, ranking-based approach that is pre-trained by imitation learning against state-of-the-art analytical approaches. It not only considers data and system heterogeneity at runtime but also adaptively and efficiently chooses the most suitable clients for model training. Specifically, FedRank views client selection in FL as a ranking problem and employs a pairwise training strategy for the smart selection process. Additionally, an imitation learning-based approach is designed to counteract the cold-start issues often seen in state-of-the-art learning-based approaches. Experimental results reveal that \model~ boosts model accuracy by 5.2\% to 56.9\%, accelerates the training convergence up to $2.01 \times$ and saves the energy consumption up to $40.1\%$. △ Less

Submitted 7 May, 2024; originally announced May 2024.

Comments: Accepted by ICML 2024

arXiv:2405.03251 [pdf, ps, other]

Exploring the Frontiers of Softmax: Provable Optimization, Applications in Diffusion Model, and Beyond

Authors: Jiuxiang Gu, Chenyang Li, Yingyu Liang, Zhenmei Shi, Zhao Song

Abstract: The softmax activation function plays a crucial role in the success of large language models (LLMs), particularly in the self-attention mechanism of the widely adopted Transformer architecture. However, the underlying learning dynamics that contribute to the effectiveness of softmax remain largely unexplored. As a step towards better understanding, this paper provides a theoretical study of the op… ▽ More The softmax activation function plays a crucial role in the success of large language models (LLMs), particularly in the self-attention mechanism of the widely adopted Transformer architecture. However, the underlying learning dynamics that contribute to the effectiveness of softmax remain largely unexplored. As a step towards better understanding, this paper provides a theoretical study of the optimization and generalization properties of two-layer softmax neural networks, providing theoretical insights into their superior performance as other activation functions, such as ReLU and exponential. Leveraging the Neural Tangent Kernel (NTK) framework, our analysis reveals that the normalization effect of the softmax function leads to a good perturbation property of the induced NTK matrix, resulting in a good convex region of the loss landscape. Consequently, softmax neural networks can learn the target function in the over-parametrization regime. To demonstrate the broad applicability of our theoretical findings, we apply them to the task of learning score estimation functions in diffusion models, a promising approach for generative modeling. Our analysis shows that gradient-based algorithms can learn the score function with a provable accuracy. Our work provides a deeper understanding of the effectiveness of softmax neural networks and their potential in various domains, paving the way for further advancements in natural language processing and beyond. △ Less

Submitted 6 May, 2024; originally announced May 2024.

Comments: 53 pages

Showing 1–50 of 1,138 results for author: Shi, Z