-
Back to Newton's Laws: Learning Vision-based Agile Flight via Differentiable Physics
Authors:
Yuang Zhang,
Yu Hu,
Yunlong Song,
Danping Zou,
Weiyao Lin
Abstract:
Swarm navigation in cluttered environments is a grand challenge in robotics. This work combines deep learning with first-principle physics through differentiable simulation to enable autonomous navigation of multiple aerial robots through complex environments at high speed. Our approach optimizes a neural network control policy directly by backpropagating loss gradients through the robot simulation using a simple point-mass physics model and a depth rendering engine. Despite this simplicity, our method excels in challenging tasks for both multi-agent and single-agent applications with zero-shot sim-to-real transfer. In multi-agent scenarios, our system demonstrates self-organized behavior, enabling autonomous coordination without communication or centralized planning - an achievement not seen in existing traditional or learning-based methods. In single-agent scenarios, our system achieves a 90% success rate in navigating through complex environments, significantly surpassing the 60% success rate of the previous state-of-the-art approach. Our system can operate without state estimation and adapt to dynamic obstacles. In real-world forest environments, it navigates at speeds up to 20 m/s, doubling the speed of previous imitation learning-based solutions. Notably, all these capabilities are deployed on a budget-friendly $21 computer, costing less than 5% of a GPU-equipped board used in existing systems. Video demonstrations are available at https://youtu.be/LKg9hJqc2cc.
Submitted 15 July, 2024; v1 submitted 15 July, 2024;
originally announced July 2024.
-
Improving Hyperbolic Representations via Gromov-Wasserstein Regularization
Authors:
Yifei Yang,
Wonjun Lee,
Dongmian Zou,
Gilad Lerman
Abstract:
Hyperbolic representations have shown remarkable efficacy in modeling inherent hierarchies and complexities within data structures. Hyperbolic neural networks have been commonly applied for learning such representations from data, but they often fall short in preserving the geometric structures of the original feature spaces. In response to this challenge, our work applies the Gromov-Wasserstein (GW) distance as a novel regularization mechanism within hyperbolic neural networks. The GW distance quantifies how well the original data structure is maintained after embedding the data in a hyperbolic space. Specifically, we explicitly treat the layers of the hyperbolic neural networks as a transport map and calculate the GW distance accordingly. We validate that the GW distance computed based on a training set well approximates the GW distance of the underlying data distribution. Our approach demonstrates consistent enhancements over current state-of-the-art methods across various tasks, including few-shot image classification, as well as semi-supervised graph link prediction and node classification.
Submitted 15 July, 2024;
originally announced July 2024.
-
Hardware-Efficient and Reliable Coherent DSCM Systems Enabled by Single-Pilot-Tone-Based Polarization Demultiplexing
Authors:
Wei Wang,
Dongdong Zou,
Weihao Ni,
Fan Li
Abstract:
Recently, coherent digital subcarrier multiplexing (DSCM) technology has become an attractive solution for next-generation ultra-high-speed datacenter interconnects (DCIs). To meet the requirements of low cost and low power consumption in DCI applications, a comprehensive simplification of the coherent DSCM system has been investigated. The pilot-tone-based polarization demultiplexing (PT-PDM) technique, known for its low power consumption and ultra-fast polarization tracking capabilities, has emerged as a compelling alternative to the power-hungry N-tap adaptive multiple-input multiple-output (MIMO) equalizer. However, the effectiveness of this PT-PDM technique is extremely vulnerable to the receiver-side XY-skew (Rx-XY-skew), which is revealed in this paper for the first time. A pilot-tone-enabled modified Godard phase detector (PT-MGPD) scheme is then proposed to realize Rx-XY-skew estimation, serving as the prerequisite for the successful implementation of PT-PDM and simplification of the adaptive equalizer. Both simulation and experiment are conducted to evaluate the accuracy of the proposed PT-MGPD scheme. The results show that it can achieve accurate estimation with an error of less than 0.3 ps. In addition, a low-complexity, high-spectral-efficiency, and ultra-fast polarization demultiplexing method based on a single pilot tone (SPT) is proposed for the DSCM system in this work. Based on the proposed PT-MGPD and SPT schemes, the conventional N-tap MIMO equalizer serving each subcarrier can be successfully pruned into two polarization-independent single-input single-output equalizers, and there is no performance penalty even if the polarization rotation speed reaches 10 Mrad/s. According to the results, the proposed schemes provide a hardware-efficient and reliable coherent DSCM solution for next-generation ultra-high-speed DCIs.
Submitted 14 July, 2024;
originally announced July 2024.
-
Limiting Over-Smoothing and Over-Squashing of Graph Message Passing by Deep Scattering Transforms
Authors:
Yuanhong Jiang,
Dongmian Zou,
Xiaoqun Zhang,
Yu Guang Wang
Abstract:
Graph neural networks (GNNs) have become pivotal tools for processing graph-structured data, leveraging the message passing scheme as their core mechanism. However, traditional GNNs often grapple with issues such as instability, over-smoothing, and over-squashing, which can degrade performance and create a trade-off dilemma. In this paper, we introduce a discriminatively trained, multi-layer Deep Scattering Message Passing (DSMP) neural network designed to overcome these challenges. By harnessing spectral transformation, the DSMP model aggregates neighboring nodes with global information, thereby enhancing the precision and accuracy of graph signal processing. We provide theoretical proofs demonstrating the DSMP's effectiveness in mitigating these issues under specific conditions. Additionally, we support our claims with empirical evidence and thorough frequency analysis, showcasing the DSMP's superior ability to address instability, over-smoothing, and over-squashing.
Submitted 9 July, 2024;
originally announced July 2024.
-
Identifying doppelgänge Black Holes through Shadow Images
Authors:
Yukun Xu,
Hyat Huang,
Meng-Yun Lai,
De-Cheng Zou
Abstract:
Recently, an interesting doppelgänge black hole solution was obtained in the string-inspired Euler-Heisenberg theory, where the black holes have the same radii but different charges. We find, however, that they possess different ISCOs and photon spheres, which affects their shadow images. In this work, we investigate the optical appearances of such black holes when illuminated by an optically and geometrically thin disk. We find that doppelgänge black holes have different optical appearances: even though the horizon radii are the same, the sizes of the shadows are not equal. Furthermore, we find that black holes with large magnetic charge $Q_m$ give rise to novel shadow images in which the usual bright rings inside the shadow are not clear. The optical appearances under spherical accretion are also examined, and they can likewise distinguish two doppelgänge black holes.
Submitted 9 July, 2024;
originally announced July 2024.
-
DiffRetouch: Using Diffusion to Retouch on the Shoulder of Experts
Authors:
Zheng-Peng Duan,
Jiawei Zhang,
Zheng Lin,
Xin Jin,
Dongqing Zou,
Chunle Guo,
Chongyi Li
Abstract:
Image retouching aims to enhance the visual quality of photos. Considering the different aesthetic preferences of users, the target of retouching is subjective. However, current retouching methods mostly adopt deterministic models, which not only neglect the style diversity in the expert-retouched results and tend to learn an average style during training, but also lack sample diversity during inference. In this paper, we propose a diffusion-based method, named DiffRetouch. Thanks to the excellent distribution modeling ability of diffusion, our method can capture the complex fine-retouched distribution covering various visually pleasing styles in the training data. Moreover, four image attributes are made adjustable to provide a user-friendly editing mechanism. By adjusting these attributes in specified ranges, users are allowed to customize preferred styles within the learned fine-retouched distribution. Additionally, the affine bilateral grid and contrastive learning scheme are introduced to handle the problems of texture distortion and control insensitivity, respectively. Extensive experiments have demonstrated the superior performance of our method in visual appeal and sample diversity. The code will be made available to the community.
Submitted 4 July, 2024;
originally announced July 2024.
-
Extracting Training Data from Unconditional Diffusion Models
Authors:
Yunhao Chen,
Xingjun Ma,
Difan Zou,
Yu-Gang Jiang
Abstract:
As diffusion probabilistic models (DPMs) are being employed as mainstream models for generative artificial intelligence (AI), the study of their memorization of the raw training data has attracted growing attention. Existing works in this direction aim to establish an understanding of whether or to what extent DPMs learn by memorization. Such an understanding is crucial for identifying potential risks of data leakage and copyright infringement in diffusion models and, more importantly, for more controllable generation and trustworthy application of Artificial Intelligence Generated Content (AIGC). While previous works have made important observations of when DPMs are prone to memorization, these findings are mostly empirical, and the developed data extraction methods only work for conditional diffusion models. In this work, we aim to establish a theoretical understanding of memorization in DPMs with 1) a memorization metric for theoretical analysis, 2) an analysis of conditional memorization with informative and random labels, and 3) two better evaluation metrics for measuring memorization. Based on the theoretical analysis, we further propose a novel data extraction method called Surrogate condItional Data Extraction (SIDE) that leverages a classifier trained on generated data as a surrogate condition to extract training data directly from unconditional diffusion models. Our empirical results demonstrate that SIDE can extract training data from diffusion models where previous methods fail, and it is on average over 50% more effective across different scales of the CelebA dataset.
Submitted 18 June, 2024;
originally announced June 2024.
-
Explainable Bayesian Recurrent Neural Smoother to Capture Global State Evolutionary Correlations
Authors:
Shi Yan,
Yan Liang,
Huayu Zhang,
Le Zheng,
Difan Zou,
Binglu Wang
Abstract:
Through integrating the evolutionary correlations across global states in the bidirectional recursion, an explainable Bayesian recurrent neural smoother (EBRNS) is proposed for offline data-assisted fixed-interval state smoothing. At first, the proposed model, containing global states in the evolutionary interval, is transformed into an equivalent model with bidirectional memory. This transformation incorporates crucial global state information with support for bi-directional recursive computation. For the transformed model, the joint state-memory-trend Bayesian filtering and smoothing frameworks are derived by introducing the bidirectional memory iteration mechanism and offline data into Bayesian estimation theory. The derived frameworks are implemented using the Gaussian approximation to ensure analytical properties and computational efficiency. Finally, the neural network modules within EBRNS and its two-stage training scheme are designed. Unlike most existing approaches that artificially combine deep learning and model-based estimation, the bidirectional recursion and internal gated structures of EBRNS are naturally derived from Bayesian estimation theory, explainably integrating prior model knowledge, online measurement, and offline data. Experiments on representative real-world datasets demonstrate that the high smoothing accuracy of EBRNS is accompanied by data efficiency and a lightweight parameter scale.
Submitted 16 June, 2024;
originally announced June 2024.
-
The Implicit Bias of Adam on Separable Data
Authors:
Chenyang Zhang,
Difan Zou,
Yuan Cao
Abstract:
Adam has become one of the most favored optimizers in deep learning problems. Despite its success in practice, numerous mysteries persist regarding its theoretical understanding. In this paper, we study the implicit bias of Adam in linear logistic regression. Specifically, we show that when the training data are linearly separable, Adam converges towards a linear classifier that achieves the maximum $\ell_\infty$-margin. Notably, for a general class of diminishing learning rates, this convergence occurs within polynomial time. Our results shed light on the difference between Adam and (stochastic) gradient descent from a theoretical perspective.
Submitted 15 June, 2024;
originally announced June 2024.
-
Self-Control of LLM Behaviors by Compressing Suffix Gradient into Prefix Controller
Authors:
Min Cai,
Yuchen Zhang,
Shichang Zhang,
Fan Yin,
Difan Zou,
Yisong Yue,
Ziniu Hu
Abstract:
We propose Self-Control, a novel method utilizing suffix gradients to control the behavior of large language models (LLMs) without explicit human annotations. Given a guideline expressed as a suffix string and the model's self-assessment of adherence, Self-Control computes the gradient of this self-judgment with respect to the model's hidden states, directly steering the auto-regressive generation process towards desired behaviors. To enhance efficiency, we introduce Self-Control_{prefix}, a compact module that encapsulates the learned representations from suffix gradients into a Prefix Controller, facilitating inference-time control for various LLM behaviors. Our experiments demonstrate Self-Control's efficacy across multiple domains, including emotional modulation, ensuring harmlessness, and enhancing complex reasoning. In particular, Self-Control_{prefix} enables plug-and-play control and jointly controls multiple attributes, improving model outputs without altering model parameters or increasing inference-time costs.
Submitted 18 June, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
RASE: Efficient Privacy-preserving Data Aggregation against Disclosure Attacks for IoTs
Authors:
Zuyan Wang,
Jun Tao,
Dika Zou
Abstract:
The growing popular awareness of personal privacy raises the following quandary: what is the new paradigm for collecting and protecting the data produced by an ever-increasing number of sensor devices? Most previous studies on co-design of data aggregation and privacy preservation assume that a trusted fusion center adheres to privacy regimes. Very recent work has taken steps towards relaxing the assumption by allowing data contributors to locally perturb their own data. Although these solutions withhold some data content to mitigate privacy risks, they have been shown to offer insufficient protection against disclosure attacks. Aiming at providing a more rigorous data safeguard for the Internet of Things (IoTs), this paper initiates the study of privacy-preserving data aggregation. We propose a novel paradigm (called RASE), which can be generalized into a 3-step sequential procedure: noise addition, followed by random permutation, and then parameter estimation. Specifically, we design a differentially private randomizer, which carefully guides data contributors to obfuscate the truth. Then, a shuffler is employed to receive the noisy data from all data contributors. After that, it breaks the correct linkage between senders and receivers by applying a random permutation. The estimation phase involves using inaccurate data to calculate an approximate aggregate value. Extensive simulations are provided to explore the privacy-utility landscape of RASE.
Submitted 31 May, 2024;
originally announced May 2024.
-
Knowledge Enhanced Multi-intent Transformer Network for Recommendation
Authors:
Ding Zou,
Wei Wei,
Feida Zhu,
Chuanyu Xu,
Tao Zhang,
Chengfu Huo
Abstract:
Incorporating Knowledge Graphs (KGs) into recommendation has attracted growing attention in industry, due to the great potential of KGs in providing abundant supplementary information and interpretability for the underlying models. However, simply integrating a KG into recommendation usually brings negative feedback in industry, because two factors are neglected: i) users' multiple intents, which involve diverse nodes in the KG. For example, in e-commerce scenarios, users may exhibit preferences for specific styles, brands, or colors. ii) knowledge noise, which is a prevalent issue in Knowledge Enhanced Recommendation (KGR) and even more severe in industry scenarios. The irrelevant knowledge properties of items may result in inferior model performance compared to approaches that do not incorporate knowledge. To tackle these challenges, we propose a novel approach named Knowledge Enhanced Multi-intent Transformer Network for Recommendation (KGTN), comprising two primary modules: Global Intents Modeling with Graph Transformer, and Knowledge Contrastive Denoising under Intents. Specifically, Global Intents Modeling with Graph Transformer focuses on capturing learnable user intents by incorporating global signals from user-item-relation-entity interactions with a graph transformer, while learning intent-aware user/item representations. Knowledge Contrastive Denoising under Intents is dedicated to learning precise and robust representations. It leverages intent-aware representations to sample relevant knowledge, and proposes a local-global contrastive mechanism to enhance noise-irrelevant representation learning. Extensive experiments conducted on benchmark datasets show the superior performance of our proposed method over state-of-the-art methods. Online A/B testing results on Alibaba's large-scale industrial recommendation platform also indicate the real-scenario effectiveness of KGTN.
Submitted 30 May, 2024;
originally announced May 2024.
-
Slight Corruption in Pre-training Data Makes Better Diffusion Models
Authors:
Hao Chen,
Yujin Han,
Diganta Misra,
Xiang Li,
Kai Hu,
Difan Zou,
Masashi Sugiyama,
Jindong Wang,
Bhiksha Raj
Abstract:
Diffusion models (DMs) have shown remarkable capabilities in generating realistic high-quality images, audio, and videos. They benefit significantly from extensive pre-training on large-scale datasets, including web-crawled data with paired data and conditions, such as image-text and image-class pairs. Despite rigorous filtering, these pre-training datasets often inevitably contain corrupted pairs where conditions do not accurately describe the data. This paper presents the first comprehensive study on the impact of such corruption in pre-training data of DMs. We synthetically corrupt ImageNet-1K and CC3M to pre-train and evaluate over 50 conditional DMs. Our empirical findings reveal that various types of slight corruption in pre-training can significantly enhance the quality, diversity, and fidelity of the generated images across different DMs, both during pre-training and downstream adaptation stages. Theoretically, we consider a Gaussian mixture model and prove that slight corruption in the condition leads to higher entropy and a reduced 2-Wasserstein distance to the ground truth of the data distribution generated by the corruptly trained DMs. Inspired by our analysis, we propose a simple method to improve the training of DMs on practical datasets by adding condition embedding perturbations (CEP). CEP significantly improves the performance of various DMs in both pre-training and downstream tasks. We hope that our study provides new insights into understanding the data and pre-training processes of DMs.
Submitted 30 May, 2024;
originally announced May 2024.
-
A Human-Like Reasoning Framework for Multi-Phases Planning Task with Large Language Models
Authors:
Chengxing Xie,
Difan Zou
Abstract:
Recent studies have highlighted the proficiency of large language models (LLMs) in some simple tasks like writing and coding through various reasoning strategies. However, LLM agents still struggle with tasks that require comprehensive planning, a process that challenges current models and remains a critical research issue. In this study, we concentrate on travel planning, a Multi-Phases planning problem that involves multiple interconnected stages, such as outlining, information gathering, and planning, often characterized by the need to manage various constraints and uncertainties. Existing reasoning approaches have struggled to effectively address this complex task. Our research aims to address this challenge by developing a human-like planning framework for LLM agents, i.e., guiding the LLM agent to simulate various steps that humans take when solving Multi-Phases problems. Specifically, we implement several strategies to enable LLM agents to generate a coherent outline for each travel query, mirroring human planning patterns. Additionally, we integrate Strategy Block and Knowledge Block into our framework: Strategy Block facilitates information collection, while Knowledge Block provides essential information for detailed planning. Through our extensive experiments, we demonstrate that our framework significantly improves the planning capabilities of LLM agents, enabling them to tackle the travel planning task with improved efficiency and effectiveness. Our experimental results showcase the exceptional performance of the proposed framework; when combined with GPT-4-Turbo, it attains $10\times$ the performance gains in comparison to the baseline framework deployed on GPT-4-Turbo.
Submitted 28 May, 2024;
originally announced May 2024.
-
Faster Sampling via Stochastic Gradient Proximal Sampler
Authors:
Xunpeng Huang,
Difan Zou,
Yi-An Ma,
Hanze Dong,
Tong Zhang
Abstract:
Stochastic gradients have been widely integrated into Langevin-based methods to improve their scalability and efficiency in solving large-scale sampling problems. However, the proximal sampler, which exhibits much faster convergence than Langevin-based algorithms in the deterministic setting (Lee et al., 2021), has yet to be explored in its stochastic variants. In this paper, we study the Stochastic Proximal Samplers (SPS) for sampling from non-log-concave distributions. We first establish a general framework for implementing stochastic proximal samplers and develop the convergence theory accordingly. We show that the convergence to the target distribution can be guaranteed as long as the second moment of the algorithm trajectory is bounded and restricted Gaussian oracles can be well approximated. We then provide two implementable variants based on stochastic gradient Langevin dynamics (SGLD) and the Metropolis-adjusted Langevin algorithm (MALA), giving rise to SPS-SGLD and SPS-MALA. We further show that SPS-SGLD and SPS-MALA can achieve $ε$-sampling error in total variation (TV) distance within $\tilde{\mathcal{O}}(dε^{-2})$ and $\tilde{\mathcal{O}}(d^{1/2}ε^{-2})$ gradient complexities, respectively, which outperform the best-known result by at least an $\tilde{\mathcal{O}}(d^{1/3})$ factor. This enhancement in performance is corroborated by our empirical studies on synthetic data with various dimensions, demonstrating the efficiency of our proposed algorithm.
Submitted 26 May, 2024;
originally announced May 2024.
-
Reverse Transition Kernel: A Flexible Framework to Accelerate Diffusion Inference
Authors:
Xunpeng Huang,
Difan Zou,
Hanze Dong,
Yi Zhang,
Yi-An Ma,
Tong Zhang
Abstract:
To generate data from trained diffusion models, most inference algorithms, such as DDPM, DDIM, and other variants, rely on discretizing the reverse SDEs or their equivalent ODEs. In this paper, we view such approaches as decomposing the entire denoising diffusion process into several segments, each corresponding to a reverse transition kernel (RTK) sampling subproblem. Specifically, DDPM uses a Gaussian approximation for the RTK, resulting in low per-subproblem complexity but requiring a large number of segments (i.e., subproblems), which is conjectured to be inefficient. To address this, we develop a general RTK framework that enables a more balanced subproblem decomposition, resulting in $\tilde O(1)$ subproblems, each with strongly log-concave targets. We then propose leveraging two fast sampling algorithms, the Metropolis-Adjusted Langevin Algorithm (MALA) and Underdamped Langevin Dynamics (ULD), for solving these strongly log-concave subproblems. This gives rise to the RTK-MALA and RTK-ULD algorithms for diffusion inference. In theory, we further develop the convergence guarantees for RTK-MALA and RTK-ULD in total variation (TV) distance: RTK-ULD can achieve $ε$ target error within $\tilde{\mathcal O}(d^{1/2}ε^{-1})$ under mild conditions, and RTK-MALA enjoys a $\mathcal{O}(d^{2}\log(d/ε))$ convergence rate under slightly stricter conditions. These theoretical results surpass the state-of-the-art convergence rates for diffusion inference and are well supported by numerical experiments.
Submitted 25 May, 2024;
originally announced May 2024.
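As an illustration of the kind of inner sampler the abstract above names, here is a minimal sketch of a generic Metropolis-Adjusted Langevin Algorithm (MALA) on a one-dimensional strongly log-concave target. It is not the paper's RTK-MALA procedure: the function name `mala`, the step size, and the Gaussian target in the usage lines are assumptions chosen only for illustration.

```python
import math
import random


def mala(grad_log_p, log_p, x0, step, n_steps, rng):
    """Generic 1-D MALA: Langevin proposal plus Metropolis-Hastings correction."""
    x = x0
    samples = []
    for _ in range(n_steps):
        # Langevin proposal: drift along the gradient, then add Gaussian noise
        # with variance 2 * step.
        mean_fwd = x + step * grad_log_p(x)
        y = mean_fwd + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        # Log proposal densities q(y | x) and q(x | y), up to a shared constant.
        mean_bwd = y + step * grad_log_p(y)
        log_q_fwd = -((y - mean_fwd) ** 2) / (4.0 * step)
        log_q_bwd = -((x - mean_bwd) ** 2) / (4.0 * step)
        log_alpha = log_p(y) - log_p(x) + log_q_bwd - log_q_fwd
        # Accept with probability min(1, exp(log_alpha)); otherwise stay at x.
        if log_alpha >= 0.0 or rng.random() < math.exp(log_alpha):
            x = y
        samples.append(x)
    return samples


# Hypothetical target: a unit-variance Gaussian centred at 2 (strongly log-concave).
rng = random.Random(0)
chain = mala(lambda x: -(x - 2.0), lambda x: -0.5 * (x - 2.0) ** 2,
             x0=0.0, step=0.5, n_steps=20000, rng=rng)
kept = chain[1000:]  # discard a short burn-in
print("sample mean:", sum(kept) / len(kept))
```

The accept/reject step is what removes the discretization bias of the plain Langevin update, which is why MALA is usually described as a sampler with an unbiased stationary distribution.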
-
Nonlinear scalarization of Schwarzschild black holes in Einstein-scalar-Gauss-Bonnet gravity
Authors:
Chao-Ming Zhang,
Zhen-Hao Yang,
Meng-Yun Lai,
Yun Soo Myung,
De-Cheng Zou
Abstract:
In this paper, we propose a fully nonlinear mechanism for obtaining scalarized black holes in Einstein-scalar-Gauss-Bonnet (EsGB) gravity that goes beyond spontaneous scalarization. Introducing three coupling functions $f(\varphi)$ satisfying $f''(0) = 0$, we find that the Schwarzschild black hole is linearly stable against scalar perturbations, whereas it is unstable against nonlinear scalar perturbations if the coupling function includes a term higher than $\varphi^6$. For a specific choice of coupling function $f(\varphi)=α(\varphi^4-β\varphi^6)$, we obtain new black holes with scalar hair in EsGB gravity. In this case, the coupling parameter $α$ plays a major role in producing different nonlinear scalarized black holes, while the other parameter $β$ plays a supplementary role. Furthermore, we study thermodynamic aspects of these scalarized black holes and prove the first law of thermodynamics.
Submitted 30 April, 2024;
originally announced April 2024.
-
Thermodynamics of charged Lifshitz black holes with scalar hair
Authors:
Shan Wu,
Kai-Qiang Qian,
Rui-Hong Yue,
Ming Zhang,
De-Cheng Zou
Abstract:
In this work, we discuss the generalized Einstein-Maxwell-Dilaton gravity theory with a nonminimal coupling between the Maxwell field and the scalar field. Considering different geometric properties of the black hole horizon structure, we present charged dilaton Lifshitz black hole solutions in 4-dimensional spacetimes. Then, using the Wald formalism, we derive the first law of black hole thermodynamics and the conserved quantities. Based on the relationship between the heat capacity and the local stability of a black hole, we study the stability of charged Lifshitz black holes and identify the thermodynamically stable region of black holes that meet the criteria.
Submitted 30 April, 2024; v1 submitted 30 April, 2024;
originally announced April 2024.
-
Improving Group Robustness on Spurious Correlation Requires Preciser Group Inference
Authors:
Yujin Han,
Difan Zou
Abstract:
Standard empirical risk minimization (ERM) models may prioritize learning spurious correlations between spurious features and true labels, leading to poor accuracy on groups where these correlations do not hold. Mitigating this issue often requires expensive spurious attribute (group) labels or relies on trained ERM models to infer group labels when group information is unavailable. However, the significant performance gap in worst-group accuracy between using pseudo group labels and using oracle group labels inspires us to consider further improving group robustness through preciser group inference. Therefore, we propose GIC, a novel method that accurately infers group labels, resulting in improved worst-group performance. GIC trains a spurious attribute classifier based on two key properties of spurious correlations: (1) high correlation between spurious attributes and true labels, and (2) variability in this correlation between datasets with different group distributions. Empirical studies on multiple datasets demonstrate the effectiveness of GIC in inferring group labels, and combining GIC with various downstream invariant learning methods improves worst-group accuracy, showcasing its powerful flexibility. Additionally, through analyzing the misclassifications in GIC, we identify an interesting phenomenon called semantic consistency, which may contribute to better decoupling the association between spurious attributes and labels, thereby mitigating spurious correlation. The code for GIC is available at https://github.com/yujinhanml/GIC.
Submitted 3 June, 2024; v1 submitted 21 April, 2024;
originally announced April 2024.
-
The Dog Walking Theory: Rethinking Convergence in Federated Learning
Authors:
Kun Zhai,
Yifeng Gao,
Xingjun Ma,
Difan Zou,
Guangnan Ye,
Yu-Gang Jiang
Abstract:
Federated learning (FL) is a collaborative learning paradigm that allows different clients to train one powerful global model without sharing their private data. Although FL has demonstrated promising results in various applications, it is known to suffer from convergence issues caused by the data distribution shift across different clients, especially on non-independent and identically distributed (non-IID) data. In this paper, we study the convergence of FL on non-IID data and propose a novel Dog Walking Theory to formulate and identify the missing element in existing research. The Dog Walking Theory describes the process of a dog walker leash walking multiple dogs from one side of the park to the other. The goal of the dog walker is to arrive at the right destination while giving the dogs enough exercise (i.e., space exploration). In FL, the server is analogous to the dog walker while the clients are analogous to the dogs. This analogy allows us to identify one crucial yet missing element in existing FL algorithms: the leash that guides the exploration of the clients. To address this gap, we propose a novel FL algorithm FedWalk that leverages an external easy-to-converge task at the server side as a leash task to guide the local training of the clients. We theoretically analyze the convergence of FedWalk with respect to data heterogeneity (between server and clients) and task discrepancy (between the leash and the original tasks). Experiments on multiple benchmark datasets demonstrate the superiority of FedWalk over state-of-the-art FL methods under both IID and non-IID settings.
Submitted 18 April, 2024;
originally announced April 2024.
-
PRIME: A CyberGIS Platform for Resilience Inference Measurement and Enhancement
Authors:
Debayan Mandal,
Dr. Lei Zou,
Rohan Singh Wilkho,
Joynal Abedin,
Bing Zhou,
Dr. Heng Cai,
Dr. Furqan Baig,
Dr. Nasir Gharaibeh,
Dr. Nina Lam
Abstract:
In an era of increased climatic disasters, there is an urgent need to develop reliable frameworks and tools for evaluating and improving community resilience to climatic hazards at multiple geographical and temporal scales. Defining and quantifying resilience in the social domain is relatively subjective due to the intricate interplay of socioeconomic factors with disaster resilience. Meanwhile, there is a lack of computationally rigorous, user-friendly tools that can support customized resilience assessment considering local conditions. This study aims to address these gaps through the power of CyberGIS with three objectives: 1) To develop an empirically validated disaster resilience model - Customized Resilience Inference Measurement designed for multi-scale community resilience assessment and influential socioeconomic factors identification, 2) To implement a Platform for Resilience Inference Measurement and Enhancement module in the CyberGISX platform backed by high-performance computing, 3) To demonstrate the utility of PRIME through a representative study. CRIM generates vulnerability, adaptability, and overall resilience scores derived from empirical hazard parameters. Computationally intensive Machine Learning methods are employed to explain the intricate relationships between these scores and socioeconomic driving factors. PRIME provides a web-based notebook interface guiding users to select study areas, configure parameters, calculate and geo-visualize resilience scores, and interpret socioeconomic factors shaping resilience capacities. A representative study showcases the efficiency of the platform while explaining how the visual results obtained may be interpreted. The essence of this work lies in its comprehensive architecture that encapsulates the requisite data, analytical and geo-visualization functions, and ML models for resilience assessment.
Submitted 15 April, 2024;
originally announced April 2024.
-
Stereo-LiDAR Depth Estimation with Deformable Propagation and Learned Disparity-Depth Conversion
Authors:
Ang Li,
Anning Hu,
Wei Xi,
Wenxian Yu,
Danping Zou
Abstract:
Accurate and dense depth estimation with stereo cameras and LiDAR is an important task for automatic driving and robotic perception. While sparse hints from LiDAR points have improved cost aggregation in stereo matching, their effectiveness is limited by the low density and non-uniform distribution. To address this issue, we propose a novel stereo-LiDAR depth estimation network with Semi-Dense hint Guidance, named SDG-Depth. Our network includes a deformable propagation module for generating a semi-dense hint map and a confidence map by propagating sparse hints using a learned deformable window. These maps then guide cost aggregation in stereo matching. To reduce the triangulation error in depth recovery from disparity, especially in distant regions, we introduce a disparity-depth conversion module. Our method is both accurate and efficient. The experimental results on benchmark tests show its superior performance. Our code is available at https://github.com/SJTU-ViSYS/SDG-Depth.
Submitted 11 April, 2024;
originally announced April 2024.
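The abstract's remark about triangulation error in distant regions follows from the standard pinhole stereo relation Z = f * B / d, where a fixed disparity error produces a depth error that grows roughly with the square of the depth. The sketch below shows only this textbook conversion, not the paper's learned disparity-depth conversion module; the focal length, baseline, and function names are hypothetical values picked for illustration.

```python
def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Textbook stereo triangulation: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def depth_error(disparity_px, disparity_err_px, focal_px, baseline_m):
    """Depth overestimate caused by underestimating disparity by disparity_err_px."""
    true_depth = depth_from_disparity(disparity_px, focal_px, baseline_m)
    biased_depth = depth_from_disparity(
        disparity_px - disparity_err_px, focal_px, baseline_m
    )
    return biased_depth - true_depth


# Hypothetical rig: 700 px focal length and 0.5 m baseline.
FOCAL_PX, BASELINE_M = 700.0, 0.5
near_err = depth_error(35.0, 0.5, FOCAL_PX, BASELINE_M)  # point at 10 m
far_err = depth_error(3.5, 0.5, FOCAL_PX, BASELINE_M)    # point at 100 m
print(f"0.5 px disparity error: {near_err:.2f} m at 10 m, {far_err:.2f} m at 100 m")
```

The same half-pixel disparity error costs far more depth accuracy at 100 m than at 10 m, which is the regime a dedicated conversion step is meant to help with.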
-
What Can Transformer Learn with Varying Depth? Case Studies on Sequence Learning Tasks
Authors:
Xingwu Chen,
Difan Zou
Abstract:
We study the capabilities of the transformer architecture with varying depth. Specifically, we designed a novel set of sequence learning tasks to systematically evaluate and comprehend how the depth of transformer affects its ability to perform memorization, reasoning, generalization, and contextual generalization. We show a transformer with only one attention layer can excel in memorization but falls short in other tasks. Then, we show that exhibiting reasoning and generalization ability requires the transformer to have at least two attention layers, while context generalization ability may necessitate three attention layers. Additionally, we identify a class of simple operations that a single attention layer can execute, and show that the complex tasks can be approached as the combinations of these simple operations and thus can be resolved by stacking multiple attention layers. This sheds light on studying more practical and complex tasks beyond our design. Numerical experiments corroborate our theoretical findings.
Submitted 1 April, 2024;
originally announced April 2024.
-
On the Benefits of Over-parameterization for Out-of-Distribution Generalization
Authors:
Yifan Hao,
Yong Lin,
Difan Zou,
Tong Zhang
Abstract:
In recent years, machine learning models have achieved success based on the independently and identically distributed assumption. However, this assumption can be easily violated in real-world applications, leading to the Out-of-Distribution (OOD) problem. Understanding how modern over-parameterized DNNs behave under non-trivial natural distributional shifts is essential, as current theoretical understanding is insufficient. Existing theoretical works often provide meaningless results for over-parameterized models in OOD scenarios or even contradict empirical findings. To this end, we are investigating the performance of the over-parameterized model in terms of OOD generalization under the general benign overfitting conditions. Our analysis focuses on a random feature model and examines non-trivial natural distributional shifts, where the benign overfitting estimators demonstrate a constant excess OOD loss, despite achieving zero excess in-distribution (ID) loss. We demonstrate that in this scenario, further increasing the model's parameterization can significantly reduce the OOD loss. Intuitively, the variance term of ID loss remains low due to orthogonality of long-tail features, meaning overfitting noise during training generally doesn't raise testing loss. However, in OOD cases, distributional shift increases the variance term. Thankfully, the inherent shift is unrelated to individual x, maintaining the orthogonality of long-tail features. Expanding the hidden dimension can additionally improve this orthogonality by mapping the features into higher-dimensional spaces, thereby reducing the variance term. We further show that model ensembles also improve OOD loss, akin to increasing model capacity. These insights explain the empirical phenomenon of enhanced OOD generalization through model ensembles, supported by consistent simulations with theoretical results.
Submitted 26 March, 2024;
originally announced March 2024.
-
Large-scale Array for Radio Astronomy on the Farside
Authors:
Xuelei Chen,
Feng Gao,
Fengquan Wu,
Yechi Zhang,
Tong Wang,
Weilin Liu,
Dali Zou,
Furen Deng,
Yang Gong,
Kai He,
Jixia Li,
Shijie Sun,
Nanben Suo,
Yougang Wang,
Pengju Wu,
Jiaqin Xu,
Yidong Xu,
Bin Yue,
Cong Zhang,
Jia Zhou,
Minquan Zhou,
Chenguang Zhu,
Jiacong Zhu
Abstract:
At the Royal Society meeting in 2023, we mainly presented our lunar orbit array concept called DSL, and also briefly introduced a concept of a lunar surface array, LARAF. As the DSL concept had been presented before, in this article we introduce the LARAF. We propose to build an array on the far side of the Moon, with a master station which handles the data collection and processing, and 20 stations with a maximum baseline of 10 km. Each station consists of 12 membrane antenna units, and the stations are connected to the master station by power line and optical fiber. The array will make interferometric observations in the 0.1-50 MHz band during the lunar night, powered by regenerated fuel cells (RFCs). The whole array can be carried to the lunar surface with a heavy rocket mission, and deployed with a rover in 8 months. Such an array would be an important step in the long-term development of lunar-based ultralong wavelength radio astronomy. It has a sufficiently high sensitivity to observe many radio sources in the sky, though still short of the dark age fluctuations. We discuss the possible options in the power supply, data communication, deployment, etc.
Submitted 24 March, 2024;
originally announced March 2024.
-
Thermodynamics of charged black holes in Maxwell-dilaton-massive gravity
Authors:
Rui-Hong Yue,
Kai-Qiang Qian,
Bo Liu,
De-Cheng Zou
Abstract:
Considering the nonminimal coupling of the dilaton field to the massive graviton field in Maxwell-dilaton-massive gravity, we obtain a class of analytical solutions of charged black holes, which are neither asymptotically flat nor (A)dS. The calculated thermodynamic quantities, such as mass, temperature and entropy, verify the validity of the first law of black hole thermodynamics. Moreover, we further investigate the critical behaviors of these black holes in the grand canonical and canonical ensemble, and find a novel critical phenomenon never before observed, known as the "reverse" reentrant phase transition with a tricritical point. It implies that the system undergoes a novel "SBH-LBH-SBH" phase transition process, and is the reverse of the "LBH-SBH-LBH" process observed in reentrant phase transitions.
Submitted 23 May, 2024; v1 submitted 20 March, 2024;
originally announced March 2024.
-
Multi-mode Fault Diagnosis Datasets of Gearbox Under Variable Working Conditions
Authors:
Shijin Chen,
Zeyi Liu,
Xiao He,
Dongliang Zou,
Donghua Zhou
Abstract:
The gearbox is a critical component of electromechanical systems. The occurrence of multiple faults can significantly impact system accuracy and service life. The vibration signal of the gearbox is an effective indicator of its operational status and fault information. However, gearboxes in real industrial settings often operate under variable working conditions, such as varying speeds and loads. Completing the gearbox fault diagnosis procedure under varying operating conditions using vibration signals is a significant and challenging research area. This data article presents vibration datasets collected from a gearbox exhibiting various fault types and degrees of severity, operating under diverse speed and load conditions. These faults are manually implanted into the gears or bearings through precise machining processes; the covered conditions include the healthy state, missing teeth, wear, pitting, root cracks, and broken teeth. Several kinds of actual compound faults are also included. The development of these datasets facilitates testing the effectiveness and reliability of newly developed fault diagnosis methods.
Submitted 8 April, 2024; v1 submitted 19 March, 2024;
originally announced March 2024.
-
Simplified Self-homodyne Coherent System Based on Alamouti Coding and Digital Subcarrier Multiplexing
Authors:
Wei Wang,
Dongdong Zou,
Zhenpeng Wu,
Qi Sui,
Xingwen Yi,
Fan Li,
Chao Lu,
Zhaohui Li
Abstract:
Coherent technology inherent with more available degrees of freedom is deemed a competitive solution for next-generation ultra-high-speed short-reach optical interconnects. However, the fatal barriers to implementing the conventional coherent system in short-reach optical interconnect are the cost, footprint, and power consumption. Self-homodyne coherent system exhibits its potential to reduce the power consumption of the receiver-side digital signal processing (Rx-DSP) by delivering the local oscillator (LO) from the transmitter. However, an automatic polarization controller (APC) is inevitable in the remote LO link to avoid polarization fading, resulting in additional costs. To address the polarization fading issue, a simplified self-homodyne coherent system is proposed enabled by Alamouti coding in this paper. Benefiting from the Alamouti coding between two polarizations, a polarization-insensitive receiver only including a 3dB coupler, a 90° hybrid, and two balanced photodiodes (BPDs) is sufficient for reception. Meanwhile, the APC in the LO link is needless, simplifying the receiver structure significantly. Besides, the digital subcarrier multiplexing (DSCM) technique is also adopted to relax the computational complexity of the chromatic dispersion compensation (CDC), which is one of the dominant power consumption modules in Rx-DSP. The transmission performance of 50Gbaud 4-subcarrier 16/32QAM (4SC-16/32QAM) DSCM signal based on the proposed simplified self-homodyne coherent system is investigated experimentally. The results show that the bit-error-ratio (BER) performance degradation caused by CD can be solved by increasing 4 taps in the equalizer for 80km single mode fiber (SMF) transmission without individual CDC, which operates in a low-complexity manner.
Submitted 18 March, 2024;
originally announced March 2024.
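To make the role of Alamouti coding in the abstract above more concrete, here is a minimal sketch of the textbook two-branch Alamouti scheme with a single receiver, using a noise-free flat channel. It is an assumption-laden illustration rather than the paper's system: the two coefficients `h1` and `h2` stand in for how the two transmit polarizations couple into the receiver, `polarization_channel` is a hypothetical model of an arbitrary polarization rotation, and the decoder assumes the coefficients are known.

```python
import cmath
import math


def alamouti_encode(s1, s2):
    """Map two symbols to two time slots of (branch A, branch B) pairs."""
    s1, s2 = complex(s1), complex(s2)
    return [(s1, s2), (-s2.conjugate(), s1.conjugate())]


def receive(slots, h1, h2):
    """Single receiver: each slot is a flat, noise-free mix of both branches."""
    return [h1 * a + h2 * b for a, b in slots]


def alamouti_decode(r1, r2, h1, h2):
    """Linear combining that separates s1 and s2 for known h1, h2."""
    h1, h2 = complex(h1), complex(h2)
    gain = abs(h1) ** 2 + abs(h2) ** 2
    s1_hat = (h1.conjugate() * r1 + h2 * r2.conjugate()) / gain
    s2_hat = (h2.conjugate() * r1 - h1 * r2.conjugate()) / gain
    return s1_hat, s2_hat


def polarization_channel(theta, phi):
    """Hypothetical coupling of the two polarizations for rotation angle theta."""
    return complex(math.cos(theta)), cmath.exp(1j * phi) * math.sin(theta)


# The symbols are recovered for every rotation angle, including theta = pi/2,
# where a single-polarization signal would fade completely.
for theta in (0.0, math.pi / 4, math.pi / 2):
    h1, h2 = polarization_channel(theta, phi=0.7)
    r1, r2 = receive(alamouti_encode(1 + 1j, -1 + 3j), h1, h2)
    print(theta, alamouti_decode(r1, r2, h1, h2))
```

Because the combined gain |h1|^2 + |h2|^2 does not depend on how power is split between the two branches, the decoded symbols do not fade as the polarization rotates, which is the property the abstract relies on to drop the polarization controller.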
-
Improving Implicit Regularization of SGD with Preconditioning for Least Square Problems
Authors:
Junwei Su,
Difan Zou,
Chuan Wu
Abstract:
Stochastic gradient descent (SGD) exhibits strong algorithmic regularization effects in practice and plays an important role in the generalization of modern machine learning. However, prior research has revealed instances where the generalization performance of SGD is worse than ridge regression due to uneven optimization along different dimensions. Preconditioning offers a natural solution to this issue by rebalancing optimization across different directions. Yet, the extent to which preconditioning can enhance the generalization performance of SGD and whether it can bridge the existing gap with ridge regression remains uncertain. In this paper, we study the generalization performance of SGD with preconditioning for the least squares problem. We make a comprehensive comparison between preconditioned SGD and (standard and preconditioned) ridge regression. Our study makes several key contributions toward understanding and improving SGD with preconditioning. First, we establish excess risk bounds (generalization performance) for preconditioned SGD and ridge regression under an arbitrary preconditioning matrix. Second, leveraging the excess risk characterization of preconditioned SGD and ridge regression, we show (through construction) that there exists a simple preconditioning matrix that can make SGD comparable to (standard and preconditioned) ridge regression. Finally, we show that our proposed preconditioning matrix is straightforward enough to allow robust estimation from finite samples while maintaining a theoretical improvement. Our empirical results align with our theoretical findings, collectively showcasing the enhanced regularization effect of preconditioned SGD.
Submitted 26 May, 2024; v1 submitted 13 March, 2024;
originally announced March 2024.
-
An Improved Analysis of Langevin Algorithms with Prior Diffusion for Non-Log-Concave Sampling
Authors:
Xunpeng Huang,
Hanze Dong,
Difan Zou,
Tong Zhang
Abstract:
Understanding the dimension dependency of computational complexity in high-dimensional sampling problems is a fundamental problem, both from a practical and theoretical perspective. Compared with samplers with unbiased stationary distribution, e.g., Metropolis-adjusted Langevin algorithm (MALA), biased samplers, e.g., Underdamped Langevin Dynamics (ULD), perform better in low-accuracy cases because of a lower dimension dependency in their complexities. Along this line, Freund et al. (2022) suggest that the modified Langevin algorithm with prior diffusion is able to converge dimension independently for strongly log-concave target distributions. Nonetheless, it remains open whether such a property holds for more general cases. In this paper, we investigate the prior diffusion technique for target distributions satisfying the log-Sobolev inequality (LSI), which covers a much broader class of distributions compared to the strongly log-concave ones. In particular, we prove that the modified Langevin algorithm can also obtain dimension-independent convergence in KL divergence with different step size schedules. The core of our proof technique is a novel construction of an interpolating SDE, which significantly helps to conduct a more accurate characterization of the discrete updates of the overdamped Langevin dynamics. Our theoretical analysis demonstrates the benefits of prior diffusion for a broader class of target distributions and provides new insights into developing faster sampling algorithms.
Submitted 10 March, 2024;
originally announced March 2024.
-
Three Revisits to Node-Level Graph Anomaly Detection: Outliers, Message Passing and Hyperbolic Neural Networks
Authors:
Jing Gu,
Dongmian Zou
Abstract:
Graph anomaly detection plays a vital role in identifying abnormal instances in complex networks. Despite recent advances in deep-learning-based methods, existing benchmarking approaches exhibit limitations that hinder a comprehensive comparison. In this paper, we revisit datasets and approaches for unsupervised node-level graph anomaly detection tasks from three aspects. Firstly, we introduce outlier injection methods that create more diverse and graph-based anomalies in graph datasets. Secondly, we compare methods employing message passing against those without, uncovering the unexpected decline in performance associated with message passing. Thirdly, we explore the use of hyperbolic neural networks, specifying crucial architecture and loss design that contribute to enhanced performance. Through rigorous experiments and evaluations, our study sheds light on general strategies for improving node-level graph anomaly detection methods.
Submitted 6 March, 2024;
originally announced March 2024.
-
Identify Critical Nodes in Complex Network with Large Language Models
Authors:
Jinzhu Mao,
Dongyun Zou,
Li Sheng,
Siyi Liu,
Chen Gao,
Yue Wang,
Yong Li
Abstract:
Identifying critical nodes in networks is a classical decision-making task, and many methods struggle to strike a balance between adaptability and utility. Therefore, we propose an approach that empowers Evolutionary Algorithm (EA) with Large Language Models (LLMs), to generate a function called "score_nodes" which can further be used to identify crucial nodes based on their assigned scores. Our model consists of three main components: Manual Initialization, Population Management, and LLMs-based Evolution. It evolves from initial populations with a set of designed node scoring functions created manually. LLMs leverage their strong contextual understanding and rich programming skills to perform crossover and mutation operations on the individuals, generating excellent new functions. These functions are then categorized, ranked, and eliminated to ensure the stable development of the populations while preserving diversity. Extensive experiments demonstrate the excellent performance of our method, showcasing its strong generalization ability compared to other state-of-the-art algorithms. It can consistently and orderly generate diverse and efficient node scoring functions. All source codes and models that can reproduce all results in this work are publicly available at this link: https://anonymous.4open.science/r/LLM4CN-6520
Submitted 1 March, 2024;
originally announced March 2024.
-
Pairwise Alignment Improves Graph Domain Adaptation
Authors:
Shikun Liu,
Deyu Zou,
Han Zhao,
Pan Li
Abstract:
Graph-based methods, pivotal for label inference over interconnected objects in many real-world applications, often encounter generalization challenges if the graph used for model training differs significantly from the graph used for testing. This work delves into Graph Domain Adaptation (GDA) to address the unique complexities of distribution shifts over graph data, where interconnected data points experience shifts in features, labels, and in particular, connecting patterns. We propose a novel, theoretically principled method, Pairwise Alignment (Pair-Align), to counter graph structure shift by mitigating conditional structure shift (CSS) and label shift (LS). Pair-Align uses edge weights to recalibrate the influence among neighboring nodes to handle CSS and adjusts the classification loss with label weights to handle LS. Our method demonstrates superior performance in real-world applications, including node classification with region shift in social networks, and the pileup mitigation task in particle colliding experiments. For the first application, we also curate the largest dataset to date for GDA studies. Our method shows strong performance in synthetic and other existing benchmark datasets.
Submitted 4 June, 2024; v1 submitted 1 March, 2024;
originally announced March 2024.
-
Ground-Fusion: A Low-cost Ground SLAM System Robust to Corner Cases
Authors:
Jie Yin,
Ang Li,
Wei Xi,
Wenxian Yu,
Danping Zou
Abstract:
We introduce Ground-Fusion, a low-cost sensor fusion simultaneous localization and mapping (SLAM) system for ground vehicles. Our system features efficient initialization, effective sensor anomaly detection and handling, real-time dense color mapping, and robust localization in diverse environments. We tightly integrate RGB-D images, inertial measurements, wheel odometer and GNSS signals within a factor graph to achieve accurate and reliable localization both indoors and outdoors. To ensure successful initialization, we propose an efficient strategy that comprises three different methods: stationary, visual, and dynamic, tailored to handle diverse cases. Furthermore, we develop mechanisms to detect sensor anomalies and degradation, handling them adeptly to maintain system accuracy. Our experimental results on both public and self-collected datasets demonstrate that Ground-Fusion outperforms existing low-cost SLAM systems in corner cases. We release the code and datasets at https://github.com/SJTU-ViSYS/Ground-Fusion.
Submitted 22 February, 2024;
originally announced February 2024.
-
Towards Robust Graph Incremental Learning on Evolving Graphs
Authors:
Junwei Su,
Difan Zou,
Zijun Zhang,
Chuan Wu
Abstract:
Incremental learning is a machine learning approach that involves training a model on a sequence of tasks, rather than all tasks at once. This ability to learn incrementally from a stream of tasks is crucial for many real-world applications. However, incremental learning is a challenging problem on graph-structured data, as many graph-related problems involve prediction tasks for each individual node, known as Node-wise Graph Incremental Learning (NGIL). This introduces non-independent and non-identically distributed characteristics in the sample data generation process, making it difficult to maintain the performance of the model as new tasks are added. In this paper, we focus on the inductive NGIL problem, which accounts for the evolution of graph structure (structural shift) induced by emerging tasks. We provide a formal formulation and analysis of the problem, and propose a novel regularization-based technique called Structural-Shift-Risk-Mitigation (SSRM) to mitigate the impact of the structural shift on catastrophic forgetting of the inductive NGIL problem. We show that the structural shift can lead to a shift in the input distribution for the existing tasks, and further lead to an increased risk of catastrophic forgetting. Through comprehensive empirical studies with several benchmark datasets, we demonstrate that our proposed method, Structural-Shift-Risk-Mitigation (SSRM), is flexible and easy to adapt to improve the performance of state-of-the-art GNN incremental learning frameworks in the inductive setting.
Submitted 20 February, 2024;
originally announced February 2024.
-
PRES: Toward Scalable Memory-Based Dynamic Graph Neural Networks
Authors:
Junwei Su,
Difan Zou,
Chuan Wu
Abstract:
Memory-based Dynamic Graph Neural Networks (MDGNNs) are a family of dynamic graph neural networks that leverage a memory module to extract, distill, and memorize long-term temporal dependencies, leading to superior performance compared to memory-less counterparts. However, training MDGNNs faces the challenge of handling entangled temporal and structural dependencies, requiring sequential and chronological processing of data sequences to capture accurate temporal patterns. During the batch training, the temporal data points within the same batch will be processed in parallel, while their temporal dependencies are neglected. This issue is referred to as temporal discontinuity and restricts the effective temporal batch size, limiting data parallelism and reducing MDGNNs' flexibility in industrial applications. This paper studies the efficient training of MDGNNs at scale, focusing on the temporal discontinuity in training MDGNNs with large temporal batch sizes. We first conduct a theoretical study on the impact of temporal batch size on the convergence of MDGNN training. Based on the analysis, we propose PRES, an iterative prediction-correction scheme combined with a memory coherence learning objective to mitigate the effect of temporal discontinuity, enabling MDGNNs to be trained with significantly larger temporal batches without sacrificing generalization performance. Experimental results demonstrate that our approach enables up to a 4x larger temporal batch (3.4x speed-up) during MDGNN training.
Submitted 26 February, 2024; v1 submitted 5 February, 2024;
originally announced February 2024.
-
On the Effectiveness of Function-Level Vulnerability Detectors for Inter-Procedural Vulnerabilities
Authors:
Zhen Li,
Ning Wang,
Deqing Zou,
Yating Li,
Ruqian Zhang,
Shouhuai Xu,
Chao Zhang,
Hai Jin
Abstract:
Software vulnerabilities are a major cyber threat and it is important to detect them. One important approach to detecting vulnerabilities is to use deep learning while treating a program function as a whole, known as function-level vulnerability detectors. However, the limitation of this approach is not understood. In this paper, we investigate its limitation in detecting one class of vulnerabilities known as inter-procedural vulnerabilities, where the to-be-patched statements and the vulnerability-triggering statements belong to different functions. For this purpose, we create the first Inter-Procedural Vulnerability Dataset (InterPVD) based on C/C++ open-source software, and we propose a tool dubbed VulTrigger for identifying vulnerability-triggering statements across functions. Experimental results show that VulTrigger can effectively identify vulnerability-triggering statements and inter-procedural vulnerabilities. Our findings include: (i) inter-procedural vulnerabilities are prevalent with an average of 2.8 inter-procedural layers; and (ii) function-level vulnerability detectors are much less effective in detecting to-be-patched functions of inter-procedural vulnerabilities than detecting their counterparts of intra-procedural vulnerabilities.
Submitted 20 January, 2024; v1 submitted 18 January, 2024;
originally announced January 2024.
-
Faster Sampling without Isoperimetry via Diffusion-based Monte Carlo
Authors:
Xunpeng Huang,
Difan Zou,
Hanze Dong,
Yian Ma,
Tong Zhang
Abstract:
To sample from a general target distribution $p_*\propto e^{-f_*}$ beyond the isoperimetric condition, Huang et al. (2023) proposed to perform sampling through reverse diffusion, giving rise to Diffusion-based Monte Carlo (DMC). Specifically, DMC follows the reverse SDE of a diffusion process that transforms the target distribution to the standard Gaussian, utilizing a non-parametric score estimation. However, the original DMC algorithm encountered high gradient complexity, resulting in an exponential dependency on the error tolerance $ε$ of the obtained samples. In this paper, we demonstrate that the high complexity of DMC originates from its redundant design of score estimation, and propose a more efficient algorithm, called RS-DMC, based on a novel recursive score estimation method. In particular, we first divide the entire diffusion process into multiple segments and then formulate the score estimation step (at any time step) as a series of interconnected mean estimation and sampling subproblems accordingly, which are correlated in a recursive manner. Importantly, we show that with a proper design of the segment decomposition, all sampling subproblems only need to tackle a strongly log-concave distribution, which can be solved very efficiently using Langevin-based samplers with a provably rapid convergence rate. As a result, we prove that the gradient complexity of RS-DMC only has a quasi-polynomial dependency on $ε$, which significantly improves on the exponential gradient complexity in Huang et al. (2023). Furthermore, under commonly used dissipative conditions, our algorithm is provably much faster than the popular Langevin-based algorithms. Our algorithm design and theoretical framework illuminate a novel direction for addressing sampling problems, which could be of broader applicability in the community.
Submitted 11 January, 2024;
originally announced January 2024.
-
Scalarization of Kerr-Newman black holes in the Einstein-Chern-Simons-scalar theory
Authors:
Kun-Hui Fan,
Yun Soo Myung,
De-Cheng Zou,
Meng-Yun Lai
Abstract:
We investigate the tachyonic instability of Kerr-Newman (KN) black holes with a rotation parameter $a$ in the Einstein-Chern-Simons-scalar theory coupled with a quadratic massive scalar field. This instability analysis corresponds to exploring the onset of spontaneous scalarization for KN black holes. First, we find no $a$-bound for the $α<0$ case by considering a (1+1)-dimensional analytical method. A direct numerical method is adopted to explore the (2+1)-dimensional time evolution of a massive scalar perturbation with positive and negative $α$ to obtain threshold curves numerically. We obtain threshold curves $α_{\rm th}(a)$ of tachyonic instability for positive $α$ without any $a$-bounds. We expect to find the same threshold curves $α_{\rm th}(a)$ of tachyonic instability for negative $α$ without any $a$-bound because its linearized scalar theory is invariant under the transformation of $α\to -α$ and $θ\to -θ$. In addition, it is found that the scalar mass term suppresses tachyonic instability of KN black holes.
Submitted 2 January, 2024; v1 submitted 30 December, 2023;
originally announced January 2024.
-
Diffusion-based Blind Text Image Super-Resolution
Authors:
Yuzhe Zhang,
Jiawei Zhang,
Hao Li,
Zhouxia Wang,
Luwei Hou,
Dongqing Zou,
Liheng Bian
Abstract:
Recovering degraded low-resolution text images is challenging, especially for Chinese text images with complex strokes and severe degradation in real-world scenarios. Ensuring both text fidelity and style realness is crucial for high-quality text image super-resolution. Recently, diffusion models have achieved great success in natural image synthesis and restoration due to their powerful data distribution modeling abilities and data generation capabilities. In this work, we propose an Image Diffusion Model (IDM) to restore text images with realistic styles. Diffusion models are not only suitable for modeling realistic image distributions but also appropriate for learning text distributions. Since, according to existing work, a text prior is important to guarantee the correctness of the restored text structure, we also propose a Text Diffusion Model (TDM) for text recognition, which can guide IDM to generate text images with correct structures. We further propose a Mixture of Multi-modality module (MoM) to make these two diffusion models cooperate with each other in all the diffusion steps. Extensive experiments on synthetic and real-world datasets demonstrate that our Diffusion-based Blind Text Image Super-Resolution (DiffTSR) can restore text images with both more accurate text structures and more realistic appearances.
Submitted 3 March, 2024; v1 submitted 13 December, 2023;
originally announced December 2023.
-
Predicting Scores of Various Aesthetic Attribute Sets by Learning from Overall Score Labels
Authors:
Heng Huang,
Xin Jin,
Yaqi Liu,
Hao Lou,
Chaoen Xiao,
Shuai Cui,
Xinning Li,
Dongqing Zou
Abstract:
Many mobile phones now embed deep-learning models for evaluation or guidance on photography. These models cannot provide detailed results like human pose scores or scene color scores because of the scarcity of corresponding aesthetic attribute data. However, the annotation of image aesthetic attribute scores requires experienced artists and professional photographers, which hinders the collection of large-scale fully-annotated datasets. In this paper, we propose to replace image attribute labels with feature extractors. First, a novel aesthetic attribute evaluation framework based on attribute features is proposed to predict attribute scores and overall scores. We call it the F2S (attribute features to attribute scores) model. We use networks from different tasks to provide attribute features to our F2S models. Then, we define an aesthetic attribute contribution to describe the role of aesthetic attributes throughout an image and use it with the attribute scores and the overall scores to train our F2S model. Extensive experiments on publicly available datasets demonstrate that our F2S model achieves performance comparable to that of models trained on datasets with fully-annotated aesthetic attribute score labels. Our method makes it feasible to learn meaningful attribute scores for various aesthetic attribute sets in different types of images with only overall aesthetic scores.
Submitted 5 December, 2023;
originally announced December 2023.
-
AI-driven emergence of frequency information non-uniform distribution via THz metasurface spectrum prediction
Authors:
Xiaohua Xing,
Yuqi Ren,
Die Zou,
Qiankun Zhang,
Bingxuan Mao,
Jianquan Yao,
Deyi Xiong,
Shuang Zhang,
Liang Wu
Abstract:
Recently, artificial intelligence has been extensively deployed across various scientific disciplines, optimizing and guiding the progression of experiments through the integration of abundant datasets, whilst continuously probing the vast theoretical space encapsulated within the data. In particular, deep learning models, due to their end-to-end adaptive learning capabilities, can autonomously learn intrinsic data features, thereby transcending the limitations of traditional experience to a certain extent. Here, we unveil previously unreported information characteristics pertaining to different frequencies that emerged during our work on AI-based prediction of the terahertz spectral modulation effects of metasurfaces. Moreover, we have substantiated that our proposed methodology of simply adding supplementary multi-frequency inputs to the existing dataset during the target spectral prediction process can significantly enhance the predictive accuracy of the network. This approach effectively optimizes the utilization of existing datasets and paves the way for interdisciplinary research and applications in artificial intelligence, chemistry, composite material design, biomedicine, and other fields.
Submitted 4 December, 2023;
originally announced December 2023.
-
OptScaler: A Hybrid Proactive-Reactive Framework for Robust Autoscaling in the Cloud
Authors:
Ding Zou,
Wei Lu,
Zhibo Zhu,
Xingyu Lu,
Jun Zhou,
Xiaojin Wang,
Kangyu Liu,
Haiqing Wang,
Kefan Wang,
Renen Sun
Abstract:
Autoscaling is a vital mechanism in cloud computing that supports the autonomous adjustment of computing resources under dynamic workloads. A primary goal of autoscaling is to stabilize resource utilization at a desirable level, thus reconciling the need for resource-saving with the satisfaction of Service Level Objectives (SLOs). Existing proactive autoscaling methods anticipate the future workload and scale the resources in advance, but their reliability may suffer from prediction deviations arising from the frequent fluctuations and noise of cloud workloads; reactive methods rely on real-time system feedback, but their hysteretic nature could cause violations of the rigorous SLOs. To this end, this paper presents OptScaler, a hybrid autoscaling framework that integrates the power of both proactive and reactive methods for regulating CPU utilization. Specifically, the proactive module of OptScaler consists of a sophisticated workload prediction model and an optimization model, where the former provides reliable inputs to the latter for making optimal scaling decisions. The reactive module provides a self-tuning estimator of CPU utilization to the optimization model. We embed a Model Predictive Control (MPC) mechanism and robust optimization techniques into the optimization model to further enhance its reliability. Numerical results have demonstrated the superiority of both the workload prediction model and the hybrid framework of OptScaler in the scenario of online services compared to prevalent reactive, proactive, or hybrid autoscalers. OptScaler has been successfully deployed at Alipay, supporting the autoscaling of applets in the world-leading payment platform.
Submitted 26 October, 2023;
originally announced November 2023.
-
GRAM: An Interpretable Approach for Graph Anomaly Detection using Gradient Attention Maps
Authors:
Yifei Yang,
Peng Wang,
Xiaofan He,
Dongmian Zou
Abstract:
Detecting unusual patterns in graph data is a crucial task in data mining. However, existing methods face challenges in consistently achieving satisfactory performance and often lack interpretability, which hinders our understanding of anomaly detection decisions. In this paper, we propose a novel approach to graph anomaly detection that leverages the power of interpretability to enhance performance. Specifically, our method extracts an attention map derived from gradients of graph neural networks, which serves as a basis for scoring anomalies. Notably, our approach is flexible and can be used in various anomaly detection settings. In addition, we conduct theoretical analysis using synthetic data to validate our method and gain insights into its decision-making process. To demonstrate the effectiveness of our method, we extensively evaluate our approach against state-of-the-art graph anomaly detection techniques on real-world graph classification and wireless network datasets. The results consistently demonstrate the superior performance of our method compared to the baselines.
Submitted 26 June, 2024; v1 submitted 10 November, 2023;
originally announced November 2023.
-
MultiSPANS: A Multi-range Spatial-Temporal Transformer Network for Traffic Forecast via Structural Entropy Optimization
Authors:
Dongcheng Zou,
Senzhang Wang,
Xuefeng Li,
Hao Peng,
Yuandong Wang,
Chunyang Liu,
Kehua Sheng,
Bo Zhang
Abstract:
Traffic forecasting is a complex multivariate time-series regression task of paramount importance for traffic management and planning. However, existing approaches often struggle to model complex multi-range dependencies using local spatiotemporal features and road network hierarchical knowledge. To address this, we propose MultiSPANS. First, considering that an individual recording point cannot reflect critical spatiotemporal local patterns, we design multi-filter convolution modules for generating informative ST-token embeddings to facilitate attention computation. Then, based on ST-token and spatial-temporal position encoding, we employ Transformers to capture long-range temporal and spatial dependencies. Furthermore, we introduce structural entropy theory to optimize the spatial attention mechanism. Specifically, the structural entropy minimization algorithm is used to generate optimal road network hierarchies, i.e., encoding trees. Based on this, we propose a relative structural entropy-based position encoding and a multi-head attention masking scheme based on multi-layer encoding trees. Extensive experiments demonstrate the superiority of the presented framework over several state-of-the-art methods on real-world traffic datasets, and show that longer historical windows are effectively utilized. The code is available at https://github.com/SELGroup/MultiSPANS.
Submitted 6 November, 2023;
originally announced November 2023.
-
Monotone Generative Modeling via a Gromov-Monge Embedding
Authors:
Wonjun Lee,
Yifei Yang,
Dongmian Zou,
Gilad Lerman
Abstract:
Generative adversarial networks (GANs) are popular for generative tasks; however, they often require careful architecture selection, extensive empirical tuning, and are prone to mode collapse. To overcome these challenges, we propose a novel model that identifies the low-dimensional structure of the underlying data distribution, maps it into a low-dimensional latent space while preserving the underlying geometry, and then optimally transports a reference measure to the embedded distribution. We prove three key properties of our method: 1) The encoder preserves the geometry of the underlying data; 2) The generator is $c$-cyclically monotone, where $c$ is an intrinsic embedding cost employed by the encoder; and 3) The discriminator's modulus of continuity improves with the geometric preservation of the data. Numerical experiments demonstrate the effectiveness of our approach in generating high-quality images and exhibiting robustness to both mode collapse and training instability.
Submitted 3 July, 2024; v1 submitted 2 November, 2023;
originally announced November 2023.
-
Benign Oscillation of Stochastic Gradient Descent with Large Learning Rates
Authors:
Miao Lu,
Beining Wu,
Xiaodong Yang,
Difan Zou
Abstract:
In this work, we theoretically investigate the generalization properties of neural networks (NN) trained by the stochastic gradient descent (SGD) algorithm with large learning rates. Under such a training regime, we find that the oscillation of the NN weights caused by large learning rate SGD training turns out to be beneficial to the generalization of the NN, potentially improving over the same NN trained by SGD with small learning rates, which converges more smoothly. In view of this finding, we call such a phenomenon "benign oscillation". Our theory towards demystifying such a phenomenon builds upon the feature learning perspective of deep learning. Specifically, we consider a feature-noise data generation model that consists of (i) weak features which have a small $\ell_2$-norm and appear in each data point; (ii) strong features which have a larger $\ell_2$-norm but only appear in a certain fraction of all data points; and (iii) noise. We prove that NNs trained by oscillating SGD with a large learning rate can effectively learn the weak features in the presence of those strong features. In contrast, NNs trained by SGD with a small learning rate can only learn the strong features but make little progress in learning the weak features. Consequently, when it comes to new testing data which consist of only weak features, the NN trained by oscillating SGD with a large learning rate could still make correct predictions consistently, while the NN trained by small learning rate SGD fails. Our theory sheds light on how large learning rate training benefits the generalization of NNs. Experimental results support our finding on "benign oscillation".
Submitted 25 October, 2023;
originally announced October 2023.
-
Black holes in massive Einstein-dilaton gravity
Authors:
Bo Liu,
Rui-Hong Yue,
De-Cheng Zou,
Lina Zhang,
Zhan-Ying Yang,
Qiyuan Pan
Abstract:
In this paper, we focus on massive Einstein-dilaton gravity including the coupling of the dilaton scalar field to massive graviton terms, and then derive static and spherically symmetric solutions of dilatonic black holes in four-dimensional spacetime. We find that, for suitably fixed parameters, the dilatonic black hole could possess two horizons (event and cosmological), an extreme (Nariai) horizon, or a naked singularity. In addition, we investigate the thermodynamic properties of these dilatonic black holes, and check the corresponding first law of black hole thermodynamics. Extending to massive Einstein-dilaton gravity in higher dimensions, we further obtain the dilatonic black hole solutions in $(d+1)$-dimensional spacetime.
Submitted 5 March, 2024; v1 submitted 23 October, 2023;
originally announced October 2023.
-
GDL-DS: A Benchmark for Geometric Deep Learning under Distribution Shifts
Authors:
Deyu Zou,
Shikun Liu,
Siqi Miao,
Victor Fung,
Shiyu Chang,
Pan Li
Abstract:
Geometric deep learning (GDL) has gained significant attention in various scientific fields, chiefly for its proficiency in modeling data with intricate geometric structures. Yet, very few works have delved into its capability of tackling the distribution shift problem, a prevalent challenge in many relevant applications. To bridge this gap, we propose GDL-DS, a comprehensive benchmark designed for evaluating the performance of GDL models in scenarios with distribution shifts. Our evaluation datasets cover diverse scientific domains from particle physics and materials science to biochemistry, and encapsulate a broad spectrum of distribution shifts including conditional, covariate, and concept shifts. Furthermore, we study three levels of information access from the out-of-distribution (OOD) testing data, including no OOD information, only OOD features without labels, and OOD features with a few labels. Overall, our benchmark comprises 30 different experiment settings, and evaluates 3 GDL backbones and 11 learning algorithms in each setting. A thorough analysis of the evaluation results is provided, offering insights for GDL researchers and domain practitioners who are to use GDL in their applications.
Submitted 12 October, 2023;
originally announced October 2023.
-
How Many Pretraining Tasks Are Needed for In-Context Learning of Linear Regression?
Authors:
Jingfeng Wu,
Difan Zou,
Zixiang Chen,
Vladimir Braverman,
Quanquan Gu,
Peter L. Bartlett
Abstract:
Transformers pretrained on diverse tasks exhibit remarkable in-context learning (ICL) capabilities, enabling them to solve unseen tasks solely based on input contexts without adjusting model parameters. In this paper, we study ICL in one of its simplest setups: pretraining a linearly parameterized single-layer linear attention model for linear regression with a Gaussian prior. We establish a statistical task complexity bound for the attention model pretraining, showing that effective pretraining only requires a small number of independent tasks. Furthermore, we prove that the pretrained model closely matches the Bayes optimal algorithm, i.e., optimally tuned ridge regression, by achieving nearly Bayes optimal risk on unseen tasks under a fixed context length. These theoretical findings complement prior experimental research and shed light on the statistical foundations of ICL.
Submitted 14 March, 2024; v1 submitted 12 October, 2023;
originally announced October 2023.