-
Stable Weight Updating: A Key to Reliable PDE Solutions Using Deep Learning
Authors:
A. Noorizadegan,
R. Cavoretto,
D. L. Young,
C. S. Chen
Abstract:
Background: Deep learning techniques, particularly neural networks, have revolutionized computational physics, offering powerful tools for solving complex partial differential equations (PDEs). However, ensuring stability and efficiency remains a challenge, especially in scenarios involving nonlinear and time-dependent equations. Methodology: This paper introduces novel residual-based architecture…
▽ More
Background: Deep learning techniques, particularly neural networks, have revolutionized computational physics, offering powerful tools for solving complex partial differential equations (PDEs). However, ensuring stability and efficiency remains a challenge, especially in scenarios involving nonlinear and time-dependent equations. Methodology: This paper introduces novel residual-based architectures, namely the Simple Highway Network and the Squared Residual Network, designed to enhance stability and accuracy in physics-informed neural networks (PINNs). These architectures augment traditional neural networks by incorporating residual connections, which facilitate smoother weight updates and improve backpropagation efficiency. Results: Through extensive numerical experiments across various examples including linear and nonlinear, time-dependent and independent PDEs we demonstrate the efficacy of the proposed architectures. The Squared Residual Network, in particular, exhibits robust performance, achieving enhanced stability and accuracy compared to conventional neural networks. These findings underscore the potential of residual-based architectures in advancing deep learning for PDEs and computational physics applications.
△ Less
Submitted 10 July, 2024;
originally announced July 2024.
-
On a problem of Nathanson on non-minimal additive complements
Authors:
Shi--Qiang Chen,
Yuchen Ding
Abstract:
Let $C$ and $W$ be two sets of integers. If $C+W=\mathbb{Z}$, then $C$ is called an additive complement to $W$. We further call $C$ a minimal additive complement to $W$ if no proper subset of $C$ is an additive complement to $W$. Answering a problem of Nathanson in part, we give sufficient conditions of $W$ which has no minimal additive complements. Our result also extends the prior result of Chen…
▽ More
Let $C$ and $W$ be two sets of integers. If $C+W=\mathbb{Z}$, then $C$ is called an additive complement to $W$. We further call $C$ a minimal additive complement to $W$ if no proper subset of $C$ is an additive complement to $W$. Answering a problem of Nathanson in part, we give sufficient conditions of $W$ which has no minimal additive complements. Our result also extends the prior result of Chen and Yang.
△ Less
Submitted 5 July, 2024;
originally announced July 2024.
-
Shape optimization of non-matching isogeometric shells with moving intersections
Authors:
Han Zhao,
John T. Hwang,
J. S. Chen
Abstract:
While shape optimization using isogeometric shells exhibits appealing features by integrating design geometries and analysis models, challenges arise when addressing computer-aided design (CAD) geometries comprised of multiple non-uniform rational B-splines (NURBS) patches, which are common in practice. The intractability stems from surface intersections within these CAD models. In this paper, we…
▽ More
While shape optimization using isogeometric shells exhibits appealing features by integrating design geometries and analysis models, challenges arise when addressing computer-aided design (CAD) geometries comprised of multiple non-uniform rational B-splines (NURBS) patches, which are common in practice. The intractability stems from surface intersections within these CAD models. In this paper, we develop an approach for shape optimization of non-matching isogeometric shells incorporating intersection movement. Separately parametrized NURBS surfaces are modeled using Kirchhoff--Love shell theory and coupled using a penalty-based formulation. The optimization scheme allows shell patches to move without preserving relative location with other members during the shape optimization. This flexibility is achieved through an implicit state function, and analytical sensitivities are derived for the relative movement of shell patches. The introduction of differentiable intersections expands the design space and overcomes challenges associated with large mesh distortion, particularly when optimal shapes involve significant movement of patch intersections in physical space. Throughout optimization iterations, all members within the shell structures maintain the NURBS geometry representation, enabling efficient integration of analysis and design models. The optimization approach leverages the multilevel design concept by selecting a refined model for accurate analysis from a coarse design model while maintaining the same geometry. We adopt several example problems to verify the effectiveness of the proposed scheme and demonstrate its applicability to the optimization of the internal stiffeners of an aircraft wing.
△ Less
Submitted 28 June, 2024;
originally announced July 2024.
-
An adaptive Levin method for complicated domains
Authors:
Shukui Chen,
Kirill Serkh,
James Bremer
Abstract:
In this paper we describe an adaptive Levin method for numerically evaluating integrals of the form $\int_Ωf(\mathbf x) \exp(i g(\mathbf x)) \,dΩ$ over general domains that have been meshed by transfinite elements. On each element, we apply the multivariate Levin method over adaptively refined sub-elements, until the integral has been computed to the desired accuracy. Resonance points on the bound…
▽ More
In this paper we describe an adaptive Levin method for numerically evaluating integrals of the form $\int_Ωf(\mathbf x) \exp(i g(\mathbf x)) \,dΩ$ over general domains that have been meshed by transfinite elements. On each element, we apply the multivariate Levin method over adaptively refined sub-elements, until the integral has been computed to the desired accuracy. Resonance points on the boundaries of the elements are handled by the application of the univariate adaptive Levin method. When the domain does not contain stationary points, the cost of the resulting method is essentially independent of the frequency, even in the presence of resonance points.
△ Less
Submitted 20 June, 2024;
originally announced June 2024.
-
Faster Diffusion-based Sampling with Randomized Midpoints: Sequential and Parallel
Authors:
Shivam Gupta,
Linda Cai,
Sitan Chen
Abstract:
In recent years, there has been a surge of interest in proving discretization bounds for diffusion models. These works show that for essentially any data distribution, one can approximately sample in polynomial time given a sufficiently accurate estimate of its score functions at different noise levels. In this work, we propose a new discretization scheme for diffusion models inspired by Shen and…
▽ More
In recent years, there has been a surge of interest in proving discretization bounds for diffusion models. These works show that for essentially any data distribution, one can approximately sample in polynomial time given a sufficiently accurate estimate of its score functions at different noise levels. In this work, we propose a new discretization scheme for diffusion models inspired by Shen and Lee's randomized midpoint method for log-concave sampling~\cite{ShenL19}. We prove that this approach achieves the best known dimension dependence for sampling from arbitrary smooth distributions in total variation distance ($\widetilde O(d^{5/12})$ compared to $\widetilde O(\sqrt{d})$ from prior work). We also show that our algorithm can be parallelized to run in only $\widetilde O(\log^2 d)$ parallel rounds, constituting the first provable guarantees for parallel sampling with diffusion models.
As a byproduct of our methods, for the well-studied problem of log-concave sampling in total variation distance, we give an algorithm and simple analysis achieving dimension dependence $\widetilde O(d^{5/12})$ compared to $\widetilde O(\sqrt{d})$ from prior work.
△ Less
Submitted 2 June, 2024;
originally announced June 2024.
-
Global Convergence of Decentralized Retraction-Free Optimization on the Stiefel Manifold
Authors:
Youbang Sun,
Shixiang Chen,
Alfredo Garcia,
Shahin Shahrampour
Abstract:
Many classical and modern machine learning algorithms require solving optimization tasks under orthogonal constraints. Solving these tasks often require calculating retraction-based gradient descent updates on the corresponding Riemannian manifold, which can be computationally expensive. Recently Ablin et al. proposed an infeasible retraction-free algorithm, which is significantly more efficient.…
▽ More
Many classical and modern machine learning algorithms require solving optimization tasks under orthogonal constraints. Solving these tasks often require calculating retraction-based gradient descent updates on the corresponding Riemannian manifold, which can be computationally expensive. Recently Ablin et al. proposed an infeasible retraction-free algorithm, which is significantly more efficient. In this paper, we study the decentralized non-convex optimization task over a network of agents on the Stiefel manifold with retraction-free updates. We propose \textbf{D}ecentralized \textbf{R}etraction-\textbf{F}ree \textbf{G}radient \textbf{T}racking (DRFGT) algorithm, and show that DRFGT exhibits ergodic $\mathcal{O}(1/K)$ convergence rate, the same rate of convergence as the centralized, retraction-based methods. We also provide numerical experiments demonstrating that DRFGT performs on par with the state-of-the-art retraction based methods with substantially reduced computational overhead.
△ Less
Submitted 19 May, 2024;
originally announced May 2024.
-
Dynamical behavior and optimal control of a stochastic SAIRS epidemic model with two saturated incidences
Authors:
Xiaohui Zhang,
Zhiming Li,
Shenglong Chen,
Jikai Yang
Abstract:
Stochastic models are widely used to investigate the spread of epidemics in a complex environment. This paper extends a deterministic SAIRS epidemic model to a stochastic case with limited patient capacity and exposure. We first study the dynamical properties of the model under certain conditions, including persistence, extinction, and ergodic. Then, we introduce vaccination and isolation into the…
▽ More
Stochastic models are widely used to investigate the spread of epidemics in a complex environment. This paper extends a deterministic SAIRS epidemic model to a stochastic case with limited patient capacity and exposure. We first study the dynamical properties of the model under certain conditions, including persistence, extinction, and ergodic. Then, we introduce vaccination and isolation into the model as control variables. The optimal control strategies are obtained based on the Pontryagin minimum principle. Finally, numerical simulations are given to illustrate our theoretical results.
△ Less
Submitted 16 May, 2024; v1 submitted 16 May, 2024;
originally announced May 2024.
-
Bismut torsion parallel metrics with constant holomorphic sectional curvature
Authors:
Shuwen Chen,
Fangyang Zheng
Abstract:
An old conjecture in non-Kähler geometry states that, if a compact Hermitian manifold has constant holomorphic sectional curvature, then the metric must be Kähler (when the constant is non-zero) or Chern flat (when the constant is zero). It is known to be true in complex dimension $2$ by the work of Balas and Gauduchon in 1985 (when the constant is negative or zero) and Apostolov, Davidov and Musk…
▽ More
An old conjecture in non-Kähler geometry states that, if a compact Hermitian manifold has constant holomorphic sectional curvature, then the metric must be Kähler (when the constant is non-zero) or Chern flat (when the constant is zero). It is known to be true in complex dimension $2$ by the work of Balas and Gauduchon in 1985 (when the constant is negative or zero) and Apostolov, Davidov and Muskarov in 1996 (when the constant is positive). In dimension $3$ or higher, the conjecture is only known in some special cases, such as the locally conformally Kähler case (when the constant is negative or zero) by the work of Chen, Chen and Nie, or for complex nilmanifolds with nilpotent $J$ by the work of Li and the second named author.
In this note, we confirm the above conjecture for all non-balanced Bismut torsion parallel (BTP) manifolds. Here the BTP condition means that the Bismut connection has parallel torsion. In particular, the conjecture is valid for all Vaisman manifolds.
△ Less
Submitted 15 May, 2024;
originally announced May 2024.
-
Existence of non-Abelian vortices in a coupled 4D-2D quantum field theory
Authors:
Yilu Xu,
Shouxin Chen
Abstract:
Vortices produce locally concentrated field configurations and are solutions to the nonlinear partial differential equations systems of complicated structures. In this paper, we establish the existence and uniqueness for solutions of the gauged non-Abelian vortices in a coupled 4D-2D quantum field theory by researching the nonlinear elliptic equations systems with exponential terms in…
▽ More
Vortices produce locally concentrated field configurations and are solutions to the nonlinear partial differential equations systems of complicated structures. In this paper, we establish the existence and uniqueness for solutions of the gauged non-Abelian vortices in a coupled 4D-2D quantum field theory by researching the nonlinear elliptic equations systems with exponential terms in $\mathbb{R}^{2}$ using the calculus of variations. In addition, we obtain the asymptotic behavior of the solutions at infinity and the quantized integrals in $\mathbb{R}^{2}$.
△ Less
Submitted 30 May, 2024; v1 submitted 13 May, 2024;
originally announced May 2024.
-
Post-Poisson algebras and Poisson Yang-Baxter equation via bimodule algebra deformations
Authors:
Siyuan Chen,
Chengming Bai,
Li Guo
Abstract:
A fundamental construction of Poisson algebras is as the quasiclassical limits (QCLs) of associative algebra deformations of commutative associative algebras. This paper extends this construction to the relative context with the notion of (bi)module algebras over another algebra for a given algebraic structure. In this language, a module Poisson algebras can be realized as the QCLs of bimodule ass…
▽ More
A fundamental construction of Poisson algebras is as the quasiclassical limits (QCLs) of associative algebra deformations of commutative associative algebras. This paper extends this construction to the relative context with the notion of (bi)module algebras over another algebra for a given algebraic structure. In this language, a module Poisson algebras can be realized as the QCLs of bimodule associative deformations of module commutative associative algebras. Moreover, the notion of the scalar deformation of an $\mathcal O$-operator is introduced so that the process of bimodule algebras deformations to QCLs is endowed with $\mathcal O$-operators in a consistent manner. As an explicit illustration of this process, post-Poisson algebras are realized as the QCLs of bimodule associative deformations of module commutative associative algebras with the identity maps as $\mathcal O$-operators, recovering the known fact that post-Poisson algebras are the QCLs of tridendriform algebra deformations of commutative tridendriform algebras. Furthermore,
the notion of scalar deformations of solutions of the associative Yang-Baxter equation (AYBE) is applied to realize solutions of the Poisson Yang-Baxter equation (PYBE) in Poisson algebras as solutions of the AYBE in commutative associative algebras, giving a YBE version of Poisson algebras as the QCLs of associative deformations of the commutative associative algebras. Finally, concrete solutions of the PYBE are obtained from the aforementioned tridendriform deformation-to-QCLs process.
△ Less
Submitted 17 April, 2024;
originally announced April 2024.
-
A Lane Usage Strategy for General Traffic Access on Bus Lanes under Mixed Traffic Environment
Authors:
Haoran Li,
Zhenzhou Yuan,
Rui Yue,
Guangchuan Yang,
Chuang Zhu,
Siyuan Chen
Abstract:
The strategy of permitting general traffic to use the bus lane for improved utilization while ensuring bus priority has gained increasingly attention, particularly with the support of vehicle-to-everything technology. In this study, we propose a novel lane usage strategy called Dynamic Spatial-Temporal Priority (DSTP) to ensure bus priority and optimize bus lane usage in a mixed traffic environmen…
▽ More
The strategy of permitting general traffic to use the bus lane for improved utilization while ensuring bus priority has gained increasingly attention, particularly with the support of vehicle-to-everything technology. In this study, we propose a novel lane usage strategy called Dynamic Spatial-Temporal Priority (DSTP) to ensure bus priority and optimize bus lane usage in a mixed traffic environment. DSTP leverages dynamic methods to identify available spatial-temporal resources in the lane, utilizing signal timing, road information, and vehicle data. A Right-of-Way assignment optimization model is then developed based on these resources to determine which vehicles can enter the bus lane. The model is dynamically enacted using a rolling horizon scheme to accommodate time-varying traffic conditions. Numerical studies have validated the advantages of DSTP, showing maintained bus priority, improved traffic efficiency, reduced fuel consumption, and lower CO2 emissions, especially during periods of high traffic demand and concentrated bus arrivals.
△ Less
Submitted 31 March, 2024;
originally announced April 2024.
-
Uniform vorticity depletion and inviscid damping for periodic shear flows in the high Reynolds number regime
Authors:
Rajendra Beekie,
Shan Chen,
Hao Jia
Abstract:
We study the dynamics of the two dimensional Navier-Stokes equations linearized around a shear flow on a (non-square) torus which possesses exactly two non-degenerate critical points. We obtain linear inviscid damping and vorticity depletion estimates for the linearized flow that are uniform with respect to the viscosity, and enhanced dissipation type decay estimates. The main task is to understan…
▽ More
We study the dynamics of the two dimensional Navier-Stokes equations linearized around a shear flow on a (non-square) torus which possesses exactly two non-degenerate critical points. We obtain linear inviscid damping and vorticity depletion estimates for the linearized flow that are uniform with respect to the viscosity, and enhanced dissipation type decay estimates. The main task is to understand the associated Rayleigh and Orr-Sommerfeld equations, under the natural assumption that the linearized operator around the shear flow in the inviscid case has no discrete eigenvalues. The key difficulty is to understand the behavior of the solution to Orr-Sommerfeld equations in three distinct regimes depending on the spectral parameter: the non-degenerate case when the spectral parameter is away from the critical values, the intermediate case when the spectral parameter is close to but still separated from the critical values, and the most singular case when the spectral parameter is inside the viscous layer.
△ Less
Submitted 26 April, 2024; v1 submitted 19 March, 2024;
originally announced March 2024.
-
A GNN Approach for Cell-Free Massive MIMO
Authors:
Lou Salaun,
Hong Yang,
Shashwat Mishra,
Chung Shue Chen
Abstract:
Beyond 5G wireless technology Cell-Free Massive MIMO (CFmMIMO) downlink relies on carefully designed precoders and power control to attain uniformly high rate coverage. Many such power control problems can be calculated via second order cone programming (SOCP). In practice, several order of magnitude faster numerical procedure is required because power control has to be rapidly updated to adapt to…
▽ More
Beyond 5G wireless technology Cell-Free Massive MIMO (CFmMIMO) downlink relies on carefully designed precoders and power control to attain uniformly high rate coverage. Many such power control problems can be calculated via second order cone programming (SOCP). In practice, several order of magnitude faster numerical procedure is required because power control has to be rapidly updated to adapt to changing channel conditions. We propose a Graph Neural Network (GNN) based solution to replace SOCP. Specifically, we develop a GNN to obtain downlink max-min power control for a CFmMIMO with maximum ratio transmission (MRT) beamforming. We construct a graph representation of the problem that properly captures the dominant dependence relationship between access points (APs) and user equipments (UEs). We exploit a symmetry property, called permutation equivariance, to attain training simplicity and efficiency. Simulation results show the superiority of our approach in terms of computational complexity, scalability and generalizability for different system sizes and deployment scenarios.
△ Less
Submitted 8 February, 2024;
originally announced March 2024.
-
Implicit Regularization of Gradient Flow on One-Layer Softmax Attention
Authors:
Heejune Sheen,
Siyu Chen,
Tianhao Wang,
Harrison H. Zhou
Abstract:
We study gradient flow on the exponential loss for a classification problem with a one-layer softmax attention model, where the key and query weight matrices are trained separately. Under a separability assumption on the data, we show that when gradient flow achieves the minimal loss value, it further implicitly minimizes the nuclear norm of the product of the key and query weight matrices. Such i…
▽ More
We study gradient flow on the exponential loss for a classification problem with a one-layer softmax attention model, where the key and query weight matrices are trained separately. Under a separability assumption on the data, we show that when gradient flow achieves the minimal loss value, it further implicitly minimizes the nuclear norm of the product of the key and query weight matrices. Such implicit regularization can be described by a Support Vector Machine (SVM) problem with respect to the attention weights. This finding contrasts with prior results showing that the gradient descent induces an implicit regularization on the Frobenius norm on the product weight matrix when the key and query matrices are combined into a single weight matrix for training. For diagonal key and query matrices, our analysis builds upon the reparameterization technique and exploits approximate KKT conditions of the SVM associated with the classification data. Moreover, the results are extended to general weights configurations given proper alignment of the weight matrices' singular spaces with the data features at initialization.
△ Less
Submitted 13 March, 2024;
originally announced March 2024.
-
Training Dynamics of Multi-Head Softmax Attention for In-Context Learning: Emergence, Convergence, and Optimality
Authors:
Siyu Chen,
Heejune Sheen,
Tianhao Wang,
Zhuoran Yang
Abstract:
We study the dynamics of gradient flow for training a multi-head softmax attention model for in-context learning of multi-task linear regression. We establish the global convergence of gradient flow under suitable choices of initialization. In addition, we prove that an interesting "task allocation" phenomenon emerges during the gradient flow dynamics, where each attention head focuses on solving…
▽ More
We study the dynamics of gradient flow for training a multi-head softmax attention model for in-context learning of multi-task linear regression. We establish the global convergence of gradient flow under suitable choices of initialization. In addition, we prove that an interesting "task allocation" phenomenon emerges during the gradient flow dynamics, where each attention head focuses on solving a single task of the multi-task model. Specifically, we prove that the gradient flow dynamics can be split into three phases -- a warm-up phase where the loss decreases rather slowly and the attention heads gradually build up their inclination towards individual tasks, an emergence phase where each head selects a single task and the loss rapidly decreases, and a convergence phase where the attention parameters converge to a limit. Furthermore, we prove the optimality of gradient flow in the sense that the limiting model learned by gradient flow is on par with the best possible multi-head softmax attention model up to a constant factor. Our analysis also delineates a strict separation in terms of the prediction accuracy of ICL between single-head and multi-head attention models. The key technique for our convergence analysis is to map the gradient flow dynamics in the parameter space to a set of ordinary differential equations in the spectral domain, where the relative magnitudes of the semi-singular values of the attention weights determines task allocation. To our best knowledge, our work provides the first convergence result for the multi-head softmax attention model.
△ Less
Submitted 10 June, 2024; v1 submitted 29 February, 2024;
originally announced February 2024.
-
Parallel Summation in P-Recursive Extensions
Authors:
Shaoshi Chen,
Ruyong Feng,
Manuel Kauers,
Xiuyun Li
Abstract:
We propose investigating a summation analog of the paradigm for parallel integration. We make some first steps towards an indefinite summation method applicable to summands that rationally depend on the summation index and a P-recursive sequence and its shifts. There is a distinction between so-called normal and so-called special polynomials. Under the assumption that the corresponding difference…
▽ More
We propose investigating a summation analog of the paradigm for parallel integration. We make some first steps towards an indefinite summation method applicable to summands that rationally depend on the summation index and a P-recursive sequence and its shifts. There is a distinction between so-called normal and so-called special polynomials. Under the assumption that the corresponding difference field has no unnatural constants, we are able to predict the normal polynomials appearing in the denominator of a potential closed form. We can also handle the numerator. Our method is incomplete so far as we cannot predict the special polynomials appearing in the denominator. However, we do have some structural results about special polynomials for the setting under consideration.
△ Less
Submitted 7 June, 2024; v1 submitted 7 February, 2024;
originally announced February 2024.
-
Log concavity of the Grothendieck class of $\overline{\mathcal M}_{0,n}$
Authors:
Paolo Aluffi,
Stephanie Chen,
Matilde Marcolli
Abstract:
Using a known recursive formula for the Grothendieck classes of the moduli spaces $\overline{\mathcal M}_{0,n}$, we prove that they satisfy an asymptotic form of ultra-log-concavity as polynomials in the Lefschetz class. We also observe that these polynomials are $γ$-positive. Both properties, along with numerical evidence, support the conjecture that these polynomials only have real zeros. This c…
▽ More
Using a known recursive formula for the Grothendieck classes of the moduli spaces $\overline{\mathcal M}_{0,n}$, we prove that they satisfy an asymptotic form of ultra-log-concavity as polynomials in the Lefschetz class. We also observe that these polynomials are $γ$-positive. Both properties, along with numerical evidence, support the conjecture that these polynomials only have real zeros. This conjecture may be viewed as a particular case of a possible extension of a conjecture of Ferroni-Schröter and Huh on Hilbert series of Chow rings of matroids.
We prove asymptotic ultra-log-concavity by studying differential equations obtained from the recursion, whose solutions are the generating functions of the individual betti numbers of $\overline{\mathcal M}_{0,n}$. We obtain a rather complete description of these generating functions, determining their asymptotic behavior; their dominant term is controlled by the coefficients of the Lambert W function. The $γ$-positivity property follows directly from the recursion, extending the argument of Ferroni et al. proving $γ$-positivity for the Hilbert series of the Chow ring of matroids.
△ Less
Submitted 8 February, 2024; v1 submitted 4 February, 2024;
originally announced February 2024.
-
Exponential Ergodicity of CBIRE-Processes with Competition and Catastrophes
Authors:
Shukai Chen,
Rongjuan Fang,
Lina Ji,
Jian Wang
Abstract:
We establish the exponential ergodic property in a weighted total variation distance of continuous-state branching processes with immigration in random environments with competition and catastrophes, under a Lyapunov-type condition and other mild assumptions. The proof is based on a Markov coupling process along with some delicate estimates for the associated coupling generator. In particular, the…
▽ More
We establish the exponential ergodic property in a weighted total variation distance of continuous-state branching processes with immigration in random environments with competition and catastrophes, under a Lyapunov-type condition and other mild assumptions. The proof is based on a Markov coupling process along with some delicate estimates for the associated coupling generator. In particular, the main result indicates whether and how the competition mechanism, the environment and the catastrophe could balance the branching mechanism respectively to guarantee the exponential ergodicity of the process.
△ Less
Submitted 2 February, 2024;
originally announced February 2024.
-
Bayesian sampling using interacting particles
Authors:
Shi Chen,
Zhiyan Ding,
Qin Li
Abstract:
Bayesian sampling is an important task in statistics and machine learning. Over the past decade, many ensemble-type sampling methods have been proposed. In contrast to the classical Markov chain Monte Carlo methods, these new methods deploy a large number of interactive samples, and the communication between these samples is crucial in speeding up the convergence. To justify the validity of these…
▽ More
Bayesian sampling is an important task in statistics and machine learning. Over the past decade, many ensemble-type sampling methods have been proposed. In contrast to the classical Markov chain Monte Carlo methods, these new methods deploy a large number of interactive samples, and the communication between these samples is crucial in speeding up the convergence. To justify the validity of these sampling strategies, the concept of interacting particles naturally calls for the mean-field theory. The theory establishes a correspondence between particle interactions encoded in a set of coupled ODEs/SDEs and a PDE that characterizes the evolution of the underlying distribution. This bridges numerical algorithms with the PDE theory used to show convergence in time. Many mathematical machineries are developed to provide the mean-field analysis, and we showcase two such examples: The coupling method and the compactness argument built upon the martingale strategy. The former has been deployed to show the convergence of ensemble Kalman sampler and ensemble Kalman inversion, and the latter will be shown to be immensely powerful in proving the validity of the Vlasov-Boltzmann simulator.
△ Less
Submitted 13 May, 2024; v1 submitted 23 January, 2024;
originally announced January 2024.
-
N-Adaptive Ritz Method: A Neural Network Enriched Partition of Unity for Boundary Value Problems
Authors:
Jonghyuk Baek,
Yanran Wang,
J. S. Chen
Abstract:
Conventional finite element methods are known to be tedious in adaptive refinements due to their conformal regularity requirements. Further, the enrichment functions for adaptive refinements are often not readily available in general applications. This work introduces a novel neural network-enriched Partition of Unity (NN-PU) approach for solving boundary value problems via artificial neural netwo…
▽ More
Conventional finite element methods are known to be tedious in adaptive refinements due to their conformal regularity requirements. Further, the enrichment functions for adaptive refinements are often not readily available in general applications. This work introduces a novel neural network-enriched Partition of Unity (NN-PU) approach for solving boundary value problems via artificial neural networks with a potential energy-based loss function minimization. The flexibility and adaptivity of the NN function space are utilized to capture complex solution patterns that the conventional Galerkin methods fail to capture. The NN enrichment is constructed by combining pre-trained feature-encoded NN blocks with an additional untrained NN block. The pre-trained NN blocks learn specific local features during the offline stage, enabling efficient enrichment of the approximation space during the online stage through the Ritz-type energy minimization. The NN enrichment is introduced under the Partition of Unity (PU) framework, ensuring convergence of the proposed method. The proposed NN-PU approximation and feature-encoded transfer learning forms an adaptive approximation framework, termed the neural-refinement (n-refinement), for solving boundary value problems. Demonstrated by solving various elasticity problems, the proposed method offers accurate solutions while notably reducing the computational cost compared to the conventional adaptive refinement in the mesh-based methods.
△ Less
Submitted 16 January, 2024;
originally announced January 2024.
-
Interference Management in 5G and Beyond Networks
Authors:
Nessrine Trabelsi,
Lamia Chaari Fourati,
Chung Shue Chen
Abstract:
During the last decade, wireless data services have had an incredible impact on people's lives in ways we could never have imagined. The number of mobile devices has increased exponentially and data traffic has almost doubled every year. Undoubtedly, the rate of growth will continue to be rapid with the explosive increase in demands for data rates, latency, massive connectivity, network reliabilit…
▽ More
During the last decade, wireless data services have had an incredible impact on people's lives in ways we could never have imagined. The number of mobile devices has increased exponentially and data traffic has almost doubled every year. Undoubtedly, the rate of growth will continue to be rapid with the explosive increase in demands for data rates, latency, massive connectivity, network reliability, and energy efficiency. In order to manage this level of growth and meet these requirements, the fifth-generation (5G) mobile communications network is envisioned as a revolutionary advancement combining various improvements to previous mobile generation networks and new technologies, including the use of millimeter wavebands (mm-wave), massive multiple-input multipleoutput (mMIMO) multi-beam antennas, network densification, dynamic Time Division Duplex (TDD) transmission, and new waveforms with mixed numerologies. New revolutionary features including terahertz (THz) communications and the integration of Non-Terrestrial Networks (NTN) can further improve the performance and signal quality for future 6G networks. However, despite the inevitable benefits of all these key technologies, the heterogeneous and ultra-flexible structure of the 5G and beyond network brings non-orthogonality into the system and generates significant interference that needs to be handled carefully. Therefore, it is essential to design effective interference management schemes to mitigate severe and sometimes unpredictable interference in mobile networks. In this paper, we provide a comprehensive review of interference management in 5G and Beyond networks and discuss its future evolution. We start with a unified classification and a detailed explanation of the different types of interference and continue by presenting our taxonomy of existing interference management approaches. Then, after explaining interference measurement reports and signaling, we provide for each type of interference identified, an in-depth literature review and technical discussion of appropriate management schemes. We finish by discussing the main interference challenges that will be encountered in future 6G networks and by presenting insights on the suggested new interference management approaches, including useful guidelines for an AI-based solution. This review will provide a first-hand guide to the industry in determining the most relevant technology for interference management, and will also allow for consideration of future challenges and research directions.
△ Less
Submitted 3 January, 2024;
originally announced January 2024.
-
Representation functions in the set of natural numbers
Authors:
Shi-Qiang Chen,
Csaba Sándor,
Quan-Hui Yang
Abstract:
Let $\mathbb{N}$ be the set of all nonnegative integers. For $S\subseteq \mathbb{N}$ and $n\in \mathbb{N}$, let $R_S(n)$ denote the number of solutions of the equation $n=s+s'$, $s, s'\in S$, $s<s'$.
In this paper, we determine the structure of all sets $A$ and $B$ such that $A\cup B=\mathbb{N}\setminus\{r+mk:k\in\mathbb{N}\}$, $A\cap B=\emptyset$ and $R_{A}(n)=R_{B}(n)$ for every positive integ…
▽ More
Let $\mathbb{N}$ be the set of all nonnegative integers. For $S\subseteq \mathbb{N}$ and $n\in \mathbb{N}$, let $R_S(n)$ denote the number of solutions of the equation $n=s+s'$, $s, s'\in S$, $s<s'$.
In this paper, we determine the structure of all sets $A$ and $B$ such that $A\cup B=\mathbb{N}\setminus\{r+mk:k\in\mathbb{N}\}$, $A\cap B=\emptyset$ and $R_{A}(n)=R_{B}(n)$ for every positive integer $n$, where $m$ and $r$ are two integers with $m\ge 2$ and $r\ge 0$.
△ Less
Submitted 28 December, 2023;
originally announced December 2023.
-
On Meta-Prompting
Authors:
Adrian de Wynter,
Xun Wang,
Qilong Gu,
Si-Qing Chen
Abstract:
Certain statistical models are capable of interpreting input strings as instructions, or prompts, and carry out tasks based on them. Many approaches to prompting and pre-training these models involve the automated generation of these prompts. We call these approaches meta-prompting, or prompting to obtain prompts. We propose a theoretical framework based on category theory to generalize and descri…
▽ More
Certain statistical models are capable of interpreting input strings as instructions, or prompts, and carry out tasks based on them. Many approaches to prompting and pre-training these models involve the automated generation of these prompts. We call these approaches meta-prompting, or prompting to obtain prompts. We propose a theoretical framework based on category theory to generalize and describe them. This framework is flexible enough to account for LLM stochasticity; and allows us to obtain formal results around task agnosticity and equivalence of various meta-prompting approaches. We experiment with meta-prompting in two active areas of model research: creativity and ideation. We find that user preference favors (p < 0.01) the prompts generated under meta-prompting, as well as their corresponding outputs, over a series of hardcoded baseline prompts that include the original task prompt. Using our framework, we argue that meta-prompting is more effective than basic prompting at generating desirable outputs.
△ Less
Submitted 11 December, 2023;
originally announced December 2023.
-
Bijection between positive clusters and projectively signed exceptional sequences
Authors:
Shujian Chen,
Kiyoshi Igusa
Abstract:
In 2017, Igusa and Todorov gave a bijection between signed exceptional sequences and ordered partial clusters. In this paper, we show that every term in an exceptional sequence is either relatively projective or relatively injective or both and we refine this bijection to one between projectively signed exceptional sequences and ordered partial positive clusters by assigning signs to objects that…
▽ More
In 2017, Igusa and Todorov gave a bijection between signed exceptional sequences and ordered partial clusters. In this paper, we show that every term in an exceptional sequence is either relatively projective or relatively injective or both and we refine this bijection to one between projectively signed exceptional sequences and ordered partial positive clusters by assigning signs to objects that are not relatively injective. We also give a characterization of relatively projective/injective objects in terms of supports of the objects in the exceptional sequence.
△ Less
Submitted 19 April, 2024; v1 submitted 10 December, 2023;
originally announced December 2023.
-
Positive scalar curvature metrics and aspherical summands
Authors:
Shuli Chen,
Jianchun Chu,
Jintian Zhu
Abstract:
We prove for $n\in\{3,4,5\}$ that the connected sum of a closed aspherical $n$-manifold with an arbitrary non-compact manifold does not admit a complete metric with nonnegative scalar curvature. In particular, a special case of our result answers a question of Gromov.
More generally, we generalize the partial classification result of Chodosh, Li, and Liokumovich to the non-compact domination cas…
▽ More
We prove for $n\in\{3,4,5\}$ that the connected sum of a closed aspherical $n$-manifold with an arbitrary non-compact manifold does not admit a complete metric with nonnegative scalar curvature. In particular, a special case of our result answers a question of Gromov.
More generally, we generalize the partial classification result of Chodosh, Li, and Liokumovich to the non-compact domination case with our newly-developed technique.
Our result unifies all previous results of this type, and confirms the validity of Gromov's non-compact domination conjecture for closed aspherical manifolds of dimensions 3, 4, and 5.
△ Less
Submitted 31 March, 2024; v1 submitted 7 December, 2023;
originally announced December 2023.
-
Optimal (partial) transport between non-convex polygonal domains
Authors:
Shibing Chen,
Yuanyuan Li,
Jiakun Liu
Abstract:
In this paper, we investigate the optimal (partial) transport problem between non-convex polygonal domains in \(\mathbb{R}^2\). In the case of the complete optimal transport problem, we demonstrate that the singular set is either a finite set, or, except for a finite number of points, is locally a 1-dimensional smooth curve. As for the optimal partial transport, we establish that the free boundary…
▽ More
In this paper, we investigate the optimal (partial) transport problem between non-convex polygonal domains in \(\mathbb{R}^2\). In the case of the complete optimal transport problem, we demonstrate that the singular set is either a finite set, or, except for a finite number of points, is locally a 1-dimensional smooth curve. As for the optimal partial transport, we establish that the free boundary is smooth except for a finite number of singular points.
△ Less
Submitted 3 December, 2023; v1 submitted 27 November, 2023;
originally announced November 2023.
-
Bounding Lifts of Markoff Triples mod $p$
Authors:
Elisa Bellah,
Siran Chen,
Elena Fuchs,
Lynnelle Ye
Abstract:
In 2016, Bourgain, Gamburd, and Sarnak proved that Strong Approximation holds for the Markoff surface in most cases. That is, the modulo $p$ solutions to the equation $X_1^2+X_2^2+X_3^2=3X_1X_2X_3$ are covered by the integer solutions for most primes $p$. In this paper, we provide upper bounds on lifts of mod $p$ points of the Markoff surface by analyzing the growth along paths in the Markoff mod…
▽ More
In 2016, Bourgain, Gamburd, and Sarnak proved that Strong Approximation holds for the Markoff surface in most cases. That is, the modulo $p$ solutions to the equation $X_1^2+X_2^2+X_3^2=3X_1X_2X_3$ are covered by the integer solutions for most primes $p$. In this paper, we provide upper bounds on lifts of mod $p$ points of the Markoff surface by analyzing the growth along paths in the Markoff mod $p$ graphs. Our first upper bound follows the algorithm given in the paper of Bourgain, Gamburd, and Sarnak, which constructs a path of possibly long length but where points grow relatively slowly. Our second bound considers paths in these graphs of short length but possibly large growth. We then provide numerical evidence and heuristic arguments for how these bounds might be improved.
△ Less
Submitted 19 November, 2023;
originally announced November 2023.
-
Bell-INGARCH Model
Authors:
Ying Wang,
Shuang Chen,
Lianyong Qian
Abstract:
Integer-valued time series exist widely in economics, finance, biology, computer science, medicine, insurance, and many other fields. In recent years, many types of models have been proposed to model integer-valued time series data, in which the integer autoregressive model and integer-valued GARCH model are the most representative. Although there have been many results of integer-valued time seri…
▽ More
Integer-valued time series exist widely in economics, finance, biology, computer science, medicine, insurance, and many other fields. In recent years, many types of models have been proposed to model integer-valued time series data, in which the integer autoregressive model and integer-valued GARCH model are the most representative. Although there have been many results of integer-valued time series data, the parameters of integer-valued time series model structure are more complicated. This paper is dedicated to proposing a new simple integer-valued GARCH model. First, the Bell integer-valued GARCH model is given based on Bell distribution. Then, the conditional maximum likelihood estimation method is used to obtain the estimators of parameters. Later, numerical simulations confirm the finite sample properties of the estimation of unknown parameters. Finally, the model is applied in the two real examples. Compared with the existing models, the proposed model is more simple and applicable.
△ Less
Submitted 19 November, 2023;
originally announced November 2023.
-
Xi-Zhao Model Preserves WD Spaces
Authors:
Siheng Chen,
Qingguo Li
Abstract:
For a $T_1$ space $X$, Zhao and Xi constructed a dcpo model $\hat{P}$, where $P$ is a bounded complete algebraic poset model of $X$. In this paper, we formulate the closed WD subsets of the maximal point space $\mathrm{Max}(\hat{P})$ and the Scott space $Σ\hat{P}$, and then prove that $X$ is a WD space if and only if $Σ\hat{P}$ is a WD space.
It is also shown that the sobrification $X^s$ (resp.,…
▽ More
For a $T_1$ space $X$, Zhao and Xi constructed a dcpo model $\hat{P}$, where $P$ is a bounded complete algebraic poset model of $X$. In this paper, we formulate the closed WD subsets of the maximal point space $\mathrm{Max}(\hat{P})$ and the Scott space $Σ\hat{P}$, and then prove that $X$ is a WD space if and only if $Σ\hat{P}$ is a WD space.
It is also shown that the sobrification $X^s$ (resp., the well-filtered reflection $X^w$) of $X$ can be embedded into the sobrification $\hat{P}^s$ (resp., the well-filtered reflection $\hat{P}^w$) of $\hat{P}$ as a saturated subspace. Finally, we introduce two new concepts ``${\rm H}$-model spaces" and ``weak ${\rm H}$-model spaces", and provide a general framework to prove that such $T_1$ spaces can be preserved by Xi-Zhao dcpo models.
△ Less
Submitted 14 November, 2023;
originally announced November 2023.
-
Stability Problems on D-finite Functions
Authors:
Shaoshi Chen,
Ruyong Feng,
Zewang Guo,
Wei Lu
Abstract:
This paper continues the studies of symbolic integration by focusing on the stability problems on D-finite functions. We introduce the notion of stability index in order to investigate the order growth of the differential operators satisfied by iterated integrals of D-finite functions and determine bounds and exact formula for stability indices of several special classes of differential operators.…
▽ More
This paper continues the studies of symbolic integration by focusing on the stability problems on D-finite functions. We introduce the notion of stability index in order to investigate the order growth of the differential operators satisfied by iterated integrals of D-finite functions and determine bounds and exact formula for stability indices of several special classes of differential operators. With the basic properties of stability index, we completely solve the stability problem on general hyperexponential functions.
△ Less
Submitted 10 November, 2023;
originally announced November 2023.
-
Generalized Goulden-Yong duals and signed minimal factorizations
Authors:
Shujian Chen,
Kiyoshi Igusa
Abstract:
We show the equivalence between one-way reflections and relative projective representations. We construct generalized Goulden-Yong duals using reverse Garside element actions and folded chord diagrams. We give two applications of the generalized Goulden-Yong duals: constructing generalized Prüfer codes and counting signed factorizations using the matrix-tree theorem.
We show the equivalence between one-way reflections and relative projective representations. We construct generalized Goulden-Yong duals using reverse Garside element actions and folded chord diagrams. We give two applications of the generalized Goulden-Yong duals: constructing generalized Prüfer codes and counting signed factorizations using the matrix-tree theorem.
△ Less
Submitted 22 April, 2024; v1 submitted 9 November, 2023;
originally announced November 2023.
-
Escaping Saddle Points in Heterogeneous Federated Learning via Distributed SGD with Communication Compression
Authors:
Sijin Chen,
Zhize Li,
Yuejie Chi
Abstract:
We consider the problem of finding second-order stationary points of heterogeneous federated learning (FL). Previous works in FL mostly focus on first-order convergence guarantees, which do not rule out the scenario of unstable saddle points. Meanwhile, it is a key bottleneck of FL to achieve communication efficiency without compensating the learning accuracy, especially when local data are highly…
▽ More
We consider the problem of finding second-order stationary points of heterogeneous federated learning (FL). Previous works in FL mostly focus on first-order convergence guarantees, which do not rule out the scenario of unstable saddle points. Meanwhile, it is a key bottleneck of FL to achieve communication efficiency without compensating the learning accuracy, especially when local data are highly heterogeneous across different clients. Given this, we propose a novel algorithm Power-EF that only communicates compressed information via a novel error-feedback scheme. To our knowledge, Power-EF is the first distributed and compressed SGD algorithm that provably escapes saddle points in heterogeneous FL without any data homogeneity assumptions. In particular, Power-EF improves to second-order stationary points after visiting first-order (possibly saddle) points, using additional gradient queries and communication rounds only of almost the same order required by first-order convergence, and the convergence rate exhibits a linear speedup in terms of the number of workers. Our theory improves/recovers previous results, while extending to much more tolerant settings on the local data. Numerical experiments are provided to complement the theory.
△ Less
Submitted 29 October, 2023;
originally announced October 2023.
-
Power-Enhanced Residual Network for Function Approximation and Physics-Informed Inverse Problems
Authors:
Amir Noorizadegan,
D. L. Young,
Y. C. Hon,
C. S. Chen
Abstract:
In this study, we investigate how the updating of weights during forward operation and the computation of gradients during backpropagation impact the optimization process, training procedure, and overall performance of the neural network, particularly the multi-layer perceptrons (MLPs). This paper introduces a novel neural network structure called the Power-Enhancing residual network, inspired by…
▽ More
In this study, we investigate how the updating of weights during forward operation and the computation of gradients during backpropagation impact the optimization process, training procedure, and overall performance of the neural network, particularly the multi-layer perceptrons (MLPs). This paper introduces a novel neural network structure called the Power-Enhancing residual network, inspired by highway network and residual network, designed to improve the network's capabilities for both smooth and non-smooth functions approximation in 2D and 3D settings. By incorporating power terms into residual elements, the architecture enhances the stability of weight updating, thereby facilitating better convergence and accuracy. The study explores network depth, width, and optimization methods, showing the architecture's adaptability and performance advantages. Consistently, the results emphasize the exceptional accuracy of the proposed Power-Enhancing residual network, particularly for non-smooth functions. Real-world examples also confirm its superiority over plain neural network in terms of accuracy, convergence, and efficiency. Moreover, the proposed architecture is also applied to solving the inverse Burgers' equation, demonstrating superior performance. In conclusion, the Power-Enhancing residual network offers a versatile solution that significantly enhances neural network capabilities by emphasizing the importance of stable weight updates for effective training in deep neural networks. The codes implemented are available at: \url{https://github.com/CMMAi/ResNet_for_PINN}.
△ Less
Submitted 8 July, 2024; v1 submitted 24 October, 2023;
originally announced October 2023.
-
Riesz type theorems for $κ$-pluriharmonic mappings, invariant harmonic quasiregular mappings and harmonic quasiregular mappings
Authors:
Shaolin Chen,
Manzi Huang
Abstract:
The main purpose of this paper is to develop some methods to improve and generalize the main results in a recent paper by Liu and Zhu (Adv. Math., 2023, i.e., \cite{L-Z}). The paper consists of two parts. In the first part, we discuss the Riesz type theorem in the setting of $n$-dimensional complex spaces for all $n\geq 1$. In this part, we first introduce the family of $κ$-pluriharmonic mappings…
▽ More
The main purpose of this paper is to develop some methods to improve and generalize the main results in a recent paper by Liu and Zhu (Adv. Math., 2023, i.e., \cite{L-Z}). The paper consists of two parts. In the first part, we discuss the Riesz type theorem in the setting of $n$-dimensional complex spaces for all $n\geq 1$. In this part, we first introduce the family of $κ$-pluriharmonic mappings of the $n$-dimensional complex unit ball. Then we establish two Riesz type theorems for these mappings, which are the $n$-dimensional versions of Theorems 1.1 and 1.2 in \cite{L-Z}, respectively. Furthermore, even when $n=1$, our first result shows that the assumption of the real parts of the mappings not being negative (or being negative) in \cite[Theorem 1.1]{L-Z} is redundant; and our second result illustrates that the assumption of "quasiconformality" on the mappings in \cite[Theorem 1.2]{L-Z} can be replaced by the weaker one of "quasiregularity". In the second part, we investigate the Riesz type theorem in the setting of $n$-dimensional real spaces for all $n\geq 2$. In this part, first, we prove a Riesz type theorem for invariant harmonic quasiregular mappings of the unit $n$-dimensional real ball. Our result indicates that $(i)$ the range of the parameter $p$ discussed in \cite[Theorem 1.3]{L-Z} can be changed from $(1,2)$ to $(1,\infty)$; $(ii)$ the assumption of the first coordinate functions of the mappings being non-zero in \cite[Theorem 1.3]{L-Z} is redundant. In this way, we complete the discussions carried out in \cite[Theorems 1.3 and 1.4]{L-Z}. Second, we obtain a Riesz type theorem for harmonic $K$-quasiregular mappings of the unit $n$-dimensional real ball. Our result demonstrates that the range of the parameter $p$ discussed in \cite[Theorem 2.1]{K-2023} can be changed from $(1,2)$ to $(1,\infty)$.
△ Less
Submitted 29 October, 2023; v1 submitted 23 October, 2023;
originally announced October 2023.
-
Destabilization of synchronous periodic solutions for patch models: a criterion by period functions
Authors:
Shuang Chen,
Jicai Huang
Abstract:
In this paper, we study the destabilization of synchronous periodic solutions for patch models. By applying perturbation theory for matrices, we derive asymptotic expressions of the Floquet spectra and provide a destabilization criterion for synchronous periodic solutions arising from closed orbits or degenerate Hopf bifurcations in terms of period functions. Finally, we apply the main results to…
▽ More
In this paper, we study the destabilization of synchronous periodic solutions for patch models. By applying perturbation theory for matrices, we derive asymptotic expressions of the Floquet spectra and provide a destabilization criterion for synchronous periodic solutions arising from closed orbits or degenerate Hopf bifurcations in terms of period functions. Finally, we apply the main results to the well-known two-patch Holling-Tanner model.
△ Less
Submitted 16 October, 2023;
originally announced October 2023.
-
Partition theorems and the Chinese remainder theorem
Authors:
Shi-Chao Chen
Abstract:
The famous partition theorem of Euler states that partitions of $n$ into distinct parts are equinumerous with partitions of $n$ into odd parts. Another famous partition theorem due to MacMahon states that the number of partitions of $n$ with all parts repeated at least once equals the number of partitions of $n$ where all parts must be even or congruent to $3 \pmod 6$. These partition theorems wer…
▽ More
The famous partition theorem of Euler states that partitions of $n$ into distinct parts are equinumerous with partitions of $n$ into odd parts. Another famous partition theorem due to MacMahon states that the number of partitions of $n$ with all parts repeated at least once equals the number of partitions of $n$ where all parts must be even or congruent to $3 \pmod 6$. These partition theorems were further extended by Glaisher, Andrews, Subbarao, Nyirenda and Mugwangwavari. In this paper, we utilize the Chinese remainder theorem to prove a comprehensive partition theorem that encompasses all existing partition theorems. We also give a natural generalization of Euler's theorem based on a special complete residue system. Furthermore, we establish interesting congruence connections between the partition function $p(n)$ and related partition functions.
△ Less
Submitted 13 October, 2023;
originally announced October 2023.
-
Accelerating optimization over the space of probability measures
Authors:
Shi Chen,
Qin Li,
Oliver Tse,
Stephen J. Wright
Abstract:
The acceleration of gradient-based optimization methods is a subject of significant practical and theoretical importance, particularly within machine learning applications. While much attention has been directed towards optimizing within Euclidean space, the need to optimize over spaces of probability measures in machine learning motivates exploration of accelerated gradient methods in this contex…
▽ More
The acceleration of gradient-based optimization methods is a subject of significant practical and theoretical importance, particularly within machine learning applications. While much attention has been directed towards optimizing within Euclidean space, the need to optimize over spaces of probability measures in machine learning motivates exploration of accelerated gradient methods in this context too. To this end, we introduce a Hamiltonian-flow approach analogous to momentum-based approaches in Euclidean space. We demonstrate that, in the continuous-time setting, algorithms based on this approach can achieve convergence rates of arbitrarily high order. We complement our findings with numerical examples.
△ Less
Submitted 18 June, 2024; v1 submitted 6 October, 2023;
originally announced October 2023.
-
Machine Learning Clifford invariants of ADE Coxeter elements
Authors:
Siqi Chen,
Pierre-Philippe Dechant,
Yang-Hui He,
Elli Heyes,
Edward Hirst,
Dmitrii Riabchenko
Abstract:
There has been recent interest in novel Clifford geometric invariants of linear transformations. This motivates the investigation of such invariants for a certain type of geometric transformation of interest in the context of root systems, reflection groups, Lie groups and Lie algebras: the Coxeter transformations. We perform exhaustive calculations of all Coxeter transformations for $A_8$, $D_8$…
▽ More
There has been recent interest in novel Clifford geometric invariants of linear transformations. This motivates the investigation of such invariants for a certain type of geometric transformation of interest in the context of root systems, reflection groups, Lie groups and Lie algebras: the Coxeter transformations. We perform exhaustive calculations of all Coxeter transformations for $A_8$, $D_8$ and $E_8$ for a choice of basis of simple roots and compute their invariants, using high-performance computing. This computational algebra paradigm generates a dataset that can then be mined using techniques from data science such as supervised and unsupervised machine learning. In this paper we focus on neural network classification and principal component analysis. Since the output -- the invariants -- is fully determined by the choice of simple roots and the permutation order of the corresponding reflections in the Coxeter element, we expect huge degeneracy in the mapping. This provides the perfect setup for machine learning, and indeed we see that the datasets can be machine learned to very high accuracy. This paper is a pump-priming study in experimental mathematics using Clifford algebras, showing that such Clifford algebraic datasets are amenable to machine learning, and shedding light on relationships between these novel and other well-known geometric invariants and also giving rise to analytic results.
△ Less
Submitted 26 May, 2024; v1 submitted 29 September, 2023;
originally announced October 2023.
-
Efficient Pauli channel estimation with logarithmic quantum memory
Authors:
Sitan Chen,
Weiyuan Gong
Abstract:
Here we revisit one of the prototypical tasks for characterizing the structure of noise in quantum devices: estimating every eigenvalue of an $n$-qubit Pauli noise channel to error $ε$. Prior work (Chen et al., 2022) proved no-go theorems for this task in the practical regime where one has a limited amount of quantum memory, e.g. any protocol with $\le 0.99n$ ancilla qubits of quantum memory must…
▽ More
Here we revisit one of the prototypical tasks for characterizing the structure of noise in quantum devices: estimating every eigenvalue of an $n$-qubit Pauli noise channel to error $ε$. Prior work (Chen et al., 2022) proved no-go theorems for this task in the practical regime where one has a limited amount of quantum memory, e.g. any protocol with $\le 0.99n$ ancilla qubits of quantum memory must make exponentially many measurements, provided it is non-concatenating. Such protocols can only interact with the channel by repeatedly preparing a state, passing it through the channel, and measuring immediately afterward.
This left open a natural question: does the lower bound hold even for general protocols, i.e. ones which chain together many queries to the channel, interleaved with arbitrary data-processing channels, before measuring? Surprisingly, in this work we show the opposite: there is a protocol that can estimate the eigenvalues of a Pauli channel to error $ε$ using only $O(\log n/ε^2)$ ancilla qubits and $\tilde{O}(n^2/ε^2)$ measurements. In contrast, we show that any protocol with zero ancilla, even a concatenating one, must make $Ω(2^n/ε^2)$ measurements, which is tight.
Our results imply, to our knowledge, the first quantum learning task where logarithmically many qubits of quantum memory suffice for an exponential statistical advantage.
△ Less
Submitted 30 November, 2023; v1 submitted 25 September, 2023;
originally announced September 2023.
-
FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup for Non-IID Data
Authors:
Hao Sun,
Li Shen,
Shixiang Chen,
Jingwei Sun,
Jing Li,
Guangzhong Sun,
Dacheng Tao
Abstract:
Federated learning is an emerging distributed machine learning method, enables a large number of clients to train a model without exchanging their local data. The time cost of communication is an essential bottleneck in federated learning, especially for training large-scale deep neural networks. Some communication-efficient federated learning methods, such as FedAvg and FedAdam, share the same le…
▽ More
Federated learning is an emerging distributed machine learning method, enables a large number of clients to train a model without exchanging their local data. The time cost of communication is an essential bottleneck in federated learning, especially for training large-scale deep neural networks. Some communication-efficient federated learning methods, such as FedAvg and FedAdam, share the same learning rate across different clients. But they are not efficient when data is heterogeneous. To maximize the performance of optimization methods, the main challenge is how to adjust the learning rate without hurting the convergence. In this paper, we propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate based on local historical gradient squares and synchronized learning rates. Theoretical analysis shows that our client-specified auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients, which enables promising scalability in federated optimization. We also empirically compare our method with several communication-efficient federated optimization methods. Extensive experimental results on Computer Vision (CV) tasks and Natural Language Processing (NLP) task show the efficacy of our proposed FedLALR method and also coincides with our theoretical findings.
△ Less
Submitted 18 September, 2023;
originally announced September 2023.
-
Dynamical Analysis of an Allelopathic Phytoplankton Model with Fear Effect
Authors:
Shangming Chen,
Fengde Chen,
Vaibhava Srivastava,
Rana D. Parshad
Abstract:
This paper is the first to propose an allelopathic phytoplankton competition ODE model influenced by a fear effect based on natural biological phenomena. It is shown that the interplay of this fear effect and the allelopathic term cause rich dynamics in the proposed competition model, such as global stability, transcritical bifurcation, pitchfork bifurcation, and saddle-node bifurcation. We also c…
▽ More
This paper is the first to propose an allelopathic phytoplankton competition ODE model influenced by a fear effect based on natural biological phenomena. It is shown that the interplay of this fear effect and the allelopathic term cause rich dynamics in the proposed competition model, such as global stability, transcritical bifurcation, pitchfork bifurcation, and saddle-node bifurcation. We also consider the spatially explicit version of the model and prove analogous results. Numerical simulations verify the feasibility of the theoretical analysis. The results demonstrate that the primary cause of the extinction of non-toxic species is the fear of toxic species compared to toxins. Allelopathy only affects the density of non-toxic species. The discussion provides guidance for the conservation of species and the maintenance of biodiversity.
△ Less
Submitted 15 September, 2023;
originally announced September 2023.
-
Safety Filter Design for Neural Network Systems via Convex Optimization
Authors:
Shaoru Chen,
Kong Yao Chee,
Nikolai Matni,
M. Ani Hsieh,
George J. Pappas
Abstract:
With the increase in data availability, it has been widely demonstrated that neural networks (NN) can capture complex system dynamics precisely in a data-driven manner. However, the architectural complexity and nonlinearity of the NNs make it challenging to synthesize a provably safe controller. In this work, we propose a novel safety filter that relies on convex optimization to ensure safety for…
▽ More
With the increase in data availability, it has been widely demonstrated that neural networks (NN) can capture complex system dynamics precisely in a data-driven manner. However, the architectural complexity and nonlinearity of the NNs make it challenging to synthesize a provably safe controller. In this work, we propose a novel safety filter that relies on convex optimization to ensure safety for a NN system, subject to additive disturbances that are capable of capturing modeling errors. Our approach leverages tools from NN verification to over-approximate NN dynamics with a set of linear bounds, followed by an application of robust linear MPC to search for controllers that can guarantee robust constraint satisfaction. We demonstrate the efficacy of the proposed framework numerically on a nonlinear pendulum system.
△ Less
Submitted 28 August, 2023; v1 submitted 15 August, 2023;
originally announced August 2023.
-
Actions Speak What You Want: Provably Sample-Efficient Reinforcement Learning of the Quantal Stackelberg Equilibrium from Strategic Feedbacks
Authors:
Siyu Chen,
Mengdi Wang,
Zhuoran Yang
Abstract:
We study reinforcement learning (RL) for learning a Quantal Stackelberg Equilibrium (QSE) in an episodic Markov game with a leader-follower structure. In specific, at the outset of the game, the leader announces her policy to the follower and commits to it. The follower observes the leader's policy and, in turn, adopts a quantal response policy by solving an entropy-regularized policy optimization…
▽ More
We study reinforcement learning (RL) for learning a Quantal Stackelberg Equilibrium (QSE) in an episodic Markov game with a leader-follower structure. In specific, at the outset of the game, the leader announces her policy to the follower and commits to it. The follower observes the leader's policy and, in turn, adopts a quantal response policy by solving an entropy-regularized policy optimization problem induced by leader's policy. The goal of the leader is to find her optimal policy, which yields the optimal expected total return, by interacting with the follower and learning from data. A key challenge of this problem is that the leader cannot observe the follower's reward, and needs to infer the follower's quantal response model from his actions against leader's policies. We propose sample-efficient algorithms for both the online and offline settings, in the context of function approximation. Our algorithms are based on (i) learning the quantal response model via maximum likelihood estimation and (ii) model-free or model-based RL for solving the leader's decision making problem, and we show that they achieve sublinear regret upper bounds. Moreover, we quantify the uncertainty of these estimators and leverage the uncertainty to implement optimistic and pessimistic algorithms for online and offline settings. Besides, when specialized to the linear and myopic setting, our algorithms are also computationally efficient. Our theoretical analysis features a novel performance-difference lemma which incorporates the error of quantal response model, which might be of independent interest.
△ Less
Submitted 26 July, 2023;
originally announced July 2023.
-
$C^{k}$ extension and invariant manifolds for the compactification of nonautonomous systems with autonomous limits
Authors:
Shuang Chen,
Jinqiao Duan
Abstract:
We study the compactification of nonautonomous systems with autonomous limits and related dynamics. Although the $C^{1}$ extension of the compactification was well established, a great number of problems arising in bifurcation and stability analysis require the compactified systems with high-order smoothness. Inspired by this, we give a criterion for the $C^{k}$ ($k\geq 2$) extension of the compac…
▽ More
We study the compactification of nonautonomous systems with autonomous limits and related dynamics. Although the $C^{1}$ extension of the compactification was well established, a great number of problems arising in bifurcation and stability analysis require the compactified systems with high-order smoothness. Inspired by this, we give a criterion for the $C^{k}$ ($k\geq 2$) extension of the compactification. After compactifying nonautonomous systems, the compactified systems may gain an additional center direction. We prove the existence and uniqueness of center or center-stable manifolds for general compact invariant sets including normally hyperbolic invariant manifolds and admissible sets.
△ Less
Submitted 19 December, 2023; v1 submitted 20 July, 2023;
originally announced July 2023.
-
Continuous Non-monotone DR-submodular Maximization with Down-closed Convex Constraint
Authors:
Shengminjie Chen,
Donglei Du,
Wenguo Yang,
Dachuan Xu,
Suixiang Gao
Abstract:
We investigate the continuous non-monotone DR-submodular maximization problem subject to a down-closed convex solvable constraint. Our first contribution is to construct an example to demonstrate that (first-order) stationary points can have arbitrarily bad approximation ratios, and they are usually on the boundary of the feasible domain. These findings are in contrast with the monotone case where…
▽ More
We investigate the continuous non-monotone DR-submodular maximization problem subject to a down-closed convex solvable constraint. Our first contribution is to construct an example to demonstrate that (first-order) stationary points can have arbitrarily bad approximation ratios, and they are usually on the boundary of the feasible domain. These findings are in contrast with the monotone case where any stationary point yields a $1/2$-approximation (Hassani et al. (2017)). Moreover, this example offers insights on how to design improved algorithms by avoiding bad stationary points, such as the restricted continuous local search algorithm (Chekuri et al. (2014)) and the aided measured continuous greedy (Buchbinder and Feldman (2019)). However, the analyses in the last two algorithms only work for the discrete domain because both need to invoke the inequality that the multilinear extension of any submodular set function is bounded from below by its Lovasz extension. Our second contribution, therefore, is to remove this restriction and show that both algorithms can be extended to the continuous domain while retaining the same approximation ratios, and hence offering improved approximation ratios over those in Bian et al. (2017a). for the same problem. At last, we also include numerical experiments to demonstrate our algorithms on problems arising from machine learning and artificial intelligence.
△ Less
Submitted 26 March, 2024; v1 submitted 13 July, 2023;
originally announced July 2023.
-
Mixed state branching evolution for cell division models
Authors:
Shukai Chen,
Lina Ji,
Jie Xiong
Abstract:
We prove a scaling limit theorem for two-type Galton-Waston branching processes with interaction. The limit theorem gives rise to a class of mixed state branching processes with interaction using to simulate the evolution for cell division affected by parasites. Such process can also be obtained by the pathwise unique solution to a stochastic equation system. Moreover, we present sufficient condit…
▽ More
We prove a scaling limit theorem for two-type Galton-Waston branching processes with interaction. The limit theorem gives rise to a class of mixed state branching processes with interaction using to simulate the evolution for cell division affected by parasites. Such process can also be obtained by the pathwise unique solution to a stochastic equation system. Moreover, we present sufficient conditions for extinction with probability one and the exponential ergodicity in the total variation distance of such process.
△ Less
Submitted 19 November, 2023; v1 submitted 3 July, 2023;
originally announced July 2023.
-
Global regularity in the Monge-Ampère obstacle problem
Authors:
Shibing Chen,
Jiakun Liu,
Xianduo Wang
Abstract:
In this paper, we establish the global $W^{2,p}$ estimate for the Monge-Ampère obstacle problem: $(Du)_{\sharp}fχ{_{\{u>\frac{1}{2}|x|^2\}}}=g$, where $f$ and $g$ are positive continuous functions supported in disjoint bounded $C^2$ uniformly convex domains $\overlineΩ$ and $\overline{Ω^*}$, respectively. Furthermore, we assume that $\int_Ωf\geq \int_{Ω^*}g$. The main result shows that…
▽ More
In this paper, we establish the global $W^{2,p}$ estimate for the Monge-Ampère obstacle problem: $(Du)_{\sharp}fχ{_{\{u>\frac{1}{2}|x|^2\}}}=g$, where $f$ and $g$ are positive continuous functions supported in disjoint bounded $C^2$ uniformly convex domains $\overlineΩ$ and $\overline{Ω^*}$, respectively. Furthermore, we assume that $\int_Ωf\geq \int_{Ω^*}g$. The main result shows that $Du:\overline U\rightarrow\overline{Ω^*}$, where $ U=\{u>\frac{1}{2}|x|^2\}$, is a $W^{1, p}$ diffeomorphism for any $p\in(1,\infty)$. Previously, it was only known to be a continuous homeomorphism according to Caffarelli and McCann \cite{CM}. It is worth noting that our result is sharp, as we can construct examples showing that even with the additional assumption of smooth densities, the optimal map $Du$ is not Lipschitz. This obstacle problem arises naturally in optimal partial transportation.
△ Less
Submitted 1 July, 2023;
originally announced July 2023.
-
Correct order on some certain weighted representation functions
Authors:
Shi--Qiang Chen,
Yuchen Ding,
Xiaodong Lü,
Yuhan Zhang
Abstract:
Let $\mathbb{N}$ be the set of all nonnegative integers. For any positive integer $k$ and any subset $A$ of nonnegative integers, let $r_{1,k}(A,n)$ be the number of solutions $(a_1,a_2)$ to the equation $n=a_1+ka_2$. In 2016, Qu proved that $$\liminf_{n\rightarrow\infty}r_{1,k}(A,n)=\infty$$ providing that $r_{1,k}(A,n)=r_{1,k}(\mathbb{N}\setminus A,n)$ for all sufficiently large integers, which…
▽ More
Let $\mathbb{N}$ be the set of all nonnegative integers. For any positive integer $k$ and any subset $A$ of nonnegative integers, let $r_{1,k}(A,n)$ be the number of solutions $(a_1,a_2)$ to the equation $n=a_1+ka_2$. In 2016, Qu proved that $$\liminf_{n\rightarrow\infty}r_{1,k}(A,n)=\infty$$ providing that $r_{1,k}(A,n)=r_{1,k}(\mathbb{N}\setminus A,n)$ for all sufficiently large integers, which answered affirmatively a 2012 problem of Yang and Chen. In a very recent article, another Chen (the first named author) slightly improved Qu's result and obtained that $$\liminf_{n\rightarrow\infty}\frac{r_{1,k}(A,n)}{\log n}>0.$$ In this note, we further improve the lower bound on $r_{1,k}(A,n)$ by showing that $$\liminf_{n\rightarrow\infty}\frac{r_{1,k}(A,n)}{n}>0.$$ Our bound reflects the correct order of magnitude of the representation function $r_{1,k}(A,n)$ under the above restrictions due to the trivial fact that $r_{1,k}(A,n)\le n/k.$
△ Less
Submitted 11 September, 2023; v1 submitted 29 June, 2023;
originally announced June 2023.
-
The lower bound of weighted representation function
Authors:
Shi-Qiang Chen
Abstract:
For any given set $A$ of nonnegative integers and for any given two positive integers $k_1,k_2$, $R_{k_1,k_2}(A,n)$ is defined as the number of solutions of the equation $n=k_1a_1+k_2a_2$ with $a_1,a_2\in A$. In this paper, we prove that if integer $k\geq2$ and set $A\subseteq\mathbb{N}$ such that $R_{1,k}(A,n)=R_{1,k}(\mathbb{N}\setminus A,n)$ holds for all integers $n\geq n_0$, then…
▽ More
For any given set $A$ of nonnegative integers and for any given two positive integers $k_1,k_2$, $R_{k_1,k_2}(A,n)$ is defined as the number of solutions of the equation $n=k_1a_1+k_2a_2$ with $a_1,a_2\in A$. In this paper, we prove that if integer $k\geq2$ and set $A\subseteq\mathbb{N}$ such that $R_{1,k}(A,n)=R_{1,k}(\mathbb{N}\setminus A,n)$ holds for all integers $n\geq n_0$, then $R_{1,k}(A,n)\gg \log n$.
△ Less
Submitted 28 June, 2023;
originally announced June 2023.
-
D-finiteness, rationality, and height III: multivariate Pólya-Carlson dichotomy
Authors:
Jason P. Bell,
Shaoshi Chen,
Khoa D. Nguyen,
Umberto Zannier
Abstract:
We prove a result that can be seen as an analogue of the Pólya-Carlson theorem for multivariate D-finite power series with coefficients in $\bar{\mathbb{Q}}$. In the special case that the coefficients are algebraic integers, our main result says that if $$F(x_1,\ldots ,x_m)=\sum f(n_1,\ldots ,n_m)x_1^{n_1}\cdots x_m^{n_m}$$ is a D-finite power series in $m$ variables with algebraic integer coeffic…
▽ More
We prove a result that can be seen as an analogue of the Pólya-Carlson theorem for multivariate D-finite power series with coefficients in $\bar{\mathbb{Q}}$. In the special case that the coefficients are algebraic integers, our main result says that if $$F(x_1,\ldots ,x_m)=\sum f(n_1,\ldots ,n_m)x_1^{n_1}\cdots x_m^{n_m}$$ is a D-finite power series in $m$ variables with algebraic integer coefficients and if the logarithmic Weil height of $f(n_1,\ldots ,n_m)$ is $o(n_1+\cdots +n_m)$, then $F$ is a rational function and, up to scalar multiplication, every irreducible factor of the denominator of $F$ has the form $1-ζx_1^{q_1}\cdots x_m^{q_m}$ where $ζ$ is a root of unity and $q_1,\ldots ,q_m$ are nonnegative integers, not all of which are zero.
△ Less
Submitted 5 June, 2023;
originally announced June 2023.