subscribe to arXiv mailings

Differentiable Voxelization and Mesh Morphing

Authors: Yihao Luo, Yikai Wang, Zhengrui Xiang, Yuliang Xiu, Guang Yang, ChoonHwai Yap

Abstract: In this paper, we propose the differentiable voxelization of 3D meshes via the winding number and solid angles. The proposed approach achieves fast, flexible, and accurate voxelization of 3D meshes, admitting the computation of gradients with respect to the input mesh and GPU acceleration. We further demonstrate the application of the proposed voxelization in mesh morphing, where the voxelized mes… ▽ More In this paper, we propose the differentiable voxelization of 3D meshes via the winding number and solid angles. The proposed approach achieves fast, flexible, and accurate voxelization of 3D meshes, admitting the computation of gradients with respect to the input mesh and GPU acceleration. We further demonstrate the application of the proposed voxelization in mesh morphing, where the voxelized mesh is deformed by a neural network. The proposed method is evaluated on the ShapeNet dataset and achieves state-of-the-art performance in terms of both accuracy and efficiency. △ Less

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.01281 [pdf, other]

Bridging Smoothness and Approximation: Theoretical Insights into Over-Smoothing in Graph Neural Networks

Authors: Guangrui Yang, Jianfei Li, Ming Li, Han Feng, Ding-Xuan Zhou

Abstract: In this paper, we explore the approximation theory of functions defined on graphs. Our study builds upon the approximation results derived from the $K$-functional. We establish a theoretical framework to assess the lower bounds of approximation for target functions using Graph Convolutional Networks (GCNs) and examine the over-smoothing phenomenon commonly observed in these networks. Initially, we… ▽ More In this paper, we explore the approximation theory of functions defined on graphs. Our study builds upon the approximation results derived from the $K$-functional. We establish a theoretical framework to assess the lower bounds of approximation for target functions using Graph Convolutional Networks (GCNs) and examine the over-smoothing phenomenon commonly observed in these networks. Initially, we introduce the concept of a $K$-functional on graphs, establishing its equivalence to the modulus of smoothness. We then analyze a typical type of GCN to demonstrate how the high-frequency energy of the output decays, an indicator of over-smoothing. This analysis provides theoretical insights into the nature of over-smoothing within GCNs. Furthermore, we establish a lower bound for the approximation of target functions by GCNs, which is governed by the modulus of smoothness of these functions. This finding offers a new perspective on the approximation capabilities of GCNs. In our numerical experiments, we analyze several widely applied GCNs and observe the phenomenon of energy decay. These observations corroborate our theoretical results on exponential decay order. △ Less

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2406.17763 [pdf, other]

DiffusionPDE: Generative PDE-Solving Under Partial Observation

Authors: Jiahe Huang, Guandao Yang, Zichen Wang, Jeong Joon Park

Abstract: We introduce a general framework for solving partial differential equations (PDEs) using generative diffusion models. In particular, we focus on the scenarios where we do not have the full knowledge of the scene necessary to apply classical solvers. Most existing forward or inverse PDE approaches perform poorly when the observations on the data or the underlying coefficients are incomplete, which… ▽ More We introduce a general framework for solving partial differential equations (PDEs) using generative diffusion models. In particular, we focus on the scenarios where we do not have the full knowledge of the scene necessary to apply classical solvers. Most existing forward or inverse PDE approaches perform poorly when the observations on the data or the underlying coefficients are incomplete, which is a common assumption for real-world measurements. In this work, we propose DiffusionPDE that can simultaneously fill in the missing information and solve a PDE by modeling the joint distribution of the solution and coefficient spaces. We show that the learned generative priors lead to a versatile framework for accurately solving a wide range of PDEs under partial observation, significantly outperforming the state-of-the-art methods for both forward and inverse directions. △ Less

Submitted 25 June, 2024; originally announced June 2024.

Comments: Project page: https://jhhuangchloe.github.io/Diffusion-PDE/

arXiv:2404.18051 [pdf, ps, other]

Liouville type theorems for the 3D stationary MHD and Hall-MHD equations with non-zero constant vectors at infinity

Authors: Wendong Wang, Guoxu Yang

Abstract: In this paper, we investigate Liouville type theorems for the three-dimensional steady-state MHD or Hall-MHD system under some asymptotic assumptions at infinity. Firstly, for the Hall-MHD system we obtain that $u$ and $B$ are constant vectors for any fluid viscosity, magnetic resistivity or Hall-coefficient when the magnetic field $B$ tends to a non-zero constant vector at infinity while the velo… ▽ More In this paper, we investigate Liouville type theorems for the three-dimensional steady-state MHD or Hall-MHD system under some asymptotic assumptions at infinity. Firstly, for the Hall-MHD system we obtain that $u$ and $B$ are constant vectors for any fluid viscosity, magnetic resistivity or Hall-coefficient when the magnetic field $B$ tends to a non-zero constant vector at infinity while the velocity field $u$ tends to $0$. Secondly, it also follows that $u$ and $B$ are constant for the Hall-MHD system when the velocity field tends to a constant vector at infinity while the magnetic field tends to $0$ without any assumptions on viscosity, magnetic resistivity or Hall-coefficient. One main difficulty lies in the Hall term, and we obtain the $L^p$ estimates of a generalized Oseen system with some supercritical terms via Lizorkin's theory and prove that the operator is stable by exploring Kato's stability theorem. Moreover, some similar results for the degenerate fluid viscosity or magnetic resistivity for the MHD system are also obtained, which is independent of interest. △ Less

Submitted 27 April, 2024; originally announced April 2024.

arXiv:2404.00697 [pdf]

A Lane Usage Strategy for General Traffic Access on Bus Lanes under Mixed Traffic Environment

Authors: Haoran Li, Zhenzhou Yuan, Rui Yue, Guangchuan Yang, Chuang Zhu, Siyuan Chen

Abstract: The strategy of permitting general traffic to use the bus lane for improved utilization while ensuring bus priority has gained increasingly attention, particularly with the support of vehicle-to-everything technology. In this study, we propose a novel lane usage strategy called Dynamic Spatial-Temporal Priority (DSTP) to ensure bus priority and optimize bus lane usage in a mixed traffic environmen… ▽ More The strategy of permitting general traffic to use the bus lane for improved utilization while ensuring bus priority has gained increasingly attention, particularly with the support of vehicle-to-everything technology. In this study, we propose a novel lane usage strategy called Dynamic Spatial-Temporal Priority (DSTP) to ensure bus priority and optimize bus lane usage in a mixed traffic environment. DSTP leverages dynamic methods to identify available spatial-temporal resources in the lane, utilizing signal timing, road information, and vehicle data. A Right-of-Way assignment optimization model is then developed based on these resources to determine which vehicles can enter the bus lane. The model is dynamically enacted using a rolling horizon scheme to accommodate time-varying traffic conditions. Numerical studies have validated the advantages of DSTP, showing maintained bus priority, improved traffic efficiency, reduced fuel consumption, and lower CO2 emissions, especially during periods of high traffic demand and concentrated bus arrivals. △ Less

Submitted 31 March, 2024; originally announced April 2024.

Comments: 16 pages, 22 figures

arXiv:2402.03541 [pdf, other]

HAMLET: Graph Transformer Neural Operator for Partial Differential Equations

Authors: Andrey Bryutkin, Jiahao Huang, Zhongying Deng, Guang Yang, Carola-Bibiane Schönlieb, Angelica Aviles-Rivero

Abstract: We present a novel graph transformer framework, HAMLET, designed to address the challenges in solving partial differential equations (PDEs) using neural networks. The framework uses graph transformers with modular input encoders to directly incorporate differential equation information into the solution process. This modularity enhances parameter correspondence control, making HAMLET adaptable to… ▽ More We present a novel graph transformer framework, HAMLET, designed to address the challenges in solving partial differential equations (PDEs) using neural networks. The framework uses graph transformers with modular input encoders to directly incorporate differential equation information into the solution process. This modularity enhances parameter correspondence control, making HAMLET adaptable to PDEs of arbitrary geometries and varied input formats. Notably, HAMLET scales effectively with increasing data complexity and noise, showcasing its robustness. HAMLET is not just tailored to a single type of physical simulation, but can be applied across various domains. Moreover, it boosts model resilience and performance, especially in scenarios with limited data. We demonstrate, through extensive experiments, that our framework is capable of outperforming current techniques for PDEs. △ Less

Submitted 5 February, 2024; originally announced February 2024.

Comments: 17 pages, 7 figures, 6 tables

arXiv:2310.06560 [pdf, ps, other]

On friendship and cyclic parking functions

Authors: Yujia Kang, Thomas Selig, Guanyi Yang, Yanting Zhang, Haoyue Zhu

Abstract: In parking problems, a given number of cars enter a one-way street sequentially, and try to park according to a specified preferred spot in the street. Various models are possible depending on the chosen rule for collisions, when two cars have the same preferred spot. In classical parking functions, if a car's preferred spot is already occupied by a previous car, it drives forward and looks for th… ▽ More In parking problems, a given number of cars enter a one-way street sequentially, and try to park according to a specified preferred spot in the street. Various models are possible depending on the chosen rule for collisions, when two cars have the same preferred spot. In classical parking functions, if a car's preferred spot is already occupied by a previous car, it drives forward and looks for the first unoccupied spot to park. In this work, we introduce a variant of classical parking functions, called "friendship parking functions", which imposes additional restrictions on where cars can park. Namely, a car can only end up parking next to cars which are its friends (friendship will correspond to adjacency in an underlying graph). We characterise and enumerate such friendship parking functions according to their outcome permutation, which describes the final configuration when all cars have parked. We apply this to the case where the underlying friendship graph is the cycle graph. Finally, we consider a subset of classical parking functions, called "cyclic parking functions", where cars end up in an increasing cyclic order. We enumerate these cyclic parking functions and exhibit a bijection to permutation components. △ Less

Submitted 4 January, 2024; v1 submitted 10 October, 2023; originally announced October 2023.

Comments: 19 pages, 5 figures

MSC Class: 05A19 (Primary) 05A15; 05A05; 05C30 (Secondary)

arXiv:2310.02808 [pdf, other]

Probabilistic Method to Fundamental gap problems on the sphere

Authors: Gunhee Cho, Guofang Wei, Guang Yang

Abstract: We provide a probabilistic proof of the fundamental gap estimate for Schrödinger operators in convex domains on the sphere, which extends the probabilistic proof of F. Gong, H. Li, and D. Luo for the Euclidean case. Our results further generalize the results achieved for the Laplacian by S. Seto, L. Wang, and G. Wei, as well as by C. He, G. Wei, and Qi S. Zhang. The essential ingredient in our ana… ▽ More We provide a probabilistic proof of the fundamental gap estimate for Schrödinger operators in convex domains on the sphere, which extends the probabilistic proof of F. Gong, H. Li, and D. Luo for the Euclidean case. Our results further generalize the results achieved for the Laplacian by S. Seto, L. Wang, and G. Wei, as well as by C. He, G. Wei, and Qi S. Zhang. The essential ingredient in our analysis is the reflection coupling method on Riemannian manifolds. △ Less

Submitted 4 October, 2023; originally announced October 2023.

arXiv:2310.02244 [pdf, other]

Tensor Programs VI: Feature Learning in Infinite-Depth Neural Networks

Authors: Greg Yang, Dingli Yu, Chen Zhu, Soufiane Hayou

Abstract: By classifying infinite-width neural networks and identifying the *optimal* limit, Tensor Programs IV and V demonstrated a universal way, called $μ$P, for *widthwise hyperparameter transfer*, i.e., predicting optimal hyperparameters of wide neural networks from narrow ones. Here we investigate the analogous classification for *depthwise parametrizations* of deep residual networks (resnets). We cla… ▽ More By classifying infinite-width neural networks and identifying the *optimal* limit, Tensor Programs IV and V demonstrated a universal way, called $μ$P, for *widthwise hyperparameter transfer*, i.e., predicting optimal hyperparameters of wide neural networks from narrow ones. Here we investigate the analogous classification for *depthwise parametrizations* of deep residual networks (resnets). We classify depthwise parametrizations of block multiplier and learning rate by their infinite-width-then-depth limits. In resnets where each block has only one layer, we identify a unique optimal parametrization, called Depth-$μ$P that extends $μ$P and show empirically it admits depthwise hyperparameter transfer. We identify *feature diversity* as a crucial factor in deep networks, and Depth-$μ$P can be characterized as maximizing both feature learning and feature diversity. Exploiting this, we find that absolute value, among all homogeneous nonlinearities, maximizes feature diversity and indeed empirically leads to significantly better performance. However, if each block is deeper (such as modern transformers), then we find fundamental limitations in all possible infinite-depth limits of such parametrizations, which we illustrate both theoretically and empirically on simple networks as well as Megatron transformer trained on Common Crawl. △ Less

Submitted 12 October, 2023; v1 submitted 3 October, 2023; originally announced October 2023.

arXiv:2308.01814 [pdf, other]

Tensor Programs IVb: Adaptive Optimization in the Infinite-Width Limit

Authors: Greg Yang, Etai Littwin

Abstract: Going beyond stochastic gradient descent (SGD), what new phenomena emerge in wide neural networks trained by adaptive optimizers like Adam? Here we show: The same dichotomy between feature learning and kernel behaviors (as in SGD) holds for general optimizers as well, including Adam -- albeit with a nonlinear notion of "kernel." We derive the corresponding "neural tangent" and "maximal update" lim… ▽ More Going beyond stochastic gradient descent (SGD), what new phenomena emerge in wide neural networks trained by adaptive optimizers like Adam? Here we show: The same dichotomy between feature learning and kernel behaviors (as in SGD) holds for general optimizers as well, including Adam -- albeit with a nonlinear notion of "kernel." We derive the corresponding "neural tangent" and "maximal update" limits for any architecture. Two foundational advances underlie the above results: 1) A new Tensor Program language, NEXORT, that can express how adaptive optimizers process gradients into updates. 2) The introduction of bra-ket notation to drastically simplify expressions and calculations in Tensor Programs. This work summarizes and generalizes all previous results in the Tensor Programs series of papers. △ Less

Submitted 7 August, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

Comments: This is the complete version of "Adaptive Optimization in the Infinite-Width Limit" in ICLR 2023, https://openreview.net/forum?id=zgVDqw9ZUES

arXiv:2307.02062 [pdf, other]

Convergence Analysis for Restarted Anderson Mixing and Beyond

Authors: Fuchao Wei, Chenglong Bao, Yang Liu, Guangwen Yang

Abstract: Anderson mixing (AM) is a classical method that can accelerate fixed-point iterations by exploring historical information. Despite the successful application of AM in scientific computing, the theoretical properties of AM are still under exploration. In this paper, we study the restarted version of the Type-I and Type-II AM methods, i.e., restarted AM. With a multi-step analysis, we give a unified… ▽ More Anderson mixing (AM) is a classical method that can accelerate fixed-point iterations by exploring historical information. Despite the successful application of AM in scientific computing, the theoretical properties of AM are still under exploration. In this paper, we study the restarted version of the Type-I and Type-II AM methods, i.e., restarted AM. With a multi-step analysis, we give a unified convergence analysis for the two types of restarted AM and justify that the restarted Type-II AM can locally improve the convergence rate of the fixed-point iteration. Furthermore, we propose an adaptive mixing strategy by estimating the spectrum of the Jacobian matrix. If the Jacobian matrix is symmetric, we develop the short-term recurrence forms of restarted AM to reduce the memory cost. Finally, experimental results on various problems validate our theoretical findings. △ Less

Submitted 5 July, 2023; originally announced July 2023.

arXiv:2306.05707 [pdf, ps, other]

On the Mathematics of RNA Velocity II: Algorithmic Aspects

Authors: Tiejun Li, Yizhuo Wang, Guoguo Yang, Peijie Zhou

Abstract: In a previous paper [CSIAM Trans. Appl. Math. 2 (2021), 1-55], the authors proposed a theoretical framework for the analysis of RNA velocity, which is a promising concept in scRNA-seq data analysis to reveal the cell state-transition dynamical processes underlying snapshot data. The current paper is devoted to the algorithmic study of some key components in RNA velocity workflow. Four important po… ▽ More In a previous paper [CSIAM Trans. Appl. Math. 2 (2021), 1-55], the authors proposed a theoretical framework for the analysis of RNA velocity, which is a promising concept in scRNA-seq data analysis to reveal the cell state-transition dynamical processes underlying snapshot data. The current paper is devoted to the algorithmic study of some key components in RNA velocity workflow. Four important points are addressed in this paper: (1) We construct a rational time-scale fixation method which can determine the global gene-shared latent time for cells. (2) We present an uncertainty quantification strategy for the inferred parameters obtained through the EM algorithm. (3) We establish the optimal criterion for the choice of velocity kernel bandwidth with respect to the sample size in the downstream analysis and discuss its implications. (4) We propose a temporal distance estimation approach between two cell clusters along the cellular development path. Some illustrative numerical tests are also carried out to verify our analysis. These results are intended to provide tools and insights in further development of RNA velocity type methods in the future. △ Less

Submitted 9 June, 2023; originally announced June 2023.

Comments: 32 pages, 5 figures

arXiv:2303.16872 [pdf, ps, other]

Multi-dimensional Mean-field Type Backward Stochastic Differential Equations with Diagonally Quadratic Generators

Authors: Shanjian Tang, Guang Yang

Abstract: In this paper, we study the multi-dimensional backward stochastic differential equations (BSDEs) whose generator depends also on the mean of both variables. When the generator is diagonally quadratic, we prove that the BSDE admits a unique local solution with a fixed point argument. When the generator has a logarithmic growth of the off-diagonal elements (i.e., for each $i$, the $i$-th component o… ▽ More In this paper, we study the multi-dimensional backward stochastic differential equations (BSDEs) whose generator depends also on the mean of both variables. When the generator is diagonally quadratic, we prove that the BSDE admits a unique local solution with a fixed point argument. When the generator has a logarithmic growth of the off-diagonal elements (i.e., for each $i$, the $i$-th component of the generator has a logarithmic growth of the $j$-th row $z^j$ of the variable $z$ for each $j \neq i$), we give a new apriori estimate and obtain the existence and uniqueness of the global solution. △ Less

Submitted 30 March, 2023; v1 submitted 29 March, 2023; originally announced March 2023.

Comments: 16 pages. arXiv admin note: text overlap with arXiv:2302.12470

MSC Class: 60H10

arXiv:2303.11659 [pdf, ps, other]

doi 10.1137/22M152013X

Formulae for mixed moments of Wiener processes and a stochastic area integral

Authors: Yoshio Komori, Guoguo Yang, Kevin Burrage

Abstract: This paper deals with the expectation of monomials with respect to the stochastic area integral $A_{1,2}(t,t+h)=\int_{t}^{t+h}\int_{t}^{s}{\rm d} W_{1}(r){\rm d} W_{2}(s) -\int_{t}^{t+h}\int_{t}^{s}{\rm d} W_{2}(r){\rm d} W_{1}(s)$ and the increments of two Wiener processes, $Δ{W}_{i}(t,t+h)=W_{i}(t+h)-W_{i}(t),\ i=1,2$. In a monomial, if the exponent of one of the Wiener increments or the stochas… ▽ More This paper deals with the expectation of monomials with respect to the stochastic area integral $A_{1,2}(t,t+h)=\int_{t}^{t+h}\int_{t}^{s}{\rm d} W_{1}(r){\rm d} W_{2}(s) -\int_{t}^{t+h}\int_{t}^{s}{\rm d} W_{2}(r){\rm d} W_{1}(s)$ and the increments of two Wiener processes, $Δ{W}_{i}(t,t+h)=W_{i}(t+h)-W_{i}(t),\ i=1,2$. In a monomial, if the exponent of one of the Wiener increments or the stochastic area integral is an odd number, then the expectation of the monomial is zero. However, if the exponent of any of them is an even number, then the expectation is nonzero and its exact value is not known in general. In the present paper, we derive formulae to give the value in general. As an application of the formulae, we will utilize the formulae for a careful stability analysis on a Magnus-type Milstein method. As another application, we will give some mixed moments of the increments of Wiener processes and stochastic double integrals. △ Less

Submitted 21 March, 2023; originally announced March 2023.

Comments: This is a preprint of a paper, which has been accepted for publication in SIAM Journal on Numerical Analysis

MSC Class: 60H10; 60H30; 65C30

arXiv:2303.09016 [pdf, ps, other]

Chaos processes as rough paths

Authors: Guang Yang

Abstract: In this article we investigate the rough paths structure of a process $X_t$ living in a fixed Wiener chaos. Specifically, we formulate various types of rough lifts of $X_t$ and study their properties. As application, we study the integrabilities of quantities related to rough differential equations driven by $X_t$. In this article we investigate the rough paths structure of a process $X_t$ living in a fixed Wiener chaos. Specifically, we formulate various types of rough lifts of $X_t$ and study their properties. As application, we study the integrabilities of quantities related to rough differential equations driven by $X_t$. △ Less

Submitted 15 March, 2023; originally announced March 2023.

Comments: 34 pages

arXiv:2302.12470 [pdf, ps, other]

Multi-dimensional Backward Stochastic Differential Equations of Diagonally Quadratic Generators with a Special Structure

Authors: Guang Yang

Abstract: The present paper is devoted to the well-posedness of a type of multi-dimensional backward stochastic differential equations (BSDEs) with a diagonally quadratic generator. We give a new priori estimate, and prove that the BSDE admits a unique solution on a given interval when the generator has a sufficiently small growth of the off-diagonal elements (i.e., for each $i$, the $i$-th component of the… ▽ More The present paper is devoted to the well-posedness of a type of multi-dimensional backward stochastic differential equations (BSDEs) with a diagonally quadratic generator. We give a new priori estimate, and prove that the BSDE admits a unique solution on a given interval when the generator has a sufficiently small growth of the off-diagonal elements (i.e., for each $i$, the $i$-th component of the generator has a small growth of the $j$-th row $z^j$ of the variable $z$ for each $j \neq i$). Finally, we give a solvability result when the diagonally quadratic generator is triangular. △ Less

Submitted 15 April, 2024; v1 submitted 24 February, 2023; originally announced February 2023.

Comments: 14 pages

MSC Class: 60H10

arXiv:2301.12483 [pdf, other]

Topological entropy of switched nonlinear and interconnected systems

Authors: Guosong Yang, Daniel Liberzon, João P. Hespanha

Abstract: A general upper bound for topological entropy of switched nonlinear systems is constructed, using an asymptotic average of upper limits of the matrix measures of Jacobian matrices of strongly persistent individual modes, weighted by their active rates. A general lower bound is constructed as well, using a similar weighted average of lower limits of the traces of these Jacobian matrices. In a case… ▽ More A general upper bound for topological entropy of switched nonlinear systems is constructed, using an asymptotic average of upper limits of the matrix measures of Jacobian matrices of strongly persistent individual modes, weighted by their active rates. A general lower bound is constructed as well, using a similar weighted average of lower limits of the traces of these Jacobian matrices. In a case of interconnected structure, the general upper bound is readily applied to derive upper bounds for entropy that depend only on "network-level" information. In a case of block-diagonal structure, less conservative upper and lower bounds for entropy are constructed. In each case, upper bounds for entropy that require less information about the switching signal are also derived. The upper bounds for entropy and their relations are illustrated by numerical examples of a switched Lotka-Volterra ecosystem model. △ Less

Submitted 29 January, 2023; originally announced January 2023.

Comments: 31 pages, 3 figures

MSC Class: 37B40; 93C30 (Primary) 93C57 (Secondary)

arXiv:2301.02291 [pdf, other]

doi 10.1007/s10463-022-00864-0

Least absolute deviation estimation for AR(1) processes with roots close to unity

Authors: Nannan Ma, Hailin Sang, Guangyu Yang

Abstract: We establish the asymptotic theory of least absolute deviation estimators for AR(1) processes with autoregressive parameter satisfying $n(ρ_n-1)\toγ$ for some fixed $γ$ as $n\to\infty$, which is parallel to the results of ordinary least squares estimators developed by Andrews and Guggenberger (2008) in the case $γ=0$ or Chan and Wei (1987) and Phillips (1987) in the case $γ\ne 0$. Simulation exper… ▽ More We establish the asymptotic theory of least absolute deviation estimators for AR(1) processes with autoregressive parameter satisfying $n(ρ_n-1)\toγ$ for some fixed $γ$ as $n\to\infty$, which is parallel to the results of ordinary least squares estimators developed by Andrews and Guggenberger (2008) in the case $γ=0$ or Chan and Wei (1987) and Phillips (1987) in the case $γ\ne 0$. Simulation experiments are conducted to confirm the theoretical results and to demonstrate the robustness of the least absolute deviation estimation. △ Less

Submitted 5 January, 2023; originally announced January 2023.

Comments: accepted by Annals of the Institute of Statistical Mathematics, 29 pages, 8 figures, 4 tables

MSC Class: 62M10; 62F12

arXiv:2301.00507 [pdf, ps, other]

On Geodesics of Sprays and Projective Completeness

Authors: Guojun Yang

Abstract: Geodesics, which play an important role in spray-Finsler geometry, are integral curves of a spray vector field on a manifold. Some comparison theorems and rigidity issues are established on the completeness of geodesics of a spray or a Finsler metric. In this paper, projectively flat sprays with weak Ricci constant (eps. constant curvature) are classified at the level of geodesics. Further, a geod… ▽ More Geodesics, which play an important role in spray-Finsler geometry, are integral curves of a spray vector field on a manifold. Some comparison theorems and rigidity issues are established on the completeness of geodesics of a spray or a Finsler metric. In this paper, projectively flat sprays with weak Ricci constant (eps. constant curvature) are classified at the level of geodesics. Further, a geodesic method is introduced to determine an $n$-dimensional spray based on a family of curves with $2(n-1)$ free constant parameters as geodesics. Finally, it shows that a spray is projectively complete under certain condition satisfied by the domain of geodesic parameter of all geodesics. △ Less

Submitted 1 January, 2023; originally announced January 2023.

Comments: 16 pages

MSC Class: 53B40; 53C60

arXiv:2211.16578 [pdf, other]

ButterflyNet2D: Bridging Classical Methods and Neural Network Methods in Image Processing

Authors: Gengzhi Yang, Yingzhou Li

Abstract: Both classical Fourier transform-based methods and neural network methods are widely used in image processing tasks. The former has better interpretability, whereas the latter often achieves better performance in practice. This paper introduces ButterflyNet2D, a regular CNN with sparse cross-channel connections. A Fourier initialization strategy for ButterflyNet2D is proposed to approximate Fourie… ▽ More Both classical Fourier transform-based methods and neural network methods are widely used in image processing tasks. The former has better interpretability, whereas the latter often achieves better performance in practice. This paper introduces ButterflyNet2D, a regular CNN with sparse cross-channel connections. A Fourier initialization strategy for ButterflyNet2D is proposed to approximate Fourier transforms. Numerical experiments validate the accuracy of ButterflyNet2D approximating both the Fourier and the inverse Fourier transforms. Moreover, through four image processing tasks and image datasets, we show that training ButterflyNet2D from Fourier initialization does achieve better performance than random initialized neural networks. △ Less

Submitted 29 November, 2022; originally announced November 2022.

arXiv:2211.06197 [pdf, ps, other]

A convergence study of SGD-type methods for stochastic optimization

Authors: Tiannan Xiao, Guoguo Yang

Abstract: In this paper, we first reinvestigate the convergence of vanilla SGD method in the sense of $L^2$ under more general learning rates conditions and a more general convex assumption, which relieves the conditions on learning rates and do not need the problem to be strongly convex. Then, by taking advantage of the Lyapunov function technique, we present the convergence of the momentum SGD and Nestero… ▽ More In this paper, we first reinvestigate the convergence of vanilla SGD method in the sense of $L^2$ under more general learning rates conditions and a more general convex assumption, which relieves the conditions on learning rates and do not need the problem to be strongly convex. Then, by taking advantage of the Lyapunov function technique, we present the convergence of the momentum SGD and Nesterov accelerated SGD methods for the convex and non-convex problem under $L$-smooth assumption that extends the bounded gradient limitation to a certain extent. The convergence of time averaged SGD was also analyzed. △ Less

Submitted 9 June, 2023; v1 submitted 11 November, 2022; originally announced November 2022.

Comments: 14 pages

MSC Class: 60F05; 60J22; 37N40

arXiv:2211.02221 [pdf, ps, other]

Gorenstein homological dimension and some invariants of groups

Authors: Wei Ren, Gang Yang

Abstract: For any group $G$, the Gorenstein homological dimension ${\rm Ghd}_RG$ is defined to be the Gorenstein flat dimension of the coefficient ring $R$, which is considered as an $RG$-module with trivial group action. We prove that ${\rm Ghd}_RG < \infty$ if and only if the Gorenstein flat dimension of any $RG$-module is finite, if and only if there exists an $R$-pure $RG$-monic $R\rightarrow A$ with… ▽ More For any group $G$, the Gorenstein homological dimension ${\rm Ghd}_RG$ is defined to be the Gorenstein flat dimension of the coefficient ring $R$, which is considered as an $RG$-module with trivial group action. We prove that ${\rm Ghd}_RG < \infty$ if and only if the Gorenstein flat dimension of any $RG$-module is finite, if and only if there exists an $R$-pure $RG$-monic $R\rightarrow A$ with $A$ being $R$-flat and ${\rm Ghd}_RG = {\rm fd}_{RG}A$, where $R$ is a commutative ring with finite Gorenstein weak global dimension. As applications, properties of ${\rm Ghd}$ on subgroup, quotient group, extension of groups as well as Weyl group are investigated. Moreover, we compare the relations between some invariants such as ${\rm sfli}RG$, ${\rm silf}RG$, ${\rm spli}RG$, ${\rm silp}RG$, and Gorenstein projective, Gorenstein flat and PGF dimensions of $RG$-modules; a sufficient condition for Gorenstein projective-flat problem over group rings is given. △ Less

Submitted 18 April, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

Comments: Revised version of the paper which was previously titled "Gorenstein flat dimension with group ring coefficients". We appreciate any comments and suggestions

MSC Class: 18G20; 18G25; 20J05

arXiv:2211.01151 [pdf, ps, other]

On stability of subelliptic harmonic maps with potential

Authors: Tian Chong, Yuxin Dong, Guilin Yang

Abstract: In this paper, we investigate the stability problem of subelliptic harmonic maps with potential. First, we derive the first and second variation formulas for subelliptic harmonic maps with potential. As a result, it is proved that a subelliptic harmonic map with potential is stable if the target manifold has nonpositive curvature and the Hessian of the potential is nonpositive definite. We also gi… ▽ More In this paper, we investigate the stability problem of subelliptic harmonic maps with potential. First, we derive the first and second variation formulas for subelliptic harmonic maps with potential. As a result, it is proved that a subelliptic harmonic map with potential is stable if the target manifold has nonpositive curvature and the Hessian of the potential is nonpositive definite. We also give Leung type results which involve the instability of subelliptic harmonic maps with potential when the target manifold is a sphere of dimension $\geq 3$. △ Less

Submitted 2 November, 2022; originally announced November 2022.

arXiv:2209.04157 [pdf, other]

doi 10.1109/TAES.2023.3271961

A Fast Algorithm for Onboard Atmospheric Powered Descent Guidance

Authors: Yushu Chen, Guangwen Yang, Lu Wang, Qingzhong Gan, Haipeng Chen, Quanyong Xu

Abstract: Atmospheric powered descent guidance can be solved by successive convexification; however, its onboard application is impeded by the sharp increase in computation caused by nonlinear aerodynamic forces. The problem has to be converted into a sequence of convex subproblems instead of a single convex problem when aerodynamic forces are ignored. Besides, each subproblem is significantly more complica… ▽ More Atmospheric powered descent guidance can be solved by successive convexification; however, its onboard application is impeded by the sharp increase in computation caused by nonlinear aerodynamic forces. The problem has to be converted into a sequence of convex subproblems instead of a single convex problem when aerodynamic forces are ignored. Besides, each subproblem is significantly more complicated, which increases computation. A fast real-time interior point method was presented to solve the correlated convex subproblems onboard in the work. The main contributions are as follows: Firstly, an algorithm was proposed to accelerate the solution of linear systems that cost most of the computation in each iterative step by exploiting the specific problem structure. Secondly, a warm-starting scheme was introduced to refine the initial value of a subproblem with a rough approximate solution of the former subproblem, which lessened the iterative steps required for each subproblem. The method proposed reduced the run time by a factor of 9 compared with the fastest publicly available solver tested in Monte Carlo simulations to evaluate the efficiency of solvers. Runtimes on the order of 0.6 s are achieved on a radiation-hardened flight processor, which demonstrated the potential of the real-time onboard application. △ Less

Submitted 6 June, 2023; v1 submitted 9 September, 2022; originally announced September 2022.

Comments: The paper is accepted by IEEE Transactions on Aerospace and Electronic Systems, 2023

arXiv:2208.09960 [pdf, ps, other]

The Stochastic Schwarz lemma on Kähler Manifolds by Couplings and Its Applications

Authors: Myeongju Chae, Gunhee Cho, Maria Gordina, Guang Yang

Abstract: We first provide a stochastic formula for the Carathéodory distance in terms of general Markovian couplings and prove a comparison result between the Carathéodory distance and the complete Kähler metric with a negative lower curvature bound using the Kendall-Cranston coupling. This probabilistic approach gives a version of the Schwarz lemma on complete non-compact Kähler manifolds with a further d… ▽ More We first provide a stochastic formula for the Carathéodory distance in terms of general Markovian couplings and prove a comparison result between the Carathéodory distance and the complete Kähler metric with a negative lower curvature bound using the Kendall-Cranston coupling. This probabilistic approach gives a version of the Schwarz lemma on complete non-compact Kähler manifolds with a further decomposition Ricci curvature into the orthogonal Ricci curvature and the holomorphic sectional curvature, which cannot be obtained by using Yau--Royden's Schwarz lemma. We also prove coupling estimates on quaternionic Kähler manifolds. As a byproduct, we obtain an improved gradient estimate of positive harmonic functions on Kähler manifolds and quaternionic Kähler manifolds under lower curvature bounds. △ Less

Submitted 30 November, 2023; v1 submitted 21 August, 2022; originally announced August 2022.

Comments: To appear JLMS

arXiv:2208.01615 [pdf, ps, other]

On Non-degenerate Chaos Processes

Authors: Guang Yang

Abstract: We consider a process $\{X_t\}_{0\leq t\leq 1}$ in a fixed Wiener chaos $\mathcal{H}_n$. We establish some non-degenerate properties and related results for $\{X_t\}_{0\leq t\leq 1}$. As an application, we show that solution to SDE driven by $\{X_t\}_{0\leq t\leq 1}$ admits a density. Our approach relies on an interplay between Malliavin calculus and analysis on Wiener space. We consider a process $\{X_t\}_{0\leq t\leq 1}$ in a fixed Wiener chaos $\mathcal{H}_n$. We establish some non-degenerate properties and related results for $\{X_t\}_{0\leq t\leq 1}$. As an application, we show that solution to SDE driven by $\{X_t\}_{0\leq t\leq 1}$ admits a density. Our approach relies on an interplay between Malliavin calculus and analysis on Wiener space. △ Less

Submitted 2 August, 2022; originally announced August 2022.

arXiv:2207.11755 [pdf, other]

Revisiting the central limit theorems for the SGD-type methods

Authors: Tiejun Li, Tiannan Xiao, Guoguo Yang

Abstract: We revisited the central limit theorem (CLT) for stochastic gradient descent (SGD) type methods, including the vanilla SGD, momentum SGD and Nesterov accelerated SGD methods with constant or vanishing damping parameters. By taking advantage of Lyapunov function technique and $L^p$ bound estimates, we established the CLT under more general conditions on learning rates for broader classes of SGD met… ▽ More We revisited the central limit theorem (CLT) for stochastic gradient descent (SGD) type methods, including the vanilla SGD, momentum SGD and Nesterov accelerated SGD methods with constant or vanishing damping parameters. By taking advantage of Lyapunov function technique and $L^p$ bound estimates, we established the CLT under more general conditions on learning rates for broader classes of SGD methods compared with previous results. The CLT for the time average was also investigated, and we found that it held in the linear case, while it was not generally true in nonlinear situation. Numerical tests were also carried out to verify our theoretical analysis. △ Less

Submitted 9 June, 2023; v1 submitted 24 July, 2022; originally announced July 2022.

Comments: 23 pages, 2 figures

MSC Class: 60F05; 60J22; 37N40

arXiv:2207.03756 [pdf, ps, other]

Sprays on Hamel-Funk Functions Model

Authors: Guojun Yang

Abstract: Hamel functions of a spray play an important role in the study of the projective metrizability of the concerned spray, and Funk functions are special Hamel functions. A Finsler metric is a special Hamel function of the spray induced by the metric itself and a Funk metric is a special Funk function of a Minkowski spray. In this paper, we study sprays on a Hamel or Funk function model. Firstly, we g… ▽ More Hamel functions of a spray play an important role in the study of the projective metrizability of the concerned spray, and Funk functions are special Hamel functions. A Finsler metric is a special Hamel function of the spray induced by the metric itself and a Funk metric is a special Funk function of a Minkowski spray. In this paper, we study sprays on a Hamel or Funk function model. Firstly, we give some basic properties of a Hamel or Funk function of a spray and some curvature properties of a Hamel or Funk function in projective relations. We use the Funk metric to construct a family of sprays and obtain some of their curvature properties and their metrizability conditions. Secondly, we consider the existence of Funk functions on certain spray manifold. We prove that there exist local Funk functions on a R-flat spray manifold, and on certain projectively flat Berwald spray manifolds, we construct a multitude of nonzero Funk functions. Finally, we introduce a new class of sprays called Hamel or Funk sprays associated to given sprays and Hamel or Funk functions. We obtain some special properties of a Hamel or Funk spray of scalar curvature, especially on its metrizability and a special form of its Riemann curvature. △ Less

Submitted 8 July, 2022; originally announced July 2022.

Comments: 21 pages

MSC Class: 53C60; 53B40

arXiv:2205.11892 [pdf, ps, other]

On Sprays of Scalar Curvature and Metrizability

Authors: Guojun Yang

Abstract: Every Finsler metric naturally induces a spray but not so for the converse. The notion for sprays of scalar (resp. isotropic) curvature has been known as a generalization for Finsler metrics of scalar (resp. isotropic) flag curvature. In this paper, a new notion, sprays of constant curvature, is introduced and especially it shows that a spray of isotropic curvature is not necessarily of constant c… ▽ More Every Finsler metric naturally induces a spray but not so for the converse. The notion for sprays of scalar (resp. isotropic) curvature has been known as a generalization for Finsler metrics of scalar (resp. isotropic) flag curvature. In this paper, a new notion, sprays of constant curvature, is introduced and especially it shows that a spray of isotropic curvature is not necessarily of constant curvature even in dimension $n\ge3$. Further, complete conditions are given for sprays of isotropic (resp. constant) curvature to be Finsler-metrizabile. As applications of such a result, the local structure is determined for locally projectively flat Berwald sprays of isotropic (resp. constant) curvature which are Finsler-metrizable, and some more sprays of isotropic curvature are discussed for their metrizability. Besides, the metrizability problem is also investigated for sprays of scalar curvature under certain curvature conditions. △ Less

Submitted 24 May, 2022; originally announced May 2022.

Comments: 20 pages

MSC Class: 53C60; 53B40

arXiv:2205.08038 [pdf, other]

Newton and interior-point methods for (constrained) nonconvex-nonconcave minmax optimization with stability and instability guarantees

Authors: Raphael Chinchilla, Guosong Yang, Joao P. Hespanha

Abstract: We address the problem of finding a local solution to a nonconvex-nonconcave minmax optimization using Newton type methods, including interior-point ones. We modify the Hessian matrix of these methods such that, at each step, the modified Newton update direction can be seen as the solution to a quadratic program that locally approximates the minmax problem. Moreover, we show that by selecting the… ▽ More We address the problem of finding a local solution to a nonconvex-nonconcave minmax optimization using Newton type methods, including interior-point ones. We modify the Hessian matrix of these methods such that, at each step, the modified Newton update direction can be seen as the solution to a quadratic program that locally approximates the minmax problem. Moreover, we show that by selecting the modification in an appropriate way, the only stable equilibrium points of the algorithm's iterations are local minmax points. As a consequence, the algorithm can only converge towards an equilibrium point if such point is a local minmax, and it will escape if the point is not a local minmax. Using numerical examples, we show that the computation time of our algorithm scales roughly linearly with the number of nonzero elements in the Hessian. △ Less

Submitted 11 February, 2024; v1 submitted 16 May, 2022; originally announced May 2022.

Comments: Published at the Journal of the Mathematics of Control, Signals, and Systems

arXiv:2205.03570 [pdf, ps, other]

Iteration Complexity of an Infeasible Interior Point Methods for Seconder-order Cone Programming and its Warmstarting

Authors: Yushu Chen, Guangwen Yang, Lu Wang, Qingzhong Gan, Haipeng Chen

Abstract: This paper studies the worst case iteration complexity of an infeasible interior point method (IPM) for seconder order cone programming (SOCP), which is more convenient for warmstarting compared with feasible IPMs. The method studied bases on the homogeneous and self-dual model and the Monteiro-Zhang family of searching directions. Its worst case iteration complexity is… ▽ More This paper studies the worst case iteration complexity of an infeasible interior point method (IPM) for seconder order cone programming (SOCP), which is more convenient for warmstarting compared with feasible IPMs. The method studied bases on the homogeneous and self-dual model and the Monteiro-Zhang family of searching directions. Its worst case iteration complexity is $O\left(k^{1/2}\log\left(ε^{-1}\right)\right)$, to reduce the primal residual, dual residual, and complementarity gap by a factor of $ε$, where $k$ is the number of cone constraints. The result is the same as the best known result for feasible IPMs. The condition under which warmstarting improves the complexity bound is also studied. △ Less

Submitted 24 January, 2023; v1 submitted 7 May, 2022; originally announced May 2022.

ACM Class: F.2.1; G.1.6

arXiv:2205.01445 [pdf, other]

High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation

Authors: Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang

Abstract: We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a two-layer neural network: $f(\boldsymbol{x}) = \frac{1}{\sqrt{N}}\boldsymbol{a}^\topσ(\boldsymbol{W}^\top\boldsymbol{x})$, where $\boldsymbol{W}\in\mathbb{R}^{d\times N}, \boldsymbol{a}\in\mathbb{R}^{N}$ are randomly initialized, and the training objective is the empirical MSE loss:… ▽ More We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a two-layer neural network: $f(\boldsymbol{x}) = \frac{1}{\sqrt{N}}\boldsymbol{a}^\topσ(\boldsymbol{W}^\top\boldsymbol{x})$, where $\boldsymbol{W}\in\mathbb{R}^{d\times N}, \boldsymbol{a}\in\mathbb{R}^{N}$ are randomly initialized, and the training objective is the empirical MSE loss: $\frac{1}{n}\sum_{i=1}^n (f(\boldsymbol{x}_i)-y_i)^2$. In the proportional asymptotic limit where $n,d,N\to\infty$ at the same rate, and an idealized student-teacher setting, we show that the first gradient update contains a rank-1 "spike", which results in an alignment between the first-layer weights and the linear component of the teacher model $f^*$. To characterize the impact of this alignment, we compute the prediction risk of ridge regression on the conjugate kernel after one gradient step on $\boldsymbol{W}$ with learning rate $η$, when $f^*$ is a single-index model. We consider two scalings of the first step learning rate $η$. For small $η$, we establish a Gaussian equivalence property for the trained feature map, and prove that the learned kernel improves upon the initial random features model, but cannot defeat the best linear model on the input. Whereas for sufficiently large $η$, we prove that for certain $f^*$, the same ridge estimator on trained features can go beyond this "linear regime" and outperform a wide range of random features and rotationally invariant kernels. Our results demonstrate that even one gradient step can lead to a considerable advantage over random features, and highlight the role of learning rate scaling in the initial phase of training. △ Less

Submitted 3 May, 2022; originally announced May 2022.

Comments: 71 pages

arXiv:2203.04126 [pdf, ps, other]

Some multivariable Rado numbers

Authors: Gang Yang, Yaping Mao, Changxiang He, Zhao Wang

Abstract: The Rado number of an equation is a Ramsey-theoretic quantity associated to the equation. Let $\mathcal{E}$ be a linear equation. Denote by $\operatorname{R}_r(\mathcal{E})$ the minimal integer, if it exists, such that any $r$-coloring of $[1,\operatorname{R}_r(\mathcal{E})]$ must admit a monochromatic solution to $\mathcal{E}$. In this paper, we give upper and lower bounds for the Rado number of… ▽ More The Rado number of an equation is a Ramsey-theoretic quantity associated to the equation. Let $\mathcal{E}$ be a linear equation. Denote by $\operatorname{R}_r(\mathcal{E})$ the minimal integer, if it exists, such that any $r$-coloring of $[1,\operatorname{R}_r(\mathcal{E})]$ must admit a monochromatic solution to $\mathcal{E}$. In this paper, we give upper and lower bounds for the Rado number of $\sum_{i=1}^{m-2}x_i+kx_{m-1}=\ell x_{m}$, and some exact values are also given. Furthermore, we derive some results for the cases that $\ell=m=4$ and $m=5, \ell=k+i \ (1\leq i\leq 5)$. As a generalization, the \emph{$r$-color Rado numbers} for linear equations $\mathcal{E}_1,\mathcal{E}_2,...,\mathcal{E}_r$ is defined as the minimal integer, if it exists, such that any $r$-coloring of $[1,\operatorname{R}_r(\mathcal{E}_1,\mathcal{E}_2,...,\mathcal{E}_r)]$ must admit a monochromatic solution to some $\mathcal{E}_i$, where $1\leq i\leq r$. A lower bound for $\operatorname{R}_r(\mathcal{E}_1,\mathcal{E}_2,...,\mathcal{E}_r)$ and the exact values of $\operatorname{R}_2(x+y=z,\ell x=y)=5k$ and $\operatorname{R}_2(x+y=z, x+a=y)$ was given by Lovász Local Lemma. △ Less

Submitted 10 March, 2022; v1 submitted 5 March, 2022; originally announced March 2022.

arXiv:2111.00534 [pdf, other]

Focal Attention Networks: optimising attention for biomedical image segmentation

Authors: Michael Yeung, Leonardo Rundo, Evis Sala, Carola-Bibiane Schönlieb, Guang Yang

Abstract: In recent years, there has been increasing interest to incorporate attention into deep learning architectures for biomedical image segmentation. The modular design of attention mechanisms enables flexible integration into convolutional neural network architectures, such as the U-Net. Whether attention is appropriate to use, what type of attention to use, and where in the network to incorporate att… ▽ More In recent years, there has been increasing interest to incorporate attention into deep learning architectures for biomedical image segmentation. The modular design of attention mechanisms enables flexible integration into convolutional neural network architectures, such as the U-Net. Whether attention is appropriate to use, what type of attention to use, and where in the network to incorporate attention modules, are all important considerations that are currently overlooked. In this paper, we investigate the role of the Focal parameter in modulating attention, revealing a link between attention in loss functions and networks. By incorporating a Focal distance penalty term, we extend the Unified Focal loss framework to include boundary-based losses. Furthermore, we develop a simple and interpretable, dataset and model-specific heuristic to integrate the Focal parameter into the Squeeze-and-Excitation block and Attention Gate, achieving optimal performance with fewer number of attention modules on three well-validated biomedical imaging datasets, suggesting judicious use of attention modules results in better performance and efficiency. △ Less

Submitted 31 October, 2021; originally announced November 2021.

arXiv:2111.00533 [pdf, other]

Incorporating Boundary Uncertainty into loss functions for biomedical image segmentation

Authors: Michael Yeung, Guang Yang, Evis Sala, Carola-Bibiane Schönlieb, Leonardo Rundo

Abstract: Manual segmentation is used as the gold-standard for evaluating neural networks on automated image segmentation tasks. Due to considerable heterogeneity in shapes, colours and textures, demarcating object boundaries is particularly difficult in biomedical images, resulting in significant inter and intra-rater variability. Approaches, such as soft labelling and distance penalty term, apply a global… ▽ More Manual segmentation is used as the gold-standard for evaluating neural networks on automated image segmentation tasks. Due to considerable heterogeneity in shapes, colours and textures, demarcating object boundaries is particularly difficult in biomedical images, resulting in significant inter and intra-rater variability. Approaches, such as soft labelling and distance penalty term, apply a global transformation to the ground truth, redefining the loss function with respect to uncertainty. However, global operations are computationally expensive, and neither approach accurately reflects the uncertainty underlying manual annotation. In this paper, we propose the Boundary Uncertainty, which uses morphological operations to restrict soft labelling to object boundaries, providing an appropriate representation of uncertainty in ground truth labels, and may be adapted to enable robust model training where systematic manual segmentation errors are present. We incorporate Boundary Uncertainty with the Dice loss, achieving consistently improved performance across three well-validated biomedical imaging datasets compared to soft labelling and distance-weighted penalty. Boundary Uncertainty not only more accurately reflects the segmentation process, but it is also efficient, robust to segmentation errors and exhibits better generalisation. △ Less

Submitted 31 October, 2021; originally announced November 2021.

arXiv:2111.00528 [pdf, other]

Calibrating the Dice loss to handle neural network overconfidence for biomedical image segmentation

Authors: Michael Yeung, Leonardo Rundo, Yang Nan, Evis Sala, Carola-Bibiane Schönlieb, Guang Yang

Abstract: The Dice similarity coefficient (DSC) is both a widely used metric and loss function for biomedical image segmentation due to its robustness to class imbalance. However, it is well known that the DSC loss is poorly calibrated, resulting in overconfident predictions that cannot be usefully interpreted in biomedical and clinical practice. Performance is often the only metric used to evaluate segment… ▽ More The Dice similarity coefficient (DSC) is both a widely used metric and loss function for biomedical image segmentation due to its robustness to class imbalance. However, it is well known that the DSC loss is poorly calibrated, resulting in overconfident predictions that cannot be usefully interpreted in biomedical and clinical practice. Performance is often the only metric used to evaluate segmentations produced by deep neural networks, and calibration is often neglected. However, calibration is important for translation into biomedical and clinical practice, providing crucial contextual information to model predictions for interpretation by scientists and clinicians. In this study, we provide a simple yet effective extension of the DSC loss, named the DSC++ loss, that selectively modulates the penalty associated with overconfident, incorrect predictions. As a standalone loss function, the DSC++ loss achieves significantly improved calibration over the conventional DSC loss across six well-validated open-source biomedical imaging datasets, including both 2D binary and 3D multi-class segmentation tasks. Similarly, we observe significantly improved calibration when integrating the DSC++ loss into four DSC-based loss functions. Finally, we use softmax thresholding to illustrate that well calibrated outputs enable tailoring of recall-precision bias, which is an important post-processing technique to adapt the model predictions to suit the biomedical or clinical task. The DSC++ loss overcomes the major limitation of the DSC loss, providing a suitable loss function for training deep learning segmentation models for use in biomedical and clinical practice. Source code is available at: https://github.com/mlyg/DicePlusPlus. △ Less

Submitted 1 November, 2022; v1 submitted 31 October, 2021; originally announced November 2021.

arXiv:2109.03688 [pdf, ps, other]

Limit theorems for linear random fields with innovations in the domain of attraction of a stable law

Authors: Magda Peligrad, Hailin Sang, Yimin Xiao, Guangyu Yang

Abstract: In this paper we study the convergence in distribution and the local limit theorem for the partial sums of linear random fields with i.i.d. innovations that have infinite second moment and belong to the domain of attraction of a stable law with index $0<α\leq2$ under the condition that the innovations are centered if $1<α\leq2$ and are symmetric if $α=1$. We establish these two types of limit theo… ▽ More In this paper we study the convergence in distribution and the local limit theorem for the partial sums of linear random fields with i.i.d. innovations that have infinite second moment and belong to the domain of attraction of a stable law with index $0<α\leq2$ under the condition that the innovations are centered if $1<α\leq2$ and are symmetric if $α=1$. We establish these two types of limit theorems as long as the linear random fields are well-defined, the coefficients are either absolutely summable or not absolutely summable. △ Less

Submitted 7 May, 2022; v1 submitted 8 September, 2021; originally announced September 2021.

Comments: 23 pages, accepted by Stochastic Processes and their Applications

MSC Class: Primary 60G60; 60G52; Secondary 60F05; 62M40

arXiv:2105.14588 [pdf, ps, other]

A note on first eigenvalue estimates by coupling methods in Kähler and quaternion Kähler manifolds

Authors: Fabrice Baudoin, Gunhee Cho, Guang Yang

Abstract: In this short note, using the Kendall-Cranston coupling, we study on Kähler (resp. quaternion Kähler) manifolds first eigenvalue estimates in terms of dimension, diameter, and lower bounds on the holomorphic (resp. quaternionic) sectional curvature. In this short note, using the Kendall-Cranston coupling, we study on Kähler (resp. quaternion Kähler) manifolds first eigenvalue estimates in terms of dimension, diameter, and lower bounds on the holomorphic (resp. quaternionic) sectional curvature. △ Less

Submitted 8 December, 2021; v1 submitted 30 May, 2021; originally announced May 2021.

Comments: v2: References added

arXiv:2105.03703 [pdf, other]

Tensor Programs IIb: Architectural Universality of Neural Tangent Kernel Training Dynamics

Authors: Greg Yang, Etai Littwin

Abstract: Yang (2020a) recently showed that the Neural Tangent Kernel (NTK) at initialization has an infinite-width limit for a large class of architectures including modern staples such as ResNet and Transformers. However, their analysis does not apply to training. Here, we show the same neural networks (in the so-called NTK parametrization) during training follow a kernel gradient descent dynamics in func… ▽ More Yang (2020a) recently showed that the Neural Tangent Kernel (NTK) at initialization has an infinite-width limit for a large class of architectures including modern staples such as ResNet and Transformers. However, their analysis does not apply to training. Here, we show the same neural networks (in the so-called NTK parametrization) during training follow a kernel gradient descent dynamics in function space, where the kernel is the infinite-width NTK. This completes the proof of the *architectural universality* of NTK behavior. To achieve this result, we apply the Tensor Programs technique: Write the entire SGD dynamics inside a Tensor Program and analyze it via the Master Theorem. To facilitate this proof, we develop a graphical notation for Tensor Programs. △ Less

Submitted 8 May, 2021; originally announced May 2021.

Comments: ICML 2021

arXiv:2101.03253 [pdf, other]

Adaptive Learning in Two-Player Stackelberg Games with Application to Network Security

Authors: Guosong Yang, Radha Poovendran, João P. Hespanha

Abstract: We study a two-player Stackelberg game with incomplete information such that the follower's strategy belongs to a known family of parameterized functions with an unknown parameter vector. We design an adaptive learning approach to simultaneously estimate the unknown parameter and minimize the leader's cost, based on adaptive control techniques and hysteresis switching. Our approach guarantees that… ▽ More We study a two-player Stackelberg game with incomplete information such that the follower's strategy belongs to a known family of parameterized functions with an unknown parameter vector. We design an adaptive learning approach to simultaneously estimate the unknown parameter and minimize the leader's cost, based on adaptive control techniques and hysteresis switching. Our approach guarantees that the leader's cost predicted using the parameter estimate becomes indistinguishable from its actual cost in finite time, up to a preselected, arbitrarily small error threshold. Also, the first-order necessary condition for optimality holds asymptotically for the predicted cost. Additionally, if a persistent excitation condition holds, then the parameter estimation error becomes bounded by a preselected, arbitrarily small threshold in finite time as well. For the case where there is a mismatch between the follower's strategy and the parameterized function that is known to the leader, our approach is able to guarantee the same convergence results for error thresholds larger than the size of the mismatch. The algorithms and the convergence results are illustrated via a simulation example in the domain of network security. △ Less

Submitted 8 January, 2021; originally announced January 2021.

MSC Class: 91A26; 91A65 (Primary) 37N40; 93C40; 65L20 (Secondary)

arXiv:2009.10685 [pdf, other]

Tensor Programs III: Neural Matrix Laws

Authors: Greg Yang

Abstract: In a neural network (NN), *weight matrices* linearly transform inputs into *preactivations* that are then transformed nonlinearly into *activations*. A typical NN interleaves multitudes of such linear and nonlinear transforms to express complex functions. Thus, the (pre-)activations depend on the weights in an intricate manner. We show that, surprisingly, (pre-)activations of a randomly initialize… ▽ More In a neural network (NN), *weight matrices* linearly transform inputs into *preactivations* that are then transformed nonlinearly into *activations*. A typical NN interleaves multitudes of such linear and nonlinear transforms to express complex functions. Thus, the (pre-)activations depend on the weights in an intricate manner. We show that, surprisingly, (pre-)activations of a randomly initialized NN become *independent* from the weights as the NN's widths tend to infinity, in the sense of asymptotic freeness in random matrix theory. We call this the Free Independence Principle (FIP), which has these consequences: 1) It rigorously justifies the calculation of asymptotic Jacobian singular value distribution of an NN in Pennington et al. [36,37], essential for training ultra-deep NNs [48]. 2) It gives a new justification of gradient independence assumption used for calculating the Neural Tangent Kernel of a neural network. FIP and these results hold for any neural architecture. We show FIP by proving a Master Theorem for any Tensor Program, as introduced in Yang [50,51], generalizing the Master Theorems proved in those works. As warmup demonstrations of this new Master Theorem, we give new proofs of the semicircle and Marchenko-Pastur laws, which benchmarks our framework against these fundamental mathematical results. △ Less

Submitted 8 May, 2021; v1 submitted 22 September, 2020; originally announced September 2020.

arXiv:2008.10116 [pdf, ps, other]

Octonionic Brownian Windings

Authors: Gunhee Cho, Guang Yang

Abstract: We define and study the windings along Brownian paths in the octonionic Euclidean, projective and hyperbolic spaces which are isometric to 8-dimensional Riemannian model spaces. In particular, the asymptotic laws of these windings are shown to be Gaussian for the flat and spherical geometries while the hyperbolic winding exhibits a different long time-behavior. We define and study the windings along Brownian paths in the octonionic Euclidean, projective and hyperbolic spaces which are isometric to 8-dimensional Riemannian model spaces. In particular, the asymptotic laws of these windings are shown to be Gaussian for the flat and spherical geometries while the hyperbolic winding exhibits a different long time-behavior. △ Less

Submitted 20 August, 2021; v1 submitted 23 August, 2020; originally announced August 2020.

Comments: To appear in Journal of Theoretical Probability

arXiv:2005.09192 [pdf, ps, other]

A Version of Hörmander's Theorem for Markovian Rough Paths

Authors: Guang Yang

Abstract: We consider a rough differential equation of the form $dY_t=\sum_i V_i(Y_t)d\boldsymbol{X}^i_t+V_0(Y_t)dt $, where $\boldsymbol{X}_t $ is a Markovian rough path. We demonstrate that if the vector fields $(V_i)_{0\leq i\leq d} $ satisfy Hörmander's bracket generating condition, then $Y_t$ admits a smooth density with a Gaussian type upper bound, given that the generator of $X_t$ satisfy c… ▽ More We consider a rough differential equation of the form $dY_t=\sum_i V_i(Y_t)d\boldsymbol{X}^i_t+V_0(Y_t)dt $, where $\boldsymbol{X}_t $ is a Markovian rough path. We demonstrate that if the vector fields $(V_i)_{0\leq i\leq d} $ satisfy Hörmander's bracket generating condition, then $Y_t$ admits a smooth density with a Gaussian type upper bound, given that the generator of $X_t$ satisfy certain non-degenerate conditions. The main new ingredient of this paper is the study of non-degenerate property of the Jacobian process of $X_t$. △ Less

Submitted 1 February, 2022; v1 submitted 18 May, 2020; originally announced May 2020.

Comments: Improved the writing, results unchanged

arXiv:2003.05603 [pdf, ps, other]

Brownian motions and heat kernel lower bounds on Kähler and quaternion Kähler manifolds

Authors: Fabrice Baudoin, Guang Yang

Abstract: We study the radial parts of the Brownian motions on Kähler and quaternion Kähler manifolds. Thanks to sharp Laplacian comparison theorems, we deduce as a consequence a sharp Cheeger-Yau type lower bound for the heat kernels of such manifolds and also sharp Cheng's type estimates for the Dirichlet eigenvalues of metric balls. We study the radial parts of the Brownian motions on Kähler and quaternion Kähler manifolds. Thanks to sharp Laplacian comparison theorems, we deduce as a consequence a sharp Cheeger-Yau type lower bound for the heat kernels of such manifolds and also sharp Cheng's type estimates for the Dirichlet eigenvalues of metric balls. △ Less

Submitted 11 March, 2020; originally announced March 2020.

arXiv:1912.01820 [pdf, ps, other]

The equivalent theorem of a new generalized Bernstein-Bezier operators

Authors: Qiu-Lan Qi, Dan-Dan Guo, Ge Yang

Abstract: In this paper, a new generalized Bernstein-Bezier type operators is constructed.The estimates of the moments of these operators are investigated. The rate of convergence in terms of modulus of continuity is given. Then, the equivalent theorem of these operators is studied. In this paper, a new generalized Bernstein-Bezier type operators is constructed.The estimates of the moments of these operators are investigated. The rate of convergence in terms of modulus of continuity is given. Then, the equivalent theorem of these operators is studied. △ Less

Submitted 4 December, 2019; originally announced December 2019.

Comments: 15 pages

arXiv:1910.03764 [pdf, ps, other]

Geck's Conjecture and the Generalized Gelfand-Graev Representations in Bad Characteristic

Authors: Junbin Dong, Gao Yang

Abstract: For a connected reductive algebraic group $G$ defined over a finite field $\mathbb F_q$, Kawanaka introduced the generalized Gelfand-Graev representations (GGGRs for short) of the finite group $G(\mathbb F_q)$ in the case where $q$ is a power of a good prime for $G$. This representation has been widely studied and used in various contexts. Recently, Geck proposed a conjecture, characterizing Luszt… ▽ More For a connected reductive algebraic group $G$ defined over a finite field $\mathbb F_q$, Kawanaka introduced the generalized Gelfand-Graev representations (GGGRs for short) of the finite group $G(\mathbb F_q)$ in the case where $q$ is a power of a good prime for $G$. This representation has been widely studied and used in various contexts. Recently, Geck proposed a conjecture, characterizing Lusztig's special unipotent classes in terms of weighted Dynkin diagrams. Based on this conjecture, he gave a guideline for extending the definition of GGGRs to the case where $q$ is a power of a bad prime for $G$. Here, we will give a proof of Geck's conjecture. Combined with Geck's pioneer work, our proof verifies Geck's conjectural characterization of special unipotent classes, and completes his definition of GGGRs in bad characteristics. △ Less

Submitted 8 October, 2019; originally announced October 2019.

Comments: 33 pages

arXiv:1909.02159 [pdf, ps, other]

Free resolutions of function classes via order complexes

Authors: Justin Chen, Christopher Eur, Greg Yang, Mengyuan Zhang

Abstract: Function classes are collections of Boolean functions on a finite set, which are fundamental objects of study in theoretical computer science. We study algebraic properties of ideals associated to function classes previously defined by the third author. We consider the broad family of intersection-closed function classes, and describe cellular free resolutions of their ideals by order complexes of… ▽ More Function classes are collections of Boolean functions on a finite set, which are fundamental objects of study in theoretical computer science. We study algebraic properties of ideals associated to function classes previously defined by the third author. We consider the broad family of intersection-closed function classes, and describe cellular free resolutions of their ideals by order complexes of the associated posets. For function classes arising from matroids, polyhedral cell complexes, and more generally interval Cohen-Macaulay posets, we show that the multigraded Betti numbers are pure, and are given combinatorially by the Möbius functions. We then apply our methods to derive bounds on the VC dimension of some important families of function classes in learning theory. △ Less

Submitted 16 June, 2020; v1 submitted 4 September, 2019; originally announced September 2019.

Comments: 18 pages with figures. Final journal version, to appear in Advances in Applied Mathematics

MSC Class: 13D02; 68Q32; 05E40; 06A12; 05B35

arXiv:1905.01422 [pdf, other]

An Adaptive Remote Stochastic Gradient Method for Training Neural Networks

Authors: Yushu Chen, Hao Jing, Wenlai Zhao, Zhiqiang Liu, Ouyi Li, Liang Qiao, Wei Xue, Guangwen Yang

Abstract: We present the remote stochastic gradient (RSG) method, which computes the gradients at configurable remote observation points, in order to improve the convergence rate and suppress gradient noise at the same time for different curvatures. RSG is further combined with adaptive methods to construct ARSG for acceleration. The method is efficient in computation and memory, and is straightforward to i… ▽ More We present the remote stochastic gradient (RSG) method, which computes the gradients at configurable remote observation points, in order to improve the convergence rate and suppress gradient noise at the same time for different curvatures. RSG is further combined with adaptive methods to construct ARSG for acceleration. The method is efficient in computation and memory, and is straightforward to implement. We analyze the convergence properties by modeling the training process as a dynamic system, which provides a guideline to select the configurable observation factor without grid search. ARSG yields $O(1/\sqrt{T})$ convergence rate in non-convex settings, that can be further improved to $O(\log(T)/T)$ in strongly convex settings. Numerical experiments demonstrate that ARSG achieves both faster convergence and better generalization, compared with popular adaptive methods, such as ADAM, NADAM, AMSGRAD, and RANGER for the tested problems. In particular, for training ResNet-50 on ImageNet, ARSG outperforms ADAM in convergence speed and meanwhile it surpasses SGD in generalization. △ Less

Submitted 6 September, 2020; v1 submitted 3 May, 2019; originally announced May 2019.

Comments: The generalization is improved by modifying the preconditioner. For training ResNet-50 on ImageNet, ARSG outperforms ADAM in convergence speed and meanwhile it surpasses SGD in generalization. We also present a convergence bound in non-convex settings

arXiv:1902.08129 [pdf, other]

A Mean Field Theory of Batch Normalization

Authors: Greg Yang, Jeffrey Pennington, Vinay Rao, Jascha Sohl-Dickstein, Samuel S. Schoenholz

Abstract: We develop a mean field theory for batch normalization in fully-connected feedforward neural networks. In so doing, we provide a precise characterization of signal propagation and gradient backpropagation in wide batch-normalized networks at initialization. Our theory shows that gradient signals grow exponentially in depth and that these exploding gradients cannot be eliminated by tuning the initi… ▽ More We develop a mean field theory for batch normalization in fully-connected feedforward neural networks. In so doing, we provide a precise characterization of signal propagation and gradient backpropagation in wide batch-normalized networks at initialization. Our theory shows that gradient signals grow exponentially in depth and that these exploding gradients cannot be eliminated by tuning the initial weight variances or by adjusting the nonlinear activation function. Indeed, batch normalization itself is the cause of gradient explosion. As a result, vanilla batch-normalized networks without skip connections are not trainable at large depths for common initialization schemes, a prediction that we verify with a variety of empirical simulations. While gradient explosion cannot be eliminated, it can be reduced by tuning the network close to the linear regime, which improves the trainability of deep batch-normalized networks without residual connections. Finally, we investigate the learning dynamics of batch-normalized networks and observe that after a single step of optimization the networks achieve a relatively stable equilibrium in which gradients have dramatically smaller dynamic range. Our theory leverages Laplace, Fourier, and Gegenbauer transforms and we derive new identities that may be of independent interest. △ Less

Submitted 5 March, 2019; v1 submitted 21 February, 2019; originally announced February 2019.

Comments: To appear in ICLR 2019

arXiv:1809.10210 [pdf, ps, other]

A Machine Learning Approach to Shipping Box Design

Authors: Guang Yang, Cun Mu

Abstract: Having the right assortment of shipping boxes in the fulfillment warehouse to pack and ship customer's online orders is an indispensable and integral part of nowadays eCommerce business, as it will not only help maintain a profitable business but also create great experiences for customers. However, it is an extremely challenging operations task to strategically select the best combination of tens… ▽ More Having the right assortment of shipping boxes in the fulfillment warehouse to pack and ship customer's online orders is an indispensable and integral part of nowadays eCommerce business, as it will not only help maintain a profitable business but also create great experiences for customers. However, it is an extremely challenging operations task to strategically select the best combination of tens of box sizes from thousands of feasible ones to be responsible for hundreds of thousands of orders daily placed on millions of inventory products. In this paper, we present a machine learning approach to tackle the task by formulating the box design problem prescriptively as a generalized version of weighted $k$-medoids clustering problem, where the parameters are estimated through a variety of descriptive analytics. We test this machine learning approach on fulfillment data collected from Walmart U.S. eCommerce, and our approach is shown to be capable of improving the box utilization rate by more than $10\%$. △ Less

Submitted 25 March, 2019; v1 submitted 26 September, 2018; originally announced September 2018.

Comments: Accepted by 2019 Intelligent Systems Conference (A shorter version of the paper is presented at the 13th INFORMS Workshop on Data Mining and Decision Analytics)

Showing 1–50 of 106 results for author: Yang, G