-
RAM: Retrieval-Based Affordance Transfer for Generalizable Zero-Shot Robotic Manipulation
Authors:
Yuxuan Kuang,
Junjie Ye,
Haoran Geng,
Jiageng Mao,
Congyue Deng,
Leonidas Guibas,
He Wang,
Yue Wang
Abstract:
This work proposes a retrieve-and-transfer framework for zero-shot robotic manipulation, dubbed RAM, featuring generalizability across various objects, environments, and embodiments. Unlike existing approaches that learn manipulation from expensive in-domain demonstrations, RAM capitalizes on a retrieval-based affordance transfer paradigm to acquire versatile manipulation capabilities from abundant out-of-domain data. First, RAM extracts unified affordance at scale from diverse sources of demonstrations including robotic data, human-object interaction (HOI) data, and custom data to construct a comprehensive affordance memory. Then given a language instruction, RAM hierarchically retrieves the most similar demonstration from the affordance memory and transfers such out-of-domain 2D affordance to in-domain 3D executable affordance in a zero-shot and embodiment-agnostic manner. Extensive simulation and real-world evaluations demonstrate that our RAM consistently outperforms existing works in diverse daily tasks. Additionally, RAM shows significant potential for downstream applications such as automatic and efficient data collection, one-shot visual imitation, and LLM/VLM-integrated long-horizon manipulation. For more details, please check our website at https://yxkryptonite.github.io/RAM/.
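The abstract describes RAM's retrieval step only at a high level ("hierarchically retrieves the most similar demonstration"). The sketch below shows one generic coarse-to-fine retrieval scheme, assuming that each memory entry carries a task label and a fixed-length embedding. The field names, the two-stage rule, and the toy vectors are all hypothetical, and the 2D-to-3D affordance transfer is not shown. This is not RAM's implementation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def hierarchical_retrieve(query_task, query_emb, memory):
    """Hypothetical two-stage retrieval over an affordance memory.

    Stage 1 (coarse): keep entries whose task label matches the query task,
    falling back to the whole memory if nothing matches.
    Stage 2 (fine): return the candidate with the most similar embedding.
    """
    candidates = [m for m in memory if m["task"] == query_task] or memory
    return max(candidates, key=lambda m: cosine(query_emb, m["embedding"]))

# Toy memory mixing robot, HOI, and custom demonstrations (illustrative values).
memory = [
    {"id": "robot_open_drawer", "task": "open_drawer", "embedding": [0.9, 0.1, 0.0]},
    {"id": "hoi_open_drawer", "task": "open_drawer", "embedding": [0.2, 0.9, 0.1]},
    {"id": "custom_pick_cup", "task": "pick_cup", "embedding": [0.95, 0.05, 0.0]},
]
print(hierarchical_retrieve("open_drawer", [0.3, 0.8, 0.0], memory)["id"])
```

The coarse stage keeps the fine-grained similarity search small and prevents a visually similar but semantically wrong demonstration from being retrieved.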
Submitted 5 July, 2024;
originally announced July 2024.
-
Deep Symbolic Optimization for Combinatorial Optimization: Accelerating Node Selection by Discovering Potential Heuristics
Authors:
Hongyu Liu,
Haoyang Liu,
Yufei Kuang,
Jie Wang,
Bin Li
Abstract:
Combinatorial optimization (CO) is one of the most fundamental mathematical models in real-world applications. Traditional CO solvers, such as Branch-and-Bound (B&B) solvers, heavily rely on expert-designed heuristics, which are reliable but require substantial manual tuning. Recent studies have leveraged deep learning (DL) models as an alternative to capture rich feature patterns for improved performance on GPU machines. Nonetheless, the drawbacks of high training and inference costs, as well as limited interpretability, severely hinder the adoption of DL methods in real-world applications. To address these challenges, we propose a novel deep symbolic optimization learning framework that combines the advantages of both. Specifically, we focus on the node selection module within B&B solvers, namely deep symbolic optimization for node selection (Dso4NS). With data-driven approaches, Dso4NS guides the search for mathematical expressions within the high-dimensional discrete symbolic space and then incorporates the highest-performing mathematical expressions into a solver. The data-driven model captures the rich feature information in the input data and generates symbolic expressions, while the expressions deployed in solvers enable fast inference with high interpretability. Experiments demonstrate the effectiveness of Dso4NS in learning high-quality expressions, outperforming existing approaches on a CPU machine. Encouragingly, the learned CPU-based policies consistently achieve performance comparable to state-of-the-art GPU-based approaches.
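The abstract points out that once a symbolic expression has been found, deploying it inside the solver amounts to evaluating a cheap formula, which is why inference runs fast on a CPU. The sketch below illustrates that deployment side only. It assumes a prefix-notation expression over per-node features. The operator set, the feature names (`lb`, `depth`), and the example expression are hypothetical and are not an expression learned by Dso4NS.

```python
import math

# Hypothetical operator set for the symbolic search space.
BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if b != 0 else 0.0,  # protected division
}
UNARY = {"log": lambda a: math.log(abs(a) + 1.0)}  # protected log

def evaluate(tokens, features):
    """Evaluate a prefix-notation expression; returns (value, remaining tokens)."""
    head, rest = tokens[0], tokens[1:]
    if head in BINARY:
        left, rest = evaluate(rest, features)
        right, rest = evaluate(rest, features)
        return BINARY[head](left, right), rest
    if head in UNARY:
        arg, rest = evaluate(rest, features)
        return UNARY[head](arg), rest
    if head in features:
        return features[head], rest
    return float(head), rest  # numeric constant

def select_node(expression, open_nodes):
    """Pick the open B&B node with the highest score under the expression."""
    return max(open_nodes, key=lambda n: evaluate(expression, n["features"])[0])

nodes = [
    {"name": "A", "features": {"lb": 10.0, "depth": 2.0}},
    {"name": "B", "features": {"lb": 4.0, "depth": 1.0}},
    {"name": "C", "features": {"lb": 6.0, "depth": 5.0}},
]
expr = ["-", "depth", "*", "0.5", "lb"]  # hypothetical: depth - 0.5 * lb
print(select_node(expr, nodes)["name"])
```

Scoring a node takes a handful of arithmetic operations, and the expression can be read directly. Both properties are what the abstract credits to symbolic policies.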
Submitted 10 July, 2024; v1 submitted 14 June, 2024;
originally announced June 2024.
-
Dynamic properties of a class of van der Pol-Duffing oscillators
Authors:
Yelei Kuang,
Xuemei Li
Abstract:
In this paper, we study the bifurcations of a van der Pol-Duffing oscillator with quintic terms and its quasi-periodic solutions by means of qualitative and bifurcation theories. First, we analyze the autonomous system and find that it has two kinds of local bifurcations (pitchfork and Hopf) and one global bifurcation (homoclinic). Notably, the disappearance of the homoclinic orbit is synchronized with the emergence of a large limit cycle. Then, by discussing the stability of equilibria at infinity and the orientation of the trajectories, we analyze the existence and stability of limit cycles of the autonomous system by combining the Poincaré-Bendixson theorem and index theory. Global phase portraits and numerical simulations of the autonomous system for different parameter values are given. Finally, the existence of periodic and quasi-periodic solutions of the periodically forced system is proved by a KAM theorem.
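The abstract refers to numerical simulations but does not state the equation. The sketch below integrates one commonly used van der Pol-Duffing form with a quintic term using classical fourth-order Runge-Kutta. The form x'' - mu*(1 - x^2)*x' + w2*x + a*x^3 + b*x^5 = 0 and all parameter names are assumptions, and the authors' system may differ.

```python
import math

def vdp_duffing_rhs(state, mu, w2, a, b):
    """Assumed form: x'' - mu*(1 - x^2)*x' + w2*x + a*x^3 + b*x^5 = 0."""
    x, v = state
    return (v, mu * (1.0 - x * x) * v - w2 * x - a * x**3 - b * x**5)

def rk4_step(state, dt, **params):
    """One classical fourth-order Runge-Kutta step for the 2D first-order system."""
    def shift(s, k, h):
        return (s[0] + h * k[0], s[1] + h * k[1])
    k1 = vdp_duffing_rhs(state, **params)
    k2 = vdp_duffing_rhs(shift(state, k1, dt / 2), **params)
    k3 = vdp_duffing_rhs(shift(state, k2, dt / 2), **params)
    k4 = vdp_duffing_rhs(shift(state, k3, dt), **params)
    return (state[0] + dt / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]),
            state[1] + dt / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]))

def simulate(state, dt, steps, **params):
    """Return the trajectory [(x, v), ...] including the initial state."""
    trajectory = [state]
    for _ in range(steps):
        state = rk4_step(state, dt, **params)
        trajectory.append(state)
    return trajectory

# Sanity check: with all nonlinear and damping terms off, this is a harmonic oscillator.
traj = simulate((1.0, 0.0), 2 * math.pi / 1000, 1000, mu=0.0, w2=1.0, a=0.0, b=0.0)
print(traj[-1])  # close to the initial state after one period
```

Setting a and b to nonzero values and sweeping mu is one way to observe limit cycles appearing or disappearing numerically. Such experiments only illustrate the analytical results and do not replace them.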
Submitted 5 June, 2024;
originally announced June 2024.
-
Robust Deep Reinforcement Learning with Adaptive Adversarial Perturbations in Action Space
Authors:
Qianmei Liu,
Yufei Kuang,
Jie Wang
Abstract:
Deep reinforcement learning (DRL) algorithms can suffer from modeling errors between the simulation and the real world. Many studies use adversarial learning to generate perturbations during training to model the discrepancy and improve the robustness of DRL. However, most of these approaches use a fixed parameter to control the intensity of the adversarial perturbation, which can lead to a trade-off between average performance and robustness. In fact, finding the optimal perturbation parameter is challenging, as excessive perturbations may destabilize training and compromise agent performance, while insufficient perturbations may not impart enough information to enhance robustness. To keep training stable while improving robustness, we propose a simple but effective method, namely Adaptive Adversarial Perturbation (A2P), which can dynamically select appropriate adversarial perturbations for each sample. Specifically, we propose an adaptive adversarial coefficient framework to adjust the effect of the adversarial perturbation during training. By designing a metric for the current intensity of the perturbation, our method can calculate suitable perturbation levels based on the current relative performance. An appealing feature of our method is that it is simple to deploy in real-world applications and does not require access to the simulator in advance. Experiments in MuJoCo show that our method can improve training stability and learn a robust policy when migrated to different test environments. The code is available at https://github.com/Lqm00/A2P-SAC.
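The abstract says only that A2P computes perturbation levels "based on the current relative performance". The sketch below shows one hypothetical rule of that type. The coefficient grows when the perturbed return stays close to the clean return, which suggests the perturbation is too weak, and shrinks when the perturbed return falls well below it. The target ratio, the clipping bounds, and the function names are assumptions, not the rule used in A2P-SAC.

```python
def adaptive_coefficient(clean_return, perturbed_return, target_ratio=0.8,
                         base=1.0, lo=0.0, hi=2.0):
    """Hypothetical adaptive adversarial coefficient.

    ratio = perturbed / clean performance. A ratio above target_ratio means the
    perturbation barely hurts, so it is strengthened; a ratio below it weakens it.
    """
    if clean_return <= 0:
        return lo  # assumption: no perturbation when the clean return is not positive
    ratio = perturbed_return / clean_return
    return min(hi, max(lo, base * ratio / target_ratio))

def perturb_action(action, direction, epsilon, coef, bound=1.0):
    """Add a scaled adversarial direction to an action and clip to [-bound, bound]."""
    return [max(-bound, min(bound, a + coef * epsilon * d))
            for a, d in zip(action, direction)]

coef = adaptive_coefficient(clean_return=100.0, perturbed_return=40.0)
print(coef, perturb_action([0.9, -0.2], [1.0, -1.0], epsilon=0.5, coef=coef))
```

Because the coefficient is computed per sample from observed returns, this kind of rule needs no access to the simulator's parameters. That is the deployment property the abstract emphasizes.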
Submitted 20 May, 2024;
originally announced May 2024.
-
Learning to Cut via Hierarchical Sequence/Set Model for Efficient Mixed-Integer Programming
Authors:
Jie Wang,
Zhihai Wang,
Xijun Li,
Yufei Kuang,
Zhihao Shi,
Fangzhou Zhu,
Mingxuan Yuan,
Jia Zeng,
Yongdong Zhang,
Feng Wu
Abstract:
Cutting planes (cuts) play an important role in solving mixed-integer linear programs (MILPs), which formulate many important real-world applications. Cut selection heavily depends on (P1) which cuts to prefer and (P2) how many cuts to select. Although modern MILP solvers tackle (P1)-(P2) by human-designed heuristics, machine learning carries the potential to learn more effective heuristics. However, many existing learning-based methods learn which cuts to prefer, neglecting the importance of learning how many cuts to select. Moreover, we observe that (P3) what order of selected cuts to prefer significantly impacts the efficiency of MILP solvers as well. To address these challenges, we propose a novel hierarchical sequence/set model (HEM) to learn cut selection policies. Specifically, HEM is a bi-level model: (1) a higher-level module that learns how many cuts to select, and (2) a lower-level module that formulates cut selection as a sequence/set-to-sequence learning problem and learns policies that select an ordered subset whose cardinality is determined by the higher-level module. To the best of our knowledge, HEM is the first data-driven methodology that tackles (P1)-(P3) simultaneously. Experiments demonstrate that HEM significantly improves the efficiency of solving MILPs on eleven challenging MILP benchmarks, including two real-world problems from Huawei.
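The bi-level decomposition can be made concrete with a small sketch. In this sketch, a higher-level output (a ratio in (0, 1]) fixes how many cuts to keep, and a lower-level step returns an ordered subset of that size. Greedy sorting by a per-cut score is a hypothetical stand-in for HEM's learned sequence/set-to-sequence model, and the function name and scores are illustrative.

```python
import math

def hem_select(cut_scores, ratio):
    """Hypothetical bi-level cut selection.

    Higher level: ratio in (0, 1] determines k, the number of cuts to select (P2).
    Lower level: an ordered subset of size k (P1 and P3). Here the order is
    greedy by score, a placeholder for a learned sequence model.
    """
    k = max(1, math.ceil(ratio * len(cut_scores)))
    order = sorted(range(len(cut_scores)), key=lambda i: cut_scores[i], reverse=True)
    return order[:k]

print(hem_select([0.2, 0.9, 0.5, 0.7], ratio=0.5))  # indices of the selected cuts, in order
```

A learned lower level could condition each choice on the cuts already selected, which an independent per-cut score cannot do. This is the motivation the abstract gives for treating order (P3) as part of the problem.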
Submitted 19 April, 2024;
originally announced April 2024.
-
CipherFormer: Efficient Transformer Private Inference with Low Round Complexity
Authors:
Weize Wang,
Yi Kuang
Abstract:
There is a growing trend to outsource the inference task of large transformer models to cloud servers. However, this poses a severe threat to users' private data as they are exposed to cloud servers after uploading. Although several works attempted to provide private inference for transformer models, their hundreds of communication rounds limit the application scenarios. Motivated by the desire to minimize round complexity, we propose CipherFormer, a novel transformer private inference scheme using homomorphic encryption and garbled circuits. We present a protocol for quickly computing homomorphic matrix multiplications. We then modify the attention mechanism and design the corresponding garbled circuits. Furthermore, we show how to use a lightweight attention mechanism and mixed-bitwidth to reduce the inference latency while maintaining accuracy. In comparison with an advanced homomorphic encryption scheme on text classification tasks, our model improves accuracy by 3% to 11% while performing private inference with a 7.7x-11.9x speedup.
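The abstract does not detail CipherFormer's matrix-multiplication protocol. As background, the sketch below shows the widely used diagonal (Halevi-Shoup) method for matrix-vector products in plaintext. Under homomorphic encryption, the slot-wise products and cyclic rotations map to ciphertext operations. This sketch is for illustration only and is not presented as the protocol used in the paper.

```python
def rotate(vec, i):
    """Cyclic left rotation by i; under HE this is a ciphertext slot rotation."""
    n = len(vec)
    return [vec[(j + i) % n] for j in range(n)]

def diagonal(matrix, i):
    """i-th generalized diagonal: d_i[j] = M[j][(j + i) mod n]."""
    n = len(matrix)
    return [matrix[j][(j + i) % n] for j in range(n)]

def diagonal_matvec(matrix, vec):
    """M v = sum_i d_i * rot(v, i), using only slot-wise products and rotations."""
    n = len(vec)
    acc = [0] * n
    for i in range(n):
        acc = [a + d * r for a, d, r in zip(acc, diagonal(matrix, i), rotate(vec, i))]
    return acc

M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(diagonal_matvec(M, [1, 0, -1]))
```

The identity holds because sum_i M[j][(j+i) mod n] * v[(j+i) mod n] runs over every column index k exactly once. The cost is n rotations, so reducing rotations and communication rounds is where specialized protocols focus.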
Submitted 25 March, 2024;
originally announced March 2024.
-
Simple Full-Spectrum Correlated k-Distribution Model based on Multilayer Perceptron
Authors:
Xin Wang,
Yucheng Kuang,
Chaojun Wang,
Hongyuan Di,
Boshu He
Abstract:
While neural networks have been successfully applied to the full-spectrum correlated k-distribution (FSCK) method over a wide range of thermodynamic states, with k-values predicted by a trained multilayer perceptron (MLP) model, the required a-values still need to be calculated on the fly, which theoretically degrades the FSCK method and may lead to errors. In addition, the overly complicated structure of the current MLP model inevitably reduces computational efficiency. Therefore, to balance accuracy, efficiency, and storage, a simple MLP designed around the nature of the FSCK method is developed, i.e., the simple FSCK MLP (SFM) model, from which the correlated k-values and corresponding ka-values can be efficiently obtained. Several test cases have been carried out to compare the developed SFM model with other FSCK tools, including look-up tables and the traditional FSCK MLP (TFM) model. Results show that the SFM model achieves excellent accuracy, even better than that of look-up tables, at a computational cost far lower than that of the TFM model. Considering accuracy, efficiency, and portability, the SFM model is not only an excellent tool for predicting spectral properties but also provides a way to reduce errors due to nonlinear effects.
Submitted 5 March, 2024;
originally announced March 2024.
-
Neutron radius determination of $^{133}$Cs and its impact on the interpretation of CEvNS-CsI measurement
Authors:
Y. Huang,
S. Y. Xia,
Y. F. Li,
X. L. Tu,
J. T. Zhang,
C. J. Shao,
K. Yue,
P. Ma,
Y. F. Niu,
Z. P. Li,
Y. Kuang,
X. Q. Liu,
J. F. Han,
P. Egelhof,
Yu. A. Litvinov,
M. Wang,
Y. H. Zhang,
X. H. Zhou,
Z. Y. Sun
Abstract:
Proton-$^{133}$Cs elastic scattering at low momentum transfer is performed using an in-ring reaction technique at the Cooler Storage Ring at the Heavy Ion Research Facility in Lanzhou. Recoil protons from the elastic collisions between the internal H$_2$-gas target and the circulating $^{133}$Cs ions at 199.4 MeV/u are detected by a silicon-strip detector. The matter radius of $^{133}$Cs is deduced by describing the measured differential cross sections with the Glauber model. Employing the adopted proton distribution radius, a point-neutron radius of 4.86(21) fm for $^{133}$Cs is obtained. With the newly determined neutron radius, the weak mixing angle $\sin^2\theta_W$ is independently extracted to be 0.227(28) by fitting the coherent elastic neutrino-nucleus scattering data. Our work constrains $\sin^2\theta_W$ to a narrower range than previous independent approaches, and would play an important role in searching for new physics via high-precision CE$\nu$NS-CsI cross section data in the near future.
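The step from a measured matter radius plus an adopted proton radius to a point-neutron radius can be written out explicitly. The sketch below uses the standard relation for point-nucleon rms radii, r_m^2 = (Z r_p^2 + N r_n^2) / A, with Z = 55 and N = 78 for $^{133}$Cs. The paper's exact procedure, such as corrections for finite nucleon size, may differ. The radii in the usage line are illustrative round-trip values and are not the measured proton radius.

```python
import math

Z, N = 55, 78  # 133Cs: 55 protons, 78 neutrons
A = Z + N

def matter_radius(r_p, r_n, z=Z, n=N):
    """Point-matter rms radius from point-proton and point-neutron rms radii."""
    return math.sqrt((z * r_p**2 + n * r_n**2) / (z + n))

def neutron_radius(r_m, r_p, z=Z, n=N):
    """Invert r_m^2 = (Z r_p^2 + N r_n^2) / A for the point-neutron rms radius."""
    return math.sqrt(((z + n) * r_m**2 - z * r_p**2) / n)

# Round trip with illustrative radii in fm.
r_m = matter_radius(4.80, 4.86)
print(round(neutron_radius(r_m, 4.80), 6))
```

In this relation the neutron radius inherits the uncertainties of both r_m and r_p, weighted by A/N and Z/N respectively. This is one reason the adopted proton radius matters for the quoted 4.86(21) fm.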
Submitted 8 April, 2024; v1 submitted 6 March, 2024;
originally announced March 2024.
-
OpenFMNav: Towards Open-Set Zero-Shot Object Navigation via Vision-Language Foundation Models
Authors:
Yuxuan Kuang,
Hai Lin,
Meng Jiang
Abstract:
Object navigation (ObjectNav) requires an agent to navigate through unseen environments to find queried objects. Many previous methods attempted to solve this task by relying on supervised or reinforcement learning, training on limited household datasets with closed-set objects. However, two key challenges remain unsolved: understanding free-form natural language instructions that demand open-set objects, and generalizing to new environments in a zero-shot manner. To address these two challenges, we propose OpenFMNav, an Open-set Foundation Model based framework for zero-shot object Navigation. We first unleash the reasoning abilities of large language models (LLMs) to extract proposed objects from natural language instructions that meet the user's demand. We then leverage the generalizability of large vision language models (VLMs) to actively discover and detect candidate objects from the scene, building a Versatile Semantic Score Map (VSSM). Then, by conducting common-sense reasoning on the VSSM, our method can perform effective language-guided exploration and exploitation of the scene and finally reach the goal. By leveraging the reasoning and generalization abilities of foundation models, our method can understand free-form human instructions and perform effective open-set zero-shot navigation in diverse environments. Extensive experiments on the HM3D ObjectNav benchmark show that our method surpasses all the strong baselines on all metrics, demonstrating its effectiveness. Furthermore, we perform real-robot demonstrations to validate our method's open-set capability and generalizability to real-world environments.
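The abstract names a Versatile Semantic Score Map (VSSM) but does not define its update rule. The sketch below shows a hypothetical grid score map. Each detection writes detector confidence times instruction-text similarity into a cell, each cell keeps its maximum, and the agent switches from exploration to exploitation once some cell exceeds a threshold. The scoring formula, the threshold, and the function names are all assumptions.

```python
def update_score_map(score_map, detections):
    """detections: iterable of (row, col, detector_confidence, text_similarity).

    Hypothetical rule: each cell keeps the maximum of confidence * similarity.
    """
    for r, c, conf, sim in detections:
        score_map[r][c] = max(score_map[r][c], conf * sim)
    return score_map

def choose_goal(score_map, threshold=0.5):
    """Return the best-scoring cell, or None (keep exploring) if nothing is confident."""
    best, best_cell = 0.0, None
    for r, row in enumerate(score_map):
        for c, s in enumerate(row):
            if s > best:
                best, best_cell = s, (r, c)
    return best_cell if best >= threshold else None

grid = [[0.0] * 3 for _ in range(3)]
update_score_map(grid, [(0, 1, 0.9, 0.8), (2, 2, 0.6, 0.5)])
print(choose_goal(grid))
```

Returning None rather than a weak goal is one simple way to encode the exploration/exploitation switch that the abstract attributes to common-sense reasoning over the map.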
Submitted 24 March, 2024; v1 submitted 16 February, 2024;
originally announced February 2024.
-
BAFLineDP: Code Bilinear Attention Fusion Framework for Line-Level Defect Prediction
Authors:
Shaojian Qiu,
Huihao Huang,
Jianxiang Luo,
Yingjie Kuang,
Haoyu Luo
Abstract:
Software defect prediction aims to identify defect-prone code, aiding developers in optimizing testing resource allocation. Most defect prediction approaches focus on coarse-grained, file-level defect prediction, which fails to provide developers with the precision required to locate defective code. Recently, some researchers have proposed fine-grained, line-level defect prediction methods. However, most of these approaches do not consider the contextual semantics of code lines in depth and neglect the local interaction information among code lines. To address these issues, this paper presents a line-level defect prediction method grounded in a code bilinear attention fusion framework (BAFLineDP). This method discerns defective code files and lines by integrating source code line semantics, line-level context, and local interaction information between code lines and line-level context. An extensive analysis of within- and cross-project defect prediction across 9 distinct projects encompassing 32 releases demonstrates that BAFLineDP outperforms current advanced file-level and line-level defect prediction approaches.
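"Bilinear attention" generally means scoring an interaction between two vectors x and c as x^T W c. The sketch below shows that generic form. It scores each code-line embedding against a context vector, normalizes the scores with a softmax, and fuses the lines by weighted sum. The embeddings, W, and the function names are hypothetical, and BAFLineDP's actual architecture is not specified in the abstract.

```python
import math

def bilinear_score(x, W, c):
    """x^T W c for vectors x, c and matrix W (lists of lists)."""
    return sum(x[i] * W[i][j] * c[j] for i in range(len(x)) for j in range(len(c)))

def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(lines, context, W):
    """Weight each line embedding by its bilinear interaction with the context."""
    weights = softmax([bilinear_score(x, W, context) for x in lines])
    dim = len(lines[0])
    fused = [sum(w * x[d] for w, x in zip(weights, lines)) for d in range(dim)]
    return weights, fused

identity = [[1.0, 0.0], [0.0, 1.0]]
weights, fused = attend([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0], identity)
print(weights, fused)
```

The attention weights double as per-line scores, which is one way a line-level model can rank lines for inspection.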
Submitted 11 February, 2024;
originally announced February 2024.
-
Unveiling Latent Causal Rules: A Temporal Point Process Approach for Abnormal Event Explanation
Authors:
Yiling Kuang,
Chao Yang,
Yang Yang,
Shuang Li
Abstract:
In high-stakes systems such as healthcare, it is critical to understand the causal reasons behind unusual events, such as sudden changes in a patient's health. Unveiling the causal reasons helps with quick diagnoses and precise treatment planning. In this paper, we propose an automated method for uncovering "if-then" logic rules to explain observational events. We introduce temporal point processes to model the events of interest, and discover the set of latent rules that explain the occurrence of events. To achieve this, we employ an Expectation-Maximization (EM) algorithm. In the E-step, we calculate the likelihood of each event being explained by each discovered rule. In the M-step, we update both the rule set and the model parameters to increase the lower bound of the likelihood function. Notably, we optimize the rule set in a differential manner. Our approach demonstrates accurate performance in both discovering rules and identifying root causes. We showcase its promising results on synthetic and real healthcare datasets.
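The E-step described above assigns each event a probability of being explained by each rule. The sketch below shows this step for a generic mixture-style model, where each rule contributes an intensity at the event time and a prior weight. The mixture form, the names, and the numbers are assumptions. The M-step shown updates only the rule weights and leaves out the rule-set optimization described in the abstract.

```python
def e_step(event_intensities, rule_weights):
    """event_intensities[e][r]: intensity rule r assigns to event e (>= 0).
    rule_weights[r]: prior weight of rule r.
    Returns responsibilities[e][r] = P(rule r explains event e).
    """
    resp = []
    for lams in event_intensities:
        joint = [w * lam for w, lam in zip(rule_weights, lams)]
        total = sum(joint)
        # If no rule gives the event positive intensity, spread responsibility uniformly.
        resp.append([j / total if total > 0 else 1.0 / len(joint) for j in joint])
    return resp

def m_step_weights(resp):
    """Update each rule's weight as its average responsibility over events."""
    n = len(resp)
    return [sum(r[k] for r in resp) / n for k in range(len(resp[0]))]

resp = e_step([[2.0, 2.0], [3.0, 1.0]], rule_weights=[0.5, 0.5])
print(resp, m_step_weights(resp))
```

Alternating these two steps does not decrease the lower bound on the likelihood. This is the property the abstract's EM procedure relies on.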
Submitted 19 March, 2024; v1 submitted 3 February, 2024;
originally announced February 2024.
-
Machine Learning Insides OptVerse AI Solver: Design Principles and Applications
Authors:
Xijun Li,
Fangzhou Zhu,
Hui-Ling Zhen,
Weilin Luo,
Meng Lu,
Yimin Huang,
Zhenan Fan,
Zirui Zhou,
Yufei Kuang,
Zhihai Wang,
Zijie Geng,
Yang Li,
Haoyang Liu,
Zhiwu An,
Muming Yang,
Jianshu Li,
Jie Wang,
Junchi Yan,
Defeng Sun,
Tao Zhong,
Yong Zhang,
Jia Zeng,
Mingxuan Yuan,
Jianye Hao,
Jun Yao
, et al. (1 additional authors not shown)
Abstract:
In an era of digital ubiquity, efficient resource management and decision-making are paramount across numerous industries. To this end, we present a comprehensive study on the integration of machine learning (ML) techniques into Huawei Cloud's OptVerse AI Solver, which aims to mitigate the scarcity of real-world mathematical programming instances and to surpass the capabilities of traditional optimization techniques. We showcase our methods for generating complex SAT and MILP instances using generative models that mirror the multifaceted structures of real-world problems. Furthermore, we introduce a training framework leveraging augmentation policies to maintain solvers' utility in dynamic environments. Besides data generation and augmentation, our proposed approaches also include novel ML-driven policies for personalized solver strategies, with an emphasis on applications such as graph convolutional networks for initial basis selection and reinforcement learning for advanced presolving and cut selection. Additionally, we detail the incorporation of state-of-the-art parameter tuning algorithms, which markedly elevate solver performance. Compared with traditional solvers such as CPLEX and SCIP, our ML-augmented OptVerse AI Solver demonstrates superior speed and precision across both established benchmarks and real-world scenarios, reinforcing the practical imperative and effectiveness of machine learning techniques in mathematical programming solvers.
Submitted 17 January, 2024; v1 submitted 11 January, 2024;
originally announced January 2024.
-
On the Coherency of Completed Group Algebra
Authors:
David Burns,
Yu Kuang,
Dingli Liang
Abstract:
We investigate coherency properties of certain completed integral group rings, specifically those of compact $p$-adic Lie groups.
Submitted 16 January, 2024; v1 submitted 10 January, 2024;
originally announced January 2024.
-
Non-Vacuous Generalization Bounds for Large Language Models
Authors:
Sanae Lotfi,
Marc Finzi,
Yilun Kuang,
Tim G. J. Rudner,
Micah Goldblum,
Andrew Gordon Wilson
Abstract:
Modern language models can contain billions of parameters, raising the question of whether they can generalize beyond the training data or simply regurgitate their training corpora. We provide the first non-vacuous generalization bounds for pretrained large language models (LLMs), indicating that language models are capable of discovering regularities that generalize to unseen data. In particular, we derive a compression bound that is valid for the unbounded log-likelihood loss using prediction smoothing, and we extend the bound to handle subsampling, accelerating bound computation on massive datasets. To achieve the extreme level of compression required for non-vacuous generalization bounds, we devise SubLoRA, a low-dimensional non-linear parameterization. Using this approach, we find that larger models have better generalization bounds and are more compressible than smaller models.
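Two ingredients mentioned above can be written out directly. Prediction smoothing mixes the model's token probability with a uniform distribution over the vocabulary, so the smoothed negative log-likelihood is bounded by log(V/alpha). Once the loss is bounded, a standard Occam-style bound adds a complexity term driven by the compressed model size in bits. The sketch assumes i.i.d. samples and the textbook Hoeffding-plus-prefix-code form. The paper's actual bound, including its subsampling extension, may differ, and all numbers in the usage line are illustrative.

```python
import math

def smooth(prob, alpha, vocab_size):
    """Prediction smoothing: mix the model's probability with a uniform distribution."""
    return (1.0 - alpha) * prob + alpha / vocab_size

def smoothed_nll(prob, alpha, vocab_size):
    """Smoothed negative log-likelihood; at most log(vocab_size / alpha)."""
    return -math.log(smooth(prob, alpha, vocab_size))

def occam_bound(empirical_risk, code_length_bits, num_samples, alpha, vocab_size,
                delta=0.05):
    """Textbook Occam bound for a loss in [0, Delta] (assumes i.i.d. samples):
    risk <= empirical + Delta * sqrt((C ln 2 + ln(1/delta)) / (2 m))."""
    loss_range = math.log(vocab_size / alpha)
    complexity = code_length_bits * math.log(2) + math.log(1.0 / delta)
    return empirical_risk + loss_range * math.sqrt(complexity / (2 * num_samples))

print(occam_bound(empirical_risk=3.0, code_length_bits=1e6, num_samples=1e8,
                  alpha=0.1, vocab_size=50000))
```

The complexity term grows with the code length in bits. This is why extreme compression, such as through a low-dimensional parameterization, is needed before the bound becomes non-vacuous.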
Submitted 12 February, 2024; v1 submitted 28 December, 2023;
originally announced December 2023.
-
Probing scalar induced gravitational waves with PTA and LISA: The Importance of third order correction
Authors:
Zhe Chang,
Yu-Ting Kuang,
Di Wu,
Jing-Zhi Zhou
Abstract:
We revisit the calculation of third order scalar induced gravitational waves (SIGWs) and extend it from a monochromatic primordial power spectrum to a more general log-normal one. We investigate the impact of third order SIGWs on the signal-to-noise ratio (SNR) of Laser Interferometer Space Antenna (LISA) and pulsar timing array (PTA) observations, and find that third order SIGWs contribute significantly to the total energy density spectrum of gravitational waves (GWs) in the high-frequency region. For a primordial power spectrum amplitude of $A_ζ=10^{-2}\sim 10^{-1}$, the effects of third order SIGWs lead to a $40\%$ to $400\%$ increase in the SNR for LISA. Additionally, our PTA data analysis reveals that third order SIGWs diminish both the amplitude $A_ζ$ and the peak frequency $f_*$ of the primordial power spectrum.
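Two standard expressions underlie this kind of analysis. The first is the log-normal primordial spectrum P_zeta(k) = A_zeta / (sqrt(2 pi) sigma) * exp(-ln^2(k/k_*) / (2 sigma^2)), which integrates to A_zeta over ln k. The second is the single-detector SNR of a stochastic background, SNR = sqrt(T * integral (Omega_GW / Omega_n)^2 df). The sketch below implements both with the trapezoidal rule. The normalization convention and the SNR form are common choices that the paper may not share exactly, and the inputs in the usage line are illustrative rather than LISA's sensitivity curve.

```python
import math

def lognormal_power_spectrum(k, amplitude, k_star, sigma):
    """P_zeta(k) = A / (sqrt(2 pi) sigma) * exp(-ln^2(k / k_star) / (2 sigma^2))."""
    return amplitude / (math.sqrt(2 * math.pi) * sigma) * math.exp(
        -math.log(k / k_star) ** 2 / (2 * sigma**2))

def snr(freqs, omega_signal, omega_noise, t_obs):
    """SNR = sqrt(T * integral (Omega_GW / Omega_n)^2 df), trapezoidal rule."""
    integrand = [(s / n) ** 2 for s, n in zip(omega_signal, omega_noise)]
    integral = sum(0.5 * (integrand[i] + integrand[i + 1]) * (freqs[i + 1] - freqs[i])
                   for i in range(len(freqs) - 1))
    return math.sqrt(t_obs * integral)

freqs = [1e-3, 2e-3, 3e-3]
print(snr(freqs, [1e-10, 2e-10, 1e-10], [1e-11] * 3, t_obs=1e7))
```

Because the SNR is linear in Omega_GW, any third order contribution that adds to the second order spectrum raises the SNR in proportion. This is consistent with the percentage increases quoted above.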
Submitted 26 February, 2024; v1 submitted 21 December, 2023;
originally announced December 2023.
-
Unsupervised learning on spontaneous retinal activity leads to efficient neural representation geometry
Authors:
Andrew Ligeralde,
Yilun Kuang,
Thomas Edward Yerxa,
Miah N. Pitcher,
Marla Feller,
SueYeon Chung
Abstract:
Prior to the onset of vision, neurons in the developing mammalian retina spontaneously fire in correlated activity patterns known as retinal waves. Experimental evidence suggests that retinal waves strongly influence the emergence of sensory representations before visual experience. We aim to model this early stage of functional development by using movies of neurally active developing retinas as pre-training data for neural networks. Specifically, we pre-train a ResNet-18 with an unsupervised contrastive learning objective (SimCLR) on both simulated and experimentally-obtained movies of retinal waves, then evaluate its performance on image classification tasks. We find that pre-training on retinal waves significantly improves performance on tasks that test object invariance to spatial translation, while slightly improving performance on more complex tasks like image classification. Notably, these performance boosts are realized on held-out natural images even though the pre-training procedure does not include any natural image data. We then propose a geometrical explanation for the increase in network performance, namely that the spatiotemporal characteristics of retinal waves facilitate the formation of separable feature representations. In particular, we demonstrate that networks pre-trained on retinal waves are more effective at separating image manifolds than randomly initialized networks, especially for manifolds defined by sets of spatial translations. These findings indicate that the broad spatiotemporal properties of retinal waves prepare networks for higher order feature extraction.
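The pre-training objective named above, SimCLR, uses the NT-Xent (normalized temperature-scaled cross-entropy) loss. Each embedding is pulled toward the other augmented view of the same sample and pushed away from all other embeddings in the batch. The sketch below is a plain-Python version of that standard loss for two small batches of views. The toy embeddings and the temperature are illustrative, and the sketch does not reproduce the paper's training setup.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent (SimCLR) loss; z1[i] and z2[i] are two views of the same sample."""
    z = [normalize(v) for v in z1 + z2]
    n = len(z1)

    def sim(a, b):
        return sum(x * y for x, y in zip(a, b)) / temperature

    loss = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the positive pair
        denom = sum(math.exp(sim(z[i], z[k])) for k in range(2 * n) if k != i)
        loss += -math.log(math.exp(sim(z[i], z[j])) / denom)
    return loss / (2 * n)

aligned = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = nt_xent([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned, swapped)  # agreeing views give the lower loss
```

No labels appear anywhere in the loss. This is what allows unlabeled retinal-wave movies to serve as pre-training data.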
Submitted 5 December, 2023;
originally announced December 2023.
-
Extracting neutron skin from elastic proton-nucleus scattering with deep neural network
Authors:
G. H. Yang,
Y. Kuang,
Z. X. Yang,
Z. P. Li
Abstract:
Based on the relativistic impulse approximation of proton-nucleus elastic scattering theory, the nucleon density distribution and neutron skin thickness of $^{48}$Ca are estimated via deep learning. The neural-network-generated densities are generally lower in the nuclear interior than those from the relativistic PC-PK1 density functional, resulting in a significant improvement in the large-angle scattering observables, for both the differential cross section and the analyzing power. The neutron skin thickness of $^{48}$Ca is determined to be 0.211(11) fm. This relatively thick neutron skin is deemed reasonable from the perspective of density functional analysis.
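The neutron skin thickness is conventionally defined as the difference between the neutron and proton rms radii, Δr_np = r_n - r_p, where <r^2> = ∫ρ r^4 dr / ∫ρ r^2 dr for a spherical density ρ(r). The sketch below computes this quantity numerically from density profiles. The two-parameter Fermi profiles and their parameters are illustrative stand-ins for the network-generated densities in the paper.

```python
import math

def rms_radius(density, r_max, steps=20000):
    """sqrt(<r^2>) for a spherical density rho(r), using the midpoint rule:
    <r^2> = int rho r^4 dr / int rho r^2 dr."""
    dr = r_max / steps
    num = den = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        rho = density(r)
        num += rho * r**4 * dr
        den += rho * r**2 * dr
    return math.sqrt(num / den)

def fermi_density(radius, diffuseness):
    """Two-parameter Fermi profile (unnormalized); parameters here are illustrative."""
    return lambda r: 1.0 / (1.0 + math.exp((r - radius) / diffuseness))

def neutron_skin(rho_n, rho_p, r_max=20.0):
    """Delta r_np = r_n - r_p in the same length unit as the densities (fm)."""
    return rms_radius(rho_n, r_max) - rms_radius(rho_p, r_max)

print(neutron_skin(fermi_density(4.0, 0.5), fermi_density(3.8, 0.5)))
```

Normalization cancels in the ratio, so only the shape of each density matters. Lowering the interior density while keeping the tail fixed increases the rms radius.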
Submitted 20 November, 2023;
originally announced November 2023.
-
New constraints on primordial non-Gaussianity from missing two-loop contributions of scalar induced gravitational waves
Authors:
Zhe Chang,
Yu-Ting Kuang,
Di Wu,
Jing-Zhi Zhou,
Qing-Hua Zhu
Abstract:
We analyze the energy density spectrum of scalar induced gravitational waves (SIGWs) using the NANOGrav 15-year data set, thereby constraining the primordial non-Gaussian parameter $f_{\mathrm{NL}}$. For the first time, we calculate the seventeen missing two-loop diagrams proportional to $f_{\mathrm{NL}}A_ζ^3$ that correspond to the two-point correlation function $\langle h^{λ,(3)}_{\mathbf{k}} h^{λ',(2)}_{\mathbf{k}'} \rangle$ for local-type primordial non-Gaussianity. The total energy density spectrum of SIGWs can be significantly suppressed by these two-loop diagrams. If SIGWs dominate the stochastic gravitational wave backgrounds (SGWBs) observed in pulsar timing array (PTA) experiments, the parameter interval $f_{\mathrm{NL}}\in [-5,-1]$ is notably excluded based on the NANOGrav 15-year data set. After taking into account the abundance of primordial black holes (PBHs) and the convergence of the cosmological perturbation expansion, we find that the only possible parameter range for $f_{\mathrm{NL}}$ might be $-1\le f_{\mathrm{NL}}< 0$.
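"Local-type" non-Gaussianity means the curvature perturbation is a local function of a Gaussian field. A common convention is zeta = zeta_g + (3/5) f_NL (zeta_g^2 - <zeta_g^2>), where subtracting the variance keeps the mean at zero. Some papers absorb the 3/5 into F_NL, so the convention here is an assumption. The sketch applies the map to Gaussian samples, using an illustrative f_NL inside the interval quoted above.

```python
import random

def local_non_gaussian(zeta_g, f_nl, variance):
    """Local-type map: zeta = zeta_g + (3/5) f_NL (zeta_g^2 - <zeta_g^2>)."""
    return [z + 0.6 * f_nl * (z * z - variance) for z in zeta_g]

rng = random.Random(0)
gauss = [rng.gauss(0.0, 1e-2) for _ in range(10000)]
var = sum(z * z for z in gauss) / len(gauss)  # sample estimate of <zeta_g^2>
zeta = local_non_gaussian(gauss, f_nl=-0.5, variance=var)
print(sum(zeta) / len(zeta) - sum(gauss) / len(gauss))  # ~0: the shift has zero mean
```

In the two-point function of zeta, the quadratic term generates contributions of order f_NL^2 A_zeta^2 relative to A_zeta. Terms like these, propagated to second and third order tensor modes, give the f_NL-dependent loop diagrams discussed above.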
Submitted 8 November, 2023;
originally announced November 2023.
-
State Sequences Prediction via Fourier Transform for Representation Learning
Authors:
Mingxuan Ye,
Yufei Kuang,
Jie Wang,
Rui Yang,
Wengang Zhou,
Houqiang Li,
Feng Wu
Abstract:
While deep reinforcement learning (RL) has been demonstrated to be effective in solving complex control tasks, sample efficiency remains a key challenge due to the large amounts of data required for remarkable performance. Existing research explores the application of representation learning for data-efficient RL, e.g., learning predictive representations by predicting long-term future states. However, many existing methods do not fully exploit the structural information inherent in sequential state signals, which can potentially improve the quality of long-term decision-making but is difficult to discern in the time domain. To tackle this problem, we propose State Sequences Prediction via Fourier Transform (SPF), a novel method that exploits the frequency domain of state sequences to extract the underlying patterns in time series data for learning expressive representations efficiently. Specifically, we theoretically analyze the existence of structural information in state sequences, which is closely related to policy performance and signal regularity, and then propose to predict the Fourier transform of infinite-step future state sequences to extract such information. One of the appealing features of SPF is that it is simple to implement and does not require storing infinite-step future states as prediction targets. Experiments demonstrate that the proposed method outperforms several state-of-the-art algorithms in terms of both sample efficiency and performance.
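Why a Fourier-domain target can avoid storing infinite-step future states can be illustrated with a recursion. A minimal sketch, assuming (our assumption, not necessarily the paper's exact definition) a discounted discrete-time Fourier transform of the future state sequence, $F_t(ω) = \sum_{k\ge 0} γ^k e^{-iωk} s_{t+k+1}$. This quantity satisfies $F_t(ω) = s_{t+1} + γ e^{-iω} F_{t+1}(ω)$, so it can be learned with a bootstrapped, TD-style target from single transitions. The code checks the recursion numerically on a short scalar sequence; the function names are hypothetical.

```python
import cmath
from typing import List, Sequence


def direct_transform(states: Sequence[float], t: int, gamma: float, omega: float) -> complex:
    """Discounted DTFT of the future sequence s_{t+1}, s_{t+2}, ... (finite horizon here)."""
    total = 0j
    for k, s in enumerate(states[t + 1:]):
        total += (gamma ** k) * cmath.exp(-1j * omega * k) * s
    return total


def recursive_transforms(states: Sequence[float], gamma: float, omega: float) -> List[complex]:
    """Same quantity for every t, computed backwards with
    F_t = s_{t+1} + gamma * exp(-i*omega) * F_{t+1}, and F_{T-1} = 0 (no future states)."""
    T = len(states)
    F = [0j] * T
    factor = gamma * cmath.exp(-1j * omega)
    for t in range(T - 2, -1, -1):
        F[t] = states[t + 1] + factor * F[t + 1]
    return F


states = [0.3, -1.2, 0.8, 2.0, -0.5, 1.1]
gamma, omega = 0.9, 0.7
F = recursive_transforms(states, gamma, omega)
max_err = max(abs(F[t] - direct_transform(states, t, gamma, omega)) for t in range(len(states)))
print(max_err)
```

In a learning setting, $F_{t+1}$ would be supplied by a (target) network's prediction rather than computed from a stored sequence; that substitution is what removes the need to keep future states around.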
Submitted 24 October, 2023;
originally announced October 2023.
-
Towards chemical accuracy using a multi-mesh adaptive finite element method in all-electron density functional theory
Authors:
Yang Kuang,
Yedan Shen,
Guanghui Hu
Abstract:
Chemical accuracy serves as an important metric for assessing the effectiveness of numerical methods in Kohn--Sham density functional theory. It is found that to achieve chemical accuracy, not only the Kohn--Sham wavefunctions but also the Hartree potential should be approximated accurately. Under the adaptive finite element framework, this can be implemented by constructing the \emph{a posteriori} error indicator based on approximations of these two quantities. However, this approach incurs a large computational cost. To reduce the computational cost, we propose a novel multi-mesh adaptive method, in which the Kohn--Sham equation and the Poisson equation are solved on two different meshes over the same computational domain. With the proposed method, chemical accuracy can be achieved at a lower computational cost than with the adaptive method on a single mesh, as demonstrated in a number of numerical experiments.
Submitted 24 October, 2023;
originally announced October 2023.
-
Promoting Generalization for Exact Solvers via Adversarial Instance Augmentation
Authors:
Haoyang Liu,
Yufei Kuang,
Jie Wang,
Xijun Li,
Yongdong Zhang,
Feng Wu
Abstract:
Machine learning has been successfully applied to improve the efficiency of Mixed-Integer Linear Programming (MILP) solvers. However, the learning-based solvers often suffer from severe performance degradation on unseen MILP instances -- especially on large-scale instances from a perturbed environment -- due to the limited diversity of training distributions. To tackle this problem, we propose a novel approach, called Adversarial Instance Augmentation, which does not require knowledge of the problem type for new instance generation, to promote data diversity for learning-based branching modules in branch-and-bound (B&B) solvers (AdaSolver). We use bipartite graph representations for MILP instances and obtain various perturbed instances to regularize the solver by augmenting the graph structures with a learned augmentation policy. The major technical contribution of AdaSolver is that we formulate the non-differentiable instance augmentation as a contextual bandit problem and adversarially train the learning-based solver and augmentation policy, enabling efficient gradient-based training of the augmentation policy. To the best of our knowledge, AdaSolver is the first general and effective framework for understanding and improving the generalization of both imitation-learning-based (IL-based) and reinforcement-learning-based (RL-based) B&B solvers. Extensive experiments demonstrate that by producing various augmented instances, AdaSolver leads to a remarkable efficiency improvement across various distributions.
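The bipartite graph representation of a MILP is commonly built with one node per constraint, one node per variable, and an edge wherever a constraint has a nonzero coefficient on a variable. A minimal sketch of that construction for $\min c^\top x$ subject to $Ax \le b$; the function name and the one-number node features are hypothetical simplifications, and the paper's actual node and edge features may differ.

```python
from typing import List, Sequence, Tuple


def milp_to_bipartite(A: Sequence[Sequence[float]], b: Sequence[float], c: Sequence[float]):
    """Return (constraint_feats, variable_feats, edges) for min c^T x s.t. A x <= b.

    edges holds (constraint_index, variable_index, coefficient) for every nonzero
    A[i][j]; node features are just b_i and c_j (a hypothetical minimal feature set)."""
    if len(A) != len(b):
        raise ValueError("A and b must have the same number of rows")
    constraint_feats = [[bi] for bi in b]
    variable_feats = [[cj] for cj in c]
    edges: List[Tuple[int, int, float]] = []
    for i, row in enumerate(A):
        if len(row) != len(c):
            raise ValueError("each row of A must have one entry per variable")
        for j, a_ij in enumerate(row):
            if a_ij != 0:
                edges.append((i, j, a_ij))
    return constraint_feats, variable_feats, edges


cons, vars_, edges = milp_to_bipartite([[1, 0, 2], [0, -1, 1]], [4, 1], [1, 1, -3])
print(edges)
```

In this view, an augmentation policy acts by perturbing the graph (for example, editing which coefficients are nonzero); how AdaSolver learns those perturbations is not reproduced here.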
Submitted 21 October, 2023;
originally announced October 2023.
-
Accelerate Presolve in Large-Scale Linear Programming via Reinforcement Learning
Authors:
Yufei Kuang,
Xijun Li,
Jie Wang,
Fangzhou Zhu,
Meng Lu,
Zhihai Wang,
Jia Zeng,
Houqiang Li,
Yongdong Zhang,
Feng Wu
Abstract:
Large-scale LP problems from industry usually contain much redundancy that severely hurts the efficiency and reliability of solving LPs, making presolve (i.e., the problem simplification module) one of the most critical components in modern LP solvers. However, how to design high-quality presolve routines -- that is, the program determining (P1) which presolvers to select, (P2) in what order to execute them, and (P3) when to stop -- remains a highly challenging task due to the extensive requirements on expert knowledge and the large search space. Due to the sequential decision property of the task and the lack of expert demonstrations, we propose a simple and efficient reinforcement learning (RL) framework -- namely, reinforcement learning for presolve (RL4Presolve) -- to tackle (P1)-(P3) simultaneously. Specifically, we formulate the routine design task as a Markov decision process and propose an RL framework with adaptive action sequences to generate high-quality presolve routines efficiently. Note that adaptive action sequences help learn complex behaviors efficiently and adapt to various benchmarks. Experiments on two solvers (open-source and commercial) and eight benchmarks (real-world and synthetic) demonstrate that RL4Presolve significantly and consistently improves the efficiency of solving large-scale LPs, especially on benchmarks from industry. Furthermore, we optimize the hard-coded presolve routines in LP solvers by extracting rules from learned policies for simple and efficient deployment to Huawei's supply chain. The results show encouraging economic and academic potential for incorporating machine learning into modern solvers.
Submitted 18 October, 2023;
originally announced October 2023.
-
STOPNet: Multiview-based 6-DoF Suction Detection for Transparent Objects on Production Lines
Authors:
Yuxuan Kuang,
Qin Han,
Danshi Li,
Qiyu Dai,
Lian Ding,
Dong Sun,
Hanlin Zhao,
He Wang
Abstract:
In this work, we present STOPNet, a framework for 6-DoF object suction detection on production lines, with a focus on but not limited to transparent objects, which is an important and challenging problem in robotic systems and modern industry. Current methods requiring depth input fail on transparent objects because depth cameras cannot reliably sense their geometry, whereas we propose a novel framework that reconstructs the scene on the production line from RGB input only, based on multiview stereo. Compared to existing works, our method not only reconstructs the whole 3D scene to obtain high-quality 6-DoF suction poses in real time but also generalizes to novel environments, novel arrangements and novel objects, including challenging transparent objects, both in simulation and the real world. Extensive experiments in simulation and the real world show that our method significantly surpasses the baselines and has better generalizability, which caters to practical industrial needs.
Submitted 9 October, 2023;
originally announced October 2023.
-
Scalar Induced Gravitational Waves from Finslerian Inflation and Pulsar Timing Arrays Observations
Authors:
Zhe Chang,
Yu-Ting Kuang,
Di Wu,
Jing-Zhi Zhou
Abstract:
The recent data from NANOGrav provide strong evidence for the existence of \acp{SGWB}. We investigate \acp{SIGW} from Finslerian inflation as a potential source of the stochastic gravitational wave background. Small-scale ($\lesssim$1 Mpc) statistically anisotropic primordial scalar perturbations can be generated in Finslerian inflation. The second-order \acp{SIGW} from Finslerian inflation are also anisotropic on small scales. After spatially averaging the small-scale anisotropic \acp{SIGW}, we obtain the large-scale isotropic \acp{SGWB}. We find that the parameters of the small-scale anisotropic primordial power spectrum generated by Finslerian inflation affect the \acp{PTA} observations of the large-scale isotropic gravitational wave background.
Submitted 12 September, 2023;
originally announced September 2023.
-
Robotic Table Tennis: A Case Study into a High Speed Learning System
Authors:
David B. D'Ambrosio,
Jonathan Abelian,
Saminda Abeyruwan,
Michael Ahn,
Alex Bewley,
Justin Boyd,
Krzysztof Choromanski,
Omar Cortes,
Erwin Coumans,
Tianli Ding,
Wenbo Gao,
Laura Graesser,
Atil Iscen,
Navdeep Jaitly,
Deepali Jain,
Juhana Kangaspunta,
Satoshi Kataoka,
Gus Kouretas,
Yuheng Kuang,
Nevena Lazic,
Corey Lynch,
Reza Mahjourian,
Sherry Q. Moore,
Thinh Nguyen,
Ken Oslund
, et al. (10 additional authors not shown)
Abstract:
We present a deep-dive into a real-world robotic learning system that, in previous work, was shown to be capable of hundreds of table tennis rallies with a human and has the ability to precisely return the ball to desired targets. This system puts together a highly optimized perception subsystem, a high-speed low-latency robot controller, a simulation paradigm that can prevent damage in the real world and also train policies for zero-shot transfer, and automated real world environment resets that enable autonomous training and evaluation on physical robots. We complement a complete system description, including numerous design decisions that are typically not widely disseminated, with a collection of studies that clarify the importance of mitigating various sources of latency, accounting for training and deployment distribution shifts, robustness of the perception system, sensitivity to policy hyper-parameters, and choice of action space. A video demonstrating the components of the system and details of experimental results can be found at https://youtu.be/uFcnWjB42I0.
Submitted 6 September, 2023;
originally announced September 2023.
-
RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control
Authors:
Anthony Brohan,
Noah Brown,
Justice Carbajal,
Yevgen Chebotar,
Xi Chen,
Krzysztof Choromanski,
Tianli Ding,
Danny Driess,
Avinava Dubey,
Chelsea Finn,
Pete Florence,
Chuyuan Fu,
Montse Gonzalez Arenas,
Keerthana Gopalakrishnan,
Kehang Han,
Karol Hausman,
Alexander Herzog,
Jasmine Hsu,
Brian Ichter,
Alex Irpan,
Nikhil Joshi,
Ryan Julian,
Dmitry Kalashnikov,
Yuheng Kuang,
Isabel Leal
, et al. (29 additional authors not shown)
Abstract:
We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control to boost generalization and enable emergent semantic reasoning. Our goal is to enable a single end-to-end trained model to both learn to map robot observations to actions and enjoy the benefits of large-scale pretraining on language and vision-language data from the web. To this end, we propose to co-fine-tune state-of-the-art vision-language models on both robotic trajectory data and Internet-scale vision-language tasks, such as visual question answering. In contrast to other approaches, we propose a simple, general recipe to achieve this goal: in order to fit both natural language responses and robotic actions into the same format, we express the actions as text tokens and incorporate them directly into the training set of the model in the same way as natural language tokens. We refer to this category of models as vision-language-action models (VLA) and instantiate an example of such a model, which we call RT-2. Our extensive evaluation (6k evaluation trials) shows that our approach leads to performant robotic policies and enables RT-2 to obtain a range of emergent capabilities from Internet-scale training. This includes significantly improved generalization to novel objects, the ability to interpret commands not present in the robot training data (such as placing an object onto a particular number or icon), and the ability to perform rudimentary reasoning in response to user commands (such as picking up the smallest or largest object, or the one closest to another object). We further show that incorporating chain of thought reasoning allows RT-2 to perform multi-stage semantic reasoning, for example figuring out which object to pick up for use as an improvised hammer (a rock), or which type of drink is best suited for someone who is tired (an energy drink).
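Expressing continuous robot actions as text tokens needs a discretization step. A minimal sketch, assuming each action dimension is clipped to a known range, split into 256 uniform bins, and written as a space-separated string of bin indices; the bin count, the ranges and the function names are illustrative assumptions and are not specified in the abstract above.

```python
from typing import List, Sequence, Tuple

NUM_BINS = 256  # assumed number of bins per action dimension


def action_to_text(action: Sequence[float], bounds: Sequence[Tuple[float, float]]) -> str:
    """Map each continuous action value to an integer bin and join the bins as text."""
    tokens = []
    for value, (low, high) in zip(action, bounds):
        clipped = min(max(value, low), high)
        # Scale to [0, NUM_BINS]; the upper edge is folded into the last bin.
        idx = int((clipped - low) / (high - low) * NUM_BINS)
        tokens.append(str(min(idx, NUM_BINS - 1)))
    return " ".join(tokens)


def text_to_action(text: str, bounds: Sequence[Tuple[float, float]]) -> List[float]:
    """Invert the mapping by returning the centre of each bin."""
    values = []
    for token, (low, high) in zip(text.split(), bounds):
        width = (high - low) / NUM_BINS
        values.append(low + (int(token) + 0.5) * width)
    return values


bounds = [(-1.0, 1.0), (-1.0, 1.0), (0.0, 1.0)]
text = action_to_text([0.0, -1.0, 1.0], bounds)
print(text)                        # "128 0 255"
print(text_to_action(text, bounds))
```

Because the bin indices are ordinary integers, they can be emitted by the same tokenizer as natural language; decoding then only needs the inverse map, and the round-trip error is at most half a bin width per dimension.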
Submitted 28 July, 2023;
originally announced July 2023.
-
Primordial black holes from second order density perturbations as probes of the small-scale primordial power spectrum
Authors:
Yu-Ting Kuang,
Jing-Zhi Zhou,
Zhe Chang,
Xukun Zhang,
Qing-Hua Zhu
Abstract:
We investigate the second order energy density perturbation $δ^{(2)}$ induced by small-scale Gaussian and local-type non-Gaussian primordial curvature perturbations. The relative abundance of primordial black holes is calculated in terms of the probability density function of the total energy density perturbation $δ_r=δ^{(1)}+\frac{1}{2}δ^{(2)}$. The effects of the second order density perturbation greatly reduce the upper bounds on the small-scale power spectra of primordial curvature perturbations, by one to two orders of magnitude. For a log-normal primordial power spectrum, the amplitude $A_ζ$ is constrained to be about $A_ζ\sim 3\times10^{-3}$. For local-type non-Gaussianity with $f_{\mathrm{NL}}=10$, the upper bound on $A_ζ$ is about $2.5\times10^{-4}$.
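The step from a probability density function to an abundance can be written out explicitly. A short sketch, assuming a Press-Schechter-type threshold criterion with an assumed critical value $δ_c$ (the paper's exact criterion, smoothing and threshold may differ): the fraction $β$ of the energy density collapsing into PBHs is the probability that $δ_r$ exceeds $δ_c$, and in the purely Gaussian, first-order limit with variance $σ^2$ this reduces to a complementary error function.

$$
\beta \simeq \int_{\delta_c}^{\infty} P(\delta_r)\,\mathrm{d}\delta_r ,
\qquad \delta_r = \delta^{(1)} + \tfrac{1}{2}\,\delta^{(2)} ,
$$

$$
P(\delta_r) \to \frac{1}{\sqrt{2\pi}\,\sigma}\,\exp\!\left(-\frac{\delta_r^{2}}{2\sigma^{2}}\right)
\quad\Longrightarrow\quad
\beta \simeq \frac{1}{2}\,\mathrm{erfc}\!\left(\frac{\delta_c}{\sqrt{2}\,\sigma}\right).
$$

Because $\mathrm{erfc}$ falls off very steeply when $δ_c \gg σ$, even a modest change in the tail of $P(δ_r)$ from the second-order term can shift the inferred bound on $A_ζ$ appreciably.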
Submitted 7 July, 2023; v1 submitted 5 July, 2023;
originally announced July 2023.
-
Barkour: Benchmarking Animal-level Agility with Quadruped Robots
Authors:
Ken Caluwaerts,
Atil Iscen,
J. Chase Kew,
Wenhao Yu,
Tingnan Zhang,
Daniel Freeman,
Kuang-Huei Lee,
Lisa Lee,
Stefano Saliceti,
Vincent Zhuang,
Nathan Batchelor,
Steven Bohez,
Federico Casarini,
Jose Enrique Chen,
Omar Cortes,
Erwin Coumans,
Adil Dostmohamed,
Gabriel Dulac-Arnold,
Alejandro Escontrela,
Erik Frey,
Roland Hafner,
Deepali Jain,
Bauyrjan Jyenis,
Yuheng Kuang,
Edward Lee
, et al. (19 additional authors not shown)
Abstract:
Animals have evolved various agile locomotion strategies, such as sprinting, leaping, and jumping. There is a growing interest in developing legged robots that move like their biological counterparts and show various agile skills to navigate complex environments quickly. Despite the interest, the field lacks systematic benchmarks to measure the performance of control policies and hardware in agility. We introduce the Barkour benchmark, an obstacle course to quantify agility for legged robots. Inspired by dog agility competitions, it consists of diverse obstacles and a time-based scoring mechanism. This encourages researchers to develop controllers that not only move fast, but do so in a controllable and versatile way. To set strong baselines, we present two methods for tackling the benchmark. In the first approach, we train specialist locomotion skills using on-policy reinforcement learning methods and combine them with a high-level navigation controller. In the second approach, we distill the specialist skills into a Transformer-based generalist locomotion policy, named Locomotion-Transformer, that can handle various terrains and adjust the robot's gait based on the perceived environment and robot states. Using a custom-built quadruped robot, we demonstrate that our method can complete the course at half the speed of a dog. We hope that our work represents a step towards creating controllers that enable robots to reach animal-level agility.
Submitted 23 May, 2023;
originally announced May 2023.
-
Learning Efficient Coding of Natural Images with Maximum Manifold Capacity Representations
Authors:
Thomas Yerxa,
Yilun Kuang,
Eero Simoncelli,
SueYeon Chung
Abstract:
The efficient coding hypothesis proposes that the response properties of sensory systems are adapted to the statistics of their inputs such that they capture maximal information about the environment, subject to biological constraints. While elegant, information theoretic properties are notoriously difficult to measure in practical settings or to employ as objective functions in optimization. This difficulty has necessitated that computational models designed to test the hypothesis employ several different information metrics ranging from approximations and lower bounds to proxy measures like reconstruction error. Recent theoretical advances have characterized a novel and ecologically relevant efficiency metric, the manifold capacity, which is the number of object categories that may be represented in a linearly separable fashion. However, calculating manifold capacity is a computationally intensive iterative procedure that until now has precluded its use as an objective. Here we outline the simplifying assumptions that allow manifold capacity to be optimized directly, yielding Maximum Manifold Capacity Representations (MMCR). The resulting method is closely related to and inspired by advances in the field of self-supervised learning (SSL), and we demonstrate that MMCRs are competitive with state-of-the-art results on standard SSL benchmarks. Empirical analyses reveal differences between MMCRs and representations learned by other SSL frameworks, and suggest a mechanism by which manifold compression gives rise to class separability. Finally, we evaluate a set of SSL methods on a suite of neural predictivity benchmarks, and find MMCRs are highly competitive as models of the ventral stream.
Submitted 3 December, 2023; v1 submitted 6 March, 2023;
originally announced March 2023.
-
Pulsars as candidates of LHAASO sources J2226+6057, J1908+0621 and J1825-1326: The leptonic origin
Authors:
Zhe Chang,
Yu-Ting Kuang,
Xukun Zhang,
Jing-Zhi Zhou
Abstract:
Recently, LHAASO has detected ultrahigh-energy photons up to 1.4 PeV from 12 $γ$-ray Galactic sources. The $γ$-ray spectra of the sources J2226+6057, J1908+0621, J1825-1326 and of the suggested origin pulsars near the sources have been published. In our previous work, we studied the hadronic $γ$-ray spectra of the sources J2226+6057, J1908+0621, J1825-1326 in terms of the Hertzian dipole model of the pulsar. In this paper, we investigate the possibility of a leptonic origin of the $γ$-rays. We use the Hertzian dipole model to describe the pulsars around the sources. The electrons around the pulsars can be accelerated to PeV energies by the electromagnetic fields of the pulsars. Under the assumption that the initial electrons are uniformly distributed in a spherical shell between $10^{7}$ and $10^{9}$ m around the pulsar, we obtain the energy distribution of the electrons. The leptonic $γ$-ray spectra can then be calculated through inverse Compton scattering processes. The resulting leptonic $γ$-ray spectra roughly agree with the LHAASO observations.
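Drawing initial electron positions uniformly in volume within a spherical shell requires inverse-transform sampling of the radius, because the volume between $r$ and $r + \mathrm{d}r$ grows like $r^2$. A minimal sketch under that assumption: only the $10^{7}$ m and $10^{9}$ m shell bounds come from the abstract, while the function name and the use of Python's `random` module are illustrative choices.

```python
import math
import random
from typing import Tuple

R_INNER = 1e7  # m, inner shell radius quoted in the abstract
R_OUTER = 1e9  # m, outer shell radius quoted in the abstract


def sample_shell_point(rng: random.Random,
                       r_inner: float = R_INNER,
                       r_outer: float = R_OUTER) -> Tuple[float, float, float]:
    """Draw one Cartesian point uniformly distributed in volume inside the shell."""
    u = rng.random()
    # Invert the radial CDF F(r) = (r^3 - r_inner^3) / (r_outer^3 - r_inner^3).
    r = (r_inner ** 3 + u * (r_outer ** 3 - r_inner ** 3)) ** (1.0 / 3.0)
    # Isotropic direction: cos(theta) uniform in [-1, 1], phi uniform in [0, 2*pi).
    cos_theta = rng.uniform(-1.0, 1.0)
    sin_theta = math.sqrt(1.0 - cos_theta ** 2)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return (r * sin_theta * math.cos(phi),
            r * sin_theta * math.sin(phi),
            r * cos_theta)


rng = random.Random(0)
points = [sample_shell_point(rng) for _ in range(10_000)]
```

Sampling $r$ uniformly in $[10^{7}, 10^{9}]$ m instead would over-weight the inner part of the shell; the cube-root inversion is what makes the distribution uniform in volume (half of the points fall inside $r = [(r_{\rm in}^3 + r_{\rm out}^3)/2]^{1/3} \approx 7.9\times10^{8}$ m).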
Submitted 26 April, 2023; v1 submitted 28 February, 2023;
originally announced March 2023.
-
Learning Cut Selection for Mixed-Integer Linear Programming via Hierarchical Sequence Model
Authors:
Zhihai Wang,
Xijun Li,
Jie Wang,
Yufei Kuang,
Mingxuan Yuan,
Jia Zeng,
Yongdong Zhang,
Feng Wu
Abstract:
Cutting planes (cuts) are important for solving mixed-integer linear programs (MILPs), which formulate a wide range of important real-world applications. Cut selection -- which aims to select a proper subset of the candidate cuts to improve the efficiency of solving MILPs -- heavily depends on (P1) which cuts should be preferred, and (P2) how many cuts should be selected. Although many modern MILP solvers tackle (P1)-(P2) by manually designed heuristics, machine learning offers a promising approach to learn more effective heuristics from MILPs collected from specific applications. However, many existing learning-based methods focus on learning which cuts should be preferred, neglecting the importance of learning the number of cuts that should be selected. Moreover, we observe from extensive empirical results that (P3) what order of selected cuts should be preferred has a significant impact on the efficiency of solving MILPs as well. To address this challenge, we propose a novel hierarchical sequence model (HEM) to learn cut selection policies via reinforcement learning. Specifically, HEM consists of a two-level model: (1) a higher-level model to learn the number of cuts that should be selected, and (2) a lower-level model -- that formulates the cut selection task as a sequence to sequence learning problem -- to learn policies selecting an ordered subset with the size determined by the higher-level model. To the best of our knowledge, HEM is the first method that can tackle (P1)-(P3) in cut selection simultaneously from a data-driven perspective. Experiments show that HEM significantly improves the efficiency of solving MILPs compared to human-designed and learning-based baselines on both synthetic and large-scale real-world MILPs, including MIPLIB 2017. Moreover, experiments demonstrate that HEM generalizes well to MILPs that are significantly larger than those seen during training.
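The division of labour between the two levels can be illustrated with a deliberately simplified stand-in: a ratio from the higher level fixes how many cuts to keep (P2), and the lower level returns an ordered subset of that size (P1, P3). In this hypothetical sketch the scores and the ratio are plain inputs and the ordering is greedy by score; HEM itself learns both levels with reinforcement learning and uses a sequence-to-sequence model rather than sorting.

```python
import math
from typing import List, Sequence


def select_ordered_cuts(scores: Sequence[float], ratio: float) -> List[int]:
    """Return indices of an ordered subset of candidate cuts.

    ratio stands in for the higher-level decision (how many cuts to keep);
    the descending-score order stands in for the lower-level policy
    (which cuts, and in what order)."""
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must lie in [0, 1]")
    k = math.ceil(ratio * len(scores))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order[:k]


print(select_ordered_cuts([0.2, 0.9, 0.5, 0.7], 0.5))  # [1, 3]
```

Separating the size decision from the ordering mirrors the abstract's point that learning only which cuts to prefer leaves (P2) and (P3) to hand-tuned defaults.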
Submitted 31 January, 2023;
originally announced February 2023.
-
RT-1: Robotics Transformer for Real-World Control at Scale
Authors:
Anthony Brohan,
Noah Brown,
Justice Carbajal,
Yevgen Chebotar,
Joseph Dabis,
Chelsea Finn,
Keerthana Gopalakrishnan,
Karol Hausman,
Alex Herzog,
Jasmine Hsu,
Julian Ibarz,
Brian Ichter,
Alex Irpan,
Tomas Jackson,
Sally Jesmonth,
Nikhil J Joshi,
Ryan Julian,
Dmitry Kalashnikov,
Yuheng Kuang,
Isabel Leal,
Kuang-Huei Lee,
Sergey Levine,
Yao Lu,
Utsav Malla,
Deeksha Manjunath
, et al. (26 additional authors not shown)
Abstract:
By transferring knowledge from large, diverse, task-agnostic datasets, modern machine learning models can solve specific downstream tasks either zero-shot or with small task-specific datasets to a high level of performance. While this capability has been demonstrated in other fields such as computer vision, natural language processing or speech recognition, it remains to be shown in robotics, where the generalization capabilities of the models are particularly critical due to the difficulty of collecting real-world robotic data. We argue that one of the keys to the success of such general robotic models lies with open-ended task-agnostic training, combined with high-capacity architectures that can absorb all of the diverse, robotic data. In this paper, we present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties. We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity based on a large-scale data collection on real robots performing real-world tasks. The project's website and videos can be found at robotics-transformer1.github.io
Submitted 11 August, 2023; v1 submitted 13 December, 2022;
originally announced December 2022.
-
Primordial gravitational waves and curvature perturbations induced energy density perturbation
Authors:
Zhe Chang,
Yu-Ting Kuang,
Xukun Zhang,
Jing-Zhi Zhou
Abstract:
We study the second order scalar and density perturbations generated by the Gaussian curvature perturbations and primordial gravitational waves in the radiation-dominated era. After presenting all the possible second-order source terms, we obtain the explicit expressions of the kernel functions and the power spectra of the second order scalar perturbations. The results show that the primordial gravitational waves might affect the second order energy density perturbation significantly. The effects of the primordial gravitational waves are studied for different kinds of primordial power spectra.
Submitted 28 January, 2024; v1 submitted 21 November, 2022;
originally announced November 2022.
-
On Galois-Gauss sums and the square root of the inverse different
Authors:
Y. Kuang
Abstract:
We discuss a possible generalisation of a conjecture of Bley, Burns and Hahn concerning the relation between the second Adams-operator twisted Galois-Gauss sums of weakly ramified Artin characters and the square root of the inverse different of finite, odd degree, Galois extensions of number fields, to the setting of all finite Galois extensions of number fields for which a square root of the inverse different exists. We also extend the key methods and results of Bley, Burns and Hahn to this more general setting and, by combining these methods with a recent result of Agboola, Burns, Caputo and the present author concerning Artin root numbers of twisted irreducible symplectic characters, we provide new insight into a conjecture of Erez concerning the Galois structure of the square root of the inverse different.
Submitted 19 October, 2022;
originally announced October 2022.
-
Primordial black holes and third order scalar induced gravitational waves
Authors:
Zhe Chang,
Yu-Ting Kuang,
Xukun Zhang,
Jing-Zhi Zhou
Abstract:
The process of primordial black hole (PBH) formation would inevitably be accompanied by scalar induced gravitational waves (SIGWs). This strong correlation between PBH and SIGW signals could be a promising approach to detecting PBHs in upcoming gravitational wave (GW) experiments, such as the Laser Interferometer Space Antenna (LISA). We investigate the third-order SIGWs during the radiation-dominated (RD) era in the case of a monochromatic primordial power spectrum $\mathcal{P}_\zeta=A_\zeta k_*\delta\left(k-k_*\right)$. For LISA observations, the relations between the signal-to-noise ratio (SNR) and the monochromatic primordial power spectrum are studied systematically. The results show that the effects of third-order SIGWs extend the cutoff frequency from $2f_*$ to $3f_*$ and lead to an increase of about $200\%$ in the SNR for the frequency band from $10^{-5}$ Hz to $1.6\times 10^{-3}$ Hz, corresponding to PBHs in the mass range $4\times 10^{-12}M_{\odot} \sim 10^{-7}M_{\odot}$. We find that there exists a critical value $A_*=1.76\times 10^{-2}$ for the amplitude of the monochromatic primordial power spectrum, such that when $A_\zeta>A_*$, the energy density of third-order SIGWs will be larger than that of second-order SIGWs.
Submitted 22 March, 2023; v1 submitted 26 September, 2022;
originally announced September 2022.
-
The emergence of a virus variant: dynamics of a competition model with cross-immunity time-delay validated by wastewater surveillance data for COVID-19
Authors:
Bruce Pell,
Samantha Brozak,
Tin Phan,
Fuqing Wu,
Yang Kuang
Abstract:
We consider the dynamics of a virus spreading through a population that produces a mutant strain with the ability to infect individuals that were infected with the established strain. Temporary cross-immunity is included using a time delay, but is found to be a harmless delay. We provide some sufficient conditions that guarantee local and global asymptotic stability of the disease-free equilibrium and the two boundary equilibria when the two strains outcompete one another. It is shown that, due to the immune evasion of the emerging strain, the reproduction number of the emerging strain must be significantly lower than that of the established strain for the local stability of the established-strain-only boundary equilibrium. To analyze the unique coexistence equilibrium, we apply a quasi-steady-state argument to reduce the full model to a two-dimensional one that exhibits either a globally asymptotically stable established-strain-only equilibrium or a globally asymptotically stable coexistence equilibrium. Our results indicate that the basic reproduction numbers of both strains govern the overall dynamics, but in nontrivial ways due to the inclusion of cross-immunity. The model is applied to study the emergence of the SARS-CoV-2 Delta variant in the presence of the Alpha variant using wastewater surveillance data from the Deer Island Treatment Plant in Massachusetts, USA.
Submitted 15 February, 2023; v1 submitted 15 September, 2022;
originally announced September 2022.
-
Broad Recommender System: An Efficient Nonlinear Collaborative Filtering Approach
Authors:
Ling Huang,
Can-Rong Guan,
Zhen-Wei Huang,
Yuefang Gao,
Yingjie Kuang,
Chang-Dong Wang,
C. L. Philip Chen
Abstract:
Recently, Deep Neural Networks (DNNs) have been widely introduced into Collaborative Filtering (CF) to produce more accurate recommendation results due to their capability of capturing the complex nonlinear relationships between items and users. However, DNN-based models usually suffer from high computational complexity, i.e., very long training times and a huge number of trainable parameters to store. To address these problems, we propose a new broad recommender system called Broad Collaborative Filtering (BroadCF), which is an efficient nonlinear collaborative filtering approach. Instead of DNNs, the Broad Learning System (BLS) is used as a mapping function to learn the complex nonlinear relationships between users and items, which avoids the above issues while achieving very satisfactory recommendation performance. However, it is not feasible to directly feed the original rating data into BLS. To this end, we propose a user-item rating collaborative vector preprocessing procedure to generate low-dimensional user-item input data, which is able to harness quality judgments of the most similar users/items. Extensive experiments conducted on seven benchmark datasets have confirmed the effectiveness of the proposed BroadCF algorithm.
Submitted 24 February, 2024; v1 submitted 19 April, 2022;
originally announced April 2022.
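The BroadCF abstract above uses a Broad Learning System (BLS) instead of a DNN as the nonlinear mapping. The following is a minimal sketch of a generic BLS regressor. It assumes the usual construction: random feature nodes, tanh enhancement nodes, and output weights solved in closed form by ridge regression. The node counts, activation, ridge weight `lam`, and the helper names `fit_bls` and `predict_bls` are illustrative assumptions, and the paper's user-item rating preprocessing is not reproduced.

```python
import numpy as np

def fit_bls(X, y, n_feature=40, n_enhance=200, lam=1e-2, seed=0):
    """Fit a minimal Broad Learning System regressor (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Mapped feature nodes: random linear projections of the input.
    W_f = rng.standard_normal((d, n_feature))
    b_f = rng.standard_normal(n_feature)
    Z = X @ W_f + b_f
    # Enhancement nodes: a random nonlinear expansion of the feature nodes.
    W_h = rng.standard_normal((n_feature, n_enhance)) / np.sqrt(n_feature)
    b_h = rng.standard_normal(n_enhance)
    H = np.tanh(Z @ W_h + b_h)
    A = np.hstack([Z, H])
    # Output weights in closed form (ridge regression): no backpropagation.
    W_out = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y)
    return W_f, b_f, W_h, b_h, W_out

def predict_bls(model, X):
    """Apply the same random feature/enhancement maps, then the learned weights."""
    W_f, b_f, W_h, b_h, W_out = model
    Z = X @ W_f + b_f
    H = np.tanh(Z @ W_h + b_h)
    return np.hstack([Z, H]) @ W_out

# Usage on synthetic data (not a recommendation dataset).
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(500, 3))
y = np.sin(np.pi * X[:, 0]) + X[:, 1] * X[:, 2]
model = fit_bls(X, y)
mse = np.mean((predict_bls(model, X) - y) ** 2)
```

Only the output layer is trained, via one linear solve. This is what makes BLS-style models cheap compared with iteratively trained DNNs.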
-
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
Authors:
Michael Ahn,
Anthony Brohan,
Noah Brown,
Yevgen Chebotar,
Omar Cortes,
Byron David,
Chelsea Finn,
Chuyuan Fu,
Keerthana Gopalakrishnan,
Karol Hausman,
Alex Herzog,
Daniel Ho,
Jasmine Hsu,
Julian Ibarz,
Brian Ichter,
Alex Irpan,
Eric Jang,
Rosario Jauregui Ruano,
Kyle Jeffrey,
Sally Jesmonth,
Nikhil J Joshi,
Ryan Julian,
Dmitry Kalashnikov,
Yuheng Kuang,
Kuang-Huei Lee
, et al. (20 additional authors not shown)
Abstract:
Large language models can encode a wealth of semantic knowledge about the world. Such knowledge could be extremely useful to robots aiming to act upon high-level, temporally extended instructions expressed in natural language. However, a significant weakness of language models is that they lack real-world experience, which makes it difficult to leverage them for decision making within a given embodiment. For example, asking a language model to describe how to clean a spill might result in a reasonable narrative, but it may not be applicable to a particular agent, such as a robot, that needs to perform this task in a particular environment. We propose to provide real-world grounding by means of pretrained skills, which are used to constrain the model to propose natural language actions that are both feasible and contextually appropriate. The robot can act as the language model's "hands and eyes," while the language model supplies high-level semantic knowledge about the task. We show how low-level skills can be combined with large language models so that the language model provides high-level knowledge about the procedures for performing complex and temporally-extended instructions, while value functions associated with these skills provide the grounding necessary to connect this knowledge to a particular physical environment. We evaluate our method on a number of real-world robotic tasks, where we show the need for real-world grounding and that this approach is capable of completing long-horizon, abstract, natural language instructions on a mobile manipulator. The project's website and the video can be found at https://say-can.github.io/.
Submitted 16 August, 2022; v1 submitted 4 April, 2022;
originally announced April 2022.
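The grounding described in the abstract above can be sketched in three steps. Each candidate skill is scored by the product of a language-model probability and a skill value function, and the highest-scoring skill is executed. The helper `select_skill`, the skill names, and all probabilities are hypothetical toy values, not output from the SayCan system. The sketch also covers only a single selection step, not the full long-horizon loop.

```python
from typing import Dict

def select_skill(llm_prob: Dict[str, float], affordance: Dict[str, float]) -> str:
    """Return the skill with the highest combined score.

    llm_prob: how useful the language model thinks each skill is for the instruction.
    affordance: value-function estimate that each skill succeeds from the current state.
    """
    scores = {skill: p * affordance.get(skill, 0.0) for skill, p in llm_prob.items()}
    return max(scores, key=scores.get)

# Toy numbers: the language model prefers picking up the sponge, but the
# value function reports that no sponge is within reach yet.
llm_prob = {"find a sponge": 0.3, "pick up the sponge": 0.5, "go to the kitchen": 0.2}
affordance = {"find a sponge": 0.9, "pick up the sponge": 0.1, "go to the kitchen": 0.8}
print(select_skill(llm_prob, affordance))  # prints: find a sponge
```

On its own, the language model would choose "pick up the sponge". Multiplying by the value function redirects the choice to a skill that is feasible in the current state.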
-
On the Galois-Gauss sums of weakly ramified characters
Authors:
Y. Kuang
Abstract:
Bley, Burns and Hahn used relative algebraic $K$-theory methods to formulate a precise conjectural link between the (second Adams-operator twisted) Galois-Gauss sums of weakly ramified Artin characters and the square root of the inverse different of finite, odd degree, Galois extensions of number fields. We provide concrete new evidence for this conjecture in the setting of extensions of odd prime-power degree by using a refined version of a well-known result of Ullom.
Submitted 9 March, 2023; v1 submitted 26 March, 2022;
originally announced March 2022.
-
Learning Robust Policy against Disturbance in Transition Dynamics via State-Conservative Policy Optimization
Authors:
Yufei Kuang,
Miao Lu,
Jie Wang,
Qi Zhou,
Bin Li,
Houqiang Li
Abstract:
Deep reinforcement learning algorithms can perform poorly in real-world tasks due to the discrepancy between source and target environments. This discrepancy is commonly viewed as the disturbance in transition dynamics. Many existing algorithms learn robust policies by modeling the disturbance and applying it to source environments during training, which usually requires prior knowledge about the disturbance and control of simulators. However, these algorithms can fail in scenarios where the disturbance from target environments is unknown or is intractable to model in simulators. To tackle this problem, we propose a novel model-free actor-critic algorithm, namely state-conservative policy optimization (SCPO), to learn robust policies without modeling the disturbance in advance. Specifically, SCPO reduces the disturbance in transition dynamics to that in state space and then approximates it by a simple gradient-based regularizer. The appealing features of SCPO are that it is simple to implement and does not require additional knowledge about the disturbance or specially designed simulators. Experiments in several robot control tasks demonstrate that SCPO learns robust policies against the disturbance in transition dynamics.
Submitted 20 December, 2021;
originally announced December 2021.
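SCPO reduces the transition-dynamics disturbance to a disturbance in state space and approximates it with a gradient-based regularizer. The following is a minimal first-order sketch of that idea. It assumes an L2 ball of radius `eps` around each state and finite-difference gradients. Linearizing f(s + delta) ≈ f(s) + grad f(s) · delta gives a worst case of f(s) - eps * ||grad f(s)||. The function names and the exact form of the objective are illustrative assumptions, not the SCPO loss from the paper.

```python
import numpy as np

def numerical_grad(f, s, h=1e-5):
    """Central finite-difference gradient of a scalar function f at state s."""
    g = np.zeros_like(s, dtype=float)
    for i in range(s.size):
        e = np.zeros_like(s, dtype=float)
        e[i] = h
        g[i] = (f(s + e) - f(s - e)) / (2 * h)
    return g

def worst_case_first_order(f, s, eps):
    """First-order estimate of min over ||delta||_2 <= eps of f(s + delta).

    After linearizing f around s, the minimizing perturbation is
    delta = -eps * grad / ||grad||, which gives f(s) - eps * ||grad||.
    """
    return f(s) - eps * np.linalg.norm(numerical_grad(f, s))

def state_conservative_objective(f, states, eps):
    """Batch average of first-order worst-case values.

    Maximizing this instead of the plain average of f(s) penalizes functions
    (for example action values) that are sensitive to small state shifts.
    """
    return float(np.mean([worst_case_first_order(f, s, eps) for s in states]))
```

For a linear f the first-order estimate is exact. For example, f(s) = 3*s[0] + 4*s[1] at s = (1, 1) with eps = 0.1 gives 7 - 0.1 * 5 = 6.5.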
-
Roadmap on Signal Processing for Next Generation Measurement Systems
Authors:
D. K. Iakovidis,
M. Ooi,
Y. C. Kuang,
S. Demidenko,
A. Shestakov,
V. Sinitsin,
M. Henry,
A. Sciacchitano,
A. Discetti,
S. Donati,
M. Norgia,
A. Menychtas,
I. Maglogiannis,
S. C. Wriessnegger,
L. A. Barradas Chacon,
G. Dimas,
D. Filos,
A. H. Aletras,
J. Töger,
F. Dong,
S. Ren,
A. Uhl,
J. Paziewski,
J. Geng,
F. Fioranelli
, et al. (9 additional authors not shown)
Abstract:
Signal processing is a fundamental component of almost any sensor-enabled system, with a wide range of applications across different scientific disciplines. Time series data, images, and video sequences comprise representative forms of signals that can be enhanced and analysed for information extraction and quantification. The recent advances in artificial intelligence and machine learning are shifting the research attention towards intelligent, data-driven, signal processing. This roadmap presents a critical overview of the state-of-the-art methods and applications aiming to highlight future challenges and research opportunities towards next generation measurement systems. It covers a broad spectrum of topics ranging from basic to industrial research, organized in concise thematic sections that reflect the trends and the impacts of current and future developments per research field. Furthermore, it offers guidance to researchers and funding agencies in identifying new prospects.
Submitted 28 January, 2022; v1 submitted 3 November, 2021;
originally announced November 2021.
-
Regularization of Complex Langevin Method
Authors:
Zhenning Cai,
Yang Kuang,
Hong Kiat Tan
Abstract:
The complex Langevin method, a numerical method used to compute the ensemble average with a complex partition function, often suffers from runaway instability. We study the regularization of the complex Langevin method via augmenting the action with a stabilization term. Since the regularization introduces biases to the numerical result, two approaches, named 2R and 3R methods, are introduced to recover the unbiased result. The 2R method supplements the regularization with regression to estimate the unregularized ensemble average, and the 3R method reduces the computational cost by coupling the regularization with a reweighting strategy before regression. Both methods can be generalized to the SU(n) theory and are assessed from several perspectives. Several numerical experiments in the lattice field theory are carried out to show the effectiveness of our approaches.
Submitted 26 September, 2021;
originally announced September 2021.
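The 2R idea in the abstract above is to regularize the action, then regress back to zero regularization. It can be illustrated on a toy Gaussian model S(z) = sigma * z^2 / 2 with complex sigma, for which the exact result <z^2> = 1/sigma is known. The sketch makes three assumptions: a hypothetical stabilization term eps * z^2 / 2, an Euler-Maruyama discretization, and quadratic regression in eps. Plain complex Langevin already converges for this Gaussian model, so the example only shows the mechanics. It is not the paper's 2R or 3R method.

```python
import numpy as np

def cl_second_moment(sigma, eps, dt=0.01, n_walkers=10_000,
                     n_burn=1_000, n_measure=2_000, seed=0):
    """Complex Langevin estimate of <z^2> under the regularized toy action
    S_eps(z) = sigma * z**2 / 2 + eps * z**2 / 2 (eps term: hypothetical stabilizer).
    """
    rng = np.random.default_rng(seed)
    s = sigma + eps
    z = np.zeros(n_walkers, dtype=complex)
    total = 0.0 + 0.0j
    for step in range(n_burn + n_measure):
        # Euler-Maruyama step: drift -S_eps'(z) = -s * z plus real Gaussian noise;
        # z moves into the complex plane because s is complex.
        z = z - s * z * dt + np.sqrt(2.0 * dt) * rng.standard_normal(n_walkers)
        if step >= n_burn:
            total += np.mean(z * z)
    return total / n_measure

def regress_to_zero(eps_values, estimates, degree=2):
    """Fit polynomials in eps to the real and imaginary parts and evaluate at eps = 0."""
    eps_values = np.asarray(eps_values, dtype=float)
    estimates = np.asarray(estimates, dtype=complex)
    re = np.polyval(np.polyfit(eps_values, estimates.real, degree), 0.0)
    im = np.polyval(np.polyfit(eps_values, estimates.imag, degree), 0.0)
    return complex(re, im)

sigma = 1.0 + 1.0j
eps_values = [0.1, 0.2, 0.3]
estimates = [cl_second_moment(sigma, e) for e in eps_values]
estimate_at_zero = regress_to_zero(eps_values, estimates)
print(estimate_at_zero, 1 / sigma)  # the two values should be close
```

Reusing the same seed for every eps correlates the sampling noise across the regularized runs. This keeps the extrapolation to eps = 0 from amplifying independent errors.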
-
CrowdDriven: A New Challenging Dataset for Outdoor Visual Localization
Authors:
Ara Jafarzadeh,
Manuel Lopez Antequera,
Pau Gargallo,
Yubin Kuang,
Carl Toft,
Fredrik Kahl,
Torsten Sattler
Abstract:
Visual localization is the problem of estimating the position and orientation from which a given image (or a sequence of images) is taken in a known scene. It is an important part of a wide range of computer vision and robotics applications, from self-driving cars to augmented/virtual reality systems. Visual localization techniques should work reliably and robustly under a wide range of conditions, including seasonal, weather, illumination and man-made changes. Recent benchmarking efforts model this by providing images under different conditions, and the community has made rapid progress on these datasets since their inception. However, they are limited to a few geographical regions and often recorded with a single device. We propose a new benchmark for visual localization in outdoor scenes, using crowd-sourced data to cover a wide range of geographical regions and camera devices with a focus on the failure cases of current algorithms. Experiments with state-of-the-art localization approaches show that our dataset is very challenging, with all evaluated methods failing on its hardest parts. As part of the dataset release, we provide the tooling used to generate it, enabling efficient and effective 2D correspondence annotation to obtain reference poses.
Submitted 9 September, 2021;
originally announced September 2021.
-
An entropic method for discrete systems with Gibbs entropy
Authors:
Zhenning Cai,
Jingwei Hu,
Yang Kuang,
Bo Lin
Abstract:
We consider general systems of ordinary differential equations with monotonic Gibbs entropy, and introduce an entropic scheme that simply imposes an entropy fix after every time step of any existing time integrator. It is proved that in the general case, our entropy fix has only infinitesimal influence on the numerical order of the original scheme, and in many circumstances, it can be shown that the scheme does not affect the numerical order. Numerical experiments on the linear Fokker-Planck equation and nonlinear Boltzmann equation are carried out to support our numerical analysis.
Submitted 23 June, 2021;
originally announced June 2021.
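The entropic scheme above wraps an arbitrary time integrator with an entropy fix. The precise fix is defined in the paper. The following is a hypothetical variant, which assumes:
- a discrete density f > 0 whose Gibbs entropy H(f) = -sum(f log f) should not decrease;
- a known equilibrium f_eq with the same total mass (for instance the uniform state, which maximizes H at fixed mass).

When a step lowers H, the fix blends the new state toward f_eq just enough to restore H(f_old). Because H is concave, bisection on the blending weight finds the largest admissible weight.

```python
import numpy as np

def gibbs_entropy(f):
    """Gibbs entropy H(f) = -sum(f * log f) of a strictly positive vector."""
    return float(-np.sum(f * np.log(f)))

def entropy_fix(f_old, f_new, f_eq, tol=1e-12, max_iter=100):
    """Hypothetical entropy fix applied after one step of any integrator."""
    h_old = gibbs_entropy(f_old)
    if gibbs_entropy(f_new) >= h_old:
        return f_new  # the step already respects entropy monotonicity
    # g(theta) = H(theta * f_new + (1 - theta) * f_eq) is concave in theta,
    # g(0) = H(f_eq) >= h_old and g(1) < h_old, so {theta : g(theta) >= h_old}
    # is an interval [0, theta*]; bisection locates theta*.
    lo, hi = 0.0, 1.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if gibbs_entropy(mid * f_new + (1.0 - mid) * f_eq) >= h_old:
            lo = mid  # admissible: keep as much of f_new as possible
        else:
            hi = mid
        if hi - lo < tol:
            break
    return lo * f_new + (1.0 - lo) * f_eq

f_old = np.array([0.4, 0.3, 0.2, 0.1])
f_new = np.array([0.5, 0.3, 0.15, 0.05])  # a step that lowered the entropy
f_eq = np.full(4, f_old.sum() / 4)        # uniform state with the same mass
f_fixed = entropy_fix(f_old, f_new, f_eq)
print(gibbs_entropy(f_old), gibbs_entropy(f_fixed))
```

Steps that already satisfy the entropy condition are returned unchanged. In this variant, the fix therefore only alters the integrator where monotonicity would be violated.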
-
AsyncTaichi: On-the-fly Inter-kernel Optimizations for Imperative and Spatially Sparse Programming
Authors:
Yuanming Hu,
Mingkuan Xu,
Ye Kuang,
Frédo Durand
Abstract:
Leveraging spatial sparsity has become a popular approach to accelerate 3D computer graphics applications. Spatially sparse data structures and efficient sparse kernels (such as parallel stencil operations on active voxels) are key to achieving high performance. Existing work focuses on improving performance within a single sparse computational kernel. We show that a system that looks beyond a single kernel, plus additional domain-specific sparse data structure analysis, opens up an exciting new space for optimizing sparse computations. Specifically, we propose a domain-specific data-flow graph model of imperative and sparse computation programs, which describes kernel relationships and enables easy analysis and optimization. Combined with an asynchronous execution engine that exposes a wide window of kernels, the inter-kernel optimizer can then perform effective sparse computation optimizations, such as eliminating unnecessary voxel list generations and removing voxel activation checks. These domain-specific optimizations further make way for classical general-purpose optimizations that are otherwise challenging to apply directly to computations with sparse data structures. Without any computational code modification, our new system leads to $4.02\times$ fewer kernel launches and a $1.87\times$ speedup on our GPU benchmarks, including computations on Eulerian grids, Lagrangian particles, meshes, and automatic differentiation.
Submitted 22 June, 2021; v1 submitted 15 December, 2020;
originally announced December 2020.
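One of the inter-kernel optimizations named above is eliminating unnecessary voxel list generations. The following toy sketch implements that single pass over a linear task stream. It assumes hypothetical task kinds, and it treats a list generation as redundant when the same sparse structure was listed earlier with no activation or deactivation in between. It also assumes that "compute" tasks never change sparsity. AsyncTaichi itself works on a data-flow graph and handles many more cases.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Task:
    kind: str   # hypothetical kinds: "listgen", "activate", "deactivate", "compute"
    snode: str  # name of the sparse data structure the task touches

def eliminate_redundant_listgen(tasks: List[Task]) -> List[Task]:
    """Drop list generations whose active-voxel list is still up to date."""
    valid = set()  # structures whose cached voxel list matches the current sparsity
    out = []
    for t in tasks:
        if t.kind == "listgen":
            if t.snode in valid:
                continue  # list already current: skip this kernel launch
            valid.add(t.snode)
        elif t.kind in ("activate", "deactivate"):
            valid.discard(t.snode)  # sparsity changed: cached list is stale
        out.append(t)
    return out

tasks = [
    Task("activate", "grid"),
    Task("listgen", "grid"),
    Task("compute", "grid"),
    Task("listgen", "grid"),    # redundant: no sparsity change since the last listgen
    Task("compute", "grid"),
    Task("deactivate", "grid"),
    Task("listgen", "grid"),    # needed: the active set changed
]
optimized = eliminate_redundant_listgen(tasks)
print(len(tasks), "->", len(optimized))  # prints: 7 -> 6
```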
-
Second-order accurate BGK schemes for the special relativistic hydrodynamics with the Synge equation of state
Authors:
Yaping Chen,
Yangyu Kuang,
Huazhong Tang
Abstract:
This paper extends the second-order accurate BGK finite volume schemes for ultra-relativistic flow simulations [5] to the 1D and 2D special relativistic hydrodynamics with the Synge equation of state. It is shown that such 2D schemes are very time-consuming due to the moment integrals (triple integrals), so that they are no longer practical. In view of this, simplified BGK (sBGK) schemes are presented by removing some terms in the approximate nonequilibrium distribution at the cell interface for the BGK scheme, without loss of accuracy. They are practical because the moment integrals of the approximate distribution can be reduced to single integrals by some coordinate transformations. The relations between the left and right states of the shock wave, rarefaction wave, and contact discontinuity are also discussed, so that the exact solution of the 1D Riemann problem can be derived and used for numerical comparisons. Several numerical experiments are conducted to demonstrate that the proposed gas-kinetic schemes are accurate and stable. A comparison of the sBGK schemes with the BGK scheme in one dimension shows that the former performs almost the same as the latter in terms of accuracy and resolution, but is much more efficient.
Submitted 2 November, 2020;
originally announced November 2020.
-
An orthogonalization-free parallelizable framework for all-electron calculations in density functional theory
Authors:
Bin Gao,
Guanghui Hu,
Yang Kuang,
Xin Liu
Abstract:
All-electron calculations play an important role in density functional theory, in which improving computational efficiency is one of the most needed and challenging tasks. In the model formulations, both the nonlinear eigenvalue problem and the total energy minimization problem pursue orthogonal solutions. Most existing algorithms for solving these two models invoke an orthogonalization process, either explicitly or implicitly, in each iteration. Their efficiency suffers from this process in view of its cubic complexity and low parallel scalability in terms of the number of electrons for large-scale systems. To break through this bottleneck, we propose an orthogonalization-free algorithm framework based on the total energy minimization problem. It is shown that the desired orthogonality can be gradually achieved without invoking orthogonalization in each iteration. Moreover, this framework consists entirely of Basic Linear Algebra Subprograms (BLAS) operations and thus can be naturally parallelized. The global convergence of the proposed algorithm is established. We also present a preconditioning technique that can dramatically accelerate the convergence of the algorithm. The numerical experiments on all-electron calculations show the efficiency and high scalability of the proposed algorithm.
Submitted 28 July, 2020;
originally announced July 2020.
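The following sketch illustrates what an orthogonalization-free iteration can look like in general. It is a generic example, not the algorithm of the paper above. It minimizes a simple trace-penalty objective, f(X) = 0.5 * tr(X^T A X) + (mu / 4) * ||X^T X - I||_F^2, by plain gradient descent. Every iteration uses only matrix products, and a single QR factorization after convergence extracts an orthonormal basis. It assumes a symmetric matrix A whose k lowest eigenvalues lie below mu. The values of mu, the step size, and the iteration count are illustrative choices.

```python
import numpy as np

def trace_penalty_eigenspace(A, k, mu, step, n_iter=3000, seed=0):
    """Gradient descent on f(X) = 0.5*tr(X^T A X) + (mu/4)*||X^T X - I||_F^2.

    The gradient is A X + mu * X (X^T X - I): only matrix-matrix products,
    with no QR or Gram-Schmidt step inside the loop.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    X = 0.01 * rng.standard_normal((n, k))  # small start; X = 0 is a local maximum
    I = np.eye(k)
    for _ in range(n_iter):
        grad = A @ X + mu * X @ (X.T @ X - I)
        X = X - step * grad
    return X

# Test problem with a known spectrum and a gap after the 3 lowest eigenvalues.
rng = np.random.default_rng(0)
n, k = 40, 3
eigvals = np.concatenate([[1.0, 2.0, 3.0], np.linspace(10.0, 18.0, n - k)])
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = (Q * eigvals) @ Q.T
A = 0.5 * (A + A.T)  # symmetrize against rounding

X = trace_penalty_eigenspace(A, k, mu=25.0, step=0.005)
basis, _ = np.linalg.qr(X)  # one orthonormalization, only at the end
_, V = np.linalg.eigh(A)
P_est = basis @ basis.T
P_ref = V[:, :k] @ V[:, :k].T
print(np.linalg.norm(P_est - P_ref))  # should be tiny
```

The comparison uses projectors rather than X itself. The objective is invariant under rotations of X within the subspace, so only the spanned subspace is determined.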
-
On the validity of complex Langevin method for path integral computations
Authors:
Zhenning Cai,
Xiaoyu Dong,
Yang Kuang
Abstract:
The complex Langevin (CL) method is a classical numerical strategy to alleviate the numerical sign problem in the computation of lattice field theories. Mathematically, it is a simple numerical tool to compute a wide class of high-dimensional and oscillatory integrals. However, it is often observed that the CL method converges but the limiting result is incorrect. The literature contains several unclear or even conflicting statements, making the method look mysterious. By an in-depth analysis of a model problem, we reveal the mechanism by which the CL result becomes biased as the parameter changes, and we demonstrate that such a transition is difficult to capture. Our analysis also shows that the method works for all observables only if the probability density function generated by the CL process is localized. To generalize such observations to lattice field theories, we formulate the CL method on general groups using rigorous mathematical language for the first time, and we demonstrate that such a localized probability density function does not exist in the simulation of lattice field theories for general compact groups, which explains the unstable behavior of the CL method. Fortunately, we also find that the gauge cooling technique creates additional velocity that helps confine the samples, so that localized probability density functions can still be observed in certain cases, which significantly broadens the application of the CL method. The limitations of gauge cooling are also discussed. In particular, we prove that gauge cooling has no effect for Abelian groups, and we provide an example showing that biased results still exist when gauge cooling is insufficient to confine the probability density function.
Submitted 5 November, 2020; v1 submitted 20 July, 2020;
originally announced July 2020.
-
Mathematical analysis and potential therapeutic implications of a novel HIV-1 model of basal and activated transcription in T-cells and macrophages
Authors:
Tin Phan,
Catherine DeMarino,
Fatah Kashanchi,
Yang Kuang,
Daniel M. Anderson,
Maria Emelianenko
Abstract:
HIV-1 affects tens of millions of people worldwide. Current treatments often involve a cocktail of antiretroviral drugs, which are effective in reducing the virus and extending life spans. However, there is currently no FDA-approved HIV-1 transcription inhibitor. Furthermore, there have only been a few attempts to model the transcription process in HIV-1. In this work, we extend a novel three-state model of HIV-1 transcription introduced in DeMarino et al. (2020) that has been developed and validated against experimental data. After fitting this model to in vitro data, significant differences in the transcription process of HIV-1 in T-cells and macrophages have been observed. In particular, the activation of the HIV-1 promoter in T-cells appears to take place rapidly as the Tat protein approaches a critical threshold. In contrast, the same process occurs more smoothly in macrophages.
In this work, we carry out systematic mathematical analyses of the model to complement experimental data fitting and sensitivity analysis performed earlier. We derive explicit solutions of the model to obtain exact transcription process decay rates for the original model and then study the effect of nonlinearity on the system behavior, including the existence and the local and global stability of the positive equilibrium. We were able to show the stability of the positive steady state in limiting cases, with the global stability in the general case remaining an open question.
By modeling the effect of transcription-inhibiting drug therapy, we provide a nontrivial condition for it to be effective in reducing viral load. Moreover, our numerical simulations and analysis point out that the effect of the transcription-inhibitor can be enhanced by synchronizing with standard treatments, such as combination antiretroviral therapy, to allow the reduction of total dosages and toxicity.
Submitted 22 May, 2020;
originally announced May 2020.
-
To mask or not to mask: Modeling the potential for face mask use by the general public to curtail the COVID-19 pandemic
Authors:
Steffen E. Eikenberry,
Marina Mancuso,
Enahoro Iboi,
Tin Phan,
Keenan Eikenberry,
Yang Kuang,
Eric Kostelich,
Abba B. Gumel
Abstract:
Face mask use by the general public for limiting the spread of the COVID-19 pandemic is controversial, though increasingly recommended, and the potential of this intervention is not well understood. We develop a compartmental model for assessing the community-wide impact of mask use by the general, asymptomatic public, a portion of which may be asymptomatically infectious. Model simulations, using data relevant to COVID-19 dynamics in the US states of New York and Washington, suggest that broad adoption of even relatively ineffective face masks may meaningfully reduce community transmission of COVID-19 and decrease peak hospitalizations and deaths. Moreover, mask use decreases the effective transmission rate in nearly linear proportion to the product of mask effectiveness (as a fraction of potentially infectious contacts blocked) and coverage rate (as a fraction of the general population), while the impact on epidemiologic outcomes (death, hospitalizations) is highly nonlinear, indicating masks could synergize with other non-pharmaceutical measures. Masks are found to be useful with respect to both preventing illness in healthy persons and preventing asymptomatic transmission. Hypothetical mask adoption scenarios suggest that immediate near universal (80%) adoption of moderately (50%) effective masks could prevent on the order of 17–45% of projected deaths over two months in New York, while decreasing the peak daily death rate by 34–58%, absent other changes in epidemic dynamics. Our results suggest use of face masks by the general public is potentially of high value in curtailing community transmission and the burden of the pandemic. The community-wide benefits are likely to be greatest when face masks are used in conjunction with other non-pharmaceutical practices (such as social-distancing), and when adoption is nearly universal (nation-wide) and compliance is high.
Submitted 7 April, 2020;
originally announced April 2020.
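The abstract above reports that mask use lowers the effective transmission rate roughly in proportion to the product of effectiveness and coverage. A back-of-the-envelope sketch shows why, under random mixing: the infector and the susceptible contact each wear a mask independently, with probability equal to the coverage. The expected transmission multiplier is then (1 - coverage * eff_out) * (1 - coverage * eff_in), which is close to linear in coverage times effectiveness when that product is small. The plain SIR model and the rates `beta` and `gamma` are illustrative assumptions and are much simpler than the paper's compartmental model. Only the 80% coverage and 50% effectiveness values come from the abstract's scenario.

```python
def mask_factor(coverage, eff_in, eff_out):
    """Multiplier on the transmission rate under independent random mixing.

    The infector is masked with probability `coverage` (blocking a fraction
    eff_out of transmission), and so is the susceptible contact (blocking eff_in).
    """
    return (1 - coverage * eff_out) * (1 - coverage * eff_in)

def sir_peak_infected(beta, gamma, factor, i0=1e-4, days=300, dt=0.1):
    """Forward-Euler SIR model (population fractions); returns the peak infected fraction."""
    s, i = 1.0 - i0, i0
    peak = i
    for _ in range(int(days / dt)):
        new_inf = factor * beta * s * i * dt
        recov = gamma * i * dt
        s, i = s - new_inf, i + new_inf - recov
        peak = max(peak, i)
    return peak

factor = mask_factor(coverage=0.8, eff_in=0.5, eff_out=0.5)
print(round(factor, 2))  # prints: 0.36
no_masks = sir_peak_infected(beta=0.5, gamma=0.2, factor=1.0)
with_masks = sir_peak_infected(beta=0.5, gamma=0.2, factor=factor)
```

With these toy rates, the basic reproduction number drops from 2.5 to 0.9, so the simulated outbreak never takes off. The nonlinear jump in outcomes from a multiplicative change in transmission echoes the abstract's point about highly nonlinear epidemiologic effects.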