Physics-informed active learning for accelerating quantum chemical simulations

Authors: Yi-Fan Hou, Lina Zhang, Quanhao Zhang, Fuchun Ge, Pavlo O. Dral

Abstract: Quantum chemical simulations can be greatly accelerated by constructing machine learning potentials, which is often done using active learning (AL). The usefulness of the constructed potentials is often limited by the high effort required and their insufficient robustness in the simulations. Here we introduce the end-to-end AL for constructing robust data-efficient potentials with affordable investment of time and resources and minimum human interference. Our AL protocol is based on the physics-informed sampling of training points, automatic selection of initial data, uncertainty quantification, and convergence monitoring. The versatility of this protocol is shown in our implementation of quasi-classical molecular dynamics for simulating vibrational spectra, conformer search of a key biochemical molecule, and time-resolved mechanism of the Diels-Alder reactions. These investigations took us days instead of weeks of pure quantum chemical calculations on a high-performance computing cluster. The code in MLatom and tutorials are available at https://github.com/dralgroup/mlatom.

Submitted 16 July, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

arXiv:2403.06600 [pdf, other]

BEV2PR: BEV-Enhanced Visual Place Recognition with Structural Cues

Authors: Fudong Ge, Yiwei Zhang, Shuhan Shen, Yue Wang, Weiming Hu, Jin Gao

Abstract: In this paper, we propose a new image-based visual place recognition (VPR) framework by exploiting the structural cues in bird's-eye view (BEV) from a single monocular camera. The motivation arises from two key observations about VPR: 1) For the methods based on both camera and LiDAR sensors, the integration of LiDAR in robotic systems has led to increased expenses, while the alignment of data between different sensors is also a major challenge. 2) Other image-/camera-based methods, involving integrating RGB images and their derived variants (e.g., pseudo depth images, pseudo 3D point clouds), exhibit several limitations, such as the failure to effectively exploit the explicit spatial relationships between different objects. To tackle the above issues, we design a new BEV-enhanced VPR framework, namely BEV2PR, which can generate a composite descriptor with both visual cues and spatial awareness solely based on a single camera. For the visual cues, any popular aggregation module for RGB global features can be integrated into our framework. The key points lie in: 1) We use BEV segmentation features as an explicit source of structural knowledge in constructing global features. 2) The lower layers of the pre-trained backbone from BEV map generation are shared for visual and structural streams in VPR, facilitating the learning of fine-grained local features in the visual stream. 3) The complementary visual features and structural features can jointly enhance VPR performance.
Our BEV2PR framework enables consistent performance improvements over several popular camera-based VPR aggregation modules when integrating them. The experiments on our collected VPR-NuScenes dataset demonstrate an absolute gain of 2.47% on Recall@1 for the strong Conv-AP baseline to achieve the best performance in our setting, and notably, an 18.06% gain on the hard set.

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2310.20155 [pdf]

doi 10.1021/acs.jctc.3c01203

MLatom 3: Platform for machine learning-enhanced computational chemistry simulations and workflows

Authors: Pavlo O. Dral, Fuchun Ge, Yi-Fan Hou, Peikun Zheng, Yuxinxin Chen, Mario Barbatti, Olexandr Isayev, Cheng Wang, Bao-Xin Xue, Max Pinheiro Jr, Yuming Su, Yiheng Dai, Yangtao Chen, Lina Zhang, Shuang Zhang, Arif Ullah, Quanhao Zhang, Yanchi Ou

Abstract: Machine learning (ML) is increasingly becoming a common tool in computational chemistry. At the same time, the rapid development of ML methods requires a flexible software framework for designing custom workflows. MLatom 3 is a program package designed to leverage the power of ML to enhance typical computational chemistry simulations and to create complex workflows. This open-source package provides plenty of choice to the users who can run simulations with the command line options, input files, or with scripts using MLatom as a Python package, both on their computers and on the online XACS cloud computing at XACScloud.com. Computational chemists can calculate energies and thermochemical properties, optimize geometries, run molecular and quantum dynamics, and simulate (ro)vibrational, one-photon UV/vis absorption, and two-photon absorption spectra with ML, quantum mechanical, and combined models. The users can choose from an extensive library of methods containing pre-trained ML models and quantum mechanical approximations such as AIQM1 approaching coupled-cluster accuracy. The developers can build their own models using various ML algorithms. The great flexibility of MLatom is largely due to the extensive use of the interfaces to many state-of-the-art software packages and libraries.

Submitted 30 October, 2023; originally announced October 2023.

arXiv:2305.13699 [pdf, other]

Achieving Maximum Efficiency in Schnorr-based Multi-signature and Applications in Blockchain

Authors: Peng Zhang, Fa Ge, Yuhong Liu

Abstract: Multi-signature aggregates signatures from multiple users on the same message into a joint signature, which is widely applied in blockchain to reduce the percentage of signatures in blocks and improve the throughput of transactions. The $k$-sum attacks are one of the major challenges to design secure multi-signature schemes. In this work, we address $k$-sum attacks from a novel angle by defining a Public Third Party (PTP), which is an automatic process that can be verifiable by the public and restricts the signing phase from continuing until receiving commitments from all signers. Further, a two-round multi-signature scheme MEMS with PTP is proposed, which is secure based on discrete logarithm assumption in the random oracle model. As each signer communicates directly with the PTP instead of other co-signers, the total amount of communications is significantly reduced. In addition, as PTP participates in the computation of the aggregation and signing algorithms, the computation cost left for each signer and verifier remains the same as the basis Schnorr signature. To the best of our knowledge, this is the maximum efficiency that a Schnorr-based multi-signature scheme can achieve. Further, MEMS is applied in blockchain platform, e.g., Fabric, to improve the transaction efficiency.

Submitted 23 May, 2023; originally announced May 2023.

arXiv:2205.12662 [pdf, other]

DFM: Dialogue Foundation Model for Universal Large-Scale Dialogue-Oriented Task Learning

Authors: Zhi Chen, Jijia Bao, Lu Chen, Yuncong Liu, Da Ma, Bei Chen, Mengyue Wu, Su Zhu, Xin Dong, Fujiang Ge, Qingliang Miao, Jian-Guang Lou, Kai Yu

Abstract: Building a universal conversational agent has been a long-standing goal of the dialogue research community. Most previous works only focus on a small set of dialogue tasks. In this work, we aim to build a unified dialogue foundation model (DFM) which can be used to solve massive diverse dialogue tasks. To achieve this goal, a large-scale well-annotated dialogue dataset with rich task diversity (DialogZoo) is collected. We introduce a framework to unify all dialogue tasks and propose novel auxiliary self-supervised tasks to achieve stable training of DFM on the highly diverse large scale DialogZoo corpus. Experiments show that, compared with models of the same size, DFM can achieve state-of-the-art or competitive performance on very rich cross-domain downstream dialogue tasks. This demonstrates that DFM largely extends the ability of unified dialogue pre-trained model.

Submitted 9 October, 2022; v1 submitted 25 May, 2022; originally announced May 2022.

Comments: Work in Progress

arXiv:2103.11118 [pdf, other]

Keywords Guided Method Name Generation

Authors: Fan Ge, Li Kuang

Abstract: High quality method names are descriptive and readable, which are helpful for code development and maintenance. The majority of recent research suggests method names based on the text summarization approach. They take the token sequence and abstract syntax tree of the source code as input, and generate method names through a powerful neural network based model. However, the tokens composing the method name are closely related to the entity name within its method implementation. Actually, high proportions of the tokens in method name can be found in its corresponding method implementation, which makes it possible to incorporate this shared token information to improve the performance of method naming task. Inspired by this key observation, we propose a two-stage keywords guided method name generation approach to suggest method names. Specifically, we decompose the method naming task into two subtasks, including keywords extraction task and method name generation task. For the keywords extraction task, we apply a graph neural network based model to extract the keywords from source code. For the method name generation task, we utilize the extracted keywords to guide the method name generation model. We apply a dual selective gate in encoder to control the information flow, and a dual attention mechanism in decoder to combine the semantics of input code sequence and keywords.
Experiment results on an open source dataset demonstrate that keywords guidance can facilitate method naming task, which enables our model to outperform the competitive state-of-the-art models by margins of 1.5%-3.5% in ROUGE metrics. Especially when programs share one common token with method names, our approach improves the absolute ROUGE-1 score by 7.8%.

Submitted 20 March, 2021; originally announced March 2021.

Comments: Research paper accepted at ICPC 2021 (29th IEEE/ACM International Conference on Program Comprehension)

arXiv:2008.12298 [pdf, other]

One Shot 3D Photography

Authors: Johannes Kopf, Kevin Matzen, Suhib Alsisan, Ocean Quigley, Francis Ge, Yangming Chong, Josh Patterson, Jan-Michael Frahm, Shu Wu, Matthew Yu, Peizhao Zhang, Zijian He, Peter Vajda, Ayush Saraf, Michael Cohen

Abstract: 3D photography is a new medium that allows viewers to more fully experience a captured moment. In this work, we refer to a 3D photo as one that displays parallax induced by moving the viewpoint (as opposed to a stereo pair with a fixed viewpoint). 3D photos are static in time, like traditional photos, but are displayed with interactive parallax on mobile or desktop screens, as well as on Virtual Reality devices, where viewing it also includes stereo. We present an end-to-end system for creating and viewing 3D photos, and the algorithmic and design choices therein. Our 3D photos are captured in a single shot and processed directly on a mobile device. The method starts by estimating depth from the 2D input image using a new monocular depth estimation network that is optimized for mobile devices. It performs competitively to the state-of-the-art, but has lower latency and peak memory consumption and uses an order of magnitude fewer parameters. The resulting depth is lifted to a layered depth image, and new geometry is synthesized in parallax regions. We synthesize color texture and structures in the parallax regions as well, using an inpainting network, also optimized for mobile devices, on the LDI directly. Finally, we convert the result into a mesh-based representation that can be efficiently transmitted and rendered even on low-end devices and over poor network connections. Altogether, the processing takes just a few seconds on a mobile device, and the result can be instantly viewed and shared.
We perform extensive quantitative evaluation to validate our system and compare its new components against the current state-of-the-art.

Submitted 1 September, 2020; v1 submitted 27 August, 2020; originally announced August 2020.

Comments: Project page: https://facebookresearch.github.io/one_shot_3d_photography/ Code: https://github.com/facebookresearch/one_shot_3d_photography

Journal ref: ACM Transactions on Graphics (Proceedings of SIGGRAPH 2020), Volume 39, Number 4, 2020

Showing 1–7 of 7 results for author: Ge, F