-
Efficient Evolutionary Search Over Chemical Space with Large Language Models
Authors:
Haorui Wang,
Marta Skreta,
Cher-Tian Ser,
Wenhao Gao,
Lingkai Kong,
Felix Strieth-Kalthoff,
Chenru Duan,
Yuchen Zhuang,
Yue Yu,
Yanqiao Zhu,
Yuanqi Du,
Alán Aspuru-Guzik,
Kirill Neklyudov,
Chao Zhang
Abstract:
Molecular discovery, when formulated as an optimization problem, presents significant computational challenges because optimization objectives can be non-differentiable. Evolutionary Algorithms (EAs), often used to optimize black-box objectives in molecular discovery, traverse chemical space by performing random mutations and crossovers, leading to a large number of expensive objective evaluations. In this work, we ameliorate this shortcoming by incorporating chemistry-aware Large Language Models (LLMs) into EAs. Namely, we redesign crossover and mutation operations in EAs using LLMs trained on large corpora of chemical information. We perform extensive empirical studies on both commercial and open-source models on multiple tasks involving property optimization, molecular rediscovery, and structure-based drug design, demonstrating that the joint usage of LLMs with EAs yields superior performance over all baseline models across single- and multi-objective settings. We demonstrate that our algorithm improves both the quality of the final solution and convergence speed, thereby reducing the number of required objective evaluations. Our code is available at http://github.com/zoom-wang112358/MOLLEO
Submitted 2 July, 2024; v1 submitted 23 June, 2024;
originally announced June 2024.
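The evolutionary loop that MOLLEO modifies can be sketched as an elitist algorithm whose mutation and crossover operators are pluggable functions. This is a hypothetical toy, not the repository's code: `propose_mutation` and `propose_crossover` are placeholders for the points where MOLLEO would query an LLM, implemented here as plain random string edits, and the objective (character-level match against a target string) only loosely mimics a rediscovery task.

```python
import random

# Toy alphabet of SMILES-like characters (hypothetical; not a chemistry model).
ALPHABET = "CNOS()=1"


def propose_mutation(s: str, rng: random.Random) -> str:
    # Placeholder for an LLM-proposed edit: a random single-character substitution.
    i = rng.randrange(len(s))
    return s[:i] + rng.choice(ALPHABET) + s[i + 1:]


def propose_crossover(a: str, b: str, rng: random.Random) -> str:
    # Placeholder for an LLM-proposed crossover: single-point crossover.
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def fitness(s: str, target: str) -> float:
    # Toy "rediscovery" score: fraction of positions that match the target string.
    return sum(x == y for x, y in zip(s, target)) / len(target)


def evolve(target: str, pop_size: int = 20, generations: int = 200, seed: int = 0):
    rng = random.Random(seed)
    pop = ["".join(rng.choice(ALPHABET) for _ in target) for _ in range(pop_size)]
    history = []  # best fitness in each generation
    for _ in range(generations):
        pop.sort(key=lambda s: fitness(s, target), reverse=True)
        history.append(fitness(pop[0], target))
        parents = pop[: pop_size // 2]  # elitist truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            children.append(propose_mutation(propose_crossover(a, b, rng), rng))
        pop = parents + children
    best = max(pop, key=lambda s: fitness(s, target))
    return best, history
```

Calling `evolve("CC(=O)OC1")` returns the best string and a non-decreasing fitness history. The number of generations, and hence objective evaluations, needed to climb that history is the quantity a better-informed proposal operator is meant to reduce.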
-
Diffusion Models as Constrained Samplers for Optimization with Unknown Constraints
Authors:
Lingkai Kong,
Yuanqi Du,
Wenhao Mu,
Kirill Neklyudov,
Valentin De Bortoli,
Haorui Wang,
Dongxia Wu,
Aaron Ferber,
Yi-An Ma,
Carla P. Gomes,
Chao Zhang
Abstract:
Addressing real-world optimization problems becomes particularly challenging when analytic objective functions or constraints are unavailable. While numerous studies have addressed the issue of unknown objectives, limited research has focused on scenarios where feasibility constraints are not given explicitly. Overlooking these constraints can lead to spurious solutions that are unrealistic in practice. To deal with such unknown constraints, we propose to perform optimization within the data manifold using diffusion models. To constrain the optimization process to the data manifold, we reformulate the original optimization problem as a sampling problem from the product of the Boltzmann distribution defined by the objective function and the data distribution learned by the diffusion model. To enhance sampling efficiency, we propose a two-stage framework that begins with a guided diffusion process for warm-up, followed by a Langevin dynamics stage for further correction. Theoretical analysis shows that the initial stage results in a distribution focused on feasible solutions, thereby providing a better initialization for the later stage. Comprehensive experiments on a synthetic dataset, six real-world black-box optimization datasets, and a multi-objective optimization dataset show that our method achieves performance better than or comparable to previous state-of-the-art baselines.
Submitted 29 April, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
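The second (Langevin) stage can be illustrated in one dimension. The target is the product p_data(x) · exp(-f(x)/T), whose score is the data score minus ∇f/T. Several pieces here are hypothetical choices made for the sketch. An analytic two-mode Gaussian mixture stands in for the score a trained diffusion model would supply. The objective is f(x) = (x - 3)^2 with T = 1, and the guided-diffusion warm-up is replaced by simply starting at x = 1.

```python
import math
import random

MEANS, SIGMA = (-2.0, 2.0), 0.5  # hypothetical data distribution: two narrow modes


def data_score(x: float) -> float:
    # d/dx log p_data(x) for an equal-weight Gaussian mixture; stands in for the
    # score a trained diffusion model would provide at (near-)zero noise.
    logs = [-((x - m) ** 2) / (2 * SIGMA ** 2) for m in MEANS]
    top = max(logs)
    weights = [math.exp(l - top) for l in logs]
    z = sum(weights)
    return sum(w / z * (-(x - m) / SIGMA ** 2) for w, m in zip(weights, MEANS))


def objective_grad(x: float) -> float:
    # f(x) = (x - 3)^2 is minimised at x = 3, which lies off the data modes.
    return 2.0 * (x - 3.0)


def langevin(x0: float, temperature: float = 1.0, step: float = 1e-3,
             n_steps: int = 3000, seed: int = 0) -> float:
    # Unadjusted Langevin dynamics targeting p_data(x) * exp(-f(x) / temperature).
    rng = random.Random(seed)
    x = x0
    for _ in range(n_steps):
        drift = data_score(x) - objective_grad(x) / temperature
        x = x + step * drift + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
    return x
```

Near the right-hand mode, the product density is proportional to exp(-2(x - 2)^2 - (x - 3)^2), with mode x = 7/3. Samples therefore settle between the data mode at 2 and the unconstrained optimum at 3, rather than at the infeasible point x = 3 itself.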
-
Structured Inverse-Free Natural Gradient: Memory-Efficient & Numerically-Stable KFAC
Authors:
Wu Lin,
Felix Dangel,
Runa Eschenhagen,
Kirill Neklyudov,
Agustinus Kristiadi,
Richard E. Turner,
Alireza Makhzani
Abstract:
Second-order methods such as KFAC can be useful for neural net training. However, they are often memory-inefficient since their preconditioning Kronecker factors are dense, and numerically unstable in low precision as they require matrix inversion or decomposition. These limitations render such methods unpopular for modern mixed-precision training. We address them by (i) formulating an inverse-free KFAC update and (ii) imposing structures in the Kronecker factors, resulting in structured inverse-free natural gradient descent (SINGD). On modern neural networks, we show that SINGD is memory-efficient and numerically robust, in contrast to KFAC, and often outperforms AdamW even in half precision. Our work closes a gap between first- and second-order methods in modern low-precision training.
Submitted 15 June, 2024; v1 submitted 9 December, 2023;
originally announced December 2023.
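As a generic illustration of what "inverse-free" can mean, here is Newton–Schulz iteration. KFAC normally inverts or decomposes its Kronecker factors, whereas this iteration approximates a matrix inverse using only matrix products. It is a swapped-in technique, not the SINGD update rule, which the abstract does not spell out. Plain Python lists keep the sketch dependency-free, and the input is assumed to be symmetric positive-definite, as a damped Kronecker factor would be.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def newton_schulz_inverse(a: Matrix, iters: int = 30) -> Matrix:
    """Approximate a^{-1} for a symmetric positive-definite `a` with matmuls only.

    Iteration: X_{k+1} = X_k (2I - A X_k), so the residual I - A X_k is squared at
    every step. Starting from X_0 = I / trace(A) keeps the residual's spectral
    norm below 1 for SPD A, which guarantees convergence.
    """
    n = len(a)
    trace = sum(a[i][i] for i in range(n))
    x = [[1.0 / trace if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iters):
        ax = matmul(a, x)
        two_i_minus_ax = [[(2.0 if i == j else 0.0) - ax[i][j] for j in range(n)]
                          for i in range(n)]
        x = matmul(x, two_i_minus_ax)
    return x
```

For A = [[4, 1], [1, 3]], the iterate converges to A^{-1} = (1/11)[[3, -1], [-1, 4]] without any explicit inversion or eigendecomposition.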
-
A Computational Framework for Solving Wasserstein Lagrangian Flows
Authors:
Kirill Neklyudov,
Rob Brekelmans,
Alexander Tong,
Lazar Atanackovic,
Qiang Liu,
Alireza Makhzani
Abstract:
The dynamical formulation of optimal transport can be extended through various choices of the underlying geometry (kinetic energy) and the regularization of density paths (potential energy). These combinations yield different variational problems (Lagrangians), encompassing many variations of the optimal transport problem such as the Schrödinger bridge, unbalanced optimal transport, and optimal transport with physical constraints, among others. In general, the optimal density path is unknown, and solving these variational problems can be computationally challenging. We propose a novel deep-learning-based framework approaching all of these problems from a unified perspective. Leveraging the dual formulation of the Lagrangians, our method does not require simulating or backpropagating through the trajectories of the learned dynamics, and does not need access to optimal couplings. We showcase the versatility of the proposed framework by outperforming previous approaches for single-cell trajectory inference, where incorporating prior knowledge into the dynamics is crucial for correct predictions.
Submitted 3 July, 2024; v1 submitted 16 October, 2023;
originally announced October 2023.
-
Wasserstein Quantum Monte Carlo: A Novel Approach for Solving the Quantum Many-Body Schrödinger Equation
Authors:
Kirill Neklyudov,
Jannes Nys,
Luca Thiede,
Juan Carrasquilla,
Qiang Liu,
Max Welling,
Alireza Makhzani
Abstract:
Solving the quantum many-body Schrödinger equation is a fundamental and challenging problem in the fields of quantum physics, quantum chemistry, and material sciences. One of the common computational approaches to this problem is Quantum Variational Monte Carlo (QVMC), in which ground-state solutions are obtained by minimizing the energy of the system within a restricted family of parameterized wave functions. Deep learning methods partially address the limitations of traditional QVMC by representing a rich family of wave functions in terms of neural networks. However, the optimization objective in QVMC remains notoriously hard to minimize and requires second-order optimization methods such as natural gradient. In this paper, we first reformulate energy functional minimization in the space of Born distributions corresponding to particle-permutation (anti-)symmetric wave functions, rather than the space of wave functions. We then interpret QVMC as the Fisher-Rao gradient flow in this distributional space, followed by a projection step onto the variational manifold. This perspective provides us with a principled framework to derive new QMC algorithms, by endowing the distributional space with better metrics, and following the projected gradient flow induced by those metrics. More specifically, we propose "Wasserstein Quantum Monte Carlo" (WQMC), which uses the gradient flow induced by the Wasserstein metric, rather than the Fisher-Rao metric, and corresponds to transporting the probability mass, rather than teleporting it. We demonstrate empirically that the dynamics of WQMC result in faster convergence to the ground state of molecular systems.
Submitted 26 October, 2023; v1 submitted 6 July, 2023;
originally announced July 2023.
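The difference between "transporting" and "teleporting" probability mass can be seen on the simplest functional, E[ρ] = E_ρ[V], for a fixed toy potential V(x) = (x - 1)^2. This is a hypothetical illustration, not WQMC: there is no wave function, no energy functional of a molecular Hamiltonian, and no projection onto a variational manifold. Under the Wasserstein gradient flow, particles move along -∇V. Under the Fisher-Rao flow, ∂ρ/∂t = -ρ(V - E_ρ[V]), particles stay put and only their weights change.

```python
import math


def V(x):
    return (x - 1.0) ** 2  # toy potential, minimised at x = 1


def dV(x):
    return 2.0 * (x - 1.0)


def wasserstein_flow(xs, step=0.1, n_steps=100):
    # Particle form of d rho/dt = div(rho grad V): each particle moves along -grad V.
    xs = list(xs)
    for _ in range(n_steps):
        xs = [x - step * dV(x) for x in xs]
    return xs


def fisher_rao_flow(xs, step=0.1, n_steps=100):
    # d rho/dt = -rho (V - E_rho[V]): particles are fixed, mass is reweighted
    # (multiplicative discretisation, renormalised every step).
    w = [1.0 / len(xs)] * len(xs)
    for _ in range(n_steps):
        mean_v = sum(wi * V(x) for wi, x in zip(w, xs))
        w = [wi * math.exp(-step * (V(x) - mean_v)) for wi, x in zip(w, xs)]
        z = sum(w)
        w = [wi / z for wi in w]
    return w
```

Starting from particles at -2, -1, 0, 0.5 and 3, the transport flow moves every particle to the minimum at 1. The Fisher-Rao flow cannot reach 1 and instead piles almost all the weight onto the particle at 0.5, the best of the fixed locations.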
-
Quantum HyperNetworks: Training Binary Neural Networks in Quantum Superposition
Authors:
Juan Carrasquilla,
Mohamed Hibat-Allah,
Estelle Inack,
Alireza Makhzani,
Kirill Neklyudov,
Graham W. Taylor,
Giacomo Torlai
Abstract:
Binary neural networks, i.e., neural networks whose parameters and activations are constrained to only two possible values, offer a compelling avenue for the deployment of deep learning models on energy- and memory-limited devices. However, their training, architectural design, and hyperparameter tuning remain challenging as these involve multiple computationally expensive combinatorial optimization problems. Here we introduce quantum hypernetworks as a mechanism to train binary neural networks on quantum computers, which unify the search over parameters, hyperparameters, and architectures in a single optimization loop. Through classical simulations, we demonstrate that our approach effectively finds optimal parameters, hyperparameters and architectural choices with high probability on classification problems including a two-dimensional Gaussian dataset and a scaled-down version of the MNIST handwritten digits. We represent our quantum hypernetworks as variational quantum circuits, and find that an optimal circuit depth maximizes the probability of finding performant binary neural networks. Our unified approach opens up broad scope for other applications in the field of machine learning.
Submitted 19 January, 2023;
originally announced January 2023.
-
Action Matching: Learning Stochastic Dynamics from Samples
Authors:
Kirill Neklyudov,
Rob Brekelmans,
Daniel Severo,
Alireza Makhzani
Abstract:
Learning the continuous dynamics of a system from snapshots of its temporal marginals is a problem which appears throughout natural sciences and machine learning, including in quantum systems, single-cell biological data, and generative modeling. In these settings, we assume access to cross-sectional samples that are uncorrelated over time, rather than full trajectories of samples. In order to better understand the systems under observation, we would like to learn a model of the underlying process that allows us to propagate samples in time and thereby simulate entire individual trajectories. In this work, we propose Action Matching, a method for learning a rich family of dynamics using only independent samples from its time evolution. We derive a tractable training objective, which does not rely on explicit assumptions about the underlying dynamics and does not require back-propagation through differential equations or optimal transport solvers. Inspired by connections with optimal transport, we derive extensions of Action Matching to learn stochastic differential equations and dynamics involving creation and destruction of probability mass. Finally, we showcase applications of Action Matching by achieving competitive performance in a diverse set of experiments from biology, physics, and generative modeling.
Submitted 8 June, 2023; v1 submitted 12 October, 2022;
originally announced October 2022.
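A minimal Action Matching sketch follows, assuming the objective takes the form L(s) = E_{q_0}[s_0(x)] - E_{q_1}[s_1(x)] + ∫_0^1 E_{q_t}[½‖∇s_t(x)‖² + ∂s_t(x)/∂t] dt, where the learned velocity is ∇s_t. The ansatz s_t(x) = a·x is a hypothetical choice that makes the example tiny. Because it is linear and time-independent, ∂s/∂t = 0 and ∇s = a, so the time integral collapses to a²/2 and only the endpoint samples are needed. The marginals are toy Gaussians N(μ_0, 1) and N(μ_1, 1), sampled independently, with no trajectories or pairings.

```python
import random


def fit_linear_action(mu0: float, mu1: float, n: int = 20000, lr: float = 0.1,
                      n_iters: int = 200, seed: int = 0) -> float:
    """Fit s_t(x) = a * x by gradient descent on a Monte Carlo estimate of
    L(a) = E_{q0}[a x] - E_{q1}[a x] + a^2 / 2 (the assumed AM loss for this ansatz)."""
    rng = random.Random(seed)
    x0 = [rng.gauss(mu0, 1.0) for _ in range(n)]  # independent samples from q_0
    x1 = [rng.gauss(mu1, 1.0) for _ in range(n)]  # independent samples from q_1
    m0, m1 = sum(x0) / n, sum(x1) / n
    a = 0.0
    for _ in range(n_iters):
        grad = m0 - m1 + a  # dL/da
        a -= lr * grad
    return a  # learned velocity field v(x) = ds/dx = a
```

The minimiser is a = E_{q_1}[x] - E_{q_0}[x], the constant velocity that translates q_0 onto q_1. With μ_0 = 0 and μ_1 = 2, the fitted value is close to 2, recovered from unpaired samples alone.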
-
Particle Dynamics for Learning EBMs
Authors:
Kirill Neklyudov,
Priyank Jaini,
Max Welling
Abstract:
Energy-based modeling is a promising approach to unsupervised learning, which yields many downstream applications from a single model. The main difficulty in learning energy-based models with the "contrastive approaches" is the generation of samples from the current energy function at each iteration. Many advances have been made to accomplish this subroutine cheaply. Nevertheless, all such sampling paradigms run MCMC targeting the current model, which requires infinitely long chains to generate samples from the true energy distribution and is problematic in practice. This paper proposes an alternative approach to getting these samples and avoiding crude MCMC sampling from the current model. We accomplish this by viewing the evolution of the modeling distribution as (i) the evolution of the energy function, and (ii) the evolution of the samples from this distribution along some vector field. We subsequently derive this time-dependent vector field such that the particles following this field are approximately distributed as the current density model. Thereby we match the evolution of the particles with the evolution of the energy function prescribed by the learning procedure. Importantly, unlike Monte Carlo sampling, our method aims to match the current distribution in finite time. Finally, we empirically demonstrate its effectiveness compared to MCMC-based learning methods.
Submitted 26 November, 2021;
originally announced November 2021.
-
Deterministic Gibbs Sampling via Ordinary Differential Equations
Authors:
Kirill Neklyudov,
Roberto Bondesan,
Max Welling
Abstract:
Deterministic dynamics is an essential part of many MCMC algorithms, e.g. Hybrid Monte Carlo or samplers utilizing normalizing flows. This paper presents a general construction of deterministic measure-preserving dynamics using autonomous ODEs and tools from differential geometry. We show how Hybrid Monte Carlo and other deterministic samplers follow as special cases of our theory. We then demonstrate the utility of our approach by constructing a continuous non-sequential version of Gibbs sampling in terms of an ODE flow and extending it to discrete state spaces. We find that our deterministic samplers are more sample-efficient than their stochastic counterparts, even when the latter generate independent samples.
Submitted 18 June, 2021;
originally announced June 2021.
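Hybrid (Hamiltonian) Monte Carlo, named above as a special case, is the standard example of deterministic measure-preserving dynamics inside an MCMC kernel. The sketch below is textbook HMC, not the paper's ODE-based Gibbs construction. A volume-preserving, reversible leapfrog map is wrapped in momentum resampling and a Metropolis test. The momentum negation that makes the map an exact involution is omitted because it leaves the acceptance ratio unchanged and the momentum is resampled at every step. Step size and trajectory length are illustrative choices for a 1-D standard normal target.

```python
import math
import random


def leapfrog(x, p, grad_log_pi, step, n_steps):
    # Deterministic, volume-preserving, time-reversible map (x, p) -> (x', p').
    p = p + 0.5 * step * grad_log_pi(x)
    for i in range(n_steps):
        x = x + step * p
        if i != n_steps - 1:
            p = p + step * grad_log_pi(x)
    p = p + 0.5 * step * grad_log_pi(x)
    return x, p


def hmc(log_pi, grad_log_pi, x0, n_samples=5000, step=0.2, n_steps=10, seed=0):
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        p0 = rng.gauss(0.0, 1.0)  # stochastic part: resample the momentum
        x_new, p_new = leapfrog(x, p0, grad_log_pi, step, n_steps)
        # Metropolis correction on the joint density exp(log_pi(x) - p^2 / 2).
        log_accept = (log_pi(x_new) - 0.5 * p_new ** 2) - (log_pi(x) - 0.5 * p0 ** 2)
        if rng.random() < math.exp(min(0.0, log_accept)):
            x = x_new
        samples.append(x)
    return samples
```

For example, `hmc(lambda x: -0.5 * x * x, lambda x: -x, 0.0)` draws from a standard normal. Its sample mean and variance should be close to 0 and 1.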
-
Orbital MCMC
Authors:
Kirill Neklyudov,
Max Welling
Abstract:
Markov Chain Monte Carlo (MCMC) algorithms ubiquitously employ complex deterministic transformations to generate proposal points that are then filtered by the Metropolis-Hastings-Green (MHG) test. However, the requirement that the target measure remain invariant restricts the design of these transformations. In this paper, we first derive the acceptance test for the stochastic Markov kernel considering arbitrary deterministic maps as proposal generators. When applied to transformations with orbits of period two (involutions), the test reduces to the MHG test. Based on the derived test, we propose two practical algorithms: one operates by constructing periodic orbits from any diffeomorphism, the other on contractions of the state space (such as optimization trajectories). Finally, we perform an empirical study demonstrating the practical advantages of both kernels.
Submitted 7 June, 2021; v1 submitted 15 October, 2020;
originally announced October 2020.
-
Involutive MCMC: a Unifying Framework
Authors:
Kirill Neklyudov,
Max Welling,
Evgenii Egorov,
Dmitry Vetrov
Abstract:
Markov Chain Monte Carlo (MCMC) is a computational approach to fundamental problems such as inference, integration, optimization, and simulation. The field has developed a broad spectrum of algorithms, varying in the way they are motivated, the way they are applied and how efficiently they sample. Despite all the differences, many of them share the same core principle, which we unify as the Involutive MCMC (iMCMC) framework. Building upon this, we describe a wide range of MCMC algorithms in terms of iMCMC, and formulate a number of "tricks" which one can use as design principles for developing new MCMC algorithms. Thus, iMCMC provides a unified view of many known MCMC algorithms, which facilitates the derivation of powerful extensions. We demonstrate the latter with two examples where we transform known reversible MCMC algorithms into more efficient irreversible ones.
Submitted 30 June, 2020;
originally announced June 2020.
-
The Implicit Metropolis-Hastings Algorithm
Authors:
Kirill Neklyudov,
Evgenii Egorov,
Dmitry Vetrov
Abstract:
Recent works propose using the discriminator of a GAN to filter out unrealistic samples of the generator. We generalize these ideas by introducing the implicit Metropolis-Hastings algorithm. For any implicit probabilistic model and a target distribution represented by a set of samples, implicit Metropolis-Hastings operates by learning a discriminator to estimate the density ratio and then generating a chain of samples. Since approximating the density ratio introduces an error at every step of the chain, it is crucial to analyze the stationary distribution of such a chain. For that purpose, we present a theoretical result stating that the discriminator loss upper bounds the total variation distance between the target distribution and the stationary distribution. Finally, we validate the proposed algorithm both for independent and Markov proposals on the CIFAR-10 and CelebA datasets.
Submitted 9 June, 2019;
originally announced June 2019.
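With an independent proposal q (the generator) and a discriminator D separating target samples from generated ones, the optimal discriminator satisfies D/(1 - D) = p/q. That density ratio is all the Metropolis-Hastings acceptance test needs. In this hypothetical toy, no discriminator is trained: the exact optimal D for p = N(0, 1) and q = N(0.5, 1.5²) is plugged in, so the approximation error analysed in the paper is absent.

```python
import math
import random


def normal_logpdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


def optimal_discriminator(x):
    # D*(x) = p(x) / (p(x) + q(x)) for the toy p = N(0, 1), q = N(0.5, 1.5^2).
    lp, lq = normal_logpdf(x, 0.0, 1.0), normal_logpdf(x, 0.5, 1.5)
    return 1.0 / (1.0 + math.exp(lq - lp))


def density_ratio(x):
    d = optimal_discriminator(x)
    return d / (1.0 - d)  # equals p(x) / q(x) when D is optimal


def implicit_mh(n_steps=20000, seed=0):
    rng = random.Random(seed)
    x = rng.gauss(0.5, 1.5)
    samples = []
    for _ in range(n_steps):
        x_new = rng.gauss(0.5, 1.5)  # independent proposal from the "generator"
        accept = min(1.0, density_ratio(x_new) / density_ratio(x))
        if rng.random() < accept:
            x = x_new
        samples.append(x)
    return samples
```

The resulting chain has mean near 0 and variance near 1, matching the target rather than the proposal's N(0.5, 1.5²). Replacing `optimal_discriminator` with a learned one is where the paper's total-variation bound becomes relevant.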
-
Metropolis-Hastings view on variational inference and adversarial training
Authors:
Kirill Neklyudov,
Evgenii Egorov,
Pavel Shvechikov,
Dmitry Vetrov
Abstract:
A significant part of MCMC methods can be considered as the Metropolis-Hastings (MH) algorithm with different proposal distributions. From this point of view, the problem of constructing a sampler can be reduced to the question of how to choose a proposal for the MH algorithm. To address this question, we propose to learn an independent sampler that maximizes the acceptance rate of the MH algorithm, which, as we demonstrate, is closely related to conventional variational inference. For Bayesian inference, the proposed method compares favorably against alternative methods for sampling from the posterior distribution. Under the same approach, we step beyond the scope of classical MCMC methods and deduce the Generative Adversarial Networks (GANs) framework from scratch, treating the generator as the proposal and the discriminator as the acceptance test. On real-world datasets, we improve Fréchet Inception Distance and Inception Score, using different GANs as a proposal distribution for the MH algorithm. In particular, we demonstrate improvements to the recently proposed BigGAN model on ImageNet.
Submitted 9 June, 2019; v1 submitted 16 October, 2018;
originally announced October 2018.
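For an independent MH sampler at stationarity, the acceptance rate is E_{x∼p, x'∼q}[min(1, w(x')/w(x))] with w = p/q. It reaches 1 exactly when the proposal equals the target. This hypothetical toy estimates that quantity by Monte Carlo for a few Gaussian proposals against a target p = N(1, 1) and picks the best one from a grid. The paper instead learns the sampler by optimizing the acceptance rate, and its connection to variational inference is not reproduced here.

```python
import math
import random


def normal_logpdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


TARGET_MU, TARGET_SIGMA = 1.0, 1.0  # hypothetical toy target p = N(1, 1)


def acceptance_rate(proposal_mu, proposal_sigma=1.0, n=20000, seed=0):
    """Monte Carlo estimate of E_{x~p, x'~q}[min(1, w(x') / w(x))], with w = p / q."""
    rng = random.Random(seed)

    def log_w(y):
        return (normal_logpdf(y, TARGET_MU, TARGET_SIGMA)
                - normal_logpdf(y, proposal_mu, proposal_sigma))

    total = 0.0
    for _ in range(n):
        x = rng.gauss(TARGET_MU, TARGET_SIGMA)           # current state, assumed stationary
        x_prop = rng.gauss(proposal_mu, proposal_sigma)  # independent proposal
        total += math.exp(min(0.0, log_w(x_prop) - log_w(x)))
    return total / n
```

Evaluating `acceptance_rate(m)` for m in [-1.0, 0.0, 0.5, 1.0, 1.5, 3.0] selects m = 1.0, where the proposal coincides with the target and every move is accepted.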
-
Uncertainty Estimation via Stochastic Batch Normalization
Authors:
Andrei Atanov,
Arsenii Ashukha,
Dmitry Molchanov,
Kirill Neklyudov,
Dmitry Vetrov
Abstract:
In this work, we investigate the Batch Normalization technique and propose its probabilistic interpretation. We propose a probabilistic model and show that Batch Normalization maximizes the lower bound of its marginalized log-likelihood. Then, according to the new probabilistic model, we design an algorithm which acts consistently during training and testing. However, inference becomes computationally inefficient. To reduce memory and computational cost, we propose Stochastic Batch Normalization -- an efficient approximation of the proper inference procedure. This method provides us with a scalable uncertainty estimation technique. We demonstrate the performance of Stochastic Batch Normalization on popular architectures (including deep convolutional architectures: VGG-like and ResNets) on the MNIST and CIFAR-10 datasets.
Submitted 20 March, 2018; v1 submitted 13 February, 2018;
originally announced February 2018.
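The core idea of treating batch statistics as random at test time, instead of fixing them to running averages, can be sketched for a single scalar unit. The sketch first collects per-batch means and variances from training data. It then fits a distribution to each and, at test time, normalizes an input with several sampled statistics, so the spread of the outputs reflects batch-statistic uncertainty. The Gaussian-for-the-mean and log-normal-for-the-variance choices, the batch size, and all names are assumptions of this sketch, not a reproduction of the authors' implementation.

```python
import math
import random
import statistics


def fit_batch_stat_distributions(train, batch_size, n_batches, rng):
    """Collect per-batch mean/variance of one unit's pre-activations and fit
    a Gaussian to the means and a log-normal to the variances (assumption)."""
    means, log_vars = [], []
    for _ in range(n_batches):
        batch = rng.sample(train, batch_size)
        means.append(statistics.fmean(batch))
        log_vars.append(math.log(statistics.pvariance(batch)))
    return (statistics.fmean(means), statistics.stdev(means),
            statistics.fmean(log_vars), statistics.stdev(log_vars))


def stochastic_bn(x, params, n_samples, rng, eps=1e-5):
    """Normalize a test input with sampled batch statistics instead of running
    averages; returns one normalized value per sampled (mean, variance) pair."""
    mu_m, sd_m, mu_lv, sd_lv = params
    outputs = []
    for _ in range(n_samples):
        mean = rng.gauss(mu_m, sd_m)
        var = math.exp(rng.gauss(mu_lv, sd_lv))
        outputs.append((x - mean) / math.sqrt(var + eps))
    return outputs
```

In a full network, each sampled set of statistics would define one stochastic forward pass. Averaging predictions over passes gives the predictive distribution, whose spread serves as the uncertainty estimate.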