Skip to main content

Showing 1–25 of 25 results for author: Likhosherstov, V

  1. arXiv:2405.03807  [pdf, other

    cs.RO cs.LG

    UniGen: Unified Modeling of Initial Agent States and Trajectories for Generating Autonomous Driving Scenarios

    Authors: Reza Mahjourian, Rongbing Mu, Valerii Likhosherstov, Paul Mougin, Xiukun Huang, Joao Messias, Shimon Whiteson

    Abstract: This paper introduces UniGen, a novel approach to generating new traffic scenarios for evaluating and improving autonomous driving software through simulation. Our approach models all driving scenario elements in a unified model: the position of new agents, their initial state, and their future motion trajectories. By predicting the distributions of all these variables from a shared global scenari… ▽ More

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: Accepted at ICRA 2024

  2. arXiv:2310.13225  [pdf, other

    cs.LG cs.AI

    Scalable Neural Network Kernels

    Authors: Arijit Sehanobish, Krzysztof Choromanski, Yunfan Zhao, Avinava Dubey, Valerii Likhosherstov

    Abstract: We introduce the concept of scalable neural network kernels (SNNKs), the replacements of regular feedforward layers (FFLs), capable of approximating the latter, but with favorable computational properties. SNNKs effectively disentangle the inputs from the parameters of the neural network in the FFL, only to connect them in the final computation via the dot-product kernel. They are also strictly mo… ▽ More

    Submitted 5 March, 2024; v1 submitted 19 October, 2023; originally announced October 2023.

    Comments: ICLR 2024

  3. arXiv:2302.01925  [pdf, other

    cs.LG

    Learning a Fourier Transform for Linear Relative Positional Encodings in Transformers

    Authors: Krzysztof Marcin Choromanski, Shanda Li, Valerii Likhosherstov, Kumar Avinava Dubey, Shengjie Luo, Di He, Yiming Yang, Tamas Sarlos, Thomas Weingarten, Adrian Weller

    Abstract: We propose a new class of linear Transformers called FourierLearner-Transformers (FLTs), which incorporate a wide range of relative positional encoding mechanisms (RPEs). These include regular RPE techniques applied for sequential data, as well as novel RPEs operating on geometric data embedded in higher-dimensional Euclidean spaces. FLTs construct the optimal RPE mechanism implicitly by learning… ▽ More

    Submitted 3 April, 2024; v1 submitted 3 February, 2023; originally announced February 2023.

    Comments: AISTATS 2024

  4. arXiv:2302.00942  [pdf, other

    cs.LG

    Efficient Graph Field Integrators Meet Point Clouds

    Authors: Krzysztof Choromanski, Arijit Sehanobish, Han Lin, Yunfan Zhao, Eli Berger, Tetiana Parshakova, Alvin Pan, David Watkins, Tianyi Zhang, Valerii Likhosherstov, Somnath Basu Roy Chowdhury, Avinava Dubey, Deepali Jain, Tamas Sarlos, Snigdha Chaturvedi, Adrian Weller

    Abstract: We present two new classes of algorithms for efficient field integration on graphs encoding point clouds. The first class, SeparatorFactorization(SF), leverages the bounded genus of point cloud mesh graphs, while the second class, RFDiffusion(RFD), uses popular epsilon-nearest-neighbor graph representations for point clouds. Both can be viewed as providing the functionality of Fast Multipole Metho… ▽ More

    Submitted 4 October, 2023; v1 submitted 2 February, 2023; originally announced February 2023.

    Journal ref: ICML 2023

  5. arXiv:2302.00787  [pdf, other

    cs.LG stat.ML

    FAVOR#: Sharp Attention Kernel Approximations via New Classes of Positive Random Features

    Authors: Valerii Likhosherstov, Krzysztof Choromanski, Avinava Dubey, Frederick Liu, Tamas Sarlos, Adrian Weller

    Abstract: The problem of efficient approximation of a linear operator induced by the Gaussian or softmax kernel is often addressed using random features (RFs) which yield an unbiased approximation of the operator's result. Such operators emerge in important applications ranging from kernel methods to efficient Transformers. We propose parameterized, positive, non-trigonometric RFs which approximate Gaussian… ▽ More

    Submitted 1 February, 2023; originally announced February 2023.

  6. arXiv:2301.13856  [pdf, other

    stat.ML cs.LG

    Simplex Random Features

    Authors: Isaac Reid, Krzysztof Choromanski, Valerii Likhosherstov, Adrian Weller

    Abstract: We present Simplex Random Features (SimRFs), a new random feature (RF) mechanism for unbiased approximation of the softmax and Gaussian kernels by geometrical correlation of random projection vectors. We prove that SimRFs provide the smallest possible mean square error (MSE) on unbiased estimates of these kernels among the class of weight-independent geometrically-coupled positive random feature (… ▽ More

    Submitted 7 October, 2023; v1 submitted 31 January, 2023; originally announced January 2023.

  7. arXiv:2301.13195  [pdf, other

    cs.LG cs.AI cs.CV

    Adaptive Computation with Elastic Input Sequence

    Authors: Fuzhao Xue, Valerii Likhosherstov, Anurag Arnab, Neil Houlsby, Mostafa Dehghani, Yang You

    Abstract: Humans have the ability to adapt the type of information they use, the procedure they employ, and the amount of time they spend when solving problems. However, most standard neural networks have a fixed function type and computation budget regardless of the sample's nature or difficulty. Adaptivity is a powerful paradigm as it not only imbues practitioners with flexibility pertaining to the downst… ▽ More

    Submitted 3 June, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

  8. arXiv:2211.14312  [pdf

    q-bio.QM cs.CV cs.LG eess.IV

    Karyotype AI for Precision Oncology

    Authors: Zahra Shamsi, Drew Bryant, Jacob Wilson, Xiaoyu Qu, Avinava Dubey, Konik Kothari, Mostafa Dehghani, Mariya Chavarha, Valerii Likhosherstov, Brian Williams, Michael Frumkin, Fred Appelbaum, Krzysztof Choromanski, Ali Bashir, Min Fang

    Abstract: Chromosome analysis is essential for diagnosing genetic disorders. For hematologic malignancies, identification of somatic clonal aberrations by karyotype analysis remains the standard of care. However, karyotyping is costly and time-consuming because of the largely manual process and the expertise required in identifying and annotating aberrations. Efforts to automate karyotype analysis to date f… ▽ More

    Submitted 19 October, 2023; v1 submitted 19 November, 2022; originally announced November 2022.

  9. arXiv:2205.15317  [pdf, other

    cs.LG cs.AI

    Chefs' Random Tables: Non-Trigonometric Random Features

    Authors: Valerii Likhosherstov, Krzysztof Choromanski, Avinava Dubey, Frederick Liu, Tamas Sarlos, Adrian Weller

    Abstract: We introduce chefs' random tables (CRTs), a new class of non-trigonometric random features (RFs) to approximate Gaussian and softmax kernels. CRTs are an alternative to standard random kitchen sink (RKS) methods, which inherently rely on the trigonometric maps. We present variants of CRTs where RFs are positive, a key requirement for applications in recent low-rank Transformers. Further variance r… ▽ More

    Submitted 30 May, 2022; originally announced May 2022.

  10. arXiv:2111.12993  [pdf, other

    cs.CV cs.LG

    PolyViT: Co-training Vision Transformers on Images, Videos and Audio

    Authors: Valerii Likhosherstov, Anurag Arnab, Krzysztof Choromanski, Mario Lucic, Yi Tay, Adrian Weller, Mostafa Dehghani

    Abstract: Can we train a single transformer model capable of processing multiple modalities and datasets, whilst sharing almost all of its learnable parameters? We present PolyViT, a model trained on image, audio and video which answers this question. By co-training different tasks on a single modality, we are able to improve the accuracy of each individual task and achieve state-of-the-art results on 5 sta… ▽ More

    Submitted 25 November, 2021; originally announced November 2021.

  11. arXiv:2110.04367  [pdf, other

    cs.LG stat.ML

    Hybrid Random Features

    Authors: Krzysztof Choromanski, Haoxian Chen, Han Lin, Yuanzhe Ma, Arijit Sehanobish, Deepali Jain, Michael S Ryoo, Jake Varley, Andy Zeng, Valerii Likhosherstov, Dmitry Kalashnikov, Vikas Sindhwani, Adrian Weller

    Abstract: We propose a new class of random feature methods for linearizing softmax and Gaussian kernels called hybrid random features (HRFs) that automatically adapt the quality of kernel estimation to provide most accurate approximation in the defined regions of interest. Special instantiations of HRFs lead to well-known methods such as trigonometric (Rahimi and Recht, 2007) or (recently introduced in the… ▽ More

    Submitted 30 January, 2022; v1 submitted 8 October, 2021; originally announced October 2021.

    Comments: Published as a conference paper at ICLR 2022

  12. arXiv:2107.07999  [pdf, other

    cs.LG cs.AI

    From block-Toeplitz matrices to differential equations on graphs: towards a general theory for scalable masked Transformers

    Authors: Krzysztof Choromanski, Han Lin, Haoxian Chen, Tianyi Zhang, Arijit Sehanobish, Valerii Likhosherstov, Jack Parker-Holder, Tamas Sarlos, Adrian Weller, Thomas Weingarten

    Abstract: In this paper we provide, to the best of our knowledge, the first comprehensive approach for incorporating various masking mechanisms into Transformers architectures in a scalable way. We show that recent results on linear causal attention (Choromanski et al., 2021) and log-linear RPE-attention (Luo et al., 2021) are special cases of this general mechanism. However by casting the problem as a topo… ▽ More

    Submitted 27 March, 2023; v1 submitted 16 July, 2021; originally announced July 2021.

    Comments: 20 pages, 12 figures

  13. arXiv:2106.03764  [pdf, other

    cs.LG

    On the Expressive Power of Self-Attention Matrices

    Authors: Valerii Likhosherstov, Krzysztof Choromanski, Adrian Weller

    Abstract: Transformer networks are able to capture patterns in data coming from many domains (text, images, videos, proteins, etc.) with little or no change to architecture components. We perform a theoretical analysis of the core component responsible for signal propagation between elements, i.e. the self-attention matrix. In practice, this matrix typically exhibits two properties: (1) it is sparse, meanin… ▽ More

    Submitted 8 June, 2021; v1 submitted 7 June, 2021; originally announced June 2021.

  14. arXiv:2106.02487  [pdf, other

    cs.LG

    Debiasing a First-order Heuristic for Approximate Bi-level Optimization

    Authors: Valerii Likhosherstov, Xingyou Song, Krzysztof Choromanski, Jared Davis, Adrian Weller

    Abstract: Approximate bi-level optimization (ABLO) consists of (outer-level) optimization problems, involving numerical (inner-level) optimization loops. While ABLO has many applications across deep learning, it suffers from time and memory complexity proportional to the length $r$ of its inner optimization loop. To address this complexity, an earlier first-order method (FOM) was proposed as a heuristic tha… ▽ More

    Submitted 8 June, 2021; v1 submitted 4 June, 2021; originally announced June 2021.

    Comments: Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021. arXiv admin note: text overlap with arXiv:2006.03631

  15. arXiv:2102.04353  [pdf, other

    cs.LG cs.AI cs.CV cs.RO

    Unlocking Pixels for Reinforcement Learning via Implicit Attention

    Authors: Krzysztof Marcin Choromanski, Deepali Jain, Wenhao Yu, Xingyou Song, Jack Parker-Holder, Tingnan Zhang, Valerii Likhosherstov, Aldo Pacchiano, Anirban Santara, Yunhao Tang, Jie Tan, Adrian Weller

    Abstract: There has recently been significant interest in training reinforcement learning (RL) agents in vision-based environments. This poses many challenges, such as high dimensionality and the potential for observational overfitting through spurious correlations. A promising approach to solve both of these problems is an attention bottleneck, which provides a simple and effective framework for learning h… ▽ More

    Submitted 1 October, 2021; v1 submitted 8 February, 2021; originally announced February 2021.

  16. arXiv:2012.11346  [pdf, other

    cs.LG

    Sub-Linear Memory: How to Make Performers SLiM

    Authors: Valerii Likhosherstov, Krzysztof Choromanski, Jared Davis, Xingyou Song, Adrian Weller

    Abstract: The Transformer architecture has revolutionized deep learning on sequential data, becoming ubiquitous in state-of-the-art solutions for a wide variety of applications. Yet vanilla Transformers are notoriously resource-expensive, requiring $O(L^2)$ in serial time and memory as functions of input length $L$. Recent works proposed various linear self-attention mechanisms, scaling only as $O(L)$ for s… ▽ More

    Submitted 21 December, 2020; originally announced December 2020.

  17. arXiv:2009.14794  [pdf, other

    cs.LG cs.CL stat.ML

    Rethinking Attention with Performers

    Authors: Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy Colwell, Adrian Weller

    Abstract: We introduce Performers, Transformer architectures which can estimate regular (softmax) full-rank-attention Transformers with provable accuracy, but using only linear (as opposed to quadratic) space and time complexity, without relying on any priors such as sparsity or low-rankness. To approximate softmax attention-kernels, Performers use a novel Fast Attention Via positive Orthogonal Random featu… ▽ More

    Submitted 19 November, 2022; v1 submitted 30 September, 2020; originally announced September 2020.

    Comments: Published as a conference paper + oral presentation at ICLR 2021. 38 pages. See https://github.com/google-research/google-research/tree/master/protein_lm for protein language model code, and https://github.com/google-research/google-research/tree/master/performer for Performer code. See https://ai.googleblog.com/2020/10/rethinking-attention-with-performers.html for Google AI Blog

  18. arXiv:2006.11421  [pdf, other

    cs.LG math.CA math.DS math.OC stat.ML

    An Ode to an ODE

    Authors: Krzysztof Choromanski, Jared Quincy Davis, Valerii Likhosherstov, Xingyou Song, Jean-Jacques Slotine, Jacob Varley, Honglak Lee, Adrian Weller, Vikas Sindhwani

    Abstract: We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where time-dependent parameters of the main flow evolve according to a matrix flow on the orthogonal group O(d). This nested system of two flows, where the parameter-flow is constrained to lie on the compact manifold, provides stability and effectiveness of training and provably solves the gradient vanishing-explosion problem wh… ▽ More

    Submitted 22 June, 2020; v1 submitted 19 June, 2020; originally announced June 2020.

    Comments: 20 pages, 9 figures

  19. arXiv:2006.03631  [pdf, other

    cs.LG math.OC stat.ML

    UFO-BLO: Unbiased First-Order Bilevel Optimization

    Authors: Valerii Likhosherstov, Xingyou Song, Krzysztof Choromanski, Jared Davis, Adrian Weller

    Abstract: Bilevel optimization (BLO) is a popular approach with many applications including hyperparameter optimization, neural architecture search, adversarial robustness and model-agnostic meta-learning. However, the approach suffers from time and memory complexity proportional to the length $r$ of its inner optimization loop, which has led to several modifications being proposed. One such modification is… ▽ More

    Submitted 7 June, 2021; v1 submitted 5 June, 2020; originally announced June 2020.

  20. arXiv:2006.03555  [pdf, other

    cs.LG cs.CL stat.ML

    Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers

    Authors: Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, David Belanger, Lucy Colwell, Adrian Weller

    Abstract: Transformer models have achieved state-of-the-art results across a diverse range of domains. However, concern over the cost of training the attention mechanism to learn complex dependencies between distant inputs continues to grow. In response, solutions that exploit the structure and sparsity of the learned attention matrix have blossomed. However, real-world applications that involve long sequen… ▽ More

    Submitted 30 September, 2020; v1 submitted 5 June, 2020; originally announced June 2020.

    Comments: This arXiv submission has been deprecated. Please see "Rethinking Attention with Performers" at arXiv:2009.14794 for the most updated version of the paper

  21. arXiv:2004.08675  [pdf, other

    cs.LG stat.ML

    CWY Parametrization: a Solution for Parallelized Optimization of Orthogonal and Stiefel Matrices

    Authors: Valerii Likhosherstov, Jared Davis, Krzysztof Choromanski, Adrian Weller

    Abstract: We introduce an efficient approach for optimization over orthogonal groups on highly parallel computation units such as GPUs or TPUs. As in earlier work, we parametrize an orthogonal matrix as a product of Householder reflections. However, to overcome low parallelization capabilities of computing Householder reflections sequentially, we propose employing an accumulation scheme called the compact W… ▽ More

    Submitted 16 February, 2021; v1 submitted 18 April, 2020; originally announced April 2020.

    Comments: Proceedings of the 24th International Conference on Artificial Intelligence and Statistics (AISTATS) 2021, San Diego, California, USA. PMLR: Volume 130. Copyright 2021 by the author(s)

  22. arXiv:2003.13563  [pdf, other

    cs.LG stat.ML

    Stochastic Flows and Geometric Optimization on the Orthogonal Group

    Authors: Krzysztof Choromanski, David Cheikhi, Jared Davis, Valerii Likhosherstov, Achille Nazaret, Achraf Bahamou, Xingyou Song, Mrugank Akarte, Jack Parker-Holder, Jacob Bergquist, Yuan Gao, Aldo Pacchiano, Tamas Sarlos, Adrian Weller, Vikas Sindhwani

    Abstract: We present a new class of stochastic, geometrically-driven optimization algorithms on the orthogonal group $O(d)$ and naturally reductive homogeneous manifolds obtained from the action of the rotation group $SO(d)$. We theoretically and experimentally demonstrate that our methods can be applied in various fields of machine learning including deep, convolutional and recurrent neural networks, reinf… ▽ More

    Submitted 30 March, 2020; originally announced March 2020.

  23. arXiv:1910.11142  [pdf, other

    cs.DS math.ST physics.data-an stat.ML

    Tractable Minor-free Generalization of Planar Zero-field Ising Models

    Authors: Valerii Likhosherstov, Yury Maximov, Michael Chertkov

    Abstract: We present a new family of zero-field Ising models over $N$ binary variables/spins obtained by consecutive "gluing" of planar and $O(1)$-sized components and subsets of at most three vertices into a tree. The polynomial-time algorithm of the dynamic programming type for solving exact inference (computing partition function) and exact sampling (generating i.i.d. samples) consists in a sequential ap… ▽ More

    Submitted 22 October, 2019; originally announced October 2019.

    Comments: 32 pages. arXiv admin note: substantial text overlap with arXiv:1906.06431, arXiv:1812.09587

    Journal ref: Journal of Statistical Mechanics: Theory and Experiment, 2020

  24. arXiv:1906.06431  [pdf, other

    cs.DS cond-mat.stat-mech physics.data-an stat.CO

    A New Family of Tractable Ising Models

    Authors: Valerii Likhosherstov, Yury Maximov, Michael Chertkov

    Abstract: We present a new family of zero-field Ising models over N binary variables/spins obtained by consecutive "gluing" of planar and $O(1)$-sized components along with subsets of at most three vertices into a tree. The polynomial time algorithm of the dynamic programming type for solving exact inference (partition function computation) and sampling consists of a sequential application of an efficient (… ▽ More

    Submitted 14 June, 2019; originally announced June 2019.

  25. arXiv:1812.09587  [pdf, other

    stat.CO cond-mat.stat-mech cs.LG stat.ML

    Inference and Sampling of $K_{33}$-free Ising Models

    Authors: Valerii Likhosherstov, Yury Maximov, Michael Chertkov

    Abstract: We call an Ising model tractable when it is possible to compute its partition function value (statistical inference) in polynomial time. The tractability also implies an ability to sample configurations of this model in polynomial time. The notion of tractability extends the basic case of planar zero-field Ising models. Our starting point is to describe algorithms for the basic case computing part… ▽ More

    Submitted 21 May, 2019; v1 submitted 22 December, 2018; originally announced December 2018.

    Comments: 20 pages

    Journal ref: 36-th International Conference on Machine Learning, PMLR 97, 2019