Showing 1–10 of 10 results for author: Razin, N

  1. arXiv:2402.07875  [pdf, other]

    cs.LG cs.AI eess.SY stat.ML

    Implicit Bias of Policy Gradient in Linear Quadratic Control: Extrapolation to Unseen Initial States

    Authors: Noam Razin, Yotam Alexander, Edo Cohen-Karlik, Raja Giryes, Amir Globerson, Nadav Cohen

    Abstract: In modern machine learning, models can often fit training data in numerous ways, some of which perform well on unseen (test) data, while others do not. Remarkably, in such cases gradient descent frequently exhibits an implicit bias that leads to excellent performance on unseen data. This implicit bias was extensively studied in supervised learning, but is far less understood in optimal control (re…

    Submitted 1 June, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: Accepted to ICML 2024

  2. arXiv:2310.20703  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    Vanishing Gradients in Reinforcement Finetuning of Language Models

    Authors: Noam Razin, Hattie Zhou, Omid Saremi, Vimal Thilak, Arwen Bradley, Preetum Nakkiran, Joshua Susskind, Etai Littwin

    Abstract: Pretrained language models are commonly aligned with human preferences and downstream tasks via reinforcement finetuning (RFT), which refers to maximizing a (possibly learned) reward function using policy gradient algorithms. This work identifies a fundamental optimization obstacle in RFT: we prove that the expected gradient for an input vanishes when its reward standard deviation under the model…

    Submitted 14 March, 2024; v1 submitted 31 October, 2023; originally announced October 2023.

    Comments: Accepted to ICLR 2024

  3. arXiv:2310.16028  [pdf, other]

    cs.LG cs.AI cs.CL stat.ML

    What Algorithms can Transformers Learn? A Study in Length Generalization

    Authors: Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, Samy Bengio, Preetum Nakkiran

    Abstract: Large language models exhibit surprising emergent generalization properties, yet also struggle on many simple reasoning tasks such as arithmetic and parity. This raises the question of if and when Transformer models can learn the true algorithm for solving a task. We study the scope of Transformers' abilities in the specific setting of length generalization on algorithmic tasks. Here, we propose a…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Preprint

  4. arXiv:2303.11249  [pdf, other]

    cs.LG cs.AI quant-ph

    What Makes Data Suitable for a Locally Connected Neural Network? A Necessary and Sufficient Condition Based on Quantum Entanglement

    Authors: Yotam Alexander, Nimrod De La Vega, Noam Razin, Nadav Cohen

    Abstract: The question of what makes a data distribution suitable for deep learning is a fundamental open problem. Focusing on locally connected neural networks (a prevalent family of architectures that includes convolutional and recurrent neural networks as well as local self-attention models), we address this problem by adopting theoretical tools from quantum physics. Our main theoretical result states th…

    Submitted 21 January, 2024; v1 submitted 20 March, 2023; originally announced March 2023.

    Comments: Accepted to NeurIPS 2023

  5. arXiv:2211.16494  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    On the Ability of Graph Neural Networks to Model Interactions Between Vertices

    Authors: Noam Razin, Tom Verbin, Nadav Cohen

    Abstract: Graph neural networks (GNNs) are widely used for modeling complex interactions between entities represented as vertices of a graph. Despite recent efforts to theoretically analyze the expressive power of GNNs, a formal characterization of their ability to model interactions is lacking. The current paper aims to address this gap. Formalizing strength of interactions through an established measure k…

    Submitted 23 October, 2023; v1 submitted 29 November, 2022; originally announced November 2022.

    Comments: Accepted to NeurIPS 2023

  6. arXiv:2201.11729  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks

    Authors: Noam Razin, Asaf Maman, Nadav Cohen

    Abstract: In the pursuit of explaining implicit regularization in deep learning, prominent focus was given to matrix and tensor factorizations, which correspond to simplified neural networks. It was shown that these models exhibit an implicit tendency towards low matrix and tensor ranks, respectively. Drawing closer to practical deep learning, the current paper theoretically analyzes the implicit regulariza…

    Submitted 18 September, 2022; v1 submitted 27 January, 2022; originally announced January 2022.

    Comments: Accepted to ICML 2022

  7. arXiv:2102.09972  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    Implicit Regularization in Tensor Factorization

    Authors: Noam Razin, Asaf Maman, Nadav Cohen

    Abstract: Recent efforts to unravel the mystery of implicit regularization in deep learning have led to a theoretical focus on matrix factorization -- matrix completion via linear neural network. As a step further towards practical deep learning, we provide the first theoretical analysis of implicit regularization in tensor factorization -- tensor completion via certain type of non-linear neural network. We…

    Submitted 9 June, 2021; v1 submitted 19 February, 2021; originally announced February 2021.

    Comments: Accepted to ICML 2021

  8. arXiv:2009.13292  [pdf, other]

    cs.IR cs.CL cs.LG stat.ML

    RecoBERT: A Catalog Language Model for Text-Based Recommendations

    Authors: Itzik Malkiel, Oren Barkan, Avi Caciularu, Noam Razin, Ori Katz, Noam Koenigstein

    Abstract: Language models that utilize extensive self-supervised pre-training from unlabeled text, have recently shown to significantly advance the state-of-the-art performance in a variety of language understanding tasks. However, it is yet unclear if and how these recent models can be harnessed for conducting text-based recommendations. In this work, we introduce RecoBERT, a BERT-based approach for learni…

    Submitted 25 September, 2020; originally announced September 2020.

  9. arXiv:2005.06398  [pdf, other]

    cs.LG cs.NE stat.ML

    Implicit Regularization in Deep Learning May Not Be Explainable by Norms

    Authors: Noam Razin, Nadav Cohen

    Abstract: Mathematically characterizing the implicit regularization induced by gradient-based optimization is a longstanding pursuit in the theory of deep learning. A widespread hope is that a characterization based on minimization of norms may apply, and a standard test-bed for studying this prospect is matrix factorization (matrix completion via linear neural networks). It is an open question whether norm…

    Submitted 17 October, 2020; v1 submitted 13 May, 2020; originally announced May 2020.

  10. arXiv:1908.05161  [pdf]

    cs.LG cs.CL stat.ML

    Scalable Attentive Sentence-Pair Modeling via Distilled Sentence Embedding

    Authors: Oren Barkan, Noam Razin, Itzik Malkiel, Ori Katz, Avi Caciularu, Noam Koenigstein

    Abstract: Recent state-of-the-art natural language understanding models, such as BERT and XLNet, score a pair of sentences (A and B) using multiple cross-attention operations - a process in which each word in sentence A attends to all words in sentence B and vice versa. As a result, computing the similarity between a query sentence and a set of candidate sentences, requires the propagation of all query-cand…

    Submitted 21 November, 2019; v1 submitted 14 August, 2019; originally announced August 2019.

    Comments: In Proceedings of AAAI 2020