Showing 1–12 of 12 results for author: Zimerman, I

  1. arXiv:2406.14528  [pdf, other]

    cs.LG cs.AI

    DeciMamba: Exploring the Length Extrapolation Potential of Mamba

    Authors: Assaf Ben-Kish, Itamar Zimerman, Shady Abu-Hussein, Nadav Cohen, Amir Globerson, Lior Wolf, Raja Giryes

    Abstract: Long-range sequence processing poses a significant challenge for Transformers due to their quadratic complexity in input length. A promising alternative is Mamba, which demonstrates high performance and achieves Transformer-level capabilities while requiring substantially fewer computational resources. In this paper we explore the length-generalization capabilities of Mamba, which we find to be re…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: Link To Official Implementation: https://github.com/assafbk/DeciMamba

  2. arXiv:2405.16504  [pdf, other]

    cs.LG

    A Unified Implicit Attention Formulation for Gated-Linear Recurrent Sequence Models

    Authors: Itamar Zimerman, Ameen Ali, Lior Wolf

    Abstract: Recent advances in efficient sequence modeling have led to attention-free layers, such as Mamba, RWKV, and various gated RNNs, all featuring sub-quadratic complexity in sequence length and excellent scaling properties, enabling the construction of a new type of foundation models. In this paper, we present a unified view of these models, formulating such layers as implicit causal self-attention lay…

    Submitted 26 May, 2024; originally announced May 2024.

    ACM Class: F.2.2; I.2.7

  3. arXiv:2403.01590  [pdf, other]

    cs.LG

    The Hidden Attention of Mamba Models

    Authors: Ameen Ali, Itamar Zimerman, Lior Wolf

    Abstract: The Mamba layer offers an efficient selective state space model (SSM) that is highly effective in modeling multiple domains, including NLP, long-range sequence processing, and computer vision. Selective SSMs are viewed as dual models, in which one trains in parallel on the entire sequence via an IO-aware parallel scan, and deploys in an autoregressive manner. We add a third view and show that such…

    Submitted 31 March, 2024; v1 submitted 3 March, 2024; originally announced March 2024.

    MSC Class: F.2.2; I.2.7 ACM Class: F.2.2; I.2.7

  4. arXiv:2311.16620  [pdf, other]

    cs.LG cs.CL

    On the Long Range Abilities of Transformers

    Authors: Itamar Zimerman, Lior Wolf

    Abstract: Despite their dominance in modern DL and, especially, NLP domains, transformer architectures exhibit sub-optimal performance on long-range tasks compared to recent layers that are specifically designed for this purpose. In this work, drawing inspiration from key attributes of long-range layers, such as state-space layers, linear RNN layers, and global convolution layers, we demonstrate that minima…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: 18 pages

    ACM Class: F.2.2; I.2.7

  5. arXiv:2311.08610  [pdf, other]

    cs.LG cs.CR

    Converting Transformers to Polynomial Form for Secure Inference Over Homomorphic Encryption

    Authors: Itamar Zimerman, Moran Baruch, Nir Drucker, Gilad Ezov, Omri Soceanu, Lior Wolf

    Abstract: Designing privacy-preserving deep learning models is a major challenge within the deep learning community. Homomorphic Encryption (HE) has emerged as one of the most promising approaches in this realm, enabling the decoupling of knowledge between the model owner and the data owner. Despite extensive research and application of this technology, primarily in convolutional neural networks, incorporat…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: 6 figures

    ACM Class: F.2.2; I.2.7

  6. arXiv:2309.13600  [pdf, other]

    cs.CV cs.LG

    Multi-Dimensional Hyena for Spatial Inductive Bias

    Authors: Itamar Zimerman, Lior Wolf

    Abstract: In recent years, Vision Transformers have attracted increasing interest from computer vision researchers. However, the advantage of these transformers over CNNs is only fully manifested when trained over a large dataset, mainly due to the reduced inductive bias towards spatial locality within the transformer's self-attention mechanism. In this work, we present a data-efficient vision transformer t…

    Submitted 24 September, 2023; originally announced September 2023.

    Comments: 10 pages, 3 figures

    ACM Class: F.2.2; I.2.7

  7. arXiv:2306.06736  [pdf, other]

    cs.CR cs.LG

    Efficient Skip Connections Realization for Secure Inference on Encrypted Data

    Authors: Nir Drucker, Itamar Zimerman

    Abstract: Homomorphic Encryption (HE) is a cryptographic tool that allows performing computation under encryption, which is used by many privacy-preserving machine learning solutions, for example, to perform secure classification. Modern deep learning applications yield good performance for example in image processing tasks benchmarks by including many skip connections. The latter appears to be very costly…

    Submitted 11 June, 2023; originally announced June 2023.

  8. arXiv:2306.06635  [pdf, other]

    cs.CV cs.LG

    2-D SSM: A General Spatial Layer for Visual Transformers

    Authors: Ethan Baron, Itamar Zimerman, Lior Wolf

    Abstract: A central objective in computer vision is to design models with appropriate 2-D inductive bias. Desiderata for 2D inductive bias include two-dimensional position awareness, dynamic spatial locality, and translation and permutation invariance. To address these goals, we leverage an expressive variation of the multidimensional State Space Model (SSM). Our approach introduces efficient parameterizati…

    Submitted 11 June, 2023; originally announced June 2023.

    Comments: 16 pages, 5 figures

    MSC Class: F.2.2; I.2.7

  9. arXiv:2306.05167  [pdf, other]

    cs.LG

    Decision S4: Efficient Sequence-Based RL via State Spaces Layers

    Authors: Shmuel Bar-David, Itamar Zimerman, Eliya Nachmani, Lior Wolf

    Abstract: Recently, sequence learning methods have been applied to the problem of off-policy Reinforcement Learning, including the seminal work on Decision Transformers, which employs transformers for this task. Since transformers are parameter-heavy, cannot benefit from history longer than a fixed window size, and are not computed using recurrence, we set out to investigate the suitability of the S4 family…

    Submitted 8 June, 2023; originally announced June 2023.

    Comments: 21 pages, 13 figures

    MSC Class: 14J60 ACM Class: F.2.2; I.2.7

  10. arXiv:2305.14952  [pdf, other]

    cs.LG eess.SP

    Focus Your Attention (with Adaptive IIR Filters)

    Authors: Shahar Lutati, Itamar Zimerman, Lior Wolf

    Abstract: We present a new layer in which dynamic (i.e., input-dependent) Infinite Impulse Response (IIR) filters of order two are used to process the input sequence prior to applying conventional attention. The input is split into chunks, and the coefficients of these filters are determined based on previous chunks to maintain causality. Despite their relatively low order, the causal adaptive filters are sh…

    Submitted 18 October, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: Accepted to EMNLP 2023

    ACM Class: F.2.2; I.2.7

  11. arXiv:2304.14836  [pdf, other]

    cs.LG cs.AI cs.CR

    Training Large Scale Polynomial CNNs for E2E Inference over Homomorphic Encryption

    Authors: Moran Baruch, Nir Drucker, Gilad Ezov, Yoav Goldberg, Eyal Kushnir, Jenny Lerner, Omri Soceanu, Itamar Zimerman

    Abstract: Training large-scale CNNs that during inference can be run under Homomorphic Encryption (HE) is challenging due to the need to use only polynomial operations. This limits HE-based solutions adoption. We address this challenge and pioneer in providing a novel training method for large polynomial CNNs such as ResNet-152 and ConvNeXt models, and achieve promising accuracy on encrypted samples on larg…

    Submitted 11 June, 2023; v1 submitted 26 April, 2023; originally announced April 2023.

  12. arXiv:2106.04876  [pdf, other]

    cs.CR cs.IT cs.LG

    Recovering AES Keys with a Deep Cold Boot Attack

    Authors: Itamar Zimerman, Eliya Nachmani, Lior Wolf

    Abstract: Cold boot attacks inspect the corrupted random access memory soon after the power has been shut down. While most of the bits have been corrupted, many bits, at random locations, have not. Since the keys in many encryption schemes are being expanded in memory into longer keys with fixed redundancies, the keys can often be restored. In this work, we combine a novel cryptographic variant of a deep er…

    Submitted 9 June, 2021; originally announced June 2021.

    Comments: Accepted to ICML 2021