
Showing 1–6 of 6 results for author: Skliar, A

  1. arXiv:2402.16844  [pdf, other]

    cs.LG cs.AI cs.CL

    Think Big, Generate Quick: LLM-to-SLM for Fast Autoregressive Decoding

    Authors: Benjamin Bergner, Andrii Skliar, Amelie Royer, Tijmen Blankevoort, Yuki Asano, Babak Ehteshami Bejnordi

    Abstract: Large language models (LLMs) have become ubiquitous in practice and are widely used for generation tasks such as translation, summarization and instruction following. However, their enormous size and reliance on autoregressive decoding increase deployment costs and complicate their use in latency-critical applications. In this work, we propose a hybrid approach that combines language models of dif…

    Submitted 16 July, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Work presented at the ES-FoMo II Workshop at ICML 2024

  2. arXiv:2308.15639  [pdf]

    cs.LG cs.AI

    Hyperbolic Convolutional Neural Networks

    Authors: Andrii Skliar, Maurice Weiler

    Abstract: Deep Learning is mostly responsible for the surge of interest in Artificial Intelligence in the last decade. So far, deep learning researchers have been particularly successful in the domain of image processing, where Convolutional Neural Networks are used. Although excelling at image classification, Convolutional Neural Networks are quite naive in that no inductive bias is set on the embedding sp…

    Submitted 29 August, 2023; originally announced August 2023.

  3. arXiv:2304.05497  [pdf, other]

    cs.CV cs.LG

    Revisiting Single-gated Mixtures of Experts

    Authors: Amelie Royer, Ilia Karmanov, Andrii Skliar, Babak Ehteshami Bejnordi, Tijmen Blankevoort

    Abstract: Mixtures of Experts (MoE) are rising in popularity as a means to train extremely large-scale models while allowing for a reasonable computational cost at inference time. Recent state-of-the-art approaches usually assume a large number of experts and require training all experts jointly, which often leads to training instabilities such as the router collapsing. In contrast, in this work, we propose to…

    Submitted 11 April, 2023; originally announced April 2023.

    Comments: BMVC 2022

  4. arXiv:2206.08236  [pdf, other]

    cs.CV cs.LG eess.IV

    Simple and Efficient Architectures for Semantic Segmentation

    Authors: Dushyant Mehta, Andrii Skliar, Haitam Ben Yahia, Shubhankar Borse, Fatih Porikli, Amirhossein Habibian, Tijmen Blankevoort

    Abstract: Though the state-of-the-art architectures for semantic segmentation, such as HRNet, demonstrate impressive accuracy, the complexity arising from their salient design choices hinders a range of model acceleration tools, and further they make use of operations that are inefficient on current hardware. This paper demonstrates that a simple encoder-decoder architecture with a ResNet-like backbone and a sm…

    Submitted 16 June, 2022; originally announced June 2022.

    Comments: To be presented at Efficient Deep Learning for Computer Vision Workshop at CVPR 2022

  5. arXiv:2202.01290  [pdf, other]

    cs.LG cs.CV

    Cyclical Pruning for Sparse Neural Networks

    Authors: Suraj Srinivas, Andrey Kuzmin, Markus Nagel, Mart van Baalen, Andrii Skliar, Tijmen Blankevoort

    Abstract: Current methods for pruning neural network weights iteratively apply magnitude-based pruning on the model weights and re-train the resulting model to recover lost accuracy. In this work, we show that such strategies do not allow for the recovery of erroneously pruned weights. To enable weight recovery, we propose a simple strategy called cyclical pruning which requires the pruning schedul…

    Submitted 2 February, 2022; originally announced February 2022.

  6. arXiv:2012.08859  [pdf, other]

    cs.LG cs.AI cs.CV cs.NE stat.ML

    Distilling Optimal Neural Networks: Rapid Search in Diverse Spaces

    Authors: Bert Moons, Parham Noorzad, Andrii Skliar, Giovanni Mariani, Dushyant Mehta, Chris Lott, Tijmen Blankevoort

    Abstract: Current state-of-the-art Neural Architecture Search (NAS) methods neither efficiently scale to multiple hardware platforms, nor handle diverse architectural search-spaces. To remedy this, we present DONNA (Distilling Optimal Neural Network Architectures), a novel pipeline for rapid, scalable and diverse NAS, that scales to many user scenarios. DONNA consists of three phases. First, an accuracy pre…

    Submitted 27 August, 2021; v1 submitted 16 December, 2020; originally announced December 2020.

    Comments: Accepted at ICCV 2021. Main text 9 pages, full text 21 pages, 18 figures