Skip to main content

Showing 1–49 of 49 results for author: Lučić, M

  1. arXiv:2403.05530  [pdf, other

    cs.CL cs.AI

    Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

    Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love , et al. (1092 additional authors not shown)

    Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February… ▽ More

    Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

  2. arXiv:2312.11805  [pdf, other

    cs.CL cs.AI cs.CV

    Gemini: A Family of Highly Capable Multimodal Models

    Authors: Gemini Team, Rohan Anil, Sebastian Borgeaud, Jean-Baptiste Alayrac, Jiahui Yu, Radu Soricut, Johan Schalkwyk, Andrew M. Dai, Anja Hauth, Katie Millican, David Silver, Melvin Johnson, Ioannis Antonoglou, Julian Schrittwieser, Amelia Glaese, Jilin Chen, Emily Pitler, Timothy Lillicrap, Angeliki Lazaridou, Orhan Firat, James Molloy, Michael Isard, Paul R. Barham, Tom Hennigan, Benjamin Lee , et al. (1325 additional authors not shown)

    Abstract: This report introduces a new family of multimodal models, Gemini, that exhibit remarkable capabilities across image, audio, video, and text understanding. The Gemini family consists of Ultra, Pro, and Nano sizes, suitable for applications ranging from complex reasoning tasks to on-device memory-constrained use-cases. Evaluation on a broad range of benchmarks shows that our most-capable Gemini Ultr… ▽ More

    Submitted 17 June, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

  3. arXiv:2312.07541  [pdf, other

    cs.CV cs.GR

    SMERF: Streamable Memory Efficient Radiance Fields for Real-Time Large-Scene Exploration

    Authors: Daniel Duckworth, Peter Hedman, Christian Reiser, Peter Zhizhin, Jean-François Thibert, Mario Lučić, Richard Szeliski, Jonathan T. Barron

    Abstract: Recent techniques for real-time view synthesis have rapidly advanced in fidelity and speed, and modern methods are capable of rendering near-photorealistic scenes at interactive frame rates. At the same time, a tension has arisen between explicit scene representations amenable to rasterization and neural fields built on ray marching, with state-of-the-art instances of the latter surpassing the for… ▽ More

    Submitted 2 July, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: Camera Ready. Project website: https://smerf-3d.github.io

  4. arXiv:2308.11093  [pdf, other

    cs.CV cs.AI cs.LG

    Video OWL-ViT: Temporally-consistent open-world localization in video

    Authors: Georg Heigold, Matthias Minderer, Alexey Gritsenko, Alex Bewley, Daniel Keysers, Mario Lučić, Fisher Yu, Thomas Kipf

    Abstract: We present an architecture and a training recipe that adapts pre-trained open-world image models to localization in videos. Understanding the open visual world (without being constrained by fixed label spaces) is crucial for many real-world vision tasks. Contrastive pre-training on large image-text datasets has recently led to significant improvements for image-level tasks. For more structured tas… ▽ More

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: ICCV 2023

  5. arXiv:2307.06304  [pdf, other

    cs.CV cs.AI cs.LG

    Patch n' Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution

    Authors: Mostafa Dehghani, Basil Mustafa, Josip Djolonga, Jonathan Heek, Matthias Minderer, Mathilde Caron, Andreas Steiner, Joan Puigcerver, Robert Geirhos, Ibrahim Alabdulmohsin, Avital Oliver, Piotr Padlewski, Alexey Gritsenko, Mario Lučić, Neil Houlsby

    Abstract: The ubiquitous and demonstrably suboptimal choice of resizing images to a fixed resolution before processing them with computer vision models has not yet been successfully challenged. However, models such as the Vision Transformer (ViT) offer flexible sequence-based modeling, and hence varying input sequence lengths. We take advantage of this with NaViT (Native Resolution ViT) which uses sequence… ▽ More

    Submitted 12 July, 2023; originally announced July 2023.

  6. arXiv:2305.18565  [pdf, other

    cs.CV cs.CL cs.LG

    PaLI-X: On Scaling up a Multilingual Vision and Language Model

    Authors: Xi Chen, Josip Djolonga, Piotr Padlewski, Basil Mustafa, Soravit Changpinyo, Jialin Wu, Carlos Riquelme Ruiz, Sebastian Goodman, Xiao Wang, Yi Tay, Siamak Shakeri, Mostafa Dehghani, Daniel Salz, Mario Lucic, Michael Tschannen, Arsha Nagrani, Hexiang Hu, Mandar Joshi, Bo Pang, Ceslee Montgomery, Paulina Pietrzyk, Marvin Ritter, AJ Piergiovanni, Matthias Minderer, Filip Pavetic , et al. (18 additional authors not shown)

    Abstract: We present the training recipe and results of scaling up PaLI-X, a multilingual vision and language model, both in terms of size of the components and the breadth of its training task mixture. Our model achieves new levels of performance on a wide-range of varied and complex tasks, including multiple image-based captioning and question-answering tasks, image-based document understanding and few-sh… ▽ More

    Submitted 29 May, 2023; originally announced May 2023.

  7. arXiv:2304.12160  [pdf, other

    cs.CV

    End-to-End Spatio-Temporal Action Localisation with Video Transformers

    Authors: Alexey Gritsenko, Xuehan Xiong, Josip Djolonga, Mostafa Dehghani, Chen Sun, Mario Lučić, Cordelia Schmid, Anurag Arnab

    Abstract: The most performant spatio-temporal action localisation models use external person proposals and complex external memory banks. We propose a fully end-to-end, purely-transformer based model that directly ingests an input video, and outputs tubelets -- a sequence of bounding boxes and the action classes at each frame. Our flexible model can be trained with either sparse bounding-box supervision on… ▽ More

    Submitted 24 April, 2023; originally announced April 2023.

  8. arXiv:2302.05442  [pdf, other

    cs.CV cs.AI cs.LG

    Scaling Vision Transformers to 22 Billion Parameters

    Authors: Mostafa Dehghani, Josip Djolonga, Basil Mustafa, Piotr Padlewski, Jonathan Heek, Justin Gilmer, Andreas Steiner, Mathilde Caron, Robert Geirhos, Ibrahim Alabdulmohsin, Rodolphe Jenatton, Lucas Beyer, Michael Tschannen, Anurag Arnab, Xiao Wang, Carlos Riquelme, Matthias Minderer, Joan Puigcerver, Utku Evci, Manoj Kumar, Sjoerd van Steenkiste, Gamaleldin F. Elsayed, Aravindh Mahendran, Fisher Yu, Avital Oliver , et al. (17 additional authors not shown)

    Abstract: The scaling of Transformers has driven breakthrough capabilities for language models. At present, the largest large language models (LLMs) contain upwards of 100B parameters. Vision Transformers (ViT) have introduced the same architecture to image and video modelling, but these have not yet been successfully scaled to nearly the same degree; the largest dense ViT contains 4B parameters (Chen et al… ▽ More

    Submitted 10 February, 2023; originally announced February 2023.

  9. arXiv:2212.05922  [pdf, other

    cs.CV cs.SD

    Audiovisual Masked Autoencoders

    Authors: Mariana-Iuliana Georgescu, Eduardo Fonseca, Radu Tudor Ionescu, Mario Lucic, Cordelia Schmid, Anurag Arnab

    Abstract: Can we leverage the audiovisual information already present in video to improve self-supervised representation learning? To answer this question, we study various pretraining architectures and objectives within the masked autoencoding framework, motivated by the success of similar methods in natural language and image understanding. We show that we can achieve significant improvements on audiovisu… ▽ More

    Submitted 4 January, 2024; v1 submitted 9 December, 2022; originally announced December 2022.

    Comments: ICCV 2023

  10. arXiv:2211.14306  [pdf, other

    cs.CV cs.GR cs.LG eess.IV

    RUST: Latent Neural Scene Representations from Unposed Imagery

    Authors: Mehdi S. M. Sajjadi, Aravindh Mahendran, Thomas Kipf, Etienne Pot, Daniel Duckworth, Mario Lucic, Klaus Greff

    Abstract: Inferring the structure of 3D scenes from 2D observations is a fundamental challenge in computer vision. Recently popularized approaches based on neural scene representations have achieved tremendous impact and have been applied across a variety of applications. One of the major remaining challenges in this space is training a single model which can provide latent representations which effectively… ▽ More

    Submitted 24 March, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

    Comments: CVPR 2023 Highlight. Project website: https://rust-paper.github.io/

  11. arXiv:2207.03807  [pdf, other

    cs.CV

    Beyond Transfer Learning: Co-finetuning for Action Localisation

    Authors: Anurag Arnab, Xuehan Xiong, Alexey Gritsenko, Rob Romijnders, Josip Djolonga, Mostafa Dehghani, Chen Sun, Mario Lučić, Cordelia Schmid

    Abstract: Transfer learning is the predominant paradigm for training deep networks on small target datasets. Models are typically pretrained on large ``upstream'' datasets for classification, as such labels are easy to collect, and then finetuned on ``downstream'' tasks such as action localisation, which are smaller due to their finer-grained annotations. In this paper, we question this approach, and propos… ▽ More

    Submitted 8 July, 2022; originally announced July 2022.

  12. arXiv:2206.07307  [pdf, other

    cs.CV cs.LG eess.IV

    VCT: A Video Compression Transformer

    Authors: Fabian Mentzer, George Toderici, David Minnen, Sung-Jin Hwang, Sergi Caelles, Mario Lucic, Eirikur Agustsson

    Abstract: We show how transformers can be used to vastly simplify neural video compression. Previous methods have been relying on an increasing number of architectural biases and priors, including motion prediction and warping operations, resulting in complex models. Instead, we independently map input frames to representations and use a transformer to model their dependencies, letting it predict the distri… ▽ More

    Submitted 12 October, 2022; v1 submitted 15 June, 2022; originally announced June 2022.

    Comments: NeurIPS'22 Camera Ready Version. Code: https://goo.gle/vct-paper

  13. arXiv:2206.06922  [pdf, other

    cs.CV cs.AI cs.LG

    Object Scene Representation Transformer

    Authors: Mehdi S. M. Sajjadi, Daniel Duckworth, Aravindh Mahendran, Sjoerd van Steenkiste, Filip Pavetić, Mario Lučić, Leonidas J. Guibas, Klaus Greff, Thomas Kipf

    Abstract: A compositional understanding of the world in terms of objects and their geometry in 3D space is considered a cornerstone of human cognition. Facilitating the learning of such a representation in neural networks holds promise for substantially improving labeled data efficiency. As a key step in this direction, we make progress on the problem of learning 3D-consistent decompositions of complex scen… ▽ More

    Submitted 12 October, 2022; v1 submitted 14 June, 2022; originally announced June 2022.

    Comments: Accepted at NeurIPS '22. Project page: https://osrt-paper.github.io/

  14. arXiv:2111.13152  [pdf, other

    cs.CV cs.AI cs.GR cs.LG cs.RO

    Scene Representation Transformer: Geometry-Free Novel View Synthesis Through Set-Latent Scene Representations

    Authors: Mehdi S. M. Sajjadi, Henning Meyer, Etienne Pot, Urs Bergmann, Klaus Greff, Noha Radwan, Suhani Vora, Mario Lucic, Daniel Duckworth, Alexey Dosovitskiy, Jakob Uszkoreit, Thomas Funkhouser, Andrea Tagliasacchi

    Abstract: A classical problem in computer vision is to infer a 3D scene representation from few images that can be used to render novel views at interactive rates. Previous work focuses on reconstructing pre-defined 3D representations, e.g. textured meshes, or implicit representations, e.g. radiance fields, and often requires input images with precise camera poses and long processing times for each novel sc… ▽ More

    Submitted 29 March, 2022; v1 submitted 25 November, 2021; originally announced November 2021.

    Comments: Accepted to CVPR 2022, Project website: https://srt-paper.github.io/

    Journal ref: CVPR 2022

  15. arXiv:2111.12993  [pdf, other

    cs.CV cs.LG

    PolyViT: Co-training Vision Transformers on Images, Videos and Audio

    Authors: Valerii Likhosherstov, Anurag Arnab, Krzysztof Choromanski, Mario Lucic, Yi Tay, Adrian Weller, Mostafa Dehghani

    Abstract: Can we train a single transformer model capable of processing multiple modalities and datasets, whilst sharing almost all of its learnable parameters? We present PolyViT, a model trained on image, audio and video which answers this question. By co-training different tasks on a single modality, we are able to improve the accuracy of each individual task and achieve state-of-the-art results on 5 sta… ▽ More

    Submitted 25 November, 2021; originally announced November 2021.

  16. arXiv:2106.12887  [pdf, other

    cs.LG cs.AI stat.ML

    A Near-Optimal Algorithm for Debiasing Trained Machine Learning Models

    Authors: Ibrahim Alabdulmohsin, Mario Lucic

    Abstract: We present a scalable post-processing algorithm for debiasing trained models, including deep neural networks (DNNs), which we prove to be near-optimal by bounding its excess Bayes risk. We empirically validate its advantages on standard benchmark datasets across both classical algorithms as well as modern DNN architectures and demonstrate that it outperforms previous post-processing methods while… ▽ More

    Submitted 23 August, 2022; v1 submitted 6 June, 2021; originally announced June 2021.

    Comments: 21 pages, 5 figures

    MSC Class: 68T05; 68T45; 93E35; ACM Class: I.2.6; I.2.10

    Journal ref: Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), 2021

  17. arXiv:2106.09572  [pdf

    cs.CL

    Topic Modeling and Progression of American Digital News Media During the Onset of the COVID-19 Pandemic

    Authors: Xiangpeng Wan, Michael C. Lucic, Hakim Ghazzai, Yehia Massoud

    Abstract: Currently, the world is in the midst of a severe global pandemic, which has affected all aspects of people's lives. As a result, there is a deluge of COVID-related digital media articles published in the United States, due to the disparate effects of the pandemic. This large volume of information is difficult to consume by the audience in a reasonable amount of time. In this paper, we develop a Na… ▽ More

    Submitted 25 May, 2021; originally announced June 2021.

  18. arXiv:2106.07998  [pdf, other

    cs.LG cs.CV

    Revisiting the Calibration of Modern Neural Networks

    Authors: Matthias Minderer, Josip Djolonga, Rob Romijnders, Frances Hubis, Xiaohua Zhai, Neil Houlsby, Dustin Tran, Mario Lucic

    Abstract: Accurate estimation of predictive uncertainty (model calibration) is essential for the safe application of neural networks. Many instances of miscalibration in modern neural networks have been reported, suggesting a trend that newer, more accurate models produce poorly calibrated predictions. Here, we revisit this question for recent state-of-the-art image classification models. We systematically… ▽ More

    Submitted 26 October, 2021; v1 submitted 15 June, 2021; originally announced June 2021.

    Comments: 35th Conference on Neural Information Processing Systems (NeurIPS 2021)

  19. arXiv:2105.01601  [pdf, other

    cs.CV cs.AI cs.LG

    MLP-Mixer: An all-MLP Architecture for Vision

    Authors: Ilya Tolstikhin, Neil Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Thomas Unterthiner, Jessica Yung, Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy

    Abstract: Convolutional Neural Networks (CNNs) are the go-to model for computer vision. Recently, attention-based networks, such as the Vision Transformer, have also become popular. In this paper we show that while convolutions and attention are both sufficient for good performance, neither of them are necessary. We present MLP-Mixer, an architecture based exclusively on multi-layer perceptrons (MLPs). MLP-… ▽ More

    Submitted 11 June, 2021; v1 submitted 4 May, 2021; originally announced May 2021.

    Comments: v2: Fixed parameter counts in Table 1. v3: Added results on JFT-3B in Figure 2(right); Added Section 3.4 on the input permutations. v4: Updated the x label in Figure 2(right)

  20. arXiv:2104.04191  [pdf, other

    cs.CV cs.AI cs.LG

    SI-Score: An image dataset for fine-grained analysis of robustness to object location, rotation and size

    Authors: Jessica Yung, Rob Romijnders, Alexander Kolesnikov, Lucas Beyer, Josip Djolonga, Neil Houlsby, Sylvain Gelly, Mario Lucic, Xiaohua Zhai

    Abstract: Before deploying machine learning models it is critical to assess their robustness. In the context of deep neural networks for image understanding, changing the object location, rotation and size may affect the predictions in non-trivial ways. In this work we perform a fine-grained analysis of robustness with respect to these factors of variation using SI-Score, a synthetic dataset. In particular,… ▽ More

    Submitted 9 April, 2021; originally announced April 2021.

    Comments: 4 pages (10 pages including references and appendix), 10 figures. Accepted at the ICLR 2021 RobustML Workshop. arXiv admin note: text overlap with arXiv:2007.08558

  21. arXiv:2103.15691  [pdf, other

    cs.CV

    ViViT: A Video Vision Transformer

    Authors: Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lučić, Cordelia Schmid

    Abstract: We present pure-transformer based models for video classification, drawing upon the recent success of such models in image classification. Our model extracts spatio-temporal tokens from the input video, which are then encoded by a series of transformer layers. In order to handle the long sequences of tokens encountered in video, we propose several, efficient variants of our model which factorise t… ▽ More

    Submitted 1 November, 2021; v1 submitted 29 March, 2021; originally announced March 2021.

    Comments: ICCV 2021. Code at https://github.com/google-research/scenic/tree/main/scenic/projects/vivit

  22. arXiv:2011.03395  [pdf, other

    cs.LG stat.ML

    Underspecification Presents Challenges for Credibility in Modern Machine Learning

    Authors: Alexander D'Amour, Katherine Heller, Dan Moldovan, Ben Adlam, Babak Alipanahi, Alex Beutel, Christina Chen, Jonathan Deaton, Jacob Eisenstein, Matthew D. Hoffman, Farhad Hormozdiari, Neil Houlsby, Shaobo Hou, Ghassen Jerfel, Alan Karthikesalingam, Mario Lucic, Yian Ma, Cory McLean, Diana Mincu, Akinori Mitani, Andrea Montanari, Zachary Nado, Vivek Natarajan, Christopher Nielson, Thomas F. Osborne , et al. (15 additional authors not shown)

    Abstract: ML models often exhibit unexpectedly poor behavior when they are deployed in real-world domains. We identify underspecification as a key reason for these failures. An ML pipeline is underspecified when it can return many predictors with equivalently strong held-out performance in the training domain. Underspecification is common in modern ML pipelines, such as those based on deep learning. Predict… ▽ More

    Submitted 24 November, 2020; v1 submitted 6 November, 2020; originally announced November 2020.

    Comments: Updates: Updated statistical analysis in Section 6; Additional citations

  23. arXiv:2010.14766  [pdf, other

    cs.LG stat.ML

    A Sober Look at the Unsupervised Learning of Disentangled Representations and their Evaluation

    Authors: Francesco Locatello, Stefan Bauer, Mario Lucic, Gunnar Rätsch, Sylvain Gelly, Bernhard Schölkopf, Olivier Bachem

    Abstract: The idea behind the \emph{unsupervised} learning of \emph{disentangled} representations is that real-world data is generated by a few explanatory factors of variation which can be recovered by unsupervised learning algorithms. In this paper, we provide a sober look at recent progress in the field and challenge some common assumptions. We first theoretically show that the unsupervised learning of d… ▽ More

    Submitted 27 October, 2020; originally announced October 2020.

    Comments: arXiv admin note: substantial text overlap with arXiv:1811.12359

    Journal ref: Journal of Machine Learning Research 2020, Volume 21, Number 209

  24. arXiv:2010.06402  [pdf, other

    cs.LG cs.CV

    Which Model to Transfer? Finding the Needle in the Growing Haystack

    Authors: Cedric Renggli, André Susano Pinto, Luka Rimanic, Joan Puigcerver, Carlos Riquelme, Ce Zhang, Mario Lucic

    Abstract: Transfer learning has been recently popularized as a data-efficient alternative to training models from scratch, in particular for computer vision tasks where it provides a remarkably solid baseline. The emergence of rich model repositories, such as TensorFlow Hub, enables the practitioners and researchers to unleash the potential of these models across a wide range of downstream tasks. As these r… ▽ More

    Submitted 25 March, 2022; v1 submitted 13 October, 2020; originally announced October 2020.

  25. arXiv:2010.02808  [pdf, other

    cs.CV

    Representation learning from videos in-the-wild: An object-centric approach

    Authors: Rob Romijnders, Aravindh Mahendran, Michael Tschannen, Josip Djolonga, Marvin Ritter, Neil Houlsby, Mario Lucic

    Abstract: We propose a method to learn image representations from uncurated videos. We combine a supervised loss from off-the-shelf object detectors and self-supervised losses which naturally arise from the video-shot-frame-object hierarchy present in each video. We report competitive results on 19 transfer learning tasks of the Visual Task Adaptation Benchmark (VTAB), and on 8 out-of-distribution-generaliz… ▽ More

    Submitted 9 February, 2021; v1 submitted 6 October, 2020; originally announced October 2020.

    Comments: Published at WACV 2021

  26. arXiv:2007.14184  [pdf, other

    cs.LG cs.AI stat.ML

    A Commentary on the Unsupervised Learning of Disentangled Representations

    Authors: Francesco Locatello, Stefan Bauer, Mario Lucic, Gunnar Rätsch, Sylvain Gelly, Bernhard Schölkopf, Olivier Bachem

    Abstract: The goal of the unsupervised learning of disentangled representations is to separate the independent explanatory factors of variation in the data without access to supervision. In this paper, we summarize the results of Locatello et al., 2019, and focus on their implications for practitioners. We discuss the theoretical result showing that the unsupervised learning of disentangled representations… ▽ More

    Submitted 28 July, 2020; originally announced July 2020.

    Journal ref: The Thirty-Fourth AAAI Conference on Artificial Intelligence 2020 (AAAI-20)

  27. arXiv:2007.08558  [pdf, other

    cs.CV cs.LG

    On Robustness and Transferability of Convolutional Neural Networks

    Authors: Josip Djolonga, Jessica Yung, Michael Tschannen, Rob Romijnders, Lucas Beyer, Alexander Kolesnikov, Joan Puigcerver, Matthias Minderer, Alexander D'Amour, Dan Moldovan, Sylvain Gelly, Neil Houlsby, Xiaohua Zhai, Mario Lucic

    Abstract: Modern deep convolutional networks (CNNs) are often criticized for not generalizing under distributional shifts. However, several recent breakthroughs in transfer learning suggest that these networks can cope with severe distribution shifts and successfully adapt to new tasks from a few training examples. In this work we study the interplay between out-of-distribution and transfer performance of m… ▽ More

    Submitted 23 March, 2021; v1 submitted 16 July, 2020; originally announced July 2020.

    Comments: Accepted at CVPR 2021

  28. arXiv:1912.02783  [pdf, other

    cs.CV cs.LG

    Self-Supervised Learning of Video-Induced Visual Invariances

    Authors: Michael Tschannen, Josip Djolonga, Marvin Ritter, Aravindh Mahendran, Xiaohua Zhai, Neil Houlsby, Sylvain Gelly, Mario Lucic

    Abstract: We propose a general framework for self-supervised learning of transferable visual representations based on Video-Induced Visual Invariances (VIVI). We consider the implicit hierarchy present in the videos and make use of (i) frame-level invariances (e.g. stability to color and contrast perturbations), (ii) shot/clip-level invariances (e.g. robustness to changes in object orientation and lighting… ▽ More

    Submitted 1 April, 2020; v1 submitted 5 December, 2019; originally announced December 2019.

    Comments: CVPR 2020

  29. arXiv:1911.11357  [pdf, other

    cs.LG cs.CV stat.ML

    Semantic Bottleneck Scene Generation

    Authors: Samaneh Azadi, Michael Tschannen, Eric Tzeng, Sylvain Gelly, Trevor Darrell, Mario Lucic

    Abstract: Coupling the high-fidelity generation capabilities of label-conditional image synthesis methods with the flexibility of unconditional generative models, we propose a semantic bottleneck GAN model for unconditional synthesis of complex scenes. We assume pixel-wise segmentation labels are available during training and use them to learn the scene structure. During inference, our model first synthesiz… ▽ More

    Submitted 26 November, 2019; originally announced November 2019.

  30. arXiv:1910.04867  [pdf, other

    cs.CV cs.LG stat.ML

    A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark

    Authors: Xiaohua Zhai, Joan Puigcerver, Alexander Kolesnikov, Pierre Ruyssen, Carlos Riquelme, Mario Lucic, Josip Djolonga, Andre Susano Pinto, Maxim Neumann, Alexey Dosovitskiy, Lucas Beyer, Olivier Bachem, Michael Tschannen, Marcin Michalski, Olivier Bousquet, Sylvain Gelly, Neil Houlsby

    Abstract: Representation learning promises to unlock deep learning for the long tail of vision tasks without expensive labelled datasets. Yet, the absence of a unified evaluation for general visual representations hinders progress. Popular protocols are often too constrained (linear classification), limited in diversity (ImageNet, CIFAR, Pascal-VOC), or only weakly related to representation quality (ELBO, r… ▽ More

    Submitted 21 February, 2020; v1 submitted 1 October, 2019; originally announced October 2019.

  31. arXiv:1907.13625  [pdf, other

    cs.LG stat.ML

    On Mutual Information Maximization for Representation Learning

    Authors: Michael Tschannen, Josip Djolonga, Paul K. Rubenstein, Sylvain Gelly, Mario Lucic

    Abstract: Many recent methods for unsupervised or self-supervised representation learning train feature extractors by maximizing an estimate of the mutual information (MI) between different views of the data. This comes with several immediate problems: For example, MI is notoriously hard to estimate, and using it as an objective for representation learning may lead to highly entangled representations due to… ▽ More

    Submitted 23 January, 2020; v1 submitted 31 July, 2019; originally announced July 2019.

    Comments: ICLR 2020. Michael Tschannen and Josip Djolonga contributed equally

  32. arXiv:1905.10768  [pdf, other

    cs.LG stat.ML

    Precision-Recall Curves Using Information Divergence Frontiers

    Authors: Josip Djolonga, Mario Lucic, Marco Cuturi, Olivier Bachem, Olivier Bousquet, Sylvain Gelly

    Abstract: Despite the tremendous progress in the estimation of generative models, the development of tools for diagnosing their failures and assessing their performance has advanced at a much slower pace. Recent developments have investigated metrics that quantify which parts of the true distribution is modeled well, and, on the contrary, what the model fails to capture, akin to precision and recall in info… ▽ More

    Submitted 8 June, 2020; v1 submitted 26 May, 2019; originally announced May 2019.

    Comments: Updated to the AISTATS 2020 version

  33. arXiv:1903.02271  [pdf, other

    cs.LG cs.CV stat.ML

    High-Fidelity Image Generation With Fewer Labels

    Authors: Mario Lucic, Michael Tschannen, Marvin Ritter, Xiaohua Zhai, Olivier Bachem, Sylvain Gelly

    Abstract: Deep generative models are becoming a cornerstone of modern machine learning. Recent work on conditional generative adversarial networks has shown that learning complex, high-dimensional distributions over natural images is within reach. While the latest models are able to generate high-fidelity, diverse natural images at high resolution, they rely on a vast quantity of labeled data. In this work… ▽ More

    Submitted 14 May, 2019; v1 submitted 6 March, 2019; originally announced March 2019.

    Comments: Mario Lucic, Michael Tschannen, and Marvin Ritter contributed equally to this work. ICML 2019 camera-ready version. Code available at https://github.com/google/compare_gan

  34. arXiv:1812.05069  [pdf, other

    cs.LG cs.CV stat.ML

    Recent Advances in Autoencoder-Based Representation Learning

    Authors: Michael Tschannen, Olivier Bachem, Mario Lucic

    Abstract: Learning useful representations with little or no supervision is a key challenge in artificial intelligence. We provide an in-depth review of recent advances in representation learning with a focus on autoencoder-based models. To organize these results we make use of meta-priors believed useful for downstream tasks, such as disentanglement and hierarchical organization of features. In particular,… ▽ More

    Submitted 12 December, 2018; originally announced December 2018.

    Comments: Presented at the third workshop on Bayesian Deep Learning (NeurIPS 2018)

  35. arXiv:1811.12359  [pdf, other

    cs.LG cs.AI stat.ML

    Challenging Common Assumptions in the Unsupervised Learning of Disentangled Representations

    Authors: Francesco Locatello, Stefan Bauer, Mario Lucic, Gunnar Rätsch, Sylvain Gelly, Bernhard Schölkopf, Olivier Bachem

    Abstract: The key idea behind the unsupervised learning of disentangled representations is that real-world data is generated by a few explanatory factors of variation which can be recovered by unsupervised learning algorithms. In this paper, we provide a sober look at recent progress in the field and challenge some common assumptions. We first theoretically show that the unsupervised learning of disentangle… ▽ More

    Submitted 18 June, 2019; v1 submitted 29 November, 2018; originally announced November 2018.

    Journal ref: Proceedings of the 36th International Conference on Machine Learning (ICML 2019)

  36. arXiv:1811.11212  [pdf, other

    cs.LG cs.CV stat.ML

    Self-Supervised GANs via Auxiliary Rotation Loss

    Authors: Ting Chen, Xiaohua Zhai, Marvin Ritter, Mario Lucic, Neil Houlsby

    Abstract: Conditional GANs are at the forefront of natural image synthesis. The main drawback of such models is the necessity for labeled data. In this work we exploit two popular unsupervised learning techniques, adversarial training and self-supervision, and take a step towards bridging the gap between conditional and unconditional GANs. In particular, we allow the networks to collaborate on the task of r… ▽ More

    Submitted 9 April, 2019; v1 submitted 27 November, 2018; originally announced November 2018.

  37. arXiv:1810.01365  [pdf, other

    cs.LG cs.CV stat.ML

    On Self Modulation for Generative Adversarial Networks

    Authors: Ting Chen, Mario Lucic, Neil Houlsby, Sylvain Gelly

    Abstract: Training Generative Adversarial Networks (GANs) is notoriously challenging. We propose and study an architectural modification, self-modulation, which improves GAN performance across different data sets, architectures, losses, regularizers, and hyperparameter settings. Intuitively, self-modulation allows the intermediate feature maps of a generator to change as a function of the input noise vector… ▽ More

    Submitted 2 May, 2019; v1 submitted 2 October, 2018; originally announced October 2018.

  38. arXiv:1807.04720  [pdf, other

    cs.LG stat.ML

    A Large-Scale Study on Regularization and Normalization in GANs

    Authors: Karol Kurach, Mario Lucic, Xiaohua Zhai, Marcin Michalski, Sylvain Gelly

    Abstract: Generative adversarial networks (GANs) are a class of deep generative models which aim to learn a target distribution in an unsupervised fashion. While they were successfully applied to many problems, training a GAN is a notoriously challenging task and requires a significant number of hyperparameter tuning, neural architecture engineering, and a non-trivial amount of "tricks". The success in many… ▽ More

    Submitted 14 May, 2019; v1 submitted 12 July, 2018; originally announced July 2018.

    Comments: Revision accepted to ICML'19: More focus on regularization and normalization aspects. Added recent references and promising future directions

  39. arXiv:1806.00035  [pdf, other

    stat.ML cs.LG

    Assessing Generative Models via Precision and Recall

    Authors: Mehdi S. M. Sajjadi, Olivier Bachem, Mario Lucic, Olivier Bousquet, Sylvain Gelly

    Abstract: Recent advances in generative modeling have led to an increased interest in the study of statistical divergences as means of model comparison. Commonly used evaluation methods, such as the Frechet Inception Distance (FID), correlate well with the perceived quality of samples and are sensitive to mode dropping. However, these metrics are unable to distinguish between different failure cases since t… ▽ More

    Submitted 28 October, 2018; v1 submitted 31 May, 2018; originally announced June 2018.

    Comments: NIPS 2018

  40. arXiv:1805.11057  [pdf, other

    cs.LG stat.ML

    Deep Generative Models for Distribution-Preserving Lossy Compression

    Authors: Michael Tschannen, Eirikur Agustsson, Mario Lucic

    Abstract: We propose and study the problem of distribution-preserving lossy compression. Motivated by recent advances in extreme image compression which allow to maintain artifact-free reconstructions even at very low bitrates, we propose to optimize the rate-distortion tradeoff under the constraint that the reconstructed samples follow the distribution of the training data. The resulting compression system… ▽ More

    Submitted 28 October, 2018; v1 submitted 28 May, 2018; originally announced May 2018.

    Comments: NIPS 2018. Code: https://github.com/mitscha/dplc . Changes w.r.t. v1: Some clarifications in the text and additional numerical results

  41. arXiv:1711.10337  [pdf, other

    stat.ML cs.LG

    Are GANs Created Equal? A Large-Scale Study

    Authors: Mario Lucic, Karol Kurach, Marcin Michalski, Sylvain Gelly, Olivier Bousquet

    Abstract: Generative adversarial networks (GAN) are a powerful subclass of generative models. Despite a very rich research activity leading to numerous interesting GAN algorithms, it is still very hard to assess which algorithm(s) perform better than others. We conduct a neutral, multi-faceted large-scale empirical study on state-of-the art models and evaluation measures. We find that most models can reach… ▽ More

    Submitted 29 October, 2018; v1 submitted 28 November, 2017; originally announced November 2017.

    Comments: NIPS'18: Added a section on the limitations of the study and additional empirical results

  42. arXiv:1711.09649  [pdf, other

    stat.ML cs.LG

    One-Shot Coresets: The Case of k-Clustering

    Authors: Olivier Bachem, Mario Lucic, Silvio Lattanzi

    Abstract: Scaling clustering algorithms to massive data sets is a challenging task. Recently, several successful approaches based on data summarization methods, such as coresets and sketches, were proposed. While these techniques provide provably good and small summaries, they are inherently problem dependent - the practitioner has to commit to a fixed clustering objective before even exploring the data. Ho… ▽ More

    Submitted 20 February, 2018; v1 submitted 27 November, 2017; originally announced November 2017.

    Comments: To Appear In AISTATS 2018

  43. arXiv:1711.01566  [pdf, other

    cs.LG cs.DM stat.ML

    Stochastic Submodular Maximization: The Case of Coverage Functions

    Authors: Mohammad Reza Karimi, Mario Lucic, Hamed Hassani, Andreas Krause

    Abstract: Stochastic optimization of continuous objectives is at the heart of modern machine learning. However, many important problems are of discrete nature and often involve submodular objectives. We seek to unleash the power of stochastic continuous optimization, namely stochastic gradient descent and its variants, to such discrete problems. We first introduce the problem of stochastic submodular optimi… ▽ More

    Submitted 5 November, 2017; originally announced November 2017.

    Comments: 31st Conference on Neural Information Processing Systems (NIPS 2017)

  44. arXiv:1702.08249  [pdf, other

    stat.ML cs.LG

    Uniform Deviation Bounds for Unbounded Loss Functions like k-Means

    Authors: Olivier Bachem, Mario Lucic, S. Hamed Hassani, Andreas Krause

    Abstract: Uniform deviation bounds limit the difference between a model's expected loss and its loss on an empirical sample uniformly for all models in a learning problem. As such, they are a critical component to empirical risk minimization. In this paper, we provide a novel framework to obtain uniform deviation bounds for loss functions which are *unbounded*. In our main application, this allows us to obt… ▽ More

    Submitted 27 February, 2017; originally announced February 2017.

  45. arXiv:1702.08248  [pdf, other

    stat.ML cs.DC cs.DS cs.LG stat.CO

    Scalable k-Means Clustering via Lightweight Coresets

    Authors: Olivier Bachem, Mario Lucic, Andreas Krause

    Abstract: Coresets are compact representations of data sets such that models trained on a coreset are provably competitive with models trained on the full data set. As such, they have been successfully used to scale up clustering models to massive data sets. While existing approaches generally only allow for multiplicative approximation errors, we propose a novel notion of lightweight coresets that allows f… ▽ More

    Submitted 6 June, 2018; v1 submitted 27 February, 2017; originally announced February 2017.

    Comments: To appear in the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD)

  46. arXiv:1605.09619  [pdf, other

    stat.ML cs.DC cs.DM cs.LG

    Horizontally Scalable Submodular Maximization

    Authors: Mario Lucic, Olivier Bachem, Morteza Zadimoghaddam, Andreas Krause

    Abstract: A variety of large-scale machine learning problems can be cast as instances of constrained submodular maximization. Existing approaches for distributed submodular maximization have a critical drawback: The capacity - number of instances that can fit in memory - must grow with the data set size. In practice, while one can provision many machines, the capacity of each machine is limited by physical… ▽ More

    Submitted 31 May, 2016; originally announced May 2016.

  47. arXiv:1605.00529  [pdf, other

    stat.ML cs.LG

    Tradeoffs for Space, Time, Data and Risk in Unsupervised Learning

    Authors: Mario Lucic, Mesrob I. Ohannessian, Amin Karbasi, Andreas Krause

    Abstract: Faced with massive data, is it possible to trade off (statistical) risk, and (computational) space and time? This challenge lies at the heart of large-scale machine learning. Using k-means clustering as a prototypical unsupervised learning problem, we show how we can strategically summarize the data (control space) in order to trade off risk and time when data is generated by a probabilistic model… ▽ More

    Submitted 2 May, 2016; originally announced May 2016.

    Comments: Conference version

  48. arXiv:1605.00519  [pdf, other

    stat.ML cs.LG

    Linear-time Outlier Detection via Sensitivity

    Authors: Mario Lucic, Olivier Bachem, Andreas Krause

    Abstract: Outliers are ubiquitous in modern data sets. Distance-based techniques are a popular non-parametric approach to outlier detection as they require no prior assumptions on the data generating distribution and are simple to implement. Scaling these techniques to massive data sets without sacrificing accuracy is a challenging task. We propose a novel algorithm based on the intuition that outliers have… ▽ More

    Submitted 2 May, 2016; originally announced May 2016.

    Comments: 8 pages

  49. arXiv:1508.05243  [pdf, other

    stat.ML cs.LG

    Strong Coresets for Hard and Soft Bregman Clustering with Applications to Exponential Family Mixtures

    Authors: Mario Lucic, Olivier Bachem, Andreas Krause

    Abstract: Coresets are efficient representations of data sets such that models trained on the coreset are provably competitive with models trained on the original data set. As such, they have been successfully used to scale up clustering models such as K-Means and Gaussian mixture models to massive data sets. However, until now, the algorithms and the corresponding theory were usually specific to each clust… ▽ More

    Submitted 2 May, 2016; v1 submitted 21 August, 2015; originally announced August 2015.

    Comments: 14 pages, camera ready version