Showing 1–38 of 38 results for author: Alemi, A

  1. arXiv:2404.03626  [pdf, other]

    cs.CL cs.LG

    Training LLMs over Neurally Compressed Text

    Authors: Brian Lester, Jaehoon Lee, Alex Alemi, Jeffrey Pennington, Adam Roberts, Jascha Sohl-Dickstein, Noah Constant

    Abstract: In this paper, we explore the idea of training large language models (LLMs) over highly compressed text. While standard subword tokenizers compress text by a small factor, neural text compressors can achieve much higher rates of compression. If it were possible to train LLMs directly over neurally compressed text, this would confer advantages in training and serving efficiency, as well as easier h…

    Submitted 4 April, 2024; originally announced April 2024.

  2. arXiv:2312.06585  [pdf, other]

    cs.LG

    Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

    Authors: Avi Singh, John D. Co-Reyes, Rishabh Agarwal, Ankesh Anand, Piyush Patil, Xavier Garcia, Peter J. Liu, James Harrison, Jaehoon Lee, Kelvin Xu, Aaron Parisi, Abhishek Kumar, Alex Alemi, Alex Rizkowsky, Azade Nova, Ben Adlam, Bernd Bohnet, Gamaleldin Elsayed, Hanie Sedghi, Igor Mordatch, Isabelle Simpson, Izzeddin Gur, Jasper Snoek, Jeffrey Pennington, Jiri Hron, et al. (16 additional authors not shown)

    Abstract: Fine-tuning language models (LMs) on human-generated data remains a prevalent practice. However, the performance of such models is often limited by the quantity and diversity of high-quality human data. In this paper, we explore whether we can go beyond human data on tasks where we have access to scalar feedback, for example, on math problems where one can verify correctness. To do so, we investig…

    Submitted 17 April, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

    Comments: Accepted to TMLR. Camera-ready version. First three authors contributed equally

  3. arXiv:2311.07587  [pdf, other]

    cs.CL cs.AI cs.CY cs.LG

    Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?"

    Authors: C. Daniel Freeman, Laura Culp, Aaron Parisi, Maxwell L Bileschi, Gamaleldin F Elsayed, Alex Rizkowsky, Isabelle Simpson, Alex Alemi, Azade Nova, Ben Adlam, Bernd Bohnet, Gaurav Mishra, Hanie Sedghi, Igor Mordatch, Izzeddin Gur, Jaehoon Lee, JD Co-Reyes, Jeffrey Pennington, Kelvin Xu, Kevin Swersky, Kshiteej Mahajan, Lechao Xiao, Rosanne Liu, Simon Kornblith, Noah Constant, et al. (5 additional authors not shown)

    Abstract: We introduce and study the problem of adversarial arithmetic, which provides a simple yet challenging testbed for language model alignment. This problem is comprised of arithmetic questions posed in natural language, with an arbitrary adversarial string inserted before the question is complete. Even in the simple setting of 1-digit addition problems, it is easy to find adversarial prompts that mak…

    Submitted 15 November, 2023; v1 submitted 8 November, 2023; originally announced November 2023.

  4. arXiv:2309.14322  [pdf, other]

    cs.LG

    Small-scale proxies for large-scale Transformer training instabilities

    Authors: Mitchell Wortsman, Peter J. Liu, Lechao Xiao, Katie Everett, Alex Alemi, Ben Adlam, John D. Co-Reyes, Izzeddin Gur, Abhishek Kumar, Roman Novak, Jeffrey Pennington, Jascha Sohl-Dickstein, Kelvin Xu, Jaehoon Lee, Justin Gilmer, Simon Kornblith

    Abstract: Teams that have trained large Transformer-based models have reported training instabilities at large scale that did not appear when training with the same hyperparameters at smaller scales. Although the causes of such instabilities are of scientific interest, the amount of resources required to reproduce them has made investigation difficult. In this work, we seek ways to reproduce and study train…

    Submitted 16 October, 2023; v1 submitted 25 September, 2023; originally announced September 2023.

  5. arXiv:2307.14653  [pdf, other]

    stat.ML cond-mat.dis-nn cs.LG

    Speed Limits for Deep Learning

    Authors: Inbar Seroussi, Alexander A. Alemi, Moritz Helias, Zohar Ringel

    Abstract: State-of-the-art neural networks require extreme computational power to train. It is therefore natural to wonder whether they are optimally trained. Here we apply a recent advancement in stochastic thermodynamics which allows bounding the speed at which one can go from the initial weight distribution to the final distribution of the fully trained network, based on the ratio of their Wasserstein-2…

    Submitted 27 July, 2023; originally announced July 2023.

  6. arXiv:2307.07568  [pdf, other]

    cs.LG stat.ML

    Variational Prediction

    Authors: Alexander A. Alemi, Ben Poole

    Abstract: Bayesian inference offers benefits over maximum likelihood, but it also comes with computational costs. Computing the posterior is typically intractable, as is marginalizing that posterior to form the posterior predictive distribution. In this paper, we present variational prediction, a technique for directly learning a variational approximation to the posterior predictive distribution using a var…

    Submitted 14 July, 2023; originally announced July 2023.

    Comments: AABI2023

  7. arXiv:2211.09981  [pdf, other]

    cs.LG cs.AI stat.ML

    Weighted Ensemble Self-Supervised Learning

    Authors: Yangjun Ruan, Saurabh Singh, Warren Morningstar, Alexander A. Alemi, Sergey Ioffe, Ian Fischer, Joshua V. Dillon

    Abstract: Ensembling has proven to be a powerful technique for boosting model performance, uncertainty estimation, and robustness in supervised learning. Advances in self-supervised learning (SSL) enable leveraging large unlabeled corpora for state-of-the-art few-shot and supervised learning performance. In this paper, we explore how ensemble methods can improve recent SSL techniques by developing a framewo…

    Submitted 9 April, 2023; v1 submitted 17 November, 2022; originally announced November 2022.

    Comments: Accepted by ICLR 2023

  8. arXiv:2202.07600  [pdf, other]

    cs.RO cs.LG

    Bayesian Imitation Learning for End-to-End Mobile Manipulation

    Authors: Yuqing Du, Daniel Ho, Alexander A. Alemi, Eric Jang, Mohi Khansari

    Abstract: In this work we investigate and demonstrate benefits of a Bayesian approach to imitation learning from multiple sensor inputs, as applied to the task of opening office doors with a mobile manipulator. Augmenting policies with additional sensor inputs, such as RGB + depth cameras, is a straightforward approach to improving robot perception capabilities, especially for tasks that may favor different…

    Submitted 15 February, 2022; originally announced February 2022.

  9. arXiv:2107.05712  [pdf, other]

    cs.LG

    A Closer Look at the Adversarial Robustness of Information Bottleneck Models

    Authors: Iryna Korshunova, David Stutz, Alexander A. Alemi, Olivia Wiles, Sven Gowal

    Abstract: We study the adversarial robustness of information bottleneck models for classification. Previous works showed that the robustness of models trained with information bottlenecks can improve upon adversarial training. Our evaluation under a diverse range of white-box $l_{\infty}$ attacks suggests that information bottlenecks alone are not a strong defense strategy, and that previous results were li…

    Submitted 12 July, 2021; originally announced July 2021.

  10. arXiv:2106.05945  [pdf, other]

    cs.LG stat.ML

    Does Knowledge Distillation Really Work?

    Authors: Samuel Stanton, Pavel Izmailov, Polina Kirichenko, Alexander A. Alemi, Andrew Gordon Wilson

    Abstract: Knowledge distillation is a popular technique for training a small student network to emulate a larger teacher model, such as an ensemble of networks. We show that while knowledge distillation can improve student generalization, it does not typically work as it is commonly understood: there often remains a surprisingly large discrepancy between the predictive distributions of the teacher and the s…

    Submitted 6 December, 2021; v1 submitted 10 June, 2021; originally announced June 2021.

    Comments: NeurIPS 2021. Code available at https://github.com/samuelstanton/gnosis

  11. arXiv:2011.08711  [pdf, other]

    stat.ML cs.LG

    VIB is Half Bayes

    Authors: Alexander A Alemi, Warren R Morningstar, Ben Poole, Ian Fischer, Joshua V Dillon

    Abstract: In discriminative settings such as regression and classification there are two random variables at play, the inputs X and the targets Y. Here, we demonstrate that the Variational Information Bottleneck can be viewed as a compromise between fully empirical and fully Bayesian objectives, attempting to minimize the risks due to finite sampling of Y only. We argue that this approach provides some of t…

    Submitted 17 November, 2020; originally announced November 2020.

  12. arXiv:2010.09629  [pdf, other]

    cs.LG stat.ML

    PAC$^m$-Bayes: Narrowing the Empirical Risk Gap in the Misspecified Bayesian Regime

    Authors: Warren R. Morningstar, Alexander A. Alemi, Joshua V. Dillon

    Abstract: The Bayesian posterior minimizes the "inferential risk" which itself bounds the "predictive risk". This bound is tight when the likelihood and prior are well-specified. However since misspecification induces a gap, the Bayesian posterior predictive distribution may have poor generalization performance. This work develops a multi-sample loss (PAC$^m$) which can close the gap by spanning a trade-off…

    Submitted 23 May, 2022; v1 submitted 19 October, 2020; originally announced October 2020.

    Comments: Accepted at AISTATS2022

    Journal ref: International Conference on Artificial Intelligence and Statistics, 8270-8298, (2022)

  13. arXiv:2006.09273  [pdf, other]

    cs.LG stat.ML

    Density of States Estimation for Out-of-Distribution Detection

    Authors: Warren R. Morningstar, Cusuh Ham, Andrew G. Gallagher, Balaji Lakshminarayanan, Alexander A. Alemi, Joshua V. Dillon

    Abstract: Perhaps surprisingly, recent studies have shown probabilistic model likelihoods have poor specificity for out-of-distribution (OOD) detection and often assign higher likelihoods to OOD data than in-distribution data. To ameliorate this issue we propose DoSE, the density of states estimator. Drawing on the statistical physics notion of "density of states," the DoSE decision rule avoids direct com…

    Submitted 22 June, 2020; v1 submitted 16 June, 2020; originally announced June 2020.

    Comments: Submitted to NeurIPS. Corrected footnote from: "34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada" to "Preprint. Under review."

  14. arXiv:2002.05380  [pdf, other]

    cs.LG stat.ML

    CEB Improves Model Robustness

    Authors: Ian Fischer, Alexander A. Alemi

    Abstract: We demonstrate that the Conditional Entropy Bottleneck (CEB) can improve model robustness. CEB is an easy strategy to implement and works in tandem with data augmentation procedures. We report results of a large scale adversarial robustness study on CIFAR-10, as well as the ImageNet-C Common Corruptions Benchmark, ImageNet-A, and PGD attacks.

    Submitted 13 February, 2020; originally announced February 2020.

  15. arXiv:1912.02803  [pdf, other]

    stat.ML cs.LG

    Neural Tangents: Fast and Easy Infinite Neural Networks in Python

    Authors: Roman Novak, Lechao Xiao, Jiri Hron, Jaehoon Lee, Alexander A. Alemi, Jascha Sohl-Dickstein, Samuel S. Schoenholz

    Abstract: Neural Tangents is a library designed to enable research into infinite-width neural networks. It provides a high-level API for specifying complex and hierarchical neural network architectures. These networks can then be trained and evaluated either at finite-width as usual or in their infinite-width limit. Infinite-width networks can be trained analytically using exact Bayesian inference or using…

    Submitted 5 December, 2019; originally announced December 2019.

  16. arXiv:1911.09189  [pdf, other]

    cs.LG cs.IT stat.ML

    Information in Infinite Ensembles of Infinitely-Wide Neural Networks

    Authors: Ravid Shwartz-Ziv, Alexander A. Alemi

    Abstract: In this preliminary work, we study the generalization properties of infinite ensembles of infinitely-wide neural networks. Amazingly, this model family admits tractable calculations for many information-theoretic quantities. We report analytical and empirical investigations in the search for signals that correlate with generalization.

    Submitted 7 November, 2022; v1 submitted 20 November, 2019; originally announced November 2019.

    Comments: 2nd Symposium on Advances in Approximate Bayesian Inference, 2019

    Journal ref: Proceedings of The 2nd Symposium on Advances in Approximate Bayesian Inference, PMLR 118:1-17 2019

  17. arXiv:1911.01968  [pdf]

    cs.CY cs.ET

    Thermodynamic Computing

    Authors: Tom Conte, Erik DeBenedictis, Natesh Ganesh, Todd Hylton, John Paul Strachan, R. Stanley Williams, Alexander Alemi, Lee Altenberg, Gavin Crooks, James Crutchfield, Lidia del Rio, Josh Deutsch, Michael DeWeese, Khari Douglas, Massimiliano Esposito, Michael Frank, Robert Fry, Peter Harsha, Mark Hill, Christopher Kello, Jeff Krichmar, Suhas Kumar, Shih-Chii Liu, Seth Lloyd, Matteo Marsili, et al. (14 additional authors not shown)

    Abstract: The hardware and software foundations laid in the first half of the 20th Century enabled the computing technologies that have transformed the world, but these foundations are now under siege. The current computing paradigm, which is the foundation of much of the current standards of living that we now enjoy, faces fundamental limitations that are evident from several perspectives. In terms of hard…

    Submitted 14 November, 2019; v1 submitted 5 November, 2019; originally announced November 2019.

    Comments: A Computing Community Consortium (CCC) workshop report, 36 pages

    Report number: ccc2019report_6

  18. arXiv:1910.10831  [pdf, ps, other]

    cs.LG cs.IT stat.ML

    Variational Predictive Information Bottleneck

    Authors: Alexander A. Alemi

    Abstract: In classic papers, Zellner demonstrated that Bayesian inference could be derived as the solution to an information theoretic functional. Below we derive a generalized form of this functional as a variational lower bound of a predictive information bottleneck objective. This generalized functional encompasses most modern inference procedures and suggests novel ones.

    Submitted 23 October, 2019; originally announced October 2019.

  19. arXiv:1910.09578  [pdf, other]

    cs.LG cs.IT stat.ML

    On Predictive Information in RNNs

    Authors: Zhe Dong, Deniz Oktay, Ben Poole, Alexander A. Alemi

    Abstract: Certain biological neurons demonstrate a remarkable capability to optimally compress the history of sensory inputs while being maximally informative about the future. In this work, we investigate if the same can be said of artificial neurons in recurrent neural networks (RNNs) trained with maximum likelihood. Empirically, we find that RNNs are suboptimal in the information plane. Instead of optima…

    Submitted 10 February, 2020; v1 submitted 21 October, 2019; originally announced October 2019.

  20. arXiv:1905.07478  [pdf, other]

    cs.LG stat.ML

    Dueling Decoders: Regularizing Variational Autoencoder Latent Spaces

    Authors: Bryan Seybold, Emily Fertig, Alex Alemi, Ian Fischer

    Abstract: Variational autoencoders learn unsupervised data representations, but these models frequently converge to minima that fail to preserve meaningful semantic information. For example, variational autoencoders with autoregressive decoders often collapse into autodecoders, where they learn to ignore the encoder input. In this work, we demonstrate that adding an auxiliary decoder to regularize the laten…

    Submitted 17 May, 2019; originally announced May 2019.

    Comments: 16 pages, 9 figures, supplemental

  21. arXiv:1905.06922  [pdf, other]

    cs.LG stat.ML

    On Variational Bounds of Mutual Information

    Authors: Ben Poole, Sherjil Ozair, Aaron van den Oord, Alexander A. Alemi, George Tucker

    Abstract: Estimating and optimizing Mutual Information (MI) is core to many problems in machine learning; however, bounding MI in high dimensions is challenging. To establish tractable and scalable objectives, recent work has turned to variational bounds parameterized by neural networks, but the relationships and tradeoffs between these bounds remains unclear. In this work, we unify these recent development…

    Submitted 16 May, 2019; originally announced May 2019.

    Comments: ICML 2019

  22. arXiv:1905.00075  [pdf, ps, other]

    cs.IR cs.LG cs.SI physics.soc-ph

    On the Use of ArXiv as a Dataset

    Authors: Colin B. Clement, Matthew Bierbaum, Kevin P. O'Keeffe, Alexander A. Alemi

    Abstract: The arXiv has collected 1.5 million pre-print articles over 28 years, hosting literature from scientific fields including Physics, Mathematics, and Computer Science. Each pre-print features text, figures, authors, citations, categories, and other metadata. These rich, multi-modal features, combined with the natural graph structure---created by citation, affiliation, and co-authorship---makes the a…

    Submitted 30 April, 2019; originally announced May 2019.

    Comments: 7 pages, 3 tables, 2 figures, ICLR 2019 workshop RLGM submission

  23. arXiv:1812.02682  [pdf, other]

    cs.LG stat.ML

    $β$-VAEs can retain label information even at high compression

    Authors: Emily Fertig, Aryan Arbabi, Alexander A. Alemi

    Abstract: In this paper, we investigate the degree to which the encoding of a $β$-VAE captures label information across multiple architectures on Binary Static MNIST and Omniglot. Even though they are trained in a completely unsupervised manner, we demonstrate that a $β$-VAE can retain a large amount of label information, even when asked to learn a highly compressed representation.

    Submitted 6 December, 2018; originally announced December 2018.

    Comments: NeurIPS2018, BDL workshop

  24. arXiv:1810.01392  [pdf, other]

    stat.ML cs.LG

    WAIC, but Why? Generative Ensembles for Robust Anomaly Detection

    Authors: Hyunsun Choi, Eric Jang, Alexander A. Alemi

    Abstract: Machine learning models encounter Out-of-Distribution (OoD) errors when the data seen at test time are generated from a different stochastic generator than the one used to generate the training data. One proposal to scale OoD detection to high-dimensional data is to learn a tractable likelihood approximation of the training distribution, and use it to reject unlikely inputs. However, likelihood mo…

    Submitted 23 May, 2019; v1 submitted 2 October, 2018; originally announced October 2018.

  25. arXiv:1807.04162  [pdf, other]

    cs.LG cond-mat.stat-mech stat.ML

    TherML: Thermodynamics of Machine Learning

    Authors: Alexander A. Alemi, Ian Fischer

    Abstract: In this work we offer a framework for reasoning about a wide class of existing objectives in machine learning. We develop a formal correspondence between this work and thermodynamics and discuss its implications.

    Submitted 4 October, 2018; v1 submitted 11 July, 2018; originally announced July 2018.

    Comments: Presented at the ICML 2018 workshop on Theoretical Foundations and Applications of Deep Generative Models

  26. arXiv:1807.00906  [pdf, other]

    cs.LG stat.ML

    Uncertainty in the Variational Information Bottleneck

    Authors: Alexander A. Alemi, Ian Fischer, Joshua V. Dillon

    Abstract: We present a simple case study, demonstrating that Variational Information Bottleneck (VIB) can improve a network's classification calibration as well as its ability to detect out-of-distribution data. Without explicitly being designed to do so, VIB gives two natural metrics for handling and quantifying uncertainty.

    Submitted 2 July, 2018; originally announced July 2018.

    Comments: 10 pages, 7 figures. Accepted to UAI 2018 - Uncertainty in Deep Learning Workshop

  27. arXiv:1802.04874  [pdf, other]

    stat.ML cs.LG

    GILBO: One Metric to Measure Them All

    Authors: Alexander A. Alemi, Ian Fischer

    Abstract: We propose a simple, tractable lower bound on the mutual information contained in the joint generative density of any latent variable generative model: the GILBO (Generative Information Lower BOund). It offers a data-independent measure of the complexity of the learned latent variable description, giving the log of the effective description length. It is well-defined for both VAEs and GANs. We com…

    Submitted 10 January, 2019; v1 submitted 13 February, 2018; originally announced February 2018.

    Comments: Accepted at NeurIPS 2018

  28. arXiv:1711.10604  [pdf, ps, other]

    cs.LG cs.AI cs.PL stat.ML

    TensorFlow Distributions

    Authors: Joshua V. Dillon, Ian Langmore, Dustin Tran, Eugene Brevdo, Srinivas Vasudevan, Dave Moore, Brian Patton, Alex Alemi, Matt Hoffman, Rif A. Saurous

    Abstract: The TensorFlow Distributions library implements a vision of probability theory adapted to the modern deep-learning paradigm of end-to-end differentiable computation. Building on two basic abstractions, it offers flexible building blocks for probabilistic computation. Distributions provide fast, numerically stable methods for generating samples and computing statistics, e.g., log density. Bijectors…

    Submitted 28 November, 2017; originally announced November 2017.

  29. arXiv:1711.00464  [pdf, other]

    cs.LG stat.ML

    Fixing a Broken ELBO

    Authors: Alexander A. Alemi, Ben Poole, Ian Fischer, Joshua V. Dillon, Rif A. Saurous, Kevin Murphy

    Abstract: Recent work in unsupervised representation learning has focused on learning deep directed latent-variable models. Fitting these models by maximizing the marginal likelihood or evidence is typically intractable, thus a common approximation is to maximize the evidence lower bound (ELBO) instead. However, maximum likelihood training (whether exact or approximate) does not necessarily result in a good…

    Submitted 13 February, 2018; v1 submitted 1 November, 2017; originally announced November 2017.

    Comments: 21 pages, 9 figures

  30. arXiv:1710.09599  [pdf, other]

    cs.LG cs.SI stat.ML

    Watch Your Step: Learning Node Embeddings via Graph Attention

    Authors: Sami Abu-El-Haija, Bryan Perozzi, Rami Al-Rfou, Alex Alemi

    Abstract: Graph embedding methods represent nodes in a continuous vector space, preserving information from the graph (e.g. by sampling random walks). There are many hyper-parameters to these methods (such as random walk length) which have to be manually tuned for every graph. In this paper, we replace random walk hyper-parameters with trainable parameters that we automatically learn via backpropagation. In…

    Submitted 12 September, 2018; v1 submitted 26 October, 2017; originally announced October 2017.

  31. arXiv:1705.10589  [pdf, other]

    cond-mat.dis-nn cs.CV

    Jeffrey's prior sampling of deep sigmoidal networks

    Authors: Lorien X. Hayden, Alexander A. Alemi, Paul H. Ginsparg, James P. Sethna

    Abstract: Neural networks have been shown to have a remarkable ability to uncover low dimensional structure in data: the space of possible reconstructed images form a reduced model manifold in image space. We explore this idea directly by analyzing the manifold learned by Deep Belief Networks and Stacked Denoising Autoencoders using Monte Carlo sampling. The model manifold forms an only slightly elongated h…

    Submitted 25 May, 2017; originally announced May 2017.

  32. arXiv:1705.02082  [pdf, other]

    cs.CV

    Motion Prediction Under Multimodality with Conditional Stochastic Networks

    Authors: Katerina Fragkiadaki, Jonathan Huang, Alex Alemi, Sudheendra Vijayanarasimhan, Susanna Ricco, Rahul Sukthankar

    Abstract: Given a visual history, multiple future outcomes for a video scene are equally probable, in other words, the distribution of future outcomes has multiple modes. Multimodality is notoriously hard to handle by standard regressors or classifiers: the former regress to the mean and the latter discretize a continuous high dimensional output space. In this work, we present stochastic neural network arch…

    Submitted 5 May, 2017; originally announced May 2017.

  33. arXiv:1612.02780  [pdf, other]

    cs.LG stat.ML

    Improved generator objectives for GANs

    Authors: Ben Poole, Alexander A. Alemi, Jascha Sohl-Dickstein, Anelia Angelova

    Abstract: We present a framework to understand GAN training as alternating density ratio estimation and approximate divergence minimization. This provides an interpretation for the mismatched GAN generator and discriminator objectives often used in practice, and explains the problem of poor sample diversity. We also derive a family of generator objectives that target arbitrary $f$-divergences without minimi…

    Submitted 8 December, 2016; originally announced December 2016.

    Comments: NIPS 2016 Workshop on Adversarial Training

  34. arXiv:1612.00410  [pdf, other]

    cs.LG cs.IT

    Deep Variational Information Bottleneck

    Authors: Alexander A. Alemi, Ian Fischer, Joshua V. Dillon, Kevin Murphy

    Abstract: We present a variational approximation to the information bottleneck of Tishby et al. (1999). This variational approach allows us to parameterize the information bottleneck model using a neural network and leverage the reparameterization trick for efficient training. We call this method "Deep Variational Information Bottleneck", or Deep VIB. We show that models trained with the VIB objective outpe…

    Submitted 23 October, 2019; v1 submitted 1 December, 2016; originally announced December 2016.

    Comments: 19 pages, 8 figures, Accepted to ICLR17

    Journal ref: Proceedings of the International Conference on Learning Representations (ICLR) 2017

  35. arXiv:1606.04442  [pdf, other]

    cs.AI cs.LG cs.LO

    DeepMath - Deep Sequence Models for Premise Selection

    Authors: Alex A. Alemi, Francois Chollet, Niklas Een, Geoffrey Irving, Christian Szegedy, Josef Urban

    Abstract: We study the effectiveness of neural sequence models for premise selection in automated theorem proving, one of the main bottlenecks in the formalization of mathematics. We propose a two stage approach for this task that yields good results for the premise selection task on the Mizar corpus while avoiding the hand-engineered features of existing state-of-the-art models. To our knowledge, this is t…

    Submitted 26 January, 2017; v1 submitted 14 June, 2016; originally announced June 2016.

  36. arXiv:1602.07261  [pdf, other]

    cs.CV

    Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning

    Authors: Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alex Alemi

    Abstract: Very deep convolutional networks have been central to the largest advances in image recognition performance in recent years. One example is the Inception architecture that has been shown to achieve very good performance at relatively low computational cost. Recently, the introduction of residual connections in conjunction with a more traditional architecture has yielded state-of-the-art performanc…

    Submitted 23 August, 2016; v1 submitted 23 February, 2016; originally announced February 2016.

  37. arXiv:1505.06538  [pdf, other]

    stat.ML cs.LG cs.SI

    Clustering via Content-Augmented Stochastic Blockmodels

    Authors: J. Massey Cashore, Xiaoting Zhao, Alexander A. Alemi, Yujia Liu, Peter I. Frazier

    Abstract: Much of the data being created on the web contains interactions between users and items. Stochastic blockmodels, and other methods for community detection and clustering of bipartite graphs, can infer latent user communities and latent item clusters from this interaction data. These methods, however, typically ignore the items' contents and the information they provide about item clusters, despite…

    Submitted 25 May, 2015; originally announced May 2015.

  38. arXiv:1503.05543  [pdf, other]

    cs.CL cs.IR

    Text Segmentation based on Semantic Word Embeddings

    Authors: Alexander A Alemi, Paul Ginsparg

    Abstract: We explore the use of semantic word embeddings in text segmentation algorithms, including the C99 segmentation algorithm and new algorithms inspired by the distributed word vector representation. By developing a general framework for discussing a class of segmentation objectives, we study the effectiveness of greedy versus exact optimization approaches and suggest a new iterative refinement techni…

    Submitted 18 March, 2015; originally announced March 2015.

    Comments: 10 pages, 4 figures. KDD2015 submission