Showing 1–50 of 57 results for author: Poggio, T

  1. arXiv:2406.11110  [pdf, other]

    cs.LG math.OC stat.ML

    How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD

    Authors: Pierfrancesco Beneventano, Andrea Pinto, Tomaso Poggio

    Abstract: We investigate the ability of deep neural networks to identify the support of the target function. Our findings reveal that mini-batch SGD effectively learns the support in the first layer of the network by shrinking to zero the weights associated with irrelevant components of input. In contrast, we demonstrate that while vanilla GD also approximates the target function, it requires an explicit re…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 34 pages, 19 figures

  2. arXiv:2312.04709  [pdf, other]

    cs.LG cs.NE

    How to guess a gradient

    Authors: Utkarsh Singhal, Brian Cheung, Kartik Chandra, Jonathan Ragan-Kelley, Joshua B. Tenenbaum, Tomaso A. Poggio, Stella X. Yu

    Abstract: How much can you say about the gradient of a neural network without computing a loss or knowing the label? This may sound like a strange question: surely the answer is "very little." However, in this paper, we show that gradients are more structured than previously thought. Gradients lie in a predictable low-dimensional subspace which depends on the network architecture and incoming features. Expl…

    Submitted 7 December, 2023; originally announced December 2023.

  3. arXiv:2302.06677  [pdf, other]

    q-bio.NC cs.AI cs.LG

    System identification of neural systems: If we got it right, would we know?

    Authors: Yena Han, Tomaso Poggio, Brian Cheung

    Abstract: Artificial neural networks are being proposed as models of parts of the brain. The networks are compared to recordings of biological neurons, and good performance in reproducing neural responses is considered to support the model's validity. A key question is how much this system identification approach tells us about brain computation. Does it validate one model architecture over another? We eval…

    Submitted 30 August, 2023; v1 submitted 13 February, 2023; originally announced February 2023.

  4. arXiv:2301.12033  [pdf, other]

    cs.LG

    Norm-based Generalization Bounds for Compositionally Sparse Neural Networks

    Authors: Tomer Galanti, Mengjia Xu, Liane Galanti, Tomaso Poggio

    Abstract: In this paper, we investigate the Rademacher complexity of deep sparse neural networks, where each neuron receives a small number of inputs. We prove generalization bounds for multilayered sparse ReLU neural networks, including convolutional neural networks. These bounds differ from previous ones, as they consider the norms of the convolutional filters instead of the norms of the associated Toepli…

    Submitted 27 January, 2023; originally announced January 2023.

  5. arXiv:2212.12675  [pdf, other]

    stat.ML cs.LG math.OC

    Iterative regularization in classification via hinge loss diagonal descent

    Authors: Vassilis Apidopoulos, Tomaso Poggio, Lorenzo Rosasco, Silvia Villa

    Abstract: Iterative regularization is a classic idea in regularization theory, that has recently become popular in machine learning. On the one hand, it allows to design efficient algorithms controlling at the same time numerical and statistical accuracy. On the other hand it allows to shed light on the learning curves observed while training neural networks. In this paper, we focus on iterative regularizat…

    Submitted 24 December, 2022; originally announced December 2022.

  6. arXiv:2206.05794  [pdf, other]

    cs.LG stat.ML

    Characterizing the Implicit Bias of Regularized SGD in Rank Minimization

    Authors: Tomer Galanti, Zachary S. Siegel, Aparna Gupte, Tomaso Poggio

    Abstract: We study the bias of Stochastic Gradient Descent (SGD) to learn low-rank weight matrices when training deep neural networks. Our results show that training neural networks with mini-batch SGD and weight decay causes a bias towards rank minimization over the weight matrices. Specifically, we show, both theoretically and empirically, that this bias is more pronounced when using smaller batch sizes,…

    Submitted 25 October, 2023; v1 submitted 12 June, 2022; originally announced June 2022.

  7. arXiv:2110.11536  [pdf, other]

    cs.AI cs.LG

    Neural-guided, Bidirectional Program Search for Abstraction and Reasoning

    Authors: Simon Alford, Anshula Gandhi, Akshay Rangamani, Andrzej Banburski, Tony Wang, Sylee Dandekar, John Chin, Tomaso Poggio, Peter Chin

    Abstract: One of the challenges facing artificial intelligence research today is designing systems capable of utilizing systematic reasoning to generalize to new tasks. The Abstraction and Reasoning Corpus (ARC) measures such a capability through a set of visual reasoning tasks. In this paper we report incremental progress on ARC and lay the foundations for two approaches to abstraction and reasoning not ba…

    Submitted 26 October, 2021; v1 submitted 21 October, 2021; originally announced October 2021.

    Comments: Published as a conference paper at Complex Networks 2021

  8. arXiv:2107.10199  [pdf, other]

    cs.LG cs.AI stat.ML

    Distribution of Classification Margins: Are All Data Equal?

    Authors: Andrzej Banburski, Fernanda De La Torre, Nishka Pant, Ishana Shastri, Tomaso Poggio

    Abstract: Recent theoretical results show that gradient descent on deep neural networks under exponential loss functions locally maximizes classification margin, which is equivalent to minimizing the norm of the weight matrices under margin constraints. This property of the solution however does not fully characterize the generalization performance. We motivate theoretically and show empirically that the ar…

    Submitted 21 July, 2021; originally announced July 2021.

    Comments: Previously online as CBMM Memo 115 on the CBMM MIT site

  9. arXiv:2102.10534  [pdf, other]

    cs.LG cs.CV

    The Effects of Image Distribution and Task on Adversarial Robustness

    Authors: Owen Kunhardt, Arturo Deza, Tomaso Poggio

    Abstract: In this paper, we propose an adaptation to the area under the curve (AUC) metric to measure the adversarial robustness of a model over a particular $ε$-interval $[ε_0, ε_1]$ (interval of adversarial perturbation strengths) that facilitates unbiased comparisons across models when they have different initial $ε_0$ performance. This can be used to determine how adversarially robust a model is to diff…

    Submitted 21 February, 2021; originally announced February 2021.

    Comments: Under review at ICML 2021

  10. arXiv:2101.00072  [pdf, other]

    cs.LG stat.ML

    Explicit regularization and implicit bias in deep network classifiers trained with the square loss

    Authors: Tomaso Poggio, Qianli Liao

    Abstract: Deep ReLU networks trained with the square loss have been observed to perform well in classification tasks. We provide here a theoretical justification based on analysis of the associated gradient flow. We show that convergence to a solution with the absolute minimum norm is expected when normalization techniques such as Batch Normalization (BN) or Weight Normalization (WN) are used together with…

    Submitted 31 December, 2020; originally announced January 2021.

  11. arXiv:2012.08655  [pdf, other]

    eess.IV cs.CV cs.LG q-bio.NC

    CUDA-Optimized real-time rendering of a Foveated Visual System

    Authors: Elian Malkin, Arturo Deza, Tomaso Poggio

    Abstract: The spatially-varying field of the human visual system has recently received a resurgence of interest with the development of virtual reality (VR) and neural networks. The computational demands of high resolution rendering desired for VR can be offset by savings in the periphery, while neural networks trained with foveated input have shown perceptual gains in i.i.d and o.o.d generalization. In thi…

    Submitted 15 December, 2020; originally announced December 2020.

    Comments: 16 pages, 13 figures, presented at the Shared Visual Representations in Human and Machine Intelligence Workshop (SVRHM NeurIPS 2020)

  12. arXiv:2006.16427  [pdf, other]

    cs.LG cs.CV stat.ML

    Biologically Inspired Mechanisms for Adversarial Robustness

    Authors: Manish V. Reddy, Andrzej Banburski, Nishka Pant, Tomaso Poggio

    Abstract: A convolutional neural network strongly robust to adversarial perturbations at reasonable computational and performance cost has not yet been demonstrated. The primate visual ventral stream seems to be robust to small perturbations in visual stimuli but the underlying mechanisms that give rise to this robust perception are not understood. In this work, we investigate the role of two biologically p…

    Submitted 29 June, 2020; originally announced June 2020.

    Comments: 25 pages, 15 figures

  13. arXiv:2006.15522  [pdf, other]

    stat.ML cs.LG

    For interpolating kernel machines, minimizing the norm of the ERM solution minimizes stability

    Authors: Akshay Rangamani, Lorenzo Rosasco, Tomaso Poggio

    Abstract: We study the average $\mbox{CV}_{loo}$ stability of kernel ridge-less regression and derive corresponding risk bounds. We show that the interpolating solution with minimum norm minimizes a bound on $\mbox{CV}_{loo}$ stability, which in turn is controlled by the condition number of the empirical kernel matrix. The latter can be characterized in the asymptotic regime where both the dimension and car…

    Submitted 11 October, 2020; v1 submitted 28 June, 2020; originally announced June 2020.

  14. arXiv:2006.13915  [pdf, other]

    cs.LG eess.IV q-bio.NC stat.ML

    Hierarchically Compositional Tasks and Deep Convolutional Networks

    Authors: Arturo Deza, Qianli Liao, Andrzej Banburski, Tomaso Poggio

    Abstract: The main success stories of deep learning, starting with ImageNet, depend on deep convolutional networks, which on certain tasks perform significantly better than traditional shallow classifiers, such as support vector machines, and also better than deep fully connected networks; but what is so special about deep convolutional networks? Recent results in approximation theory proved an exponential…

    Submitted 25 March, 2021; v1 submitted 24 June, 2020; originally announced June 2020.

    Comments: A pre-print. Currently Under Review

    Report number: MIT Center for Brains, Minds and Machines (CBMM) Memo #109

  15. arXiv:1912.06190  [pdf, other]

    cs.LG stat.ML

    Double descent in the condition number

    Authors: Tomaso Poggio, Gil Kur, Andrzej Banburski

    Abstract: In solving a system of $n$ linear equations in $d$ variables $Ax=b$, the condition number of the $n,d$ matrix $A$ measures how much errors in the data $b$ affect the solution $x$. Estimates of this type are important in many inverse problems. An example is machine learning where the key task is to estimate an underlying function from a set of measurements at random points in a high dimensional spa…

    Submitted 28 April, 2020; v1 submitted 12 December, 2019; originally announced December 2019.

    Comments: Removed parts relating to kernel regression to streamline the presentation, fixed some typos

  16. arXiv:1908.09375  [pdf, other]

    cs.LG stat.ML

    Theoretical Issues in Deep Networks: Approximation, Optimization and Generalization

    Authors: Tomaso Poggio, Andrzej Banburski, Qianli Liao

    Abstract: While deep learning is successful in a number of applications, it is not yet well understood theoretically. A satisfactory theoretical characterization of deep learning however, is beginning to emerge. It covers the following questions: 1) representation power of deep networks 2) optimization of the empirical risk 3) generalization properties of gradient descent techniques --- why the expected err…

    Submitted 25 August, 2019; originally announced August 2019.

    Comments: arXiv admin note: text overlap with arXiv:1611.00740

  17. arXiv:1905.12882  [pdf, other]

    cs.LG stat.ML

    Function approximation by deep networks

    Authors: H. N. Mhaskar, T. Poggio

    Abstract: We show that deep networks are better than shallow networks at approximating functions that can be expressed as a composition of functions described by a directed acyclic graph, because the deep networks can be designed to have the same compositional structure, while a shallow network cannot exploit this knowledge. Thus, the blessing of compositionality mitigates the curse of dimensionality. On th…

    Submitted 23 November, 2019; v1 submitted 30 May, 2019; originally announced May 2019.

    Comments: To appear in Communications in pure and applied mathematics

  18. arXiv:1903.04991  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Theory III: Dynamics and Generalization in Deep Networks

    Authors: Andrzej Banburski, Qianli Liao, Brando Miranda, Lorenzo Rosasco, Fernanda De La Torre, Jack Hidary, Tomaso Poggio

    Abstract: The key to generalization is controlling the complexity of the network. However, there is no obvious control of complexity -- such as an explicit regularization term -- in the training of deep networks for classification. We will show that a classical form of norm control -- but kind of hidden -- is present in deep networks trained with gradient descent techniques on exponential-type losses. In pa…

    Submitted 10 April, 2020; v1 submitted 12 March, 2019; originally announced March 2019.

    Comments: 47 pages, 11 figures. This replaces previous versions of Theory III, that appeared on Arxiv [arXiv:1806.11379, arXiv:1801.00173] or on the CBMM site. v5: Changes throughout the paper to the presentation and tightening some of the statements

  19. arXiv:1811.03567  [pdf, other]

    cs.LG cs.AI cs.CV cs.NE stat.ML

    Biologically-plausible learning algorithms can scale to large datasets

    Authors: Will Xiao, Honglin Chen, Qianli Liao, Tomaso Poggio

    Abstract: The backpropagation (BP) algorithm is often thought to be biologically implausible in the brain. One of the main reasons is that BP requires symmetric weight matrices in the feedforward and feedback pathways. To address this "weight transport problem" (Grossberg, 1987), two more biologically plausible algorithms, proposed by Liao et al. (2016) and Lillicrap et al. (2016), relax BP's weight symmetr…

    Submitted 20 December, 2018; v1 submitted 8 November, 2018; originally announced November 2018.

  20. arXiv:1807.09659  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    A Surprising Linear Relationship Predicts Test Performance in Deep Networks

    Authors: Qianli Liao, Brando Miranda, Andrzej Banburski, Jack Hidary, Tomaso Poggio

    Abstract: Given two networks with the same training loss on a dataset, when would they have drastically different test losses and errors? Better understanding of this question of generalization may improve practical applications of deep networks. In this paper we show that with cross-entropy loss it is surprisingly simple to induce significantly different generalization performances for two networks that ha…

    Submitted 25 July, 2018; originally announced July 2018.

  21. arXiv:1806.11379  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    Theory IIIb: Generalization in Deep Networks

    Authors: Tomaso Poggio, Qianli Liao, Brando Miranda, Andrzej Banburski, Xavier Boix, Jack Hidary

    Abstract: A main puzzle of deep neural networks (DNNs) revolves around the apparent absence of "overfitting", defined in this paper as follows: the expected error does not get worse when increasing the number of neurons or of iterations of gradient descent. This is surprising because of the large capacity demonstrated by DNNs to fit randomly labeled data and the absence of explicit regularization. Recent re…

    Submitted 29 June, 2018; originally announced June 2018.

    Comments: 38 pages, 7 figures

  22. arXiv:1806.04542  [pdf, other]

    stat.ML cs.LG

    Approximate inference with Wasserstein gradient flows

    Authors: Charlie Frogner, Tomaso Poggio

    Abstract: We present a novel approximate inference method for diffusion processes, based on the Wasserstein gradient flow formulation of the diffusion. In this formulation, the time-dependent density of the diffusion is derived as the limit of implicit Euler steps that follow the gradients of a particular free energy functional. Existing methods for computing Wasserstein gradient flows rely on discretizatio…

    Submitted 12 June, 2018; originally announced June 2018.

  23. arXiv:1802.06266  [pdf, other]

    cs.LG math.NA

    An analysis of training and generalization errors in shallow and deep networks

    Authors: Hrushikesh Mhaskar, Tomaso Poggio

    Abstract: This paper is motivated by an open problem around deep networks, namely, the apparent absence of over-fitting despite large over-parametrization which allows perfect fitting of the training data. In this paper, we analyze this phenomenon in the case of regression problems when each unit evaluates a periodic activation function. We argue that the minimal expected value of the square loss is inappro…

    Submitted 27 August, 2019; v1 submitted 17 February, 2018; originally announced February 2018.

    Comments: 21 pages; Accepted for publication in Neural Networks

  24. arXiv:1801.02254  [pdf, other]

    cs.LG

    Theory of Deep Learning IIb: Optimization Properties of SGD

    Authors: Chiyuan Zhang, Qianli Liao, Alexander Rakhlin, Brando Miranda, Noah Golowich, Tomaso Poggio

    Abstract: In Theory IIb we characterize with a mix of theory and experiments the optimization of deep convolutional networks by Stochastic Gradient Descent. The main new result in this paper is theoretical and experimental evidence for the following conjecture about SGD: SGD concentrates in probability -- like the classical Langevin equation -- on large volume, "flat" minima, selecting flat minimizers which…

    Submitted 7 January, 2018; originally announced January 2018.

  25. arXiv:1801.00173  [pdf, other]

    cs.LG

    Theory of Deep Learning III: explaining the non-overfitting puzzle

    Authors: Tomaso Poggio, Kenji Kawaguchi, Qianli Liao, Brando Miranda, Lorenzo Rosasco, Xavier Boix, Jack Hidary, Hrushikesh Mhaskar

    Abstract: A main puzzle of deep networks revolves around the absence of overfitting despite large overparametrization and despite the large capacity demonstrated by zero training error on randomly labeled data. In this note, we show that the dynamics associated to gradient descent minimization of nonlinear networks is topologically equivalent, near the asymptotically stable minima of the empirical error, to…

    Submitted 16 January, 2018; v1 submitted 30 December, 2017; originally announced January 2018.

  26. arXiv:1711.01530  [pdf, other]

    cs.LG cs.AI stat.ML

    Fisher-Rao Metric, Geometry, and Complexity of Neural Networks

    Authors: Tengyuan Liang, Tomaso Poggio, Alexander Rakhlin, James Stokes

    Abstract: We study the relationship between geometry and capacity measures for deep neural networks from an invariance viewpoint. We introduce a new notion of capacity --- the Fisher-Rao norm --- that possesses desirable invariance properties and is motivated by Information Geometry. We discover an analytical characterization of the new capacity measure, through which we establish norm-comparison inequaliti…

    Submitted 23 February, 2019; v1 submitted 5 November, 2017; originally announced November 2017.

    Comments: To appear in the proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS) 2019

    Journal ref: The 22nd International Conference on Artificial Intelligence and Statistics 89 (2019) 888-896

  27. arXiv:1707.05455  [pdf, ps, other]

    cs.CV

    Pruning Convolutional Neural Networks for Image Instance Retrieval

    Authors: Gaurav Manek, Jie Lin, Vijay Chandrasekhar, Lingyu Duan, Sateesh Giduthuri, Xiaoli Li, Tomaso Poggio

    Abstract: In this work, we focus on the problem of image instance retrieval with deep descriptors extracted from pruned Convolutional Neural Networks (CNN). The objective is to heavily prune convolutional edges while maintaining retrieval performance. To this end, we introduce both data-independent and data-dependent heuristics to prune convolutional edges, and evaluate their performance across various comp…

    Submitted 17 July, 2017; originally announced July 2017.

    Comments: 5 pages

  28. arXiv:1706.08616  [pdf, other]

    cs.CV

    Do Deep Neural Networks Suffer from Crowding?

    Authors: Anna Volokitin, Gemma Roig, Tomaso Poggio

    Abstract: Crowding is a visual effect suffered by humans, in which an object that can be recognized in isolation can no longer be recognized when other objects, called flankers, are placed close to it. In this work, we study the effect of crowding in artificial Deep Neural Networks for object recognition. We analyze both standard deep convolutional neural networks (DCNNs) as well as a new version of DCNNs w…

    Submitted 26 June, 2017; originally announced June 2017.

    Comments: CBMM memo

    Report number: 69

  29. arXiv:1703.09833  [pdf, other]

    cs.LG cs.CV cs.NE

    Theory II: Landscape of the Empirical Risk in Deep Learning

    Authors: Qianli Liao, Tomaso Poggio

    Abstract: Previous theoretical work on deep learning and neural network optimization tends to focus on avoiding saddle points and local minima. However, the practical observation is that, at least in the case of the most successful Deep Convolutional Neural Networks (DCNNs), practitioners can always increase the network size to fit the training data (an extreme example would be [1]). The most successful DCNN…

    Submitted 22 June, 2017; v1 submitted 28 March, 2017; originally announced March 2017.

    Comments: Merged figures to make the main text more compact. Moved some similar figures to the appendix

  30. arXiv:1701.04923  [pdf, other]

    cs.CV

    Compression of Deep Neural Networks for Image Instance Retrieval

    Authors: Vijay Chandrasekhar, Jie Lin, Qianli Liao, Olivier Morère, Antoine Veillard, Lingyu Duan, Tomaso Poggio

    Abstract: Image instance retrieval is the problem of retrieving images from a database which contain the same object. Convolutional Neural Network (CNN) based descriptors are becoming the dominant approach for generating {\it global image descriptors} for the instance retrieval problem. One major drawback of CNN-based {\it global descriptors} is that uncompressed deep neural network models require hundreds…

    Submitted 17 January, 2017; originally announced January 2017.

    Comments: 10 pages, accepted by DCC 2017

  31. arXiv:1611.00740  [pdf, other]

    cs.LG

    Why and When Can Deep -- but Not Shallow -- Networks Avoid the Curse of Dimensionality: a Review

    Authors: Tomaso Poggio, Hrushikesh Mhaskar, Lorenzo Rosasco, Brando Miranda, Qianli Liao

    Abstract: The paper characterizes classes of functions for which deep learning can be exponentially better than shallow learning. Deep convolutional networks are a special case of these conditions, though weight sharing is not the main reason for their exponential advantage.

    Submitted 4 February, 2017; v1 submitted 2 November, 2016; originally announced November 2016.

  32. arXiv:1610.06160  [pdf, other]

    cs.LG cs.NE

    Streaming Normalization: Towards Simpler and More Biologically-plausible Normalizations for Online and Recurrent Learning

    Authors: Qianli Liao, Kenji Kawaguchi, Tomaso Poggio

    Abstract: We systematically explored a spectrum of normalization algorithms related to Batch Normalization (BN) and propose a generalized formulation that simultaneously solves two major limitations of BN: (1) online learning and (2) recurrent learning. Our proposal is simpler and more biologically-plausible. Unlike previous approaches, our technique can be applied out of the box to all learning scenarios (…

    Submitted 19 October, 2016; originally announced October 2016.

  33. arXiv:1608.03287  [pdf, other]

    cs.LG math.FA

    Deep vs. shallow networks : An approximation theory perspective

    Authors: Hrushikesh Mhaskar, Tomaso Poggio

    Abstract: The paper briefly reviews several recent results on hierarchical architectures for learning from examples, that may formally explain the conditions under which Deep Convolutional Neural Networks perform much better in function approximation problems than shallow, one-hidden layer architectures. The paper announces new results for a non-smooth activation function - the ReLU function - used in presen…

    Submitted 10 August, 2016; originally announced August 2016.

    Comments: 14 pages, 4 figures, to be published in a Journal

    Report number: CBMM Memo 54

  34. arXiv:1606.01552  [pdf, other]

    cs.NE q-bio.NC

    View-tolerant face recognition and Hebbian learning imply mirror-symmetric neural tuning to head orientation

    Authors: Joel Z. Leibo, Qianli Liao, Winrich Freiwald, Fabio Anselmi, Tomaso Poggio

    Abstract: The primate brain contains a hierarchy of visual areas, dubbed the ventral stream, which rapidly computes object representations that are both specific for object identity and relatively robust against identity-preserving transformations like depth-rotations. Current computational models of object recognition, including recent deep learning networks, generate these properties through a hierarchy o…

    Submitted 5 June, 2016; originally announced June 2016.

  35. arXiv:1604.03640  [pdf, other]

    cs.LG cs.NE

    Bridging the Gaps Between Residual Learning, Recurrent Neural Networks and Visual Cortex

    Authors: Qianli Liao, Tomaso Poggio

    Abstract: We discuss relations between Residual Networks (ResNet), Recurrent Neural Networks (RNNs) and the primate visual cortex. We begin with the observation that a special type of shallow RNN is exactly equivalent to a very deep ResNet with weight sharing among the layers. A direct implementation of such a RNN, although having orders of magnitude fewer parameters, leads to a performance similar to the c…

    Submitted 31 December, 2020; v1 submitted 12 April, 2016; originally announced April 2016.

    Comments: This version was written in Sept. 2016. For April 2016 version see v1 below

  36. arXiv:1603.04595  [pdf, other]

    cs.CV cs.IR

    Nested Invariance Pooling and RBM Hashing for Image Instance Retrieval

    Authors: Olivier Morère, Jie Lin, Antoine Veillard, Vijay Chandrasekhar, Tomaso Poggio

    Abstract: The goal of this work is the computation of very compact binary hashes for image instance retrieval. Our approach has two novel contributions. The first one is Nested Invariance Pooling (NIP), a method inspired from i-theory, a mathematical theory for computing group invariant transformations with feed-forward neural networks. NIP is able to produce compact and well-performing descriptors with vis…

    Submitted 14 April, 2016; v1 submitted 15 March, 2016; originally announced March 2016.

    Comments: Image Instance Retrieval, CNN, Invariant Representation, Hashing, Unsupervised Learning, Regularization. arXiv admin note: text overlap with arXiv:1601.02093

  37. arXiv:1603.00988  [pdf, other]

    cs.LG

    Learning Functions: When Is Deep Better Than Shallow

    Authors: Hrushikesh Mhaskar, Qianli Liao, Tomaso Poggio

    Abstract: While the universal approximation property holds both for hierarchical and shallow networks, we prove that deep (hierarchical) networks can approximate the class of compositional functions with the same accuracy as shallow networks but with exponentially lower number of training parameters as well as VC-dimension. This theorem settles an old conjecture by Bengio on the role of depth in networks. W…

    Submitted 29 May, 2016; v1 submitted 3 March, 2016; originally announced March 2016.

  38. arXiv:1601.02093  [pdf, other]

    cs.CV cs.IR

    Group Invariant Deep Representations for Image Instance Retrieval

    Authors: Olivier Morère, Antoine Veillard, Jie Lin, Julie Petta, Vijay Chandrasekhar, Tomaso Poggio

    Abstract: Most image instance retrieval pipelines are based on comparison of vectors known as global image descriptors between a query image and the database images. Due to their success in large scale image classification, representations extracted from Convolutional Neural Networks (CNN) are quickly gaining ground on Fisher Vectors (FVs) as state-of-the-art global descriptors for image instance retrieval.…

    Submitted 13 January, 2016; v1 submitted 9 January, 2016; originally announced January 2016.

  39. arXiv:1511.06292  [pdf, other]

    cs.LG cs.CV

    Foveation-based Mechanisms Alleviate Adversarial Examples

    Authors: Yan Luo, Xavier Boix, Gemma Roig, Tomaso Poggio, Qi Zhao

    Abstract: We show that adversarial examples, i.e., the visually imperceptible perturbations that result in Convolutional Neural Networks (CNNs) fail, can be alleviated with a mechanism based on foveations---applying the CNN in different image regions. To see this, first, we report results in ImageNet that lead to a revision of the hypothesis that adversarial perturbations are a consequence of CNNs acting as…

    Submitted 19 January, 2016; v1 submitted 19 November, 2015; originally announced November 2015.

  40. arXiv:1510.05067  [pdf, other]

    cs.LG

    How Important is Weight Symmetry in Backpropagation?

    Authors: Qianli Liao, Joel Z. Leibo, Tomaso Poggio

    Abstract: Gradient backpropagation (BP) requires symmetric feedforward and feedback connections -- the same weights must be used for forward and backward passes. This "weight transport problem" (Grossberg 1987) is thought to be one of the main reasons to doubt BP's biological plausibility. Using 15 different classification datasets, we systematically investigate to what extent BP really depends on weight…

    Submitted 4 February, 2016; v1 submitted 16 October, 2015; originally announced October 2015.

  41. arXiv:1510.04935  [pdf, other]

    cs.AI cs.LG stat.ML

    Holographic Embeddings of Knowledge Graphs

    Authors: Maximilian Nickel, Lorenzo Rosasco, Tomaso Poggio

    Abstract: Learning embeddings of entities and relations is an efficient and versatile method to perform machine learning on relational data such as knowledge graphs. In this work, we propose holographic embeddings (HolE) to learn compositional vector space representations of entire knowledge graphs. The proposed method is related to holographic models of associative memory in that it employs circular correl…

    Submitted 7 December, 2015; v1 submitted 16 October, 2015; originally announced October 2015.

    Comments: To appear in AAAI-16

    ACM Class: I.2.6; I.2.4

  42. arXiv:1508.01084  [pdf, other]

    cs.LG cs.NE

    Deep Convolutional Networks are Hierarchical Kernel Machines

    Authors: Fabio Anselmi, Lorenzo Rosasco, Cheston Tan, Tomaso Poggio

    Abstract: In i-theory a typical layer of a hierarchical architecture consists of HW modules pooling the dot products of the inputs to the layer with the transformations of a few templates under a group. Such layers include as special cases the convolutional layers of Deep Convolutional Networks (DCNs) as well as the non-convolutional layers (when the group contains only the identity). Rectifying nonlinearit…

    Submitted 5 August, 2015; originally announced August 2015.

  43. arXiv:1506.05439  [pdf, other]

    cs.LG cs.CV stat.ML

    Learning with a Wasserstein Loss

    Authors: Charlie Frogner, Chiyuan Zhang, Hossein Mobahi, Mauricio Araya-Polo, Tomaso Poggio

    Abstract: Learning to predict multi-label outputs is challenging, but in many problems there is a natural metric on the outputs that can be used to improve predictions. In this paper we develop a loss function for multi-label learning, based on the Wasserstein distance. The Wasserstein distance provides a natural notion of dissimilarity for probability measures. Although optimizing with respect to the exact…

    Submitted 29 December, 2015; v1 submitted 17 June, 2015; originally announced June 2015.

    Comments: NIPS 2015; v3 updates Algorithm 1 and Equations 6, 8

  44. arXiv:1506.02544  [pdf, ps, other]

    cs.LG cs.CV stat.ML

    Learning with Group Invariant Features: A Kernel Perspective

    Authors: Youssef Mroueh, Stephen Voinea, Tomaso Poggio

    Abstract: In this paper we analyze a random feature map based on a recently introduced theory of invariance (I-theory). More specifically, a group invariant signal signature is obtained through cumulative distributions of group transformed random projections. Our analysis bridges invariant feature learning with kernel methods, as we show that this feature map defines an expected Haar integration kernel that i…

    Submitted 4 December, 2015; v1 submitted 8 June, 2015; originally announced June 2015.

    Comments: NIPS 2015

  45. arXiv:1504.03101  [pdf, ps, other]

    cs.LG

    Convex Learning of Multiple Tasks and their Structure

    Authors: Carlo Ciliberto, Youssef Mroueh, Tomaso Poggio, Lorenzo Rosasco

    Abstract: Reducing the amount of human supervision is a key problem in machine learning, and a natural approach is that of exploiting the relations (structure) among different tasks. This is the idea at the core of multi-task learning. In this context a fundamental question is how to incorporate the task structure in the learning problem. We tackle this question by studying a general computational framework…

    Submitted 17 April, 2015; v1 submitted 13 April, 2015; originally announced April 2015.

    Comments: 26 pages, 1 figure, 2 tables

  46. arXiv:1503.05938  [pdf, other]

    cs.LG

    On Invariance and Selectivity in Representation Learning

    Authors: Fabio Anselmi, Lorenzo Rosasco, Tomaso Poggio

    Abstract: We discuss data representations that can be learned automatically from data, are invariant to transformations, and are at the same time selective, in the sense that two points have the same representation only if one is a transformation of the other. The mathematical results here sharpen some of the key claims of i-theory -- a recent theory of feedforward processing in sensory cortex.

    Submitted 19 March, 2015; originally announced March 2015.

  47. arXiv:1409.3879  [pdf, other]

    cs.CV cs.LG

    Unsupervised learning of clutter-resistant visual representations from natural videos

    Authors: Qianli Liao, Joel Z. Leibo, Tomaso Poggio

    Abstract: Populations of neurons in inferotemporal cortex (IT) maintain an explicit code for object identity that also tolerates transformations of object appearance, e.g., position, scale, viewing angle [1, 2, 3]. Though the learning rules are not known, recent results [4, 5, 6] suggest the operation of an unsupervised temporal-association-based method, e.g., Foldiak's trace rule [7]. Such methods exploit th…

    Submitted 23 April, 2015; v1 submitted 12 September, 2014; originally announced September 2014.

  48. arXiv:1406.3884  [pdf, other]

    cs.SD cs.LG

    Learning An Invariant Speech Representation

    Authors: Georgios Evangelopoulos, Stephen Voinea, Chiyuan Zhang, Lorenzo Rosasco, Tomaso Poggio

    Abstract: Recognition of speech, and in particular the ability to generalize and learn from small sets of labelled examples like humans do, depends on an appropriate representation of the acoustic input. We formulate the problem of finding robust speech features for supervised learning with small sample complexity as a problem of learning representations of the signal that are maximally invariant to intracl…

    Submitted 15 June, 2014; originally announced June 2014.

    Comments: CBMM Memo No. 022, 5 pages, 2 figures

    Report number: CBMM Memo No. 022

  49. arXiv:1406.3793  [pdf]

    cs.AI cs.CV cs.NE q-bio.NC

    Neural tuning size is a key factor underlying holistic face processing

    Authors: Cheston Tan, Tomaso Poggio

    Abstract: Faces are a class of visual stimuli with unique significance, for a variety of reasons. They are ubiquitous throughout the course of a person's life, and face recognition is crucial for daily social interaction. Faces are also unlike any other stimulus class in terms of certain physical stimulus characteristics. Furthermore, faces have been empirically found to elicit certain characteristic behavi…

    Submitted 14 June, 2014; originally announced June 2014.

  50. arXiv:1406.1770  [pdf, other]

    cs.LG q-bio.NC

    Computational role of eccentricity dependent cortical magnification

    Authors: Tomaso Poggio, Jim Mutch, Leyla Isik

    Abstract: We develop a sampling extension of M-theory focused on invariance to scale and translation. Quite surprisingly, the theory predicts an architecture of early vision with increasing receptive field sizes and a high resolution fovea -- in agreement with data about the cortical magnification factor, V1 and the retina. From the slope of the inverse of the magnification factor, M-theory predicts a corti…

    Submitted 6 June, 2014; originally announced June 2014.

    Report number: CBMM memo 17