-
Training Convolutional Networks with Noisy Labels
Authors:
Sainbayar Sukhbaatar,
Joan Bruna,
Manohar Paluri,
Lubomir Bourdev,
Rob Fergus
Abstract:
The availability of large labeled datasets has allowed Convolutional Network models to achieve impressive recognition results. However, in many settings manual annotation of the data is impractical; instead our data has noisy labels, i.e. there is some freely available label for each image which may or may not be accurate. In this paper, we explore the performance of discriminatively-trained Convnets when trained on such noisy data. We introduce an extra noise layer into the network which adapts the network outputs to match the noisy label distribution. The parameters of this noise layer can be estimated as part of the training process and involve simple modifications to current training infrastructures for deep networks. We demonstrate the approaches on several datasets, including large scale experiments on the ImageNet classification benchmark.
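The noise-layer idea in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the layer is a fixed linear map given by a row-stochastic matrix, and the names `apply_noise_layer` and `noise` are hypothetical. In the paper's setting the matrix parameters are estimated during training, which this sketch does not do.

```python
def apply_noise_layer(probs, noise):
    """Map a distribution over true classes to one over noisy labels.

    Assumption: noise[i][j] is the probability that an image whose true
    class is i carries the noisy label j, so every row of noise sums to 1.
    """
    n_true = len(probs)
    n_noisy = len(noise[0])
    return [
        sum(probs[i] * noise[i][j] for i in range(n_true))
        for j in range(n_noisy)
    ]


# Hypothetical example: class 0 is mislabeled 20% of the time, class 1 30%.
clean_prediction = [0.9, 0.1]
label_noise = [[0.8, 0.2],
               [0.3, 0.7]]
noisy_prediction = apply_noise_layer(clean_prediction, label_noise)
print(noisy_prediction)  # roughly [0.75, 0.25]
```

With an identity matrix the layer leaves the prediction unchanged, which corresponds to the noise-free case.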
Submitted 10 April, 2015; v1 submitted 9 June, 2014;
originally announced June 2014.
-
Learning to Discover Efficient Mathematical Identities
Authors:
Wojciech Zaremba,
Karol Kurach,
Rob Fergus
Abstract:
In this paper we explore how machine learning techniques can be applied to the discovery of efficient mathematical identities. We introduce an attribute grammar framework for representing symbolic expressions. Given a set of grammar rules we build trees that combine different rules, looking for branches which yield compositions that are analytically equivalent to a target expression, but of lower computational complexity. However, as the size of the trees grows exponentially with the complexity of the target expression, brute force search is impractical for all but the simplest of expressions. Consequently, we introduce two novel learning approaches that are able to learn from simpler expressions to guide the tree search. The first of these is a simple n-gram model, the other being a recursive neural-network. We show how these approaches enable us to derive complex identities, beyond reach of brute-force search, or human derivation.
Submitted 5 November, 2014; v1 submitted 6 June, 2014;
originally announced June 2014.
-
Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation
Authors:
Remi Denton,
Wojciech Zaremba,
Joan Bruna,
Yann LeCun,
Rob Fergus
Abstract:
We present techniques for speeding up the test-time evaluation of large convolutional networks, designed for object recognition tasks. These models deliver impressive accuracy but each image evaluation requires millions of floating point operations, making their deployment on smartphones and Internet-scale clusters problematic. The computation is dominated by the convolution operations in the lower layers of the model. We exploit the linear structure present within the convolutional filters to derive approximations that significantly reduce the required computation. Using large state-of-the-art models, we demonstrate speedups of convolutional layers on both CPU and GPU by a factor of 2x, while keeping the accuracy within 1% of the original model.
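As a rough illustration of why linear structure in filters saves computation, the sketch below uses the simplest possible case: a rank-1 (separable) 2D filter, which can be applied as two 1D passes. This is not the approximation scheme of the paper, only an assumed toy example; the function names are hypothetical.

```python
def correlate2d(image, kernel):
    """Direct 'valid' 2D correlation: kh * kw multiplies per output pixel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[a][b] * image[i + a][j + b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def separable_correlate2d(image, col, row):
    """Same result for kernel = outer(col, row), using kh + kw multiplies."""
    kh, kw = len(col), len(row)
    out_w = len(image[0]) - kw + 1
    out_h = len(image) - kh + 1
    # Horizontal pass with the row filter.
    tmp = [
        [sum(row[b] * line[j + b] for b in range(kw)) for j in range(out_w)]
        for line in image
    ]
    # Vertical pass with the column filter.
    return [
        [sum(col[a] * tmp[i + a][j] for a in range(kh)) for j in range(out_w)]
        for i in range(out_h)
    ]


col, row = [1, 2, 1], [1, 0, -1]
kernel = [[c * r for r in row] for c in col]  # rank-1 filter
image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
assert correlate2d(image, kernel) == separable_correlate2d(image, col, row)
```

For a 3x3 filter this replaces 9 multiplies per output pixel with 6; real filters are rarely exactly rank-1, so in practice a low-rank approximation trades a small accuracy loss for the saving.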
Submitted 9 June, 2014; v1 submitted 2 April, 2014;
originally announced April 2014.
-
OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks
Authors:
Pierre Sermanet,
David Eigen,
Xiang Zhang,
Michael Mathieu,
Rob Fergus,
Yann LeCun
Abstract:
We present an integrated framework for using Convolutional Networks for classification, localization and detection. We show how a multiscale and sliding window approach can be efficiently implemented within a ConvNet. We also introduce a novel deep learning approach to localization by learning to predict object boundaries. Bounding boxes are then accumulated rather than suppressed in order to increase detection confidence. We show that different tasks can be learned simultaneously using a single shared network. This integrated framework is the winner of the localization task of the ImageNet Large Scale Visual Recognition Challenge 2013 (ILSVRC2013) and obtained very competitive results for the detection and classification tasks. In post-competition work, we establish a new state of the art for the detection task. Finally, we release a feature extractor from our best model called OverFeat.
Submitted 23 February, 2014; v1 submitted 21 December, 2013;
originally announced December 2013.
-
Intriguing properties of neural networks
Authors:
Christian Szegedy,
Wojciech Zaremba,
Ilya Sutskever,
Joan Bruna,
Dumitru Erhan,
Ian Goodfellow,
Rob Fergus
Abstract:
Deep neural networks are highly expressive models that have recently achieved state of the art performance on speech and visual recognition tasks. While their expressiveness is the reason they succeed, it also causes them to learn uninterpretable solutions that could have counter-intuitive properties. In this paper we report two such properties.
First, we find that there is no distinction between individual high level units and random linear combinations of high level units, according to various methods of unit analysis. It suggests that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks.
Second, we find that deep neural networks learn input-output mappings that are fairly discontinuous to a significant extent. We can cause the network to misclassify an image by applying a certain imperceptible perturbation, which is found by maximizing the network's prediction error. In addition, the specific nature of these perturbations is not a random artifact of learning: the same perturbation can cause a different network, that was trained on a different subset of the dataset, to misclassify the same input.
Submitted 19 February, 2014; v1 submitted 20 December, 2013;
originally announced December 2013.
-
Understanding Deep Architectures using a Recursive Convolutional Network
Authors:
David Eigen,
Jason Rolfe,
Rob Fergus,
Yann LeCun
Abstract:
A key challenge in designing convolutional network models is sizing them appropriately. Many factors are involved in these decisions, including number of layers, feature maps, kernel sizes, etc. Complicating this further is the fact that each of these influences not only the numbers and dimensions of the activation units, but also the total number of parameters. In this paper we focus on assessing the independent contributions of three of these linked variables: the numbers of layers, feature maps, and parameters. To accomplish this, we employ a recursive convolutional network whose weights are tied between layers; this allows us to vary each of the three factors in a controlled setting. We find that while increasing the numbers of layers and parameters each have clear benefit, the number of feature maps (and hence dimensionality of the representation) appears ancillary, and finds most of its benefit through the introduction of more weights. Our results (i) empirically confirm the notion that adding layers alone increases computational power, within the context of convolutional layers, and (ii) suggest that precise sizing of convolutional feature map dimensions is itself of little concern; more attention should be paid to the number of parameters in these layers instead.
Submitted 19 February, 2014; v1 submitted 6 December, 2013;
originally announced December 2013.
-
Blind Deconvolution with Non-local Sparsity Reweighting
Authors:
Dilip Krishnan,
Joan Bruna,
Rob Fergus
Abstract:
Blind deconvolution has made significant progress in the past decade. Most successful algorithms are classified either as Variational or Maximum a-Posteriori ($MAP$). In spite of the superior theoretical justification of variational techniques, carefully constructed $MAP$ algorithms have proven equally effective in practice. In this paper, we show that all successful $MAP$ and variational algorithms share a common framework, relying on the following key principles: sparsity promotion in the gradient domain, $l_2$ regularization for kernel estimation, and the use of convex (often quadratic) cost functions. Our observations lead to a unified understanding of the principles required for successful blind deconvolution. We incorporate these principles into a novel algorithm that improves significantly upon the state of the art.
Submitted 16 June, 2014; v1 submitted 16 November, 2013;
originally announced November 2013.
-
Visualizing and Understanding Convolutional Networks
Authors:
Matthew D Zeiler,
Rob Fergus
Abstract:
Large Convolutional Network models have recently demonstrated impressive classification performance on the ImageNet benchmark. However there is no clear understanding of why they perform so well, or how they might be improved. In this paper we address both issues. We introduce a novel visualization technique that gives insight into the function of intermediate feature layers and the operation of the classifier. We also perform an ablation study to discover the performance contribution from different model layers. This enables us to find model architectures that outperform Krizhevsky et al. on the ImageNet classification benchmark. We show our ImageNet model generalizes well to other datasets: when the softmax classifier is retrained, it convincingly beats the current state-of-the-art results on Caltech-101 and Caltech-256 datasets.
Submitted 28 November, 2013; v1 submitted 12 November, 2013;
originally announced November 2013.
-
Maximizing Kepler science return per telemetered pixel: Searching the habitable zones of the brightest stars
Authors:
Benjamin T. Montet,
Ruth Angus,
Tom Barclay,
Rebekah Dawson,
Rob Fergus,
Dan Foreman-Mackey,
Stefan Harmeling,
Michael Hirsch,
David W. Hogg,
Dustin Lang,
David Schiminovich,
Bernhard Schölkopf
Abstract:
In today's mailing, Hogg et al. propose image modeling techniques to maintain 10-ppm-level precision photometry in Kepler data with only two working reaction wheels. While these results are relevant to many scientific goals for the repurposed mission, all modeling efforts so far have used a toy model of the Kepler telescope. Because the two-wheel performance of Kepler remains to be determined, we advocate for the consideration of an alternate strategy for a >1 year program that maximizes the science return from the "low-torque" fields across the ecliptic plane. Assuming we can reach the precision of the original Kepler mission, we expect to detect 800 new planet candidates in the first year of such a mission. Our proposed strategy has benefits for transit timing variation and transit duration variation studies, especially when considered in concert with the future TESS mission. We also expect to help address the first key science goal of Kepler: the frequency of planets in the habitable zone as a function of spectral type.
Submitted 3 September, 2013;
originally announced September 2013.
-
Maximizing Kepler science return per telemetered pixel: Detailed models of the focal plane in the two-wheel era
Authors:
David W. Hogg,
Ruth Angus,
Tom Barclay,
Rebekah Dawson,
Rob Fergus,
Dan Foreman-Mackey,
Stefan Harmeling,
Michael Hirsch,
Dustin Lang,
Benjamin T. Montet,
David Schiminovich,
Bernhard Schölkopf
Abstract:
Kepler's immense photometric precision to date was maintained through satellite stability and precise pointing. In this white paper, we argue that image modeling--fitting the Kepler-downlinked raw pixel data--can vastly improve the precision of Kepler in pointing-degraded two-wheel mode. We argue that a non-trivial modeling effort may permit continuance of photometry at 10-ppm-level precision. We demonstrate some baby steps towards precise models in both data-driven (flexible) and physics-driven (interpretably parameterized) modes. We demonstrate that the expected drift or jitter in positions in the two-wheel era will help with constraining calibration parameters. In particular, we show that we can infer the device flat-field at higher than pixel resolution; that is, we can infer pixel-to-pixel variations in intra-pixel sensitivity. These results are relevant to almost any scientific goal for the repurposed mission; image modeling ought to be a part of any two-wheel repurpose for the satellite. We make other recommendations for Kepler operations, but fundamentally advocate that the project stick with its core mission of finding and characterizing Earth analogs. [abridged]
Submitted 3 September, 2013;
originally announced September 2013.
-
Reconnaissance of the HR 8799 Exosolar System I: Near IR Spectroscopy
Authors:
B. R. Oppenheimer,
C. Baranec,
C. Beichman,
D. Brenner,
R. Burruss,
E. Cady,
J. R. Crepp,
R. Dekany,
R. Fergus,
D. Hale,
L. Hillenbrand,
S. Hinkley,
David W. Hogg,
D. King,
E. R. Ligon,
T. Lockhart,
R. Nilsson,
I. R. Parry,
L. Pueyo,
E. Rice,
J. E. Roberts,
L. C. Roberts, Jr.,
M. Shao,
A. Sivaramakrishnan,
R. Soummer
, et al. (7 additional authors not shown)
Abstract:
We obtained spectra, in the wavelength range λ= 995 - 1769 nm, of all four known planets orbiting the star HR 8799. Using the suite of instrumentation known as Project 1640 on the Palomar 5-m Hale Telescope, we acquired data at two epochs. This allowed for multiple imaging detections of the companions and multiple extractions of low-resolution (R ~ 35) spectra. Data reduction employed two different methods of speckle suppression and spectrum extraction, both yielding results that agree. The spectra do not directly correspond to those of any known objects, although similarities with L and T-dwarfs are present, as well as some characteristics similar to planets such as Saturn. We tentatively identify the presence of CH_4 along with NH_3 and/or C_2H_2, and possibly CO_2 or HCN in varying amounts in each component of the system. Other studies suggested red colors for these faint companions, and our data confirm those observations. Cloudy models, based on previous photometric observations, may provide the best explanation for the new data presented here. Notable in our data is that these presumably co-eval objects of similar luminosity have significantly different spectra; the diversity of planets may be greater than previously thought. The techniques and methods employed in this paper represent a new capability to observe and rapidly characterize exoplanetary systems in a routine manner over a broad range of planet masses and separations. These are the first simultaneous spectroscopic observations of multiple planets in a planetary system other than our own.
Submitted 11 March, 2013;
originally announced March 2013.
-
Stochastic Pooling for Regularization of Deep Convolutional Neural Networks
Authors:
Matthew D. Zeiler,
Rob Fergus
Abstract:
We introduce a simple and effective method for regularizing large convolutional neural networks. We replace the conventional deterministic pooling operations with a stochastic procedure, randomly picking the activation within each pooling region according to a multinomial distribution, given by the activities within the pooling region. The approach is hyper-parameter free and can be combined with other regularization approaches, such as dropout and data augmentation. We achieve state-of-the-art performance on four image datasets, relative to other approaches that do not utilize data augmentation.
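The sampling step described in this abstract can be sketched for a single pooling region as below. It is a minimal illustration under stated assumptions: activations are non-negative (for example after a ReLU), the multinomial probabilities are the activations normalized to sum to one, and an all-zero region pools to zero. The function name `stochastic_pool` is hypothetical.

```python
import random


def stochastic_pool(region, rng):
    """Sample one activation from a pooling region.

    Assumption: each activation is chosen with probability proportional
    to its (non-negative) value, so large activations are picked most often.
    """
    total = sum(region)
    if total <= 0:
        # Assumed convention: an all-zero region pools to zero.
        return 0.0
    # random.Random.choices normalizes the weights internally.
    return rng.choices(region, weights=region, k=1)[0]


rng = random.Random(0)  # seeded only to make the sketch repeatable
region = [0.0, 1.0, 3.0, 0.0]  # one flattened 2x2 pooling region
samples = [stochastic_pool(region, rng) for _ in range(1000)]
# Zero activations are never selected; 3.0 is drawn about three times as often as 1.0.
print(samples.count(3.0), samples.count(1.0))
```

Unlike max pooling, which would always return 3.0 here, the sampled output varies between training passes, which is the source of the regularizing effect the abstract describes.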
Submitted 15 January, 2013;
originally announced January 2013.
-
Differentiable Pooling for Hierarchical Feature Learning
Authors:
Matthew D. Zeiler,
Rob Fergus
Abstract:
We introduce a parametric form of pooling, based on a Gaussian, which can be optimized alongside the features in a single global objective function. By contrast, existing pooling schemes are based on heuristics (e.g. local maximum) and have no clear link to the cost function of the model. Furthermore, the variables of the Gaussian explicitly store location information, distinct from the appearance captured by the features, thus providing a what/where decomposition of the input signal. Although the differentiable pooling scheme can be incorporated in a wide range of hierarchical models, we demonstrate it in the context of a Deconvolutional Network model (Zeiler et al. ICCV 2011). We also explore a number of secondary issues within this model and present detailed experiments on MNIST digits.
Submitted 30 June, 2012;
originally announced July 2012.