Showing 1–50 of 58 results for author: Fergus, R

  1. arXiv:2406.16860

    cs.CV

    Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs

    Authors: Shengbang Tong, Ellis Brown, Penghao Wu, Sanghyun Woo, Manoj Middepogu, Sai Charitha Akula, Jihan Yang, Shusheng Yang, Adithya Iyer, Xichen Pan, Austin Wang, Rob Fergus, Yann LeCun, Saining Xie

    Abstract: We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-centric approach. While stronger language models can enhance multimodal capabilities, the design choices for vision components are often insufficiently explored and disconnected from visual representation learning research. This gap hinders accurate sensory grounding in real-world scenarios. Our study uses LLMs and…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Website at https://cambrian-mllm.github.io

  2. arXiv:2405.03651

    cs.IR cs.LG

    Adaptive Retrieval and Scalable Indexing for k-NN Search with Cross-Encoders

    Authors: Nishant Yadav, Nicholas Monath, Manzil Zaheer, Rob Fergus, Andrew McCallum

    Abstract: Cross-encoder (CE) models which compute similarity by jointly encoding a query-item pair perform better than embedding-based models (dual-encoders) at estimating query-item relevance. Existing approaches perform k-NN search with CE by approximating the CE similarity with a vector embedding space fit either with dual-encoders (DE) or CUR matrix factorization. DE-based retrieve-and-rerank approaches…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: ICLR 2024

  3. arXiv:2312.07540

    cs.AI cs.CL cs.LG

    diff History for Neural Language Agents

    Authors: Ulyana Piterbarg, Lerrel Pinto, Rob Fergus

    Abstract: Neural Language Models (LMs) offer an exciting solution for general-purpose embodied control. However, a key technical issue arises when using an LM-based controller: environment observations must be converted to text, which coupled with history, results in long and verbose textual prompts. As a result, prior work in LM agents is limited to restricted domains with small observation size as well as…

    Submitted 11 June, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

    Comments: ICML 2024 version

  4. arXiv:2309.11564

    cs.LG cs.CL

    Hierarchical reinforcement learning with natural language subgoals

    Authors: Arun Ahuja, Kavya Kopparapu, Rob Fergus, Ishita Dasgupta

    Abstract: Hierarchical reinforcement learning has been a compelling approach for achieving goal directed behavior over long sequences of actions. However, it has been challenging to implement in realistic or open-ended environments. A main challenge has been to find the right space of sub-goals over which to instantiate a hierarchy. We present a novel approach where we use data from humans solving these tas…

    Submitted 20 September, 2023; originally announced September 2023.

  5. arXiv:2305.19240

    cs.LG cs.AI

    NetHack is Hard to Hack

    Authors: Ulyana Piterbarg, Lerrel Pinto, Rob Fergus

    Abstract: Neural policy learning methods have achieved remarkable results in various control problems, ranging from Atari games to simulated locomotion. However, these methods struggle in long-horizon tasks, especially in open-ended environments with multi-modal observations, such as the popular dungeon-crawler game, NetHack. Intriguingly, the NeurIPS 2021 NetHack Challenge revealed that symbolic agents out…

    Submitted 30 October, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: NeurIPS 2023

  6. arXiv:2304.00046

    cs.LG cs.AI

    Accelerating exploration and representation learning with offline pre-training

    Authors: Bogdan Mazoure, Jake Bruce, Doina Precup, Rob Fergus, Ankit Anand

    Abstract: Sequential decision-making agents struggle with long horizon tasks, since solving them requires multi-step reasoning. Most reinforcement learning (RL) algorithms address this challenge by improved credit assignment, introducing memory capability, altering the agent's intrinsic motivation (i.e. exploration) or its worldview (i.e. knowledge representation). Many of these components could be learned…

    Submitted 31 March, 2023; originally announced April 2023.

  7. arXiv:2302.11552

    cs.LG cs.AI cs.CV stat.ML

    Reduce, Reuse, Recycle: Compositional Generation with Energy-Based Diffusion Models and MCMC

    Authors: Yilun Du, Conor Durkan, Robin Strudel, Joshua B. Tenenbaum, Sander Dieleman, Rob Fergus, Jascha Sohl-Dickstein, Arnaud Doucet, Will Grathwohl

    Abstract: Since their introduction, diffusion models have quickly become the prevailing approach to generative modeling in many domains. They can be interpreted as learning the gradients of a time-varying sequence of log-probability density functions. This interpretation has motivated classifier-based and classifier-free guidance as methods for post-hoc control of diffusion models. In this work, we build up…

    Submitted 18 November, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

    Comments: ICML 2023, Project Webpage: https://energy-based-model.github.io/reduce-reuse-recycle/

  8. arXiv:2302.00763

    cs.LG cs.AI cs.CL

    Collaborating with language models for embodied reasoning

    Authors: Ishita Dasgupta, Christine Kaeser-Chen, Kenneth Marino, Arun Ahuja, Sheila Babayan, Felix Hill, Rob Fergus

    Abstract: Reasoning in a complex and ambiguous environment is a key goal for Reinforcement Learning (RL) agents. While some sophisticated RL agents can successfully solve difficult tasks, they require a large amount of training data and often struggle to generalize to new unseen environments and new tasks. On the other hand, Large Scale Language Models (LSLMs) have exhibited strong reasoning ability and the…

    Submitted 1 February, 2023; originally announced February 2023.

    Comments: Presented at NeurIPS 2022 Language and Reinforcement Learning Workshop (best paper) and NeurIPS 2022 Foundation Models for Decision Making Workshop. 4 pages main; 14 pages total (including references and appendix); 3 figures

  9. arXiv:2301.12507

    cs.AI

    Distilling Internet-Scale Vision-Language Models into Embodied Agents

    Authors: Theodore Sumers, Kenneth Marino, Arun Ahuja, Rob Fergus, Ishita Dasgupta

    Abstract: Instruction-following agents must ground language into their observation and action spaces. Learning to ground language is challenging, typically requiring domain-specific engineering or large quantities of human interaction data. To address this challenge, we propose using pretrained vision-language models (VLMs) to supervise embodied agents. We combine ideas from model distillation and hindsight…

    Submitted 14 June, 2023; v1 submitted 29 January, 2023; originally announced January 2023.

    Comments: 9 pages, 7 figures. Presented at ICML 2023

  10. arXiv:2301.12005

    cs.LG

    EmbedDistill: A Geometric Knowledge Distillation for Information Retrieval

    Authors: Seungyeon Kim, Ankit Singh Rawat, Manzil Zaheer, Sadeep Jayasumana, Veeranjaneyulu Sadhanala, Wittawat Jitkrittum, Aditya Krishna Menon, Rob Fergus, Sanjiv Kumar

    Abstract: Large neural models (such as Transformers) achieve state-of-the-art performance for information retrieval (IR). In this paper, we aim to improve distillation methods that pave the way for the resource-efficient deployment of such models in practice. Inspired by our theoretical analysis of the teacher-student generalization gap for IR models, we propose a novel distillation approach that leverages…

    Submitted 3 July, 2023; v1 submitted 27 January, 2023; originally announced January 2023.

  11. arXiv:2211.00177

    cs.LG cs.IR cs.SI

    Learning to Navigate Wikipedia by Taking Random Walks

    Authors: Manzil Zaheer, Kenneth Marino, Will Grathwohl, John Schultz, Wendy Shang, Sheila Babayan, Arun Ahuja, Ishita Dasgupta, Christine Kaeser-Chen, Rob Fergus

    Abstract: A fundamental ability of an intelligent web-based agent is seeking out and acquiring new information. Internet search engines reliably find the correct vicinity but the top results may be a few links away from the desired target. A complementary approach is navigation via hyperlinks, employing a policy that comprehends local content and selects a link that moves it closer to the target. In this pa…

    Submitted 31 October, 2022; originally announced November 2022.

    Journal ref: NeurIPS 2022

  12. arXiv:2208.06825

    cs.LG

    Teacher Guided Training: An Efficient Framework for Knowledge Transfer

    Authors: Manzil Zaheer, Ankit Singh Rawat, Seungyeon Kim, Chong You, Himanshu Jain, Andreas Veit, Rob Fergus, Sanjiv Kumar

    Abstract: The remarkable performance gains realized by large pretrained models, e.g., GPT-3, hinge on the massive amounts of data they are exposed to during training. Analogously, distilling such large models to compact models for efficient deployment also necessitates a large amount of (labeled or unlabeled) training data. In this paper, we propose the teacher-guided training (TGT) framework for training a…

    Submitted 14 August, 2022; originally announced August 2022.

  13. arXiv:2107.09645

    cs.AI cs.LG

    Mastering Visual Continuous Control: Improved Data-Augmented Reinforcement Learning

    Authors: Denis Yarats, Rob Fergus, Alessandro Lazaric, Lerrel Pinto

    Abstract: We present DrQ-v2, a model-free reinforcement learning (RL) algorithm for visual continuous control. DrQ-v2 builds on DrQ, an off-policy actor-critic approach that uses data augmentation to learn directly from pixels. We introduce several improvements that yield state-of-the-art results on the DeepMind Control Suite. Notably, DrQ-v2 is able to solve complex humanoid locomotion tasks directly from…

    Submitted 20 July, 2021; originally announced July 2021.

  14. arXiv:2107.03851

    cs.LG cs.AI

    Imitation by Predicting Observations

    Authors: Andrew Jaegle, Yury Sulsky, Arun Ahuja, Jake Bruce, Rob Fergus, Greg Wayne

    Abstract: Imitation learning enables agents to reuse and adapt the hard-won expertise of others, offering a solution to several key challenges in learning behavior. Although it is easy to observe behavior in the real-world, the underlying actions may not be accessible. We present a new method for imitation solely from observations that achieves comparable performance to experts on challenging continuous con…

    Submitted 8 July, 2021; originally announced July 2021.

    Comments: ICML 2021

  15. arXiv:2103.08050

    cs.LG

    Offline Reinforcement Learning with Fisher Divergence Critic Regularization

    Authors: Ilya Kostrikov, Jonathan Tompson, Rob Fergus, Ofir Nachum

    Abstract: Many modern approaches to offline Reinforcement Learning (RL) utilize behavior regularization, typically augmenting a model-free actor critic algorithm with a penalty measuring divergence of the policy from the offline data. In this work, we propose an alternative approach to encouraging the learned policy to stay close to the data, namely parameterizing the critic as the log-behavior-policy, whic…

    Submitted 14 March, 2021; originally announced March 2021.

  16. arXiv:2102.11271

    cs.LG cs.AI

    Reinforcement Learning with Prototypical Representations

    Authors: Denis Yarats, Rob Fergus, Alessandro Lazaric, Lerrel Pinto

    Abstract: Learning effective representations in image-based environments is crucial for sample efficient Reinforcement Learning (RL). Unfortunately, in RL, representation learning is confounded with the exploratory experience of the agent -- learning a useful representation requires diverse data, while effective exploration is only possible with coherent representations. Furthermore, we would like to learn…

    Submitted 20 July, 2021; v1 submitted 22 February, 2021; originally announced February 2021.

    Journal ref: ICML 2021

  17. arXiv:2102.10330

    cs.LG cs.AI

    Decoupling Value and Policy for Generalization in Reinforcement Learning

    Authors: Roberta Raileanu, Rob Fergus

    Abstract: Standard deep reinforcement learning algorithms use a shared representation for the policy and value function, especially when training directly from images. However, we argue that more information is needed to accurately estimate the value function than to learn the optimal policy. Consequently, the use of a shared representation for the policy and value function can lead to overfitting. To allev…

    Submitted 15 June, 2021; v1 submitted 20 February, 2021; originally announced February 2021.

  18. arXiv:2007.02879

    cs.LG cs.AI

    Fast Adaptation via Policy-Dynamics Value Functions

    Authors: Roberta Raileanu, Max Goldstein, Arthur Szlam, Rob Fergus

    Abstract: Standard RL algorithms assume fixed environment dynamics and require a significant amount of interaction to adapt to new environments. We introduce Policy-Dynamics Value Functions (PD-VF), a novel approach for rapidly adapting to dynamics different from those previously seen in training. PD-VF explicitly estimates the cumulative reward in a space of policies and environments. An ensemble of conven…

    Submitted 6 July, 2020; originally announced July 2020.

  19. arXiv:2006.15762

    cs.AI cs.LG stat.ML

    Empirically Verifying Hypotheses Using Reinforcement Learning

    Authors: Kenneth Marino, Rob Fergus, Arthur Szlam, Abhinav Gupta

    Abstract: This paper formulates hypothesis verification as an RL problem. Specifically, we aim to build an agent that, given a hypothesis about the dynamics of the world, can take actions to generate observations which can help predict whether the hypothesis is true or false. Existing RL algorithms fail to solve this task, even for simple environments. In order to train the agents, we exploit the underlying…

    Submitted 28 June, 2020; originally announced June 2020.

  20. arXiv:2006.12862

    cs.LG cs.AI

    Automatic Data Augmentation for Generalization in Deep Reinforcement Learning

    Authors: Roberta Raileanu, Max Goldstein, Denis Yarats, Ilya Kostrikov, Rob Fergus

    Abstract: Deep reinforcement learning (RL) agents often fail to generalize to unseen scenarios, even when they are trained on many instances of semantically similar environments. Data augmentation has recently been shown to improve the sample efficiency and generalization of RL agents. However, different tasks tend to benefit from different kinds of data augmentation. In this paper, we compare three approac…

    Submitted 20 February, 2021; v1 submitted 23 June, 2020; originally announced June 2020.

  21. arXiv:2004.13649

    cs.LG cs.CV eess.IV stat.ML

    Image Augmentation Is All You Need: Regularizing Deep Reinforcement Learning from Pixels

    Authors: Ilya Kostrikov, Denis Yarats, Rob Fergus

    Abstract: We propose a simple data augmentation technique that can be applied to standard model-free reinforcement learning algorithms, enabling robust learning directly from pixels without the need for auxiliary losses or pre-training. The approach leverages input perturbations commonly used in computer vision tasks to regularize the value function. Existing model-free approaches, such as Soft Actor-Critic…

    Submitted 7 March, 2021; v1 submitted 28 April, 2020; originally announced April 2020.

  22. arXiv:2004.13167

    cs.LG q-bio.QM stat.ML

    Energy-based models for atomic-resolution protein conformations

    Authors: Yilun Du, Joshua Meier, Jerry Ma, Rob Fergus, Alexander Rives

    Abstract: We propose an energy-based model (EBM) of protein conformations that operates at atomic scale. The model is trained solely on crystallized protein data. By contrast, existing approaches for scoring conformations use energy functions that incorporate knowledge of physical principles and features that are the complex product of several decades of research and tuning. To evaluate the model, we benchm…

    Submitted 27 April, 2020; originally announced April 2020.

    Comments: Accepted to ICLR 2020

    Journal ref: International Conference on Learning Representations (ICLR), 2020

  23. arXiv:1910.01741

    cs.LG cs.AI cs.RO stat.ML

    Improving Sample Efficiency in Model-Free Reinforcement Learning from Images

    Authors: Denis Yarats, Amy Zhang, Ilya Kostrikov, Brandon Amos, Joelle Pineau, Rob Fergus

    Abstract: Training an agent to solve control tasks directly from high-dimensional images with model-free reinforcement learning (RL) has proven difficult. A promising approach is to learn a latent representation together with the control policy. However, fitting a high-capacity encoder using a scarce reward signal is sample inefficient and leads to poor performance. Prior work has shown that auxiliary losse…

    Submitted 9 July, 2020; v1 submitted 2 October, 2019; originally announced October 2019.

  24. arXiv:1909.05863

    cs.CL cs.AI cs.IR cs.MA

    Finding Generalizable Evidence by Learning to Convince Q&A Models

    Authors: Ethan Perez, Siddharth Karamcheti, Rob Fergus, Jason Weston, Douwe Kiela, Kyunghyun Cho

    Abstract: We propose a system that finds the strongest supporting evidence for a given answer to a question, using passage-based question-answering (QA) as a testbed. We train evidence agents to select the passage sentences that most convince a pretrained QA model of a given answer, if the QA model received those sentences instead of the full passage. Rather than finding evidence that convinces one model al…

    Submitted 12 September, 2019; originally announced September 2019.

    Comments: EMNLP 2019. Code available at https://github.com/ethanjperez/convince

  25. arXiv:1901.05590

    cs.LG cs.CV stat.ML

    Disentangling Video with Independent Prediction

    Authors: William F. Whitney, Rob Fergus

    Abstract: We propose an unsupervised variational model for disentangling video into independent factors, i.e. each factor's future can be predicted from its past without considering the others. We show that our approach often learns factors which are interpretable as objects in a scene.

    Submitted 16 January, 2019; originally announced January 2019.

    Comments: Presented at the Learning Disentangled Representations: from Perception to Control workshop at NIPS 2017

  26. arXiv:1811.09083

    cs.LG stat.ML

    Learning Goal Embeddings via Self-Play for Hierarchical Reinforcement Learning

    Authors: Sainbayar Sukhbaatar, Emily Denton, Arthur Szlam, Rob Fergus

    Abstract: In hierarchical reinforcement learning a major challenge is determining appropriate low-level policies. We propose an unsupervised learning scheme, based on asymmetric self-play from Sukhbaatar et al. (2018), that automatically learns a good representation of sub-goals in the environment and a low-level policy that can execute them. A high-level policy can then direct the lower one by generating a…

    Submitted 22 November, 2018; originally announced November 2018.

  27. arXiv:1803.07616

    cs.AI cs.CV

    IntPhys: A Framework and Benchmark for Visual Intuitive Physics Reasoning

    Authors: Ronan Riochet, Mario Ynocente Castro, Mathieu Bernard, Adam Lerer, Rob Fergus, Véronique Izard, Emmanuel Dupoux

    Abstract: In order to reach human performance on complex visual tasks, artificial systems need to incorporate a significant amount of understanding of the world in terms of macroscopic objects, movements, forces, etc. Inspired by work on intuitive physics in infants, we propose an evaluation benchmark which diagnoses how much a given system understands about physics by testing whether it can tell apart well matc…

    Submitted 11 February, 2020; v1 submitted 20 March, 2018; originally announced March 2018.

  28. arXiv:1803.00512

    cs.AI

    Composable Planning with Attributes

    Authors: Amy Zhang, Adam Lerer, Sainbayar Sukhbaatar, Rob Fergus, Arthur Szlam

    Abstract: The tasks that an agent will need to solve often are not known during training. However, if the agent knows which properties of the environment are important then, after learning how its actions affect those properties, it may be able to use this knowledge to solve complex tasks without training specifically for them. Towards this end, we consider a setup in which an environment is augmented with…

    Submitted 25 April, 2019; v1 submitted 1 March, 2018; originally announced March 2018.

    Journal ref: International Conference on Machine Learning, 2018

  29. arXiv:1802.09640

    cs.AI cs.LG

    Modeling Others using Oneself in Multi-Agent Reinforcement Learning

    Authors: Roberta Raileanu, Emily Denton, Arthur Szlam, Rob Fergus

    Abstract: We consider the multi-agent reinforcement learning setting with imperfect information in which each agent is trying to maximize its own utility. The reward function depends on the hidden state (or goal) of both agents, so the agents must infer the other players' hidden goals from their observed behavior in order to solve the tasks. We propose a new approach for learning in these domains: Self Othe…

    Submitted 23 March, 2018; v1 submitted 26 February, 2018; originally announced February 2018.

    Comments: 10 pages, 16 figures, submitted to ICML 2018

  30. arXiv:1802.07687

    cs.CV cs.AI cs.LG stat.ML

    Stochastic Video Generation with a Learned Prior

    Authors: Remi Denton, Rob Fergus

    Abstract: Generating video frames that accurately predict future world states is challenging. Existing approaches either fail to capture the full distribution of outcomes, or yield blurry generations, or both. In this paper we introduce an unsupervised video generation model that learns a prior model of uncertainty in a given environment. Video frames are generated by drawing samples from this prior and c…

    Submitted 2 March, 2018; v1 submitted 21 February, 2018; originally announced February 2018.

    ACM Class: I.2.6; I.4

  31. arXiv:1712.01238

    cs.CV cs.CL cs.LG

    Learning by Asking Questions

    Authors: Ishan Misra, Ross Girshick, Rob Fergus, Martial Hebert, Abhinav Gupta, Laurens van der Maaten

    Abstract: We introduce an interactive learning framework for the development and testing of intelligent visual systems, called learning-by-asking (LBA). We explore LBA in context of the Visual Question Answering (VQA) task. LBA differs from standard VQA training in that most questions are not observed during training time, and the learner must ask questions it wants answers to. Thus, LBA more closely mimics…

    Submitted 4 December, 2017; originally announced December 2017.

  32. arXiv:1703.05407

    cs.LG

    Intrinsic Motivation and Automatic Curricula via Asymmetric Self-Play

    Authors: Sainbayar Sukhbaatar, Zeming Lin, Ilya Kostrikov, Gabriel Synnaeve, Arthur Szlam, Rob Fergus

    Abstract: We describe a simple scheme that allows an agent to learn about its environment in an unsupervised manner. Our scheme pits two versions of the same agent, Alice and Bob, against one another. Alice proposes a task for Bob to complete; and then Bob attempts to complete the task. In this work we will focus on two kinds of environments: (nearly) reversible environments and environments that can be res…

    Submitted 27 April, 2018; v1 submitted 15 March, 2017; originally announced March 2017.

    Comments: Published in ICLR 2018

  33. arXiv:1611.06430

    cs.CV

    Semi-Supervised Learning with Context-Conditional Generative Adversarial Networks

    Authors: Remi Denton, Sam Gross, Rob Fergus

    Abstract: We introduce a simple semi-supervised learning approach for images based on in-painting using an adversarial loss. Images with random patches removed are presented to a generator whose task is to fill in the hole, based on the surrounding pixels. The in-painted images are then presented to a discriminator network that judges if they are real (unaltered training images) or not. This task acts as…

    Submitted 19 November, 2016; originally announced November 2016.

  34. arXiv:1605.07736

    cs.LG cs.AI

    Learning Multiagent Communication with Backpropagation

    Authors: Sainbayar Sukhbaatar, Arthur Szlam, Rob Fergus

    Abstract: Many tasks in AI require the collaboration of multiple agents. Typically, the communication protocol between agents is manually specified and not altered during training. In this paper we explore a simple neural model, called CommNet, that uses continuous communication for fully cooperative tasks. The model consists of multiple agents and the communication between them is learned alongside their p…

    Submitted 31 October, 2016; v1 submitted 25 May, 2016; originally announced May 2016.

    Comments: Accepted to NIPS 2016

  35. arXiv:1603.01312

    cs.AI

    Learning Physical Intuition of Block Towers by Example

    Authors: Adam Lerer, Sam Gross, Rob Fergus

    Abstract: Wooden blocks are a common toy for infants, allowing them to develop motor skills and gain intuition about the physical behavior of the world. In this paper, we explore the ability of deep feed-forward models to learn such intuitive physics. Using a 3D game engine, we create small towers of wooden blocks whose stability is randomized and render them collapsing (or remaining upright). This data all…

    Submitted 3 March, 2016; originally announced March 2016.

  36. arXiv:1512.02167

    cs.CV cs.CL

    Simple Baseline for Visual Question Answering

    Authors: Bolei Zhou, Yuandong Tian, Sainbayar Sukhbaatar, Arthur Szlam, Rob Fergus

    Abstract: We describe a very simple bag-of-words baseline for visual question answering. This baseline concatenates the word features from the question and CNN features from the image to predict the answer. When evaluated on the challenging VQA dataset [2], it shows comparable performance to many recent approaches using recurrent neural networks. To explore the strength and weakness of the trained model, we…

    Submitted 15 December, 2015; v1 submitted 7 December, 2015; originally announced December 2015.

    Comments: One comparison method's scores are put into the correct column, and a new experiment of generating attention map is added

  37. arXiv:1511.07401

    cs.LG cs.AI cs.NE

    MazeBase: A Sandbox for Learning from Games

    Authors: Sainbayar Sukhbaatar, Arthur Szlam, Gabriel Synnaeve, Soumith Chintala, Rob Fergus

    Abstract: This paper introduces MazeBase: an environment for simple 2D games, designed as a sandbox for machine learning approaches to reasoning and planning. Within it, we create 10 simple games embodying a range of algorithmic tasks (e.g. if-then statements or set negation). A variety of neural models (fully connected, convolutional network, memory network) are deployed via reinforcement learning on these…

    Submitted 7 January, 2016; v1 submitted 23 November, 2015; originally announced November 2015.

  38. arXiv:1511.07275

    cs.AI cs.LG

    Learning Simple Algorithms from Examples

    Authors: Wojciech Zaremba, Tomas Mikolov, Armand Joulin, Rob Fergus

    Abstract: We present an approach for learning simple algorithms such as copying, multi-digit addition and single digit multiplication directly from examples. Our framework consists of a set of interfaces, accessed by a controller. Typical interfaces are 1-D tapes or 2-D grids that hold the input and output data. For the controller, we explore a range of neural network-based models which vary in their abilit…

    Submitted 23 November, 2015; v1 submitted 23 November, 2015; originally announced November 2015.

  39. arXiv:1511.06681

    cs.CV

    Deep End2End Voxel2Voxel Prediction

    Authors: Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, Manohar Paluri

    Abstract: Over the last few years deep learning methods have emerged as one of the most prominent approaches for video analysis. However, so far their most successful applications have been in the area of video classification and detection, i.e., problems involving the prediction of a single class label or a handful of output variables per video. Furthermore, while deep networks are commonly recognized as t…

    Submitted 20 November, 2015; originally announced November 2015.

  40. arXiv:1506.05751

    cs.CV

    Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks

    Authors: Emily Denton, Soumith Chintala, Arthur Szlam, Rob Fergus

    Abstract: In this paper we introduce a generative parametric model capable of producing high quality samples of natural images. Our approach uses a cascade of convolutional networks within a Laplacian pyramid framework to generate images in a coarse-to-fine fashion. At each level of the pyramid, a separate generative convnet model is trained using the Generative Adversarial Nets (GAN) approach (Goodfellow e…

    Submitted 18 June, 2015; originally announced June 2015.

  41. arXiv:1505.03873

    cs.CV

    Improving Image Classification with Location Context

    Authors: Kevin Tang, Manohar Paluri, Li Fei-Fei, Rob Fergus, Lubomir Bourdev

    Abstract: With the widespread availability of cellphones and cameras that have GPS capabilities, it is common for images being uploaded to the Internet today to have GPS coordinates associated with them. In addition to research that tries to predict GPS coordinates from visual features, this also opens up the door to problems that are conditioned on the availability of GPS coordinates. In this work, we tack…

    Submitted 14 May, 2015; originally announced May 2015.

  42. arXiv:1503.08895

    cs.NE cs.CL

    End-To-End Memory Networks

    Authors: Sainbayar Sukhbaatar, Arthur Szlam, Jason Weston, Rob Fergus

    Abstract: We introduce a neural network with a recurrent attention model over a possibly large external memory. The architecture is a form of Memory Network (Weston et al., 2015) but unlike the model in that work, it is trained end-to-end, and hence requires significantly less supervision during training, making it more generally applicable in realistic settings. It can also be seen as an extension of RNNse…

    Submitted 24 November, 2015; v1 submitted 30 March, 2015; originally announced March 2015.

    Comments: Accepted to NIPS 2015

  43. arXiv:1501.05703

    cs.CV

    Beyond Frontal Faces: Improving Person Recognition Using Multiple Cues

    Authors: Ning Zhang, Manohar Paluri, Yaniv Taigman, Rob Fergus, Lubomir Bourdev

    Abstract: We explore the task of recognizing peoples' identities in photo albums in an unconstrained setting. To facilitate this, we introduce the new People In Photo Albums (PIPA) dataset, consisting of over 60000 instances of 2000 individuals collected from public Flickr photo albums. With only about half of the person images containing a frontal face, the recognition task is very challenging due to the l…

    Submitted 30 January, 2015; v1 submitted 22 January, 2015; originally announced January 2015.

  44. arXiv:1412.0767

    cs.CV

    Learning Spatiotemporal Features with 3D Convolutional Networks

    Authors: Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, Manohar Paluri

    Abstract: We propose a simple, yet effective approach for spatiotemporal feature learning using deep 3-dimensional convolutional networks (3D ConvNets) trained on a large scale supervised video dataset. Our findings are three-fold: 1) 3D ConvNets are more suitable for spatiotemporal feature learning compared to 2D ConvNets; 2) A homogeneous architecture with small 3x3x3 convolution kernels in all layers is…

    Submitted 6 October, 2015; v1 submitted 1 December, 2014; originally announced December 2014.

  45. arXiv:1411.5309  [pdf, other]

    cs.CV

    End-to-End Integration of a Convolutional Network, Deformable Parts Model and Non-Maximum Suppression

    Authors: Li Wan, David Eigen, Rob Fergus

    Abstract: Deformable Parts Models and Convolutional Networks each have achieved notable performance in object detection. Yet these two approaches find their strengths in complementary areas: DPMs are well-versed in object composition, modeling fine-grained spatial relationships between parts; likewise, ConvNets are adept at producing powerful image features, having been discriminatively trained directly on…

    Submitted 19 November, 2014; originally announced November 2014.

  46. arXiv:1411.4734  [pdf, other]

    cs.CV

    Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture

    Authors: David Eigen, Rob Fergus

    Abstract: In this paper we address three different computer vision tasks using a single basic architecture: depth prediction, surface normal estimation, and semantic labeling. We use a multiscale convolutional network that is able to adapt easily to each task using only small modifications, regressing from the input image to the output map directly. Our method progressively refines predictions using a seque…

    Submitted 16 December, 2015; v1 submitted 17 November, 2014; originally announced November 2014.

  47. arXiv:1407.0717  [pdf, other]

    cs.CV

    Deep Poselets for Human Detection

    Authors: Lubomir Bourdev, Fei Yang, Rob Fergus

    Abstract: We address the problem of detecting people in natural scenes using a part approach based on poselets. We propose a bootstrapping method that allows us to collect millions of weakly labeled examples for each poselet type. We use these examples to train a Convolutional Neural Net to discriminate different poselet types and separate them from the background class. We then use the trained CNN as a way…

    Submitted 2 July, 2014; originally announced July 2014.

  48. arXiv:1406.2283  [pdf, other]

    cs.CV

    Depth Map Prediction from a Single Image using a Multi-Scale Deep Network

    Authors: David Eigen, Christian Puhrsch, Rob Fergus

    Abstract: Predicting depth is an essential component in understanding the 3D geometry of a scene. While for stereo images local correspondence suffices for estimation, finding depth relations from a single image is less straightforward, requiring integration of both global and local information from various cues. Moreover, the task is inherently ambiguous, with a large source of uncertainty coming from the…

    Submitted 9 June, 2014; originally announced June 2014.

  49. arXiv:1406.2080  [pdf, other]

    cs.CV cs.LG cs.NE

    Training Convolutional Networks with Noisy Labels

    Authors: Sainbayar Sukhbaatar, Joan Bruna, Manohar Paluri, Lubomir Bourdev, Rob Fergus

    Abstract: The availability of large labeled datasets has allowed Convolutional Network models to achieve impressive recognition results. However, in many settings manual annotation of the data is impractical; instead our data has noisy labels, i.e. there is some freely available label for each image which may or may not be accurate. In this paper, we explore the performance of discriminatively-trained Convn…

    Submitted 10 April, 2015; v1 submitted 9 June, 2014; originally announced June 2014.

    Comments: Accepted as a workshop contribution at ICLR 2015

  50. arXiv:1406.1584  [pdf, other]

    cs.LG

    Learning to Discover Efficient Mathematical Identities

    Authors: Wojciech Zaremba, Karol Kurach, Rob Fergus

    Abstract: In this paper we explore how machine learning techniques can be applied to the discovery of efficient mathematical identities. We introduce an attribute grammar framework for representing symbolic expressions. Given a set of grammar rules we build trees that combine different rules, looking for branches which yield compositions that are analytically equivalent to a target expression, but of lower…

    Submitted 5 November, 2014; v1 submitted 6 June, 2014; originally announced June 2014.