Showing 1–18 of 18 results for author: Guadarrama, S

  1. arXiv:2207.13224  [pdf, other]

    cs.RO cs.AI cs.LG

    PI-ARS: Accelerating Evolution-Learned Visual-Locomotion with Predictive Information Representations

    Authors: Kuang-Huei Lee, Ofir Nachum, Tingnan Zhang, Sergio Guadarrama, Jie Tan, Wenhao Yu

    Abstract: Evolution Strategy (ES) algorithms have shown promising results in training complex robotic control policies due to their massive parallelism capability, simple implementation, effective parameter-space exploration, and fast training time. However, a key limitation of ES is its scalability to large capacity models, including modern neural network architectures. In this work, we develop Predictive…

    Submitted 26 July, 2022; originally announced July 2022.

    Comments: To appear at IROS 2022. The supplementary video is available at https://kuanghuei.github.io/piars

  2. arXiv:2205.15241  [pdf, other]

    cs.AI cs.LG

    Multi-Game Decision Transformers

    Authors: Kuang-Huei Lee, Ofir Nachum, Mengjiao Yang, Lisa Lee, Daniel Freeman, Winnie Xu, Sergio Guadarrama, Ian Fischer, Eric Jang, Henryk Michalewski, Igor Mordatch

    Abstract: A longstanding goal of the field of AI is a method for learning a highly capable, generalist agent from diverse experience. In the subfields of vision and language, this was largely achieved by scaling up transformer-based models and training them on large, diverse datasets. Motivated by this progress, we investigate whether the same strategy can be used to produce generalist reinforcement learnin…

    Submitted 15 October, 2022; v1 submitted 30 May, 2022; originally announced May 2022.

    Comments: NeurIPS 2022. 24 pages, 16 figures. Additional information, videos and code can be seen at https://sites.google.com/view/multi-game-transformers

  3. arXiv:2109.12909  [pdf, other]

    cs.LG cs.CV cs.IT

    Compressive Visual Representations

    Authors: Kuang-Huei Lee, Anurag Arnab, Sergio Guadarrama, John Canny, Ian Fischer

    Abstract: Learning effective visual representations that generalize well without human supervision is a fundamental problem in order to apply Machine Learning to a wide variety of tasks. Recently, two families of self-supervised methods, contrastive learning and latent bootstrapping, exemplified by SimCLR and BYOL respectively, have made significant progress. In this work, we hypothesize that adding explici…

    Submitted 4 December, 2021; v1 submitted 27 September, 2021; originally announced September 2021.

    Comments: NeurIPS 2021. 27 pages, 4 figures. Code and pretrained models at https://github.com/google-research/compressive-visual-representations

  4. arXiv:2007.12401  [pdf, other]

    cs.LG cs.AI cs.IT cs.RO stat.ML

    Predictive Information Accelerates Learning in RL

    Authors: Kuang-Huei Lee, Ian Fischer, Anthony Liu, Yijie Guo, Honglak Lee, John Canny, Sergio Guadarrama

    Abstract: The Predictive Information is the mutual information between the past and the future, I(X_past; X_future). We hypothesize that capturing the predictive information is useful in RL, since the ability to model what will happen next is necessary for success on many tasks. To test our hypothesis, we train Soft Actor-Critic (SAC) agents from pixels with an auxiliary task that learns a compressed repres…

    Submitted 25 October, 2020; v1 submitted 24 July, 2020; originally announced July 2020.

    Comments: To appear at NeurIPS 2020

  5. arXiv:1912.05663  [pdf, other]

    stat.ML cs.AI cs.LG

    Measuring the Reliability of Reinforcement Learning Algorithms

    Authors: Stephanie C. Y. Chan, Samuel Fishman, John Canny, Anoop Korattikara, Sergio Guadarrama

    Abstract: Lack of reliability is a well-known issue for reinforcement learning (RL) algorithms. This problem has gained increasing attention in recent years, and efforts to improve it have grown substantially. To aid RL researchers and production users with the evaluation and improvement of reliability, we propose a set of metrics that quantitatively measure different aspects of reliability. In this work, w…

    Submitted 12 February, 2020; v1 submitted 10 December, 2019; originally announced December 2019.

    Comments: Accepted for publication at ICLR 2020 (spotlight)

  6. arXiv:1902.07742  [pdf, other]

    cs.LG stat.ML

    From Language to Goals: Inverse Reinforcement Learning for Vision-Based Instruction Following

    Authors: Justin Fu, Anoop Korattikara, Sergey Levine, Sergio Guadarrama

    Abstract: Reinforcement learning is a promising framework for solving control problems, but its use in practical situations is hampered by the fact that reward functions are often difficult to engineer. Specifying goals and tasks for autonomous machines, such as robots, is a significant challenge: conventionally, reward functions and goal states have been used to communicate objectives. But people can commu…

    Submitted 20 February, 2019; originally announced February 2019.

  7. arXiv:1806.09594  [pdf, other]

    cs.CV cs.GR cs.LG cs.MM cs.RO

    Tracking Emerges by Colorizing Videos

    Authors: Carl Vondrick, Abhinav Shrivastava, Alireza Fathi, Sergio Guadarrama, Kevin Murphy

    Abstract: We use large amounts of unlabeled video to learn models for visual tracking without manual human supervision. We leverage the natural temporal coherency of color to create a model that learns to colorize gray-scale videos by copying colors from a reference frame. Quantitative and qualitative experiments suggest that this task causes the model to automatically learn to track visual regions. Althoug…

    Submitted 27 July, 2018; v1 submitted 25 June, 2018; originally announced June 2018.

    Comments: ECCV 2018. Blog post: https://ai.googleblog.com/2018/06/self-supervised-tracking-via-video.html

  8. arXiv:1707.05847  [pdf, other]

    cs.CV

    The Devil is in the Decoder: Classification, Regression and GANs

    Authors: Zbigniew Wojna, Vittorio Ferrari, Sergio Guadarrama, Nathan Silberman, Liang-Chieh Chen, Alireza Fathi, Jasper Uijlings

    Abstract: Many machine vision applications, such as semantic segmentation and depth prediction, require predictions for every pixel of the input image. Models for such problems usually consist of encoders which decrease spatial resolution while learning a high-dimensional representation, followed by decoders who recover the original input resolution and result in low-dimensional predictions. While encoders…

    Submitted 19 February, 2019; v1 submitted 18 July, 2017; originally announced July 2017.

  9. arXiv:1705.07208  [pdf, other]

    cs.CV cs.LG

    PixColor: Pixel Recursive Colorization

    Authors: Sergio Guadarrama, Ryan Dahl, David Bieber, Mohammad Norouzi, Jonathon Shlens, Kevin Murphy

    Abstract: We propose a novel approach to automatically produce multiple colorized versions of a grayscale image. Our method results from the observation that the task of automated colorization is relatively easy given a low-resolution version of the color image. We first train a conditional PixelCNN to generate a low resolution color for a given grayscale image. Then, given the generated low-resolution colo…

    Submitted 5 June, 2017; v1 submitted 19 May, 2017; originally announced May 2017.

  10. arXiv:1703.10277  [pdf, other]

    cs.CV

    Semantic Instance Segmentation via Deep Metric Learning

    Authors: Alireza Fathi, Zbigniew Wojna, Vivek Rathod, Peng Wang, Hyun Oh Song, Sergio Guadarrama, Kevin P. Murphy

    Abstract: We propose a new method for semantic instance segmentation, by first computing how likely two pixels are to belong to the same object, and then by grouping similar pixels together. Our similarity metric is based on a deep, fully convolutional embedding model. Our grouping method is based on selecting all points that are sufficiently similar to a set of "seed points", chosen from a deep, fully conv…

    Submitted 29 March, 2017; originally announced March 2017.

  11. Improved Image Captioning via Policy Gradient optimization of SPIDEr

    Authors: Siqi Liu, Zhenhai Zhu, Ning Ye, Sergio Guadarrama, Kevin Murphy

    Abstract: Current image captioning methods are usually trained via (penalized) maximum likelihood estimation. However, the log-likelihood score of a caption does not correlate well with human assessments of quality. Standard syntactic evaluation metrics, such as BLEU, METEOR and ROUGE, are also not well correlated. The newer SPICE and CIDEr metrics are better correlated, but have traditionally been hard to…

    Submitted 12 March, 2018; v1 submitted 1 December, 2016; originally announced December 2016.

    Comments: Accepted at ICCV 2017

  12. arXiv:1611.10012  [pdf, other]

    cs.CV

    Speed/accuracy trade-offs for modern convolutional object detectors

    Authors: Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, Kevin Murphy

    Abstract: The goal of this paper is to serve as a guide for selecting a detection architecture that achieves the right speed/memory/accuracy balance for a given application and platform. To this end, we investigate various ways to trade accuracy for speed and memory usage in modern convolutional object detection systems. A number of successful systems have been proposed in recent years, but apples-to-apples…

    Submitted 24 April, 2017; v1 submitted 30 November, 2016; originally announced November 2016.

    Comments: Accepted to CVPR 2017

  13. arXiv:1411.4389  [pdf, other]

    cs.CV

    Long-term Recurrent Convolutional Networks for Visual Recognition and Description

    Authors: Jeff Donahue, Lisa Anne Hendricks, Marcus Rohrbach, Subhashini Venugopalan, Sergio Guadarrama, Kate Saenko, Trevor Darrell

    Abstract: Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or "temporally deep", are effective for tasks involving sequences, visual and otherwise. We develop a novel recurrent convolutional architecture suitable for large-scale visual learning which is end-to-end trainable, and demonstrate the value of thes…

    Submitted 31 May, 2016; v1 submitted 17 November, 2014; originally announced November 2014.

    Comments: Originally presented at CVPR 2015 (oral). Updated version (accepted as a TPAMI journal article) includes additional results

  14. arXiv:1409.4689  [pdf, other]

    cs.CV cs.LG

    Compute Less to Get More: Using ORC to Improve Sparse Filtering

    Authors: Johannes Lederer, Sergio Guadarrama

    Abstract: Sparse Filtering is a popular feature learning algorithm for image classification pipelines. In this paper, we connect the performance of Sparse Filtering with spectral properties of the corresponding feature matrices. This connection provides new insights into Sparse Filtering; in particular, it suggests early stopping of Sparse Filtering. We therefore introduce the Optimal Roundness Criterion (O…

    Submitted 24 May, 2015; v1 submitted 16 September, 2014; originally announced September 2014.

  15. arXiv:1408.5093  [pdf, other]

    cs.CV cs.LG cs.NE

    Caffe: Convolutional Architecture for Fast Feature Embedding

    Authors: Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, Trevor Darrell

    Abstract: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models. The framework is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures. Caffe fits i…

    Submitted 20 June, 2014; originally announced August 2014.

    Comments: Tech report for the Caffe software at http://github.com/BVLC/Caffe/

  16. arXiv:1407.5035  [pdf, other]

    cs.CV

    LSDA: Large Scale Detection Through Adaptation

    Authors: Judy Hoffman, Sergio Guadarrama, Eric Tzeng, Ronghang Hu, Jeff Donahue, Ross Girshick, Trevor Darrell, Kate Saenko

    Abstract: A major challenge in scaling object detection is the difficulty of obtaining labeled images for large numbers of categories. Recently, deep convolutional neural networks (CNNs) have emerged as clear winners on object classification benchmarks, in part due to training with 1.2M+ labeled classification images. Unfortunately, only a small fraction of those labels are available for the detection task.…

    Submitted 31 October, 2014; v1 submitted 18 July, 2014; originally announced July 2014.

    Journal ref: Neural Information Processing Systems (NIPS) 2014

  17. arXiv:1006.5827  [pdf, other]

    cs.RO cs.CL

    Approximate Robotic Mapping from sonar data by modeling Perceptions with Antonyms

    Authors: Sergio Guadarrama, Antonio Ruiz-Mayor

    Abstract: This work, inspired by the idea of "Computing with Words and Perceptions" proposed by Zadeh in 2001, focuses on how to transform measurements into perceptions for the problem of map building by Autonomous Mobile Robots. We propose to model the perceptions obtained from sonar-sensors as two grid maps: one for obstacles and another for empty spaces. The rules used to build and integrate these maps a…

    Submitted 30 June, 2010; originally announced June 2010.

    Comments: To appear in Information Sciences

    Report number: FSC-2008-14

  18. arXiv:1005.5253  [pdf, other]

    cs.CL cs.AI cs.HC cs.LG

    Using Soft Constraints To Learn Semantic Models Of Descriptions Of Shapes

    Authors: Sergio Guadarrama, David P. Pancho

    Abstract: The contribution of this paper is to provide a semantic model (using soft constraints) of the words used by web-users to describe objects in a language game; a game in which one user describes a selected object of those composing the scene, and another user has to guess which object has been described. The given description needs to be non ambiguous and accurate enough to allow other users to gues…

    Submitted 28 May, 2010; originally announced May 2010.

    Comments: 8 pages, 8 figures, WCCI'10 Conference

    Report number: FSC-2009-22