
Showing 1–50 of 64 results for author: Castro, S

  1. arXiv:2406.18420

    cs.LG cs.AI

    Mixture of Experts in a Mixture of RL settings

    Authors: Timon Willi, Johan Obando-Ceron, Jakob Foerster, Karolina Dziugaite, Pablo Samuel Castro

    Abstract: Mixtures of Experts (MoEs) have gained prominence in (self-)supervised learning due to their enhanced inference efficiency, adaptability to distributed training, and modularity. Previous research has illustrated that MoEs can significantly boost Deep Reinforcement Learning (DRL) performance by expanding the network's parameter count while reducing dormant neurons, thereby enhancing the model's lea…

    Submitted 26 June, 2024; originally announced June 2024.

  2. arXiv:2406.17523

    cs.LG cs.AI

    On the consistency of hyper-parameter selection in value-based deep reinforcement learning

    Authors: Johan Obando-Ceron, João G. M. Araújo, Aaron Courville, Pablo Samuel Castro

    Abstract: Deep reinforcement learning (deep RL) has achieved tremendous success in various domains through a combination of algorithmic design and careful selection of hyper-parameters. Algorithmic improvements are often the result of iterative enhancements built upon prior approaches, while hyper-parameter choices are typically inherited from previous methods or fine-tuned specifically for the proposed tec…

    Submitted 2 July, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  3. arXiv:2403.03950

    cs.LG cs.AI stat.ML

    Stop Regressing: Training Value Functions via Classification for Scalable Deep RL

    Authors: Jesse Farebrother, Jordi Orbay, Quan Vuong, Adrien Ali Taïga, Yevgen Chebotar, Ted Xiao, Alex Irpan, Sergey Levine, Pablo Samuel Castro, Aleksandra Faust, Aviral Kumar, Rishabh Agarwal

    Abstract: Value functions are a central component of deep reinforcement learning (RL). These functions, parameterized by neural networks, are trained using a mean squared error regression objective to match bootstrapped target values. However, scaling value-based RL methods that use regression to large networks, such as high-capacity Transformers, has proven challenging. This difficulty is in stark contrast…

    Submitted 6 March, 2024; originally announced March 2024.
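As a rough illustration of what training a value function "via classification" can look like, the sketch below encodes a scalar bootstrapped target as a two-hot categorical distribution over a fixed, evenly spaced grid of values and scores the network's logits with cross-entropy rather than a squared error; the scalar estimate is recovered as the mean of the grid under the softmax. The two-hot encoding, the grid, and the names `two_hot`, `cross_entropy`, and `predicted_value` are assumptions for illustration only; the truncated abstract does not state which target encoding the paper uses.

```python
import numpy as np

def two_hot(target, support):
    """Hypothetical two-hot encoding: spread a scalar over the two nearest grid
    points of a sorted `support` so that the distribution's mean equals the
    (clipped) target."""
    target = float(np.clip(target, support[0], support[-1]))
    probs = np.zeros(len(support))
    upper = int(np.searchsorted(support, target))  # first index with support[i] >= target
    if support[upper] == target:                    # target sits exactly on a grid point
        probs[upper] = 1.0
        return probs
    lower = upper - 1
    width = support[upper] - support[lower]
    probs[lower] = (support[upper] - target) / width
    probs[upper] = (target - support[lower]) / width
    return probs

def cross_entropy(logits, target_probs):
    """Cross-entropy between softmax(logits) and a target distribution."""
    shifted = logits - logits.max()                 # numerically stable log-softmax
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return float(-(target_probs * log_probs).sum())

def predicted_value(logits, support):
    """Scalar value estimate: mean of the support under softmax(logits)."""
    z = np.exp(logits - logits.max())
    return float((z / z.sum()) @ support)
```

In this framing the squared-error loss on a scalar output is replaced by `cross_entropy(logits, two_hot(target, support))`, while `predicted_value` is what an agent would use when it needs a scalar Q-estimate.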

  4. arXiv:2402.15021

    cs.CV cs.CL

    CLoVe: Encoding Compositional Language in Contrastive Vision-Language Models

    Authors: Santiago Castro, Amir Ziai, Avneesh Saluja, Zhuoning Yuan, Rada Mihalcea

    Abstract: Recent years have witnessed a significant increase in the performance of Vision and Language tasks. Foundational Vision-Language Models (VLMs), such as CLIP, have been leveraged in multiple settings and demonstrated remarkable performance across several tasks. Such models excel at object-centric recognition yet learn text representations that seem invariant to word order, failing to compose known…

    Submitted 29 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

  5. arXiv:2402.12479

    cs.LG cs.AI

    In value-based deep reinforcement learning, a pruned network is a good network

    Authors: Johan Obando-Ceron, Aaron Courville, Pablo Samuel Castro

    Abstract: Recent work has shown that deep reinforcement learning agents have difficulty in effectively using their network parameters. We leverage prior insights into the advantages of sparse training techniques and demonstrate that gradual magnitude pruning enables value-based agents to maximize parameter effectiveness. This results in networks that yield dramatic performance improvements over traditional…

    Submitted 25 June, 2024; v1 submitted 19 February, 2024; originally announced February 2024.
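A minimal sketch of gradual magnitude pruning for a single weight matrix, assuming a commonly used cubic sparsity schedule: sparsity is zero before a `start` step, rises along a cubic curve, and stays at `final_sparsity` after an `end` step; at each update the smallest-magnitude weights are masked out. The schedule shape, the parameter names, and both function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def target_sparsity(step, start, end, final_sparsity):
    """Assumed cubic schedule: 0 before `start`, `final_sparsity` after `end`."""
    if step <= start:
        return 0.0
    if step >= end:
        return final_sparsity
    progress = (step - start) / (end - start)
    # Prunes quickly at first, then more slowly as the target is approached.
    return final_sparsity * (1.0 - (1.0 - progress) ** 3)

def magnitude_mask(weights, sparsity):
    """Boolean mask that keeps the (1 - sparsity) fraction of largest-|w| entries."""
    n_prune = int(round(sparsity * weights.size))
    if n_prune == 0:
        return np.ones_like(weights, dtype=bool)
    flat = np.abs(weights).ravel()
    # argpartition places the n_prune smallest magnitudes in the first n_prune slots.
    prune_idx = np.argpartition(flat, n_prune - 1)[:n_prune]
    mask = np.ones(weights.size, dtype=bool)
    mask[prune_idx] = False
    return mask.reshape(weights.shape)
```

In a training loop one would multiply the weights by `magnitude_mask(w, target_sparsity(step, ...))` every few updates; because already-pruned weights are zero, they keep the smallest magnitudes and stay pruned as long as the schedule never decreases.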

  6. arXiv:2402.08609

    cs.LG cs.AI

    Mixtures of Experts Unlock Parameter Scaling for Deep RL

    Authors: Johan Obando-Ceron, Ghada Sokar, Timon Willi, Clare Lyle, Jesse Farebrother, Jakob Foerster, Gintare Karolina Dziugaite, Doina Precup, Pablo Samuel Castro

    Abstract: The recent rapid progress in (self-)supervised learning models is in large part predicted by empirical scaling laws: a model's performance scales proportionally to its size. Analogous scaling laws remain elusive for reinforcement learning domains, however, where increasing the parameter count of a model often hurts its final performance. In this paper, we demonstrate that incorporating Mixture-of-…

    Submitted 26 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

  7. High-Dimensional Bayesian Optimisation with Large-Scale Constraints -- An Application to Aeroelastic Tailoring

    Authors: Hauke Maathuis, Roeland De Breuker, Saullo G. P. Castro

    Abstract: Design optimisation potentially leads to lightweight aircraft structures with lower environmental impact. Due to the high number of design variables and constraints, these problems are ordinarily solved using gradient-based optimisation methods, leading to a local solution in the design space while the global space is neglected. Bayesian Optimisation is a promising path towards sample-efficient, g…

    Submitted 14 December, 2023; originally announced December 2023.

    Comments: Conference paper submitted to AIAA Scitech 2024 Forum

  8. arXiv:2311.17894

    cond-mat.mes-hall cond-mat.mtrl-sci cs.LG

    Learning and Controlling Silicon Dopant Transitions in Graphene using Scanning Transmission Electron Microscopy

    Authors: Max Schwarzer, Jesse Farebrother, Joshua Greaves, Ekin Dogus Cubuk, Rishabh Agarwal, Aaron Courville, Marc G. Bellemare, Sergei Kalinin, Igor Mordatch, Pablo Samuel Castro, Kevin M. Roccapriore

    Abstract: We introduce a machine learning approach to determine the transition dynamics of silicon atoms on a single layer of carbon atoms, when stimulated by the electron beam of a scanning transmission electron microscope (STEM). Our method is data-centric, leveraging data collected on a STEM. The data samples are processed and filtered to produce symbolic representations, which we use to train a neural n…

    Submitted 21 November, 2023; originally announced November 2023.

  9. arXiv:2311.14115

    cs.LG cs.AI cs.CL

    A density estimation perspective on learning from pairwise human preferences

    Authors: Vincent Dumoulin, Daniel D. Johnson, Pablo Samuel Castro, Hugo Larochelle, Yann Dauphin

    Abstract: Learning from human feedback (LHF) -- and in particular learning from pairwise preferences -- has recently become a crucial ingredient in training large language models (LLMs), and has been the subject of much research. Most recent works frame it as a reinforcement learning problem, where a reward function is learned from pairwise preference data and the LLM is treated as a policy which is adapted…

    Submitted 10 January, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

  10. arXiv:2310.19804

    cs.LG cs.AI

    A Kernel Perspective on Behavioural Metrics for Markov Decision Processes

    Authors: Pablo Samuel Castro, Tyler Kastner, Prakash Panangaden, Mark Rowland

    Abstract: Behavioural metrics have been shown to be an effective mechanism for constructing representations in reinforcement learning. We present a novel perspective on behavioural metrics for Markov decision processes via the use of positive definite kernels. We leverage this new perspective to define a new metric that is provably equivalent to the recently introduced MICo distance (Castro et al., 2021). T…

    Submitted 5 October, 2023; originally announced October 2023.

    Comments: Published in TMLR

  11. arXiv:2310.03882

    cs.LG cs.AI

    Small batch deep reinforcement learning

    Authors: Johan Obando-Ceron, Marc G. Bellemare, Pablo Samuel Castro

    Abstract: In value-based deep reinforcement learning with replay memories, the batch size parameter specifies how many transitions to sample for each gradient update. Although critical to the learning process, this value is typically not adjusted when proposing new algorithms. In this work we present a broad empirical study that suggests reducing the batch size can result in a number of significant pe…

    Submitted 5 October, 2023; originally announced October 2023.

    Comments: Published at NeurIPS 2023
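To make the role of the batch size concrete, here is a minimal sketch of a uniform replay memory in which `batch_size` is simply the number of stored transitions drawn for one gradient update. The `ReplayBuffer` class, its fields, and the uniform sampling scheme are hypothetical illustrations, not the code used in the study.

```python
import numpy as np

class ReplayBuffer:
    """Hypothetical fixed-capacity replay memory with uniform sampling."""

    def __init__(self, capacity, obs_dim, seed=0):
        self.capacity = capacity
        self.obs = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.actions = np.zeros(capacity, dtype=np.int64)
        self.rewards = np.zeros(capacity, dtype=np.float32)
        self.next_obs = np.zeros((capacity, obs_dim), dtype=np.float32)
        self.dones = np.zeros(capacity, dtype=np.float32)
        self.size = 0   # number of valid transitions stored
        self.ptr = 0    # next slot to overwrite (ring buffer)
        self.rng = np.random.default_rng(seed)

    def add(self, obs, action, reward, next_obs, done):
        i = self.ptr
        self.obs[i] = obs
        self.actions[i] = action
        self.rewards[i] = reward
        self.next_obs[i] = next_obs
        self.dones[i] = done
        self.ptr = (self.ptr + 1) % self.capacity
        self.size = min(self.size + 1, self.capacity)

    def sample(self, batch_size):
        """Draw `batch_size` transitions uniformly (with replacement) for one update."""
        idx = self.rng.integers(0, self.size, size=batch_size)
        return (self.obs[idx], self.actions[idx], self.rewards[idx],
                self.next_obs[idx], self.dones[idx])
```

Under this framing, "reducing the batch size" changes only the argument passed to `sample`; the replay memory and the number of environment steps are untouched.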

  12. arXiv:2309.06219

    cs.CV cs.CL cs.CY cs.IR

    Human Action Co-occurrence in Lifestyle Vlogs using Graph Link Prediction

    Authors: Oana Ignat, Santiago Castro, Weiji Li, Rada Mihalcea

    Abstract: We introduce the task of automatic human action co-occurrence identification, i.e., determining whether two human actions can co-occur in the same interval of time. We create and make publicly available the ACE (Action Co-occurrencE) dataset, consisting of a large graph of ~12k co-occurring pairs of visual actions and their corresponding video clips. We describe graph link prediction models that lev…

    Submitted 18 June, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

  13. arXiv:2308.08967

    cs.DC

    Multi-FedLS: a Framework for Cross-Silo Federated Learning Applications on Multi-Cloud Environments

    Authors: Rafaela C. Brum, Maria Clicia Stelling de Castro, Luciana Arantes, Lúcia Maria de A. Drummond, Pierre Sens

    Abstract: Federated Learning (FL) is a distributed Machine Learning (ML) technique that can benefit from cloud environments while preserving data privacy. We propose Multi-FedLS, a framework that manages multi-cloud resources, reducing execution time and financial costs of Cross-Silo Federated Learning applications by using preemptible VMs, which are cheaper than on-demand ones but can be revoked at any time. Ou…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: In review by Journal of Parallel and Distributed Computing

  14. arXiv:2307.13824

    cs.LG cs.AI

    Offline Reinforcement Learning with On-Policy Q-Function Regularization

    Authors: Laixi Shi, Robert Dadashi, Yuejie Chi, Pablo Samuel Castro, Matthieu Geist

    Abstract: The core challenge of offline reinforcement learning (RL) is dealing with the (potentially catastrophic) extrapolation error induced by the distribution shift between the historical dataset and the desired policy. A large portion of prior work tackles this challenge by implicitly/explicitly regularizing the learning policy towards the behavior policy, which is hard to estimate reliably in practice. I…

    Submitted 25 July, 2023; originally announced July 2023.

    Comments: Published at European Conference on Machine Learning (ECML), 2023

  15. arXiv:2306.13831

    cs.LG

    Minigrid & Miniworld: Modular & Customizable Reinforcement Learning Environments for Goal-Oriented Tasks

    Authors: Maxime Chevalier-Boisvert, Bolun Dai, Mark Towers, Rodrigo de Lazcano, Lucas Willems, Salem Lahlou, Suman Pal, Pablo Samuel Castro, Jordan Terry

    Abstract: We present the Minigrid and Miniworld libraries which provide a suite of goal-oriented 2D and 3D environments. The libraries were explicitly created with a minimalistic design paradigm to allow users to rapidly develop new environments for a wide range of research-specific needs. As a result, both have received wide-scale adoption by the RL community, facilitating research in a wide range of areas.…

    Submitted 23 June, 2023; originally announced June 2023.

  16. arXiv:2305.19452

    cs.LG cs.AI

    Bigger, Better, Faster: Human-level Atari with human-level efficiency

    Authors: Max Schwarzer, Johan Obando-Ceron, Aaron Courville, Marc Bellemare, Rishabh Agarwal, Pablo Samuel Castro

    Abstract: We introduce a value-based RL agent, which we call BBF, that achieves super-human performance in the Atari 100K benchmark. BBF relies on scaling the neural networks used for value estimation, as well as a number of other design choices that enable this scaling in a sample-efficient manner. We conduct extensive analyses of these design choices and provide insights for future work. We end with a dis…

    Submitted 13 November, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: ICML 2023, revised version

  17. arXiv:2305.18786

    cs.CV cs.CL

    Scalable Performance Analysis for Vision-Language Models

    Authors: Santiago Castro, Oana Ignat, Rada Mihalcea

    Abstract: Joint vision-language models have shown great performance over a diverse set of tasks. However, little is known about their limitations, as the high-dimensional space learned by these models makes it difficult to identify semantic errors. Recent work has addressed this problem by designing highly controlled probing task benchmarks. Our paper introduces a more scalable solution that relies on alrea…

    Submitted 31 May, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: Camera-ready version for *SEM 2023

  18. arXiv:2305.12544

    cs.CL cs.AI

    Has It All Been Solved? Open NLP Research Questions Not Solved by Large Language Models

    Authors: Oana Ignat, Zhijing Jin, Artem Abzaliev, Laura Biester, Santiago Castro, Naihao Deng, Xinyi Gao, Aylin Gunal, Jacky He, Ashkan Kazemi, Muhammad Khalifa, Namho Koh, Andrew Lee, Siyang Liu, Do June Min, Shinka Mori, Joan Nwatu, Veronica Perez-Rosas, Siqi Shen, Zekun Wang, Winston Wu, Rada Mihalcea

    Abstract: Recent progress in large language models (LLMs) has enabled the deployment of many generative NLP applications. At the same time, it has also led to a misleading public discourse that "it's all been solved." Not surprisingly, this has, in turn, made many NLP researchers -- especially those at the beginning of their careers -- worry about what NLP research area they should focus on. Has it all be…

    Submitted 15 March, 2024; v1 submitted 21 May, 2023; originally announced May 2023.

    Comments: Accepted at COLING 2024

  19. arXiv:2304.14082

    cs.LG cs.SE

    JaxPruner: A concise library for sparsity research

    Authors: Joo Hyung Lee, Wonpyo Park, Nicole Mitchell, Jonathan Pilault, Johan Obando-Ceron, Han-Byul Kim, Namhoon Lee, Elias Frantar, Yun Long, Amir Yazdanbakhsh, Shivani Agrawal, Suvinay Subramanian, Xin Wang, Sheng-Chun Kao, Xingyao Zhang, Trevor Gale, Aart Bik, Woohyun Han, Milen Ferev, Zhonglin Han, Hong-Seok Kim, Yann Dauphin, Gintare Karolina Dziugaite, Pablo Samuel Castro, Utku Evci

    Abstract: This paper introduces JaxPruner, an open-source JAX-based pruning and sparse training library for machine learning research. JaxPruner aims to accelerate research on sparse neural networks by providing concise implementations of popular pruning and sparse training algorithms with minimal memory and latency overhead. Algorithms implemented in JaxPruner use a common API and work seamlessly with the…

    Submitted 18 December, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

    Comments: Jaxpruner is hosted at http://github.com/google-research/jaxpruner

  20. arXiv:2304.12567

    cs.LG cs.AI stat.ML

    Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks

    Authors: Jesse Farebrother, Joshua Greaves, Rishabh Agarwal, Charline Le Lan, Ross Goroshin, Pablo Samuel Castro, Marc G. Bellemare

    Abstract: Auxiliary tasks improve the representations learned by deep reinforcement learning agents. Analytically, their effect is reasonably well understood; in practice, however, their primary use remains in support of a main learning objective, rather than as a method for learning representations. This is perhaps surprising given that many auxiliary tasks are defined procedurally, and hence can be treate…

    Submitted 25 April, 2023; originally announced April 2023.

    Comments: ICLR 2023. Code and models are available at https://github.com/google-research/google-research/tree/master/pvn; 22 pages, 8 figures

  21. arXiv:2304.05763

    cs.GT math.DS

    Learning coordination through new actions

    Authors: Sofia B. S. D. Castro

    Abstract: We provide a novel approach to achieving a desired outcome in a coordination game: the original 2x2 game is embedded in a 2x3 game where one of the players may use a third action. For a large set of payoff values only one of the Nash equilibria of the original 2x2 game is stable under replicator dynamics. We show that this Nash equilibrium is the ω-limit of all initial conditions in the interior o…

    Submitted 19 January, 2024; v1 submitted 12 April, 2023; originally announced April 2023.

    MSC Class: 34C99; 37C75; 91A05; 91A10; 91A22
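For readers unfamiliar with replicator dynamics, the sketch below integrates the standard two-population replicator equations with explicit Euler steps: each pure action's share grows in proportion to how much its expected payoff exceeds the player's current average. Payoff matrices have shape (m, n), so a 2x3 game like the one described in the abstract fits the same code. The example payoffs, step size, horizon, and function names are hypothetical, and the simulation only approximates where trajectories settle; it does not reproduce the paper's analysis.

```python
import numpy as np

def replicator_step(x, y, A, B, dt):
    """One explicit-Euler step of two-population replicator dynamics.

    x: row player's mixed strategy (length m); y: column player's (length n).
    A[i, j], B[i, j]: payoffs to the row / column player when row plays i
    and column plays j.
    """
    fx = A @ y    # row player's expected payoff for each pure action
    fy = B.T @ x  # column player's expected payoff for each pure action
    # Actions earning more than the player's average payoff gain share.
    x_new = x + dt * x * (fx - x @ fx)
    y_new = y + dt * y * (fy - y @ fy)
    return x_new, y_new

def simulate(x0, y0, A, B, dt=0.01, steps=5000):
    """Integrate from (x0, y0) and return the final mixed strategies."""
    x, y = np.asarray(x0, dtype=float), np.asarray(y0, dtype=float)
    for _ in range(steps):
        x, y = replicator_step(x, y, A, B, dt)
    return x, y

# Hypothetical 2x2 coordination game: matching pays off, action 0 pays more.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
B = A.copy()
x_end, y_end = simulate([0.6, 0.4], [0.6, 0.4], A, B)
```

In this toy game, trajectories starting with enough weight on action 0 approach the (0, 0) equilibrium while others approach (1, 1), which illustrates the kind of basin-of-attraction question that an embedding with a third action can change.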

  22. arXiv:2302.12902

    cs.LG

    The Dormant Neuron Phenomenon in Deep Reinforcement Learning

    Authors: Ghada Sokar, Rishabh Agarwal, Pablo Samuel Castro, Utku Evci

    Abstract: In this work we identify the dormant neuron phenomenon in deep reinforcement learning, where an agent's network suffers from an increasing number of inactive neurons, thereby affecting network expressivity. We demonstrate the presence of this phenomenon across a variety of algorithms and environments, and highlight its effect on learning. To address this issue, we propose a simple and effective me…

    Submitted 13 June, 2023; v1 submitted 24 February, 2023; originally announced February 2023.

    Comments: Oral at ICML 2023
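One way to make "inactive neurons" measurable, assumed here rather than quoted from the truncated abstract, is to score each neuron in a layer by its mean absolute activation over a batch, normalize by the layer's average, and count a neuron as dormant when its score is at or below a threshold `tau`. The function name and the `tau` default are illustrative.

```python
import numpy as np

def dormant_fraction(activations, tau=0.0):
    """Fraction of neurons in one layer whose normalized activity score is <= tau.

    activations: array of shape (batch, num_neurons) holding one layer's
    outputs on a batch of inputs (for example, post-ReLU values).
    """
    mean_abs = np.abs(activations).mean(axis=0)  # average |activation| per neuron
    layer_avg = mean_abs.mean()                   # average over the whole layer
    scores = mean_abs / (layer_avg + 1e-9)        # epsilon guards an all-zero layer
    return float((scores <= tau).mean())
```

With `tau=0.0` and ReLU activations this counts neurons that never fire on the batch; a small positive `tau` also flags neurons that fire only negligibly relative to their neighbours.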

  23. arXiv:2301.07803

    cs.DC

    Relating Edge Computing and Microservices by means of Architecture Approaches and Features, Orchestration, Choreography, and Offloading: A Systematic Literature Review

    Authors: Lucas Fernando Souza de Castro, Sandro Rigo

    Abstract: Context: Microservices running on and powered by Edge Computing have been gaining much attention in industry and academia. Since 2014, when Martin Fowler popularized the term "microservice", many studies have been published relating these subjects to explore how the Edge's low-latency feature could be combined with the high throughput of the distributed paradigm from Microservices. Objective:…

    Submitted 23 January, 2023; v1 submitted 18 January, 2023; originally announced January 2023.

    Comments: Systematic Literature Review, 40 pages, 15 figures, 15 tables

  24. arXiv:2210.02399

    cs.CV cs.AI

    Phenaki: Variable Length Video Generation From Open Domain Textual Description

    Authors: Ruben Villegas, Mohammad Babaeizadeh, Pieter-Jan Kindermans, Hernan Moraldo, Han Zhang, Mohammad Taghi Saffar, Santiago Castro, Julius Kunze, Dumitru Erhan

    Abstract: We present Phenaki, a model capable of realistic video synthesis, given a sequence of textual prompts. Generating videos from text is particularly challenging due to the computational cost, limited quantities of high quality text-video data and variable length of videos. To address these issues, we introduce a new model for learning video representation which compresses the video to a small repres…

    Submitted 5 October, 2022; originally announced October 2022.

  25. arXiv:2209.06650

    cs.CV cs.CL

    WildQA: In-the-Wild Video Question Answering

    Authors: Santiago Castro, Naihao Deng, Pingxuan Huang, Mihai Burzo, Rada Mihalcea

    Abstract: Existing video understanding datasets mostly focus on human interactions, with little attention being paid to the "in the wild" settings, where the videos are recorded outdoors. We propose WILDQA, a video understanding dataset of videos recorded in outside settings. In addition to video question answering (Video QA), we also introduce the new task of identifying visual support for a given question…

    Submitted 14 September, 2022; originally announced September 2022.

    Comments: *: Equal contribution; COLING 2022 oral; project webpage: https://lit.eecs.umich.edu/wildqa/

  26. arXiv:2206.10369

    cs.LG cs.AI

    The State of Sparse Training in Deep Reinforcement Learning

    Authors: Laura Graesser, Utku Evci, Erich Elsen, Pablo Samuel Castro

    Abstract: The use of sparse neural networks has seen rapid growth in recent years, particularly in computer vision. Their appeal stems largely from the reduced number of parameters required to train and store, as well as from an increase in learning efficiency. Somewhat surprisingly, there have been very few efforts exploring their use in Deep Reinforcement Learning (DRL). In this work we perform a systematic…

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: Proceedings of the 39th International Conference on Machine Learning (ICML'22)

  27. arXiv:2206.07748

    cs.HC cs.GR

    Immersion Metrics for Virtual Reality

    Authors: Matias N. Selzer, Silvia M. Castro

    Abstract: Technological advances in recent years have promoted the development of virtual reality systems that have a wide variety of hardware and software characteristics, providing varying degrees of immersion. Immersion is an objective property of the virtual reality system that depends on both its hardware and software characteristics. Virtual reality systems are currently attempting to improve immersio…

    Submitted 15 June, 2022; originally announced June 2022.

  28. arXiv:2206.01626

    cs.LG cs.AI stat.ML

    Reincarnating Reinforcement Learning: Reusing Prior Computation to Accelerate Progress

    Authors: Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, Marc G. Bellemare

    Abstract: Learning tabula rasa, that is without any prior knowledge, is the prevalent workflow in reinforcement learning (RL) research. However, RL systems, when applied to large-scale settings, rarely operate tabula rasa. Such large-scale systems undergo multiple design or algorithmic changes during their development cycle and use ad hoc approaches for incorporating these changes without re-training from s…

    Submitted 4 October, 2022; v1 submitted 3 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2022. Code and agents at https://agarwl.github.io/reincarnating_rl

  29. arXiv:2203.13371

    cs.CV

    FitCLIP: Refining Large-Scale Pretrained Image-Text Models for Zero-Shot Video Understanding Tasks

    Authors: Santiago Castro, Fabian Caba Heilbron

    Abstract: Large-scale pretrained image-text models have shown incredible zero-shot performance in a handful of tasks, including video ones such as action recognition and text-to-video retrieval. However, these models have not been adapted to video, mainly because they do not account for the time dimension but also because video frames are different from the typical images (e.g., containing motion blur, and…

    Submitted 5 October, 2022; v1 submitted 24 March, 2022; originally announced March 2022.

    Comments: Accepted at BMVC 2022. It includes the supplementary material. The margins and page size were modified to fit the arXiv ID stamp on the left side

  30. arXiv:2202.08138

    cs.CV cs.CL

    When Did It Happen? Duration-informed Temporal Localization of Narrated Actions in Vlogs

    Authors: Oana Ignat, Santiago Castro, Yuhang Zhou, Jiajun Bao, Dandan Shan, Rada Mihalcea

    Abstract: We consider the task of temporal human action localization in lifestyle vlogs. We introduce a novel dataset consisting of manual annotations of temporal localization for 13,000 narrated actions in 1,200 video clips. We present an extensive analysis of this data, which allows us to better understand how the language and visual modalities interact throughout the videos. We propose a simple yet effec…

    Submitted 21 February, 2022; v1 submitted 16 February, 2022; originally announced February 2022.

    Comments: arXiv admin note: text overlap with arXiv:1906.04236

  31. arXiv:2202.01385

    cs.RO

    Technical Report: A Hierarchical Deliberative-Reactive System Architecture for Task and Motion Planning in Partially Known Environments

    Authors: Vasileios Vasilopoulos, Sebastian Castro, William Vega-Brown, Daniel E. Koditschek, Nicholas Roy

    Abstract: We describe a task and motion planning architecture for highly dynamic systems that combines a domain-independent sampling-based deliberative planning algorithm with a global reactive planner. We leverage the recent development of a reactive, vector field planner that provides guarantees of reachability to large regions of the environment even in the face of unknown or unforeseen obstacles. The re…

    Submitted 2 February, 2022; originally announced February 2022.

    Comments: Technical Report accompanying the paper "A Hierarchical Deliberative-Reactive System Architecture for Task and Motion Planning in Partially Known Environments" at ICRA 2022 (8 pages, 6 figures)

  32. arXiv:2112.02070

    cs.MM cs.AI

    Malakai: Music That Adapts to the Shape of Emotions

    Authors: Zack Harris, Liam Atticus Clarke, Pietro Gagliano, Dante Camarena, Manal Siddiqui, Pablo S. Castro

    Abstract: The advent of ML music models such as Google Magenta's MusicVAE now allows us to extract and replicate compositional features from otherwise complex datasets. These models allow computational composers to parameterize abstract variables such as style and mood. By leveraging these models and combining them with procedural algorithms from the last few decades, it is possible to create a dynamic song…

    Submitted 3 December, 2021; originally announced December 2021.

  33. arXiv:2111.05128

    cs.LG cs.AI cs.HC cs.SD eess.AS

    Losses, Dissonances, and Distortions

    Authors: Pablo Samuel Castro

    Abstract: In this paper I present a study in using the losses and gradients obtained during the training of a simple function approximator as a mechanism for creating musical dissonance and visual distortion in a solo piano performance setting. These dissonances and distortions become part of an artistic performance not just by affecting the visualizations, but also by affecting the artistic musical perform…

    Submitted 8 November, 2021; originally announced November 2021.

    Comments: In the 5th Machine Learning for Creativity and Design Workshop at NeurIPS 2021

  34. arXiv:2110.14020

    cs.LG cs.AI

    The Difficulty of Passive Learning in Deep Reinforcement Learning

    Authors: Georg Ostrovski, Pablo Samuel Castro, Will Dabney

    Abstract: Learning to act from observational data without active environmental interaction is a well-known challenge in Reinforcement Learning (RL). Recent approaches involve constraints on the learned policy or conservative updates, preventing strong deviations from the state-action distribution of the dataset. Although these methods are evaluated using non-linear function approximation, theoretical justif…

    Submitted 26 October, 2021; originally announced October 2021.

    Comments: Accepted paper at NeurIPS 2021

  35. arXiv:2109.02747

    cs.CV cs.CL

    WhyAct: Identifying Action Reasons in Lifestyle Vlogs

    Authors: Oana Ignat, Santiago Castro, Hanwen Miao, Weiji Li, Rada Mihalcea

    Abstract: We aim to automatically identify human action reasons in online videos. We focus on the widespread genre of lifestyle vlogs, in which people perform actions while verbally describing them. We introduce and make publicly available the WhyAct dataset, consisting of 1,077 visual actions manually annotated with their reasons. We describe a multimodal model that leverages visual and textual information…

    Submitted 9 September, 2021; v1 submitted 6 September, 2021; originally announced September 2021.

    Comments: Accepted at EMNLP 2021

  36. arXiv:2108.13264

    cs.LG cs.AI stat.ME stat.ML

    Deep Reinforcement Learning at the Edge of the Statistical Precipice

    Authors: Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville, Marc G. Bellemare

    Abstract: Deep reinforcement learning (RL) algorithms are predominantly evaluated by comparing their relative performance on a large suite of tasks. Most published results on deep RL benchmarks compare point estimates of aggregate performance such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs. Beginning with the Arcade Lea…

    Submitted 5 January, 2022; v1 submitted 30 August, 2021; originally announced August 2021.

    Comments: Outstanding Paper Award at NeurIPS 2021. Website: https://agarwl.github.io/rliable. 28 Pages, 33 Figures
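To show what reporting uncertainty instead of a bare point estimate can look like, the sketch below computes an interquartile mean (IQM, the mean of the middle 50% of all run-task scores) and a percentile bootstrap confidence interval that resamples runs independently within each task. The choice of IQM, the stratified resampling scheme, the defaults, and the function names are illustrative assumptions; this is not the API of the tooling linked from the website above.

```python
import numpy as np

def iqm(values):
    """Interquartile mean: drop the lowest and highest 25% of scores, average the rest."""
    v = np.sort(np.ravel(values))
    k = int(0.25 * v.size)
    return float(v[k:v.size - k].mean())

def bootstrap_ci(scores, stat=iqm, reps=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for `stat` over a (num_runs, num_tasks) score matrix.

    Runs are resampled with replacement separately for each task, so every
    resample still covers every task.
    """
    rng = np.random.default_rng(seed)
    runs, tasks = scores.shape
    stats = np.empty(reps)
    for b in range(reps):
        idx = rng.integers(0, runs, size=(runs, tasks))         # run indices per task
        resampled = np.take_along_axis(scores, idx, axis=0)     # resampled[i, j] = scores[idx[i, j], j]
        stats[b] = stat(resampled)
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)
```

Reporting `iqm(scores)` together with `bootstrap_ci(scores)` makes it visible when a difference between two agents is smaller than the spread implied by a handful of training runs.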

  37. arXiv:2108.05828

    cs.LG cs.AI stat.ML

    A general class of surrogate functions for stable and efficient reinforcement learning

    Authors: Sharan Vaswani, Olivier Bachem, Simone Totaro, Robert Mueller, Shivam Garg, Matthieu Geist, Marlos C. Machado, Pablo Samuel Castro, Nicolas Le Roux

    Abstract: Common policy gradient methods rely on the maximization of a sequence of surrogate functions. In recent years, many such surrogate functions have been proposed, most without strong theoretical guarantees, leading to algorithms such as TRPO, PPO or MPO. Rather than design yet another surrogate function, we instead propose a general framework (FMA-PG) based on functional mirror ascent that gives ris…

    Submitted 30 October, 2023; v1 submitted 12 August, 2021; originally announced August 2021.

    Comments: Fixed minor typos

  38. Active Learning of Abstract Plan Feasibility

    Authors: Michael Noseworthy, Caris Moses, Isaiah Brand, Sebastian Castro, Leslie Kaelbling, Tomás Lozano-Pérez, Nicholas Roy

    Abstract: Long horizon sequential manipulation tasks are effectively addressed hierarchically: at a high level of abstraction the planner searches over abstract action sequences, and when a plan is found, lower level motion plans are generated. Such a strategy hinges on the ability to reliably predict that a feasible low level plan will be found which satisfies the abstract plan. However, computing Abstract…

    Submitted 1 July, 2021; originally announced July 2021.

    Comments: To appear in Robotics: Science and Systems 2021

  39. arXiv:2106.08229

    cs.LG cs.AI

    MICo: Improved representations via sampling-based state similarity for Markov decision processes

    Authors: Pablo Samuel Castro, Tyler Kastner, Prakash Panangaden, Mark Rowland

    Abstract: We present a new behavioural distance over the state space of a Markov decision process, and demonstrate the use of this distance as an effective means of shaping the learnt representations of deep reinforcement learning agents. While existing notions of state similarity are typically difficult to learn at scale due to high computational cost and lack of sample-based algorithms, our newly-proposed…

    Submitted 21 January, 2022; v1 submitted 3 June, 2021; originally announced June 2021.

    Comments: Published at NeurIPS 2021

  40. arXiv:2104.10636

    cs.RO

    Learning and Planning for Temporally Extended Tasks in Unknown Environments

    Authors: Christopher Bradley, Adam Pacheck, Gregory J. Stein, Sebastian Castro, Hadas Kress-Gazit, Nicholas Roy

    Abstract: We propose a novel planning technique for satisfying tasks specified in temporal logic in partially revealed environments. We define high-level actions derived from the environment and the given task itself, and estimate how each action contributes to progress towards completing the task. As the map is revealed, we estimate the cost and probability of success of each action from images and an enco…

    Submitted 28 April, 2021; v1 submitted 21 April, 2021; originally announced April 2021.

    Comments: 7 Pages, 7 Figures, Accepted to ICRA 2021

  41. arXiv:2104.04182

    cs.CV

    FIBER: Fill-in-the-Blanks as a Challenging Video Understanding Evaluation Framework

    Authors: Santiago Castro, Ruoyao Wang, Pingxuan Huang, Ian Stewart, Oana Ignat, Nan Liu, Jonathan C. Stroud, Rada Mihalcea

    Abstract: We propose fill-in-the-blanks as a video understanding evaluation framework and introduce FIBER -- a novel dataset consisting of 28,000 videos and descriptions in support of this evaluation framework. The fill-in-the-blanks setting tests a model's understanding of a video by requiring it to predict a masked noun phrase in the caption of the video, given the video and the surrounding text. The FIBE…

    Submitted 22 March, 2022; v1 submitted 9 April, 2021; originally announced April 2021.

    Comments: Accepted at ACL 2022 Main conference. Camera-ready version

  42. arXiv:2102.01514

    cs.LG cs.AI stat.ML

    Metrics and continuity in reinforcement learning

    Authors: Charline Le Lan, Marc G. Bellemare, Pablo Samuel Castro

    Abstract: In most practical applications of reinforcement learning, it is untenable to maintain direct estimates for individual states; in continuous-state systems, it is impossible. Instead, researchers often leverage state similarity (whether explicitly or implicitly) to build models that can generalize well from a limited set of samples. The notion of state similarity used, and the neighbourhoods and top…

    Submitted 2 February, 2021; originally announced February 2021.

    Comments: Accepted at AAAI 2021

  43. arXiv:2101.05265

    cs.LG cs.AI stat.ML

    Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning

    Authors: Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G. Bellemare

    Abstract: Reinforcement learning methods trained on few environments rarely learn policies that generalize to unseen environments. To improve generalization, we incorporate the inherent sequential structure in reinforcement learning into the representation learning process. This approach is orthogonal to recent approaches, which rarely exploit this structure explicitly. Specifically, we introduce a theoreti…

    Submitted 18 March, 2021; v1 submitted 13 January, 2021; originally announced January 2021.

    Comments: ICLR 2021 (Spotlight). Website: https://agarwl.github.io/pse

  44. arXiv:2011.14826

    cs.LG cs.AI

    Revisiting Rainbow: Promoting more Insightful and Inclusive Deep Reinforcement Learning Research

    Authors: Johan S. Obando-Ceron, Pablo Samuel Castro

    Abstract: Since the introduction of DQN, a vast majority of reinforcement learning research has focused on reinforcement learning with deep neural networks as function approximators. New methods are typically evaluated on a set of environments that have now become standard, such as Atari 2600 games. While these benchmarks help standardize evaluation, their computational cost has the unfortunate side effect…

    Submitted 21 May, 2021; v1 submitted 20 November, 2020; originally announced November 2020.

    Comments: Proceedings of the 38th International Conference on Machine Learning (ICML 2021)

  45. arXiv:2011.05158

    cs.SD cs.AI cs.LG eess.AS

    GANterpretations

    Authors: Pablo Samuel Castro

    Abstract: Since the introduction of Generative Adversarial Networks (GANs) [Goodfellow et al., 2014] there has been a regular stream of both technical advances (e.g., Arjovsky et al. [2017]) and creative uses of these generative models (e.g., [Karras et al., 2019, Zhu et al., 2017, Jin et al., 2017]). In this work we propose an approach for using the power of GANs to automatically generate videos to accompa…

    Submitted 6 November, 2020; originally announced November 2020.

    Comments: In 4th Workshop on Machine Learning for Creativity and Design at NeurIPS 2020, Vancouver, Canada

  46. arXiv:1911.11134

    cs.LG cs.CV stat.ML

    Rigging the Lottery: Making All Tickets Winners

    Authors: Utku Evci, Trevor Gale, Jacob Menick, Pablo Samuel Castro, Erich Elsen

    Abstract: Many applications require sparse neural networks due to space or inference time restrictions. There is a large body of work on training dense networks to yield sparse networks for inference, but this limits the size of the largest trainable sparse model to that of the largest trainable dense model. In this paper we introduce a method to train sparse neural networks with a fixed parameter count and…

    Submitted 23 July, 2021; v1 submitted 25 November, 2019; originally announced November 2019.

    Comments: Published in Proceedings of the 37th International Conference on Machine Learning. Code can be found in github.com/google-research/rigl

    Journal ref: Proceedings of the 37th International Conference on Machine Learning (2020) 471-481

  47. arXiv:1911.09291

    cs.LG cs.AI stat.ML

    Scalable methods for computing state similarity in deterministic Markov Decision Processes

    Authors: Pablo Samuel Castro

    Abstract: We present new algorithms for computing and approximating bisimulation metrics in Markov Decision Processes (MDPs). Bisimulation metrics are an elegant formalism that capture behavioral equivalence between states and provide strong theoretical guarantees on differences in optimal behaviour. Unfortunately, their computation is expensive and requires a tabular representation of the states, which has…

    Submitted 21 November, 2019; originally announced November 2019.

    Comments: To appear in Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)

  48. arXiv:1907.13411

    cs.LG stat.ML

    Inverse Reinforcement Learning with Multiple Ranked Experts

    Authors: Pablo Samuel Castro, Shijian Li, Daqing Zhang

    Abstract: We consider the problem of learning to behave optimally in a Markov Decision Process when a reward function is not specified, but instead we have access to a set of demonstrators of varying performance. We assume the demonstrators are classified into one of k ranks, and use ideas from ordinal regression to find a reward function that maximizes the margin between the different ranks. This approach…

    Submitted 31 July, 2019; originally announced July 2019.

  49. arXiv:1906.01815

    cs.CL cs.CV

    Towards Multimodal Sarcasm Detection (An _Obviously_ Perfect Paper)

    Authors: Santiago Castro, Devamanyu Hazarika, Verónica Pérez-Rosas, Roger Zimmermann, Rada Mihalcea, Soujanya Poria

    Abstract: Sarcasm is often expressed through several verbal and non-verbal cues, e.g., a change of tone, overemphasis in a word, a drawn-out syllable, or a straight looking face. Most of the recent work in sarcasm detection has been carried out on textual data. In this paper, we argue that incorporating multimodal cues can improve the automatic classification of sarcasm. As a first step towards enabling the…

    Submitted 5 June, 2019; originally announced June 2019.

    Comments: Accepted at ACL 2019

  50. arXiv:1904.13285

    cs.SD cs.LG eess.AS

    Performing Structured Improvisations with pre-trained Deep Learning Models

    Authors: Pablo Samuel Castro

    Abstract: The quality of outputs produced by deep generative models for music has seen a dramatic improvement in the last few years. However, most deep learning models perform in "offline" mode, with few restrictions on the processing time. Integrating these types of models into a live structured performance poses a challenge because of the necessity to respect the beat and harmony. Further, these deep mod…

    Submitted 30 April, 2019; originally announced April 2019.