Showing 1–50 of 151 results for author: Schmidhuber, J

  1. arXiv:2406.08404 [pdf, other]

    cs.LG cs.AI

    Scaling Value Iteration Networks to 5000 Layers for Extreme Long-Term Planning

    Authors: Yuhui Wang, Qingyuan Wu, Weida Li, Dylan R. Ashley, Francesco Faccio, Chao Huang, Jürgen Schmidhuber

    Abstract: The Value Iteration Network (VIN) is an end-to-end differentiable architecture that performs value iteration on a latent MDP for planning in reinforcement learning (RL). However, VINs struggle to scale to long-term and large-scale planning tasks, such as navigating a $100\times 100$ maze -- a task which typically requires thousands of planning steps to solve. We observe that this deficiency is due…

    Submitted 12 June, 2024; originally announced June 2024.

    ACM Class: I.2.6

  2. arXiv:2406.03485 [pdf, other]

    cs.LG cs.AI

    Highway Value Iteration Networks

    Authors: Yuhui Wang, Weida Li, Francesco Faccio, Qingyuan Wu, Jürgen Schmidhuber

    Abstract: Value iteration networks (VINs) enable end-to-end learning for planning tasks by employing a differentiable "planning module" that approximates the value iteration algorithm. However, long-term planning remains a challenge because training very deep VINs is difficult. To address this problem, we embed highway value iteration -- a recent algorithm designed to facilitate long-term credit assignment…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  3. arXiv:2405.18289 [pdf, other]

    cs.LG cs.AI

    Highway Reinforcement Learning

    Authors: Yuhui Wang, Miroslav Strupl, Francesco Faccio, Qingyuan Wu, Haozhe Liu, Michał Grudzień, Xiaoyang Tan, Jürgen Schmidhuber

    Abstract: Learning from multi-step off-policy data collected by a set of policies is a core problem of reinforcement learning (RL). Approaches based on importance sampling (IS) often suffer from large variances due to products of IS ratios. Typical IS-free methods, such as $n$-step Q-learning, look ahead for $n$ time steps along the trajectory of actions (where $n$ is called the lookahead depth) and utilize…

    Submitted 28 May, 2024; originally announced May 2024.

  4. arXiv:2405.17283 [pdf, other]

    cs.LG cs.NE

    Recurrent Complex-Weighted Autoencoders for Unsupervised Object Discovery

    Authors: Anand Gopalakrishnan, Aleksandar Stanić, Jürgen Schmidhuber, Michael Curtis Mozer

    Abstract: Current state-of-the-art synchrony-based models encode object bindings with complex-valued activations and compute with real-valued weights in feedforward architectures. We argue for the computational advantages of a recurrent architecture with complex-valued weights. We propose a fully convolutional autoencoder, SynCx, that performs iterative constraint satisfaction: at each iteration, a hidden l…

    Submitted 28 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: minor typo fixed

  5. arXiv:2405.16039 [pdf, other]

    cs.LG cs.AI cs.NE

    MoEUT: Mixture-of-Experts Universal Transformers

    Authors: Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber, Christopher Potts, Christopher D. Manning

    Abstract: Previous work on Universal Transformers (UTs) has demonstrated the importance of parameter sharing across layers. By allowing recurrence in depth, UTs have advantages over standard Transformers in learning compositional generalizations, but layer-sharing comes with a practical limitation of parameter-compute ratio: it drastically reduces the parameter count compared to the non-shared model with th…

    Submitted 24 May, 2024; originally announced May 2024.

  6. arXiv:2405.03878 [pdf, other]

    cs.LG cs.AI

    Sequence Compression Speeds Up Credit Assignment in Reinforcement Learning

    Authors: Aditya A. Ramesh, Kenny Young, Louis Kirsch, Jürgen Schmidhuber

    Abstract: Temporal credit assignment in reinforcement learning is challenging due to delayed and stochastic outcomes. Monte Carlo targets can bridge long delays between action and consequence but lead to high-variance targets due to stochasticity. Temporal difference (TD) learning uses bootstrapping to overcome variance but introduces a bias that can only be corrected through many iterations. TD($λ$) provid…

    Submitted 4 June, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

    Comments: ICML 2024 version

  7. arXiv:2405.00466 [pdf, other]

    cs.CV cs.CR

    Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable

    Authors: Haozhe Liu, Wentian Zhang, Bing Li, Bernard Ghanem, Jürgen Schmidhuber

    Abstract: Foundational generative models should be traceable to protect their owners and facilitate safety regulation. To achieve this, traditional approaches embed identifiers based on supervisory trigger-response signals, which are commonly known as backdoor watermarks. They are prone to failure when the model is fine-tuned with nontrigger data. Our experiments show that this vulnerability is due to energ…

    Submitted 1 May, 2024; originally announced May 2024.

  8. arXiv:2404.08093 [pdf, other]

    cs.RO cs.AI cs.LG

    Towards a Robust Soft Baby Robot With Rich Interaction Ability for Advanced Machine Learning Algorithms

    Authors: Mohannad Alhakami, Dylan R. Ashley, Joel Dunham, Francesco Faccio, Eric Feron, Jürgen Schmidhuber

    Abstract: Artificial intelligence has made great strides in many areas lately, yet it has had comparatively little success in general-use robotics. We believe one of the reasons for this is the disconnect between traditional robotic design and the properties needed for open-ended, creativity-based AI systems. To that end, we, taking selective inspiration from nature, build a robust, partially soft robotic l…

    Submitted 11 April, 2024; originally announced April 2024.

    Comments: 5 pages in main text + 1 page of references, 7 figures in main text; source code available at https://github.com/dylanashley/robot-limb-testai

    ACM Class: I.2.9; I.2.6

  9. arXiv:2404.02747 [pdf, other]

    cs.CV

    Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models

    Authors: Wentian Zhang, Haozhe Liu, Jinheng Xie, Francesco Faccio, Mike Zheng Shou, Jürgen Schmidhuber

    Abstract: This study explores the role of cross-attention during inference in text-conditional diffusion models. We find that cross-attention outputs converge to a fixed point after few inference steps. Accordingly, the time point of convergence naturally divides the entire inference process into two stages: an initial semantics-planning stage, during which, the model relies on cross-attention to plan text-…

    Submitted 3 April, 2024; originally announced April 2024.

  10. arXiv:2403.11998 [pdf, other]

    cs.LG

    Learning Useful Representations of Recurrent Neural Network Weight Matrices

    Authors: Vincent Herrmann, Francesco Faccio, Jürgen Schmidhuber

    Abstract: Recurrent Neural Networks (RNNs) are general-purpose parallel-sequential computers. The program of an RNN is its weight matrix. How to learn useful representations of RNN weights that facilitate RNN analysis as well as downstream tasks? While the mechanistic approach directly looks at some RNN's weights to predict its behavior, the functionalist approach analyzes its overall functionality-specific…

    Submitted 18 June, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    ACM Class: I.2.6

  11. arXiv:2402.16823 [pdf, other]

    cs.AI cs.CL cs.LG cs.MA

    Language Agents as Optimizable Graphs

    Authors: Mingchen Zhuge, Wenyi Wang, Louis Kirsch, Francesco Faccio, Dmitrii Khizbullin, Jürgen Schmidhuber

    Abstract: Various human-designed prompt engineering techniques have been proposed to improve problem solvers based on Large Language Models (LLMs), yielding many disparate code bases. We unify these approaches by describing LLM-based agents as computational graphs. The nodes implement functions to process multimodal data or query LLMs, and the edges describe the information flow between operations. Graphs c…

    Submitted 27 February, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

    Comments: Project Website: https://gptswarm.org ; Github Repo: https://github.com/metauto-ai/gptswarm ; Replace to fix typos

  12. arXiv:2402.03141 [pdf, other]

    cs.LG cs.AI eess.SY

    Boosting Reinforcement Learning with Strongly Delayed Feedback Through Auxiliary Short Delays

    Authors: Qingyuan Wu, Simon Sinong Zhan, Yixuan Wang, Yuhui Wang, Chung-Wei Lin, Chen Lv, Qi Zhu, Jürgen Schmidhuber, Chao Huang

    Abstract: Reinforcement learning (RL) is challenging in the common case of delays between events and their sensory perceptions. State-of-the-art (SOTA) state augmentation techniques either suffer from state space explosion or performance degeneration in stochastic environments. To address these challenges, we present a novel Auxiliary-Delayed Reinforcement Learning (AD-RL) method that leverages auxiliary ta…

    Submitted 5 June, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

    Comments: ICML 2024

  13. arXiv:2312.07987 [pdf, other]

    cs.LG cs.CL cs.NE

    SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention

    Authors: Róbert Csordás, Piotr Piękos, Kazuki Irie, Jürgen Schmidhuber

    Abstract: The costly self-attention layers in modern Transformers require memory and compute quadratic in sequence length. Existing approximation methods usually underperform and fail to obtain significant speedups in practice. Here we present SwitchHead - a novel method that reduces both compute and memory requirements and achieves wall-clock speedup, while matching the language modeling performance of bas…

    Submitted 14 December, 2023; v1 submitted 13 December, 2023; originally announced December 2023.

  14. arXiv:2312.00276 [pdf, other]

    cs.LG

    Automating Continual Learning

    Authors: Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber

    Abstract: General-purpose learning systems should improve themselves in open-ended fashion in ever-changing environments. Conventional learning algorithms for neural networks, however, suffer from catastrophic forgetting (CF) -- previously acquired skills are forgotten when a new task is learned. Instead of hand-crafting new algorithms for avoiding CF, we propose Automated Continual Learning (ACL) to train…

    Submitted 29 March, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

  15. arXiv:2311.08525 [pdf, other]

    cs.CV cs.AI

    Efficient Rotation Invariance in Deep Neural Networks through Artificial Mental Rotation

    Authors: Lukas Tuggener, Thilo Stadelmann, Jürgen Schmidhuber

    Abstract: Humans and animals recognize objects irrespective of the beholder's point of view, which may drastically change their appearances. Artificial pattern recognizers also strive to achieve this, e.g., through translational invariance in convolutional neural networks (CNNs). However, both CNNs and vision transformers (ViTs) perform very poorly on rotated inputs. Here we present artificial mental rotati…

    Submitted 14 November, 2023; originally announced November 2023.

  16. arXiv:2311.07534 [pdf, other]

    cs.SD cs.LG eess.AS

    Unsupervised Musical Object Discovery from Audio

    Authors: Joonsu Gha, Vincent Herrmann, Benjamin Grewe, Jürgen Schmidhuber, Anand Gopalakrishnan

    Abstract: Current object-centric learning models such as the popular SlotAttention architecture allow for unsupervised visual scene decomposition. Our novel MusicSlots method adapts SlotAttention to the audio domain, to achieve unsupervised music decomposition. Since concepts of opacity and occlusion in vision have no auditory analogues, the softmax normalization of alpha masks in the decoders of visual obj…

    Submitted 14 November, 2023; v1 submitted 13 November, 2023; originally announced November 2023.

    Comments: Accepted to Machine Learning for Audio Workshop, NeurIPS 2023

  17. arXiv:2310.16076 [pdf, other]

    cs.LG

    Practical Computational Power of Linear Transformers and Their Recurrent and Self-Referential Extensions

    Authors: Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber

    Abstract: Recent studies of the computational power of recurrent neural networks (RNNs) reveal a hierarchy of RNN architectures, given real-time and finite-precision assumptions. Here we study auto-regressive Transformers with linearised attention, a.k.a. linear Transformers (LTs) or Fast Weight Programmers (FWPs). LTs are special in the sense that they are equivalent to RNN-like sequence processors with a…

    Submitted 24 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023 (short paper)

  18. arXiv:2310.10837 [pdf, other]

    cs.LG cs.NE

    Approximating Two-Layer Feedforward Networks for Efficient Transformers

    Authors: Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber

    Abstract: How to reduce compute and memory requirements of neural networks (NNs) without sacrificing performance? Many recent works use sparse Mixtures of Experts (MoEs) to build resource-efficient large language models (LMs). Here we introduce several novel perspectives on MoEs, presenting a general framework that unifies various methods to approximate two-layer NNs (e.g., feedforward blocks of Transformer…

    Submitted 21 November, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

    Comments: Accepted to EMNLP 2023 Findings

  19. arXiv:2309.11197 [pdf, other]

    cs.LG cs.CL

    The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute

    Authors: Aleksandar Stanić, Dylan Ashley, Oleg Serikov, Louis Kirsch, Francesco Faccio, Jürgen Schmidhuber, Thomas Hofmann, Imanol Schlag

    Abstract: The Languini Kitchen serves as both a research collective and codebase designed to empower researchers with limited computational resources to contribute meaningfully to the field of language modelling. We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours. The number of tokens on which a model is trained is defined by the m…

    Submitted 20 September, 2023; originally announced September 2023.

  20. arXiv:2308.07795 [pdf, other]

    cs.CV cs.AI

    Learning to Identify Critical States for Reinforcement Learning from Videos

    Authors: Haozhe Liu, Mingchen Zhuge, Bing Li, Yuhui Wang, Francesco Faccio, Bernard Ghanem, Jürgen Schmidhuber

    Abstract: Recent work on deep reinforcement learning (DRL) has pointed out that algorithmic information about good policies can be extracted from offline data which lack explicit information about executed actions. For example, videos of humans or robots may convey a lot of implicit information about rewarding action sequences, but a DRL machine that wants to profit from watching such videos must first lear…

    Submitted 15 August, 2023; originally announced August 2023.

    Comments: This paper was accepted to ICCV23

  21. arXiv:2308.00352 [pdf, other]

    cs.AI cs.MA

    MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework

    Authors: Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Ceyao Zhang, Jinlin Wang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu, Jürgen Schmidhuber

    Abstract: Remarkable progress has been made on automated problem solving through societies of agents based on large language models (LLMs). Existing LLM-based multi-agent systems can already solve simple dialogue tasks. Solutions to more complex tasks, however, are complicated through logic inconsistencies due to cascading hallucinations caused by naively chaining LLMs. Here we introduce MetaGPT, an innovat…

    Submitted 6 November, 2023; v1 submitted 1 August, 2023; originally announced August 2023.

  22. arXiv:2305.19044 [pdf, other]

    cs.LG

    Exploring the Promise and Limits of Real-Time Recurrent Learning

    Authors: Kazuki Irie, Anand Gopalakrishnan, Jürgen Schmidhuber

    Abstract: Real-time recurrent learning (RTRL) for sequence-processing recurrent neural networks (RNNs) offers certain conceptual advantages over backpropagation through time (BPTT). RTRL requires neither caching past activations nor truncating context, and enables online learning. However, RTRL's time and space complexity make it impractical. To overcome this problem, most recent work on RTRL focuses on app…

    Submitted 28 February, 2024; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: Accepted to ICLR 2024

  23. arXiv:2305.17066 [pdf, other]

    cs.AI cs.CL cs.CV cs.LG cs.MA

    Mindstorms in Natural Language-Based Societies of Mind

    Authors: Mingchen Zhuge, Haozhe Liu, Francesco Faccio, Dylan R. Ashley, Róbert Csordás, Anand Gopalakrishnan, Abdullah Hamdi, Hasan Abed Al Kader Hammoud, Vincent Herrmann, Kazuki Irie, Louis Kirsch, Bing Li, Guohao Li, Shuming Liu, Jinjie Mai, Piotr Piękos, Aditya Ramesh, Imanol Schlag, Weimin Shi, Aleksandar Stanić, Wenyi Wang, Yuhui Wang, Mengmeng Xu, Deng-Ping Fan, Bernard Ghanem, et al. (1 additional author not shown)

    Abstract: Both Minsky's "society of mind" and Schmidhuber's "learning to think" inspire diverse societies of large multimodal neural networks (NNs) that solve problems by interviewing each other in a "mindstorm." Recent implementations of NN-based societies of minds consist of large language models (LLMs) and other NN-based experts communicating through a natural language interface. In doing so, they overco…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: 9 pages in main text + 7 pages of references + 38 pages of appendices, 14 figures in main text + 13 in appendices, 7 tables in appendices

    MSC Class: 68T07 ACM Class: I.2.6; I.2.11

  24. arXiv:2305.15001 [pdf, other]

    cs.LG cs.AI cs.CV

    Contrastive Training of Complex-Valued Autoencoders for Object Discovery

    Authors: Aleksandar Stanić, Anand Gopalakrishnan, Kazuki Irie, Jürgen Schmidhuber

    Abstract: Current state-of-the-art object-centric models use slots and attention-based routing for binding. However, this class of models has several conceptual limitations: the number of slots is hardwired; all slots have equal capacity; training has high computational cost; there are no object-level relational factors within slots. Synchrony-based models in principle can address these limitations by using…

    Submitted 9 November, 2023; v1 submitted 24 May, 2023; originally announced May 2023.

    Comments: accepted to NeurIPS 2023

  25. arXiv:2305.05364 [pdf, other]

    cs.LG cs.AI cs.CL

    Large Language Model Programs

    Authors: Imanol Schlag, Sainbayar Sukhbaatar, Asli Celikyilmaz, Wen-tau Yih, Jason Weston, Jürgen Schmidhuber, Xian Li

    Abstract: In recent years, large pre-trained language models (LLMs) have demonstrated the ability to follow instructions and perform novel tasks from a few examples. The possibility to parameterise an LLM through such in-context examples widens their capability at a much lower cost than finetuning. We extend this line of reasoning and present a method which further expands the capabilities of an LLM by embe…

    Submitted 9 May, 2023; originally announced May 2023.

  26. arXiv:2305.01547 [pdf, other]

    cs.LG

    Accelerating Neural Self-Improvement via Bootstrapping

    Authors: Kazuki Irie, Jürgen Schmidhuber

    Abstract: Few-shot learning with sequence-processing neural networks (NNs) has recently attracted a new wave of attention in the context of large language models. In the standard N-way K-shot learning setting, an NN is explicitly optimised to learn to classify unlabelled inputs by observing a sequence of NK labelled examples. This pressures the NN to learn a learning algorithm that achieves optimal performa…

    Submitted 2 May, 2023; originally announced May 2023.

    Comments: Presented at ICLR 2023 Workshop on Mathematical and Empirical Understanding of Foundation Models, https://openreview.net/forum?id=SDwUYcyOCyP

  27. arXiv:2302.07950 [pdf, other]

    cs.LG cs.CV

    Topological Neural Discrete Representation Learning à la Kohonen

    Authors: Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber

    Abstract: Unsupervised learning of discrete representations from continuous ones in neural networks (NNs) is the cornerstone of several applications today. Vector Quantisation (VQ) has become a popular method to achieve such representations, in particular in the context of generative models such as Variational Auto-Encoders (VAEs). For example, the exponential moving average-based VQ (EMA-VQ) algorithm is o…

    Submitted 15 February, 2023; originally announced February 2023.

    Comments: Two first authors

  28. arXiv:2301.12876 [pdf, other]

    cs.LG cs.AI

    Guiding Online Reinforcement Learning with Action-Free Offline Pretraining

    Authors: Deyao Zhu, Yuhui Wang, Jürgen Schmidhuber, Mohamed Elhoseiny

    Abstract: Offline RL methods have been shown to reduce the need for environment interaction by training agents using offline collected episodes. However, these methods typically require action information to be logged during data collection, which can be difficult or even impossible in some practical cases. In this paper, we investigate the potential of using action-free offline datasets to improve online r…

    Submitted 22 March, 2023; v1 submitted 30 January, 2023; originally announced January 2023.

  29. arXiv:2212.14392 [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    Eliminating Meta Optimization Through Self-Referential Meta Learning

    Authors: Louis Kirsch, Jürgen Schmidhuber

    Abstract: Meta Learning automates the search for learning algorithms. At the same time, it creates a dependency on human engineering on the meta-level, where meta learning algorithms need to be designed. In this paper, we investigate self-referential meta learning systems that modify themselves without the need for explicit meta optimization. We discuss the relationship of such systems to in-context and mem…

    Submitted 29 December, 2022; originally announced December 2022.

    Comments: The first version appeared at ICML 2022, DARL Workshop

  30. arXiv:2212.14374 [pdf, other]

    cs.LG cs.AI

    Learning One Abstract Bit at a Time Through Self-Invented Experiments Encoded as Neural Networks

    Authors: Vincent Herrmann, Louis Kirsch, Jürgen Schmidhuber

    Abstract: There are two important things in science: (A) Finding answers to given questions, and (B) Coming up with good questions. Our artificial scientists not only learn to answer given questions, but also continually invent new questions, by proposing hypotheses to be verified or falsified through potentially complex and time-consuming experiments, including thought experiments akin to those of mathemat…

    Submitted 29 December, 2022; originally announced December 2022.

    Comments: 20 pages, 6 figures

  31. arXiv:2212.11279 [pdf]

    cs.NE

    Annotated History of Modern AI and Deep Learning

    Authors: Juergen Schmidhuber

    Abstract: Machine learning is the science of credit assignment: finding patterns in observations that predict the consequences of actions and help to improve future performance. Credit assignment is also required for human understanding of how the world works, not only for individuals navigating daily life, but also for academic professionals like historians who interpret the present in light of past events…

    Submitted 29 December, 2022; v1 submitted 21 December, 2022; originally announced December 2022.

    Comments: 75 pages, over 500 references. arXiv admin note: substantial text overlap with arXiv:2005.05744

    Report number: Technical Report IDSIA-22-22

  32. arXiv:2211.12423 [pdf, other]

    cs.CL cs.AI cs.LG cs.MM cs.NE cs.SD eess.AS

    On Narrative Information and the Distillation of Stories

    Authors: Dylan R. Ashley, Vincent Herrmann, Zachary Friggstad, Jürgen Schmidhuber

    Abstract: The act of telling stories is a fundamental part of what it means to be human. This work introduces the concept of narrative information, which we define to be the overlap in information space between a story and the items that compose the story. Using contrastive learning methods, we show how modern artificial neural networks can be leveraged to distill stories and extract a representation of the…

    Submitted 13 February, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

    Comments: presented in the Information-Theoretic Principles in Cognitive Systems Workshop at the 36th Conference on Neural Information Processing Systems; 4 pages in main text + 2 pages of references + 8 pages of appendices, 2 figures in main text + 3 in appendices, 1 table in main text, 2 algorithms in appendices; source code available at https://github.com/dylanashley/story-distiller

    MSC Class: 68T07 (Primary) 68P30; 68W50; 94A15 (Secondary) ACM Class: H.1.1; H.5.5; I.2.6; I.5.1; J.5

  33. arXiv:2211.10282 [pdf, other]

    cs.LG

    Exploring through Random Curiosity with General Value Functions

    Authors: Aditya Ramesh, Louis Kirsch, Sjoerd van Steenkiste, Jürgen Schmidhuber

    Abstract: Efficient exploration in reinforcement learning is a challenging problem commonly addressed through intrinsic rewards. Recent prominent approaches are based on state novelty or variants of artificial curiosity. However, directly applying them to partially observable environments can be ineffective and lead to premature dissipation of intrinsic rewards. Here we propose random curiosity with general…

    Submitted 18 November, 2022; originally announced November 2022.

    Comments: Accepted to NeurIPS 2022

  34. arXiv:2211.09440 [pdf, ps, other]

    cs.NE cs.LG

    Learning to Control Rapidly Changing Synaptic Connections: An Alternative Type of Memory in Sequence Processing Artificial Neural Networks

    Authors: Kazuki Irie, Jürgen Schmidhuber

    Abstract: Short-term memory in standard, general-purpose, sequence-processing recurrent neural networks (RNNs) is stored as activations of nodes or "neurons." Generalising feedforward NNs to such RNNs is mathematically straightforward and natural, and even historical: already in 1943, McCulloch and Pitts proposed this as a surrogate to "synaptic modifications" (in effect, generalising the Lenz-Ising model,…

    Submitted 17 November, 2022; originally announced November 2022.

    Comments: Presented at NeurIPS 2022 Workshop on Memory in Artificial and Real Intelligence

  35. arXiv:2211.02222 [pdf, other]

    cs.LG

    The Benefits of Model-Based Generalization in Reinforcement Learning

    Authors: Kenny Young, Aditya Ramesh, Louis Kirsch, Jürgen Schmidhuber

    Abstract: Model-Based Reinforcement Learning (RL) is widely believed to have the potential to improve sample efficiency by allowing an agent to synthesize large amounts of imagined experience. Experience Replay (ER) can be considered a simple kind of model, which has proved effective at improving the stability and efficiency of deep RL. In principle, a learned parametric model could improve on ER by general…

    Submitted 10 July, 2023; v1 submitted 3 November, 2022; originally announced November 2022.

    Comments: Update to ICML version

  36. arXiv:2210.06350 [pdf, other]

    cs.LG cs.AI cs.NE

    CTL++: Evaluating Generalization on Never-Seen Compositional Patterns of Known Functions, and Compatibility of Neural Representations

    Authors: Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber

    Abstract: Well-designed diagnostic tasks have played a key role in studying the failure of neural nets (NNs) to generalize systematically. Famous examples include SCAN and Compositional Table Lookup (CTL). Here we introduce CTL++, a new diagnostic dataset based on compositions of unary symbolic functions. While the original CTL is used to test length generalization or productivity, CTL++ is designed to test…

    Submitted 12 October, 2022; originally announced October 2022.

    Comments: Accepted to EMNLP 2022

  37. arXiv:2210.06184 [pdf, other]

    cs.CV cs.LG cs.NE

    Images as Weight Matrices: Sequential Image Generation Through Synaptic Learning Rules

    Authors: Kazuki Irie, Jürgen Schmidhuber

    Abstract: Work on fast weight programmers has demonstrated the effectiveness of key/value outer product-based learning rules for sequentially generating a weight matrix (WM) of a neural net (NN) by another NN or itself. However, the weight generation steps are typically not visually interpretable by humans, because the contents stored in the WM of an NN are not. Here we apply the same principle to generate…

    Submitted 28 February, 2023; v1 submitted 7 October, 2022; originally announced October 2022.

    Comments: Accepted to ICLR 2023

  38. arXiv:2208.03374 [pdf, other]

    cs.LG cs.AI

    Learning to Generalize with Object-centric Agents in the Open World Survival Game Crafter

    Authors: Aleksandar Stanić, Yujin Tang, David Ha, Jürgen Schmidhuber

    Abstract: Reinforcement learning agents must generalize beyond their training experience. Prior work has focused mostly on identical training and evaluation environments. Starting from the recently introduced Crafter benchmark, a 2D open world survival game, we introduce a new set of environments suitable for evaluating some agent's ability to generalize on previously unseen (numbers of) objects and to adap…

    Submitted 5 August, 2022; originally announced August 2022.

    ACM Class: I.2.6

  39. arXiv:2207.01570 [pdf, other]

    cs.LG stat.ML

    Goal-Conditioned Generators of Deep Policies

    Authors: Francesco Faccio, Vincent Herrmann, Aditya Ramesh, Louis Kirsch, Jürgen Schmidhuber

    Abstract: Goal-conditioned Reinforcement Learning (RL) aims at learning optimal policies, given goals encoded in special command inputs. Here we study goal-conditioned neural nets (NNs) that learn to generate deep NN policies in form of context-specific weight matrices, similar to Fast Weight Programmers and other methods from the 1990s. Using context commands of the form "generate a policy that achieves a…

    Submitted 4 July, 2022; originally announced July 2022.

    Comments: Preprint. Under Review

  40. arXiv:2207.01566 [pdf, other]

    cs.LG stat.ML

    General Policy Evaluation and Improvement by Learning to Identify Few But Crucial States

    Authors: Francesco Faccio, Aditya Ramesh, Vincent Herrmann, Jean Harb, Jürgen Schmidhuber

    Abstract: Learning to evaluate and improve policies is a core problem of Reinforcement Learning (RL). Traditional RL algorithms learn a value function defined for a single policy. A recently explored competitive alternative is to learn a single value function for many policies. Here we combine the actor-critic architecture of Parameter-Based Value Functions and the policy embedding of Policy Evaluation Netw…

    Submitted 4 July, 2022; originally announced July 2022.

    Comments: Preprint. Under review

  41. arXiv:2206.01649 [pdf, other]

    cs.LG

    Neural Differential Equations for Learning to Program Neural Nets Through Continuous Learning Rules

    Authors: Kazuki Irie, Francesco Faccio, Jürgen Schmidhuber

    Abstract: Neural ordinary differential equations (ODEs) have attracted much attention as continuous-time counterparts of deep residual neural networks (NNs), and numerous extensions for recurrent NNs have been proposed. Since the 1980s, ODEs have also been used to derive theoretical results for NN learning rules, e.g., the famous connection between Oja's rule and principal component analysis. Such rules are…

    Submitted 14 October, 2022; v1 submitted 3 June, 2022; originally announced June 2022.

    Comments: Accepted to NeurIPS 2022

  42. arXiv:2205.06595  [pdf, other]

    stat.ML cs.AI cs.LG

    Upside-Down Reinforcement Learning Can Diverge in Stochastic Environments With Episodic Resets

    Authors: Miroslav Štrupl, Francesco Faccio, Dylan R. Ashley, Jürgen Schmidhuber, Rupesh Kumar Srivastava

    Abstract: Upside-Down Reinforcement Learning (UDRL) is an approach for solving RL problems that does not require value functions and uses only supervised learning, where the targets for given inputs in a dataset do not change over time. Ghosh et al. proved that Goal-Conditional Supervised Learning (GCSL) -- which can be viewed as a simplified version of UDRL -- optimizes a lower bound on goal-reaching perfo…

    Submitted 13 May, 2022; originally announced May 2022.

    Comments: presented at the 5th Multidisciplinary Conference on Reinforcement Learning and Decision Making; 5 pages in main text + 1 page of references + 3 pages of appendices, 1 figure in main text; source code available at https://github.com/struplm/UDRL-GCSL-counterexample.git

    MSC Class: 68T05; ACM Class: I.2.6

  43. arXiv:2203.13573  [pdf, other]

    cs.LG cs.AI cs.NE

    Unsupervised Learning of Temporal Abstractions with Slot-based Transformers

    Authors: Anand Gopalakrishnan, Kazuki Irie, Jürgen Schmidhuber, Sjoerd van Steenkiste

    Abstract: The discovery of reusable sub-routines simplifies decision-making and planning in complex reinforcement learning problems. Previous approaches propose to learn such temporal abstractions in a purely unsupervised fashion through observing state-action trajectories gathered from executing a policy. However, a current limitation is that they process each trajectory in an entirely sequential manner, w…

    Submitted 22 November, 2022; v1 submitted 25 March, 2022; originally announced March 2022.

    Comments: accepted to Neural Computation journal

  44. arXiv:2202.12742  [pdf, other]

    cs.LG cs.AI

    Learning Relative Return Policies With Upside-Down Reinforcement Learning

    Authors: Dylan R. Ashley, Kai Arulkumaran, Jürgen Schmidhuber, Rupesh Kumar Srivastava

    Abstract: Lately, there has been a resurgence of interest in using supervised learning to solve reinforcement learning problems. Recent work in this area has largely focused on learning command-conditioned policies. We investigate the potential of one such method -- upside-down reinforcement learning -- to work with commands that specify a desired relationship between some scalar value and the observed retu…

    Submitted 10 May, 2022; v1 submitted 23 February, 2022; originally announced February 2022.

    Comments: presented at the 5th Multidisciplinary Conference on Reinforcement Learning and Decision Making; 5 pages in main text, 2 figures in main text

    ACM Class: I.2.6

  45. arXiv:2202.11960  [pdf, other]

    cs.LG cs.AI

    All You Need Is Supervised Learning: From Imitation Learning to Meta-RL With Upside Down RL

    Authors: Kai Arulkumaran, Dylan R. Ashley, Jürgen Schmidhuber, Rupesh K. Srivastava

    Abstract: Upside down reinforcement learning (UDRL) flips the conventional use of the return in the objective function in RL upside down, by taking returns as input and predicting actions. UDRL is based purely on supervised learning, and bypasses some prominent issues in RL: bootstrapping, off-policy corrections, and discount factors. While previous work with UDRL demonstrated it in a traditional online RL…

    Submitted 24 February, 2022; originally announced February 2022.

  46. arXiv:2202.05798  [pdf, other]

    cs.LG

    The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention

    Authors: Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber

    Abstract: Linear layers in neural networks (NNs) trained by gradient descent can be expressed as a key-value memory system which stores all training datapoints and the initial weights, and produces outputs using unnormalised dot attention over the entire training experience. While this has been technically known since the 1960s, no prior work has effectively studied the operations of NNs in such a form, pre…

    Submitted 17 June, 2022; v1 submitted 11 February, 2022; originally announced February 2022.

    Comments: Two first authors. Accepted to ICML 2022

  47. arXiv:2202.05780  [pdf, other]

    cs.LG

    A Modern Self-Referential Weight Matrix That Learns to Modify Itself

    Authors: Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber

    Abstract: The weight matrix (WM) of a neural network (NN) is its program. The programs of many traditional NNs are learned through gradient descent in some error function, then remain fixed. The WM of a self-referential NN, however, can keep rapidly modifying all of itself during runtime. In principle, such NNs can meta-learn to learn, and meta-meta-learn to meta-learn to learn, and so on, in the sense of r…

    Submitted 17 June, 2022; v1 submitted 11 February, 2022; originally announced February 2022.

    Comments: Accepted to ICML 2022

  48. arXiv:2112.15550  [pdf, other]

    cs.LG cs.CV

    Improving Baselines in the Wild

    Authors: Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber

    Abstract: We share our experience with the recently released WILDS benchmark, a collection of ten datasets dedicated to developing models and training strategies which are robust to domain shifts. Several experiments yield a couple of critical observations which we believe are of general interest for any future work on WILDS. Our study focuses on two datasets: iWildCam and FMoW. We show that (1) Conducting…

    Submitted 31 December, 2021; originally announced December 2021.

    Comments: Presented at NeurIPS 2021 Workshop on Distribution Shifts, https://openreview.net/forum?id=9vxOrkNTs1x

  49. arXiv:2112.15545  [pdf, other]

    cs.LG cs.CL

    Training and Generating Neural Networks in Compressed Weight Space

    Authors: Kazuki Irie, Jürgen Schmidhuber

    Abstract: The inputs and/or outputs of some neural nets are weight matrices of other neural nets. Indirect encodings or end-to-end compression of weight matrices could help to scale such approaches. Our goal is to open a discussion on this topic, starting with recurrent neural networks for character-level language modelling whose weight matrices are encoded by the discrete cosine transform. Our fast weight…

    Submitted 31 December, 2021; originally announced December 2021.

    Comments: Presented at ICLR 2021 Workshop on Neural Compression, https://openreview.net/forum?id=qU1EUxdVd_D

  50. arXiv:2111.02216  [pdf, other]

    cs.CL cs.LG cs.MM cs.SD eess.AS

    Automatic Embedding of Stories Into Collections of Independent Media

    Authors: Dylan R. Ashley, Vincent Herrmann, Zachary Friggstad, Kory W. Mathewson, Jürgen Schmidhuber

    Abstract: We look at how machine learning techniques that derive properties of items in a collection of independent media can be used to automatically embed stories into such collections. To do so, we use models that extract the tempo of songs to make a music playlist follow a narrative arc. Our work specifies an open-source tool that uses pre-trained neural network models to extract the global tempo of a s…

    Submitted 3 November, 2021; originally announced November 2021.

    Comments: 2 pages in main text + 1 page of references + 6 pages of appendices, 2 figures in main text + 3 figures in appendices, 1 algorithm in appendices; source code available at https://gist.github.com/dylanashley/1387a99deb85bfc0bce11286810cd98b

    ACM Class: H.5.5; I.2.6; J.5