Showing 1–50 of 107 results for author: Hofmann, T

  1. arXiv:2406.16300  [pdf, other]

    cs.LG

    Landscaping Linear Mode Connectivity

    Authors: Sidak Pal Singh, Linara Adilova, Michael Kamp, Asja Fischer, Bernhard Schölkopf, Thomas Hofmann

    Abstract: The presence of linear paths in parameter space between two different network solutions in certain cases, i.e., linear mode connectivity (LMC), has garnered interest from both theoretical and practical fronts. There has been significant research that either practically designs algorithms catered for connecting networks by adjusting for the permutation symmetries as well as some others that more th…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: ICML 2024 HiLD workshop paper

  2. arXiv:2406.10256  [pdf, other]

    cs.CL cs.AI cs.LG

    Explicit Word Density Estimation for Language Modelling

    Authors: Jovan Andonov, Octavian Ganea, Paulina Grnarova, Gary Bécigneul, Thomas Hofmann

    Abstract: Language Modelling has been a central part of Natural Language Processing for a very long time and in the past few years LSTM-based language models have been the go-to method for commercial language modeling. Recently, it has been shown that when looking at language modelling from a matrix factorization point of view, the final Softmax layer limits the expressiveness of the model, by putting an up…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: Master's thesis

  3. arXiv:2406.04327  [pdf, other]

    cs.LG

    Causal Estimation of Memorisation Profiles

    Authors: Pietro Lesci, Clara Meister, Thomas Hofmann, Andreas Vlachos, Tiago Pimentel

    Abstract: Understanding memorisation in language models has practical and societal implications, e.g., studying models' training dynamics or preventing copyright infringements. Prior work defines memorisation as the causal effect of training with an instance on the model's ability to predict that instance. This definition relies on a counterfactual: the ability to observe what would have happened had the mo…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Published at the ACL 2024 Conference (main)

  4. arXiv:2405.19279  [pdf, other]

    cs.LG

    Understanding and Minimising Outlier Features in Neural Network Training

    Authors: Bobby He, Lorenzo Noci, Daniele Paliotta, Imanol Schlag, Thomas Hofmann

    Abstract: Outlier Features (OF) are neurons whose activation magnitudes significantly exceed the average over a neural network's (NN) width. They are well known to emerge during standard transformer training and have the undesirable effect of hindering quantisation in afflicted models. Despite their practical importance, little is known behind why OFs emerge during training, nor how one can minimise them.…

    Submitted 29 May, 2024; originally announced May 2024.

  5. arXiv:2404.13766  [pdf, other]

    cs.CV

    Object-Attribute Binding in Text-to-Image Generation: Evaluation and Control

    Authors: Maria Mihaela Trusca, Wolf Nuyts, Jonathan Thomm, Robert Honig, Thomas Hofmann, Tinne Tuytelaars, Marie-Francine Moens

    Abstract: Current diffusion models create photorealistic images given a text prompt as input but struggle to correctly bind attributes mentioned in the text to the right objects in the image. This is evidenced by our novel image-graph alignment model called EPViT (Edge Prediction Vision Transformer) for the evaluation of image-text alignment. To alleviate the above problem, we propose focused cross-attentio…

    Submitted 21 April, 2024; originally announced April 2024.

  6. arXiv:2404.07982  [pdf, other]

    cs.CL cs.LG

    Language Imbalance Can Boost Cross-lingual Generalisation

    Authors: Anton Schäfer, Shauli Ravfogel, Thomas Hofmann, Tiago Pimentel, Imanol Schlag

    Abstract: Multilinguality is crucial for extending recent advancements in language modelling to diverse linguistic communities. To maintain high performance while representing multiple languages, multilingual models ideally align representations, allowing what is learned in one language to generalise to others. Prior research has emphasised the importance of parallel data and shared vocabulary elements as k…

    Submitted 13 May, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

    ACM Class: I.2.7

  7. arXiv:2404.06508  [pdf, other]

    cs.CL cs.LG

    On the Effect of (Near) Duplicate Subwords in Language Modelling

    Authors: Anton Schäfer, Thomas Hofmann, Imanol Schlag, Tiago Pimentel

    Abstract: Tokenisation is a core part of language models (LMs). It involves splitting a character sequence into subwords which are assigned arbitrary indices before being served to the LM. While typically lossless, however, this process may lead to less sample efficient LM training: as it removes character-level information, it could make it harder for LMs to generalise across similar subwords, such as now…

    Submitted 2 May, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

    ACM Class: I.2.7

  8. arXiv:2404.02499  [pdf, other]

    cs.AI cs.LG

    Learning Generalized Policies for Fully Observable Non-Deterministic Planning Domains

    Authors: Till Hofmann, Hector Geffner

    Abstract: General policies represent reactive strategies for solving large families of planning problems like the infinite collection of solvable instances from a given domain. Methods for learning such policies from a collection of small training instances have been developed successfully for classical domains. In this work, we extend the formulations and the resulting combinatorial methods for learning ge…

    Submitted 13 May, 2024; v1 submitted 3 April, 2024; originally announced April 2024.

    Comments: presented at IJCAI'24

  9. arXiv:2403.07379  [pdf, other]

    cs.LG cs.CL stat.ML

    Hallmarks of Optimization Trajectories in Neural Networks: Directional Exploration and Redundancy

    Authors: Sidak Pal Singh, Bobby He, Thomas Hofmann, Bernhard Schölkopf

    Abstract: We propose a fresh take on understanding the mechanisms of neural networks by analyzing the rich directional structure of optimization trajectories, represented by their pointwise parameters. Towards this end, we introduce some natural notions of the complexity of optimization trajectories, both qualitative and quantitative, which hallmark the directional nature of optimization in neural networks:…

    Submitted 24 June, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

    Comments: Preprint, 57 pages

  10. arXiv:2402.17457  [pdf, other]

    cs.LG

    Why do Learning Rates Transfer? Reconciling Optimization and Scaling Limits for Deep Learning

    Authors: Lorenzo Noci, Alexandru Meterez, Thomas Hofmann, Antonio Orvieto

    Abstract: Recently, there has been growing evidence that if the width and depth of a neural network are scaled toward the so-called rich feature learning limit ($μ$P and its depth extension), then some hyperparameters - such as the learning rate - exhibit transfer from small to very large models, thus reducing the cost of hyperparameter tuning. From an optimization perspective, this phenomenon is puzzling,…

    Submitted 27 February, 2024; originally announced February 2024.

  11. arXiv:2402.14433  [pdf, other]

    cs.CL cs.AI

    A Language Model's Guide Through Latent Space

    Authors: Dimitri von Rütte, Sotiris Anagnostidis, Gregor Bachmann, Thomas Hofmann

    Abstract: Concept guidance has emerged as a cheap and simple way to control the behavior of language models by probing their hidden representations for concept vectors and using them to perturb activations at inference time. While the focus of previous work has largely been on truthfulness, in this paper we extend this framework to a richer set of concepts such as appropriateness, humor, creativity and qual…

    Submitted 22 February, 2024; originally announced February 2024.

    ACM Class: I.2

  12. arXiv:2402.07839  [pdf, other]

    cs.CV cs.LG

    Towards Meta-Pruning via Optimal Transport

    Authors: Alexander Theus, Olin Geimer, Friedrich Wicke, Thomas Hofmann, Sotiris Anagnostidis, Sidak Pal Singh

    Abstract: Structural pruning of neural networks conventionally relies on identifying and discarding less important neurons, a practice often resulting in significant accuracy loss that necessitates subsequent fine-tuning efforts. This paper introduces a novel approach named Intra-Fusion, challenging this prevailing pruning paradigm. Unlike existing methods that focus on designing meaningful neuron importanc…

    Submitted 13 February, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

    Comments: Accepted as a Spotlight (top 5% of submissions) at the International Conference on Learning Representations (ICLR) 2024

  13. arXiv:2402.03187  [pdf, other]

    cs.LG

    How Good is a Single Basin?

    Authors: Kai Lion, Lorenzo Noci, Thomas Hofmann, Gregor Bachmann

    Abstract: The multi-modal nature of neural loss landscapes is often considered to be the main driver behind the empirical success of deep ensembles. In this work, we probe this belief by constructing various "connected" ensembles which are restricted to lie in the same basin. Through our experiments, we demonstrate that increased connectivity indeed negatively impacts performance. However, when incorporatin…

    Submitted 5 February, 2024; originally announced February 2024.

  14. arXiv:2402.03164  [pdf, ps, other]

    cs.AI cs.LO

    Decidable Reasoning About Time in Finite-Domain Situation Calculus Theories

    Authors: Till Hofmann, Stefan Schupp, Gerhard Lakemeyer

    Abstract: Representing time is crucial for cyber-physical systems and has been studied extensively in the Situation Calculus. The most commonly used approach represents time by adding a real-valued fluent $\mathit{time}(a)$ that attaches a time point to each action and consequently to each situation. We show that in this approach, checking whether there is a reachable situation that satisfies a given formul…

    Submitted 5 February, 2024; originally announced February 2024.

  15. arXiv:2401.16024  [pdf, other]

    cs.LG cs.AI

    Probabilistic Abduction for Visual Abstract Reasoning via Learning Rules in Vector-symbolic Architectures

    Authors: Michael Hersche, Francesco di Stefano, Thomas Hofmann, Abu Sebastian, Abbas Rahimi

    Abstract: Abstract reasoning is a cornerstone of human intelligence, and replicating it with artificial intelligence (AI) presents an ongoing challenge. This study focuses on efficiently solving Raven's progressive matrices (RPM), a visual test for assessing abstract reasoning abilities, by using distributed computation and operators provided by vector-symbolic architectures (VSA). Instead of hard-coding th…

    Submitted 29 January, 2024; originally announced January 2024.

    Comments: Accepted in NeurIPS 2023 Workshop on MATH-AI

  16. Towards Bridging the Gap between High-Level Reasoning and Execution on Robots

    Authors: Till Hofmann

    Abstract: When reasoning about actions, e.g., by means of task planning or agent programming with Golog, the robot's actions are typically modeled on an abstract level, where complex actions such as picking up an object are treated as atomic primitives with deterministic effects and preconditions that only depend on the current state. However, when executing such an action on a robot it can no longer be see…

    Submitted 30 December, 2023; originally announced January 2024.

    Comments: PhD Thesis

  17. arXiv:2312.09832  [pdf, other]

    cs.LG

    Disentangling Linear Mode-Connectivity

    Authors: Gul Sena Altintas, Gregor Bachmann, Lorenzo Noci, Thomas Hofmann

    Abstract: Linear mode-connectivity (LMC) (or lack thereof) is one of the intriguing characteristics of neural network loss landscapes. While empirically well established, it unfortunately still lacks a proper theoretical understanding. Even worse, although empirical data points abound, a systematic study of when networks exhibit LMC is largely missing in the literature. In this work we aim to close this…

    Submitted 15 December, 2023; originally announced December 2023.

    Comments: 9 pages, 5 figures

  18. arXiv:2312.09256  [pdf, other]

    cs.CV

    LIME: Localized Image Editing via Attention Regularization in Diffusion Models

    Authors: Enis Simsar, Alessio Tonioni, Yongqin Xian, Thomas Hofmann, Federico Tombari

    Abstract: Diffusion models (DMs) have gained prominence due to their ability to generate high-quality, varied images, with recent advancements in text-to-image generation. The research focus is now shifting towards the controllability of DMs. A significant challenge within this domain is localized editing, where specific areas of an image are modified without affecting the rest of the content. This paper in…

    Submitted 14 December, 2023; originally announced December 2023.

  19. arXiv:2312.01538  [pdf, other]

    cs.LG cs.NE

    Recurrent Distance Filtering for Graph Representation Learning

    Authors: Yuhui Ding, Antonio Orvieto, Bobby He, Thomas Hofmann

    Abstract: Graph neural networks based on iterative one-hop message passing have been shown to struggle in harnessing the information from distant nodes effectively. Conversely, graph transformers allow each node to attend to all other nodes directly, but lack graph inductive bias and have to rely on ad-hoc positional encoding. In this paper, we propose a new architecture to reconcile these challenges. Our a…

    Submitted 5 June, 2024; v1 submitted 3 December, 2023; originally announced December 2023.

    Comments: ICML 2024

  20. arXiv:2311.06224  [pdf, other]

    cs.CV cs.AI cs.LG

    Harnessing Synthetic Datasets: The Role of Shape Bias in Deep Neural Network Generalization

    Authors: Elior Benarous, Sotiris Anagnostidis, Luca Biggio, Thomas Hofmann

    Abstract: Recent advancements in deep learning have been primarily driven by the use of large models trained on increasingly vast datasets. While neural scaling laws have emerged to predict network performance given a specific level of computational resources, the growing demand for expansive datasets raises concerns. To address this, a new research direction has emerged, focusing on the creation of synthet…

    Submitted 10 November, 2023; originally announced November 2023.

  21. arXiv:2311.03233  [pdf, other]

    cs.LG cs.CV

    Navigating Scaling Laws: Compute Optimality in Adaptive Model Training

    Authors: Sotiris Anagnostidis, Gregor Bachmann, Imanol Schlag, Thomas Hofmann

    Abstract: In recent years, the state-of-the-art in deep learning has been dominated by very large models that have been pre-trained on vast amounts of data. The paradigm is very simple: investing more computational resources (optimally) leads to better performance, and even predictably so; neural scaling laws have been derived that accurately forecast the performance of a network for a desired level of comp…

    Submitted 23 May, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

  22. arXiv:2311.01906  [pdf, other]

    cs.LG

    Simplifying Transformer Blocks

    Authors: Bobby He, Thomas Hofmann

    Abstract: A simple design recipe for deep Transformers is to compose identical building blocks. But standard transformer blocks are far from simple, interweaving attention and MLP sub-blocks with skip connections & normalisation layers in precise arrangements. This complexity leads to brittle architectures, where seemingly minor changes can significantly reduce training speed, or render models untrainable.…

    Submitted 31 May, 2024; v1 submitted 3 November, 2023; originally announced November 2023.

    Comments: ICLR 2024

  23. arXiv:2310.05719  [pdf, other]

    cs.LG stat.ML

    Transformer Fusion with Optimal Transport

    Authors: Moritz Imfeld, Jacopo Graldi, Marco Giordano, Thomas Hofmann, Sotiris Anagnostidis, Sidak Pal Singh

    Abstract: Fusion is a technique for merging multiple independently-trained neural networks in order to combine their capabilities. Past attempts have been restricted to the case of fully-connected, convolutional, and residual networks. This paper presents a systematic approach for fusing two or more transformer-based networks exploiting Optimal Transport to (soft-)align the various architectural components.…

    Submitted 22 April, 2024; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: Appears at International Conference on Learning Representations (ICLR), 2024. M. Imfeld, J. Graldi, and M. Giordano are the first authors and contributed equally to this work

  24. arXiv:2310.01165  [pdf, other]

    cs.LG cs.AI

    Towards guarantees for parameter isolation in continual learning

    Authors: Giulia Lanzillotta, Sidak Pal Singh, Benjamin F. Grewe, Thomas Hofmann

    Abstract: Deep learning has proved to be a successful paradigm for solving many challenges in machine learning. However, deep neural networks fail when trained sequentially on multiple tasks, a shortcoming known as catastrophic forgetting in the continual learning literature. Despite a recent flourish of learning algorithms successfully addressing this problem, we find that provable guarantees against catas…

    Submitted 2 October, 2023; originally announced October 2023.

    Comments: 10 pages, 3 figures

  25. arXiv:2309.11197  [pdf, other]

    cs.LG cs.CL

    The Languini Kitchen: Enabling Language Modelling Research at Different Scales of Compute

    Authors: Aleksandar Stanić, Dylan Ashley, Oleg Serikov, Louis Kirsch, Francesco Faccio, Jürgen Schmidhuber, Thomas Hofmann, Imanol Schlag

    Abstract: The Languini Kitchen serves as both a research collective and codebase designed to empower researchers with limited computational resources to contribute meaningfully to the field of language modelling. We introduce an experimental protocol that enables model comparisons based on equivalent compute, measured in accelerator hours. The number of tokens on which a model is trained is defined by the m…

    Submitted 20 September, 2023; originally announced September 2023.

  26. arXiv:2306.17759  [pdf, other]

    stat.ML cs.LG

    The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit

    Authors: Lorenzo Noci, Chuning Li, Mufan Bill Li, Bobby He, Thomas Hofmann, Chris Maddison, Daniel M. Roy

    Abstract: In deep learning theory, the covariance matrix of the representations serves as a proxy to examine the network's trainability. Motivated by the success of Transformers, we study the covariance matrix of a modified Softmax-based attention model with skip connections in the proportional limit of infinite-depth-and-width. We show that at initialization the limiting distribution can be described by a…

    Submitted 9 December, 2023; v1 submitted 30 June, 2023; originally announced June 2023.

  27. arXiv:2306.13575  [pdf, other]

    cs.LG

    Scaling MLPs: A Tale of Inductive Bias

    Authors: Gregor Bachmann, Sotiris Anagnostidis, Thomas Hofmann

    Abstract: In this work we revisit the most fundamental building block in deep learning, the multi-layer perceptron (MLP), and study the limits of its performance on vision tasks. Empirical insights into MLPs are important for multiple reasons. (1) Given the recent narrative "less inductive bias is better", popularized due to transformers eclipsing convolutional models, it is natural to explore the limits of…

    Submitted 3 October, 2023; v1 submitted 23 June, 2023; originally announced June 2023.

  28. arXiv:2306.02329  [pdf, other]

    cs.CV

    Multi-CLIP: Contrastive Vision-Language Pre-training for Question Answering tasks in 3D Scenes

    Authors: Alexandros Delitzas, Maria Parelli, Nikolas Hars, Georgios Vlassis, Sotirios Anagnostidis, Gregor Bachmann, Thomas Hofmann

    Abstract: Training models to apply common-sense linguistic knowledge and visual concepts from 2D images to 3D scene understanding is a promising direction that researchers have only recently started to explore. However, it still remains understudied whether 2D distilled knowledge can provide useful representations for downstream 3D vision-language tasks such as 3D question answering. In this paper, we propo…

    Submitted 4 June, 2023; originally announced June 2023.

    Comments: The first two authors contributed equally. arXiv admin note: text overlap with arXiv:2304.06061

  29. arXiv:2305.15805  [pdf, other]

    cs.CL cs.LG

    Dynamic Context Pruning for Efficient and Interpretable Autoregressive Transformers

    Authors: Sotiris Anagnostidis, Dario Pavllo, Luca Biggio, Lorenzo Noci, Aurelien Lucchi, Thomas Hofmann

    Abstract: Autoregressive Transformers adopted in Large Language Models (LLMs) are hard to scale to long sequences. Despite several works trying to reduce their computational cost, most LLMs still adopt attention layers between all pairs of tokens in the sequence, thus incurring a quadratic cost. In this study, we present a novel approach that dynamically prunes contextual information while preserving the…

    Submitted 31 May, 2024; v1 submitted 25 May, 2023; originally announced May 2023.

  30. arXiv:2305.09088  [pdf, other]

    cs.LG stat.ML

    The Hessian perspective into the Nature of Convolutional Neural Networks

    Authors: Sidak Pal Singh, Thomas Hofmann, Bernhard Schölkopf

    Abstract: While Convolutional Neural Networks (CNNs) have long been investigated and applied, as well as theorized, we aim to provide a slightly different perspective into their nature -- through the perspective of their Hessian maps. The reason is that the loss Hessian captures the pairwise interaction of parameters and therefore forms a natural ground to probe how the architectural aspects of CNN get mani…

    Submitted 15 May, 2023; originally announced May 2023.

    Comments: ICML 2023 conference proceedings

  31. arXiv:2304.06061  [pdf, other]

    cs.CV

    CLIP-Guided Vision-Language Pre-training for Question Answering in 3D Scenes

    Authors: Maria Parelli, Alexandros Delitzas, Nikolas Hars, Georgios Vlassis, Sotirios Anagnostidis, Gregor Bachmann, Thomas Hofmann

    Abstract: Training models to apply linguistic knowledge and visual concepts from 2D images to 3D world understanding is a promising direction that researchers have only recently started to explore. In this work, we design a novel 3D pre-training Vision-Language method that helps a model learn semantically meaningful and transferable 3D scene point cloud representations. We inject the representational power…

    Submitted 12 April, 2023; originally announced April 2023.

    Comments: CVPRW 2023. Code will be made publicly available: https://github.com/AlexDelitzas/3D-VQA

  32. arXiv:2303.09483  [pdf, other]

    cs.LG cs.CV

    Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning

    Authors: Sanghwan Kim, Lorenzo Noci, Antonio Orvieto, Thomas Hofmann

    Abstract: In contrast to the natural capabilities of humans to learn new tasks in a sequential fashion, neural networks are known to suffer from catastrophic forgetting, where the model's performances on old tasks drop dramatically after being optimized for a new task. Since then, the continual learning (CL) community has proposed several solutions aiming to equip the neural network with the ability to lear…

    Submitted 31 March, 2023; v1 submitted 16 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  33. arXiv:2302.12091  [pdf, other]

    cs.LG

    Random Teachers are Good Teachers

    Authors: Felix Sarnthein, Gregor Bachmann, Sotiris Anagnostidis, Thomas Hofmann

    Abstract: In this work, we investigate the implicit regularization induced by teacher-student learning dynamics in self-distillation. To isolate its effect, we describe a simple experiment where we consider teachers at random initialization instead of trained teachers. Surprisingly, when distilling a student into such a random teacher, we observe that the resulting model and its representations already poss…

    Submitted 19 June, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

    Journal ref: Proceedings of the 40th International Conference on Machine Learning, PMLR, volume 202 (2023), pages 30022-30041

  34. arXiv:2211.12346  [pdf, other]

    astro-ph.CO cs.LG

    Cosmology from Galaxy Redshift Surveys with PointNet

    Authors: Sotiris Anagnostidis, Arne Thomsen, Tomasz Kacprzak, Tilman Tröster, Luca Biggio, Alexandre Refregier, Thomas Hofmann

    Abstract: In recent years, deep learning approaches have achieved state-of-the-art results in the analysis of point cloud data. In cosmology, galaxy redshift surveys resemble such a permutation invariant collection of positions in space. These surveys have so far mostly been analysed with two-point statistics, such as power spectra and correlation functions. The usage of these summary statistics is best jus…

    Submitted 22 November, 2022; originally announced November 2022.

  35. arXiv:2210.14019  [pdf, other]

    cs.LG

    The Curious Case of Benign Memorization

    Authors: Sotiris Anagnostidis, Gregor Bachmann, Lorenzo Noci, Thomas Hofmann

    Abstract: Despite the empirical advances of deep learning across a variety of learning tasks, our theoretical understanding of its success is still very restricted. One of the key challenges is the overparametrized nature of modern models, enabling complete overfitting of the data even if the labels are randomized, i.e. networks can completely \textit{memorize} all given patterns. While such a memorization…

    Submitted 23 February, 2023; v1 submitted 25 October, 2022; originally announced October 2022.

  36. arXiv:2210.12084  [pdf, other]

    cs.CL cs.AI cs.LG

    Decoding a Neural Retriever's Latent Space for Query Suggestion

    Authors: Leonard Adolphs, Michelle Chen Huebscher, Christian Buck, Sertan Girgin, Olivier Bachem, Massimiliano Ciaramita, Thomas Hofmann

    Abstract: Neural retrieval models have superseded classic bag-of-words methods such as BM25 as the retrieval framework of choice. However, neural systems lack the interpretability of bag-of-words models; it is not trivial to connect a query change to a change in the latent space that ultimately determines the retrieval results. To shed light on this embedding space, we learn a "query decoder" that, given a…

    Submitted 21 October, 2022; originally announced October 2022.

  37. arXiv:2210.00828  [pdf, other]

    cs.CV

    Mastering Spatial Graph Prediction of Road Networks

    Authors: Sotiris Anagnostidis, Aurelien Lucchi, Thomas Hofmann

    Abstract: Accurately predicting road networks from satellite images requires a global understanding of the network topology. We propose to capture such high-level information by introducing a graph-based framework that simulates the addition of sequences of graph edges using a reinforcement learning (RL) approach. In particular, given a partially generated graph associated with a satellite image, an RL agen…

    Submitted 3 October, 2022; originally announced October 2022.

  38. arXiv:2207.12763  [pdf, ps, other]

    cs.AI

    Using Abstraction for Interpretable Robot Programs in Stochastic Domains

    Authors: Till Hofmann, Vaishak Belle

    Abstract: A robot's actions are inherently stochastic, as its sensors are noisy and its actions do not always have the intended effects. For this reason, the agent language Golog has been extended to models with degrees of belief and stochastic actions. While this allows more precise robot models, the resulting programs are much harder to comprehend, because they need to deal with the noise, e.g., by loopin…

    Submitted 26 July, 2022; originally announced July 2022.

    Comments: Presented at the KR'22 Workshop on Explainable Logic-Based Knowledge Representation (XLoKR). arXiv admin note: substantial text overlap with arXiv:2204.03536

  39. arXiv:2207.12319  [pdf, other]

    cs.CV cs.AI

    OpenFilter: A Framework to Democratize Research Access to Social Media AR Filters

    Authors: Piera Riccio, Bill Psomas, Francesco Galati, Francisco Escolano, Thomas Hofmann, Nuria Oliver

    Abstract: Augmented Reality or AR filters on selfies have become very popular on social media platforms for a variety of applications, including marketing, entertainment and aesthetics. Given the wide adoption of AR face filters and the importance of faces in our social structures and relations, there is increased interest by the scientific community to analyze the impact of such filters from a psychologica…

    Submitted 27 September, 2022; v1 submitted 19 July, 2022; originally announced July 2022.

    ACM Class: I.4.9

  40. arXiv:2206.09864  [pdf, other]

    cs.AI cs.RO

    Towards Using Promises for Multi-Agent Cooperation in Goal Reasoning

    Authors: Daniel Swoboda, Till Hofmann, Tarik Viehmann, Gerhard Lakemeyer

    Abstract: Reasoning and planning for mobile robots is a challenging problem, as the world evolves over time and thus the robot's goals may change. One technique to tackle this problem is goal reasoning, where the agent not only reasons about its actions, but also about which goals to pursue. While goal reasoning for single agents has been researched extensively, distributed, multi-agent goal reasoning comes…

    Submitted 20 June, 2022; originally announced June 2022.

    Comments: Presented at the ICAPS'22 Workshop on Planning and Robotics (PlanRob)

  41. arXiv:2205.13900  [pdf, other]

    cs.LG stat.ML

    How Tempering Fixes Data Augmentation in Bayesian Neural Networks

    Authors: Gregor Bachmann, Lorenzo Noci, Thomas Hofmann

    Abstract: While Bayesian neural networks (BNNs) provide a sound and principled alternative to standard neural networks, an artificial sharpening of the posterior usually needs to be applied to reach comparable performance. This is in stark contrast to theory, dictating that given an adequate prior and a well-specified model, the untempered Bayesian posterior should achieve optimal performance. Despite the c…

    Submitted 27 May, 2022; originally announced May 2022.

    Report number: Proceedings of the 39th International Conference on Machine Learning, PMLR 162:1244-1260

  42. arXiv:2204.03596  [pdf, ps, other]

    cs.AI

    Controlling Golog Programs against MTL Constraints

    Authors: Till Hofmann, Stefan Schupp

    Abstract: While Golog is an expressive programming language to control the high-level behavior of a robot, it is often tedious to use on a real robotic system. On an actual robot, the user needs to consider low-level details, such as enabling and disabling hardware components, e.g., a camera to detect objects for grasping. In other words, high-level actions usually pose implicit temporal constraints on the…

    Submitted 7 April, 2022; originally announced April 2022.

  43. arXiv:2204.03536  [pdf, other]

    cs.AI

    Abstracting Noisy Robot Programs

    Authors: Till Hofmann, Vaishak Belle

    Abstract: Abstraction is a commonly used process to represent some low-level system by a more coarse specification with the goal to omit unnecessary details while preserving important aspects. While recent work on abstraction in the situation calculus has focused on non-probabilistic domains, we describe an approach to abstraction of probabilistic and dynamic systems. Based on a variant of the situation cal…

    Submitted 1 March, 2023; v1 submitted 7 April, 2022; originally announced April 2022.

    Comments: To be presented at AAMAS'23

  44. arXiv:2203.07337  [pdf, other]

    stat.ML cs.LG

    Phenomenology of Double Descent in Finite-Width Neural Networks

    Authors: Sidak Pal Singh, Aurelien Lucchi, Thomas Hofmann, Bernhard Schölkopf

    Abstract: 'Double descent' delineates the generalization behaviour of models depending on the regime they belong to: under- or over-parameterized. The current theoretical understanding behind the occurrence of this phenomenon is primarily based on linear and kernel regression models -- with informal parallels to neural networks via the Neural Tangent Kernel. Therefore such analyses do not adequately capture…

    Submitted 14 March, 2022; originally announced March 2022.

    Comments: Published at ICLR 2022

  45. arXiv:2203.03443  [pdf, other]

    cs.LG

    Generalization Through The Lens Of Leave-One-Out Error

    Authors: Gregor Bachmann, Thomas Hofmann, Aurélien Lucchi

    Abstract: Despite the tremendous empirical success of deep learning models to solve various learning tasks, our theoretical understanding of their generalization ability is very limited. Classical generalization bounds based on tools such as the VC dimension or Rademacher complexity, are so far unsuitable for deep models and it is doubtful that these techniques can yield tight bounds even in the most ideali…

    Submitted 7 March, 2022; originally announced March 2022.

  46. arXiv:2201.10936  [pdf, other]

    cs.SD cs.LG eess.AS stat.ML

    FIGARO: Generating Symbolic Music with Fine-Grained Artistic Control

    Authors: Dimitri von Rütte, Luca Biggio, Yannic Kilcher, Thomas Hofmann

    Abstract: Generating music with deep neural networks has been an area of active research in recent years. While the quality of generated samples has been steadily increasing, most methods are only able to exert minimal control over the generated sequence, if any. We propose the self-supervised description-to-sequence task, which allows for fine-grained controllable generation on a global level. We do so by…

    Submitted 22 February, 2024; v1 submitted 26 January, 2022; originally announced January 2022.

    Comments: Published in ICLR 2023

  47. arXiv:2201.00384  [pdf, other]

    cs.LG eess.SP

    On the effectiveness of Randomized Signatures as Reservoir for Learning Rough Dynamics

    Authors: Enea Monzio Compagnoni, Anna Scampicchio, Luca Biggio, Antonio Orvieto, Thomas Hofmann, Josef Teichmann

    Abstract: Many finance, physics, and engineering phenomena are modeled by continuous-time dynamical systems driven by highly irregular (stochastic) inputs. A powerful tool to perform time series analysis in this context is rooted in rough path theory and leverages the so-called Signature Transform. This algorithm enjoys strong theoretical guarantees but is hard to scale to high-dimensional data. In this pap…

    Submitted 26 April, 2023; v1 submitted 2 January, 2022; originally announced January 2022.

    Comments: Accepted for IEEE IJCNN 2023

  48. arXiv:2109.00527  [pdf, other]

    cs.CL cs.AI cs.IR cs.LG

    Boosting Search Engines with Interactive Agents

    Authors: Leonard Adolphs, Benjamin Boerschinger, Christian Buck, Michelle Chen Huebscher, Massimiliano Ciaramita, Lasse Espeholt, Thomas Hofmann, Yannic Kilcher, Sascha Rothe, Pier Giuseppe Sessa, Lierni Sestorain Saralegui

    Abstract: This paper presents first successful steps in designing search agents that learn meta-strategies for iterative query refinement in information-seeking tasks. Our approach uses machine reading to guide the selection of refinement terms from aggregated search results. Agents are then empowered with simple but effective search operators to exert fine-grained and transparent control over queries and s…

    Submitted 7 June, 2022; v1 submitted 1 September, 2021; originally announced September 2021.

    Comments: Published in Transactions on Machine Learning Research (06/2022)

  49. arXiv:2108.01928  [pdf, other]

    cs.CL cs.IR cs.LG

    How to Query Language Models?

    Authors: Leonard Adolphs, Shehzaad Dhuliawala, Thomas Hofmann

    Abstract: Large pre-trained language models (LMs) are capable of not only recovering linguistic but also factual and commonsense knowledge. To access the knowledge stored in mask-based LMs, we can use cloze-style questions and let the model fill in the blank. The flexibility advantage over structured knowledge bases comes with the drawback of finding the right query for a certain information need. Inspired…

    Submitted 4 August, 2021; originally announced August 2021.

  50. arXiv:2106.16225  [pdf, other]

    cs.LG cs.NE math.ST stat.ML

    Analytic Insights into Structure and Rank of Neural Network Hessian Maps

    Authors: Sidak Pal Singh, Gregor Bachmann, Thomas Hofmann

    Abstract: The Hessian of a neural network captures parameter interactions through second-order derivatives of the loss. It is a fundamental object of study, closely tied to various problems in deep learning, including model design, optimization, and generalization. Most prior work has been empirical, typically focusing on low-rank approximations and heuristics that are blind to the network structure. In con…

    Submitted 1 July, 2021; v1 submitted 30 June, 2021; originally announced June 2021.