Showing 1–50 of 127 results for author: Tuytelaars, T

  1. arXiv:2406.16085 [pdf, other]

    cs.CV

    A Simple Framework for Open-Vocabulary Zero-Shot Segmentation

    Authors: Thomas Stegmüller, Tim Lebailly, Nikola Dukic, Behzad Bozorgtabar, Tinne Tuytelaars, Jean-Philippe Thiran

    Abstract: Zero-shot classification capabilities naturally arise in models trained within a vision-language contrastive framework. Despite their classification prowess, these models struggle in dense tasks like zero-shot open-vocabulary segmentation. This deficiency is often attributed to the absence of localization cues in captions and the intertwined nature of the learning process, which encompasses both i…

    Submitted 1 July, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

  2. arXiv:2406.09935 [pdf, other]

    cs.LG

    Forgetting Order of Continual Learning: Examples That are Learned First are Forgotten Last

    Authors: Guy Hacohen, Tinne Tuytelaars

    Abstract: Catastrophic forgetting poses a significant challenge in continual learning, where models often forget previous tasks when trained on new data. Our empirical analysis reveals a strong correlation between catastrophic forgetting and the learning speed of examples: examples learned early are rarely forgotten, while those learned later are more susceptible to forgetting. We demonstrate that replay-ba…

    Submitted 14 June, 2024; originally announced June 2024.

  3. arXiv:2404.18020 [pdf, other]

    cs.CV

    DM-Align: Leveraging the Power of Natural Language Instructions to Make Changes to Images

    Authors: Maria Mihaela Trusca, Tinne Tuytelaars, Marie-Francine Moens

    Abstract: Text-based semantic image editing assumes the manipulation of an image using a natural language instruction. Although recent works are capable of generating creative and qualitative images, the problem is still mostly approached as a black box sensitive to generating unexpected outputs. Therefore, we propose a novel model to enhance the text-based control of an image editor by explicitly reasoning…

    Submitted 27 April, 2024; originally announced April 2024.

  4. arXiv:2404.13766 [pdf, other]

    cs.CV

    Object-Attribute Binding in Text-to-Image Generation: Evaluation and Control

    Authors: Maria Mihaela Trusca, Wolf Nuyts, Jonathan Thomm, Robert Honig, Thomas Hofmann, Tinne Tuytelaars, Marie-Francine Moens

    Abstract: Current diffusion models create photorealistic images given a text prompt as input but struggle to correctly bind attributes mentioned in the text to the right objects in the image. This is evidenced by our novel image-graph alignment model called EPViT (Edge Prediction Vision Transformer) for the evaluation of image-text alignment. To alleviate the above problem, we propose focused cross-attentio…

    Submitted 21 April, 2024; originally announced April 2024.

  5. arXiv:2404.12819 [pdf, other]

    cs.CV

    Unveiling the Ambiguity in Neural Inverse Rendering: A Parameter Compensation Analysis

    Authors: Georgios Kouros, Minye Wu, Sushruth Nagesh, Xianling Zhang, Tinne Tuytelaars

    Abstract: Inverse rendering aims to reconstruct the scene properties of objects solely from multiview images. However, it is an ill-posed problem prone to producing ambiguous estimations deviating from physically accurate representations. In this paper, we utilize Neural Microfacet Fields (NMF), a state-of-the-art neural inverse rendering method to illustrate the inherent ambiguity. We propose an evaluation…

    Submitted 19 April, 2024; originally announced April 2024.

  6. arXiv:2403.15102 [pdf, other]

    cs.RO

    Learning from Visual Demonstrations through Differentiable Nonlinear MPC for Personalized Autonomous Driving

    Authors: Flavia Sofia Acerbo, Jan Swevers, Tinne Tuytelaars, Tong Duy Son

    Abstract: Human-like autonomous driving controllers have the potential to enhance passenger perception of autonomous vehicles. This paper proposes DriViDOC: a model for Driving from Vision through Differentiable Optimal Control, and its application to learn personalized autonomous driving controllers from human demonstrations. DriViDOC combines the automatic inference of relevant features from camera frames…

    Submitted 8 July, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

    Comments: This work has been accepted for publication in the Proceedings of the 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2024). Accompanying video available at: https://youtu.be/WxWPuAtJ08E

  7. arXiv:2403.10179 [pdf, other]

    cs.CV

    Animate Your Motion: Turning Still Images into Dynamic Videos

    Authors: Mingxiao Li, Bo Wan, Marie-Francine Moens, Tinne Tuytelaars

    Abstract: In recent years, diffusion models have made remarkable strides in text-to-video generation, sparking a quest for enhanced control over video outputs to more accurately reflect user intentions. Traditional efforts predominantly focus on employing either semantic cues, like images or depth maps, or motion-based conditions, like moving sketches or object bounding boxes. Semantic inputs offer a rich s…

    Submitted 15 July, 2024; v1 submitted 15 March, 2024; originally announced March 2024.

    Comments: Accepted at European Conference on Computer Vision (ECCV 2024)

  8. arXiv:2403.09377 [pdf, other]

    cs.CV

    Introducing Routing Functions to Vision-Language Parameter-Efficient Fine-Tuning with Low-Rank Bottlenecks

    Authors: Tingyu Qu, Tinne Tuytelaars, Marie-Francine Moens

    Abstract: Mainstream parameter-efficient fine-tuning (PEFT) methods, such as LoRA or Adapter, project a model's hidden states to a lower dimension, allowing pre-trained models to adapt to new data through this low-rank bottleneck. However, PEFT tasks involving multiple modalities, like vision-language (VL) tasks, require not only adaptation to new data but also learning the relationship between different mo…

    Submitted 12 July, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

    Comments: Accepted at ECCV 2024
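    The low-rank bottleneck described in the abstract above can be illustrated with a minimal LoRA-style sketch in NumPy. It shows only the generic idea (a frozen weight plus a trainable rank-r update), not the routing functions the paper introduces; the sizes, the rank r = 4, the `scale` argument, and the zero initialization of the up-projection are assumptions chosen for illustration.

```python
import numpy as np

def lora_forward(x, w_frozen, a, b, scale=1.0):
    """Frozen linear map plus a low-rank update: y = x W^T + scale * (x A^T) B^T.

    x: (batch, d_in); w_frozen: (d_out, d_in);
    a: (r, d_in) down-projection; b: (d_out, r) up-projection, with r << d_in.
    """
    hidden = x @ a.T  # project hidden states into the r-dimensional bottleneck
    return x @ w_frozen.T + scale * (hidden @ b.T)

# Illustrative sizes (assumptions, not taken from the paper)
d_in, d_out, r = 64, 64, 4
rng = np.random.default_rng(0)
w_frozen = rng.normal(size=(d_out, d_in))   # pre-trained weight, kept frozen
a = rng.normal(scale=0.01, size=(r, d_in))  # trainable down-projection
b = np.zeros((d_out, r))                    # trainable up-projection, zero-initialized
x = rng.normal(size=(8, d_in))

# With b = 0 the adapted layer initially behaves exactly like the frozen one.
y = lora_forward(x, w_frozen, a, b)

# Only a and b are trained: 2 * 64 * 4 = 512 values versus 64 * 64 = 4096 for W.
trainable = a.size + b.size
```

Whatever values training gives `a` and `b`, the update `b @ a` has rank at most r, which is what makes the bottleneck cheap to store and train.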

  9. arXiv:2402.14957 [pdf, other]

    cs.CV cs.LG

    The Common Stability Mechanism behind most Self-Supervised Learning Approaches

    Authors: Abhishek Jha, Matthew B. Blaschko, Yuki M. Asano, Tinne Tuytelaars

    Abstract: The last couple of years have witnessed tremendous progress in self-supervised learning (SSL), the success of which can be attributed to the introduction of useful inductive biases in the learning process to learn meaningful visual representations while avoiding collapse. These inductive biases and constraints manifest themselves in the form of different optimization formulations in the SSL techniqu…

    Submitted 22 February, 2024; originally announced February 2024.

    Comments: Additional visualizations (.gif): https://github.com/abskjha/CenterVectorSSL

  10. arXiv:2312.16731 [pdf, other]

    cs.LG cs.CV

    Infinite dSprites for Disentangled Continual Learning: Separating Memory Edits from Generalization

    Authors: Sebastian Dziadzio, Çağatay Yıldız, Gido M. van de Ven, Tomasz Trzciński, Tinne Tuytelaars, Matthias Bethge

    Abstract: The ability of machine learning systems to learn continually is hindered by catastrophic forgetting, the tendency of neural networks to overwrite existing knowledge when learning a new task. Continual learning methods alleviate this problem through regularization, parameter isolation, or rehearsal, but they are typically evaluated on benchmarks comprising only a handful of tasks. In contrast, huma…

    Submitted 29 February, 2024; v1 submitted 27 December, 2023; originally announced December 2023.

    Comments: 10 pages, 10 figures

  11. arXiv:2312.08586 [pdf, other]

    cs.LG cs.CV stat.ML

    Estimating calibration error under label shift without labels

    Authors: Teodora Popordanoska, Gorjan Radevski, Tinne Tuytelaars, Matthew B. Blaschko

    Abstract: In the face of dataset shift, model calibration plays a pivotal role in ensuring the reliability of machine learning systems. Calibration error (CE) is an indicator of the alignment between the predicted probabilities and the classifier accuracy. While prior works have delved into the implications of dataset shift on calibration, existing CE estimators assume access to labels from the target domai…

    Submitted 13 December, 2023; originally announced December 2023.

    Comments: Preprint
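    To make "calibration error" concrete, here is a minimal sketch of the conventional equal-width binned expected calibration error (ECE) for top-label confidences. Note that this standard estimator needs ground-truth labels, which is exactly the assumption the paper above sets out to remove; it is not the paper's label-free estimator, and the default of 10 bins is an assumption.

```python
import numpy as np

def binned_ece(confidences, predictions, labels, n_bins=10):
    """Equal-width binned expected calibration error (top-label).

    ECE = sum over bins of (fraction of samples in bin) * |accuracy - mean confidence|.
    Requires ground-truth labels (the conventional, label-dependent estimator).
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = (np.asarray(predictions) == np.asarray(labels)).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Map each confidence to a bin index in [0, n_bins - 1]; 1.0 lands in the last bin.
    bin_ids = np.clip(np.digitize(confidences, edges[1:-1], right=True), 0, n_bins - 1)
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.sum() / n * gap
    return ece

# Overconfident toy model: 95% confidence but only half of the predictions are right.
example_ece = binned_ece([0.95, 0.95], [1, 1], [1, 0])  # |0.5 - 0.95| = 0.45
```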

  12. arXiv:2312.06713 [pdf, other]

    cs.CV

    TeTriRF: Temporal Tri-Plane Radiance Fields for Efficient Free-Viewpoint Video

    Authors: Minye Wu, Zehao Wang, Georgios Kouros, Tinne Tuytelaars

    Abstract: Neural Radiance Fields (NeRF) revolutionize the realm of visual media by providing photorealistic Free-Viewpoint Video (FVV) experiences, offering viewers unparalleled immersion and interactivity. However, the technology's significant storage requirements and the computational complexity involved in generation and rendering currently limit its broader application. To close this gap, this paper pre…

    Submitted 10 December, 2023; originally announced December 2023.

    Comments: 13 pages, 11 figures

  13. arXiv:2312.05855 [pdf, other]

    cs.CV

    NeVRF: Neural Video-based Radiance Fields for Long-duration Sequences

    Authors: Minye Wu, Tinne Tuytelaars

    Abstract: Adapting Neural Radiance Fields (NeRF) to long-duration dynamic sequences has been challenging. Existing methods struggle to balance quality and storage size and encounter difficulties with complex scene changes such as topological changes and large motions. To tackle these issues, we propose a novel neural video-based radiance fields (NeVRF) representation. NeVRF marries neural radiance f…

    Submitted 10 December, 2023; originally announced December 2023.

    Comments: 11 pages, 12 figures

  14. arXiv:2311.14028 [pdf, other]

    cs.LG cs.AI cs.CV

    Continual Learning of Diffusion Models with Generative Distillation

    Authors: Sergi Masip, Pau Rodriguez, Tinne Tuytelaars, Gido M. van de Ven

    Abstract: Diffusion models are powerful generative models that achieve state-of-the-art performance in image synthesis. However, training them demands substantial amounts of data and computational resources. Continual learning would allow for incrementally learning new tasks and accumulating knowledge, thus enabling the reuse of trained models for further learning. One potentially suitable continual learnin…

    Submitted 20 May, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

    Comments: To appear in the Proceedings of the Third Conference on Lifelong Learning Agents (CoLLAs), 2024

  15. arXiv:2311.11908 [pdf, other]

    cs.LG cs.AI cs.CV

    Continual Learning: Applications and the Road Forward

    Authors: Eli Verwimp, Rahaf Aljundi, Shai Ben-David, Matthias Bethge, Andrea Cossu, Alexander Gepperth, Tyler L. Hayes, Eyke Hüllermeier, Christopher Kanan, Dhireesha Kudithipudi, Christoph H. Lampert, Martin Mundt, Razvan Pascanu, Adrian Popescu, Andreas S. Tolias, Joost van de Weijer, Bing Liu, Vincenzo Lomonaco, Tinne Tuytelaars, Gido M. van de Ven

    Abstract: Continual learning is a subfield of machine learning, which aims to allow machine learning models to continuously learn on new data, by accumulating knowledge without forgetting what was learned in the past. In this work, we take a step back, and ask: "Why should one care about continual learning in the first place?". We set the stage by examining recent continual learning papers published at four…

    Submitted 28 March, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

    Journal ref: Transactions on Machine Learning Research (TMLR), 2024

  16. arXiv:2311.08043 [pdf, other]

    cs.CV

    Contrastive Learning for Multi-Object Tracking with Transformers

    Authors: Pierre-François De Plaen, Nicola Marinello, Marc Proesmans, Tinne Tuytelaars, Luc Van Gool

    Abstract: The DEtection TRansformer (DETR) opened new possibilities for object detection by modeling it as a translation task: converting image features into object-level representations. Previous works typically add expensive modules to DETR to perform Multi-Object Tracking (MOT), resulting in more complicated architectures. We instead show how DETR can be turned into a MOT model by employing an instance-l…

    Submitted 14 November, 2023; originally announced November 2023.

    Comments: WACV 2024

  17. arXiv:2311.04898 [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Two Complementary Perspectives to Continual Learning: Ask Not Only What to Optimize, But Also How

    Authors: Timm Hess, Tinne Tuytelaars, Gido M. van de Ven

    Abstract: Recent years have seen considerable progress in the continual training of deep neural networks, predominantly thanks to approaches that add replay or regularization terms to the loss function to approximate the joint loss over all tasks so far. However, we show that even with a perfect approximation to the joint loss, these approaches still suffer from temporary but substantial forgetting when sta…

    Submitted 21 June, 2024; v1 submitted 8 November, 2023; originally announced November 2023.

    Comments: Full paper version of pre-registered report accepted at the 1st ContinualAI Unconference. The originally submitted pre-registered proposal can be found at arXiv:2311.04898v1

  18. arXiv:2310.19252 [pdf, other]

    cs.CV cs.AI cs.LG

    Revisiting Evaluation Metrics for Semantic Segmentation: Optimization and Evaluation of Fine-grained Intersection over Union

    Authors: Zifu Wang, Maxim Berman, Amal Rannen-Triki, Philip H. S. Torr, Devis Tuia, Tinne Tuytelaars, Luc Van Gool, Jiaqian Yu, Matthew B. Blaschko

    Abstract: Semantic segmentation datasets often exhibit two types of imbalance: class imbalance, where some classes appear more frequently than others, and size imbalance, where some objects occupy more pixels than others. This causes traditional evaluation metrics to be biased towards majority classes (e.g., overall pixel-wise accuracy) and large objects (e.g., mean pixel-wi…

    Submitted 29 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023
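    The size-imbalance effect described above can be seen by comparing two ways of aggregating IoU: accumulating intersections and unions over the whole dataset before averaging over classes, versus computing IoU per image and then averaging over images. The sketch below is an illustrative assumption of what a finer-grained, image-level IoU looks like; the paper's exact metric definitions may differ, and the toy masks are hypothetical.

```python
import numpy as np

def per_class_counts(pred, gt, num_classes):
    """Per-class intersection and union pixel counts for one image."""
    pred, gt = np.asarray(pred).ravel(), np.asarray(gt).ravel()
    inter = np.zeros(num_classes)
    union = np.zeros(num_classes)
    for c in range(num_classes):
        p, g = pred == c, gt == c
        inter[c] = np.logical_and(p, g).sum()
        union[c] = np.logical_or(p, g).sum()
    return inter, union

def dataset_miou(preds, gts, num_classes):
    """Accumulate counts over all images, then average IoU over classes."""
    inter = np.zeros(num_classes)
    union = np.zeros(num_classes)
    for p, g in zip(preds, gts):
        i, u = per_class_counts(p, g, num_classes)
        inter += i
        union += u
    valid = union > 0
    return float((inter[valid] / union[valid]).mean())

def per_image_miou(preds, gts, num_classes):
    """IoU per image (over classes predicted or present in it), averaged over images."""
    scores = []
    for p, g in zip(preds, gts):
        i, u = per_class_counts(p, g, num_classes)
        valid = u > 0
        scores.append((i[valid] / u[valid]).mean())
    return float(np.mean(scores))

# Hypothetical toy data: image A has a large class-1 object segmented perfectly;
# image B has a single-pixel class-1 object that the model misses entirely.
gts = [[[1, 1, 1, 1], [1, 1, 1, 1]], [[0, 0, 0, 0], [0, 0, 0, 1]]]
preds = [[[1, 1, 1, 1], [1, 1, 1, 1]], [[0, 0, 0, 0], [0, 0, 0, 0]]]

d_miou = dataset_miou(preds, gts, num_classes=2)    # (7/8 + 8/9) / 2, about 0.882
i_miou = per_image_miou(preds, gts, num_classes=2)  # (1.0 + 0.4375) / 2 = 0.71875
```

In the dataset-level score the missed small object is swamped by the large one, while the per-image score penalizes it noticeably.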

  19. arXiv:2310.07855 [pdf, other]

    cs.CV cs.LG

    CrIBo: Self-Supervised Learning via Cross-Image Object-Level Bootstrapping

    Authors: Tim Lebailly, Thomas Stegmüller, Behzad Bozorgtabar, Jean-Philippe Thiran, Tinne Tuytelaars

    Abstract: Leveraging nearest neighbor retrieval for self-supervised representation learning has proven beneficial with object-centric images. However, this approach faces limitations when applied to scene-centric datasets, where multiple objects within an image are only implicitly captured in the global representation. Such global bootstrapping can lead to undesirable entanglement of object representations.…

    Submitted 3 March, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

    Comments: ICLR 2024 (spotlight)

  20. arXiv:2309.05069 [pdf, other]

    cs.CV

    Exploiting CLIP for Zero-shot HOI Detection Requires Knowledge Distillation at Multiple Levels

    Authors: Bo Wan, Tinne Tuytelaars

    Abstract: In this paper, we investigate the task of zero-shot human-object interaction (HOI) detection, a novel paradigm for identifying HOIs without the need for task-specific annotations. To address this challenging task, we employ CLIP, a large-scale pre-trained vision-language model (VLM), for knowledge distillation on multiple levels. Specifically, we design a multi-branch neural network that leverages…

    Submitted 10 September, 2023; originally announced September 2023.

  21. arXiv:2308.08530 [pdf, other]

    cs.CV cs.GR

    Ref-DVGO: Reflection-Aware Direct Voxel Grid Optimization for an Improved Quality-Efficiency Trade-Off in Reflective Scene Reconstruction

    Authors: Georgios Kouros, Minye Wu, Shubham Shrivastava, Sushruth Nagesh, Punarjay Chakravarty, Tinne Tuytelaars

    Abstract: Neural Radiance Fields (NeRFs) have revolutionized the field of novel view synthesis, demonstrating remarkable performance. However, the modeling and rendering of reflective objects remain challenging problems. Recent methods have shown significant improvements over the baselines in handling reflective scenes, albeit at the expense of efficiency. In this work, we aim to strike a balance between ef…

    Submitted 21 August, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

    Comments: 5 pages, 4 figures, 3 tables, ICCV TRICKY 2023 Workshop

  22. arXiv:2308.08325 [pdf, other]

    cs.CV

    Visually-Aware Context Modeling for News Image Captioning

    Authors: Tingyu Qu, Tinne Tuytelaars, Marie-Francine Moens

    Abstract: News Image Captioning aims to create captions from news articles and images, emphasizing the connection between textual context and visual elements. Recognizing the significance of human faces in news images and the face-name co-occurrence pattern in existing datasets, we propose a face-naming module for learning better name embeddings. Apart from names, which can be directly linked to an image ar…

    Submitted 21 March, 2024; v1 submitted 16 August, 2023; originally announced August 2023.

    Comments: Accepted at NAACL 2024 Main Conference

  23. arXiv:2307.07483 [pdf, other]

    cs.CV

    Multimodal Distillation for Egocentric Action Recognition

    Authors: Gorjan Radevski, Dusan Grujicic, Marie-Francine Moens, Matthew Blaschko, Tinne Tuytelaars

    Abstract: The focal point of egocentric video understanding is modelling hand-object interactions. Standard models, e.g., CNNs or Vision Transformers, which receive RGB frames as input, perform well. However, their performance improves further by employing additional input modalities that provide complementary cues, such as object detections, optical flow, audio, etc. The added complexity of the modality-spec…

    Submitted 18 July, 2023; v1 submitted 14 July, 2023; originally announced July 2023.

    Comments: Accepted at ICCV 2023; Codebase released at https://github.com/gorjanradevski/multimodal-distillation

  24. arXiv:2307.02402 [pdf, other]

    cs.CV cs.LG

    Unbalanced Optimal Transport: A Unified Framework for Object Detection

    Authors: Henri De Plaen, Pierre-François De Plaen, Johan A. K. Suykens, Marc Proesmans, Tinne Tuytelaars, Luc Van Gool

    Abstract: During training, supervised object detection tries to correctly match the predicted bounding boxes and associated classification scores to the ground truth. This is essential to determine which predictions are to be pushed towards which solutions, or to be discarded. Popular matching strategies include matching to the closest ground truth box (mostly used in combination with anchors), or matching…

    Submitted 5 July, 2023; originally announced July 2023.

    Comments: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2023)

  25. arXiv:2307.01545 [pdf, other]

    cs.CV

    EffSeg: Efficient Fine-Grained Instance Segmentation using Structure-Preserving Sparsity

    Authors: Cédric Picron, Tinne Tuytelaars

    Abstract: Many two-stage instance segmentation heads predict a coarse 28x28 mask per instance, which is insufficient to capture the fine-grained details of many objects. To address this issue, PointRend and RefineMask predict a 112x112 segmentation mask resulting in higher quality segmentations. Both methods however have limitations by either not having access to neighboring features (PointRend) or by perfo…

    Submitted 4 July, 2023; originally announced July 2023.

  26. arXiv:2306.02947 [pdf, other]

    cs.LG cs.CV

    Continual Learning with Pretrained Backbones by Tuning in the Input Space

    Authors: Simone Marullo, Matteo Tiezzi, Marco Gori, Stefano Melacci, Tinne Tuytelaars

    Abstract: The intrinsic difficulty in adapting deep learning models to non-stationary environments limits the applicability of neural networks to real-world tasks. This issue is critical in practical supervised learning settings, such as the ones in which a pre-trained model computes projections toward a latent space where different task predictors are sequentially learned over time. As a matter of fact, in…

    Submitted 8 June, 2023; v1 submitted 5 June, 2023; originally announced June 2023.

  27. arXiv:2306.02161 [pdf, other]

    cs.LG

    Few-Shot Open-Set Learning for On-Device Customization of KeyWord Spotting Systems

    Authors: Manuele Rusci, Tinne Tuytelaars

    Abstract: A personalized KeyWord Spotting (KWS) pipeline typically requires the training of a Deep Learning model on a large set of user-defined speech utterances, preventing fast customization directly on-device. To fill this gap, this paper investigates few-shot learning methods for open-set KWS classification by combining a deep feature encoder with a prototype-based classifier. With user-defined…

    Submitted 3 June, 2023; originally announced June 2023.

    Comments: Accepted at INTERSPEECH 2023
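    The prototype-based classifier mentioned above can be sketched in a few lines, assuming the deep feature encoder has already mapped each utterance to an embedding vector (the encoder is not shown). Each class prototype is the mean embedding of its few support examples, a query takes the label of the nearest prototype, and, as an illustrative open-set rule, a query whose nearest prototype is farther than a hypothetical `threshold` is rejected as unknown. The Euclidean distance, the threshold rule, and the `UNKNOWN` label are assumptions, not necessarily the paper's choices.

```python
import numpy as np

UNKNOWN = -1  # hypothetical label returned for rejected (open-set) queries

def build_prototypes(support_emb, support_labels):
    """Average the embeddings of each class's few support examples."""
    support_emb = np.asarray(support_emb, dtype=float)
    support_labels = np.asarray(support_labels)
    classes = np.unique(support_labels)
    protos = np.stack([support_emb[support_labels == c].mean(axis=0) for c in classes])
    return classes, protos

def classify(query_emb, classes, protos, threshold):
    """Nearest-prototype label, or UNKNOWN if even the nearest prototype is too far."""
    query_emb = np.atleast_2d(np.asarray(query_emb, dtype=float))
    # Pairwise Euclidean distances, shape (num_queries, num_classes)
    dists = np.linalg.norm(query_emb[:, None, :] - protos[None, :, :], axis=-1)
    labels = classes[dists.argmin(axis=1)]
    return np.where(dists.min(axis=1) <= threshold, labels, UNKNOWN)

# Toy 2-D "embeddings": two support examples for each of two user-defined keywords.
classes, protos = build_prototypes([[0, 0], [0, 2], [10, 0], [10, 2]], [0, 0, 1, 1])
predicted = classify([[0, 1], [9, 1], [5, 20]], classes, protos, threshold=3.0)
```

Adding a new keyword only requires computing one more mean embedding, with no gradient updates, which is why this design suits fast on-device customization.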

  28. arXiv:2305.18806 [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Prediction Error-based Classification for Class-Incremental Learning

    Authors: Michał Zając, Tinne Tuytelaars, Gido M. van de Ven

    Abstract: Class-incremental learning (CIL) is a particularly challenging variant of continual learning, where the goal is to learn to discriminate between all classes presented in an incremental fashion. Existing approaches often suffer from excessive forgetting and imbalance of the scores assigned to classes that have not been seen together during training. In this study, we introduce a novel approach, Pre…

    Submitted 9 March, 2024; v1 submitted 30 May, 2023; originally announced May 2023.

    Comments: ICLR 2024 camera ready

  29. arXiv:2304.04452 [pdf, other]

    cs.CV

    Neural Residual Radiance Fields for Streamably Free-Viewpoint Videos

    Authors: Liao Wang, Qiang Hu, Qihan He, Ziyu Wang, Jingyi Yu, Tinne Tuytelaars, Lan Xu, Minye Wu

    Abstract: The success of the Neural Radiance Fields (NeRFs) for modeling and free-view rendering static objects has inspired numerous attempts on dynamic scenes. Current techniques that utilize neural rendering for facilitating free-view videos (FVVs) are restricted to either offline rendering or are capable of processing only brief sequences with minimal motion. In this paper, we present a novel technique,…

    Submitted 15 June, 2023; v1 submitted 10 April, 2023; originally announced April 2023.

    Comments: Accepted by CVPR 2023. Project page: https://aoliao12138.github.io/ReRF/

  30. arXiv:2304.00933 [pdf, other]

    cs.LG cs.CV

    Knowledge Accumulation in Continually Learned Representations and the Issue of Feature Forgetting

    Authors: Timm Hess, Eli Verwimp, Gido M. van de Ven, Tinne Tuytelaars

    Abstract: Continual learning research has shown that neural networks suffer from catastrophic forgetting "at the output level", but it is debated whether this is also the case at the level of learned representations. Multiple recent studies ascribe representations a certain level of innate robustness against forgetting -- that they only forget minimally in comparison with forgetting at the output level. We…

    Submitted 24 June, 2024; v1 submitted 3 April, 2023; originally announced April 2023.

    Comments: TMLR 2024

    Journal ref: Transactions on Machine Learning Research (TMLR), 2024

  31. arXiv:2303.13606 [pdf, other]

    cs.CV

    Adaptive Similarity Bootstrapping for Self-Distillation based Representation Learning

    Authors: Tim Lebailly, Thomas Stegmüller, Behzad Bozorgtabar, Jean-Philippe Thiran, Tinne Tuytelaars

    Abstract: Most self-supervised methods for representation learning leverage a cross-view consistency objective, i.e., they maximize the representation similarity of a given image's augmented views. Recent work NNCLR goes beyond the cross-view paradigm and uses positive pairs from different images obtained via nearest neighbor bootstrapping in a contrastive setting. We empirically show that as opposed to the…

    Submitted 7 September, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

    Comments: ICCV 2023. * denotes equal contribution

  32. arXiv:2303.13245 [pdf, other]

    cs.CV

    CrOC: Cross-View Online Clustering for Dense Visual Representation Learning

    Authors: Thomas Stegmüller, Tim Lebailly, Behzad Bozorgtabar, Tinne Tuytelaars, Jean-Philippe Thiran

    Abstract: Learning dense visual representations without labels is an arduous task and more so from scene-centric data. We propose to tackle this challenging problem by proposing a Cross-view consistency objective with an Online Clustering mechanism (CrOC) to discover and segment the semantics of the views. In the absence of hand-crafted priors, the resulting method is more generalizable and does not require…

    Submitted 23 March, 2023; originally announced March 2023.

    Comments: Accepted at CVPR 2023, * denotes equal contribution

  33. arXiv:2303.01313 [pdf, other]

    cs.CV

    Weakly-supervised HOI Detection via Prior-guided Bi-level Representation Learning

    Authors: Bo Wan, Yongfei Liu, Desen Zhou, Tinne Tuytelaars, Xuming He

    Abstract: Human object interaction (HOI) detection plays a crucial role in human-centric scene understanding and serves as a fundamental building-block for many vision tasks. One generalizable and scalable strategy for HOI detection is to use weak supervision, learning from image-level annotations only. This is inherently challenging due to ambiguous human-object associations, large search space of detectin…

    Submitted 2 March, 2023; originally announced March 2023.

    Comments: Accepted by ICLR 2023

  34. arXiv:2212.00171 [pdf, other]

    cs.CV cs.AI cs.LG

    Layout-aware Dreamer for Embodied Referring Expression Grounding

    Authors: Mingxiao Li, Zehao Wang, Tinne Tuytelaars, Marie-Francine Moens

    Abstract: In this work, we study the problem of Embodied Referring Expression Grounding, where an agent needs to navigate in a previously unseen environment and localize a remote object described by a concise high-level natural language instruction. When facing such a situation, a human tends to imagine what the destination may look like and to explore the environment based on prior knowledge of the environ…

    Submitted 2 December, 2022; v1 submitted 30 November, 2022; originally announced December 2022.

  35. arXiv:2211.12111 [pdf, other]

    cs.RO

    Evaluation of MPC-based Imitation Learning for Human-like Autonomous Driving

    Authors: Flavia Sofia Acerbo, Jan Swevers, Tinne Tuytelaars, Tong Duy Son

    Abstract: This work evaluates and analyzes the combination of imitation learning (IL) and differentiable model predictive control (MPC) for the application of human-like autonomous driving. We combine MPC with a hierarchical learning-based policy, and measure its performance in open-loop and closed-loop with metrics related to safety, comfort and similarity to human driving characteristics. We also demonstr…

    Submitted 26 June, 2023; v1 submitted 22 November, 2022; originally announced November 2022.

    Comments: This work has been accepted to IFAC for publication under a Creative Commons Licence CC-BY-NC-ND. arXiv admin note: text overlap with arXiv:2206.12348

  36. arXiv:2210.17322 [pdf, other]

    cs.CV

    Generative Negative Text Replay for Continual Vision-Language Pretraining

    Authors: Shipeng Yan, Lanqing Hong, Hang Xu, Jianhua Han, Tinne Tuytelaars, Zhenguo Li, Xuming He

    Abstract: Vision-language pre-training (VLP) has attracted increasing attention recently. With a large amount of image-text pairs, VLP models trained with contrastive loss have achieved impressive performance in various tasks, especially the zero-shot generalization on downstream datasets. In practical applications, however, massive data are usually collected in a streaming fashion, requiring VLP models to…

    Submitted 31 October, 2022; originally announced October 2022.

    Comments: ECCV 2022

  37. arXiv:2210.08957 [pdf, other]

    cs.CV

    Weakly Supervised Face Naming with Symmetry-Enhanced Contrastive Loss

    Authors: Tingyu Qu, Tinne Tuytelaars, Marie-Francine Moens

    Abstract: We revisit the weakly supervised cross-modal face-name alignment task; that is, given an image and a caption, we label the faces in the image with the names occurring in the caption. Whereas past approaches have learned the latent alignment between names and faces by uncertainty reasoning over a set of images and their respective captions, in this paper, we rely on appropriate loss functions to le…

    Submitted 17 October, 2022; originally announced October 2022.

    Comments: Accepted at IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2023

  38. arXiv:2210.04331 [pdf, other]

    cs.CV

    Students taught by multimodal teachers are superior action recognizers

    Authors: Gorjan Radevski, Dusan Grujicic, Matthew Blaschko, Marie-Francine Moens, Tinne Tuytelaars

    Abstract: The focal point of egocentric video understanding is modelling hand-object interactions. Standard models -- CNNs, Vision Transformers, etc. -- which receive RGB frames as input perform well; however, their performance improves further by employing additional modalities such as object detections, optical flow, audio, etc. as input. The added complexity of the required modality-specific modules, on…

    Submitted 9 October, 2022; originally announced October 2022.

    Comments: Extended abstract accepted at the 2nd Ego4D Workshop @ ECCV 2022

  39. arXiv:2210.03482 [pdf, other]

    cs.CV cs.LG

    CLAD: A realistic Continual Learning benchmark for Autonomous Driving

    Authors: Eli Verwimp, Kuo Yang, Sarah Parisot, Hong Lanqing, Steven McDonagh, Eduardo Pérez-Pellitero, Matthias De Lange, Tinne Tuytelaars

    Abstract: In this paper we describe the design and the ideas motivating a new Continual Learning benchmark for Autonomous Driving (CLAD), that focuses on the problems of object classification and object detection. The benchmark utilises SODA10M, a recently released large-scale dataset that concerns autonomous driving related problems. First, we review and discuss existing continual learning benchmarks, how…

    Submitted 7 October, 2022; originally announced October 2022.

  40. arXiv:2210.02318 [pdf, other]

    cs.CV

    FQDet: Fast-converging Query-based Detector

    Authors: Cédric Picron, Punarjay Chakravarty, Tinne Tuytelaars

    Abstract: Recently, two-stage Deformable DETR introduced the query-based two-stage head, a new type of two-stage head different from the region-based two-stage heads of classical detectors such as Faster R-CNN. In query-based two-stage heads, the second stage selects one feature per detection processed by a transformer, called the query, as opposed to pooling a rectangular grid of features processed by CNNs as i…

    Submitted 28 October, 2022; v1 submitted 5 October, 2022; originally announced October 2022.

    Comments: Accepted at NeurIPS VTTA workshop 2022

  41. arXiv:2208.06195 [pdf, other]

    cs.CV

    Category-Level Pose Retrieval with Contrastive Features Learnt with Occlusion Augmentation

    Authors: Georgios Kouros, Shubham Shrivastava, Cédric Picron, Sushruth Nagesh, Punarjay Chakravarty, Tinne Tuytelaars

    Abstract: Pose estimation is usually tackled as either a bin classification or a regression problem. In both cases, the idea is to directly predict the pose of an object. This is a non-trivial task due to appearance variations between similar poses and similarities between dissimilar poses. Instead, we follow the key idea that comparing two poses is easier than directly predicting one. Render-and-compare ap…

    Submitted 12 October, 2022; v1 submitted 12 August, 2022; originally announced August 2022.

    Comments: 29 pages, 16 figures, 14 tables, BMVC 2022

  42. arXiv:2207.14676

    cs.CV

    Global-Local Self-Distillation for Visual Representation Learning

    Authors: Tim Lebailly, Tinne Tuytelaars

    Abstract: The downstream accuracy of self-supervised methods is tightly linked to the proxy task solved during training and the quality of the gradients extracted from it. Richer and more meaningful gradient updates are key to allowing self-supervised methods to learn better and in a more efficient manner. In a typical self-distillation framework, the representations of two augmented images are enforced to be…

    Submitted 12 October, 2022; v1 submitted 29 July, 2022; originally announced July 2022.

    Comments: WACV 2023

  43. arXiv:2206.12348

    cs.RO eess.SY

    MPC-based Imitation Learning for Safe and Human-like Autonomous Driving

    Authors: Flavia Sofia Acerbo, Jan Swevers, Tinne Tuytelaars, Tong Duy Son

    Abstract: To ensure user acceptance of autonomous vehicles (AVs), control systems are being developed to mimic human drivers from demonstrations of desired driving behaviors. Imitation learning (IL) algorithms serve this purpose, but struggle to provide safety guarantees on the resulting closed-loop system trajectories. On the other hand, Model Predictive Control (MPC) can handle nonlinear systems with safe…

    Submitted 24 June, 2022; originally announced June 2022.

    Comments: Accepted at the 1st Workshop on Safe Learning for Autonomous Driving (SL4AD), co-located with the 39th International Conference on Machine Learning (ICML 2022)

  44. arXiv:2205.13452

    cs.LG cs.AI cs.CV

    Continual evaluation for lifelong learning: Identifying the stability gap

    Authors: Matthias De Lange, Gido van de Ven, Tinne Tuytelaars

    Abstract: Time-dependent data-generating distributions have proven to be difficult for gradient-based training of neural networks, as the greedy updates result in catastrophic forgetting of previously learned knowledge. Despite the progress in the field of continual learning to overcome this forgetting, we show that a set of common state-of-the-art methods still suffers from substantial forgetting upon star…

    Submitted 30 March, 2023; v1 submitted 26 May, 2022; originally announced May 2022.

    Comments: Published as spotlight paper at ICLR 2023

  45. arXiv:2205.09357

    cs.LG cs.AI

    Continual Pre-Training Mitigates Forgetting in Language and Vision

    Authors: Andrea Cossu, Tinne Tuytelaars, Antonio Carta, Lucia Passaro, Vincenzo Lomonaco, Davide Bacciu

    Abstract: Pre-trained models are nowadays a fundamental component of machine learning research. In continual learning, they are commonly used to initialize the model before training on the stream of non-stationary data. However, pre-training is rarely applied during continual learning. We formalize and investigate the characteristics of the continual pre-training scenario in both language and vision environ…

    Submitted 19 May, 2022; originally announced May 2022.

    Comments: under review

  46. arXiv:2204.01407

    cs.CV cs.LG

    Re-examining Distillation For Continual Object Detection

    Authors: Eli Verwimp, Kuo Yang, Sarah Parisot, Hong Lanqing, Steven McDonagh, Eduardo Pérez-Pellitero, Matthias De Lange, Tinne Tuytelaars

    Abstract: Training models continually to detect and classify objects, from new classes and new domains, remains an open problem. In this work, we conduct a thorough analysis of why and how object detection models forget catastrophically. We focus on distillation-based approaches in two-stage networks, the most common strategy employed in contemporary continual object detection work. Distillation aims to tran…

    Submitted 7 October, 2022; v1 submitted 4 April, 2022; originally announced April 2022.

    Comments: Accepted at BMVC '22

  47. arXiv:2203.06127

    cs.CV

    Spatial Consistency Loss for Training Multi-Label Classifiers from Single-Label Annotations

    Authors: Thomas Verelst, Paul K. Rubenstein, Marcin Eichner, Tinne Tuytelaars, Maxim Berman

    Abstract: As natural images usually contain multiple objects, multi-label image classification is more applicable "in the wild" than single-label classification. However, exhaustively annotating images with every object of interest is costly and time-consuming. We aim to train multi-label classifiers from single-label annotations only. We show that adding a consistency loss, ensuring that the predictions of…

    Submitted 11 March, 2022; originally announced March 2022.

    Comments: 24 pages, 9 figures

  48. arXiv:2203.03798   

    cs.LG cs.AI

    New Insights on Reducing Abrupt Representation Change in Online Continual Learning

    Authors: Lucas Caccia, Rahaf Aljundi, Nader Asadi, Tinne Tuytelaars, Joelle Pineau, Eugene Belilovsky

    Abstract: In the online continual learning paradigm, agents must learn from a changing distribution while respecting memory and compute constraints. Experience Replay (ER), where a small subset of past data is stored and replayed alongside new data, has emerged as a simple and effective learning strategy. In this work, we focus on the change in representations of observed data that arises when previously un…

    Submitted 25 April, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

    Comments: This has been withdrawn as it is a new version of arXiv:2104.05025

  49. arXiv:2203.03727

    cs.CV

    Barlow constrained optimization for Visual Question Answering

    Authors: Abhishek Jha, Badri N. Patro, Luc Van Gool, Tinne Tuytelaars

    Abstract: Visual question answering is a vision-and-language multimodal task that aims at predicting answers given samples from the question and image modalities. Most recent methods focus on learning a good joint embedding space of images and questions, either by improving the interaction between these two modalities, or by making it a more discriminant space. However, how informative this joint space is,…

    Submitted 7 March, 2022; originally announced March 2022.

  50. arXiv:2203.03183

    cs.AI

    Find a Way Forward: a Language-Guided Semantic Map Navigator

    Authors: Zehao Wang, Mingxiao Li, Minye Wu, Marie-Francine Moens, Tinne Tuytelaars

    Abstract: In this paper, we introduce the map-language navigation task where an agent executes natural language instructions and moves to the target position based only on a given 3D semantic map. To tackle the task, we design the instruction-aware Path Proposal and Discrimination model (iPPD). Our approach leverages map information to provide instruction-aware path proposals, i.e., it selects all potential…

    Submitted 26 September, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

    Comments: content revised