arXiv:2407.10791 [pdf, other]

Interactive Public Transport Infrastructure Analysis through Mobility Profiles: Making the Mobility Transition Transparent

Authors: Yannick Metz, Dennis Ackermann, Daniel A. Keim, Maximilian T. Fischer

Abstract: Efficient public transport systems are crucial for sustainable urban development as cities face increasing mobility demands. Yet, many public transport networks struggle to meet diverse user needs due to historical development, urban constraints, and financial limitations. Traditionally, planning of transport network structure is often based on limited surveys, expert opinions, or partial usage statistics. This provides an incomplete basis for decision-making. We introduce a data-driven approach to public transport planning and optimization, calculating detailed accessibility measures at the individual housing level. Our visual analytics workflow combines population-group-based simulations with dynamic infrastructure analysis, utilizing a scenario-based model to simulate daily travel patterns of varied demographic groups, including schoolchildren, students, workers, and pensioners. These population groups, each with unique mobility requirements and routines, interact with the transport system under different scenarios traveling to and from Points of Interest (POI), assessed through travel time calculations. Results are visualized through heatmaps, density maps, and network overlays, as well as detailed statistics. Our system allows us to analyze both the underlying data and simulation results on multiple levels of granularity, delivering both broad insights and granular details. Case studies with the city of Konstanz, Germany reveal key areas where public transport does not meet specific needs, confirmed through a formative user study. Due to the high cost of changing legacy networks, our analysis facilitates the identification of strategic enhancements, such as optimized schedules or rerouting, and a few targeted stop relocations, highlighting consequential variations in accessibility to pinpoint critical service gaps.

Submitted 15 July, 2024; originally announced July 2024.

Comments: 9 pages, 8 figures

ACM Class: H.5.2

arXiv:2407.10652 [pdf, other]

Cutting Through the Clutter: The Potential of LLMs for Efficient Filtration in Systematic Literature Reviews

Authors: Lucas Joos, Daniel A. Keim, Maximilian T. Fischer

Abstract: In academic research, systematic literature reviews are foundational and highly relevant, yet tedious to create due to the high volume of publications and labor-intensive processes involved. Systematic selection of relevant papers through conventional means like keyword-based filtering techniques can sometimes be inadequate, plagued by semantic ambiguities and inconsistent terminology, which can lead to sub-optimal outcomes. To mitigate the required extensive manual filtering, we explore and evaluate the potential of using Large Language Models (LLMs) to enhance the efficiency, speed, and precision of literature review filtering, reducing the amount of manual screening required. By using models as classification agents acting on a structured database only, we prevent common problems inherent in LLMs, such as hallucinations. We evaluate the real-world performance of such a setup during the construction of a recent literature survey paper with initially more than 8.3k potentially relevant articles under consideration and compare this with human performance on the same dataset. Our findings indicate that employing advanced LLMs like GPT-4o, Claude 3.5 Sonnet, Gemini 1.5 Flash, or Llama3 with simple prompting can significantly reduce the time required for literature filtering - from usually weeks of manual research to only a few minutes. Simultaneously, we crucially show that false negatives can indeed be controlled through a consensus scheme, achieving recalls >98.8% at or even beyond the typical human error threshold, thereby also providing for more accurate and relevant articles selected. Our research not only demonstrates a substantial improvement in the methodology of literature reviews but also sets the stage for further integration and extensive future applications of responsible AI in academic research practices.

Submitted 15 July, 2024; originally announced July 2024.

Comments: 5 pages, 5 figures, 1 table

ACM Class: H.5.2

arXiv:2407.09271 [pdf, other]

iNeMo: Incremental Neural Mesh Models for Robust Class-Incremental Learning

Authors: Tom Fischer, Yaoyao Liu, Artur Jesslen, Noor Ahmed, Prakhar Kaushik, Angtian Wang, Alan Yuille, Adam Kortylewski, Eddy Ilg

Abstract: Different from human nature, it is still common practice today for vision tasks to train deep learning models only initially and on fixed datasets. A variety of approaches have recently addressed handling continual data streams. However, extending these methods to manage out-of-distribution (OOD) scenarios has not effectively been investigated. On the other hand, it has recently been shown that non-continual neural mesh models exhibit strong performance in generalizing to such OOD scenarios. To leverage this decisive property in a continual learning setting, we propose incremental neural mesh models that can be extended with new meshes over time. In addition, we present a latent space initialization strategy that enables us to allocate feature space for future unseen classes in advance and a positional regularization term that forces the features of the different classes to consistently stay in respective latent space regions. We demonstrate the effectiveness of our method through extensive experiments on the Pascal3D and ObjectNet3D datasets and show that our approach outperforms the baselines for classification by $2-6\%$ in the in-domain and by $6-50\%$ in the OOD setting. Our work also presents the first incremental learning approach for pose estimation. Our code and model can be found at https://github.com/Fischer-Tom/iNeMo.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.05427 [pdf, other]

MelodyVis: Visual Analytics for Melodic Patterns in Sheet Music

Authors: Matthias Miller, Daniel Fürst, Maximilian T. Fischer, Hanna Hauptmann, Daniel Keim, Mennatallah El-Assady

Abstract: Manual melody detection is a tedious task requiring a high level of expertise, while automatic detection is often not expressive or powerful enough. Thus, we present MelodyVis, a visual application designed in collaboration with musicology experts to explore melodic patterns in digital sheet music. MelodyVis features five connected views, including a Melody Operator Graph and a Voicing Timeline. The system utilizes eight atomic operators, such as transposition and mirroring, to capture melody repetitions and variations. Users can start their analysis by manually selecting patterns in the sheet view, and then identifying other patterns based on the selected samples through an interactive exploration process. We conducted a user study to investigate the effectiveness and usefulness of our approach and its integrated melodic operators, including usability and mental load questions. We compared the analysis executed by 25 participants with and without the operators. The study results indicate that the participants could identify at least twice as many patterns with activated operators. MelodyVis allows analysts to steer the analysis process and interpret results. Our study also confirms the usefulness of MelodyVis in supporting common analytical tasks in melodic analysis, with participants reporting improved pattern identification and interpretation. Thus, MelodyVis addresses the limitations of fully-automated approaches, enabling music analysts to step into the analysis process and uncover and understand intricate melodic patterns and transformations in sheet music.

Submitted 7 July, 2024; originally announced July 2024.

Comments: 9+2 pages, 9 figures, preprint, originally submitted to IEEE VIS 23, revision

ACM Class: I.5.4; H.3.3; J.5.7

arXiv:2406.19543 [pdf, other]

Demarked: A Strategy for Enhanced Abusive Speech Moderation through Counterspeech, Detoxification, and Message Management

Authors: Seid Muhie Yimam, Daryna Dementieva, Tim Fischer, Daniil Moskovskiy, Naquee Rizwan, Punyajoy Saha, Sarthak Roy, Martin Semmann, Alexander Panchenko, Chris Biemann, Animesh Mukherjee

Abstract: Despite regulations imposed by nations and social media platforms, such as recent EU regulations targeting digital violence, abusive content persists as a significant challenge. Existing approaches primarily rely on binary solutions, such as outright blocking or banning, yet fail to address the complex nature of abusive speech. In this work, we propose a more comprehensive approach called Demarcation, scoring abusive speech based on four aspects -- (i) severity scale; (ii) presence of a target; (iii) context scale; (iv) legal scale -- and suggesting more options of actions like detoxification, counter speech generation, blocking, or, as a final measure, human intervention. Through a thorough analysis of abusive speech regulations across diverse jurisdictions, platforms, and research papers, we highlight the gap in preventive measures and advocate for tailored proactive steps to combat its multifaceted manifestations. Our work aims to inform future strategies for effectively addressing abusive speech online.

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.15068 [pdf, other]

Occamy: A 432-Core 28.1 DP-GFLOP/s/W 83% FPU Utilization Dual-Chiplet, Dual-HBM2E RISC-V-based Accelerator for Stencil and Sparse Linear Algebra Computations with 8-to-64-bit Floating-Point Support in 12nm FinFET

Authors: Gianna Paulin, Paul Scheffler, Thomas Benz, Matheus Cavalcante, Tim Fischer, Manuel Eggimann, Yichao Zhang, Nils Wistoff, Luca Bertaccini, Luca Colagrande, Gianmarco Ottavi, Frank K. Gürkaynak, Davide Rossi, Luca Benini

Abstract: We present Occamy, a 432-core RISC-V dual-chiplet 2.5D system for efficient sparse linear algebra and stencil computations on FP64 and narrow (32-, 16-, 8-bit) SIMD FP data. Occamy features 48 clusters of RISC-V cores with custom extensions, two 64-bit host cores, and a latency-tolerant multi-chiplet interconnect and memory system with 32 GiB of HBM2E. It achieves leading-edge utilization on stencils (83 %), sparse-dense (42 %), and sparse-sparse (49 %) matrix multiply.

Submitted 21 June, 2024; originally announced June 2024.

Comments: 2 pages, 7 figures. Accepted at the 2024 IEEE Symposium on VLSI Technology & Circuits

arXiv:2406.03175 [pdf, other]

Dynamic 3D Gaussian Fields for Urban Areas

Authors: Tobias Fischer, Jonas Kulhanek, Samuel Rota Bulò, Lorenzo Porzi, Marc Pollefeys, Peter Kontschieder

Abstract: We present an efficient neural 3D scene representation for novel-view synthesis (NVS) in large-scale, dynamic urban areas. Existing works are not well suited for applications like mixed-reality or closed-loop simulation due to their limited visual quality and non-interactive rendering speeds. Recently, rasterization-based approaches have achieved high-quality NVS at impressive speeds. However, these methods are limited to small-scale, homogeneous data, i.e. they cannot handle severe appearance and geometry variations due to weather, season, and lighting and do not scale to larger, dynamic areas with thousands of images. We propose 4DGF, a neural scene representation that scales to large-scale dynamic urban areas, handles heterogeneous input data, and substantially improves rendering speeds. We use 3D Gaussians as an efficient geometry scaffold while relying on neural fields as a compact and flexible appearance model. We integrate scene dynamics via a scene graph at global scale while modeling articulated motions on a local level via deformations. This decomposed approach enables flexible scene composition suitable for real-world applications. In experiments, we surpass the state-of-the-art by over 3 dB in PSNR and more than 200 times in rendering speed.

Submitted 5 June, 2024; originally announced June 2024.

Comments: Project page is available at https://tobiasfshr.github.io/pub/4dgf/

arXiv:2405.19284 [pdf, other]

Optimizing Foundation Model Inference on a Many-tiny-core Open-source RISC-V Platform

Authors: Viviane Potocnik, Luca Colagrande, Tim Fischer, Luca Bertaccini, Daniele Jahier Pagliari, Alessio Burrello, Luca Benini

Abstract: Transformer-based foundation models have become crucial for various domains, most notably natural language processing (NLP) or computer vision (CV). These models are predominantly deployed on high-performance GPUs or hardwired accelerators with highly customized, proprietary instruction sets. Until now, limited attention has been given to RISC-V-based general-purpose platforms. In our work, we present the first end-to-end inference results of transformer models on an open-source many-tiny-core RISC-V platform implementing distributed Softmax primitives and leveraging ISA extensions for SIMD floating-point operand streaming and instruction repetition, as well as specialized DMA engines to minimize costly main memory accesses and to tolerate their latency. We focus on two foundational transformer topologies, encoder-only and decoder-only models. For encoder-only models, we demonstrate a speedup of up to 12.8x between the most optimized implementation and the baseline version. We reach over 79% FPU utilization and 294 GFLOPS/W, outperforming State-of-the-Art (SoA) accelerators by more than 2x utilizing the HW platform while achieving comparable throughput per computational unit. For decoder-only topologies, we achieve 16.1x speedup in the Non-Autoregressive (NAR) mode and up to 35.6x speedup in the Autoregressive (AR) mode compared to the baseline implementation. Compared to the best SoA dedicated accelerator, we achieve 2.04x higher FPU utilization.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 14 pages, 10 figures, 4 tables, IEEE Transactions on Circuits and Systems for Artificial Intelligence

ACM Class: C.4; C.3; I.2

arXiv:2405.14599 [pdf, other]

Neuroexplicit Diffusion Models for Inpainting of Optical Flow Fields

Authors: Tom Fischer, Pascal Peter, Joachim Weickert, Eddy Ilg

Abstract: Deep learning has revolutionized the field of computer vision by introducing large-scale neural networks with millions of parameters. Training these networks requires massive datasets and leads to intransparent models that can fail to generalize. At the other extreme, models designed from partial differential equations (PDEs) embed specialized domain knowledge into mathematical equations and usually rely on few manually chosen hyperparameters. This makes them transparent by construction and, if designed and calibrated carefully, they can generalize well to unseen scenarios. In this paper, we show how to bring model- and data-driven approaches together by combining the explicit PDE-based approaches with convolutional neural networks to obtain the best of both worlds. We illustrate a joint architecture for the task of inpainting optical flow fields and show that the combination of model- and data-driven modeling leads to an effective architecture. Our model outperforms both fully explicit and fully data-driven baselines in terms of reconstruction quality, robustness and amount of required training data. Averaging the endpoint error across different mask densities, our method outperforms the explicit baselines by 11-27%, the GAN baseline by 47% and the Probabilistic Diffusion baseline by 42%. With that, our method sets a new state of the art for inpainting of optical flow fields from random masks.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2404.16044 [pdf, other]

Toward the Categorical Data Map

Authors: Frederik L. Dennig, Lucas Joos, Patrick Paetzold, Daniela Blumberg, Oliver Deussen, Daniel A. Keim, Maximilian T. Fischer

Abstract: Categorical data does not have an intrinsic definition of distance or order, and therefore, established visualization techniques for categorical data only allow for a set-based or frequency-based analysis, e.g., through Euler diagrams or Parallel Sets, and do not support a similarity-based analysis. We present a novel dimensionality reduction-based visualization for categorical data, which is based on defining the distance of two data items as the number of varying attributes. Our technique enables users to pre-attentively detect groups of similar data items and observe the properties of the projection, such as attributes strongly influencing the embedding. Our prototype visually encodes data properties in an enhanced scatterplot-like visualization, encoding attributes in the background to show the distribution of categories. In addition, we propose two graph-based measures to quantify the plot's visual quality, which rank attributes according to their contribution to cluster cohesion. To demonstrate the capabilities of our similarity-based approach, we compare it to Euler diagrams and Parallel Sets regarding visual scalability and show its benefits through an expert study with five data scientists analyzing the Titanic and Mushroom datasets with up to 23 attributes and 8124 category combinations. Our results indicate that the Categorical Data Map offers an effective analysis method, especially for large datasets with a high number of category combinations.

Submitted 14 May, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

Comments: 12 pages, 10 figures, LaTeX; formatting; corrected typo
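
Note on the entry above: the abstract defines the distance between two categorical data items as the number of attributes on which they differ, which corresponds to a Hamming distance over category values. The following is a minimal Python sketch of that definition only. The function names and the sample records are hypothetical and not taken from the paper, and the paper's projection and quality measures are not reproduced here.

```python
from itertools import combinations


def attribute_distance(a, b):
    """Count the attributes on which two categorical records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))


def distance_matrix(items):
    """Build the symmetric pairwise distance matrix for a list of records."""
    n = len(items)
    matrix = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = attribute_distance(items[i], items[j])
    return matrix


# Hypothetical records with attributes (class, sex, age group, outcome).
records = [
    ("first", "female", "adult", "survived"),
    ("first", "male", "adult", "survived"),
    ("third", "male", "child", "died"),
]

for row in distance_matrix(records):
    print(row)
```

A matrix of this kind could then be passed to any dimensionality reduction method that accepts precomputed distances; which method the authors use is not stated in the abstract.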

arXiv:2404.09406 [pdf, other]

Human-in-the-Loop Segmentation of Multi-species Coral Imagery

Authors: Scarlett Raine, Ross Marchant, Brano Kusy, Frederic Maire, Niko Suenderhauf, Tobias Fischer

Abstract: Broad-scale marine surveys performed by underwater vehicles significantly increase the availability of coral reef imagery; however, it is costly and time-consuming for domain experts to label images. Point label propagation is an approach used to leverage existing image data labeled with sparse point labels. The resulting augmented ground truth generated is then used to train a semantic segmentation model. Here, we first demonstrate that recent advances in foundation models enable generation of multi-species coral augmented ground truth masks using denoised DINOv2 features and K-Nearest Neighbors (KNN), without the need for any pre-training or custom-designed algorithms. For extremely sparsely labeled images, we propose a labeling regime based on human-in-the-loop principles, resulting in significant improvement in annotation efficiency: If only 5 point labels per image are available, our proposed human-in-the-loop approach improves on the state-of-the-art by 17.3% for pixel accuracy and 22.6% for mIoU; and by 10.6% and 19.1% when 10 point labels per image are available. Even if the human-in-the-loop labeling regime is not used, the denoised DINOv2 features with a KNN outperform the prior state-of-the-art by 3.5% for pixel accuracy and 5.7% for mIoU (5 grid points). We also provide a detailed analysis of how point labeling style and the quantity of points per image affect the point label propagation quality and provide general recommendations on maximizing point label efficiency.

Submitted 16 April, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

Comments: Accepted at the CVPR2024 3rd Workshop on Learning with Limited Labelled Data for Image and Video Understanding (L3D-IVU), 10 pages, 6 figures, an additional 4 pages of supplementary material

arXiv:2404.03658 [pdf, other]

Know Your Neighbors: Improving Single-View Reconstruction via Spatial Vision-Language Reasoning

Authors: Rui Li, Tobias Fischer, Mattia Segu, Marc Pollefeys, Luc Van Gool, Federico Tombari

Abstract: Recovering the 3D scene geometry from a single view is a fundamental yet ill-posed problem in computer vision. While classical depth estimation methods infer only a 2.5D scene representation limited to the image plane, recent approaches based on radiance fields reconstruct a full 3D representation. However, these methods still struggle with occluded regions since inferring geometry without visual observation requires (i) semantic knowledge of the surroundings, and (ii) reasoning about spatial context. We propose KYN, a novel method for single-view scene reconstruction that reasons about semantic and spatial context to predict each point's density. We introduce a vision-language modulation module to enrich point features with fine-grained semantic information. We aggregate point representations across the scene through a language-guided spatial attention mechanism to yield per-point density predictions aware of the 3D semantic context. We show that KYN improves 3D shape recovery compared to predicting density for each 3D point in isolation. We achieve state-of-the-art results in scene and object reconstruction on KITTI-360, and show improved zero-shot generalization compared to prior work. Project page: https://ruili3.github.io/kyn.

Submitted 4 April, 2024; originally announced April 2024.

Comments: CVPR 2024. Project page: https://ruili3.github.io/kyn

arXiv:2404.03073 [pdf, other]

Mai Ho'omāuna i ka 'Ai: Language Models Improve Automatic Speech Recognition in Hawaiian

Authors: Kaavya Chaparala, Guido Zarrella, Bruce Torres Fischer, Larry Kimura, Oiwi Parker Jones

Abstract: In this paper we address the challenge of improving Automatic Speech Recognition (ASR) for a low-resource language, Hawaiian, by incorporating large amounts of independent text data into an ASR foundation model, Whisper. To do this, we train an external language model (LM) on ~1.5M words of Hawaiian text. We then use the LM to rescore Whisper and compute word error rates (WERs) on a manually curated test set of labeled Hawaiian data. As a baseline, we use Whisper without an external LM. Experimental results reveal a small but significant improvement in WER when ASR outputs are rescored with a Hawaiian LM. The results support leveraging all available data in the development of ASR systems for underrepresented languages.

Submitted 3 April, 2024; originally announced April 2024.
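
Note on the entry above: the abstract reports results as word error rate (WER). WER is conventionally the word-level edit distance between a reference transcript and a hypothesis (substitutions, deletions, and insertions) divided by the number of reference words. The sketch below illustrates that conventional definition in Python. The function name and the placeholder strings are illustrative assumptions; this is not the authors' evaluation code and it does not include any language model rescoring.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens: one substitution (b -> x) and one deletion (d).
print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.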

arXiv:2404.00168 [pdf, other]

Multi-Level Neural Scene Graphs for Dynamic Urban Environments

Authors: Tobias Fischer, Lorenzo Porzi, Samuel Rota Bulò, Marc Pollefeys, Peter Kontschieder

Abstract: We estimate the radiance field of large-scale dynamic areas from multiple vehicle captures under varying environmental conditions. Previous works in this domain are either restricted to static environments, do not scale to more than a single short video, or struggle to separately represent dynamic object instances. To this end, we present a novel, decomposable radiance field approach for dynamic urban environments. We propose a multi-level neural scene graph representation that scales to thousands of images from dozens of sequences with hundreds of fast-moving objects. To enable efficient training and rendering of our representation, we develop a fast composite ray sampling and rendering scheme. To test our approach in urban driving scenarios, we introduce a new, novel view synthesis benchmark. We show that our approach outperforms prior art by a significant margin on both established and our proposed benchmark while being faster in training and rendering.

Submitted 29 March, 2024; originally announced April 2024.

Comments: CVPR 2024. Project page is available at https://tobiasfshr.github.io/pub/ml-nsg/

arXiv:2403.16425 [pdf, other]

Enhancing Visual Place Recognition via Fast and Slow Adaptive Biasing in Event Cameras

Authors: Gokul B. Nair, Michael Milford, Tobias Fischer

Abstract: Event cameras are increasingly popular in robotics due to their beneficial features, such as low latency, energy efficiency, and high dynamic range. Nevertheless, their downstream task performance is greatly influenced by the optimization of bias parameters. These parameters, for instance, regulate the necessary change in light intensity to trigger an event, which in turn depends on factors such as the environment lighting and camera motion. This paper introduces feedback control algorithms that automatically tune the bias parameters through two interacting methods: 1) An immediate, on-the-fly fast adaptation of the refractory period, which sets the minimum interval between consecutive events, and 2) if the event rate exceeds the specified bounds even after changing the refractory period repeatedly, the controller adapts the pixel bandwidth and event thresholds, which stabilizes after a short period of noise events across all pixels (slow adaptation). Our evaluation focuses on the visual place recognition task, where incoming query images are compared to a given reference database. We conducted comprehensive evaluations of our algorithms' adaptive feedback control in real-time. To do so, we collected the QCR-Fast-and-Slow dataset that contains DAVIS346 event camera streams from 366 repeated traversals of a Scout Mini robot navigating through a 100 meter long indoor lab setting (totaling over 35km distance traveled) in varying brightness conditions with ground truth location information. Our proposed feedback controllers result in superior performance when compared to the standard bias settings and prior feedback control methods. Our findings also detail the impact of bias adjustments on task performance and feature ablation studies on the fast and slow adaptation mechanisms.

Submitted 25 March, 2024; originally announced March 2024.

Comments: 8 pages, 9 figures, paper under review

arXiv:2403.15313 [pdf, other]

CR3DT: Camera-RADAR Fusion for 3D Detection and Tracking

Authors: Nicolas Baumann, Michael Baumgartner, Edoardo Ghignone, Jonas Kühne, Tobias Fischer, Yung-Hsu Yang, Marc Pollefeys, Michele Magno

Abstract: Accurate detection and tracking of surrounding objects is essential to enable self-driving vehicles. While Light Detection and Ranging (LiDAR) sensors have set the benchmark for high performance, the appeal of camera-only solutions lies in their cost-effectiveness. Notably, despite the prevalent use of Radio Detection and Ranging (RADAR) sensors in automotive systems, their potential in 3D detection and tracking has been largely disregarded due to data sparsity and measurement noise. As a recent development, the combination of RADARs and cameras is emerging as a promising solution. This paper presents Camera-RADAR 3D Detection and Tracking (CR3DT), a camera-RADAR fusion model for 3D object detection, and Multi-Object Tracking (MOT). Building upon the foundations of the State-of-the-Art (SotA) camera-only BEVDet architecture, CR3DT demonstrates substantial improvements in both detection and tracking capabilities, by incorporating the spatial and velocity information of the RADAR sensor. Experimental results demonstrate an absolute improvement in detection performance of 5.3% in mean Average Precision (mAP) and a 14.9% increase in Average Multi-Object Tracking Accuracy (AMOTA) on the nuScenes dataset when leveraging both modalities. CR3DT bridges the gap between high-performance and cost-effective perception systems in autonomous driving, by capitalizing on the ubiquitous presence of RADAR in automotive applications.

Submitted 22 March, 2024; originally announced March 2024.

arXiv:2401.01955 [pdf, other]

MULTI-CASE: A Transformer-based Ethics-aware Multimodal Investigative Intelligence Framework

Authors: Maximilian T. Fischer, Yannick Metz, Lucas Joos, Matthias Miller, Daniel A. Keim

Abstract: AI-driven models are increasingly deployed in operational analytics solutions, for instance, in investigative journalism or the intelligence community. Current approaches face two primary challenges: ethical and privacy concerns, as well as difficulties in efficiently combining heterogeneous data sources for multimodal analytics. To tackle the challenge of multimodal analytics, we present MULTI-CASE, a holistic visual analytics framework tailored towards ethics-aware and multimodal intelligence exploration, designed in collaboration with domain experts. It leverages an equal joint agency between human and AI to explore and assess heterogeneous information spaces, checking and balancing automation through Visual Analytics. MULTI-CASE operates on a fully-integrated data model and features type-specific analysis with multiple linked components, including a combined search, annotated text view, and graph-based analysis. Parts of the underlying entity detection are based on a RoBERTa-based language model, which we tailored towards user requirements through fine-tuning. An overarching knowledge exploration graph combines all information streams, provides in-situ explanations, transparent source attribution, and facilitates effective exploration. To assess our approach, we conducted a comprehensive set of evaluations: We benchmarked the underlying language model on relevant NER tasks, achieving state-of-the-art performance. The demonstrator was assessed according to intelligence capability assessments, while the methodology was evaluated according to ethics design guidelines. As a case study, we present our framework in an investigative journalism setting, supporting war crime investigations. Finally, we conduct a formative user evaluation with domain experts in law enforcement. Our evaluations confirm that our framework facilitates human agency and steering in security-sensitive applications.

Submitted 3 January, 2024; originally announced January 2024.

Comments: 6 pages, 3 figures, 1 table

arXiv:2311.14276 [pdf, other]

Racing With ROS 2: A Navigation System for an Autonomous Formula Student Race Car

Authors: Alastair Bradford, Grant van Breda, Tobias Fischer

Abstract: The advent of autonomous vehicle technologies has significantly impacted various sectors, including motorsport, where Formula Student and Formula: Society of Automotive Engineers introduced autonomous racing classes. These offer new challenges to aspiring engineers, including the team at QUT Motorsport, but also raise the entry barrier due to the complexity of high-speed navigation and control. This paper presents an open-source solution using the Robot Operating System 2, specifically its open-source navigation stack, to address these challenges in autonomous Formula Student race cars. We compare off-the-shelf navigation libraries that this stack comprises of against traditional custom-made programs developed by QUT Motorsport to evaluate their applicability in autonomous racing scenarios and integrate them onto an autonomous race car. Our contributions include quantitative and qualitative comparisons of these packages against traditional navigation solutions, aiming to lower the entry barrier for autonomous racing. This paper also serves as a comprehensive tutorial for teams participating in similar racing disciplines and other autonomous mobile robot applications.

Submitted 23 November, 2023; originally announced November 2023.

Comments: 10 pages, 6 figures

Journal ref: Australasian Conference on Robotics and Automation (ACRA 2023)

arXiv:2311.13186 [pdf, other]

Applications of Spiking Neural Networks in Visual Place Recognition

Authors: Somayeh Hussaini, Michael Milford, Tobias Fischer

Abstract: In robotics, Spiking Neural Networks (SNNs) are increasingly recognized for their largely-unrealized potential energy efficiency and low latency particularly when implemented on neuromorphic hardware. Our paper highlights three advancements for SNNs in Visual Place Recognition (VPR). First, we propose Modular SNNs, where each SNN represents a set of non-overlapping geographically distinct places, enabling scalable networks for large environments. Secondly, we present Ensembles of Modular SNNs, where multiple networks represent the same place, significantly enhancing accuracy compared to single-network models. Our SNNs are compact and small, comprising only 1500 neurons and 474k synapses, which makes them ideally suited for ensembling due to this small size. Lastly, we investigate the role of sequence matching in SNN-based VPR, a technique where consecutive images are used to refine place recognition. We analyze the responsiveness of SNNs to ensembling and sequence matching compared to other VPR techniques. Our contributions highlight the viability of SNNs for VPR, offering scalable and robust solutions, paving the way for their application in various energy-sensitive robotic tasks.

Submitted 22 November, 2023; originally announced November 2023.

Comments: 17 pages, 8 figures, under review

arXiv:2311.02872 [pdf, other]

FocusTune: Tuning Visual Localization through Focus-Guided Sampling

Authors: Son Tung Nguyen, Alejandro Fontan, Michael Milford, Tobias Fischer

Abstract: We propose FocusTune, a focus-guided sampling technique to improve the performance of visual localization algorithms. FocusTune directs a scene coordinate regression model towards regions critical for 3D point triangulation by exploiting key geometric constraints. Specifically, rather than uniformly sampling points across the image for training the scene coordinate regression model, we instead re-project 3D scene coordinates onto the 2D image plane and sample within a local neighborhood of the re-projected points. While our proposed sampling strategy is generally applicable, we showcase FocusTune by integrating it with the recently introduced Accelerated Coordinate Encoding (ACE) model. Our results demonstrate that FocusTune both improves or matches state-of-the-art performance whilst keeping ACE's appealing low storage and compute requirements, for example reducing translation error from 25 to 19 and 17 to 15 cm for single and ensemble models, respectively, on the Cambridge Landmarks dataset. This combination of high performance and low compute and storage requirements is particularly promising for applications in areas like mobile robotics and augmented reality. We made our code available at https://github.com/sontung/focus-tune.

Submitted 5 November, 2023; originally announced November 2023.

arXiv:2309.15405 [pdf]

Teach and Repeat Navigation: A Robust Control Approach

Authors: Payam Nourizadeh, Michael Milford, Tobias Fischer

Abstract: Robot navigation requires an autonomy pipeline that is robust to environmental changes and effective in varying conditions. Teach and Repeat (T&R) navigation has shown high performance in autonomous repeated tasks under challenging circumstances, but research within T&R has predominantly focused on motion planning as opposed to motion control. In this paper, we propose a novel T&R system based on a robust motion control technique for a skid-steering mobile robot using sliding-mode control that effectively handles uncertainties that are particularly pronounced in the T&R task, where sensor noises, parametric uncertainties, and wheel-terrain interaction are common challenges. We first theoretically demonstrate that the proposed T&R system is globally stable and robust while considering the uncertainties of the closed-loop system. When deployed on a Clearpath Jackal robot, we then show the global stability of the proposed system in both indoor and outdoor environments covering different terrains, outperforming previous state-of-the-art methods in terms of mean average trajectory error and stability in these challenging environments. This paper makes an important step towards long-term autonomous T&R navigation with ensured safety guarantees.

Submitted 29 May, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

Comments: Accepted to IEEE International Conference on Robotics and Automation 2024 (ICRA2024)

arXiv:2309.10225 [pdf, other]

VPRTempo: A Fast Temporally Encoded Spiking Neural Network for Visual Place Recognition

Authors: Adam D. Hines, Peter G. Stratton, Michael Milford, Tobias Fischer

Abstract: Spiking Neural Networks (SNNs) are at the forefront of neuromorphic computing thanks to their potential energy-efficiency, low latencies, and capacity for continual learning. While these capabilities are well suited for robotics tasks, SNNs have seen limited adaptation in this field thus far. This work introduces a SNN for Visual Place Recognition (VPR) that is both trainable within minutes and queryable in milliseconds, making it well suited for deployment on compute-constrained robotic systems. Our proposed system, VPRTempo, overcomes slow training and inference times using an abstracted SNN that trades biological realism for efficiency. VPRTempo employs a temporal code that determines the timing of a single spike based on a pixel's intensity, as opposed to prior SNNs relying on rate coding that determined the number of spikes; improving spike efficiency by over 100%. VPRTempo is trained using Spike-Timing Dependent Plasticity and a supervised delta learning rule enforcing that each output spiking neuron responds to just a single place. We evaluate our system on the Nordland and Oxford RobotCar benchmark localization datasets, which include up to 27k places. We found that VPRTempo's accuracy is comparable to prior SNNs and the popular NetVLAD place recognition algorithm, while being several orders of magnitude faster and suitable for real-time deployment -- with inference speeds over 50 Hz on CPU. VPRTempo could be integrated as a loop closure component for online SLAM on resource-constrained systems such as space and underwater robots.

Submitted 29 February, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

Comments: 8 pages, 3 figures, accepted to the IEEE International Conference on Robotics and Automation (ICRA) 2024

arXiv:2308.14713 [pdf, other]

R3D3: Dense 3D Reconstruction of Dynamic Scenes from Multiple Cameras

Authors: Aron Schmied, Tobias Fischer, Martin Danelljan, Marc Pollefeys, Fisher Yu

Abstract: Dense 3D reconstruction and ego-motion estimation are key challenges in autonomous driving and robotics. Compared to the complex, multi-modal systems deployed today, multi-camera systems provide a simpler, low-cost alternative. However, camera-based 3D reconstruction of complex dynamic scenes has proven extremely difficult, as existing solutions often produce incomplete or incoherent results. We propose R3D3, a multi-camera system for dense 3D reconstruction and ego-motion estimation. Our approach iterates between geometric estimation that exploits spatial-temporal information from multiple cameras, and monocular depth refinement. We integrate multi-camera feature correlation and dense bundle adjustment operators that yield robust geometric depth and pose estimates. To improve reconstruction where geometric depth is unreliable, e.g. for moving objects or low-textured regions, we introduce learnable scene priors via a depth refinement network. We show that this design enables a dense, consistent 3D reconstruction of challenging, dynamic outdoor environments. Consequently, we achieve state-of-the-art dense depth prediction on the DDAD and NuScenes benchmarks.

Submitted 28 August, 2023; originally announced August 2023.

Comments: Accepted to ICCV 2023. Project page is available at https://www.vis.xyz/pub/r3d3/

arXiv:2308.00257 [pdf, other]

Trajectory Tracking via Multiscale Continuous Attractor Networks

Authors: Therese Joseph, Tobias Fischer, Michael Milford

Abstract: Animals and insects showcase remarkably robust and adept navigational abilities, up to literally circumnavigating the globe. Primary progress in robotics inspired by these natural systems has occurred in two areas: highly theoretical computational neuroscience models, and handcrafted systems like RatSLAM and NeuroSLAM. In this research, we present work bridging the gap between the two, in the form of Multiscale Continuous Attractor Networks (MCAN), that combine the multiscale parallel spatial neural networks of the previous theoretical models with the real-world robustness of the robot-targeted systems, to enable trajectory tracking over large velocity ranges. To overcome the limitations of the reliance of previous systems on hand-tuned parameters, we present a genetic algorithm-based approach for automated tuning of these networks, substantially improving their usability. To provide challenging navigational scale ranges, we open source a flexible city-scale navigation simulator that adapts to any street network, enabling high throughput experimentation. In extensive experiments using the city-scale navigation environment and Kitti, we show that the system is capable of stable dead reckoning over a wide range of velocities and environmental scales, where a single-scale approach fails.

Submitted 31 July, 2023; originally announced August 2023.

Comments: 8 Pages, 8 Figures, accepted at IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2023)

arXiv:2307.03493 [pdf, other]

ITA: An Energy-Efficient Attention and Softmax Accelerator for Quantized Transformers

Authors: Gamze İslamoğlu, Moritz Scherer, Gianna Paulin, Tim Fischer, Victor J. B. Jung, Angelo Garofalo, Luca Benini

Abstract: Transformer networks have emerged as the state-of-the-art approach for natural language processing tasks and are gaining popularity in other domains such as computer vision and audio processing. However, the efficient hardware acceleration of transformer models poses new challenges due to their high arithmetic intensities, large memory requirements, and complex dataflow dependencies. In this work, we propose ITA, a novel accelerator architecture for transformers and related models that targets efficient inference on embedded systems by exploiting 8-bit quantization and an innovative softmax implementation that operates exclusively on integer values. By computing on-the-fly in streaming mode, our softmax implementation minimizes data movement and energy consumption. ITA achieves competitive energy efficiency with respect to state-of-the-art transformer accelerators with 16.9 TOPS/W, while outperforming them in area efficiency with 5.93 TOPS/mm$^2$ in 22 nm fully-depleted silicon-on-insulator technology at 0.8 V.

Submitted 10 July, 2023; v1 submitted 7 July, 2023; originally announced July 2023.

Comments: Accepted for publication at the 2023 ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED)

arXiv:2305.08562 [pdf, other]

doi 10.1109/MDAT.2023.3306720

FlooNoC: A Multi-Tbps Wide NoC for Heterogeneous AXI4 Traffic

Authors: Tim Fischer, Michael Rogenmoser, Matheus Cavalcante, Frank K. Gürkaynak, Luca Benini

Abstract: Meeting the staggering bandwidth requirements of today's applications challenges the traditional narrow and serialized NoCs, which hit hard bounds on the maximum operating frequency. This paper proposes FlooNoC, an open-source, low-latency, fully AXI4-compatible NoC with wide physical channels for latency-tolerant high-bandwidth non-blocking transactions and decoupled latency-critical short messages. We demonstrate the feasibility of wide channels by integrating a 5x5 router and links within a 9-core compute cluster in 12 nm FinFet technology. Our NoC achieves a bandwidth of 629Gbps per link while running at only 1.23 GHz (at 0.19 pJ/B/hop), with just 10% area overhead post layout.

Submitted 6 August, 2023; v1 submitted 15 May, 2023; originally announced May 2023.

arXiv:2304.08408 [pdf, other]

OVTrack: Open-Vocabulary Multiple Object Tracking

Authors: Siyuan Li, Tobias Fischer, Lei Ke, Henghui Ding, Martin Danelljan, Fisher Yu

Abstract: The ability to recognize, localize and track dynamic objects in a scene is fundamental to many real-world applications, such as self-driving and robotic systems. Yet, traditional multiple object tracking (MOT) benchmarks rely only on a few object categories that hardly represent the multitude of possible objects that are encountered in the real world. This leaves contemporary MOT methods limited to a small set of pre-defined object categories. In this paper, we address this limitation by tackling a novel task, open-vocabulary MOT, that aims to evaluate tracking beyond pre-defined training categories. We further develop OVTrack, an open-vocabulary tracker that is capable of tracking arbitrary object classes. Its design is based on two key ingredients: First, leveraging vision-language models for both classification and association via knowledge distillation; second, a data hallucination strategy for robust appearance feature learning from denoising diffusion probabilistic models. The result is an extremely data-efficient open-vocabulary tracker that sets a new state-of-the-art on the large-scale, large-vocabulary TAO benchmark, while being trained solely on static images. Project page: https://www.vis.xyz/pub/ovtrack/

Submitted 17 April, 2023; originally announced April 2023.

Comments: CVPR 2023

arXiv:2304.04640 [pdf, other]

NeuroBench: A Framework for Benchmarking Neuromorphic Computing Algorithms and Systems

Authors: Jason Yik, Korneel Van den Berghe, Douwe den Blanken, Younes Bouhadjar, Maxime Fabre, Paul Hueber, Denis Kleyko, Noah Pacik-Nelson, Pao-Sheng Vincent Sun, Guangzhi Tang, Shenqi Wang, Biyan Zhou, Soikat Hasan Ahmed, George Vathakkattil Joseph, Benedetto Leto, Aurora Micheli, Anurag Kumar Mishra, Gregor Lenz, Tao Sun, Zergham Ahmed, Mahmoud Akl, Brian Anderson, Andreas G. Andreou, Chiara Bartolozzi, Arindam Basu, et al. (73 additional authors not shown)

Abstract: Neuromorphic computing shows promise for advancing computing efficiency and capabilities of AI applications using brain-inspired principles. However, the neuromorphic research field currently lacks standardized benchmarks, making it difficult to accurately measure technological advancements, compare performance with conventional methods, and identify promising future research directions. Prior neuromorphic computing benchmark efforts have not seen widespread adoption due to a lack of inclusive, actionable, and iterative benchmark design and guidelines. To address these shortcomings, we present NeuroBench: a benchmark framework for neuromorphic computing algorithms and systems. NeuroBench is a collaboratively-designed effort from an open community of nearly 100 co-authors across over 50 institutions in industry and academia, aiming to provide a representative structure for standardizing the evaluation of neuromorphic approaches. The NeuroBench framework introduces a common set of tools and systematic methodology for inclusive benchmark measurement, delivering an objective reference framework for quantifying neuromorphic approaches in both hardware-independent (algorithm track) and hardware-dependent (system track) settings. In this article, we present initial performance baselines across various model architectures on the algorithm track and outline the system track benchmark tasks and guidelines. NeuroBench is intended to continually expand its benchmarks and features to foster and track the progress made by the research community.

Submitted 17 January, 2024; v1 submitted 10 April, 2023; originally announced April 2023.

Comments: Updated from whitepaper to full perspective article preprint

arXiv:2303.03281 [pdf, other]

doi 10.1109/MRA.2023.3310859

Visual Place Recognition: A Tutorial

Authors: Stefan Schubert, Peer Neubert, Sourav Garg, Michael Milford, Tobias Fischer

Abstract: Localization is an essential capability for mobile robots. A rapidly growing field of research in this area is Visual Place Recognition (VPR), which is the ability to recognize previously seen places in the world based solely on images. This present work is the first tutorial paper on visual place recognition. It unifies the terminology of VPR and complements prior research in two important directions: 1) It provides a systematic introduction for newcomers to the field, covering topics such as the formulation of the VPR problem, a general-purpose algorithmic pipeline, an evaluation methodology for VPR approaches, and the major challenges for VPR and how they may be addressed. 2) As a contribution for researchers acquainted with the VPR problem, it examines the intricacies of different VPR problem types regarding input, data processing, and output. The tutorial also discusses the subtleties behind the evaluation of VPR algorithms, e.g., the evaluation of a VPR system that has to find all matching database images per query, as opposed to just a single match. Practical code examples in Python illustrate to prospective practitioners and researchers how VPR is implemented and evaluated.

Submitted 9 August, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

Comments: IEEE Robotics & Automation Magazine (RAM)

arXiv:2303.00973 [pdf, other]

Image Labels Are All You Need for Coarse Seagrass Segmentation

Authors: Scarlett Raine, Ross Marchant, Brano Kusy, Frederic Maire, Tobias Fischer

Abstract: Seagrass meadows serve as critical carbon sinks, but estimating the amount of carbon they store requires knowledge of the seagrass species present. Underwater and surface vehicles equipped with machine learning algorithms can help to accurately estimate the composition and extent of seagrass meadows at scale. However, previous approaches for seagrass detection and classification have required supervision from patch-level labels. In this paper, we reframe seagrass classification as a weakly supervised coarse segmentation problem where image-level labels are used during training (25 times fewer labels compared to patch-level labeling) and patch-level outputs are obtained at inference time. To this end, we introduce SeaFeats, an architecture that uses unsupervised contrastive pre-training and feature similarity, and SeaCLIP, a model that showcases the effectiveness of large language models as a supervisory signal in domain-specific applications. We demonstrate that an ensemble of SeaFeats and SeaCLIP leads to highly robust performance. Our method outperforms previous approaches that require patch-level labels on the multi-species 'DeepSeagrass' dataset by 6.8% (absolute) for the class-weighted F1 score, and by 12.1% (absolute) for the seagrass presence/absence F1 score on the 'Global Wetlands' dataset. We also present two case studies for real-world deployment: outlier detection on the Global Wetlands dataset, and application of our method on imagery collected by the FloatyBoat autonomous surface vehicle.

Submitted 5 September, 2023; v1 submitted 2 March, 2023; originally announced March 2023.

Comments: 10 pages, 4 figures, additional 3 pages of supplementary material

Journal ref: 2024 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

arXiv:2212.01247 [pdf, other]

CC-3DT: Panoramic 3D Object Tracking via Cross-Camera Fusion

Authors: Tobias Fischer, Yung-Hsu Yang, Suryansh Kumar, Min Sun, Fisher Yu

Abstract: To track the 3D locations and trajectories of the other traffic participants at any given time, modern autonomous vehicles are equipped with multiple cameras that cover the vehicle's full surroundings. Yet, camera-based 3D object tracking methods prioritize optimizing the single-camera setup and resort to post-hoc fusion in a multi-camera setup. In this paper, we propose a method for panoramic 3D object tracking, called CC-3DT, that associates and models object trajectories both temporally and across views, and improves the overall tracking consistency. In particular, our method fuses 3D detections from multiple cameras before association, reducing identity switches significantly and improving motion modeling. Our experiments on large-scale driving datasets show that fusion before association leads to a large margin of improvement over post-hoc fusion. We set a new state-of-the-art with 12.6% improvement in average multi-object tracking accuracy (AMOTA) among all camera-based methods on the competitive NuScenes 3D tracking benchmark, outperforming previously published methods by 6.5% in AMOTA with the same 3D detector.

Submitted 2 December, 2022; originally announced December 2022.

Comments: Project page: https://www.vis.xyz/pub/cc-3dt/

arXiv:2212.00688 [pdf, other]

TCN-CUTIE: A 1036 TOp/s/W, 2.72 uJ/Inference, 12.2 mW All-Digital Ternary Accelerator in 22 nm FDX Technology

Authors: Moritz Scherer, Alfio Di Mauro, Tim Fischer, Georg Rutishauser, Luca Benini

Abstract: Tiny Machine Learning (TinyML) applications impose uJ/Inference constraints, with a maximum power consumption of tens of mW. It is extremely challenging to meet these requirements at a reasonable accuracy level. This work addresses the challenge with a flexible, fully digital Ternary Neural Network (TNN) accelerator in a RISC-V-based System-on-Chip (SoC). Besides supporting Ternary Convolutional Neural Networks, we introduce extensions to the accelerator design that enable the processing of time-dilated Temporal Convolutional Neural Networks (TCNs). The design achieves 5.5 uJ/Inference, 12.2 mW, 8000 Inferences/sec at 0.5 V for a Dynamic Vision Sensor (DVS) based TCN, and an accuracy of 94.5 % and 2.72 uJ/Inference, 12.2 mW, 3200 Inferences/sec at 0.5 V for a non-trivial 9-layer, 96 channels-per-layer convolutional network with CIFAR-10 accuracy of 86 %. The peak energy efficiency is 1036 TOp/s/W, outperforming the state-of-the-art silicon-proven TinyML quantized accelerators by 1.67x while achieving competitive accuracy.

Submitted 1 December, 2022; originally announced December 2022.

Comments: Accepted at IEEE MICRO Journal

arXiv:2211.13989 [pdf, other]

HexaMesh: Scaling to Hundreds of Chiplets with an Optimized Chiplet Arrangement

Authors: Patrick Iff, Maciej Besta, Matheus Cavalcante, Tim Fischer, Luca Benini, Torsten Hoefler

Abstract: 2.5D integration is an important technique to tackle the growing cost of manufacturing chips in advanced technology nodes. This poses the challenge of providing high-performance inter-chiplet interconnects (ICIs). As the number of chiplets grows to tens or hundreds, it becomes infeasible to hand-optimize their arrangement in a way that maximizes the ICI performance. In this paper, we propose HexaMesh, an arrangement of chiplets that outperforms a grid arrangement both in theory (network diameter reduced by 42%; bisection bandwidth improved by 130%) and in practice (latency reduced by 19%; throughput improved by 34%). HexaMesh enables large-scale chiplet designs with high-performance ICIs.

Submitted 8 October, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

arXiv:2211.13980 [pdf, other]

Sparse Hamming Graph: A Customizable Network-on-Chip Topology

Authors: Patrick Iff, Maciej Besta, Matheus Cavalcante, Tim Fischer, Luca Benini, Torsten Hoefler

Abstract: Chips with hundreds to thousands of cores require scalable networks-on-chip (NoCs). Customization of the NoC topology is necessary to reach the diverse design goals of different chips. We introduce sparse Hamming graph, a novel NoC topology with an adjustable cost-performance trade-off that is based on four NoC topology design principles we identified. To efficiently customize this topology, we develop a toolchain that leverages approximate floorplanning and link routing to deliver fast and accurate cost and performance predictions. We demonstrate how to use our methodology to achieve desired cost-performance trade-offs while outperforming established topologies in cost, performance, or both.

Submitted 28 June, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

arXiv:2210.07509 [pdf, other]

doi 10.1109/ICRA48891.2023.10161561

Boosting Performance of a Baseline Visual Place Recognition Technique by Predicting the Maximally Complementary Technique

Authors: Connor Malone, Stephen Hausler, Tobias Fischer, Michael Milford

Abstract: One recent promising approach to the Visual Place Recognition (VPR) problem has been to fuse the place recognition estimates of multiple complementary VPR techniques using methods such as SRAL and multi-process fusion. These approaches come with a substantial practical limitation: they require all potential VPR methods to be brute-force run before they are selectively fused. The obvious solution to this limitation is to predict the viable subset of methods ahead of time, but this is challenging because it requires a predictive signal within the imagery itself that is indicative of high performance methods. Here we propose an alternative approach that instead starts with a known single base VPR technique, and learns to predict the most complementary additional VPR technique to fuse with it, that results in the largest improvement in performance. The key innovation here is to use a dimensionally reduced difference vector between the query image and the top-retrieved reference image using this baseline technique as the predictive signal of the most complementary additional technique, both during training and inference. We demonstrate that our approach can train a single network to select performant, complementary technique pairs across datasets which span multiple modes of transportation (train, car, walking) as well as to generalise to unseen datasets, outperforming multiple baseline strategies for manually selecting the best technique pairs based on the same training data.

Submitted 14 October, 2022; originally announced October 2022.

Comments: 7 pages, 5 figures. arXiv admin note: text overlap with arXiv:2112.04701

Journal ref: 2023 IEEE International Conference on Robotics and Automation (ICRA)

arXiv:2210.06984 [pdf, other]

QDTrack: Quasi-Dense Similarity Learning for Appearance-Only Multiple Object Tracking

Authors: Tobias Fischer, Thomas E. Huang, Jiangmiao Pang, Linlu Qiu, Haofeng Chen, Trevor Darrell, Fisher Yu

Abstract: Similarity learning has been recognized as a crucial step for object tracking. However, existing multiple object tracking methods only use sparse ground truth matching as the training objective, while ignoring the majority of the informative regions in images. In this paper, we present Quasi-Dense Similarity Learning, which densely samples hundreds of object regions on a pair of images for contrastive learning. We combine this similarity learning with multiple existing object detectors to build Quasi-Dense Tracking (QDTrack), which does not require displacement regression or motion priors. We find that the resulting distinctive feature space admits a simple nearest neighbor search at inference time for object association. In addition, we show that our similarity learning scheme is not limited to video data, but can learn effective instance similarity even from static input, enabling a competitive tracking performance without training on videos or using tracking supervision. We conduct extensive experiments on a wide variety of popular MOT benchmarks. We find that, despite its simplicity, QDTrack rivals the performance of state-of-the-art tracking methods on all benchmarks and sets a new state-of-the-art on the large-scale BDD100K MOT benchmark, while introducing negligible computational overhead to the detector.

Submitted 27 September, 2023; v1 submitted 12 October, 2022; originally announced October 2022.

arXiv:2209.08723 [pdf, other]

Ensembles of Compact, Region-specific & Regularized Spiking Neural Networks for Scalable Place Recognition

Authors: Somayeh Hussaini, Michael Milford, Tobias Fischer

Abstract: Spiking neural networks have significant potential utility in robotics due to their high energy efficiency on specialized hardware, but proof-of-concept implementations have not yet typically achieved competitive performance or capability with conventional approaches. In this paper, we tackle one of the key practical challenges of scalability by introducing a novel modular ensemble network approach, where compact, localized spiking networks each learn and are solely responsible for recognizing places in a local region of the environment only. This modular approach creates a highly scalable system. However, it comes with a high-performance cost where a lack of global regularization at deployment time leads to hyperactive neurons that erroneously respond to places outside their learned region. Our second contribution introduces a regularization approach that detects and removes these problematic hyperactive neurons during the initial environmental learning phase. We evaluate this new scalable modular system on benchmark localization datasets Nordland and Oxford RobotCar, with comparisons to standard techniques NetVLAD, DenseVLAD, and SAD, and a previous spiking neural network system. Our system substantially outperforms the previous SNN system on its small dataset, but also maintains performance on 27 times larger benchmark datasets where the operation of the previous system is computationally infeasible, and performs competitively with the conventional localization systems.

Submitted 5 May, 2023; v1 submitted 18 September, 2022; originally announced September 2022.

Comments: 8 pages, 6 figures, accepted to the IEEE International Conference on Robotics and Automation (ICRA) 2023

arXiv:2208.13930 [pdf, other]

SAFE: Sensitivity-Aware Features for Out-of-Distribution Object Detection

Authors: Samuel Wilson, Tobias Fischer, Feras Dayoub, Dimity Miller, Niko Sünderhauf

Abstract: We address the problem of out-of-distribution (OOD) detection for the task of object detection. We show that residual convolutional layers with batch normalisation produce Sensitivity-Aware FEatures (SAFE) that are consistently powerful for distinguishing in-distribution from out-of-distribution detections. We extract SAFE vectors for every detected object, and train a multilayer perceptron on the surrogate task of distinguishing adversarially perturbed from clean in-distribution examples. This circumvents the need for realistic OOD training data, computationally expensive generative models, or retraining of the base object detector. SAFE outperforms the state-of-the-art OOD object detectors on multiple benchmarks by large margins, e.g. reducing the FPR95 by an absolute 30.6% from 48.3% to 17.7% on the OpenImages dataset.

Submitted 22 August, 2023; v1 submitted 29 August, 2022; originally announced August 2022.

Journal ref: IEEE International Conference on Computer Vision 2023

arXiv:2207.03192 [pdf, other]

MiniFloat-NN and ExSdotp: An ISA Extension and a Modular Open Hardware Unit for Low-Precision Training on RISC-V cores

Authors: Luca Bertaccini, Gianna Paulin, Tim Fischer, Stefan Mach, Luca Benini

Abstract: Low-precision formats have recently driven major breakthroughs in neural network (NN) training and inference by reducing the memory footprint of the NN models and improving the energy efficiency of the underlying hardware architectures. Narrow integer data types have been vastly investigated for NN inference and have successfully been pushed to the extreme of ternary and binary representations. In contrast, most training-oriented platforms use at least 16-bit floating-point (FP) formats. Lower-precision data types such as 8-bit FP formats and mixed-precision techniques have only recently been explored in hardware implementations. We present MiniFloat-NN, a RISC-V instruction set architecture extension for low-precision NN training, providing support for two 8-bit and two 16-bit FP formats and expanding operations. The extension includes sum-of-dot-product instructions that accumulate the result in a larger format and three-term additions in two variations: expanding and non-expanding. We implement an ExSdotp unit to efficiently support in hardware both instruction types. The fused nature of the ExSdotp module prevents precision losses generated by the non-associativity of two consecutive FP additions while saving around 30% of the area and critical path compared to a cascade of two expanding fused multiply-add units. We replicate the ExSdotp module in a SIMD wrapper and integrate it into an open-source floating-point unit, which, coupled to an open-source RISC-V core, lays the foundation for future scalable architectures targeting low-precision and mixed-precision NN training. A cluster containing eight extended cores sharing a scratchpad memory, implemented in 12 nm FinFET technology, achieves up to 575 GFLOPS/W when computing FP8-to-FP16 GEMMs at 0.8 V, 1.26 GHz.

Submitted 7 July, 2022; originally announced July 2022.

Comments: This work has been submitted to the ARITH22 - IEEE Symposium on Computer Arithmetic for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. 8 pages

arXiv:2206.13673 [pdf, other]

How Many Events do You Need? Event-based Visual Place Recognition Using Sparse But Varying Pixels

Authors: Tobias Fischer, Michael Milford

Abstract: Event cameras continue to attract interest due to desirable characteristics such as high dynamic range, low latency, virtually no motion blur, and high energy efficiency. One of the potential applications that would benefit from these characteristics lies in visual place recognition for robot localization, i.e. matching a query observation to the corresponding reference place in the database. In this letter, we explore the distinctiveness of event streams from a small subset of pixels (in the tens or hundreds). We demonstrate that the absolute difference in the number of events at those pixel locations accumulated into event frames can be sufficient for the place recognition task, when pixels that display large variations in the reference set are used. Using such sparse (over image coordinates) but varying (variance over the number of events per pixel location) pixels enables frequent and computationally cheap updates of the location estimates. Furthermore, when event frames contain a constant number of events, our method takes full advantage of the event-driven nature of the sensory stream and displays promising robustness to changes in velocity. We evaluate our proposed approach on the Brisbane-Event-VPR dataset in an outdoor driving scenario, as well as the newly contributed indoor QCR-Event-VPR dataset that was captured with a DAVIS346 camera mounted on a mobile robotic platform. Our results show that our approach achieves competitive performance when compared to several baseline methods on those datasets, and is particularly well suited for compute- and energy-constrained platforms such as interplanetary rovers.

Submitted 13 October, 2022; v1 submitted 27 June, 2022; originally announced June 2022.

Comments: 8 pages

Journal ref: IEEE Robotics and Automation Letters 2022

arXiv:2203.09859 [pdf, other]

doi 10.1145/3531146.3533151

Promoting Ethical Awareness in Communication Analysis: Investigating Potentials and Limits of Visual Analytics for Intelligence Applications

Authors: Maximilian T. Fischer, Simon David Hirsbrunner, Wolfgang Jentner, Matthias Miller, Daniel A. Keim, Paula Helm

Abstract: Digital systems for analyzing human communication data have become prevalent in recent years. Intelligence analysis of communications data in investigative journalism, criminal intelligence, and law present particularly interesting cases, as they must take into account the often highly sensitive properties of the underlying operations and data. At the same time, these are areas where increasingly automated, sophisticated approaches and systems can be particularly relevant, especially in terms of Big Data manageability. However, by the shifting of responsibilities, this also poses dangers. In addition to privacy concerns, these dangers relate to uncertain or poor data quality, leading to discrimination and potentially misleading insights. Visual analytics combines machine learning methods with interactive visual interfaces to enable human sense- and decision-making. This technique can be key for designing and operating meaningful interactive communication analysis systems that consider these ethical challenges. In this interdisciplinary work, a joint endeavor of computer scientists, ethicists, and scholars in Science & Technology Studies, we investigate and evaluate opportunities and risks involved in using Visual Analytics approaches for communication analysis in intelligence applications in particular. We introduce, at first, the common technological systems used in communication analysis, further discussing the domain-specific ethical implications, tensions, and risks involved. We then make the case of how tailored Visual Analytics approaches may reduce and mitigate the described problems, both theoretically and through practical examples. We show that finding Visual Analytics design solutions for ethical issues is not a mere optimization task, but balancing out and negotiating these trade-offs has, as we argue, to be an integral aspect of the system design process from the outset.

Submitted 2 May, 2022; v1 submitted 18 March, 2022; originally announced March 2022.

Comments: 13 pages, 4 figures

Journal ref: Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT '22), June 21--24, 2022, Seoul, Republic of Korea

arXiv:2202.13487 [pdf, other]

doi 10.1109/LRA.2022.3187836

Point Label Aware Superpixels for Multi-species Segmentation of Underwater Imagery

Authors: Scarlett Raine, Ross Marchant, Brano Kusy, Frederic Maire, Tobias Fischer

Abstract: Monitoring coral reefs using underwater vehicles increases the range of marine surveys and availability of historical ecological data by collecting significant quantities of images. Analysis of this imagery can be automated using a model trained to perform semantic segmentation; however, it is too costly and time-consuming to densely label images for training supervised models. In this letter, we leverage photo-quadrat imagery labeled by ecologists with sparse point labels. We propose a point label aware method for propagating labels within superpixel regions to obtain augmented ground truth for training a semantic segmentation model. Our point label aware superpixel method utilizes the sparse point labels, and clusters pixels using learned features to accurately generate single-species segments in cluttered, complex coral images. Our method outperforms prior methods on the UCSD Mosaics dataset by 3.62% for pixel accuracy and 8.35% for mean IoU for the label propagation task, while reducing computation time reported by previous approaches by 76%. We train a DeepLabv3+ architecture and outperform state-of-the-art for semantic segmentation by 2.91% for pixel accuracy and 9.65% for mean IoU on the UCSD Mosaics dataset and by 4.19% for pixel accuracy and 14.32% mean IoU for the Eilat dataset.

Submitted 10 July, 2022; v1 submitted 27 February, 2022; originally announced February 2022.

Journal ref: IEEE Robotics and Automation Letters 2022, vol. 7, no. 3, pp. 8291-8298

arXiv:2112.05341 [pdf, other]

Hyperdimensional Feature Fusion for Out-Of-Distribution Detection

Authors: Samuel Wilson, Tobias Fischer, Niko Sünderhauf, Feras Dayoub

Abstract: We introduce powerful ideas from Hyperdimensional Computing into the challenging field of Out-of-Distribution (OOD) detection. In contrast to most existing work that performs OOD detection based on only a single layer of a neural network, we use similarity-preserving semi-orthogonal projection matrices to project the feature maps from multiple layers into a common vector space. By repeatedly applying the bundling operation $\oplus$, we create expressive class-specific descriptor vectors for all in-distribution classes. At test time, a simple and efficient cosine similarity calculation between descriptor vectors consistently identifies OOD samples with better performance than the current state-of-the-art. We show that the hyperdimensional fusion of multiple network layers is critical to achieve best general performance.

Submitted 29 August, 2022; v1 submitted 10 December, 2021; originally announced December 2021.

Comments: Accepted to WACV2023

arXiv:2112.04701 [pdf, other]

Unsupervised Complementary-aware Multi-process Fusion for Visual Place Recognition

Authors: Stephen Hausler, Tobias Fischer, Michael Milford

Abstract: A recent approach to the Visual Place Recognition (VPR) problem has been to fuse the place recognition estimates of multiple complementary VPR techniques simultaneously. However, selecting the optimal set of techniques to use in a specific deployment environment a-priori is a difficult and unresolved challenge. Further, to the best of our knowledge, no method exists which can select a set of techniques on a frame-by-frame basis in response to image-to-image variations. In this work, we propose an unsupervised algorithm that finds the most robust set of VPR techniques to use in the current deployment environment, on a frame-by-frame basis. The selection of techniques is determined by an analysis of the similarity scores between the current query image and the collection of database images and does not require ground-truth information. We demonstrate our approach on a wide variety of datasets and VPR techniques and show that the proposed dynamic multi-process fusion (Dyn-MPF) has superior VPR performance compared to a variety of challenging competitive methods, some of which are given an unfair advantage through access to the ground-truth information.

Submitted 8 December, 2021; originally announced December 2021.

arXiv:2110.10756 [pdf, ps, other]

doi 10.1109/TSP.2022.3200548

Ambiguities in Direction-of-Arrival Estimation with Linear Arrays

Authors: Frederic Matter, Tobias Fischer, Marius Pesavento, Marc E. Pfetsch

Abstract: In this paper, we present a novel approach to compute ambiguities in thinned uniform linear arrays, i.e., sparse non-uniform linear arrays, via a mixed-integer program. Ambiguities arise when there exists a set of distinct directions-of-arrival, for which the corresponding steering matrix is rank-deficient and are associated with nonunique parameter estimation. Our approach uses Young tableaux for which a submatrix of the steering matrix has a vanishing determinant, which can be expressed through vanishing sums of unit roots. Each of these vanishing sums then corresponds to an ambiguous set of directions-of-arrival. We derive a method to enumerate such ambiguous sets using a mixed-integer program and present results on several examples.

Submitted 20 October, 2021; originally announced October 2021.

arXiv:2109.06452 [pdf, other]

doi 10.1109/LRA.2022.3149030

Spiking Neural Networks for Visual Place Recognition via Weighted Neuronal Assignments

Authors: Somayeh Hussaini, Michael Milford, Tobias Fischer

Abstract: Spiking neural networks (SNNs) offer both compelling potential advantages, including energy efficiency and low latencies, and challenges, including the non-differentiable nature of event spikes. Much of the initial research in this area has converted deep neural networks to equivalent SNNs, but this conversion approach potentially negates some of the advantages of SNN-based approaches developed from scratch. One promising area for high-performance SNNs is template matching and image recognition. This research introduces the first high-performance SNN for the Visual Place Recognition (VPR) task: given a query image, the SNN has to find the closest match out of a list of reference images. At the core of this new system is a novel assignment scheme that implements a form of ambiguity-informed salience, by up-weighting single-place-encoding neurons and down-weighting "ambiguous" neurons that respond to multiple different reference places. In a range of experiments on the challenging Nordland, Oxford RobotCar, SPEDTest, Synthia, and St Lucia datasets, we show that our SNN achieves comparable VPR performance to state-of-the-art and classical techniques, and degrades gracefully in performance with an increasing number of reference places. Our results provide a significant milestone towards SNNs that can provide robust, energy-efficient, and low latency robot localization.

Submitted 9 February, 2022; v1 submitted 14 September, 2021; originally announced September 2021.

Comments: 8 pages, 6 figures, IEEE Robotics and Automation Letters (RA-L), also accepted to IEEE International Conference on Robotics and Automation (ICRA 2022)

Journal ref: IEEE Robotics and Automation Letters 2022

arXiv:2109.00097 [pdf, other]

Bio-inspired robot perception coupled with robot-modeled human perception

Authors: Tobias Fischer

Abstract: My overarching research goal is to provide robots with perceptional abilities that allow interactions with humans in a human-like manner. To develop these perceptional abilities, I believe that it is useful to study the principles of the human visual system. I use these principles to develop new computer vision algorithms and validate their effectiveness in intelligent robotic systems. I am enthusiastic about this approach as it offers the dual benefit of uncovering principles inherent in the human visual system, as well as applying these principles to its artificial counterpart. Fig. 1 contains a depiction of my research.

Submitted 31 August, 2021; originally announced September 2021.

Comments: Paper accepted to the "Robotics: Science and Systems Pioneers Workshop 2021"

arXiv:2107.13936 [pdf, other]

doi 10.1109/VIS49827.2021.9623305

Towards a Survey on Static and Dynamic Hypergraph Visualizations

Authors: Maximilian T. Fischer, Alexander Frings, Daniel A. Keim, Daniel Seebacher

Abstract: Leveraging hypergraph structures to model advanced processes has gained much attention over the last few years in many areas, ranging from protein-interaction in computational biology to image retrieval using machine learning. Hypergraph models can provide a more accurate representation of the underlying processes while reducing the overall number of links compared to regular representations. However, interactive visualization methods for hypergraphs and hypergraph-based models have rarely been explored or systematically analyzed. This paper reviews the existing research landscape for hypergraph and hypergraph model visualizations and assesses the currently employed techniques. We provide an overview and a categorization of proposed approaches, focusing on performance, scalability, interaction support, successful evaluation, and the ability to represent different underlying data structures, including a recent demand for a temporal representation of interaction networks and their improvements beyond graph-based methods. Lastly, we discuss the strengths and weaknesses of the approaches and give an insight into the future challenges arising in this emerging research field.

Submitted 29 July, 2021; originally announced July 2021.

Comments: 2021 IEEE Visualization Conference (VIS)

Journal ref: 2021 IEEE Visualization Conference (VIS)

arXiv:2107.07707 [pdf, other]

doi 10.1109/LRA.2021.3096745

Probabilistic Appearance-Invariant Topometric Localization with New Place Awareness

Authors: Ming Xu, Tobias Fischer, Niko Sünderhauf, Michael Milford

Abstract: Probabilistic state-estimation approaches offer a principled foundation for designing localization systems, because they naturally integrate sequences of imperfect motion and exteroceptive sensor data. Recently, probabilistic localization systems utilizing appearance-invariant visual place recognition (VPR) methods as the primary exteroceptive sensor have demonstrated state-of-the-art performance in the presence of substantial appearance change. However, existing systems 1) do not fully utilize odometry data within the motion models, and 2) are unable to handle route deviations, due to the assumption that query traverses exactly repeat the mapping traverse. To address these shortcomings, we present a new probabilistic topometric localization system which incorporates full 3-dof odometry into the motion model and furthermore, adds an "off-map" state within the state-estimation framework, allowing query traverses which feature significant route detours from the reference map to be successfully localized. We perform extensive evaluation on multiple query traverses from the Oxford RobotCar dataset exhibiting both significant appearance change and deviations from routes previously traversed. In particular, we evaluate performance on two practically relevant localization tasks: loop closure detection and global localization. Our approach achieves major performance improvements over both existing and improved state-of-the-art systems.

Submitted 16 July, 2021; originally announced July 2021.

Comments: 8 pages

Journal ref: IEEE Robotics and Automation Letters and IROS 2021

arXiv:2106.14802 [pdf, other]

doi 10.1109/VDS57266.2022.00006

Communication Analysis through Visual Analytics: Current Practices, Challenges, and New Frontiers

Authors: Maximilian T. Fischer, Frederik L. Dennig, Daniel Seebacher, Daniel A. Keim, Mennatallah El-Assady

Abstract: The automated analysis of digital human communication data often focuses on specific aspects such as content or network structure in isolation. This can provide limited perspectives while making cross-methodological analyses, occurring in domains like investigative journalism, difficult. Communication research in psychology and the digital humanities instead stresses the importance of a holistic approach to overcome these limiting factors. In this work, we conduct an extensive survey on the properties of over forty semi-automated communication analysis systems and investigate how they cover concepts described in theoretical communication research. From these investigations, we derive a design space and contribute a conceptual framework based on communication research, technical considerations, and the surveyed approaches. The framework describes the systems' properties, capabilities, and composition through a wide range of criteria organized in the dimensions (1) Data, (2) Processing and Models, (3) Visual Interface, and (4) Knowledge Generation. These criteria enable a formalization of digital communication analysis through visual analytics, which, we argue, is uniquely suited for this task by tackling automation complexity while leveraging domain knowledge. With our framework, we identify shortcomings and research challenges, such as group communication dynamics, trust and privacy considerations, and holistic approaches. Simultaneously, our framework supports the evaluation of systems and promotes the mutual exchange between researchers through a structured common language, laying the foundations for future research on communication analysis.

Submitted 6 July, 2022; v1 submitted 28 June, 2021; originally announced June 2021.

Comments: 11 pages, 2 tables, 1 figure

Journal ref: 2022 IEEE Visualization in Data Science (VDS)
