
DAVE -- A Detect-and-Verify Paradigm for Low-Shot Counting

Authors: Jer Pelhan, Alan Lukežič, Vitjan Zavrtanik, Matej Kristan

Abstract: Low-shot counters estimate the number of objects corresponding to a selected category, based on only a few or no exemplars annotated in the image. The current state-of-the-art estimates the total counts as the sum over the object location density map, but does not provide individual object locations and sizes, which are crucial for many applications. This is addressed by detection-based counters, which, however, fall behind in the total count accuracy. Furthermore, both approaches tend to overestimate the counts in the presence of other object classes due to many false positives. We propose DAVE, a low-shot counter based on a detect-and-verify paradigm, that avoids the aforementioned issues by first generating a high-recall detection set and then verifying the detections to identify and remove the outliers. This jointly increases the recall and precision, leading to accurate counts. DAVE outperforms the top density-based counters by ~20% in the total count MAE, outperforms the most recent detection-based counter by ~20% in detection quality, and sets a new state-of-the-art in zero-shot as well as text-prompt-based counting.

Submitted 25 April, 2024; originally announced April 2024.

Comments: Accepted to CVPR2024

arXiv:2401.03872 [pdf, other]

A New Dataset and a Distractor-Aware Architecture for Transparent Object Tracking

Authors: Alan Lukezic, Ziga Trojer, Jiri Matas, Matej Kristan

Abstract: Performance of modern trackers degrades substantially on transparent objects compared to opaque objects. This is largely due to two distinct reasons. Transparent objects are unique in that their appearance is directly affected by the background. Furthermore, transparent object scenes often contain many visually similar objects (distractors), which often lead to tracking failure. However, development of modern tracking architectures requires large training sets, which do not exist in transparent object tracking. We present two contributions addressing the aforementioned issues. We propose the first transparent object tracking training dataset Trans2k that consists of over 2k sequences with 104,343 images overall, annotated by bounding boxes and segmentation masks. Standard trackers trained on this dataset consistently improve by up to 16%. Our second contribution is a new distractor-aware transparent object tracker (DiTra) that treats localization accuracy and target identification as separate tasks and implements them by a novel architecture. DiTra sets a new state-of-the-art in transparent object tracking and generalizes well to opaque objects.

Submitted 8 January, 2024; originally announced January 2024.

Comments: Under review. arXiv admin note: substantial text overlap with arXiv:2210.03436

arXiv:2312.09646 [pdf, ps, other]

Exact Algorithms and Lowerbounds for Multiagent Pathfinding: Power of Treelike Topology

Authors: Foivos Fioravantes, Dušan Knop, Jan Matyáš Křišťan, Nikolaos Melissinos, Michal Opler

Abstract: In the Multiagent Path Finding problem (MAPF for short), we focus on efficiently finding non-colliding paths for a set of $k$ agents on a given graph $G$, where each agent seeks a path from its source vertex to a target. An important measure of the quality of the solution is the length of the proposed schedule $\ell$, that is, the length of a longest path (including the waiting time). In this work, we propose a systematic study under the parameterized complexity framework. The hardness results we provide align with many heuristics used for this problem, whose running time could potentially be improved based on our fixed-parameter tractability results. We show that MAPF is W[1]-hard with respect to $k$ (even if $k$ is combined with the maximum degree of the input graph). The problem remains NP-hard in planar graphs even if the maximum degree and the makespan $\ell$ are fixed constants. On the positive side, we show an FPT algorithm for $k+\ell$. As we delve further, the structure of $G$ comes into play. We give an FPT algorithm for parameter $k$ plus the diameter of the graph $G$. The MAPF problem is W[1]-hard for cliquewidth of $G$ plus $\ell$ while it is FPT for treewidth of $G$ plus $\ell$.

Submitted 15 December, 2023; originally announced December 2023.

Comments: Accepted by AAAI'24
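
To make the MAPF definitions in the abstract above concrete, the following is a minimal sketch of a schedule checker. It is not taken from the paper. It assumes a schedule is given as one vertex list per agent (one entry per time step, repeating a vertex to wait), that agents which finish early wait at their target, and that two agents may not swap positions along an edge; the swap rule is an assumption that varies between MAPF formulations, so it is exposed as a flag. The names `makespan` and `is_valid_schedule` are hypothetical.

```python
def makespan(paths):
    """Length of the schedule: the number of steps in a longest path, waits included."""
    return max(len(p) - 1 for p in paths)


def is_valid_schedule(adj, paths, sources, targets, forbid_swaps=True):
    """Check that `paths` is a collision-free schedule on the graph `adj`.

    adj maps each vertex to an iterable of its neighbours (undirected graph).
    forbid_swaps is an assumption of this sketch, not something the abstract states.
    """
    ell = makespan(paths)
    # Agents that arrive early are assumed to wait at their final vertex.
    padded = [list(p) + [p[-1]] * (ell + 1 - len(p)) for p in paths]

    for p, s, t in zip(padded, sources, targets):
        if p[0] != s or p[-1] != t:
            return False
        # Each step is either a wait or a move along an edge.
        for a, b in zip(p, p[1:]):
            if a != b and b not in adj[a]:
                return False

    # Vertex conflicts: no two agents on the same vertex at the same time.
    for step in range(ell + 1):
        positions = [p[step] for p in padded]
        if len(set(positions)) != len(positions):
            return False

    # Swap conflicts: two agents exchanging places along one edge.
    if forbid_swaps:
        for step in range(ell):
            for i in range(len(padded)):
                for j in range(i + 1, len(padded)):
                    if (padded[i][step] == padded[j][step + 1]
                            and padded[j][step] == padded[i][step + 1]):
                        return False
    return True


# A path 0-1-2 with a side vertex 3 attached to vertex 1.
adj = {0: [1], 1: [0, 2, 3], 2: [1], 3: [1]}
# Agent 0 goes 0 -> 2 by stepping aside into vertex 3; agent 1 goes 2 -> 0.
paths = [[0, 1, 3, 1, 2], [2, 2, 1, 0]]
print(is_valid_schedule(adj, paths, sources=[0, 2], targets=[2, 0]))
print(makespan(paths))
```

In this example the two agents must pass each other on a path, so one of them detours into the side vertex; the checker only verifies a given schedule and does not search for one.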

arXiv:2311.14762 [pdf, other]

The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024

Authors: Benjamin Kiefer, Lojze Žust, Matej Kristan, Janez Perš, Matija Teršek, Arnold Wiliem, Martin Messmer, Cheng-Yen Yang, Hsiang-Wei Huang, Zhongyu Jiang, Heng-Cheng Kuo, Jie Mei, Jenq-Neng Hwang, Daniel Stadler, Lars Sommer, Kaer Huang, Aiguo Zheng, Weitu Chong, Kanokphan Lertniphonphan, Jun Xie, Feng Chen, Jian Li, Zhepeng Wang, Luca Zedda, Andrea Loddo , et al. (24 additional authors not shown)

Abstract: The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024 addresses maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicles (USV). Three challenge categories are considered: (i) UAV-based Maritime Object Tracking with Re-identification, (ii) USV-based Maritime Obstacle Segmentation and Detection, (iii) USV-based Maritime Boat Tracking. The USV-based Maritime Obstacle Segmentation and Detection features three sub-challenges, including a new embedded challenge addressing efficient inference on real-world embedded devices. This report offers a comprehensive overview of the findings from the challenges. We provide both statistical and qualitative analyses, evaluating trends from over 195 submissions. All datasets, evaluation code, and the leaderboard are available to the public at https://macvi.org/workshop/macvi24.

Submitted 23 November, 2023; originally announced November 2023.

Comments: Part of 2nd Workshop on Maritime Computer Vision (MaCVi) 2024 IEEE Xplore submission as part of WACV 2024

arXiv:2311.01117 [pdf, other]

Cheating Depth: Enhancing 3D Surface Anomaly Detection via Depth Simulation

Authors: Vitjan Zavrtanik, Matej Kristan, Danijel Skočaj

Abstract: RGB-based surface anomaly detection methods have advanced significantly. However, certain surface anomalies remain practically invisible in RGB alone, necessitating the incorporation of 3D information. Existing approaches that employ point-cloud backbones suffer from suboptimal representations and reduced applicability due to slow processing. Re-training RGB backbones, designed for faster dense input processing, on industrial depth datasets is hindered by the limited availability of sufficiently large datasets. We make several contributions to address these challenges. (i) We propose a novel Depth-Aware Discrete Autoencoder (DADA) architecture, that enables learning a general discrete latent space that jointly models RGB and 3D data for 3D surface anomaly detection. (ii) We tackle the lack of diverse industrial depth datasets by introducing a simulation process for learning informative depth features in the depth encoder. (iii) We propose a new surface anomaly detection method 3DSR, which outperforms all existing state-of-the-art on the challenging MVTec3D anomaly detection benchmark, both in terms of accuracy and processing speed. The experimental results validate the effectiveness and efficiency of our approach, highlighting the potential of utilizing depth information for improved surface anomaly detection.

Submitted 2 November, 2023; originally announced November 2023.

Comments: Accepted at WACV 2024

arXiv:2308.09618 [pdf, other]

LaRS: A Diverse Panoptic Maritime Obstacle Detection Dataset and Benchmark

Authors: Lojze Žust, Janez Perš, Matej Kristan

Abstract: The progress in maritime obstacle detection is hindered by the lack of a diverse dataset that adequately captures the complexity of general maritime environments. We present the first maritime panoptic obstacle detection benchmark LaRS, featuring scenes from Lakes, Rivers and Seas. Our major contribution is the new dataset, which boasts the largest diversity in recording locations, scene types, obstacle classes, and acquisition conditions among the related datasets. LaRS is composed of over 4000 per-pixel labeled key frames with nine preceding frames to allow utilization of the temporal texture, amounting to over 40k frames. Each key frame is annotated with 8 thing, 3 stuff classes and 19 global scene attributes. We report the results of 27 semantic and panoptic segmentation methods, along with several performance insights and future research directions. To enable objective evaluation, we have implemented an online evaluation server. The LaRS dataset, evaluation toolkit and benchmark are publicly available at: https://lojzezust.github.io/lars-dataset

Submitted 18 August, 2023; originally announced August 2023.

Comments: ICCV 2023, 9 pages, 8 figures

arXiv:2307.10847 [pdf, other]

Shortest Dominating Set Reconfiguration under Token Sliding

Authors: Jan Matyáš Křišťan, Jakub Svoboda

Abstract: In this paper, we present novel algorithms that efficiently compute a shortest reconfiguration sequence between two given dominating sets in trees and interval graphs under the Token Sliding model. In this problem, a graph is provided along with its two dominating sets, which can be imagined as tokens placed on vertices. The objective is to find a shortest sequence of dominating sets that transforms one set into the other, with each set in the sequence resulting from sliding a single token in the previous set. While identifying any sequence has been well studied, our work presents the first polynomial algorithms for this optimization variant in the context of dominating sets.

Submitted 20 July, 2023; originally announced July 2023.

Comments: To appear at FCT 2023 (Fundamentals of Computation Theory)
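
The problem statement in the abstract above can be illustrated with a brute-force baseline: a breadth-first search over dominating sets, where each move slides one token along an edge. This is only a sketch for tiny graphs and is not the paper's polynomial algorithm; the state space grows exponentially. It assumes tokens occupy distinct vertices and that a token may only slide to an unoccupied neighbour, which the abstract does not spell out. The function names are hypothetical.

```python
from collections import deque


def is_dominating(adj, tokens):
    """Every vertex holds a token or has a neighbour that does."""
    return all(v in tokens or any(u in tokens for u in adj[v]) for v in adj)


def shortest_reconfiguration(adj, start, target):
    """Return a shortest list of dominating sets from start to target, or None.

    Consecutive sets differ by sliding a single token along one edge.
    """
    start, target = frozenset(start), frozenset(target)
    if not (is_dominating(adj, start) and is_dominating(adj, target)):
        raise ValueError("both endpoints must be dominating sets")

    parent = {start: None}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == target:
            # Walk the parent pointers back to the start.
            sequence = []
            while cur is not None:
                sequence.append(set(cur))
                cur = parent[cur]
            return sequence[::-1]
        for u in cur:
            for v in adj[u]:
                if v in cur:
                    continue  # assumed: at most one token per vertex
                nxt = (cur - {u}) | {v}
                if nxt not in parent and is_dominating(adj, nxt):
                    parent[nxt] = cur
                    queue.append(nxt)
    return None


# Path graph 0-1-2-3-4.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
sequence = shortest_reconfiguration(adj, {0, 3}, {1, 4})
print(sequence)
```

On the five-vertex path, moving from {0, 3} to {1, 4} takes two slides, because sliding the token on vertex 3 first would leave vertex 2 undominated.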

arXiv:2304.11249 [pdf, other]

eWaSR -- an embedded-compute-ready maritime obstacle detection network

Authors: Matija Teršek, Lojze Žust, Matej Kristan

Abstract: Maritime obstacle detection is critical for safe navigation of autonomous surface vehicles (ASVs). While the accuracy of image-based detection methods has advanced substantially, their computational and memory requirements prohibit deployment on embedded devices. In this paper we analyze the currently best-performing maritime obstacle detection network WaSR. Based on the analysis we then propose replacements for the most computationally intensive stages and propose its embedded-compute-ready variant eWaSR. In particular, the new design follows the most recent advancements of transformer-based lightweight networks. eWaSR achieves comparable detection results to state-of-the-art WaSR with only 0.52% F1 score performance drop and outperforms other state-of-the-art embedded-ready architectures by over 9.74% in F1 score. On a standard GPU, eWaSR runs 10x faster than the original WaSR (115 FPS vs 11 FPS). Tests on a real embedded device OAK-D show that, while WaSR cannot run due to memory restrictions, eWaSR runs comfortably at 5.5 FPS. This makes eWaSR the first practical embedded-compute-ready maritime obstacle detection network. The source code and trained eWaSR models are publicly available here: https://github.com/tersekmatija/eWaSR.

Submitted 21 April, 2023; originally announced April 2023.

Comments: 18 pages, 7 figures, submitted to MDPI Sensors

arXiv:2301.05155 [pdf, other]

Computing m-Eternal Domination Number of Cactus Graphs in Linear Time

Authors: Václav Blažej, Jan Matyáš Křišťan, Tomáš Valla

Abstract: In m-eternal domination attacker and defender play on a graph. Initially, the defender places guards on vertices. In each round, the attacker chooses a vertex to attack. Then, the defender can move each guard to a neighboring vertex and must move a guard to the attacked vertex. The m-eternal domination number is the minimum number of guards such that the graph can be defended indefinitely. In this paper, we study the m-eternal domination number of cactus graphs. We consider two variants of the m-eternal domination number: one allows multiple guards to occupy a single vertex, the second variant requires the guards to occupy distinct vertices. We develop several tools for obtaining lower and upper bounds on these problems and we use them to obtain an algorithm which computes the minimum number of required guards of cactus graphs for both variants of the problem.

Submitted 12 January, 2023; originally announced January 2023.

arXiv:2211.13508 [pdf, other]

1st Workshop on Maritime Computer Vision (MaCVi) 2023: Challenge Results

Authors: Benjamin Kiefer, Matej Kristan, Janez Perš, Lojze Žust, Fabio Poiesi, Fabio Augusto de Alcantara Andrade, Alexandre Bernardino, Matthew Dawkins, Jenni Raitoharju, Yitong Quan, Adem Atmaca, Timon Höfer, Qiming Zhang, Yufei Xu, Jing Zhang, Dacheng Tao, Lars Sommer, Raphael Spraul, Hangyue Zhao, Hongpu Zhang, Yanyun Zhao, Jan Lukas Augustin, Eui-ik Jeon, Impyeong Lee, Luca Zedda , et al. (48 additional authors not shown)

Abstract: The 1$^{\text{st}}$ Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicle (USV), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation and (iv) USV-based Maritime Obstacle Detection. The subchallenges were based on the SeaDronesSee and MODS benchmarks. This report summarizes the main findings of the individual subchallenges and introduces a new benchmark, called SeaDronesSee Object Detection v2, which extends the previous benchmark by including more classes and footage. We provide statistical and qualitative analyses, and assess trends in the best-performing methodologies of over 130 submissions. The methods are summarized in the appendix. The datasets, evaluation code and the leaderboard are publicly available at https://seadronessee.cs.uni-tuebingen.de/macvi.

Submitted 28 November, 2022; v1 submitted 24 November, 2022; originally announced November 2022.

Comments: MaCVi 2023 was part of WACV 2023. This report (38 pages) discusses the competition as part of MaCVi

arXiv:2211.08217 [pdf, other]

A Low-Shot Object Counting Network With Iterative Prototype Adaptation

Authors: Nikola Djukic, Alan Lukezic, Vitjan Zavrtanik, Matej Kristan

Abstract: We consider low-shot counting of arbitrary semantic categories in the image using only few annotated exemplars (few-shot) or no exemplars (no-shot). The standard few-shot pipeline follows extraction of appearance queries from exemplars and matching them with image features to infer the object counts. Existing methods extract queries by feature pooling which neglects the shape information (e.g., size and aspect) and leads to a reduced object localization accuracy and count estimates. We propose a Low-shot Object Counting network with iterative prototype Adaptation (LOCA). Our main contribution is the new object prototype extraction module, which iteratively fuses the exemplar shape and appearance information with image features. The module is easily adapted to zero-shot scenarios, enabling LOCA to cover the entire spectrum of low-shot counting problems. LOCA outperforms all recent state-of-the-art methods on FSC147 benchmark by 20-30% in RMSE on one-shot and few-shot and achieves state-of-the-art on zero-shot scenarios, while demonstrating better generalization capabilities.

Submitted 28 September, 2023; v1 submitted 15 November, 2022; originally announced November 2022.

Comments: Accepted to ICCV2023, code: https://github.com/djukicn/loca

arXiv:2210.03436 [pdf, other]

Trans2k: Unlocking the Power of Deep Models for Transparent Object Tracking

Authors: Alan Lukezic, Ziga Trojer, Jiri Matas, Matej Kristan

Abstract: Visual object tracking has focused predominantly on opaque objects, while transparent object tracking received very little attention. Motivated by the uniqueness of transparent objects in that their appearance is directly affected by the background, the first dedicated evaluation dataset has emerged recently. We contribute to this effort by proposing the first transparent object tracking training dataset Trans2k that consists of over 2k sequences with 104,343 images overall, annotated by bounding boxes and segmentation masks. Noting that transparent objects can be realistically rendered by modern renderers, we quantify domain-specific attributes and render the dataset containing visual attributes and tracking situations not covered in the existing object training datasets. We observe a consistent performance boost (up to 16%) across a diverse set of modern tracking architectures when trained using Trans2k, and show insights not previously possible due to the lack of appropriate training sets. The dataset and the rendering engine will be publicly released to unlock the power of modern learning-based trackers and foster new designs in transparent object tracking.

Submitted 7 October, 2022; originally announced October 2022.

Comments: Accepted to BMVC 2022. Project page: https://github.com/trojerz/Trans2k

arXiv:2208.01521 [pdf, other]

DSR -- A dual subspace re-projection network for surface anomaly detection

Authors: Vitjan Zavrtanik, Matej Kristan, Danijel Skočaj

Abstract: The state-of-the-art in discriminative unsupervised surface anomaly detection relies on external datasets for synthesizing anomaly-augmented training images. Such approaches are prone to failure on near-in-distribution anomalies since these are difficult to synthesize realistically due to their similarity to anomaly-free regions. We propose an architecture based on quantized feature space representation with dual decoders, DSR, that avoids the image-level anomaly synthesis requirement. Without making any assumptions about the visual properties of anomalies, DSR generates the anomalies at the feature level by sampling the learned quantized feature space, which allows a controlled generation of near-in-distribution anomalies. DSR achieves state-of-the-art results on the KSDD2 and MVTec anomaly detection datasets. The experiments on the challenging real-world KSDD2 dataset show that DSR significantly outperforms other unsupervised surface anomaly detection methods, improving the previous top-performing methods by 10% AP in anomaly detection and 35% AP in anomaly localization.

Submitted 23 November, 2022; v1 submitted 2 August, 2022; originally announced August 2022.

Comments: Presented at ECCV2022

arXiv:2206.13263 [pdf, other]

doi 10.3390/s22239139

Learning with Weak Annotations for Robust Maritime Obstacle Detection

Authors: Lojze Žust, Matej Kristan

Abstract: Robust maritime obstacle detection is critical for safe navigation of autonomous boats and timely collision avoidance. The current state-of-the-art is based on deep segmentation networks trained on large datasets. However, per-pixel ground truth labeling of such datasets is labor-intensive and expensive. We propose a new scaffolding learning regime (SLR) that leverages weak annotations consisting of water edges, the horizon location, and obstacle bounding boxes to train segmentation-based obstacle detection networks, thereby reducing the required ground truth labeling effort by a factor of twenty. SLR trains an initial model from weak annotations and then alternates between re-estimating the segmentation pseudo-labels and improving the network parameters. Experiments show that maritime obstacle segmentation networks trained using SLR on weak annotations not only match but outperform the same networks trained with dense ground truth labels, which is a remarkable result. In addition to the increased accuracy, SLR also increases domain generalization and can be used for domain adaptation with a low manual annotation load. The SLR code and pre-trained models are available at https://github.com/lojzezust/SLR .

Submitted 25 November, 2022; v1 submitted 27 June, 2022; originally announced June 2022.

Comments: Published in MDPI Sensors, 23 pages, 8 figures

Journal ref: Sensors 2022, 22, 9139

arXiv:2204.02720 [pdf, other]

Efficient attack sequences in m-eternal domination

Authors: Václav Blažej, Jan Matyáš Křišťan, Tomáš Valla

Abstract: We study the m-eternal domination problem from the perspective of the attacker. For many graph classes, the minimum required number of guards to defend eternally is known. By definition, if the defender has fewer than the required number of guards, then there exists a sequence of attacks that ensures the attacker's victory. Little is known about such sequences of attacks; in particular, no bound on their length is known. We show that if the game is played on a tree $T$ on $n$ vertices and the defender has fewer than the necessary number of guards, then the attacker can win in at most $n$ turns. Furthermore, we present an efficient procedure that produces such an attacking strategy.

Submitted 6 April, 2022; originally announced April 2022.

arXiv:2203.05352 [pdf, other]

Temporal Context for Robust Maritime Obstacle Detection

Authors: Lojze Žust, Matej Kristan

Abstract: Robust maritime obstacle detection is essential for fully autonomous unmanned surface vehicles (USVs). The currently widely adopted segmentation-based obstacle detection methods are prone to misclassification of object reflections and sun glitter as obstacles, producing many false positive detections, effectively rendering the methods impractical for USV navigation. However, water-turbulence-induced temporal appearance changes on object reflections are very distinctive from the appearance dynamics of true objects. We harness this property to design WaSR-T, a novel maritime obstacle detection network, that extracts the temporal context from a sequence of recent frames to reduce ambiguity. By learning the local temporal characteristics of object reflection on the water surface, WaSR-T substantially improves obstacle detection accuracy in the presence of reflections and glitter. Compared with existing single-frame methods, WaSR-T reduces the number of false positive detections by 41% overall and by over 53% within the danger zone of the boat, while preserving a high recall, and achieving new state-of-the-art performance on the challenging MODS maritime obstacle detection benchmark. The code, pretrained models and extended datasets are available at https://github.com/lojzezust/WaSR-T

Submitted 3 August, 2022; v1 submitted 10 March, 2022; originally announced March 2022.

Comments: 7 pages, 6 figures, accepted to IROS 2022, for code & data visit https://github.com/lojzezust/WaSR-T

arXiv:2202.11927 [pdf, other]

Polynomial Kernels for Tracking Shortest Paths

Authors: Václav Blažej, Pratibha Choudhary, Dušan Knop, Jan Matyáš Křišťan, Ondřej Suchý, Tomáš Valla

Abstract: Given an undirected graph $G=(V,E)$, vertices $s,t\in V$, and an integer $k$, Tracking Shortest Paths requires deciding whether there exists a set of $k$ vertices $T\subseteq V$ such that for any two distinct shortest paths between $s$ and $t$, say $P_1$ and $P_2$, we have $T\cap V(P_1)\neq T\cap V(P_2)$. In this paper, we give the first polynomial size kernel for the problem. Specifically we show the existence of a kernel with $\mathcal{O}(k^2)$ vertices and edges in general graphs and a kernel with $\mathcal{O}(k)$ vertices and edges in planar graphs for the Tracking Paths in DAG problem. This problem admits a polynomial parameter transformation to Tracking Shortest Paths, and this implies a kernel with $\mathcal{O}(k^4)$ vertices and edges for Tracking Shortest Paths in general graphs and a kernel with $\mathcal{O}(k^2)$ vertices and edges in planar graphs. Based on the above we also give a single exponential algorithm for Tracking Shortest Paths in planar graphs.

Submitted 24 February, 2022; originally announced February 2022.
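
The tracking condition in the abstract above, $T\cap V(P_1)\neq T\cap V(P_2)$ for every pair of distinct shortest $s$-$t$ paths, can be checked directly on small graphs. The sketch below is an exhaustive illustration of that definition and is unrelated to the kernelization the paper describes: it enumerates all shortest paths, which can be exponentially many, and tries vertex subsets in order of size. All function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def shortest_paths(adj, s, t):
    """Enumerate all shortest s-t paths of an undirected graph as vertex tuples."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    if t not in dist:
        return []

    paths = []

    def walk_back(path):
        # Follow edges that decrease the BFS distance by exactly one.
        u = path[-1]
        if u == s:
            paths.append(tuple(reversed(path)))
            return
        for v in adj[u]:
            if dist.get(v) == dist[u] - 1:
                walk_back(path + [v])

    walk_back([t])
    return paths


def is_tracking_set(adj, s, t, trackers):
    """True if every shortest s-t path meets `trackers` in a distinct vertex set."""
    paths = shortest_paths(adj, s, t)
    signatures = {frozenset(trackers & set(p)) for p in paths}
    return len(signatures) == len(paths)


def min_tracking_set(adj, s, t):
    """Smallest tracking set found by exhaustive search (exponential time)."""
    vertices = sorted(adj)
    for k in range(len(vertices) + 1):
        for candidate in combinations(vertices, k):
            if is_tracking_set(adj, s, t, set(candidate)):
                return set(candidate)
    return None


# A 4-cycle: two shortest paths from 0 to 3, one through 1 and one through 2.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
print(shortest_paths(adj, 0, 3))
print(is_tracking_set(adj, 0, 3, {1}))
print(min_tracking_set(adj, 0, 3))
```

On the 4-cycle a single tracker on vertex 1 suffices: one path contains it and the other does not, so the two intersections with $T$ differ, whereas the endpoints 0 and 3 lie on both paths and cannot distinguish them.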

arXiv:2112.11846 [pdf, other]

A Discriminative Single-Shot Segmentation Network for Visual Object Tracking

Authors: Alan Lukežič, Jiří Matas, Matej Kristan

Abstract: Template-based discriminative trackers are currently the dominant tracking paradigm due to their robustness, but are restricted to bounding box tracking and a limited range of transformation models, which reduces their localization accuracy. We propose a discriminative single-shot segmentation tracker -- D3S2, which narrows the gap between visual object tracking and video object segmentation. A single-shot network applies two target models with complementary geometric properties, one invariant to a broad range of transformations, including non-rigid deformations, the other assuming a rigid object to simultaneously achieve robust online target segmentation. The overall tracking reliability is further increased by decoupling the object and feature scale estimation. Without per-dataset finetuning, and trained only for segmentation as the primary output, D3S2 outperforms all published trackers on the recent short-term tracking benchmark VOT2020 and performs very close to the state-of-the-art trackers on the GOT-10k, TrackingNet, OTB100 and LaSoT. D3S2 outperforms the leading segmentation tracker SiamMask on video object segmentation benchmarks and performs on par with top video object segmentation algorithms.

Submitted 27 December, 2021; v1 submitted 22 December, 2021; originally announced December 2021.

Comments: Extended version of the D3S tracker (CVPR2020). Accepted to IEEE TPAMI. arXiv admin note: substantial text overlap with arXiv:1911.08862

arXiv:2108.07610 [pdf, other]

DRAEM -- A discriminatively trained reconstruction embedding for surface anomaly detection

Authors: Vitjan Zavrtanik, Matej Kristan, Danijel Skočaj

Abstract: Visual surface anomaly detection aims to detect local image regions that significantly deviate from normal appearance. Recent surface anomaly detection methods rely on generative models to accurately reconstruct the normal areas and to fail on anomalies. These methods are trained only on anomaly-free images, and often require hand-crafted post-processing steps to localize the anomalies, which prohibits optimizing the feature extraction for maximal detection capability. In addition to the reconstructive approach, we cast surface anomaly detection primarily as a discriminative problem and propose a discriminatively trained reconstruction anomaly embedding model (DRAEM). The proposed method learns a joint representation of an anomalous image and its anomaly-free reconstruction, while simultaneously learning a decision boundary between normal and anomalous examples. The method enables direct anomaly localization without the need for additional complicated post-processing of the network output and can be trained using simple and general anomaly simulations. On the challenging MVTec anomaly detection dataset, DRAEM outperforms the current state-of-the-art unsupervised methods by a large margin and even delivers detection performance close to the fully-supervised methods on the widely used DAGM surface-defect detection dataset, while substantially outperforming them in localization accuracy.

Submitted 27 September, 2021; v1 submitted 17 August, 2021; originally announced August 2021.

Comments: Accepted to ICCV2021

arXiv:2108.01430 [pdf, other]

Constant Factor Approximation for Tracking Paths and Fault Tolerant Feedback Vertex Set

Authors: Václav Blažej, Pratibha Choudhary, Dušan Knop, Jan Matyáš Křišťan, Ondřej Suchý, Tomáš Valla

Abstract: Consider a vertex-weighted graph $G$ with a source $s$ and a target $t$. Tracking Paths requires finding a minimum weight set of vertices (trackers) such that the sequence of trackers in each path from $s$ to $t$ is unique. In this work, we derive a factor $6$-approximation algorithm for Tracking Paths in weighted graphs and a factor $4$-approximation algorithm if the input is unweighted. This is the first constant factor approximation for this problem. While doing so, we also study approximation of the closely related $r$-Fault Tolerant Feedback Vertex Set problem. There, for a fixed integer $r$ and a given vertex-weighted graph $G$, the task is to find a minimum weight set of vertices intersecting every cycle of $G$ in at least $r+1$ vertices. We give a factor $\mathcal{O}(r)$ approximation algorithm for $r$-Fault Tolerant Feedback Vertex Set if $r$ is a constant.

Submitted 24 February, 2022; v1 submitted 3 August, 2021; originally announced August 2021.

arXiv:2108.00564 [pdf, other]

Learning Maritime Obstacle Detection from Weak Annotations by Scaffolding

Authors: Lojze Žust, Matej Kristan

Abstract: Coastal water autonomous boats rely on robust perception methods for obstacle detection and timely collision avoidance. The current state-of-the-art is based on deep segmentation networks trained on large datasets. Per-pixel ground truth labeling of such datasets, however, is labor-intensive and expensive. We observe that far less information is required for practical obstacle avoidance - the location of the water edge on static obstacles such as the shore and the approximate location and bounds of dynamic obstacles in the water are sufficient to plan a reaction. We propose a new scaffolding learning regime (SLR) that allows training obstacle detection segmentation networks only from such weak annotations, thus significantly reducing the cost of ground-truth labeling. Experiments show that maritime obstacle segmentation networks trained using SLR substantially outperform the same networks trained with dense ground truth labels. Thus accuracy is not sacrificed for labeling simplicity but is in fact improved, which is a remarkable result.

Submitted 1 August, 2021; originally announced August 2021.

Comments: Accepted to WACV 2022

arXiv:2105.02359 [pdf, other]

doi 10.1109/TITS.2021.3124192

MODS -- A USV-oriented object detection and obstacle segmentation benchmark

Authors: Borja Bovcon, Jon Muhovič, Duško Vranac, Dean Mozetič, Janez Perš, Matej Kristan

Abstract: Small-sized unmanned surface vehicles (USV) are coastal water devices with a broad range of applications such as environmental control and surveillance. A crucial capability for autonomous operation is obstacle detection for timely reaction and collision avoidance, which has been recently explored in the context of camera-based visual scene interpretation. Owing to curated datasets, substantial advances in scene interpretation have been made in the related field of unmanned ground vehicles. However, the current maritime datasets do not adequately capture the complexity of real-world USV scenes and the evaluation protocols are not standardised, which makes cross-paper comparison of different methods difficult and hinders progress. To address these issues, we introduce a new obstacle detection benchmark MODS, which considers two major perception tasks: maritime object detection and the more general maritime obstacle segmentation. We present a new diverse maritime evaluation dataset containing approximately 81k stereo images synchronized with an on-board IMU, with over 60k objects annotated. We propose a new obstacle segmentation performance evaluation protocol that reflects the detection accuracy in a way meaningful for practical USV navigation. Nineteen recent state-of-the-art object detection and obstacle segmentation methods are evaluated using the proposed protocol, creating a benchmark to facilitate development of the field. The proposed dataset, as well as evaluation routines, are made publicly available at vicos.si/resources.

Submitted 9 February, 2022; v1 submitted 5 May, 2021; originally announced May 2021.

Comments: 16 pages, 15 figures. The dataset, as well as the proposed evaluation protocols, are published on our website: https://www.vicos.si/resources/

arXiv:2001.01921 [pdf, other]

A water-obstacle separation and refinement network for unmanned surface vehicles

Authors: Borja Bovcon, Matej Kristan

Abstract: Obstacle detection by semantic segmentation shows great promise for autonomous navigation in unmanned surface vehicles (USV). However, existing methods suffer from poor estimation of the water edge in the presence of visual ambiguities, poor detection of small obstacles and a high false-positive rate on water reflections and wakes. We propose a new deep encoder-decoder architecture, a water-obstacle separation and refinement network (WaSR), to address these issues. Detection and water edge accuracy are improved by a novel decoder that gradually fuses inertial information from IMU with the visual features from the encoder. In addition, a novel loss function is designed to increase the separation between water and obstacle features early on in the network. Subsequently, the capacity of the remaining layers in the decoder is better utilised, leading to a significant reduction in false positives and increased true positives. Experimental results show that WaSR outperforms the current state-of-the-art by a large margin, yielding a 14% increase in F-measure over the second-best method.

Submitted 7 January, 2020; originally announced January 2020.

Comments: 6 pages + references, 6 figures, submitted to ICRA2020. MODD2 and MaSTr1325 datasets are available at http://box.vicos.si/borja/viamaro/index.html

arXiv:1912.00660 [pdf, other]

DAL -- A Deep Depth-aware Long-term Tracker

Authors: Yanlin Qian, Alan Lukežič, Matej Kristan, Joni-Kristian Kämäräinen, Jiri Matas

Abstract: The best RGBD trackers provide high accuracy but are slow to run. On the other hand, the best RGB trackers are fast but clearly inferior on the RGBD datasets. In this work, we propose a deep depth-aware long-term tracker that achieves state-of-the-art RGBD tracking performance and is fast to run. We reformulate the deep discriminative correlation filter (DCF) to embed the depth information into deep features. Moreover, the same depth-aware correlation filter is used for target re-detection. Comprehensive evaluations show that the proposed tracker achieves state-of-the-art performance on the Princeton RGBD, STC, and the newly-released CDTB benchmarks and runs at 20 fps.

Submitted 2 December, 2019; originally announced December 2019.

Comments: 10 pages

arXiv:1911.08862 [pdf, other]

D3S -- A Discriminative Single Shot Segmentation Tracker

Authors: Alan Lukežič, Jiří Matas, Matej Kristan

Abstract: Template-based discriminative trackers are currently the dominant tracking paradigm due to their robustness, but are restricted to bounding box tracking and a limited range of transformation models, which reduces their localization accuracy. We propose a discriminative single-shot segmentation tracker - D3S, which narrows the gap between visual object tracking and video object segmentation. A single-shot network applies two target models with complementary geometric properties, one invariant to a broad range of transformations, including non-rigid deformations, the other assuming a rigid object to simultaneously achieve high robustness and online target segmentation. Without per-dataset finetuning and trained only for segmentation as the primary output, D3S outperforms all trackers on VOT2016, VOT2018 and GOT-10k benchmarks and performs close to the state-of-the-art trackers on TrackingNet. D3S outperforms the leading segmentation tracker SiamMask on a video object segmentation benchmark and performs on par with top video object segmentation algorithms, while running an order of magnitude faster, close to real-time.

Submitted 14 April, 2020; v1 submitted 20 November, 2019; originally announced November 2019.

Comments: Accepted to CVPR2020

arXiv:1907.07910 [pdf, other]

On the m-eternal Domination Number of Cactus Graphs

Authors: Václav Blažej, Jan Matyáš Křišťan, Tomáš Valla

Abstract: Given a graph $G$, guards are placed on vertices of $G$. Then vertices are subject to an infinite sequence of attacks so that each attack must be defended by a guard moving from a neighboring vertex. The m-eternal domination number is the minimum number of guards such that the graph can be defended indefinitely. In this paper we study the m-eternal domination number of cactus graphs, that is, connected graphs where each edge lies in at most one cycle, and we consider three variants of the m-eternal domination number: the first variant allows multiple guards to occupy a single vertex, the second variant does not allow it, and in the third variant additional "eviction" attacks must be defended. We provide a new upper bound for the m-eternal domination number of cactus graphs, and for a subclass of cactus graphs called Christmas cactus graphs, where each vertex lies in at most two cycles, we prove that these three numbers are equal. Moreover, we present a linear-time algorithm for computing them.

Submitted 18 July, 2019; originally announced July 2019.
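
The graph class studied in the entry above can be recognised by brute force on small inputs. The sketch below assumes the standard definition of a cactus (a connected graph in which every edge lies on at most one cycle) and uses the fact that the cycles through an edge $uv$ correspond one-to-one to the simple $u$-$v$ paths that avoid that edge. It is an exponential-time illustration, not the linear-time algorithm mentioned in the abstract; the names `is_cactus` and `count_simple_paths` and the adjacency-dictionary format are assumptions, and the input is assumed to be a simple graph without self-loops.

```python
def count_simple_paths(adj, u, target, visited, limit=2):
    """Count simple u-target paths, stopping early once `limit` is reached."""
    if u == target:
        return 1
    total = 0
    for v in adj[u]:
        if v not in visited:
            visited.add(v)
            total += count_simple_paths(adj, v, target, visited, limit)
            visited.discard(v)
            if total >= limit:
                break
    return total


def is_connected(adj):
    """Depth-first search from an arbitrary vertex."""
    vertices = list(adj)
    if not vertices:
        return True
    seen = {vertices[0]}
    stack = [vertices[0]]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(vertices)


def is_cactus(adj):
    """True if the graph is connected and every edge lies on at most one cycle."""
    if not is_connected(adj):
        return False
    edges = {frozenset((u, v)) for u in adj for v in adj[u]}
    for edge in edges:
        u, v = tuple(edge)
        # Remove the edge uv; each remaining simple u-v path closes one cycle.
        pruned = {w: [x for x in adj[w] if {w, x} != {u, v}] for w in adj}
        if count_simple_paths(pruned, u, v, {u}) >= 2:
            return False
    return True


# Two triangles sharing a vertex form a cactus; sharing an edge does not.
bowtie = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3, 4], 3: [2, 4], 4: [2, 3]}
diamond = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
print(is_cactus(bowtie), is_cactus(diamond))
```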

arXiv:1907.00618 [pdf, other]

CDTB: A Color and Depth Visual Object Tracking Dataset and Benchmark

Authors: Alan Lukežič, Ugur Kart, Jani Käpylä, Ahmed Durmush, Joni-Kristian Kämäräinen, Jiří Matas, Matej Kristan

Abstract: A long-term visual object tracking performance evaluation methodology and a benchmark are proposed. Performance measures are designed by following a long-term tracking definition to maximize the analysis probing strength. The new measures outperform existing ones in interpretation potential and in better distinguishing between different tracking behaviors. We show that these measures generalize the short-term performance measures, thus linking the two tracking problems. Furthermore, the new measures are highly robust to temporal annotation sparsity and allow annotation of sequences hundreds of times longer than in the current datasets without increasing manual annotation labor. A new challenging dataset of carefully selected sequences with many target disappearances is proposed. A new tracking taxonomy is proposed to position trackers on the short-term/long-term spectrum. The benchmark contains an extensive evaluation of the largest number of long-term trackers and a comparison to state-of-the-art short-term trackers. We analyze the influence of tracking architecture implementations on long-term performance and explore various re-detection strategies as well as the influence of visual model update strategies on long-term tracking drift. The methodology is integrated in the VOT toolkit to automate experimental analysis and benchmarking and to facilitate future development of long-term trackers.

Submitted 1 July, 2019; originally announced July 2019.

arXiv:1906.08675 [pdf, other]

Performance Evaluation Methodology for Long-Term Visual Object Tracking

Authors: Alan Lukežič, Luka Čehovin Zajc, Tomáš Vojíř, Jiří Matas, Matej Kristan

Abstract: A long-term visual object tracking performance evaluation methodology and a benchmark are proposed. Performance measures are designed by following a long-term tracking definition to maximize the analysis probing strength. The new measures outperform existing ones in interpretation potential and in better distinguishing between different tracking behaviors. We show that these measures generalize the short-term performance measures, thus linking the two tracking problems. Furthermore, the new measures are highly robust to temporal annotation sparsity and allow annotation of sequences hundreds of times longer than in the current datasets without increasing manual annotation labor. A new challenging dataset of carefully selected sequences with many target disappearances is proposed. A new tracking taxonomy is proposed to position trackers on the short-term/long-term spectrum. The benchmark contains an extensive evaluation of the largest number of long-term trackers and a comparison to state-of-the-art short-term trackers. We analyze the influence of tracking architecture implementations on long-term performance and explore various re-detection strategies as well as the influence of visual model update strategies on long-term tracking drift. The methodology is integrated in the VOT toolkit to automate experimental analysis and benchmarking and to facilitate future development of long-term trackers.

Submitted 19 June, 2019; originally announced June 2019.

Comments: Submitted to a journal in June 2018. arXiv admin note: substantial text overlap with arXiv:1804.07056

arXiv:1902.07474 [pdf, other]

doi 10.1007/s11263-019-01282-1

Spatially-Adaptive Filter Units for Compact and Efficient Deep Neural Networks

Authors: Domen Tabernik, Matej Kristan, Aleš Leonardis

Abstract: Convolutional neural networks excel in a number of computer vision tasks. One of their most crucial architectural elements is the effective receptive field size, which has to be manually set to accommodate a specific task. Standard solutions involve large kernels, down/up-sampling and dilated convolutions. These require testing a variety of dilation and down/up-sampling factors and result in non-compact representations and an excessive number of parameters. We address this issue by proposing a new convolution filter composed of displaced aggregation units (DAU). DAUs learn spatial displacements and adapt the receptive field sizes of individual convolution filters to a given problem, thus eliminating the need for hand-crafted modifications. DAUs provide a seamless substitution of convolutional filters in existing state-of-the-art architectures, which we demonstrate on AlexNet, ResNet50, ResNet101, DeepLab and SRN-DeblurNet. The benefits of this design are demonstrated on a variety of computer vision tasks and datasets, such as image classification (ILSVRC 2012), semantic segmentation (PASCAL VOC 2011, Cityscapes) and blind image de-blurring (GOPRO). Results show that DAUs efficiently allocate parameters, resulting in up to four times more compact networks at similar or better performance.

Submitted 6 February, 2020; v1 submitted 20 February, 2019; originally announced February 2019.

Comments: Accepted for publication in International Journal of Computer Vision, Jan 02 2020

arXiv:1811.10863 [pdf, other]

Object Tracking by Reconstruction with View-Specific Discriminative Correlation Filters

Authors: Ugur Kart, Alan Lukezic, Matej Kristan, Joni-Kristian Kamarainen, Jiri Matas

Abstract: Standard RGB-D trackers treat the target as an inherently 2D structure, which makes modelling appearance changes related even to simple out-of-plane rotation highly challenging. We address this limitation by proposing a novel long-term RGB-D tracker - Object Tracking by Reconstruction (OTR). The tracker performs online 3D target reconstruction to facilitate robust learning of a set of view-specific discriminative correlation filters (DCFs). The 3D reconstruction supports two performance-enhancing features: (i) generation of accurate spatial support for constrained DCF learning from its 2D projection and (ii) point cloud based estimation of 3D pose change for selection and storage of view-specific DCFs which are used to robustly localize the target after out-of-view rotation or heavy occlusion. Extensive evaluation of OTR on the challenging Princeton RGB-D tracking and STC Benchmarks shows it outperforms the state-of-the-art by a large margin.

Submitted 27 November, 2018; originally announced November 2018.

arXiv:1804.07056 [pdf, other]

Now you see me: evaluating performance in long-term visual tracking

Authors: Alan Lukežič, Luka Čehovin Zajc, Tomáš Vojíř, Jiří Matas, Matej Kristan

Abstract: We propose a new long-term tracking performance evaluation methodology and present a new challenging dataset of carefully selected sequences with many target disappearances. We perform an extensive evaluation of six long-term and nine short-term state-of-the-art trackers, using new performance measures, suitable for evaluating long-term tracking - tracking precision, recall and F-score. The evaluation shows that a good model update strategy and the capability of image-wide re-detection are critical for long-term tracking performance. We integrated the methodology in the VOT toolkit to automate experimental analysis and benchmarking and to facilitate the development of long-term trackers.

Submitted 19 April, 2018; originally announced April 2018.

arXiv:1802.07956 [pdf, other]

doi 10.1016/j.robot.2018.02.017

Stereo obstacle detection for unmanned surface vehicles by IMU-assisted semantic segmentation

Authors: Borja Bovcon, Rok Mandeljc, Janez Perš, Matej Kristan

Abstract: A new obstacle detection algorithm for unmanned surface vehicles (USVs) is presented. A state-of-the-art graphical model for semantic segmentation is extended to incorporate boat pitch and roll measurements from the on-board inertial measurement unit (IMU), and a stereo verification algorithm that consolidates tentative detections obtained from the segmentation is proposed. The IMU readings are used to estimate the location of the horizon line in the image, which automatically adjusts the priors in the probabilistic semantic segmentation model. We derive the equations for projecting the horizon into images, propose an efficient optimization algorithm for the extended graphical model, and offer a practical IMU-camera-USV calibration procedure. Using a USV equipped with multiple synchronized sensors, we captured a new challenging multi-modal dataset, and annotated its images with water edge and obstacles. Experimental results show that the proposed algorithm significantly outperforms the state of the art, with nearly 30% improvement in water-edge detection accuracy, an over 21% reduction of false positive rate, an almost 60% reduction of false negative rate, and an over 65% increase of true positive rate, while its Matlab implementation runs in real-time.

Submitted 22 February, 2018; originally announced February 2018.

Comments: 14 pages, 18 figures, new publicly available multi-modal obstacle detection dataset

arXiv:1711.11473 [pdf, other]

Spatially-Adaptive Filter Units for Deep Neural Networks

Authors: Domen Tabernik, Matej Kristan, Aleš Leonardis

Abstract: Classical deep convolutional networks increase receptive field size by either gradual resolution reduction or application of hand-crafted dilated convolutions to prevent increase in the number of parameters. In this paper we propose a novel displaced aggregation unit (DAU) that does not require hand-crafting. In contrast to classical filters with units (pixels) placed on a fixed regular grid, the displacements of the DAUs are learned, which enables filters to spatially adapt their receptive field to a given problem. We extensively demonstrate the strength of DAUs on classification and semantic segmentation tasks. Compared to ConvNets with regular filters, ConvNets with DAUs achieve comparable performance at faster convergence and up to a 3-times reduction in parameters. Furthermore, DAUs allow us to study deep networks from novel perspectives. We study spatial distributions of DAU filters and analyze the number of parameters allocated for spatial coverage in a filter.

Submitted 15 March, 2018; v1 submitted 30 November, 2017; originally announced November 2017.

Comments: Accepted to Computer Vision and Pattern Recognition 2018

arXiv:1711.09594 [pdf, other]

FuCoLoT -- A Fully-Correlational Long-Term Tracker

Authors: Alan Lukežič, Luka Čehovin Zajc, Tomáš Vojíř, Jiří Matas, Matej Kristan

Abstract: We propose FuCoLoT -- a Fully Correlational Long-term Tracker. It exploits the novel DCF constrained filter learning method to design a detector that is able to re-detect the target in the whole image efficiently. FuCoLoT maintains several correlation filters trained on different time scales that act as the detector components. A novel mechanism based on the correlation response is used for tracking failure estimation. FuCoLoT achieves state-of-the-art results on standard short-term benchmarks and it outperforms the current best-performing tracker on the long-term UAV20L benchmark by over 19%. It has an order of magnitude smaller memory footprint than its best-performing competitors and runs at 15 fps in a single CPU thread.

Submitted 14 January, 2019; v1 submitted 27 November, 2017; originally announced November 2017.

arXiv:1612.00089 [pdf, other]

Beyond standard benchmarks: Parameterizing performance evaluation in visual object tracking

Authors: Luka Čehovin Zajc, Alan Lukežič, Aleš Leonardis, Matej Kristan

Abstract: Object-to-camera motion produces a variety of apparent motion patterns that significantly affect performance of short-term visual trackers. Despite being crucial for designing robust trackers, their influence is poorly explored in standard benchmarks due to weakly defined, biased and overlapping attribute annotations. In this paper we propose to go beyond pre-recorded benchmarks with post-hoc annotations by presenting an approach that utilizes omnidirectional videos to generate realistic, consistently annotated, short-term tracking scenarios with exactly parameterized motion patterns. We have created an evaluation system, constructed a fully annotated dataset of omnidirectional videos and the generators for typical motion patterns. We provide an in-depth analysis of major tracking paradigms which is complementary to the standard benchmarks and confirms the expressiveness of our evaluation approach.

Submitted 25 March, 2017; v1 submitted 30 November, 2016; originally announced December 2016.

arXiv:1611.08461 [pdf, other]

doi 10.1007/s11263-017-1061-3

Discriminative Correlation Filter with Channel and Spatial Reliability

Authors: Alan Lukežič, Tomáš Vojíř, Luka Čehovin, Jiří Matas, Matej Kristan

Abstract: Short-term tracking is an open and challenging problem for which discriminative correlation filters (DCF) have shown excellent performance. We introduce the channel and spatial reliability concepts to DCF tracking and provide a novel learning algorithm for their efficient and seamless integration in the filter update and the tracking process. The spatial reliability map adjusts the filter support to the part of the object suitable for tracking. This both allows the search region to be enlarged and improves tracking of non-rectangular objects. Reliability scores reflect channel-wise quality of the learned filters and are used as feature weighting coefficients in localization. Experimentally, with only two simple standard features, HoGs and Colornames, the novel CSR-DCF method -- DCF with Channel and Spatial Reliability -- achieves state-of-the-art results on VOT 2016, VOT 2015 and OTB100. The CSR-DCF runs in real-time on a CPU.

Submitted 14 January, 2019; v1 submitted 25 November, 2016; originally announced November 2016.

Comments: Accepted to: International Journal of Computer Vision: https://link.springer.com/article/10.1007/s11263-017-1061-3

arXiv:1609.03795 [pdf, other]

Towards Deep Compositional Networks

Authors: Domen Tabernik, Matej Kristan, Jeremy L. Wyatt, Aleš Leonardis

Abstract: Hierarchical feature learning based on convolutional neural networks (CNN) has recently shown significant potential in various computer vision tasks. While allowing high-quality discriminative feature learning, the downside of CNNs is the lack of explicit structure in features, which often leads to overfitting, absence of reconstruction from partial observations and limited generative abilities. Explicit structure is inherent in hierarchical compositional models; however, these lack the ability to optimize a well-defined cost function. We propose a novel analytic model of a basic unit in a layered hierarchical model with both explicit compositional structure and a well-defined discriminative cost function. Our experiments on two datasets show that the proposed compositional model performs on a par with standard CNNs on discriminative tasks, while, due to explicit modeling of the structure in the feature units, affording a straightforward visualization of parts and faster inference due to separability of the units.

Submitted 13 September, 2016; originally announced September 2016.

Comments: Published in proceedings of 23rd International Conference on Pattern Recognition (ICPR 2016)

arXiv:1605.03720 [pdf, other]

Deformable Parts Correlation Filters for Robust Visual Tracking

Authors: Alan Lukežič, Luka Čehovin, Matej Kristan

Abstract: Deformable parts models show great potential in tracking by principally addressing non-rigid object deformations and self occlusions, but according to recent benchmarks, they often lag behind the holistic approaches. The reason is that a potentially large number of degrees of freedom has to be estimated for object localization and simplifications of the constellation topology are often assumed to make the inference tractable. We present a new formulation of the constellation model with correlation filters that treats the geometric and visual constraints within a single convex cost function and derive a highly efficient optimization for MAP inference of a fully-connected constellation. We propose a tracker that models the object at two levels of detail. The coarse level corresponds to a root correlation filter and a novel color model for approximate object localization, while the mid-level representation is composed of the new deformable constellation of correlation filters that refine the object location. The resulting tracker is rigorously analyzed on the highly challenging OTB, VOT2014 and VOT2015 benchmarks, exhibits state-of-the-art performance and runs in real-time.

Submitted 12 May, 2016; originally announced May 2016.

Comments: 14 pages, first submission to journal: 9.11.2015, re-submission on 11.5.2016

arXiv:1603.02649 [pdf, other]

A regularization-based approach for unsupervised image segmentation

Authors: Aleksandar Dimitriev, Matej Kristan

Abstract: We propose a novel unsupervised image segmentation algorithm, which aims to segment an image into several coherent parts. It requires no user input, no supervised learning phase and assumes an unknown number of segments. It achieves this by first over-segmenting the image into several hundred superpixels. These are iteratively joined on the basis of a discriminative classifier trained on color and texture information obtained from each superpixel. The output of the classifier is regularized by a Markov random field that lends more influence to neighbouring superpixels that are more similar. In each iteration, similar superpixels fall under the same label, until only a few coherent regions remain in the image. The algorithm was tested on a standard evaluation data set, where it performs on par with state-of-the-art algorithms in terms of precision and greatly outperforms the state of the art by reducing the oversegmentation of the object of interest.

Submitted 8 March, 2016; originally announced March 2016.

arXiv:1503.01918 [pdf, other]

Fast image-based obstacle detection from unmanned surface vehicles

Authors: Matej Kristan, Vildana Sulic, Stanislav Kovacic, Janez Pers

Abstract: Obstacle detection plays an important role in unmanned surface vehicles (USV). The USVs operate in highly diverse environments in which an obstacle may be a floating piece of wood, a scuba diver, a pier, or a part of a shoreline, which presents a significant challenge to continuous detection from images taken onboard. This paper addresses the problem of online detection by constrained unsupervised segmentation. To this end, a new graphical model is proposed that affords a fast and continuous obstacle image-map estimation from a single video stream captured onboard a USV. The model accounts for the semantic structure of the marine environment as observed from a USV by imposing weak structural constraints. A Markov random field framework is adopted and a highly efficient algorithm for simultaneous optimization of model parameters and segmentation mask estimation is derived. Our approach does not require computationally intensive extraction of texture features and comfortably runs in real-time. The algorithm is tested on a new, challenging dataset for segmentation and obstacle detection in marine environments, which is the largest annotated dataset of its kind. Results on this dataset show that our model outperforms the related approaches, while requiring a fraction of the computational effort.

Submitted 6 March, 2015; originally announced March 2015.

Comments: This is an extended version of the ACCV2014 paper [Kristan et al., 2014] submitted to a journal. [Kristan et al., 2014] M. Kristan, J. Pers, V. Sulic, S. Kovacic, A graphical model for rapid obstacle image-map estimation from unmanned surface vehicles, in Proc. Asian Conf. Computer Vision, 2014

arXiv:1503.01313 [pdf, other]

doi 10.1109/TPAMI.2016.2516982

A Novel Performance Evaluation Methodology for Single-Target Trackers

Authors: Matej Kristan, Jiri Matas, Ales Leonardis, Tomas Vojir, Roman Pflugfelder, Gustavo Fernandez, Georg Nebehay, Fatih Porikli, Luka Cehovin

Abstract: This paper addresses the problem of single-target tracker performance evaluation. We consider the performance measures, the dataset and the evaluation system to be the most important components of tracker evaluation and propose requirements for each of them. The requirements are the basis of a new evaluation methodology that aims at a simple and easily interpretable tracker comparison. The ranking-based methodology addresses tracker equivalence in terms of statistical significance and practical differences. A fully-annotated dataset with per-frame annotations with several visual attributes is introduced. The diversity of its visual properties is maximized in a novel way by clustering a large number of videos according to their visual attributes. This makes it the most sophistically constructed and annotated dataset to date. A multi-platform evaluation system allowing easy integration of third-party trackers is presented as well. The proposed evaluation methodology was tested on the VOT2014 challenge on the new dataset and 38 trackers, making it the largest benchmark to date. Most of the tested trackers are indeed state-of-the-art since they outperform the standard baselines, resulting in a highly-challenging benchmark. An exhaustive analysis of the dataset from the perspective of tracking difficulty is carried out. To facilitate tracker comparison a new performance visualization technique is proposed.

Submitted 8 January, 2016; v1 submitted 4 March, 2015; originally announced March 2015.

Comments: Final version (Accepted), IEEE Pattern Analysis and Machine Intelligence, 2016

arXiv:1502.05803 [pdf, other]

doi 10.1109/TIP.2016.2520370

Visual object tracking performance measures revisited

Authors: Luka Čehovin, Aleš Leonardis, Matej Kristan

Abstract: The problem of visual tracking evaluation is sporting a large variety of performance measures, and largely suffers from lack of consensus about which measures should be used in experiments. This makes the cross-paper tracker comparison difficult. Furthermore, as some measures may be less effective than others, the tracking results may be skewed or biased towards particular tracking aspects. In this paper we revisit the popular performance measures and tracker performance visualizations and analyze them theoretically and experimentally. We show that several measures are equivalent from the point of information they provide for tracker comparison and, crucially, that some are more brittle than the others. Based on our analysis we narrow down the set of potential measures to only two complementary ones, describing accuracy and robustness, thus pushing towards homogenization of the tracker evaluation methodology. These two measures can be intuitively interpreted and visualized and have been employed by the recent Visual Object Tracking (VOT) challenges as the foundation for the evaluation methodology.

Submitted 7 March, 2016; v1 submitted 20 February, 2015; originally announced February 2015.

Journal ref: IEEE Transactions on Image Processing (March 2016), 1261 - 1274
