subscribe to arXiv mailings

NTIRE 2024 Challenge on Image Super-Resolution ($\times$4): Methods and Results

Authors: Zheng Chen, Zongwei Wu, Eduard Zamfir, Kai Zhang, Yulun Zhang, Radu Timofte, Xiaokang Yang, Hongyuan Yu, Cheng Wan, Yuxin Hong, Zhijuan Huang, Yajun Zou, Yuan Huang, Jiamin Lin, Bingnan Han, Xianyu Guan, Yongsheng Yu, Daoan Zhang, Xuanwu Yin, Kunlong Zuo, Jinhua Hao, Kai Zhao, Kun Yuan, Ming Sun, Chao Zhou , et al. (63 additional authors not shown)

Abstract: This paper reviews the NTIRE 2024 challenge on image super-resolution ($\times$4), highlighting the solutions proposed and the outcomes obtained. The challenge involves generating corresponding high-resolution (HR) images, magnified by a factor of four, from low-resolution (LR) inputs using prior information. The LR images originate from bicubic downsampling degradation. The aim of the challenge i… ▽ More This paper reviews the NTIRE 2024 challenge on image super-resolution ($\times$4), highlighting the solutions proposed and the outcomes obtained. The challenge involves generating corresponding high-resolution (HR) images, magnified by a factor of four, from low-resolution (LR) inputs using prior information. The LR images originate from bicubic downsampling degradation. The aim of the challenge is to obtain designs/solutions with the most advanced SR performance, with no constraints on computational resources (e.g., model size and FLOPs) or training data. The track of this challenge assesses performance with the PSNR metric on the DIV2K testing dataset. The competition attracted 199 registrants, with 20 teams submitting valid entries. This collective endeavour not only pushes the boundaries of performance in single-image SR but also offers a comprehensive overview of current trends in this field. △ Less

Submitted 15 April, 2024; originally announced April 2024.

Comments: NTIRE 2024 webpage: https://cvlai.net/ntire/2024. Code: https://github.com/zhengchen1999/NTIRE2024_ImageSR_x4

arXiv:2312.09723 [pdf, other]

doi 10.1109/WACV57701.2024.00832

Tracking Skiers from the Top to the Bottom

Authors: Matteo Dunnhofer, Luca Sordi, Niki Martinel, Christian Micheloni

Abstract: Skiing is a popular winter sport discipline with a long history of competitive events. In this domain, computer vision has the potential to enhance the understanding of athletes' performance, but its application lags behind other sports due to limited studies and datasets. This paper makes a step forward in filling such gaps. A thorough investigation is performed on the task of skier tracking in a… ▽ More Skiing is a popular winter sport discipline with a long history of competitive events. In this domain, computer vision has the potential to enhance the understanding of athletes' performance, but its application lags behind other sports due to limited studies and datasets. This paper makes a step forward in filling such gaps. A thorough investigation is performed on the task of skier tracking in a video capturing his/her complete performance. Obtaining continuous and accurate skier localization is preemptive for further higher-level performance analyses. To enable the study, the largest and most annotated dataset for computer vision in skiing, SkiTB, is introduced. Several visual object tracking algorithms, including both established methodologies and a newly introduced skier-optimized baseline algorithm, are tested using the dataset. The results provide valuable insights into the applicability of different tracking methods for vision-based skiing analysis. SkiTB, code, and results are available at https://machinelearning.uniud.it/datasets/skitb. △ Less

Submitted 15 December, 2023; originally announced December 2023.

Comments: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2024

arXiv:2304.02994 [pdf, other]

doi 10.1109/CVPRW59228.2023.00547

Visualizing Skiers' Trajectories in Monocular Videos

Authors: Matteo Dunnhofer, Luca Sordi, Christian Micheloni

Abstract: Trajectories are fundamental to winning in alpine skiing. Tools enabling the analysis of such curves can enhance the training activity and enrich broadcasting content. In this paper, we propose SkiTraVis, an algorithm to visualize the sequence of points traversed by a skier during its performance. SkiTraVis works on monocular videos and constitutes a pipeline of a visual tracker to model the skier… ▽ More Trajectories are fundamental to winning in alpine skiing. Tools enabling the analysis of such curves can enhance the training activity and enrich broadcasting content. In this paper, we propose SkiTraVis, an algorithm to visualize the sequence of points traversed by a skier during its performance. SkiTraVis works on monocular videos and constitutes a pipeline of a visual tracker to model the skier's motion and of a frame correspondence module to estimate the camera's motion. The separation of the two motions enables the visualization of the trajectory according to the moving camera's perspective. We performed experiments on videos of real-world professional competitions to quantify the visualization error, the computational efficiency, as well as the applicability. Overall, the results achieved demonstrate the potential of our solution for broadcasting media enhancement and coach assistance. △ Less

Submitted 11 April, 2023; v1 submitted 6 April, 2023; originally announced April 2023.

Comments: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), CVsports workshop

arXiv:2302.01144 [pdf, other]

UW-CVGAN: UnderWater Image Enhancement with Capsules Vectors Quantization

Authors: Rita Pucci, Christian Micheloni, Niki Martinel

Abstract: The degradation in the underwater images is due to wavelength-dependent light attenuation, scattering, and to the diversity of the water types in which they are captured. Deep neural networks take a step in this field, providing autonomous models able to achieve the enhancement of underwater images. We introduce Underwater Capsules Vectors GAN UWCVGAN based on the discrete features quantization pa… ▽ More The degradation in the underwater images is due to wavelength-dependent light attenuation, scattering, and to the diversity of the water types in which they are captured. Deep neural networks take a step in this field, providing autonomous models able to achieve the enhancement of underwater images. We introduce Underwater Capsules Vectors GAN UWCVGAN based on the discrete features quantization paradigm from VQGAN for this task. The proposed UWCVGAN combines an encoding network, which compresses the image into its latent representation, with a decoding network, able to reconstruct the enhancement of the image from the only latent representation. In contrast with VQGAN, UWCVGAN achieves feature quantization by exploiting the clusterization ability of capsule layer, making the model completely trainable and easier to manage. The model obtains enhanced underwater images with high quality and fine details. Moreover, the trained encoder is independent of the decoder giving the possibility to be embedded onto the collector as compressing algorithm to reduce the memory space required for the images, of factor $3\times$. \myUWCVGAN{ }is validated with quantitative and qualitative analysis on benchmark datasets, and we present metrics results compared with the state of the art. △ Less

Submitted 2 February, 2023; originally announced February 2023.

arXiv:2210.10413 [pdf, other]

Real Image Super-Resolution using GAN through modeling of LR and HR process

Authors: Rao Muhammad Umer, Christian Micheloni

Abstract: The current existing deep image super-resolution methods usually assume that a Low Resolution (LR) image is bicubicly downscaled of a High Resolution (HR) image. However, such an ideal bicubic downsampling process is different from the real LR degradations, which usually come from complicated combinations of different degradation processes, such as camera blur, sensor noise, sharpening artifacts,… ▽ More The current existing deep image super-resolution methods usually assume that a Low Resolution (LR) image is bicubicly downscaled of a High Resolution (HR) image. However, such an ideal bicubic downsampling process is different from the real LR degradations, which usually come from complicated combinations of different degradation processes, such as camera blur, sensor noise, sharpening artifacts, JPEG compression, and further image editing, and several times image transmission over the internet and unpredictable noises. It leads to the highly ill-posed nature of the inverse upscaling problem. To address these issues, we propose a GAN-based SR approach with learnable adaptive sinusoidal nonlinearities incorporated in LR and SR models by directly learn degradation distributions and then synthesize paired LR/HR training data to train the generalized SR model to real image degradations. We demonstrate the effectiveness of our proposed approach in quantitative and qualitative experiments. △ Less

Submitted 19 October, 2022; originally announced October 2022.

Comments: Accepted in 18th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), 2022. arXiv admin note: text overlap with arXiv:2009.03693, arXiv:2005.00953

arXiv:2209.13502 [pdf, other]

doi 10.1007/s11263-022-01694-6

Visual Object Tracking in First Person Vision

Authors: Matteo Dunnhofer, Antonino Furnari, Giovanni Maria Farinella, Christian Micheloni

Abstract: The understanding of human-object interactions is fundamental in First Person Vision (FPV). Visual tracking algorithms which follow the objects manipulated by the camera wearer can provide useful information to effectively model such interactions. In the last years, the computer vision community has significantly improved the performance of tracking algorithms for a large variety of target objects… ▽ More The understanding of human-object interactions is fundamental in First Person Vision (FPV). Visual tracking algorithms which follow the objects manipulated by the camera wearer can provide useful information to effectively model such interactions. In the last years, the computer vision community has significantly improved the performance of tracking algorithms for a large variety of target objects and scenarios. Despite a few previous attempts to exploit trackers in the FPV domain, a methodical analysis of the performance of state-of-the-art trackers is still missing. This research gap raises the question of whether current solutions can be used ``off-the-shelf'' or more domain-specific investigations should be carried out. This paper aims to provide answers to such questions. We present the first systematic investigation of single object tracking in FPV. Our study extensively analyses the performance of 42 algorithms including generic object trackers and baseline FPV-specific trackers. The analysis is carried out by focusing on different aspects of the FPV setting, introducing new performance measures, and in relation to FPV-specific tasks. The study is made possible through the introduction of TREK-150, a novel benchmark dataset composed of 150 densely annotated video sequences. Our results show that object tracking in FPV poses new challenges to current visual trackers. We highlight the factors causing such behavior and point out possible research directions. Despite their difficulties, we prove that trackers bring benefits to FPV downstream tasks requiring short-term object tracking. We expect that generic object tracking will gain popularity in FPV as new and FPV-specific methodologies are investigated. △ Less

Submitted 27 September, 2022; originally announced September 2022.

Comments: International Journal of Computer Vision (IJCV). arXiv admin note: substantial text overlap with arXiv:2108.13665

arXiv:2205.04261 [pdf, other]

doi 10.1109/ICPR56361.2022.9956082

CoCoLoT: Combining Complementary Trackers in Long-Term Visual Tracking

Authors: Matteo Dunnhofer, Christian Micheloni

Abstract: How to combine the complementary capabilities of an ensemble of different algorithms has been of central interest in visual object tracking. A significant progress on such a problem has been achieved, but considering short-term tracking scenarios. Instead, long-term tracking settings have been substantially ignored by the solutions. In this paper, we explicitly consider long-term tracking scenario… ▽ More How to combine the complementary capabilities of an ensemble of different algorithms has been of central interest in visual object tracking. A significant progress on such a problem has been achieved, but considering short-term tracking scenarios. Instead, long-term tracking settings have been substantially ignored by the solutions. In this paper, we explicitly consider long-term tracking scenarios and provide a framework, named CoCoLoT, that combines the characteristics of complementary visual trackers to achieve enhanced long-term tracking performance. CoCoLoT perceives whether the trackers are following the target object through an online learned deep verification model, and accordingly activates a decision policy which selects the best performing tracker as well as it corrects the performance of the failing one. The proposed methodology is evaluated extensively and the comparison with several other solutions reveals that it competes favourably with the state-of-the-art on the most popular long-term visual tracking benchmarks. △ Less

Submitted 9 May, 2022; originally announced May 2022.

Comments: International Conference on Pattern Recognition (ICPR) 2022

arXiv:2112.09647 [pdf, other]

Video-Based Reconstruction of the Trajectories Performed by Skiers

Authors: Matteo Dunnhofer, Alberto Zurini, Maurizio Dunnhofer, Christian Micheloni

Abstract: Trajectories are fundamental in different skiing disciplines. Tools enabling the analysis of such curves can enhance the training activity and enrich the broadcasting contents. However, the solutions currently available are based on geo-localized sensors and surface models. In this short paper, we propose a video-based approach to reconstruct the sequence of points traversed by an athlete during i… ▽ More Trajectories are fundamental in different skiing disciplines. Tools enabling the analysis of such curves can enhance the training activity and enrich the broadcasting contents. However, the solutions currently available are based on geo-localized sensors and surface models. In this short paper, we propose a video-based approach to reconstruct the sequence of points traversed by an athlete during its performance. Our prototype is constituted by a pipeline of deep learning-based algorithms to reconstruct the athlete's motion and to visualize it according to the camera perspective. This is achieved for different skiing disciplines in the wild without any camera calibration. We tested our solution on broadcast and smartphone-captured videos of alpine skiing and ski jumping professional competitions. The qualitative results achieved show the potential of our solution. △ Less

Submitted 17 December, 2021; originally announced December 2021.

arXiv:2110.13217 [pdf, other]

RBSRICNN: Raw Burst Super-Resolution through Iterative Convolutional Neural Network

Authors: Rao Muhammad Umer, Christian Micheloni

Abstract: Modern digital cameras and smartphones mostly rely on image signal processing (ISP) pipelines to produce realistic colored RGB images. However, compared to DSLR cameras, low-quality images are usually obtained in many portable mobile devices with compact camera sensors due to their physical limitations. The low-quality images have multiple degradations i.e., sub-pixel shift due to camera motion, m… ▽ More Modern digital cameras and smartphones mostly rely on image signal processing (ISP) pipelines to produce realistic colored RGB images. However, compared to DSLR cameras, low-quality images are usually obtained in many portable mobile devices with compact camera sensors due to their physical limitations. The low-quality images have multiple degradations i.e., sub-pixel shift due to camera motion, mosaick patterns due to camera color filter array, low-resolution due to smaller camera sensors, and the rest information are corrupted by the noise. Such degradations limit the performance of current Single Image Super-resolution (SISR) methods in recovering high-resolution (HR) image details from a single low-resolution (LR) image. In this work, we propose a Raw Burst Super-Resolution Iterative Convolutional Neural Network (RBSRICNN) that follows the burst photography pipeline as a whole by a forward (physical) model. The proposed Burst SR scheme solves the problem with classical image regularization, convex optimization, and deep learning techniques, compared to existing black-box data-driven methods. The proposed network produces the final output by an iterative refinement of the intermediate SR estimates. We demonstrate the effectiveness of our proposed approach in quantitative and qualitative experiments that generalize robustly to real LR burst inputs with onl synthetic burst data available for training. △ Less

Submitted 10 November, 2021; v1 submitted 25 October, 2021; originally announced October 2021.

Comments: Fourth Workshop on Machine Learning and the Physical Sciences (NeurIPS 2021)

arXiv:2109.07871 [pdf, other]

Resolution based Feature Distillation for Cross Resolution Person Re-Identification

Authors: Asad Munir, Chengjin Lyu, Bart Goossens, Wilfried Philips, Christian Micheloni

Abstract: Person re-identification (re-id) aims to retrieve images of same identities across different camera views. Resolution mismatch occurs due to varying distances between person of interest and cameras, this significantly degrades the performance of re-id in real world scenarios. Most of the existing approaches resolve the re-id task as low resolution problem in which a low resolution query image is s… ▽ More Person re-identification (re-id) aims to retrieve images of same identities across different camera views. Resolution mismatch occurs due to varying distances between person of interest and cameras, this significantly degrades the performance of re-id in real world scenarios. Most of the existing approaches resolve the re-id task as low resolution problem in which a low resolution query image is searched in a high resolution images gallery. Several approaches apply image super resolution techniques to produce high resolution images but ignore the multiple resolutions of gallery images which is a better realistic scenario. In this paper, we introduce channel correlations to improve the learning of features from the degraded data. In addition, to overcome the problem of multiple resolutions we propose a Resolution based Feature Distillation (RFD) approach. Such an approach learns resolution invariant features by filtering the resolution related features from the final feature vectors that are used to compute the distance matrix. We tested the proposed approach on two synthetically created datasets and on one original multi resolution dataset with real degradation. Our approach improves the performance when multiple resolutions occur in the gallery and have comparable results in case of single resolution (low resolution re-id). △ Less

Submitted 16 September, 2021; originally announced September 2021.

Comments: 9 pages

arXiv:2108.13665 [pdf, other]

doi 10.1109/ICCVW54120.2021.00304

Is First Person Vision Challenging for Object Tracking?

Authors: Matteo Dunnhofer, Antonino Furnari, Giovanni Maria Farinella, Christian Micheloni

Abstract: Understanding human-object interactions is fundamental in First Person Vision (FPV). Tracking algorithms which follow the objects manipulated by the camera wearer can provide useful cues to effectively model such interactions. Visual tracking solutions available in the computer vision literature have significantly improved their performance in the last years for a large variety of target objects a… ▽ More Understanding human-object interactions is fundamental in First Person Vision (FPV). Tracking algorithms which follow the objects manipulated by the camera wearer can provide useful cues to effectively model such interactions. Visual tracking solutions available in the computer vision literature have significantly improved their performance in the last years for a large variety of target objects and tracking scenarios. However, despite a few previous attempts to exploit trackers in FPV applications, a methodical analysis of the performance of state-of-the-art trackers in this domain is still missing. In this paper, we fill the gap by presenting the first systematic study of object tracking in FPV. Our study extensively analyses the performance of recent visual trackers and baseline FPV trackers with respect to different aspects and considering a new performance measure. This is achieved through TREK-150, a novel benchmark dataset composed of 150 densely annotated video sequences. Our results show that object tracking in FPV is challenging, which suggests that more research efforts should be devoted to this problem so that tracking could benefit FPV tasks. △ Less

Submitted 31 August, 2021; originally announced August 2021.

Comments: IEEE/CVF International Conference on Computer Vision (ICCV) 2021, Visual Object Tracking Challenge VOT2021 workshop. arXiv admin note: text overlap with arXiv:2011.12263

arXiv:2107.03145 [pdf, other]

A Deep Residual Star Generative Adversarial Network for multi-domain Image Super-Resolution

Authors: Rao Muhammad Umer, Asad Munir, Christian Micheloni

Abstract: Recently, most of state-of-the-art single image super-resolution (SISR) methods have attained impressive performance by using deep convolutional neural networks (DCNNs). The existing SR methods have limited performance due to a fixed degradation settings, i.e. usually a bicubic downscaling of low-resolution (LR) image. However, in real-world settings, the LR degradation process is unknown which ca… ▽ More Recently, most of state-of-the-art single image super-resolution (SISR) methods have attained impressive performance by using deep convolutional neural networks (DCNNs). The existing SR methods have limited performance due to a fixed degradation settings, i.e. usually a bicubic downscaling of low-resolution (LR) image. However, in real-world settings, the LR degradation process is unknown which can be bicubic LR, bilinear LR, nearest-neighbor LR, or real LR. Therefore, most SR methods are ineffective and inefficient in handling more than one degradation settings within a single network. To handle the multiple degradation, i.e. refers to multi-domain image super-resolution, we propose a deep Super-Resolution Residual StarGAN (SR2*GAN), a novel and scalable approach that super-resolves the LR images for the multiple LR domains using only a single model. The proposed scheme is trained in a StarGAN like network topology with a single generator and discriminator networks. We demonstrate the effectiveness of our proposed approach in quantitative and qualitative experiments compared to other state-of-the-art methods. △ Less

Submitted 7 July, 2021; originally announced July 2021.

Comments: 5 pages, 6th International Conference on Smart and Sustainable Technologies 2021. arXiv admin note: text overlap with arXiv:2009.03693, arXiv:2005.00953

arXiv:2106.03839 [pdf, other]

NTIRE 2021 Challenge on Burst Super-Resolution: Methods and Results

Authors: Goutam Bhat, Martin Danelljan, Radu Timofte, Kazutoshi Akita, Wooyeong Cho, Haoqiang Fan, Lanpeng Jia, Daeshik Kim, Bruno Lecouat, Youwei Li, Shuaicheng Liu, Ziluan Liu, Ziwei Luo, Takahiro Maeda, Julien Mairal, Christian Micheloni, Xuan Mo, Takeru Oba, Pavel Ostyakov, Jean Ponce, Sanghyeok Son, Jian Sun, Norimichi Ukita, Rao Muhammad Umer, Youliang Yan , et al. (3 additional authors not shown)

Abstract: This paper reviews the NTIRE2021 challenge on burst super-resolution. Given a RAW noisy burst as input, the task in the challenge was to generate a clean RGB image with 4 times higher resolution. The challenge contained two tracks; Track 1 evaluating on synthetically generated data, and Track 2 using real-world bursts from mobile camera. In the final testing phase, 6 teams submitted results using… ▽ More This paper reviews the NTIRE2021 challenge on burst super-resolution. Given a RAW noisy burst as input, the task in the challenge was to generate a clean RGB image with 4 times higher resolution. The challenge contained two tracks; Track 1 evaluating on synthetically generated data, and Track 2 using real-world bursts from mobile camera. In the final testing phase, 6 teams submitted results using a diverse set of solutions. The top-performing methods set a new state-of-the-art for the burst super-resolution task. △ Less

Submitted 7 June, 2021; originally announced June 2021.

Comments: NTIRE 2021 Burst Super-Resolution challenge report

arXiv:2103.14496 [pdf, other]

doi 10.1109/LRA.2021.3070816

Weakly-Supervised Domain Adaptation of Deep Regression Trackers via Reinforced Knowledge Distillation

Authors: Matteo Dunnhofer, Niki Martinel, Christian Micheloni

Abstract: Deep regression trackers are among the fastest tracking algorithms available, and therefore suitable for real-time robotic applications. However, their accuracy is inadequate in many domains due to distribution shift and overfitting. In this paper we overcome such limitations by presenting the first methodology for domain adaption of such a class of trackers. To reduce the labeling effort we propo… ▽ More Deep regression trackers are among the fastest tracking algorithms available, and therefore suitable for real-time robotic applications. However, their accuracy is inadequate in many domains due to distribution shift and overfitting. In this paper we overcome such limitations by presenting the first methodology for domain adaption of such a class of trackers. To reduce the labeling effort we propose a weakly-supervised adaptation strategy, in which reinforcement learning is used to express weak supervision as a scalar application-dependent and temporally-delayed feedback. At the same time, knowledge distillation is employed to guarantee learning stability and to compress and transfer knowledge from more powerful but slower trackers. Extensive experiments on five different robotic vision domains demonstrate the relevance of our methodology. Real-time speed is achieved on embedded devices and on machines without GPUs, while accuracy reaches significant results. △ Less

Submitted 26 March, 2021; originally announced March 2021.

Comments: IEEE Robotics and Automation Letters (RA-L)

arXiv:2101.07576 [pdf, other]

Collaboration among Image and Object Level Features for Image Colourisation

Authors: Rita Pucci, Christian Micheloni, Niki Martinel

Abstract: Image colourisation is an ill-posed problem, with multiple correct solutions which depend on the context and object instances present in the input datum. Previous approaches attacked the problem either by requiring intense user interactions or by exploiting the ability of convolutional neural networks (CNNs) in learning image level (context) features. However, obtaining human hints is not always f… ▽ More Image colourisation is an ill-posed problem, with multiple correct solutions which depend on the context and object instances present in the input datum. Previous approaches attacked the problem either by requiring intense user interactions or by exploiting the ability of convolutional neural networks (CNNs) in learning image level (context) features. However, obtaining human hints is not always feasible and CNNs alone are not able to learn object-level semantics unless multiple models pretrained with supervision are considered. In this work, we propose a single network, named UCapsNet, that separate image-level features obtained through convolutions and object-level features captured by means of capsules. Then, by skip connections over different layers, we enforce collaboration between such disentangling factors to produce high quality and plausible image colourisation. We pose the problem as a classification task that can be addressed by a fully self-supervised approach, thus requires no human effort. Experimental results on three benchmark datasets show that our approach outperforms existing methods on standard quality metrics and achieves a state of the art performances on image colourisation. A large scale user study shows that our method is preferred over existing solutions. △ Less

Submitted 19 January, 2021; originally announced January 2021.

arXiv:2012.02478 [pdf, other]

Is It a Plausible Colour? UCapsNet for Image Colourisation

Authors: Rita Pucci, Christian Micheloni, Gian Luca Foresti, Niki Martinel

Abstract: Human beings can imagine the colours of a grayscale image with no particular effort thanks to their ability of semantic feature extraction. Can an autonomous system achieve that? Can it hallucinate plausible and vibrant colours? This is the colourisation problem. Different from existing works relying on convolutional neural network models pre-trained with supervision, we cast such colourisation pr… ▽ More Human beings can imagine the colours of a grayscale image with no particular effort thanks to their ability of semantic feature extraction. Can an autonomous system achieve that? Can it hallucinate plausible and vibrant colours? This is the colourisation problem. Different from existing works relying on convolutional neural network models pre-trained with supervision, we cast such colourisation problem as a self-supervised learning task. We tackle the problem with the introduction of a novel architecture based on Capsules trained following the adversarial learning paradigm. Capsule networks are able to extract a semantic representation of the entities in the image but loose details about their spatial information, which is important for colourising a grayscale image. Thus our UCapsNet structure comes with an encoding phase that extracts entities through capsules and spatial details through convolutional neural networks. A decoding phase merges the entity features with the spatial features to hallucinate a plausible colour version of the input datum. Results on the ImageNet benchmark show that our approach is able to generate more vibrant and plausible colours than exiting solutions and achieves superior performance than models pre-trained with supervision. △ Less

Submitted 4 December, 2020; originally announced December 2020.

arXiv:2011.12263 [pdf, other]

Is First Person Vision Challenging for Object Tracking?

Authors: Matteo Dunnhofer, Antonino Furnari, Giovanni Maria Farinella, Christian Micheloni

Abstract: Understanding human-object interactions is fundamental in First Person Vision (FPV). Tracking algorithms which follow the objects manipulated by the camera wearer can provide useful cues to effectively model such interactions. Despite a few previous attempts to exploit trackers in FPV applications, a methodical analysis of the performance of state-of-the-art visual trackers in this domain is still… ▽ More Understanding human-object interactions is fundamental in First Person Vision (FPV). Tracking algorithms which follow the objects manipulated by the camera wearer can provide useful cues to effectively model such interactions. Despite a few previous attempts to exploit trackers in FPV applications, a methodical analysis of the performance of state-of-the-art visual trackers in this domain is still missing. In this short paper, we provide a recap of the first systematic study of object tracking in FPV. Our work extensively analyses the performance of recent and baseline FPV trackers with respect to different aspects. This is achieved through TREK-150, a novel benchmark dataset composed of 150 densely annotated video sequences. The results suggest that more research efforts should be devoted to this problem so that tracking could benefit FPV tasks. The full version of this paper is available at arXiv:2108.13665. △ Less

Submitted 24 September, 2021; v1 submitted 24 November, 2020; originally announced November 2020.

Comments: Extended Abstract accepted by the EPIC workshop at ICCV 2021. The full version of this paper is available at arXiv:2108.13665

arXiv:2009.12072 [pdf, other]

AIM 2020 Challenge on Real Image Super-Resolution: Methods and Results

Authors: Pengxu Wei, Hannan Lu, Radu Timofte, Liang Lin, Wangmeng Zuo, Zhihong Pan, Baopu Li, Teng Xi, Yanwen Fan, Gang Zhang, Jingtuo Liu, Junyu Han, Errui Ding, Tangxin Xie, Liang Cao, Yan Zou, Yi Shen, Jialiang Zhang, Yu Jia, Kaihua Cheng, Chenhuan Wu, Yue Lin, Cen Liu, Yunbo Peng, Xueyi Zou , et al. (51 additional authors not shown)

Abstract: This paper introduces the real image Super-Resolution (SR) challenge that was part of the Advances in Image Manipulation (AIM) workshop, held in conjunction with ECCV 2020. This challenge involves three tracks to super-resolve an input image for $\times$2, $\times$3 and $\times$4 scaling factors, respectively. The goal is to attract more attention to realistic image degradation for the SR task, wh… ▽ More This paper introduces the real image Super-Resolution (SR) challenge that was part of the Advances in Image Manipulation (AIM) workshop, held in conjunction with ECCV 2020. This challenge involves three tracks to super-resolve an input image for $\times$2, $\times$3 and $\times$4 scaling factors, respectively. The goal is to attract more attention to realistic image degradation for the SR task, which is much more complicated and challenging, and contributes to real-world image super-resolution applications. 452 participants were registered for three tracks in total, and 24 teams submitted their results. They gauge the state-of-the-art approaches for real image SR in terms of PSNR and SSIM. △ Less

Submitted 25 September, 2020; originally announced September 2020.

Journal ref: European Conference on Computer Vision Workshops, 2020

arXiv:2009.06943 [pdf, other]

AIM 2020 Challenge on Efficient Super-Resolution: Methods and Results

Authors: Kai Zhang, Martin Danelljan, Yawei Li, Radu Timofte, Jie Liu, Jie Tang, Gangshan Wu, Yu Zhu, Xiangyu He, Wenjie Xu, Chenghua Li, Cong Leng, Jian Cheng, Guangyang Wu, Wenyi Wang, Xiaohong Liu, Hengyuan Zhao, Xiangtao Kong, Jingwen He, Yu Qiao, Chao Dong, Xiaotong Luo, Liang Chen, Jiangtao Zhang, Maitreya Suin , et al. (60 additional authors not shown)

Abstract: This paper reviews the AIM 2020 challenge on efficient single image super-resolution with focus on the proposed solutions and results. The challenge task was to super-resolve an input image with a magnification factor x4 based on a set of prior examples of low and corresponding high resolution images. The goal is to devise a network that reduces one or several aspects such as runtime, parameter co… ▽ More This paper reviews the AIM 2020 challenge on efficient single image super-resolution with focus on the proposed solutions and results. The challenge task was to super-resolve an input image with a magnification factor x4 based on a set of prior examples of low and corresponding high resolution images. The goal is to devise a network that reduces one or several aspects such as runtime, parameter count, FLOPs, activations, and memory consumption while at least maintaining PSNR of MSRResNet. The track had 150 registered participants, and 25 teams submitted the final results. They gauge the state-of-the-art in efficient single image super-resolution. △ Less

Submitted 15 September, 2020; originally announced September 2020.

arXiv:2009.04809 [pdf, other]

Deep Iterative Residual Convolutional Network for Single Image Super-Resolution

Authors: Rao Muhammad Umer, Gian Luca Foresti, Christian Micheloni

Abstract: Deep convolutional neural networks (CNNs) have recently achieved great success for single image super-resolution (SISR) task due to their powerful feature representation capabilities. The most recent deep learning based SISR methods focus on designing deeper / wider models to learn the non-linear mapping between low-resolution (LR) inputs and high-resolution (HR) outputs. These existing SR methods… ▽ More Deep convolutional neural networks (CNNs) have recently achieved great success for single image super-resolution (SISR) task due to their powerful feature representation capabilities. The most recent deep learning based SISR methods focus on designing deeper / wider models to learn the non-linear mapping between low-resolution (LR) inputs and high-resolution (HR) outputs. These existing SR methods do not take into account the image observation (physical) model and thus require a large number of network's trainable parameters with a great volume of training data. To address these issues, we propose a deep Iterative Super-Resolution Residual Convolutional Network (ISRResCNet) that exploits the powerful image regularization and large-scale optimization techniques by training the deep network in an iterative manner with a residual learning approach. Extensive experimental results on various super-resolution benchmarks demonstrate that our method with a few trainable parameters improves the results for different scaling factors in comparison with the state-of-art methods. △ Less

Submitted 7 September, 2020; originally announced September 2020.

Comments: To be appeared in proceedings of the 25th IEEE International Conference on Pattern Recognition (ICPR). arXiv admin note: text overlap with arXiv:2005.00953, arXiv:2009.03693

arXiv:2009.03693 [pdf, other]

Deep Cyclic Generative Adversarial Residual Convolutional Networks for Real Image Super-Resolution

Authors: Rao Muhammad Umer, Christian Micheloni

Abstract: Recent deep learning based single image super-resolution (SISR) methods mostly train their models in a clean data domain where the low-resolution (LR) and the high-resolution (HR) images come from noise-free settings (same domain) due to the bicubic down-sampling assumption. However, such degradation process is not available in real-world settings. We consider a deep cyclic network structure to ma… ▽ More Recent deep learning based single image super-resolution (SISR) methods mostly train their models in a clean data domain where the low-resolution (LR) and the high-resolution (HR) images come from noise-free settings (same domain) due to the bicubic down-sampling assumption. However, such degradation process is not available in real-world settings. We consider a deep cyclic network structure to maintain the domain consistency between the LR and HR data distributions, which is inspired by the recent success of CycleGAN in the image-to-image translation applications. We propose the Super-Resolution Residual Cyclic Generative Adversarial Network (SRResCycGAN) by training with a generative adversarial network (GAN) framework for the LR to HR domain translation in an end-to-end manner. We demonstrate our proposed approach in the quantitative and qualitative experiments that generalize well to the real image super-resolution and it is easy to deploy for the mobile/embedded devices. In addition, our SR results on the AIM 2020 Real Image SR Challenge datasets demonstrate that the proposed SR approach achieves comparable results as the other state-of-art methods. △ Less

Submitted 7 September, 2020; originally announced September 2020.

Comments: In proceedings of European Conference on Computer Vision (ECCV) Workshops. arXiv admin note: substantial text overlap with arXiv:2005.00953

arXiv:2008.00992 [pdf, other]

doi 10.1007/978-3-030-68238-5_41

An Exploration of Target-Conditioned Segmentation Methods for Visual Object Trackers

Authors: Matteo Dunnhofer, Niki Martinel, Christian Micheloni

Abstract: Visual object tracking is the problem of predicting a target object's state in a video. Generally, bounding-boxes have been used to represent states, and a surge of effort has been spent by the community to produce efficient causal algorithms capable of locating targets with such representations. As the field is moving towards binary segmentation masks to define objects more precisely, in this pap… ▽ More Visual object tracking is the problem of predicting a target object's state in a video. Generally, bounding-boxes have been used to represent states, and a surge of effort has been spent by the community to produce efficient causal algorithms capable of locating targets with such representations. As the field is moving towards binary segmentation masks to define objects more precisely, in this paper we propose to extensively explore target-conditioned segmentation methods available in the computer vision community, in order to transform any bounding-box tracker into a segmentation tracker. Our analysis shows that such methods allow trackers to compete with recently proposed segmentation trackers, while performing quasi real-time. △ Less

Submitted 13 August, 2020; v1 submitted 3 August, 2020; originally announced August 2020.

Comments: European Conference on Computer Vision (ECCV) 2020, Visual Object Tracking Challenge VOT2020 workshop

arXiv:2007.04108 [pdf, other]

doi 10.1007/978-3-030-69532-3_38

Tracking-by-Trackers with a Distilled and Reinforced Model

Authors: Matteo Dunnhofer, Niki Martinel, Christian Micheloni

Abstract: Visual object tracking was generally tackled by reasoning independently on fast processing algorithms, accurate online adaptation methods, and fusion of trackers. In this paper, we unify such goals by proposing a novel tracking methodology that takes advantage of other visual trackers, offline and online. A compact student model is trained via the marriage of knowledge distillation and reinforceme… ▽ More Visual object tracking was generally tackled by reasoning independently on fast processing algorithms, accurate online adaptation methods, and fusion of trackers. In this paper, we unify such goals by proposing a novel tracking methodology that takes advantage of other visual trackers, offline and online. A compact student model is trained via the marriage of knowledge distillation and reinforcement learning. The first allows to transfer and compress tracking knowledge of other trackers. The second enables the learning of evaluation measures which are then exploited online. After learning, the student can be ultimately used to build (i) a very fast single-shot tracker, (ii) a tracker with a simple and effective online adaptation mechanism, (iii) a tracker that performs fusion of other trackers. Extensive validation shows that the proposed algorithms compete with real-time state-of-the-art trackers. △ Less

Submitted 30 September, 2020; v1 submitted 8 July, 2020; originally announced July 2020.

Comments: Asian Conference on Computer Vision (ACCV) 2020

arXiv:2005.01996 [pdf, other]

NTIRE 2020 Challenge on Real-World Image Super-Resolution: Methods and Results

Authors: Andreas Lugmayr, Martin Danelljan, Radu Timofte, Namhyuk Ahn, Dongwoon Bai, Jie Cai, Yun Cao, Junyang Chen, Kaihua Cheng, SeYoung Chun, Wei Deng, Mostafa El-Khamy, Chiu Man Ho, Xiaozhong Ji, Amin Kheradmand, Gwantae Kim, Hanseok Ko, Kanghyu Lee, Jungwon Lee, Hao Li, Ziluan Liu, Zhi-Song Liu, Shuai Liu, Yunhua Lu, Zibo Meng , et al. (21 additional authors not shown)

Abstract: This paper reviews the NTIRE 2020 challenge on real world super-resolution. It focuses on the participating methods and final results. The challenge addresses the real world setting, where paired true high and low-resolution images are unavailable. For training, only one set of source input images is therefore provided along with a set of unpaired high-quality target images. In Track 1: Image Proc… ▽ More This paper reviews the NTIRE 2020 challenge on real world super-resolution. It focuses on the participating methods and final results. The challenge addresses the real world setting, where paired true high and low-resolution images are unavailable. For training, only one set of source input images is therefore provided along with a set of unpaired high-quality target images. In Track 1: Image Processing artifacts, the aim is to super-resolve images with synthetically generated image processing artifacts. This allows for quantitative benchmarking of the approaches \wrt a ground-truth image. In Track 2: Smartphone Images, real low-quality smart phone images have to be super-resolved. In both tracks, the ultimate goal is to achieve the best perceptual quality, evaluated using a human study. This is the second challenge on the subject, following AIM 2019, targeting to advance the state-of-the-art in super-resolution. To measure the performance we use the benchmark protocol from AIM 2019. In total 22 teams competed in the final testing phase, demonstrating new and innovative solutions to the problem. △ Less

Submitted 5 May, 2020; originally announced May 2020.

arXiv:2005.00953 [pdf, other]

Deep Generative Adversarial Residual Convolutional Networks for Real-World Super-Resolution

Authors: Rao Muhammad Umer, Gian Luca Foresti, Christian Micheloni

Abstract: Most current deep learning based single image super-resolution (SISR) methods focus on designing deeper / wider models to learn the non-linear mapping between low-resolution (LR) inputs and the high-resolution (HR) outputs from a large number of paired (LR/HR) training data. They usually take as assumption that the LR image is a bicubic down-sampled version of the HR image. However, such degradati… ▽ More Most current deep learning based single image super-resolution (SISR) methods focus on designing deeper / wider models to learn the non-linear mapping between low-resolution (LR) inputs and the high-resolution (HR) outputs from a large number of paired (LR/HR) training data. They usually take as assumption that the LR image is a bicubic down-sampled version of the HR image. However, such degradation process is not available in real-world settings i.e. inherent sensor noise, stochastic noise, compression artifacts, possible mismatch between image degradation process and camera device. It reduces significantly the performance of current SISR methods due to real-world image corruptions. To address these problems, we propose a deep Super-Resolution Residual Convolutional Generative Adversarial Network (SRResCGAN) to follow the real-world degradation settings by adversarial training the model with pixel-wise supervision in the HR domain from its generated LR counterpart. The proposed network exploits the residual learning by minimizing the energy-based objective function with powerful image regularization and convex optimization techniques. We demonstrate our proposed approach in quantitative and qualitative experiments that generalize robustly to real input and it is easy to deploy for other down-scaling operators and mobile/embedded devices. △ Less

Submitted 2 May, 2020; originally announced May 2020.

arXiv:2004.06154 [pdf, other]

An Efficient UAV-based Artificial Intelligence Framework for Real-Time Visual Tasks

Authors: Enkhtogtokh Togootogtokh, Christian Micheloni, Gian Luca Foresti, Niki Martinel

Abstract: Modern Unmanned Aerial Vehicles equipped with state of the art artificial intelligence (AI) technologies are opening to a wide plethora of novel and interesting applications. While this field received a strong impact from the recent AI breakthroughs, most of the provided solutions either entirely rely on commercial software or provide a weak integration interface which denies the development of ad… ▽ More Modern Unmanned Aerial Vehicles equipped with state of the art artificial intelligence (AI) technologies are opening to a wide plethora of novel and interesting applications. While this field received a strong impact from the recent AI breakthroughs, most of the provided solutions either entirely rely on commercial software or provide a weak integration interface which denies the development of additional techniques. This leads us to propose a novel and efficient framework for the UAV-AI joint technology. Intelligent UAV systems encounter complex challenges to be tackled without human control. One of these complex challenges is to be able to carry out computer vision tasks in real-time use cases. In this paper we focus on this challenge and introduce a multi-layer AI (MLAI) framework to allow easy integration of ad-hoc visual-based AI applications. To show its features and its advantages, we implemented and evaluated different modern visual-based deep learning models for object detection, target tracking and target handover. △ Less

Submitted 13 April, 2020; originally announced April 2020.

arXiv:1910.04856 [pdf, other]

Video-Based Convolutional Attention for Person Re-Identification

Authors: Marco Zamprogno, Marco Passon, Niki Martinel, Giuseppe Serra, Giuseppe Lancioni, Christian Micheloni, Carlo Tasso, Gian Luca Foresti

Abstract: In this paper we consider the problem of video-based person re-identification, which is the task of associating videos of the same person captured by different and non-overlapping cameras. We propose a Siamese framework in which video frames of the person to re-identify and of the candidate one are processed by two identical networks which produce a similarity score. We introduce an attention mech… ▽ More In this paper we consider the problem of video-based person re-identification, which is the task of associating videos of the same person captured by different and non-overlapping cameras. We propose a Siamese framework in which video frames of the person to re-identify and of the candidate one are processed by two identical networks which produce a similarity score. We introduce an attention mechanisms to capture the relevant information both at frame level (spatial information) and at video level (temporal information given by the importance of a specific frame within the sequence). One of the novelties of our approach is given by a joint concurrent processing of both frame and video levels, providing in such a way a very simple architecture. Despite this fact, our approach achieves better performance than the state-of-the-art on the challenging iLIDS-VID dataset. △ Less

Submitted 26 September, 2019; originally announced October 2019.

Comments: 11 pages, 2 figures. Accepted by ICIAP2019, 20th International Conference on IMAGE ANALYSIS AND PROCESSING, Trento, Italy, 9-13 September, 2019

arXiv:1909.08487 [pdf, other]

doi 10.1109/ICCVW.2019.00282

Visual Tracking by means of Deep Reinforcement Learning and an Expert Demonstrator

Authors: Matteo Dunnhofer, Niki Martinel, Gian Luca Foresti, Christian Micheloni

Abstract: In the last decade many different algorithms have been proposed to track a generic object in videos. Their execution on recent large-scale video datasets can produce a great amount of various tracking behaviours. New trends in Reinforcement Learning showed that demonstrations of an expert agent can be efficiently used to speed-up the process of policy learning. Taking inspiration from such works a… ▽ More In the last decade many different algorithms have been proposed to track a generic object in videos. Their execution on recent large-scale video datasets can produce a great amount of various tracking behaviours. New trends in Reinforcement Learning showed that demonstrations of an expert agent can be efficiently used to speed-up the process of policy learning. Taking inspiration from such works and from the recent applications of Reinforcement Learning to visual tracking, we propose two novel trackers, A3CT, which exploits demonstrations of a state-of-the-art tracker to learn an effective tracking policy, and A3CTD, that takes advantage of the same expert tracker to correct its behaviour during tracking. Through an extensive experimental validation on the GOT-10k, OTB-100, LaSOT, UAV123 and VOT benchmarks, we show that the proposed trackers achieve state-of-the-art performance while running in real-time. △ Less

Submitted 18 September, 2019; originally announced September 2019.

Comments: in 2019 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) - VOT2019 Challenge Workshop

arXiv:1909.03748 [pdf, other]

doi 10.1145/3349801.3349823

Deep Super-Resolution Network for Single Image Super-Resolution with Realistic Degradations

Authors: Rao Muhammad Umer, Gian Luca Foresti, Christian Micheloni

Abstract: Single Image Super-Resolution (SISR) aims to generate a high-resolution (HR) image of a given low-resolution (LR) image. The most of existing convolutional neural network (CNN) based SISR methods usually take an assumption that a LR image is only bicubicly down-sampled version of an HR image. However, the true degradation (i.e. the LR image is a bicubicly downsampled, blurred and noisy version of… ▽ More Single Image Super-Resolution (SISR) aims to generate a high-resolution (HR) image of a given low-resolution (LR) image. The most of existing convolutional neural network (CNN) based SISR methods usually take an assumption that a LR image is only bicubicly down-sampled version of an HR image. However, the true degradation (i.e. the LR image is a bicubicly downsampled, blurred and noisy version of an HR image) of a LR image goes beyond the widely used bicubic assumption, which makes the SISR problem highly ill-posed nature of inverse problems. To address this issue, we propose a deep SISR network that works for blur kernels of different sizes, and different noise levels in an unified residual CNN-based denoiser network, which significantly improves a practical CNN-based super-resolver for real applications. Extensive experimental results on synthetic LR datasets and real images demonstrate that our proposed method not only can produce better results on more realistic degradation but also computational efficient to practical SISR applications. △ Less

Submitted 9 September, 2019; originally announced September 2019.

Comments: 7 pages

Journal ref: 13th International Conference on Distributed Smart Cameras (ICDSC 2019)

arXiv:1612.06543 [pdf, other]

Wide-Slice Residual Networks for Food Recognition

Authors: Niki Martinel, Gian Luca Foresti, Christian Micheloni

Abstract: Food diary applications represent a tantalizing market. Such applications, based on image food recognition, opened to new challenges for computer vision and pattern recognition algorithms. Recent works in the field are focusing either on hand-crafted representations or on learning these by exploiting deep neural networks. Despite the success of such a last family of works, these generally exploit… ▽ More Food diary applications represent a tantalizing market. Such applications, based on image food recognition, opened to new challenges for computer vision and pattern recognition algorithms. Recent works in the field are focusing either on hand-crafted representations or on learning these by exploiting deep neural networks. Despite the success of such a last family of works, these generally exploit off-the shelf deep architectures to classify food dishes. Thus, the architectures are not cast to the specific problem. We believe that better results can be obtained if the deep architecture is defined with respect to an analysis of the food composition. Following such an intuition, this work introduces a new deep scheme that is designed to handle the food structure. Specifically, inspired by the recent success of residual deep network, we exploit such a learning scheme and introduce a slice convolution block to capture the vertical food layers. Outputs of the deep residual blocks are combined with the sliced convolution to produce the classification score for specific food categories. To evaluate our proposed architecture we have conducted experimental results on three benchmark datasets. Results demonstrate that our solution shows better performance with respect to existing approaches (e.g., a top-1 accuracy of 90.27% on the Food-101 challenging dataset). △ Less

Submitted 20 December, 2016; originally announced December 2016.

arXiv:1607.07216 [pdf, other]

Temporal Model Adaptation for Person Re-Identification

Authors: Niki Martinel, Abir Das, Christian Micheloni, Amit K. Roy-Chowdhury

Abstract: Person re-identification is an open and challenging problem in computer vision. Majority of the efforts have been spent either to design the best feature representation or to learn the optimal matching metric. Most approaches have neglected the problem of adapting the selected features or the learned model over time. To address such a problem, we propose a temporal model adaptation scheme with hum… ▽ More Person re-identification is an open and challenging problem in computer vision. Majority of the efforts have been spent either to design the best feature representation or to learn the optimal matching metric. Most approaches have neglected the problem of adapting the selected features or the learned model over time. To address such a problem, we propose a temporal model adaptation scheme with human in the loop. We first introduce a similarity-dissimilarity learning method which can be trained in an incremental fashion by means of a stochastic alternating directions methods of multipliers optimization procedure. Then, to achieve temporal adaptation with limited human effort, we exploit a graph-based approach to present the user only the most informative probe-gallery matches that should be used to update the model. Results on three datasets have shown that our approach performs on par or even better than state-of-the-art approaches while reducing the manual pairwise labeling effort by about 80%. △ Less

Submitted 25 July, 2016; originally announced July 2016.

Showing 1–31 of 31 results for author: Micheloni, C