Showing 1–50 of 92 results for author: Kittler, J

  1. arXiv:2406.17460

    cs.CV

    Investigating Self-Supervised Methods for Label-Efficient Learning

    Authors: Srinivasa Rao Nandam, Sara Atito, Zhenhua Feng, Josef Kittler, Muhammad Awais

    Abstract: Vision transformers combined with self-supervised learning have enabled the development of models which scale across large datasets for several downstream tasks like classification, segmentation and detection. The low-shot learning capability of these models, across several low-shot downstream tasks, has been largely underexplored. We perform a system-level study of different self-supervised pret…

    Submitted 25 June, 2024; originally announced June 2024.

  2. arXiv:2406.17450

    cs.CV cs.AI

    Pseudo Labelling for Enhanced Masked Autoencoders

    Authors: Srinivasa Rao Nandam, Sara Atito, Zhenhua Feng, Josef Kittler, Muhammad Awais

    Abstract: Masked Image Modeling (MIM)-based models, such as SdAE, CAE, GreenMIM, and MixAE, have explored different strategies to enhance the performance of Masked Autoencoders (MAE) by modifying prediction, loss functions, or incorporating additional architectural components. In this paper, we propose an enhanced approach that boosts MAE performance by integrating pseudo labelling for both class and data t…

    Submitted 25 June, 2024; originally announced June 2024.

  3. arXiv:2405.05004

    cs.CV

    TENet: Targetness Entanglement Incorporating with Multi-Scale Pooling and Mutually-Guided Fusion for RGB-E Object Tracking

    Authors: Pengcheng Shao, Tianyang Xu, Zhangyong Tang, Linze Li, Xiao-Jun Wu, Josef Kittler

    Abstract: There is currently strong interest in improving visual object tracking by augmenting the RGB modality with the output of a visual event camera that is particularly informative about the scene motion. However, existing approaches perform event feature extraction for RGB-E tracking using traditional appearance models, which have been optimised for RGB only tracking, without adapting it for the intri…

    Submitted 8 May, 2024; originally announced May 2024.

  4. arXiv:2405.00168

    cs.CV

    Revisiting RGBT Tracking Benchmarks from the Perspective of Modality Validity: A New Benchmark, Problem, and Method

    Authors: Zhangyong Tang, Tianyang Xu, Zhenhua Feng, Xuefeng Zhu, He Wang, Pengcheng Shao, Chunyang Cheng, Xiao-Jun Wu, Muhammad Awais, Sara Atito, Josef Kittler

    Abstract: RGBT tracking draws increasing attention due to its robustness in multi-modality warranting (MMW) scenarios, such as nighttime and bad weather, where relying on a single sensing modality fails to ensure stable tracking results. However, the existing benchmarks predominantly consist of videos collected in common scenarios where both RGB and thermal infrared (TIR) information are of sufficient quali…

    Submitted 30 April, 2024; originally announced May 2024.

  5. arXiv:2404.16359

    cs.CV

    An Improved Graph Pooling Network for Skeleton-Based Action Recognition

    Authors: Cong Wu, Xiao-Jun Wu, Tianyang Xu, Josef Kittler

    Abstract: Pooling is a crucial operation in computer vision, yet the unique structure of skeletons hinders the application of existing pooling strategies to skeleton graph modelling. In this paper, we propose an Improved Graph Pooling Network, referred to as IGPN. The main innovations include: Our method incorporates a region-awareness pooling strategy based on structural partitioning. The correlation matri…

    Submitted 25 April, 2024; originally announced April 2024.

  6. arXiv:2404.00509

    cs.LG cs.CV

    DailyMAE: Towards Pretraining Masked Autoencoders in One Day

    Authors: Jiantao Wu, Shentong Mo, Sara Atito, Zhenhua Feng, Josef Kittler, Muhammad Awais

    Abstract: Recently, masked image modeling (MIM), an important self-supervised learning (SSL) method, has drawn attention for its effectiveness in learning data representation from unlabeled data. Numerous studies underscore the advantages of MIM, highlighting how models pretrained on extensive datasets can enhance the performance of downstream tasks. However, the high computational demands of pretraining po…

    Submitted 30 March, 2024; originally announced April 2024.

  7. arXiv:2401.00285

    cs.CV cs.AI

    BusReF: Infrared-Visible images registration and fusion focus on reconstructible area using one set of features

    Authors: Zeyang Zhang, Hui Li, Tianyang Xu, Xiaojun Wu, Josef Kittler

    Abstract: In a scenario where multi-modal cameras are operating together, the problem of working with non-aligned images cannot be avoided. Yet, existing image fusion algorithms rely heavily on strictly registered input image pairs to produce more precise fusion results, as a way to improve the performance of downstream high-level vision tasks. In order to relax this assumption, one can attempt to register…

    Submitted 30 December, 2023; originally announced January 2024.

  8. arXiv:2312.14209

    cs.CV

    TextFusion: Unveiling the Power of Textual Semantics for Controllable Image Fusion

    Authors: Chunyang Cheng, Tianyang Xu, Xiao-Jun Wu, Hui Li, Xi Li, Zhangyong Tang, Josef Kittler

    Abstract: Advanced image fusion methods are devoted to generating the fusion results by aggregating the complementary information conveyed by the source images. However, the difference in the source-specific manifestation of the imaged scene content makes it difficult to design a robust and controllable fusion process. We argue that this issue can be alleviated with the help of higher-level semantics, conve…

    Submitted 8 February, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: v2 version, 13 pages, 16 figures, with the code repository link

    ACM Class: I.4

  9. arXiv:2312.01118

    cs.CV

    Beyond Accuracy: Statistical Measures and Benchmark for Evaluation of Representation from Self-Supervised Learning

    Authors: Jiantao Wu, Shentong Mo, Sara Atito, Josef Kittler, Zhenhua Feng, Muhammad Awais

    Abstract: Recently, self-supervised metric learning has raised attention for the potential to learn a generic distance function. It overcomes the limitations of conventional supervised one, e.g., scalability and label biases. Despite progress in this domain, current benchmarks, incorporating a narrow scope of classes, stop the nuanced evaluation of semantic representations. To bridge this gap, we introduce…

    Submitted 2 December, 2023; originally announced December 2023.

  10. arXiv:2311.16738

    cs.CV

    Riemannian Self-Attention Mechanism for SPD Networks

    Authors: Rui Wang, Xiao-Jun Wu, Hui Li, Josef Kittler

    Abstract: Symmetric positive definite (SPD) matrix has been demonstrated to be an effective feature descriptor in many scientific areas, as it can encode spatiotemporal statistics of the data adequately on a curved Riemannian manifold, i.e., SPD manifold. Although there are many different ways to design network architectures for SPD matrix nonlinear learning, very few solutions explicitly mine the geometric…

    Submitted 28 November, 2023; originally announced November 2023.

    Comments: 14 pages, 10 figures, 5 tables

  11. arXiv:2309.05834

    cs.CV

    SCD-Net: Spatiotemporal Clues Disentanglement Network for Self-supervised Skeleton-based Action Recognition

    Authors: Cong Wu, Xiao-Jun Wu, Josef Kittler, Tianyang Xu, Sara Atito, Muhammad Awais, Zhenhua Feng

    Abstract: Contrastive learning has achieved great success in skeleton-based action recognition. However, most existing approaches encode the skeleton sequences as entangled spatiotemporal representations and confine the contrasts to the same level of representation. Instead, this paper introduces a novel contrastive learning framework, namely Spatiotemporal Clues Disentanglement Network (SCD-Net). Specifica…

    Submitted 11 September, 2023; originally announced September 2023.

  12. arXiv:2309.01728

    cs.CV

    Generative-based Fusion Mechanism for Multi-Modal Tracking

    Authors: Zhangyong Tang, Tianyang Xu, Xuefeng Zhu, Xiao-Jun Wu, Josef Kittler

    Abstract: Generative models (GMs) have received increasing research interest for their remarkable capacity to achieve comprehensive understanding. However, their potential application in the domain of multi-modal tracking has remained relatively unexplored. In this context, we seek to uncover the potential of harnessing generative techniques to address the critical challenge, information fusion, in multi-mo…

    Submitted 30 November, 2023; v1 submitted 4 September, 2023; originally announced September 2023.

  13. arXiv:2308.11448

    cs.CV cs.LG

    Masked Momentum Contrastive Learning for Zero-shot Semantic Understanding

    Authors: Jiantao Wu, Shentong Mo, Muhammad Awais, Sara Atito, Zhenhua Feng, Josef Kittler

    Abstract: Self-supervised pretraining (SSP) has emerged as a popular technique in machine learning, enabling the extraction of meaningful feature representations without labelled data. In the realm of computer vision, pretrained vision transformers (ViTs) have played a pivotal role in advancing transfer learning. Nonetheless, the escalating cost of finetuning these large models has posed a challenge due to…

    Submitted 22 August, 2023; originally announced August 2023.

  14. arXiv:2305.05970

    cs.CV

    FusionBooster: A Unified Image Fusion Boosting Paradigm

    Authors: Chunyang Cheng, Tianyang Xu, Xiao-Jun Wu, Hui Li, Xi Li, Josef Kittler

    Abstract: In recent years, numerous ideas have emerged for designing a mutually reinforcing mechanism or extra stages for the image fusion task, ignoring the inevitable gaps between different vision tasks and the computational burden. We argue that there is a scope to improve the fusion performance with the help of the FusionBooster, a model specifically designed for the fusion task. In particular, our boos…

    Submitted 8 February, 2024; v1 submitted 10 May, 2023; originally announced May 2023.

    Comments: 18 pages; v2, including the code repository

    ACM Class: I.4

  15. arXiv:2304.05172

    cs.CV

    LRRNet: A Novel Representation Learning Guided Fusion Network for Infrared and Visible Images

    Authors: Hui Li, Tianyang Xu, Xiao-Jun Wu, Jiwen Lu, Josef Kittler

    Abstract: Deep learning based fusion methods have been achieving promising performance in image fusion tasks. This is attributed to the network architecture that plays a very important role in the fusion process. However, in general, it is hard to specify a good fusion architecture, and consequently, the design of fusion networks is still a black art, rather than science. To address this problem, we formula…

    Submitted 16 April, 2023; v1 submitted 11 April, 2023; originally announced April 2023.

    Comments: 14 pages, 15 figures, 8 tables

  16. arXiv:2211.13189

    cs.SD cs.CV eess.AS

    ASiT: Local-Global Audio Spectrogram vIsion Transformer for Event Classification

    Authors: Sara Atito, Muhammad Awais, Wenwu Wang, Mark D Plumbley, Josef Kittler

    Abstract: Transformers, which were originally developed for natural language processing, have recently generated significant interest in the computer vision and audio communities due to their flexibility in learning long-range relationships. Constrained by the data hungry nature of transformers and the limited amount of labelled data, most transformer-based models for audio tasks are finetuned from ImageNet…

    Submitted 10 March, 2024; v1 submitted 23 November, 2022; originally announced November 2022.

  17. arXiv:2208.09787

    cs.CV

    RGBD1K: A Large-scale Dataset and Benchmark for RGB-D Object Tracking

    Authors: Xue-Feng Zhu, Tianyang Xu, Zhangyong Tang, Zucheng Wu, Haodong Liu, Xiao Yang, Xiao-Jun Wu, Josef Kittler

    Abstract: RGB-D object tracking has attracted considerable attention recently, achieving promising performance thanks to the symbiosis between visual and depth channels. However, given a limited amount of annotated RGB-D tracking data, most state-of-the-art RGB-D trackers are simple extensions of high-performance RGB-only trackers, without fully exploiting the underlying potential of the depth channel in th…

    Submitted 30 December, 2022; v1 submitted 20 August, 2022; originally announced August 2022.

  18. arXiv:2206.11352

    cs.CV

    Doubly Reparameterized Importance Weighted Structure Learning for Scene Graph Generation

    Authors: Daqi Liu, Miroslaw Bober, Josef Kittler

    Abstract: As a structured prediction task, scene graph generation, given an input image, aims to explicitly model objects and their relationships by constructing a visually-grounded scene graph. In the current literature, such task is universally solved via a message passing neural network based mean field variational Bayesian methodology. The classical loose evidence lower bound is generally chosen as the…

    Submitted 22 June, 2022; originally announced June 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2205.07017

  19. arXiv:2206.07967

    cs.CV

    DreamNet: A Deep Riemannian Network based on SPD Manifold Learning for Visual Classification

    Authors: Rui Wang, Xiao-Jun Wu, Ziheng Chen, Tianyang Xu, Josef Kittler

    Abstract: Image set-based visual classification methods have achieved remarkable performance, via characterising the image set in terms of a non-singular covariance matrix on a symmetric positive definite (SPD) manifold. To adapt to complicated visual scenarios better, several Riemannian networks (RiemNets) for SPD matrix nonlinear processing have recently been studied. However, it is pertinent to ask, whet…

    Submitted 16 June, 2022; originally announced June 2022.

    Comments: 9 pages, 7 figures

  20. arXiv:2205.14986

    cs.CV

    GMML is All you Need

    Authors: Sara Atito, Muhammad Awais, Josef Kittler

    Abstract: Vision transformers have generated significant interest in the computer vision community because of their flexibility in exploiting contextual information, whether it is sharply confined local, or long range global. However, they are known to be data hungry. This has motivated the research in self-supervised transformer pretraining, which does not need to decode the semantic information conveyed b…

    Submitted 30 May, 2022; originally announced May 2022.

  21. arXiv:2205.07017

    cs.CV

    Importance Weighted Structure Learning for Scene Graph Generation

    Authors: Daqi Liu, Miroslaw Bober, Josef Kittler

    Abstract: Scene graph generation is a structured prediction task aiming to explicitly model objects and their relationships via constructing a visually-grounded scene graph for an input image. Currently, the message passing neural network based mean field variational Bayesian methodology is the ubiquitous solution for such a task, in which the variational inference objective is often assumed to be the class…

    Submitted 14 May, 2022; originally announced May 2022.

  22. arXiv:2201.11697

    cs.CV cs.AI

    Constrained Structure Learning for Scene Graph Generation

    Authors: Daqi Liu, Miroslaw Bober, Josef Kittler

    Abstract: As a structured prediction task, scene graph generation aims to build a visually-grounded scene graph to explicitly model objects and their relationships in an input image. Currently, the mean field variational Bayesian framework is the de facto methodology used by the existing methods, in which the unconstrained inference step is often implemented by a message passing neural network. However, suc…

    Submitted 27 January, 2022; originally announced January 2022.

  23. arXiv:2201.10145

    cs.CV

    Riemannian Local Mechanism for SPD Neural Networks

    Authors: Ziheng Chen, Tianyang Xu, Xiao-Jun Wu, Rui Wang, Zhiwu Huang, Josef Kittler

    Abstract: The Symmetric Positive Definite (SPD) matrices have received wide attention for data representation in many scientific areas. Although there are many different attempts to develop effective deep architectures for data processing on the Riemannian manifold of SPD matrices, very few solutions explicitly mine the local geometrical information in deep SPD feature representations. Given the great succe…

    Submitted 19 May, 2023; v1 submitted 25 January, 2022; originally announced January 2022.

  24. arXiv:2201.08673

    cs.CV

    Exploring Fusion Strategies for Accurate RGBT Visual Object Tracking

    Authors: Zhangyong Tang, Tianyang Xu, Hui Li, Xiao-Jun Wu, Xuefeng Zhu, Josef Kittler

    Abstract: We address the problem of multi-modal object tracking in video and explore various options of fusing the complementary information conveyed by the visible (RGB) and thermal infrared (TIR) modalities including pixel-level, feature-level and decision-level fusion. Specifically, different from the existing methods, paradigm of image fusion task is heeded for fusion at pixel level. Feature-level fusio…

    Submitted 21 January, 2022; originally announced January 2022.

    Comments: 13 pages, 10 figures

  25. arXiv:2112.05727

    cs.CV

    Neural Belief Propagation for Scene Graph Generation

    Authors: Daqi Liu, Miroslaw Bober, Josef Kittler

    Abstract: Scene graph generation aims to interpret an input image by explicitly modelling the potential objects and their relationships, which is predominantly solved by the message passing neural network models in previous methods. Currently, such approximation models generally assume the output variables are totally independent and thus ignore the informative structural higher-order interactions. This cou…

    Submitted 10 December, 2021; originally announced December 2021.

  26. arXiv:2111.15340

    cs.CV cs.LG

    MC-SSL0.0: Towards Multi-Concept Self-Supervised Learning

    Authors: Sara Atito, Muhammad Awais, Ammarah Farooq, Zhenhua Feng, Josef Kittler

    Abstract: Self-supervised pretraining is the method of choice for natural language processing models and is rapidly gaining popularity in many vision tasks. Recently, self-supervised pretraining has shown to outperform supervised pretraining for many downstream vision applications, marking a milestone in the area. This superiority is attributed to the negative impact of incomplete labelling of the training…

    Submitted 30 November, 2021; originally announced November 2021.

  27. arXiv:2111.13156

    cs.CV

    Global Interaction Modelling in Vision Transformer via Super Tokens

    Authors: Ammarah Farooq, Muhammad Awais, Sara Ahmed, Josef Kittler

    Abstract: With the popularity of Transformer architectures in computer vision, the research focus has shifted towards developing computationally efficient designs. Window-based local attention is one of the major techniques being adopted in recent works. These methods begin with very small patch size and small embedding dimensions and then perform strided convolution (patch merging) in order to reduce the f…

    Submitted 25 November, 2021; originally announced November 2021.

  28. Benchmarking Quality-Dependent and Cost-Sensitive Score-Level Multimodal Biometric Fusion Algorithms

    Authors: Norman Poh, Thirimachos Bourlai, Josef Kittler, Lorene Allano, Fernando Alonso-Fernandez, Onkar Ambekar, John Baker, Bernadette Dorizzi, Omolara Fatukasi, Julian Fierrez, Harald Ganster, Javier Ortega-Garcia, Donald Maurer, Albert Ali Salah, Tobias Scheidat, Claus Vielhauer

    Abstract: Automatically verifying the identity of a person by means of biometrics is an important application in day-to-day activities such as accessing banking services and security control in airports. To increase the system reliability, several biometric devices are often used. Such a combined system is known as a multimodal biometric system. This paper reports a benchmarking study carried out within the…

    Submitted 17 November, 2021; originally announced November 2021.

    Comments: Published at IEEE Transactions on Information Forensics and Security journal

  29. The Multiscenario Multienvironment BioSecure Multimodal Database (BMDB)

    Authors: Javier Ortega-Garcia, Julian Fierrez, Fernando Alonso-Fernandez, Javier Galbally, Manuel R Freire, Joaquin Gonzalez-Rodriguez, Carmen Garcia-Mateo, Jose-Luis Alba-Castro, Elisardo Gonzalez-Agulla, Enrique Otero-Muras, Sonia Garcia-Salicetti, Lorene Allano, Bao Ly-Van, Bernadette Dorizzi, Josef Kittler, Thirimachos Bourlai, Norman Poh, Farzin Deravi, Ming NR Ng, Michael Fairhurst, Jean Hennebert, Andreas Humm, Massimo Tistarelli, Linda Brodo, Jonas Richiardi, et al. (7 additional authors not shown)

    Abstract: A new multimodal biometric database designed and acquired within the framework of the European BioSecure Network of Excellence is presented. It is comprised of more than 600 individuals acquired simultaneously in three scenarios: 1) over the Internet, 2) in an office environment with desktop PC, and 3) in indoor/outdoor environments with mobile portable hardware. The three scenarios include a comm…

    Submitted 17 November, 2021; originally announced November 2021.

    Comments: Published at IEEE Transactions on Pattern Analysis and Machine Intelligence journal

  30. arXiv:2109.10304

    cs.LG cs.CV

    Learning PAC-Bayes Priors for Probabilistic Neural Networks

    Authors: Maria Perez-Ortiz, Omar Rivasplata, Benjamin Guedj, Matthew Gleeson, Jingyu Zhang, John Shawe-Taylor, Miroslaw Bober, Josef Kittler

    Abstract: Recent works have investigated deep learning models trained by optimising PAC-Bayes bounds, with priors that are learnt on subsets of the data. This combination has been shown to lead not only to accurate classifiers, but also to remarkably tight risk certificates, bearing promise towards self-certified learning (i.e. use all the data to learn a predictor and certify its quality). In this work, we…

    Submitted 21 September, 2021; originally announced September 2021.

  31. MOON: Multi-Hash Codes Joint Learning for Cross-Media Retrieval

    Authors: Donglin Zhang, Xiao-Jun Wu, He-Feng Yin, Josef Kittler

    Abstract: In recent years, cross-media hashing technique has attracted increasing attention for its high computation efficiency and low storage cost. However, the existing approaches still have some limitations, which need to be explored. 1) A fixed hash length (e.g., 16bits or 32bits) is predefined before learning the binary codes. Therefore, these models need to be retrained when the hash length changes,…

    Submitted 17 August, 2021; originally announced September 2021.

  32. UMFA: A photorealistic style transfer method based on U-Net and multi-layer feature aggregation

    Authors: D. Y. Rao, X. J. Wu, H. Li, J. Kittler, T. Y. Xu

    Abstract: In this paper, we propose a photorealistic style transfer network to emphasize the natural effect of photorealistic image stylization. In general, distortion of the image content and lacking of details are two typical issues in the style transfer field. To this end, we design a novel framework employing the U-Net structure to maintain the rich spatial clues, with a multi-layer feature aggregation…

    Submitted 13 August, 2021; originally announced August 2021.

  33. arXiv:2107.13967

    cs.CV

    PPT Fusion: Pyramid Patch Transformer for a Case Study in Image Fusion

    Authors: Yu Fu, TianYang Xu, XiaoJun Wu, Josef Kittler

    Abstract: The Transformer architecture has witnessed a rapid development in recent years, outperforming the CNN architectures in many computer vision tasks, as exemplified by the Vision Transformers (ViT) for image classification. However, existing visual transformer models aim to extract semantic information for high-level tasks, such as classification and detection. These methods ignore the importance of t…

    Submitted 2 August, 2022; v1 submitted 29 July, 2021; originally announced July 2021.

    Comments: We have added more experiments and are improving the article, and will submit it to the journal. It will be updated

  34. arXiv:2104.03602

    cs.CV cs.LG

    SiT: Self-supervised vIsion Transformer

    Authors: Sara Atito, Muhammad Awais, Josef Kittler

    Abstract: Self-supervised learning methods are gaining increasing traction in computer vision due to their recent success in reducing the gap with supervised learning. In natural language processing (NLP) self-supervised learning and transformers are already the methods of choice. The recent literature suggests that the transformers are becoming increasingly popular also in computer vision. So far, the visi…

    Submitted 26 December, 2022; v1 submitted 8 April, 2021; originally announced April 2021.

  35. RFN-Nest: An end-to-end residual fusion network for infrared and visible images

    Authors: Hui Li, Xiao-Jun Wu, Josef Kittler

    Abstract: In the image fusion field, the design of deep learning-based fusion methods is far from routine. It is invariably fusion-task specific and requires a careful consideration. The most difficult part of the design is to choose an appropriate strategy to generate the fused image for a specific task in hand. Thus, devising learnable fusion strategy is a very challenging problem in the community of imag…

    Submitted 14 March, 2021; v1 submitted 7 March, 2021; originally announced March 2021.

    Comments: Accepted by Information Fusion. 17 pages, 18 figures, 8 tables

  36. arXiv:2103.03503

    cs.CV cs.LG

    NPT-Loss: A Metric Loss with Implicit Mining for Face Recognition

    Authors: Syed Safwan Khalid, Muhammad Awais, Chi-Ho Chan, Zhenhua Feng, Ammarah Farooq, Ali Akbari, Josef Kittler

    Abstract: Face recognition (FR) using deep convolutional neural networks (DCNNs) has seen remarkable success in recent years. One key ingredient of DCNN-based FR is the appropriate design of a loss function that ensures discrimination between various identities. The state-of-the-art (SOTA) solutions utilise normalised Softmax loss with additive and/or multiplicative margins. Despite being popular, these Sof…

    Submitted 5 March, 2021; originally announced March 2021.

  37. arXiv:2103.02126

    cs.LG

    Differentiable Neural Architecture Learning for Efficient Neural Network Design

    Authors: Qingbei Guo, Xiao-Jun Wu, Josef Kittler, Zhiquan Feng

    Abstract: Automated neural network design has received ever-increasing attention with the evolution of deep convolutional neural networks (CNNs), especially involving their deployment on embedded and mobile platforms. One of the biggest problems that neural architecture search (NAS) confronts is that a large number of candidate neural architectures are required to train, using, for instance, reinforcement l…

    Submitted 2 March, 2021; originally announced March 2021.

    Comments: 10 pages, 8 figures, 53 conferences

  38. arXiv:2102.10526

    cs.CV eess.IV

    Deep Decomposition Network for Image Processing: A Case Study for Visible and Infrared Image Fusion

    Authors: Yu Fu, Xiao-Jun Wu, Josef Kittler

    Abstract: Image decomposition is a crucial subject in the field of image processing. It can extract salient features from the source image. We propose a new image decomposition method based on convolutional neural network. This method can be applied to many image processing tasks. In this paper, we apply the image decomposition network to the image fusion task. We input infrared image and visible light imag…

    Submitted 3 August, 2022; v1 submitted 21 February, 2021; originally announced February 2021.

  39. arXiv:2101.08238

    cs.CV cs.LG

    AXM-Net: Implicit Cross-Modal Feature Alignment for Person Re-identification

    Authors: Ammarah Farooq, Muhammad Awais, Josef Kittler, Syed Safwan Khalid

    Abstract: Cross-modal person re-identification (Re-ID) is critical for modern video surveillance systems. The key challenge is to align cross-modality representations induced by the semantic information present for a person and ignore background information. This work presents a novel convolutional neural network (CNN) based architecture designed to learn semantically aligned cross-modal visual and textual…

    Submitted 20 July, 2022; v1 submitted 19 January, 2021; originally announced January 2021.

    Comments: AAAI-2022 (Oral Paper)

  40. arXiv:2101.06663

    cs.CV

    Separable Batch Normalization for Robust Facial Landmark Localization with Cross-protocol Network Training

    Authors: Shuangping Jin, Zhenhua Feng, Wankou Yang, Josef Kittler

    Abstract: A big, diverse and balanced training data is the key to the success of deep neural network training. However, existing publicly available datasets used in facial landmark localization are usually much smaller than those for other computer vision tasks. A small dataset without diverse and balanced training samples cannot support the training of a deep network effectively. To address the above issue…

    Submitted 17 January, 2021; originally announced January 2021.

    Comments: 10 pages, 6 figures

  41. arXiv:2010.10368

    cs.CV cs.AI

    A Flatter Loss for Bias Mitigation in Cross-dataset Facial Age Estimation

    Authors: Ali Akbari, Muhammad Awais, Zhen-Hua Feng, Ammarah Farooq, Josef Kittler

    Abstract: Most existing studies in facial age estimation assume training and test images are captured under similar shooting conditions. However, this is rarely valid in real-world applications, where training and test sets usually have different characteristics. In this paper, we advocate a cross-dataset protocol for age estimation benchmarking. In order to improve the cross-dataset age estimation…

    Submitted 26 October, 2020; v1 submitted 20 October, 2020; originally announced October 2020.

  42. arXiv:2009.13803

    cs.LG cs.CV

    Self-grouping Convolutional Neural Networks

    Authors: Qingbei Guo, Xiao-Jun Wu, Josef Kittler, Zhiquan Feng

    Abstract: Although group convolution operators are increasingly used in deep convolutional neural networks to improve the computational efficiency and to reduce the number of parameters, most existing methods construct their group convolution architectures by a predefined partitioning of the filters of each convolutional layer into multiple regular filter groups with an equal spatial group size and data-ind…

    Submitted 29 September, 2020; originally announced September 2020.

  43. Deep Convolutional Neural Network Ensembles using ECOC

    Authors: Sara Atito Ali Ahmed, Cemre Zor, Berrin Yanikoglu, Muhammad Awais, Josef Kittler

    Abstract: Deep neural networks have enhanced the performance of decision making systems in many applications including image understanding, and further gains can be achieved by constructing ensembles. However, designing an ensemble of deep networks is often not very beneficial since the time needed to train the networks is very high or the performance gain obtained is not very significant. In this paper, we…

    Submitted 7 March, 2021; v1 submitted 7 September, 2020; originally announced September 2020.

    Comments: 13 pages double column IEEE transactions style

    MSC Class: 68T07; ACM Class: I.5.2; I.2.0

  44. arXiv:2007.05175  [pdf, other]

    cs.CV

    Affine Non-negative Collaborative Representation Based Pattern Classification

    Authors: He-Feng Yin, Xiao-Jun Wu, Zhen-Hua Feng, Josef Kittler

    Abstract: During the past decade, representation-based classification methods have received considerable attention in pattern recognition. In particular, the recently proposed non-negative representation based classification (NRC) method has been reported to achieve promising results in a wide range of classification tasks. However, NRC has two major drawbacks. First, there is no regularization term in the…

    Submitted 10 July, 2020; originally announced July 2020.

    Comments: submitted to the 25th International Conference on Pattern Recognition (ICPR2020)

  45. arXiv:2005.13708  [pdf, other]

    cs.CV

    AFAT: Adaptive Failure-Aware Tracker for Robust Visual Object Tracking

    Authors: Tianyang Xu, Zhen-Hua Feng, Xiao-Jun Wu, Josef Kittler

    Abstract: Siamese approaches have achieved promising performance in visual object tracking recently. The key to the success of Siamese trackers is to learn appearance-invariant feature embedding functions via pair-wise offline training on large-scale video datasets. However, the Siamese paradigm uses one-shot learning to model the online tracking task, which impedes online adaptation in the tracking process…

    Submitted 27 May, 2020; originally announced May 2020.

  46. arXiv:2003.00808  [pdf, other]

    cs.CV cs.LG stat.ML

    A Convolutional Baseline for Person Re-Identification Using Vision and Language Descriptions

    Authors: Ammarah Farooq, Muhammad Awais, Fei Yan, Josef Kittler, Ali Akbari, Syed Safwan Khalid

    Abstract: Classical person re-identification approaches assume that a person of interest has appeared across different cameras and can be queried by one of the existing images. However, in real-world surveillance scenarios, frequently no visual information will be available about the queried person. In such scenarios, a natural language description of the person by a witness will provide the only source of…

    Submitted 20 February, 2020; originally announced March 2020.

    Comments: 12 pages including references, currently under review in IEEE Transactions on Image Processing

  47. arXiv:1912.11325  [pdf]

    cs.CV

    Adaptive Distraction Context Aware Tracking Based on Correlation Filter

    Authors: Fei Feng, Xiao-Jun Wu, Tianyang Xu, Josef Kittler, Xue-Feng Zhu

    Abstract: The Discriminative Correlation Filter (CF) uses a circulant convolution operation to provide several training samples for the design of a classifier that can distinguish the target from the background. The filter design may be disrupted by objects close to the target during the tracking process, resulting in tracking failure. This paper proposes an adaptive distraction context aware tracking algo…

    Submitted 24 December, 2019; originally announced December 2019.

  48. arXiv:1912.03145  [pdf, other]

    cs.CV

    Face Recognition via Locality Constrained Low Rank Representation and Dictionary Learning

    Authors: He-Feng Yin, Xiao-Jun Wu, Josef Kittler

    Abstract: Face recognition has been widely studied due to its importance in smart city applications. However, the case when both training and test images are corrupted has not been well solved. To address such a problem, this paper proposes a locality constrained low rank representation and dictionary learning (LCLRRDL) algorithm for robust face recognition. In particular, we present three contributions in the…

    Submitted 6 December, 2019; originally announced December 2019.

    Comments: 6 pages, 2 figures

  49. An Accelerated Correlation Filter Tracker

    Authors: Tianyang Xu, Zhen-Hua Feng, Xiao-Jun Wu, Josef Kittler

    Abstract: Recent visual object tracking methods have witnessed a continuous improvement in the state of the art with the development of efficient discriminative correlation filters (DCF) and robust deep neural network features. Despite the outstanding performance achieved by the above combination, existing advanced trackers suffer from the burden of high computational complexity of the deep feature extracti…

    Submitted 5 December, 2019; originally announced December 2019.

    Journal ref: Pattern Recognition 102(2019) 107172

  50. arXiv:1911.10301  [pdf, other]

    cs.CV

    Learning a Representation with the Block-Diagonal Structure for Pattern Classification

    Authors: He-Feng Yin, Xiao-Jun Wu, Josef Kittler, Zhen-Hua Feng

    Abstract: Sparse-representation-based classification (SRC) has been widely studied and developed for various practical signal classification applications. However, the performance of an SRC-based method is degraded when both the training and test data are corrupted. To counteract this problem, we propose an approach that learns Representation with Block-Diagonal Structure (RBDS) for robust image recognition.…

    Submitted 22 November, 2019; originally announced November 2019.

    Comments: accepted by Pattern Analysis and Applications