
arXiv:1911.10301 [pdf, other]

Learning a Representation with the Block-Diagonal Structure for Pattern Classification

Authors: He-Feng Yin, Xiao-Jun Wu, Josef Kittler, Zhen-Hua Feng

Abstract: Sparse-representation-based classification (SRC) has been widely studied and developed for various practical signal classification applications. However, the performance of an SRC-based method degrades when both the training and test data are corrupted. To counteract this problem, we propose an approach that learns a Representation with Block-Diagonal Structure (RBDS) for robust image recognition. To be more specific, we first introduce a regularization term that captures the block-diagonal structure of the target representation matrix of the training data. The resulting problem is then solved by an optimizer. Last, based on the learned representation, a simple yet effective linear classifier is used for the classification task. The experimental results obtained on several benchmarking datasets demonstrate the efficacy of the proposed RBDS method.
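The abstract does not spell out the RBDS regularizer. The snippet below is a minimal sketch of what "block-diagonal structure" means for a representation matrix. It assumes that Z is square, with one row and one column per training sample. It measures the fraction of Z's squared Frobenius energy that falls outside the same-class blocks. The helper names `block_mask` and `off_block_energy` are hypothetical. This is a diagnostic only, not the paper's regularization term.

```python
import numpy as np

def block_mask(labels):
    """Boolean mask that is True where the row and column samples share a class label."""
    labels = np.asarray(labels)
    return labels[:, None] == labels[None, :]

def off_block_energy(Z, labels):
    """Fraction of the squared Frobenius energy of Z lying outside the class blocks.

    A value of 0.0 means Z is perfectly block-diagonal with respect to the labels.
    """
    mask = block_mask(labels)
    total = np.sum(Z ** 2)
    if total == 0:
        return 0.0
    return float(np.sum(Z[~mask] ** 2) / total)

labels = [0, 0, 1, 1]
Z_block = np.array([[1.0, 0.5, 0.0, 0.0],
                    [0.5, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 0.2],
                    [0.0, 0.0, 0.2, 1.0]])
print(off_block_energy(Z_block, labels))  # 0.0
```

A regularizer that drives this quantity toward zero encourages each training sample to be represented mainly by samples of its own class.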

Submitted 22 November, 2019; originally announced November 2019.

Comments: accepted by Pattern Analysis and Applications

arXiv:1909.08797 [pdf, other]

Dual Encoder-Decoder based Generative Adversarial Networks for Disentangled Facial Representation Learning

Authors: Cong Hu, Zhen-Hua Feng, Xiao-Jun Wu, Josef Kittler

Abstract: To learn disentangled representations of facial images, we present a Dual Encoder-Decoder based Generative Adversarial Network (DED-GAN). In the proposed method, both the generator and discriminator are designed with deep encoder-decoder architectures as their backbones. To be more specific, the encoder-decoder structured generator is used to learn a pose-disentangled face representation, and the encoder-decoder structured discriminator is tasked with real/fake classification, face reconstruction, identity determination and face pose estimation. We further improve the proposed network architecture by minimising the additional pixel-wise loss defined by the Wasserstein distance at the output of the discriminator so that the adversarial framework can be better trained. Additionally, we consider face pose variation to be continuous, rather than discrete as in the existing literature, to inject richer pose information into our model. The pose estimation task is formulated as a regression problem, which helps to disentangle identity information from pose variations. The proposed network is evaluated on the tasks of pose-invariant face recognition (PIFR) and face synthesis across poses. An extensive quantitative and qualitative evaluation carried out on several controlled and in-the-wild benchmarking datasets demonstrates the superiority of the proposed DED-GAN method over the state-of-the-art approaches.

Submitted 19 September, 2019; originally announced September 2019.

arXiv:1909.07273 [pdf, other]

More About Covariance Descriptors for Image Set Coding: Log-Euclidean Framework based Kernel Matrix Representation

Authors: Kai-Xuan Chen, Xiao-Jun Wu, Jie-Yi Ren, Rui Wang, Josef Kittler

Abstract: We consider a family of structural descriptors for visual data, namely covariance descriptors (CovDs) that lie on a non-linear symmetric positive definite (SPD) manifold, a special type of Riemannian manifold. We propose an improved version of CovDs for image set coding by extending the traditional CovDs from Euclidean space to the SPD manifold. Specifically, the manifold of SPD matrices is a complete inner product space with the operations of logarithmic multiplication and scalar logarithmic multiplication defined in the Log-Euclidean framework. In this framework, we characterise covariance structure in terms of the arc-cosine kernel, which satisfies Mercer's condition, and propose the operation of mean centralization on SPD matrices. Furthermore, we combine arc-cosine kernels of different orders using mixing parameters learnt by kernel alignment in a supervised manner. Our proposed framework provides a lower-dimensional and more discriminative data representation for the task of image set classification. The experimental results demonstrate its superior performance, measured in terms of recognition accuracy, as compared with the state-of-the-art methods.
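The Log-Euclidean operations named above can be written directly with eigendecompositions:

- logarithmic multiplication: X ⊙ Y = exp(log X + log Y);
- scalar logarithmic multiplication: λ ⊛ X = exp(λ log X).

The sketch below also evaluates arc-cosine kernels of orders 0 and 1, in the closed forms given by Cho and Saul, on vectorised matrix logarithms. It rests on three assumptions. First, applying the kernel to `log(X).ravel()` is our reading of the pipeline. Second, mean centralization is omitted. Third, the mixing weights are fixed by hand, whereas the paper learns them by kernel alignment. The function names are hypothetical.

```python
import numpy as np

def spd_log(X):
    """Matrix logarithm of a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(X)
    return (V * np.log(w)) @ V.T

def spd_exp(S):
    """Matrix exponential of a symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.exp(w)) @ V.T

def log_mul(X, Y):
    """Log-Euclidean logarithmic multiplication: exp(log X + log Y)."""
    return spd_exp(spd_log(X) + spd_log(Y))

def log_scalar_mul(lam, X):
    """Log-Euclidean scalar multiplication: exp(lam * log X)."""
    return spd_exp(lam * spd_log(X))

def arccos_kernel(x, y, order):
    """Arc-cosine kernel of order 0 or 1 between two non-zero vectors."""
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    cos_t = np.clip(x @ y / (nx * ny), -1.0, 1.0)
    theta = np.arccos(cos_t)
    if order == 0:
        return 1.0 - theta / np.pi
    if order == 1:
        return nx * ny * (np.sin(theta) + (np.pi - theta) * cos_t) / np.pi
    raise ValueError("only orders 0 and 1 are implemented here")

def mixed_kernel(X, Y, weights=(0.5, 0.5)):
    """Hypothetical fixed-weight mix of order-0 and order-1 kernels on log-matrices.

    Undefined when log X or log Y is the zero matrix (i.e. X or Y is the identity).
    """
    x, y = spd_log(X).ravel(), spd_log(Y).ravel()
    return sum(w * arccos_kernel(x, y, n) for n, w in enumerate(weights))
```

For commuting (for example diagonal) SPD matrices, `log_mul` reduces to the ordinary matrix product. This gives a quick sanity check.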

Submitted 26 September, 2019; v1 submitted 16 September, 2019; originally announced September 2019.

Comments: 10 pages

arXiv:1908.11821 [pdf, other]

Dual Attention MobDenseNet(DAMDNet) for Robust 3D Face Alignment

Authors: Lei Jiang, Xiao-Jun Wu, Josef Kittler

Abstract: 3D face alignment of monocular images is a crucial process in the recognition of faces with disguise. 3D face reconstruction facilitated by alignment can restore the face structure, which is helpful in detecting disguise interference. This paper proposes a dual attention mechanism and an efficient end-to-end 3D face alignment framework. We build a stable network model through Depthwise Separable Convolution, Densely Connected Convolution and a Lightweight Channel Attention Mechanism. In order to enhance the ability of the network model to extract the spatial features of the face region, we adopt a Spatial Group-wise Feature enhancement module to improve the representation ability of the network. Different loss functions are applied jointly to constrain the 3D parameters of a 3D Morphable Model (3DMM) and its 3D vertices. We use a variety of data augmentation methods and generate large virtual pose face data sets to solve the data imbalance problem. The experiments on the challenging AFLW and AFLW2000-3D datasets show that our algorithm significantly improves the accuracy of 3D face alignment. Our experiments using the in-the-wild DFW dataset show that DAMDNet exhibits excellent performance in the 3D alignment and reconstruction of challenging disguised faces. The model parameters and the complexity of the proposed method are also reduced significantly. The code is publicly available at https://github.com/LeiJiangJNU/DAMDNet
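One generic reason that depthwise separable convolution reduces model parameters is the following. A k × k convolution from C_in to C_out channels needs k·k·C_in·C_out weights. A depthwise k × k convolution followed by a 1 × 1 pointwise convolution needs only k·k·C_in + C_in·C_out. The layer sizes below are hypothetical and are not taken from DAMDNet. Bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a k x k standard convolution (bias ignored)."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv (bias ignored)."""
    return k * k * c_in + c_in * c_out

k, c_in, c_out = 3, 128, 256          # hypothetical layer configuration
std = standard_conv_params(k, c_in, c_out)
sep = depthwise_separable_params(k, c_in, c_out)
print(std, sep, round(std / sep, 2))  # 294912 33920 8.69
```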

Submitted 30 August, 2019; originally announced August 2019.

Comments: 10 pages

Journal ref: ICCV 2019 Workshop

arXiv:1908.01950 [pdf, other]

Multiple Riemannian Manifold-valued Descriptors based Image Set Classification with Multi-Kernel Metric Learning

Authors: Rui Wang, XiaoJun Wu, Josef Kittler

Abstract: The importance of in-the-wild video-based image set recognition is steadily increasing. However, the contents of these collected videos are often complicated, and how to efficiently perform set modeling and feature extraction is a big challenge for set-based classification algorithms. In recent years, some proposed image set classification methods have made considerable advances by modeling the original image set with a covariance matrix, a linear subspace, or a Gaussian distribution. As a matter of fact, most of them adopt just a single geometric model to describe each given image set, which may lose other information useful for classification. To tackle this problem, we propose a novel algorithm to model each image set from a multi-geometric perspective. Specifically, the covariance matrix, linear subspace, and Gaussian distribution are applied for set representation simultaneously. In order to fuse these multiple heterogeneous Riemannian manifold-valued features, well-equipped Riemannian kernel functions are first utilized to map them into high dimensional Hilbert spaces. Then, a multi-kernel metric learning framework is devised to embed the learned hybrid kernels into a lower dimensional common subspace for classification. We conduct experiments on four widely used datasets corresponding to four different classification tasks (video-based face recognition, set-based object categorization, video-based emotion recognition, and dynamic scene classification) to evaluate the classification performance of the proposed algorithm.
Extensive experimental results justify its superiority over the state-of-the-art.
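The abstract does not name the Riemannian kernels it uses, so the following is a minimal multi-kernel sketch under stated assumptions:

- each image set is a (features × frames) matrix;
- the linear-subspace model uses the standard projection kernel ||U₁ᵀU₂||²_F;
- a Gaussian kernel on set means serves as a crude, hypothetical stand-in for a distribution-level kernel;
- the covariance (SPD) kernel is omitted for brevity.

Non-negative weights keep the combined Gram matrix positive semi-definite. Here the weights are fixed by hand rather than learned.

```python
import numpy as np

def subspace_basis(image_set, dim):
    """Orthonormal basis of the top-`dim` linear subspace of a (features x frames) set."""
    U, _, _ = np.linalg.svd(image_set, full_matrices=False)
    return U[:, :dim]

def projection_kernel(U1, U2):
    """Projection kernel between two subspaces: ||U1^T U2||_F^2."""
    return float(np.linalg.norm(U1.T @ U2, "fro") ** 2)

def rbf_mean_kernel(S1, S2, gamma=0.1):
    """Gaussian kernel on set means (hypothetical stand-in for a distribution kernel)."""
    d = S1.mean(axis=1) - S2.mean(axis=1)
    return float(np.exp(-gamma * d @ d))

def combined_gram(sets, dim=2, weights=(0.5, 0.5)):
    """Weighted sum of the two Gram matrices over a list of image sets."""
    bases = [subspace_basis(S, dim) for S in sets]
    n = len(sets)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = (weights[0] * projection_kernel(bases[i], bases[j])
                       + weights[1] * rbf_mean_kernel(sets[i], sets[j]))
    return K

rng = np.random.default_rng(0)
sets = [rng.normal(size=(5, 6)) for _ in range(3)]   # three toy image sets
K = combined_gram(sets)
```

On the diagonal, the projection kernel equals the subspace dimension and the Gaussian kernel equals 1. With the default weights, every diagonal entry is therefore 0.5·2 + 0.5·1 = 1.5.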

Submitted 6 August, 2019; originally announced August 2019.

Comments: 15 pages, 9 figures

arXiv:1907.13242 [pdf, other]

Joint Group Feature Selection and Discriminative Filter Learning for Robust Visual Object Tracking

Authors: Tianyang Xu, Zhen-Hua Feng, Xiao-Jun Wu, Josef Kittler

Abstract: We propose a new Group Feature Selection method for Discriminative Correlation Filters (GFS-DCF) based visual object tracking. The key innovation of the proposed method is to perform group feature selection across both channel and spatial dimensions, thus to pinpoint the structural relevance of multi-channel features to the filtering system. In contrast to the widely used spatial regularisation or feature selection methods, to the best of our knowledge, this is the first time that channel selection has been advocated for DCF-based tracking. We demonstrate that our GFS-DCF method is able to significantly improve the performance of a DCF tracker equipped with deep neural network features. In addition, our GFS-DCF enables joint feature selection and filter learning, achieving enhanced discrimination and interpretability of the learned filters. To further improve the performance, we adaptively integrate historical information by constraining filters to be smooth across temporal frames, using an efficient low-rank approximation. By design, specific temporal-spatial-channel configurations are dynamically learned in the tracking process, highlighting the relevant features, and alleviating the performance degrading impact of less discriminative representations and reducing information redundancy. The experimental results obtained on OTB2013, OTB2015, VOT2017, VOT2018 and TrackingNet demonstrate the merits of our GFS-DCF and its superiority over the state-of-the-art trackers. The code is publicly available at https://github.com/XU-TIANYANG/GFS-DCF.
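Group feature selection over channels is commonly driven by a group-lasso (ℓ2,1) penalty. Its proximal operator shrinks each channel's filter as a whole and zeros out weak channels entirely. The sketch below shows that generic step on the channel dimension only. It is not the GFS-DCF solver. The spatial grouping and the temporal low-rank term are not included, and the function names are hypothetical.

```python
import numpy as np

def channel_group_shrink(filters, lam):
    """Proximal step for lam * sum_c ||filters[:, :, c]||_2 (group lasso over channels).

    filters has shape (height, width, channels). Each channel is rescaled by
    max(0, 1 - lam / ||channel||_2), so low-energy channels are set to zero.
    """
    norms = np.sqrt(np.sum(filters ** 2, axis=(0, 1)))            # one norm per channel
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))  # guard against zero norms
    return filters * scale                                          # broadcasts over channel axis

def selected_channels(filters):
    """Indices of channels that survive shrinkage."""
    return np.flatnonzero(np.any(filters != 0, axis=(0, 1)))

filters = np.stack([np.full((2, 2), 2.0),    # channel norm 4.0
                    np.full((2, 2), 0.25),   # channel norm 0.5
                    np.full((2, 2), 1.0)],   # channel norm 2.0
                   axis=-1)
shrunk = channel_group_shrink(filters, lam=1.0)
print(selected_channels(shrunk))  # [0 2]
```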

Submitted 2 August, 2019; v1 submitted 30 July, 2019; originally announced July 2019.

arXiv:1905.09173 [pdf, other]

Multi-Task Kernel Null-Space for One-Class Classification

Authors: Shervin Rahimzadeh Arashloo, Josef Kittler

Abstract: One-class kernel spectral regression (OC-KSR), the regression-based formulation of the kernel null-space approach, has been found to be an effective Fisher criterion-based methodology for one-class classification (OCC), achieving state-of-the-art performance in one-class classification while providing relatively high robustness against data corruption. This work extends the OC-KSR methodology to a multi-task setting where multiple one-class problems share information for improved performance. By viewing the multi-task structure learning problem as one of compositional function learning, first, the OC-KSR method is extended to learn multiple tasks' structure linearly by posing it as an instantiation of the separable kernel learning problem in a vector-valued reproducing kernel Hilbert space, where an output kernel encodes tasks' structure while another kernel captures input similarities. Next, a non-linear structure learning mechanism is proposed which captures multiple tasks' relationships non-linearly via an output kernel. The non-linear structure learning method is then extended to a sparse setting where different tasks compete in an output composition mechanism, leading to a sparse non-linear structure among multiple problems. Through extensive experiments on different data sets, the merits of the proposed multi-task kernel null-space techniques are verified against the baseline as well as other existing multi-task one-class learning techniques.

Submitted 22 May, 2019; originally announced May 2019.

arXiv:1905.05445 [pdf, other]

Transition Subspace Learning based Least Squares Regression for Image Classification

Authors: Zhe Chen, Xiao-Jun Wu, Josef Kittler

Abstract: Learning only a single projection matrix from the original samples to the corresponding binary labels is too strict and will consequently lose some intrinsic geometric structures of the data. In this paper, we propose a novel transition subspace learning based least squares regression (TSL-LSR) model for multicategory image classification. The main idea of TSL-LSR is to learn a transition subspace between the original samples and binary labels to alleviate the problem of overfitting caused by strict projection learning. Moreover, in order to reflect the underlying low-rank structure of the transition matrix and learn a more discriminative projection matrix, a low-rank constraint is added to the transition subspace. Experimental results on several image datasets demonstrate the effectiveness of the proposed TSL-LSR model in comparison with state-of-the-art algorithms.

Submitted 14 June, 2019; v1 submitted 14 May, 2019; originally announced May 2019.

arXiv:1903.07836 [pdf, other]

Non-negative representation based discriminative dictionary learning for face recognition

Authors: Zhe Chen, Xiao-Jun Wu, Josef Kittler

Abstract: In this paper, we propose a non-negative representation based discriminative dictionary learning algorithm (NRDL) for multicategory face classification. In contrast to traditional dictionary learning methods, NRDL investigates the use of non-negative representation (NR), which contributes to learning discriminative dictionary atoms. In order to make the learned dictionary more suitable for classification, NRDL seamlessly incorporates a non-negative representation constraint, discriminative dictionary learning and linear classifier training into a unified model. Specifically, NRDL introduces a non-negativity constraint on the representation matrix to find distinct atoms from heterogeneous training samples, which results in a sparse and discriminative representation. Moreover, a discriminative dictionary encouraging function is proposed to enhance the uniqueness of class-specific sub-dictionaries. Meanwhile, an inter-class incoherence constraint and a compact graph based regularization term are constructed to improve the discriminability of the learned classifier. Experimental results on several benchmark face data sets verify the advantages of our NRDL algorithm over the state-of-the-art dictionary learning methods.
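The non-negative representation idea can be illustrated without dictionary learning. The sketch below codes a test sample over a fixed dictionary of training samples under a ≥ 0 constraint, using projected gradient descent. It then assigns the class whose atoms give the smallest reconstruction residual. This residual rule is borrowed from SRC-style classifiers and is a swapped-in simplification. NRDL itself learns the dictionary and a linear classifier jointly. The function names are hypothetical.

```python
import numpy as np

def nonneg_code(D, y, iters=500):
    """Minimise 0.5 * ||y - D a||^2 subject to a >= 0 by projected gradient descent."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2     # 1 / Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(iters):
        grad = D.T @ (D @ a - y)
        a = np.maximum(0.0, a - step * grad)   # gradient step, then project onto a >= 0
    return a

def classify(D, atom_labels, y):
    """Return the class whose atoms best reconstruct y from its non-negative code."""
    a = nonneg_code(D, y)
    atom_labels = np.asarray(atom_labels)
    classes = np.unique(atom_labels)
    residuals = [np.linalg.norm(y - D[:, atom_labels == c] @ a[atom_labels == c])
                 for c in classes]
    return classes[int(np.argmin(residuals))]

D = np.eye(4)                  # four training samples used directly as atoms
atom_labels = [0, 0, 1, 1]
print(classify(D, atom_labels, np.array([0.9, 0.8, 0.1, 0.0])))  # 0
```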

Submitted 28 September, 2019; v1 submitted 19 March, 2019; originally announced March 2019.

arXiv:1903.07833 [pdf, other]

Fisher Discriminative Least Squares Regression for Image Classification

Authors: Zhe Chen, Xiao-Jun Wu, Josef Kittler

Abstract: Discriminative least squares regression (DLSR) has been shown to achieve promising performance in multi-class image classification tasks. Its key idea is to force the regression labels of different classes to move in opposite directions by means of the ε-draggings technique, yielding a discriminative regression model exhibiting wider margins. However, the ε-draggings technique ignores an important problem: its non-negative relaxation matrix is dynamically updated in optimization, which means the dragging values can also cause the labels from the same class to be uncorrelated. In order to learn a more powerful discriminative projection, as well as regression labels, we propose a Fisher regularized DLSR (FDLSR) framework that constrains the relaxed labels using the Fisher criterion. On the one hand, the Fisher criterion improves the intra-class compactness of the relaxed labels during relaxation learning. On the other hand, it is expected to further enhance the inter-class separability of the ε-draggings technique. FDLSR is the first attempt to integrate the Fisher discriminant criterion and the ε-draggings technique into one unified model, as the two are complementary in learning a discriminative projection. Extensive experiments on various datasets demonstrate that the proposed FDLSR method achieves performance that is superior to other state-of-the-art classification methods. The Matlab codes of this paper are available at https://github.com/chenzhe207/FDLSR.
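The ε-dragging relaxation of baseline DLSR alternates between two steps:

1. Ridge regression onto relaxed targets T = Y + B ⊙ M, where B is +1 for the true class and −1 otherwise, and M ≥ 0.
2. The closed-form update M = max(B ⊙ (XW − Y), 0).

The sketch below covers only that baseline. The Fisher term that FDLSR adds is not included. The separate bias vector is replaced by a constant feature column, which is an assumption made for brevity. The function names are hypothetical.

```python
import numpy as np

def dlsr_fit(X, labels, n_classes, lam=1e-2, iters=30):
    """Alternating minimisation of ||X W - (Y + B*M)||^2 + lam * ||W||^2 with M >= 0."""
    n, d = X.shape
    Y = np.eye(n_classes)[labels]          # one-hot targets, shape (n, c)
    B = np.where(Y == 1, 1.0, -1.0)        # dragging directions
    M = np.zeros_like(Y)                   # non-negative dragging magnitudes
    A = X.T @ X + lam * np.eye(d)
    for _ in range(iters):
        T = Y + B * M                      # relaxed regression targets
        W = np.linalg.solve(A, X.T @ T)    # ridge regression onto relaxed targets
        M = np.maximum(B * (X @ W - Y), 0.0)
    return W, M

def dlsr_predict(X, W):
    """Predicted class = largest regression output."""
    return np.argmax(X @ W, axis=1)

X = np.array([[1.0, 0.0, 1.0],
              [0.9, 0.1, 1.0],
              [0.0, 1.0, 1.0],
              [0.1, 0.9, 1.0]])   # last column acts as a bias feature
labels = np.array([0, 0, 1, 1])
W, M = dlsr_fit(X, labels, n_classes=2)
print(dlsr_predict(X, W))  # [0 0 1 1]
```

The M update is exact for fixed W. Because each entry of B is ±1, the elementwise cost (Bᵢⱼ(XW − Y)ᵢⱼ − Mᵢⱼ)² is minimised over Mᵢⱼ ≥ 0 by clipping at zero.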

Submitted 4 August, 2020; v1 submitted 19 March, 2019; originally announced March 2019.

arXiv:1903.07832 [pdf, other]

Low-Rank Discriminative Least Squares Regression for Image Classification

Authors: Zhe Chen, Xiao-Jun Wu, Josef Kittler

Abstract: Recent least squares regression (LSR) methods mainly try to learn slack regression targets to replace strict zero-one labels. However, enlarging the distance between different classes can also magnify the differences among intra-class targets, and roughly pursuing relaxed targets may lead to the problem of overfitting. To solve the above problems, we propose a low-rank discriminative least squares regression model (LRDLSR) for multi-class image classification. Specifically, LRDLSR imposes a class-wise low-rank constraint on the intra-class regression targets to encourage their compactness and similarity. Moreover, LRDLSR introduces an additional regularization term on the learned targets to avoid the problem of overfitting. These two improvements help to learn a more discriminative projection for regression and thus achieve better classification performance. Experimental results over a range of image databases demonstrate the effectiveness of the proposed LRDLSR method.
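A class-wise low-rank constraint is typically relaxed to a nuclear-norm penalty. That penalty is handled by singular value thresholding (SVT), the proximal operator of τ‖·‖_*. The sketch below applies SVT separately to the target rows of each class. This is an assumption about how such a constraint could be enforced inside an iterative solver. It is not the LRDLSR update, and the function names are hypothetical.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: proximal operator of tau * ||M||_* (nuclear norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def classwise_low_rank(T, labels, tau):
    """Apply SVT separately to the rows of the target matrix T belonging to each class."""
    labels = np.asarray(labels)
    T_new = T.astype(float).copy()
    for c in np.unique(labels):
        rows = labels == c
        T_new[rows] = svt(T_new[rows], tau)
    return T_new

T = np.array([[1.0, 0.2],
              [0.9, 0.1],
              [0.1, 1.0]])      # toy regression targets, one row per sample
T_low = classwise_low_rank(T, [0, 0, 1], tau=0.1)
```

Larger values of τ remove more singular values, so each class's block of targets is pulled toward a lower-rank, more compact configuration.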

Submitted 8 October, 2019; v1 submitted 19 March, 2019; originally announced March 2019.

arXiv:1903.05434 [pdf, ps, other]

Visual Semantic Information Pursuit: A Survey

Authors: Daqi Liu, Miroslaw Bober, Josef Kittler

Abstract: Visual semantic information comprises two important parts: the meaning of each visual semantic unit and the coherent visual semantic relation conveyed by these visual semantic units. Essentially, the former one is a visual perception task while the latter one corresponds to visual context reasoning. Remarkable advances in visual perception have been achieved due to the success of deep learning. In contrast, visual semantic information pursuit, a visual scene semantic interpretation task combining visual perception and visual context reasoning, is still in its early stage. It is the core task of many different computer vision applications, such as object detection, visual semantic segmentation, visual relationship detection or scene graph generation. Since it helps to enhance the accuracy and the consistency of the resulting interpretation, visual context reasoning is often incorporated with visual perception in current deep end-to-end visual semantic information pursuit methods. However, a comprehensive review for this exciting area is still lacking. In this survey, we present a unified theoretical paradigm for all these methods, followed by an overview of the major developments and the future trends in each potential direction. The common benchmark datasets, the evaluation metrics and the comparisons of the corresponding methods are also introduced.

Submitted 13 March, 2019; originally announced March 2019.

Comments: Preliminary work. Under review by IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI). Do not distribute

arXiv:1902.02208 [pdf, other]

Robust One-Class Kernel Spectral Regression

Authors: Shervin Rahimzadeh Arashloo, Josef Kittler

Abstract: The kernel null-space technique and its regression-based formulation (called one-class kernel spectral regression, a.k.a. OC-KSR) is known to be an effective and computationally attractive one-class classification framework. Despite its outstanding performance, the applicability of the kernel null-space method is limited due to its susceptibility to possible training data corruptions and its inability to rank training observations according to their conformity with the model. This work addresses these shortcomings by studying the effect of regularising the solution of the null-space kernel Fisher methodology in the context of its regression-based formulation (OC-KSR). In this respect, first, the effect of a Tikhonov regularisation in the Hilbert space is analysed, where the one-class learning problem in the presence of contaminations in the training set is posed as a sensitivity analysis problem. Next, driven by the success of the sparse representation methodology, the effect of a sparsity regularisation on the solution is studied. For both alternative regularisation schemes, iterative algorithms are proposed which recursively update label confidences and rank training observations based on their fit with the model.
Through extensive experiments conducted on different data sets, the proposed methodology is found to enhance robustness against contamination in the training set as compared with the baseline kernel null-space technique as well as other existing approaches in a one-class classification paradigm, while providing the functionality to rank training samples effectively.

Submitted 6 February, 2019; originally announced February 2019.

arXiv:1812.07660 [pdf]

Discriminative Supervised Hashing for Cross-Modal Similarity Search

Authors: Jun Yu, Xiao-Jun Wu, Josef Kittler

Abstract: With the advantage of low storage cost and high retrieval efficiency, hashing techniques have recently been an emerging topic in cross-modal similarity search. As multiple modal data reflect similar semantic content, many studies aim at learning unified binary codes. However, the hashing features learned by these methods are not sufficiently discriminative, which results in lower accuracy and robustness. We propose a novel hashing learning framework, termed Discriminative Supervised Hashing (DSH), which jointly performs classifier learning, subspace learning and matrix factorization to preserve class-specific semantic content and to learn discriminative unified binary codes for multi-modal data. Besides, to reduce the loss of information and preserve the non-linear structure of the data, DSH non-linearly projects different modalities into a common space in which the similarity among heterogeneous data points can be measured. Extensive experiments conducted on three publicly available datasets demonstrate that the framework proposed in this paper outperforms several state-of-the-art methods.
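Once modalities are mapped into a common space, retrieval with binary codes reduces to two steps. First, take the sign of the projected features. Second, rank database items by Hamming distance. The sketch below shows only that generic retrieval stage. The projection `P` is a hypothetical placeholder for what DSH would learn jointly with its classifier and matrix factorization, and DSH's non-linear projection is not modelled. For ±1 codes of length r, the Hamming distance equals (r − ⟨b_q, b_i⟩) / 2.

```python
import numpy as np

def to_codes(features, P):
    """Project features (n x d) with P (d x bits) and binarise with sign into {-1, +1}."""
    return np.where(features @ P >= 0, 1, -1).astype(np.int8)

def hamming_rank(query_code, db_codes):
    """Database indices sorted by Hamming distance to the query, plus the distances."""
    bits = db_codes.shape[1]
    # Cast to int before the dot product to avoid int8 overflow on long codes.
    dist = (bits - db_codes.astype(int) @ query_code.astype(int)) // 2
    return np.argsort(dist, kind="stable"), dist

db = np.array([[1, 1, 1, 1], [1, -1, 1, 1], [-1, -1, -1, -1]], dtype=np.int8)
order, dist = hamming_rank(np.array([1, 1, 1, -1], dtype=np.int8), db)
print(order, dist)  # [0 1 2] [1 2 3]
```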

Submitted 17 April, 2019; v1 submitted 5 December, 2018; originally announced December 2018.

Comments: 7 pages, 3 figures, 4 tables; the paper is under consideration at Image and Vision Computing

arXiv:1811.05295 [pdf]

Pose Invariant 3D Face Reconstruction

Authors: Lei Jiang, XiaoJun Wu, Josef Kittler

Abstract: 3D face reconstruction is an important task in the field of computer vision. Although 3D face reconstruction has been developing rapidly in recent years, reconstruction under large poses is still a challenge, because much of the information about a face in a large pose is unobservable. In order to address this issue, this paper proposes a novel 3D face reconstruction algorithm (PIFR) based on the 3D Morphable Model (3DMM). Given a single input face image, it generates a frontal image by normalizing the input. The 3D parameters of the two images are then combined by a weighted sum. Our method overcomes the difficulty that traditional single-image reconstruction methods face under large poses, works on arbitrary poses and expressions, and greatly improves the accuracy of reconstruction. Experiments on the challenging AFW, LFPW and AFLW databases show that our algorithm significantly improves the accuracy of 3D face reconstruction even under extreme poses.

Submitted 13 November, 2018; originally announced November 2018.

Comments: 8 pages

arXiv:1811.02291 [pdf, other]

doi 10.1109/TIP.2020.2975984

MDLatLRR: A novel decomposition method for infrared and visible image fusion

Authors: Hui Li, Xiao-Jun Wu, Josef Kittler

Abstract: Image decomposition is crucial for many image processing tasks, as it allows salient features to be extracted from source images. A good image decomposition method can lead to better performance, especially in image fusion tasks. We propose a multi-level image decomposition method based on latent low-rank representation (LatLRR), which is called MDLatLRR. This decomposition method is applicable to many image processing fields. In this paper, we focus on the image fusion task. We develop a novel image fusion framework based on MDLatLRR, which is used to decompose source images into detail parts (salient features) and base parts. A nuclear-norm based fusion strategy is used to fuse the detail parts, and the base parts are fused by an averaging strategy. Compared with other state-of-the-art fusion methods, the proposed algorithm exhibits better fusion performance in both subjective and objective evaluation.
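The two fusion rules named above can be sketched once a decomposition into base and detail parts is available. The sketch assumes the following:

- the MDLatLRR decomposition itself is not reproduced; the base and detail images are taken as given inputs;
- detail images are fused over non-overlapping patches;
- each patch is weighted by its share of the patch-pair nuclear norm.

This patch weighting is our assumption about how a "nuclear-norm based" rule can be realised. It is not the paper's exact procedure, and the function names are hypothetical.

```python
import numpy as np

def fuse_detail(d1, d2, patch=8):
    """Fuse two detail images patch by patch, weighting each patch by its nuclear norm."""
    out = np.zeros_like(d1, dtype=float)
    h, w = d1.shape
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            p1 = d1[r:r + patch, c:c + patch]
            p2 = d2[r:r + patch, c:c + patch]
            n1 = np.linalg.norm(p1, "nuc")
            n2 = np.linalg.norm(p2, "nuc")
            total = n1 + n2
            w1 = 0.5 if total == 0 else n1 / total   # equal weights for two empty patches
            out[r:r + patch, c:c + patch] = w1 * p1 + (1.0 - w1) * p2
    return out

def fuse(base1, base2, detail1, detail2, patch=8):
    """Average the base parts, fuse the detail parts, and add the two results."""
    return 0.5 * (base1 + base2) + fuse_detail(detail1, detail2, patch)

d1, d2 = np.ones((8, 8)), np.zeros((8, 8))
print(np.allclose(fuse_detail(d1, d2), d1))  # True
```

When one source contributes no detail in a patch, that patch of the fused detail image is taken entirely from the other source.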

Submitted 23 March, 2020; v1 submitted 6 November, 2018; originally announced November 2018.

Comments: IEEE Trans. Image Processing 2020, 14 pages, 17 figures, 3 tables. arXiv admin note: text overlap with arXiv:1804.08992

arXiv:1808.05399 [pdf]

Landmark Weighting for 3DMM Shape Fitting

Authors: Yu Yanga, Xiao-Jun Wu, Josef Kittler

Abstract: The human face is a 3D object with shape and surface texture. The 3D Morphable Model (3DMM) is a powerful tool for reconstructing the 3D face from a single 2D face image. In the shape fitting process, 3DMM estimates the correspondence between 2D and 3D landmarks. Most traditional 3DMM fitting methods fail to reconstruct an accurate model because face shape fitting is a difficult non-linear optimization problem. In this paper we show that landmark weighting is instrumental in improving the accuracy of shape reconstruction, and we propose a novel 3D Morphable Model fitting method. Different from previous works that treat all landmarks equally, we take into consideration the estimated errors for each pair of 2D and 3D corresponding landmarks. The landmark points are weighted in the optimization cost function based on these errors. The landmarks have different semantics because they are located on different facial components. Since the fitting solution is only approximate, there are deviations in landmark matching, and landmarks with different semantics affect the 3D face reconstruction differently. Thus, it is necessary to consider each landmark individually. To our knowledge, we are the first to analyze each feature point for 3D face reconstruction by 3DMM. The weights adapt to the estimation residuals of the landmarks. Experimental results show that the proposed method significantly reduces the reconstruction error and improves the authenticity of the 3D model expression.
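Residual-adaptive landmark weighting can be sketched as iteratively reweighted ridge regression on a linearised fitting problem. The sketch makes these assumptions:

- the camera projection is already known, so the projected landmarks are linear in the shape coefficients, y ≈ A α + b;
- the Cauchy-style weight 1 / (1 + (r/σ)²) is a hypothetical choice;
- `fit_shape` and its parameters are illustrative names.

The paper's actual weighting rule and optimiser are not given in the abstract.

```python
import numpy as np

def fit_shape(A, b, y, n_iter=10, sigma=1.0, lam=1e-6):
    """Iteratively reweighted ridge fit of shape coefficients alpha.

    A: (2L, k) projected shape basis at L landmarks; b: (2L,) projected mean shape;
    y: (2L,) detected 2D landmarks stored as [x1, y1, x2, y2, ...].
    """
    L = y.size // 2
    k = A.shape[1]
    w = np.ones(L)                                   # one weight per landmark
    for _ in range(n_iter):
        Wd = np.repeat(w, 2)                         # same weight for x and y coordinates
        AtW = A.T * Wd                               # A^T diag(Wd)
        alpha = np.linalg.solve(AtW @ A + lam * np.eye(k), AtW @ (y - b))
        r = (A @ alpha + b - y).reshape(L, 2)
        res = np.linalg.norm(r, axis=1)              # per-landmark 2D residual
        w = 1.0 / (1.0 + (res / sigma) ** 2)         # hypothetical residual-adaptive weights
    return alpha, w

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))                         # 10 landmarks, 3 shape coefficients
b = rng.normal(size=20)
alpha_true = np.array([0.5, -1.0, 2.0])
alpha, w = fit_shape(A, b, A @ alpha_true + b)
```

Landmarks with large residuals receive small weights in the next solve. This mirrors the idea of treating each landmark individually rather than equally.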

Submitted 16 August, 2018; originally announced August 2018.

Comments: 7 pages, 7 figures
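As a rough illustration of residual-based landmark weighting, the sketch below computes a weighted 2D reprojection cost in which each landmark pair's weight is derived from its current residual. The inverse-residual weighting rule, the `eps` constant and all function names are assumptions made for this example; the abstract does not state the paper's exact weighting formula.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def residuals(detected: List[Point], projected: List[Point]) -> List[float]:
    """Euclidean distance between each detected 2D landmark and its projected 3D counterpart."""
    return [math.dist(d, p) for d, p in zip(detected, projected)]


def landmark_weights(res: List[float], eps: float = 1e-3) -> List[float]:
    """Hypothetical rule: weight inversely proportional to the residual, normalised to sum to 1."""
    raw = [1.0 / (r + eps) for r in res]
    total = sum(raw)
    return [w / total for w in raw]


def weighted_cost(detected: List[Point], projected: List[Point], weights: List[float]) -> float:
    """Weighted sum of squared reprojection errors, used as the fitting cost."""
    return sum(w * r * r for w, r in zip(weights, residuals(detected, projected)))
```

In an iterative fitter, the weights would be recomputed from the residuals after each shape update, so that poorly matched landmarks contribute less to the next step; that loop and the 3DMM projection itself are omitted here.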

arXiv:1808.04152 [pdf, other]

Learning Discriminative Hashing Codes for Cross-Modal Retrieval based on Multi-view Features

Authors: Jun Yu, Xiao-Jun Wu, Josef Kittler

Abstract: Hashing techniques have been applied broadly in retrieval tasks due to their low storage requirements and high speed of processing. Many hashing methods based on a single view have been extensively studied for information retrieval. However, the representation capacity of a single view is insufficient and some discriminative information is not captured, which results in limited improvement. In this paper, we employ multiple views to represent images and texts for enriching the feature information. Our framework exploits the complementary information among multiple views to better learn the discriminative compact hash codes. A discrete hashing learning framework that jointly performs classifier learning and subspace learning is proposed to complete multiple search tasks simultaneously. Our framework includes two stages, namely a kernelization process and a quantization process. Kernelization aims to find a common subspace where multi-view features can be fused. The quantization stage is designed to learn discriminative unified hashing codes. Extensive experiments are performed on single-label datasets (WiKi and MMED) and multi-label datasets (MIRFlickr and NUS-WIDE) and the experimental results indicate the superiority of our method compared with the state-of-the-art methods.

Submitted 6 January, 2020; v1 submitted 13 August, 2018; originally announced August 2018.

Comments: 28 pages, 10 figures, 13 tables. The paper is under consideration at Pattern Analysis and Applications
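The quantization stage maps real-valued features in the common subspace to binary codes. A minimal sketch, assuming a given (hypothetical) projection matrix `W` and plain sign quantization, shows how codes are produced and how retrieval ranks database items by Hamming distance. The paper's kernelization and its discrete learning of the projection are not reproduced here.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # one row per hash bit


def hash_code(x: Vector, W: Matrix) -> List[int]:
    """Sign-quantise the projection W @ x into a code with entries in {-1, +1}."""
    return [1 if sum(wi * xi for wi, xi in zip(row, x)) >= 0 else -1 for row in W]


def hamming(a: List[int], b: List[int]) -> int:
    """Number of bit positions in which two codes differ."""
    return sum(1 for ai, bi in zip(a, b) if ai != bi)


def rank_database(query: List[int], database: List[List[int]]) -> List[int]:
    """Indices of database codes sorted by increasing Hamming distance to the query."""
    return sorted(range(len(database)), key=lambda i: hamming(query, database[i]))
```

Because codes from different views are compared in the same Hamming space, an image code can be ranked against text codes directly, which is what makes compact binary codes cheap to store and fast to search.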

arXiv:1807.11348 [pdf, other]

doi 10.1109/TIP.2019.2919201

Learning Adaptive Discriminative Correlation Filters via Temporal Consistency Preserving Spatial Feature Selection for Robust Visual Tracking

Authors: Tianyang Xu, Zhen-Hua Feng, Xiao-Jun Wu, Josef Kittler

Abstract: With efficient appearance learning models, the Discriminative Correlation Filter (DCF) has proven to be very successful in recent video object tracking benchmarks and competitions. However, the existing DCF paradigm suffers from two major issues, namely the spatial boundary effect and temporal filter degradation. To mitigate these challenges, we propose a new DCF-based tracking method. The key innovations of the proposed method include adaptive spatial feature selection and temporal consistency constraints, with which the new tracker enables joint spatial-temporal filter learning in a lower dimensional discriminative manifold. More specifically, we apply structured spatial sparsity constraints to multi-channel filters. Consequently, the process of learning spatial filters can be approximated by the lasso regularisation. To encourage temporal consistency, the filter model is restricted to lie around its historical value and updated locally to preserve the global structure in the manifold. Last, a unified optimisation framework is proposed to jointly select temporal consistency preserving spatial features and learn discriminative filters with the augmented Lagrangian method. Qualitative and quantitative evaluations have been conducted on a number of well-known benchmarking datasets such as OTB2013, OTB50, OTB100, Temple-Colour, UAV123 and VOT2018. The experimental results demonstrate the superiority of the proposed method over the state-of-the-art approaches.

Submitted 19 June, 2019; v1 submitted 30 July, 2018; originally announced July 2018.

Journal ref: IEEE Transactions on Image Processing, 2019
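The structured spatial sparsity described in the abstract above groups, for every spatial location, the coefficients of all feature channels, so that a whole location is either kept or discarded. One common way to handle such a penalty inside an augmented-Lagrangian (ADMM-style) solver is the group soft-thresholding proximal step sketched below. This is an assumption about how the lasso-type subproblem could be solved, not the paper's exact update; the layout `filters[location][channel]` and the threshold `lam` are illustrative.

```python
import math
from typing import List


def group_soft_threshold(filters: List[List[float]], lam: float) -> List[List[float]]:
    """Proximal operator of lam * sum_j ||f_j||_2, where f_j holds all channels at location j.

    Each location's channel vector is shrunk towards zero by lam in l2 norm;
    locations whose norm does not exceed lam are set exactly to zero (feature deselected).
    """
    out = []
    for group in filters:
        norm = math.sqrt(sum(v * v for v in group))
        scale = max(0.0, 1.0 - lam / norm) if norm > 0 else 0.0
        out.append([scale * v for v in group])
    return out
```

Shrinking whole channel vectors, rather than individual coefficients, is what turns the sparsity penalty into spatial feature selection: a location either survives in every channel or is switched off in all of them.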

arXiv:1807.01085 [pdf, other]

One-Class Kernel Spectral Regression

Authors: Shervin Rahimzadeh Arashloo, Josef Kittler

Abstract: The paper introduces a new efficient nonlinear one-class classifier formulated as a Rayleigh quotient criterion optimisation. The method, operating in a reproducing kernel Hilbert space, minimises the scatter of the target distribution along an optimal projection direction while at the same time keeping projections of positive observations distant from the mean of the negative class. We provide a graph embedding view of the problem which can then be solved efficiently using the spectral regression approach. In this sense, unlike previous similar methods which often require costly eigen-computations of dense matrices, the proposed approach casts the problem under consideration into a regression framework which is computationally more efficient. In particular, it is shown that the dominant complexity of the proposed method is the complexity of computing the kernel matrix. Additional appealing characteristics of the proposed one-class classifier are: 1) the ability to be trained in an incremental fashion (allowing for application in streaming data scenarios while also reducing the computational complexity in a non-streaming operation mode); 2) being unsupervised, but providing the option for refining the solution using negative training examples, when available; and, last but not least, 3) the use of the kernel trick, which facilitates a nonlinear mapping of the data into a high-dimensional feature space to seek better solutions.

Submitted 10 February, 2019; v1 submitted 3 July, 2018; originally announced July 2018.

arXiv:1807.00848 [pdf, other]

Client-Specific Anomaly Detection for Face Presentation Attack Detection

Authors: Shervin Rahimzadeh Arashloo, Josef Kittler

Abstract: The one-class anomaly detection approach has previously been found to be effective in face presentation attack detection, especially in an unseen attack scenario, where the system is exposed to novel types of attacks. This work follows the same anomaly-based formulation of the problem and analyses the merits of deploying client-specific information for face spoofing detection. We propose training one-class client-specific classifiers (both generative and discriminative) using representations obtained from pre-trained deep convolutional neural networks. Next, based on subject-specific score distributions, a distinct threshold is set for each client, which is then used for decision making regarding a test query. Through extensive experiments using different one-class systems, it is shown that the use of client-specific information in a one-class anomaly detection formulation (both in model construction as well as decision threshold tuning) improves the performance significantly. In addition, it is demonstrated that the same set of deep convolutional features used for the recognition purposes is effective for face presentation attack detection in the class-specific one-class anomaly detection paradigm.

Submitted 2 July, 2018; originally announced July 2018.
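The per-client decision rule described in the abstract above can be sketched as follows. Each client's threshold is derived from the distribution of that client's own bona fide training scores, using the hypothetical rule "mean minus `k` standard deviations". The paper's actual thresholding criterion and the one-class models that produce the scores are not given in the abstract, so both the rule and the convention that higher scores mean "more likely real" are assumptions.

```python
import statistics
from typing import Dict, List


def client_thresholds(genuine_scores: Dict[str, List[float]], k: float = 2.0) -> Dict[str, float]:
    """One threshold per client: mean - k * std of that client's bona fide scores (hypothetical rule)."""
    return {
        client: statistics.mean(scores) - k * statistics.pstdev(scores)
        for client, scores in genuine_scores.items()
    }


def is_bona_fide(client: str, score: float, thresholds: Dict[str, float]) -> bool:
    """Accept the query as a real face if its score reaches the claimed client's threshold."""
    return score >= thresholds[client]
```

The point of the per-client rule is that the same raw score can be accepted for one client and rejected for another, because each client's score distribution sits in a different range.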

arXiv:1806.10830 [pdf]

Grassmannian Discriminant Maps (GDM) for Manifold Dimensionality Reduction with Application to Image Set Classification

Authors: Rui Wang, Xiao-Jun Wu, Kai-Xuan Chen, Josef Kittler

Abstract: In image set classification, considerable progress has been made by representing original image sets on Grassmann manifolds. In order to extend the advantages of Euclidean dimensionality reduction methods to the Grassmann manifold, several methods have been suggested recently which jointly perform dimensionality reduction and metric learning on the Grassmann manifold to improve performance. Nevertheless, when applied to complex datasets, the learned features do not exhibit enough discriminatory power. To overcome this problem, we propose a new method named Grassmannian Discriminant Maps (GDM) for manifold dimensionality reduction problems. The core of the method is a new discriminant function for metric learning and dimensionality reduction. For comparison and better understanding, we also study simple variations of GDM, whose key difference from GDM is the discriminant function. We experiment on datasets corresponding to three tasks, namely face recognition, object categorization, and hand gesture recognition, to evaluate the proposed method and its simple extensions. Compared with the state of the art, the results achieved show the effectiveness of the proposed algorithm.

Submitted 22 January, 2022; v1 submitted 28 June, 2018; originally announced June 2018.

Comments: 8 pages, 9 figures

arXiv:1806.07155 [pdf]

Semi-supervised Hashing for Semi-Paired Cross-View Retrieval

Authors: Jun Yu, Xiao-Jun Wu, Josef Kittler

Abstract: Recently, hashing techniques have gained importance in large-scale retrieval tasks because of their retrieval speed. Most of the existing cross-view frameworks assume that data are well paired. However, the fully-paired multiview situation is not universal in real applications. The aim of the method proposed in this paper is to learn the hashing function for semi-paired cross-view retrieval tasks. To utilize the label information of partial data, we propose a semi-supervised hashing learning framework which jointly performs feature extraction and classifier learning. The experimental results on two datasets show that our method outperforms several state-of-the-art methods in terms of retrieval accuracy.

Submitted 19 June, 2018; originally announced June 2018.

Comments: 6 pages, 5 figures, 2 tables

arXiv:1806.06177 [pdf]

doi 10.1109/ICPR.2018.8545822

Riemannian kernel based Nyström method for approximate infinite-dimensional covariance descriptors with application to image set classification

Authors: Kai-Xuan Chen, Xiao-Jun Wu, Rui Wang, Josef Kittler

Abstract: In the domain of pattern recognition, using the CovDs (Covariance Descriptors) to represent data and taking the metrics of the resulting Riemannian manifold into account have been widely adopted for the task of image set classification. Recently, it has been proven that infinite-dimensional CovDs are more discriminative than their low-dimensional counterparts. However, the form of infinite-dimensional CovDs is implicit and the computational load is high. We propose a novel framework for representing image sets by approximating infinite-dimensional CovDs in the paradigm of the Nyström method based on a Riemannian kernel. We start by modeling the images via CovDs, which lie on the Riemannian manifold spanned by SPD (Symmetric Positive Definite) matrices. We then extend the Nyström method to the SPD manifold and obtain the approximations of CovDs in RKHS (Reproducing Kernel Hilbert Space). Finally, we approximate infinite-dimensional CovDs via these approximations. Empirically, we apply our framework to the task of image set classification. The experimental results obtained on three benchmark datasets show that our proposed approximate infinite-dimensional CovDs outperform the original CovDs.

Submitted 1 September, 2019; v1 submitted 16 June, 2018; originally announced June 2018.

Comments: 6 pages, 3 figures, International Conference on Pattern Recognition 2018
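The Nyström idea used in the abstract above can be illustrated on plain vectors. Given a kernel and a small set of landmark samples, each sample is mapped to an explicit finite-dimensional feature phi(x) = L^{-1} k_m(x), where K_mm = L L^T is the Cholesky factorisation of the landmark kernel matrix and k_m(x) holds the kernel values between x and the landmarks; then phi(x)·phi(y) = k_m(x)^T K_mm^{-1} k_m(y), which approximates k(x, y). The sketch uses a Gaussian kernel on Euclidean vectors as a stand-in; the paper instead applies the method with a Riemannian kernel on SPD covariance matrices, which is not reproduced here.

```python
import math
from typing import List

Vector = List[float]


def gaussian_kernel(x: Vector, y: Vector, gamma: float = 0.5) -> float:
    """Stand-in kernel: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def cholesky(K: List[List[float]], jitter: float = 1e-10) -> List[List[float]]:
    """Lower-triangular L with L @ L^T = K + jitter * I (jitter added for numerical stability)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][t] * L[j][t] for t in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] + jitter - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L


def nystrom_features(x: Vector, landmarks: List[Vector], L: List[List[float]]) -> Vector:
    """phi(x) = L^{-1} k_m(x), computed by forward substitution."""
    k = [gaussian_kernel(x, z) for z in landmarks]
    phi: Vector = []
    for i in range(len(k)):
        phi.append((k[i] - sum(L[i][j] * phi[j] for j in range(i))) / L[i][i])
    return phi


def approx_kernel(x: Vector, y: Vector, landmarks: List[Vector], L: List[List[float]]) -> float:
    """Nystrom approximation of k(x, y) as an inner product of explicit features."""
    fx = nystrom_features(x, landmarks, L)
    fy = nystrom_features(y, landmarks, L)
    return sum(a * b for a, b in zip(fx, fy))
```

The approximation is exact (up to the jitter) between landmark points and never overestimates k(x, x), since the gap is a Schur complement of a positive definite kernel matrix; the explicit features can then be fed to any Euclidean classifier.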

arXiv:1805.11918 [pdf]

Multiple Manifolds Metric Learning with Application to Image Set Classification

Authors: Rui Wang, Xiao-Jun Wu, Kai-Xuan Chen, Josef Kittler

Abstract: In image set classification, considerable advances have been made by modeling the original image sets by second-order statistics or linear subspaces, which typically lie on Riemannian manifolds, namely the Symmetric Positive Definite (SPD) manifold and the Grassmann manifold, respectively, and some algorithms have been developed on them for classification tasks. Motivated by the inability of existing methods to extract discriminatory features for data on Riemannian manifolds, we propose a novel algorithm which combines multiple manifolds as the features of the original image sets. In order to fuse these manifolds, the well-studied Riemannian kernels have been utilized to map the original Riemannian spaces into high dimensional Hilbert spaces. A metric learning method has been devised to embed these kernel spaces into a lower dimensional common subspace for classification. The state-of-the-art results achieved on three datasets corresponding to two different classification tasks, namely face recognition and object categorization, demonstrate the effectiveness of the proposed method.

Submitted 30 May, 2018; originally announced May 2018.

Comments: 6 pages, 4 figures, ICPR 2018 (accepted)

arXiv:1805.10628

A Simple Riemannian Manifold Network for Image Set Classification

Authors: Rui Wang, Xiao-Jun Wu, Josef Kittler

Abstract: In the domain of image-set based classification, considerable advances have been made by representing original image sets as covariance matrices, which typically lie on a Riemannian manifold, specifically the Symmetric Positive Definite (SPD) manifold. Traditional manifold learning methods inevitably suffer from either high computational complexity or weak feature representation. In order to overcome these limitations, we propose a very simple Riemannian manifold network for image set classification. Inspired by deep learning architectures, we design a fully connected layer to generate novel, more powerful SPD matrices. In addition, we exploit a rectifying layer to prevent the input SPD matrices from being singular. We also introduce non-linear learning into the proposed network with an innovative objective function. Furthermore, we devise a pooling layer to further reduce the redundancy of the input SPD matrices, and a log-map layer to project the SPD manifold to the Euclidean space. For learning the connection weights between the input layer and the fully connected layer, we use the Two-directional two-dimensional Principal Component Analysis ((2D)2PCA) algorithm. The proposed Riemannian manifold network (RieMNet) avoids complex computation and can be built and trained very easily and efficiently. We have also developed a deep version of RieMNet, named DRieMNet. The proposed RieMNet and DRieMNet are evaluated on three tasks: video-based face recognition, set-based object categorization, and set-based cell identification. Extensive experimental results show the superiority of our method over the state-of-the-art.

Submitted 17 November, 2018; v1 submitted 27 May, 2018; originally announced May 2018.

Comments: There are some errors in the submitted paper. (1) In Section III, part B, Equation (8) is formulated incorrectly. (2) In Section III, part E, we use S to represent the sum of eigenvalues in Equation (15), but it has also been used as the scatter matrix in Equations (17) and (18). We are very sorry for this and would like to withdraw this submitted paper to revise it carefully.

arXiv:1805.10476 [pdf]

doi 10.1117/1.JEI.28.2.023016

L1-(2D)2PCANet: A Deep Learning Network for Face Recognition

Authors: YunKun Li, XiaoJun Wu, Josef Kittler

Abstract: In this paper, we propose a novel deep learning network, L1-(2D)2PCANet, for face recognition, which is based on L1-norm-based two-directional two-dimensional principal component analysis (L1-(2D)2PCA). In our network, the role of L1-(2D)2PCA is to learn the filters of multiple convolution layers. After the convolution layers, we deploy binary hashing and block-wise histograms for pooling. We test our network on the benchmark facial datasets YALE, AR, Extended Yale B, LFW-a and FERET, comparing it with CNN, PCANet, 2DPCANet and L1-PCANet. The results show that the recognition performance of L1-(2D)2PCANet is better than that of the baseline networks in all tests, especially when there are outliers in the test data. Owing to the L1-norm, L1-(2D)2PCANet is robust to outliers and changes of the training images.

Submitted 26 May, 2018; originally announced May 2018.

Comments: 8 pages and 5 figures
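The binary hashing and block-wise histogram pooling mentioned in the abstract above follow the PCANet-style output stage. A minimal sketch, assuming `L` filter-response maps per input are already available (the learned L1-(2D)2PCA filters and the convolutions are omitted): each response is binarised with a Heaviside step, the `L` bits at each pixel are combined into an integer in [0, 2^L), and histograms of these integers are collected over blocks. The block size and the non-overlapping block layout are illustrative choices.

```python
from typing import List

Map = List[List[float]]


def binary_hash(responses: List[Map]) -> List[List[int]]:
    """Combine L response maps into one integer map: sum_l 2^l * [response_l > 0]."""
    rows, cols = len(responses[0]), len(responses[0][0])
    return [
        [sum(1 << l for l, resp in enumerate(responses) if resp[r][c] > 0) for c in range(cols)]
        for r in range(rows)
    ]


def block_histograms(codes: List[List[int]], num_bits: int, block: int) -> List[int]:
    """Concatenate histograms (2^num_bits bins each) over non-overlapping block x block regions."""
    bins = 1 << num_bits
    feature: List[int] = []
    for r0 in range(0, len(codes) - block + 1, block):
        for c0 in range(0, len(codes[0]) - block + 1, block):
            hist = [0] * bins
            for r in range(r0, r0 + block):
                for c in range(c0, c0 + block):
                    hist[codes[r][c]] += 1
            feature.extend(hist)
    return feature
```

The concatenated histogram vector is the final face descriptor that a simple classifier (for example, nearest neighbour) would compare.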

arXiv:1804.06992 [pdf, other]

doi 10.1109/ICPR.2018.8546006

Infrared and Visible Image Fusion using a Deep Learning Framework

Authors: Hui Li, Xiao-Jun Wu, Josef Kittler

Abstract: In recent years, deep learning has become a very active research tool which is used in many image processing fields. In this paper, we propose an effective image fusion method using a deep learning framework to generate a single image which contains all the features from infrared and visible images. First, the source images are decomposed into base parts and detail content. Then the base parts are fused by weighted averaging. For the detail content, we use a deep learning network to extract multi-layer features. Using these features, we use the l1-norm and a weighted-average strategy to generate several candidates of the fused detail content. Once we get these candidates, the max selection strategy is used to get the final fused detail content. Finally, the fused image is reconstructed by combining the fused base part and detail content. The experimental results demonstrate that our proposed method achieves state-of-the-art performance in both objective assessment and visual quality. The code of our fusion method is available at https://github.com/hli1221/imagefusion_deeplearning

Submitted 18 December, 2018; v1 submitted 19 April, 2018; originally announced April 2018.

Comments: 6 pages, 6 figures, 2 tables, ICPR 2018 (accepted)
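The fusion rules in the abstract above can be sketched on small grayscale images stored as nested lists. This is a simplified sketch under stated assumptions, not the authors' released code: the base part is a box-filtered copy of each source (window radius chosen arbitrarily), the base parts are averaged with equal weights, and the deep multi-layer features are replaced by hypothetical activity maps, namely the absolute detail value averaged over windows of two different radii, giving one fused-detail candidate per radius. Each candidate is a weighted average with weights proportional to the l1-norm activity, and the final detail is picked by pixel-wise max selection.

```python
from typing import List, Sequence

Image = List[List[float]]


def box_mean(img: Image, radius: int) -> Image:
    """Local mean over a (2*radius+1)^2 window, clipped at the borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            vals = [img[i][j]
                    for i in range(max(0, r - radius), min(h, r + radius + 1))
                    for j in range(max(0, c - radius), min(w, c + radius + 1))]
            out[r][c] = sum(vals) / len(vals)
    return out


def fuse(ir: Image, vis: Image, base_radius: int = 2,
         activity_radii: Sequence[int] = (0, 1)) -> Image:
    h, w = len(ir), len(ir[0])
    # Decompose each source into a smooth base part and a detail residual.
    base_ir, base_vis = box_mean(ir, base_radius), box_mean(vis, base_radius)
    det_ir = [[ir[r][c] - base_ir[r][c] for c in range(w)] for r in range(h)]
    det_vis = [[vis[r][c] - base_vis[r][c] for c in range(w)] for r in range(h)]
    # Base parts: equal-weight averaging.
    fused_base = [[0.5 * (base_ir[r][c] + base_vis[r][c]) for c in range(w)] for r in range(h)]
    candidates = []
    for rad in activity_radii:
        # Stand-in "layer" activity: windowed mean of |detail| (an l1-norm measure).
        a_ir = box_mean([[abs(v) for v in row] for row in det_ir], rad)
        a_vis = box_mean([[abs(v) for v in row] for row in det_vis], rad)
        cand = [[0.0] * w for _ in range(h)]
        for r in range(h):
            for c in range(w):
                total = a_ir[r][c] + a_vis[r][c]
                w_ir = a_ir[r][c] / total if total > 0 else 0.5
                cand[r][c] = w_ir * det_ir[r][c] + (1.0 - w_ir) * det_vis[r][c]
        candidates.append(cand)
    # Max selection across candidates, then reconstruction of the fused image.
    return [[fused_base[r][c] + max(cand[r][c] for cand in candidates) for c in range(w)]
            for r in range(h)]
```

Swapping the stand-in activity maps for features from several layers of a pretrained network would recover the multi-candidate structure the abstract describes; the decomposition, weighting and max-selection steps stay the same.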

arXiv:1804.03675 [pdf, other]

doi 10.1007/978-3-030-01252-6_14

Semi-supervised Adversarial Learning to Generate Photorealistic Face Images of New Identities from 3D Morphable Model

Authors: Baris Gecer, Binod Bhattarai, Josef Kittler, Tae-Kyun Kim

Abstract: We propose a novel end-to-end semi-supervised adversarial framework to generate photorealistic face images of new identities with wide ranges of expressions, poses, and illuminations conditioned by a 3D morphable model. Previous adversarial style-transfer methods either supervise their networks with a large volume of paired data or use unpaired data with a highly under-constrained two-way generative framework in an unsupervised fashion. We introduce pairwise adversarial supervision to constrain two-way domain adaptation by a small number of paired real and synthetic images for training along with the large volume of unpaired data. Extensive qualitative and quantitative experiments are performed to validate our idea. Generated face images of new identities contain pose, lighting and expression diversity, and qualitative results show that they are highly constrained by the synthetic input image while adding photorealism and retaining identity information. We combine face images generated by the proposed method with the real dataset to train face recognition algorithms. We evaluated the model on two challenging datasets: LFW and IJB-A. We observe that the generated images from our framework consistently improve the performance of a deep face recognition network trained with the Oxford VGG Face dataset and achieve comparable results to the state-of-the-art.

Submitted 10 April, 2018; originally announced April 2018.

Journal ref: In Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 217-234

arXiv:1803.05536 [pdf, other]

Evaluation of Dense 3D Reconstruction from 2D Face Images in the Wild

Authors: Zhen-Hua Feng, Patrik Huber, Josef Kittler, Peter JB Hancock, Xiao-Jun Wu, Qijun Zhao, Paul Koppen, Matthias Rätsch

Abstract: This paper investigates the evaluation of dense 3D face reconstruction from a single 2D image in the wild. To this end, we organise a competition that provides a new benchmark dataset that contains 2000 2D facial images of 135 subjects as well as their 3D ground truth face scans. In contrast to previous competitions or challenges, the aim of this new benchmark dataset is to evaluate the accuracy of a 3D dense face reconstruction algorithm using real, accurate and high-resolution 3D ground truth face scans. In addition to the dataset, we provide a standard protocol as well as a Python script for the evaluation. Last, we report the results obtained by three state-of-the-art 3D face reconstruction systems on the new benchmark dataset. The competition is organised along with the 2018 13th IEEE Conference on Automatic Face & Gesture Recognition.

Submitted 20 April, 2018; v1 submitted 14 March, 2018; originally announced March 2018.

Comments: 8 pages

arXiv:1711.06753 [pdf, other]

Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks

Authors: Zhen-Hua Feng, Josef Kittler, Muhammad Awais, Patrik Huber, Xiao-Jun Wu

Abstract: We present a new loss function, namely Wing loss, for robust facial landmark localisation with Convolutional Neural Networks (CNNs). We first compare and analyse different loss functions including L2, L1 and smooth L1. The analysis of these loss functions suggests that, for the training of a CNN-based localisation model, more attention should be paid to small and medium range errors. To this end, we design a piece-wise loss function. The new loss amplifies the impact of errors from the interval (-w, w) by switching from L1 loss to a modified logarithm function. To address the problem of under-representation of samples with large out-of-plane head rotations in the training set, we propose a simple but effective boosting strategy, referred to as pose-based data balancing. In particular, we deal with the data imbalance problem by duplicating the minority training samples and perturbing them by injecting random image rotation, bounding box translation and other data augmentation approaches. Last, the proposed approach is extended to create a two-stage framework for robust facial landmark localisation. The experimental results obtained on AFLW and 300W demonstrate the merits of the Wing loss function, and prove the superiority of the proposed method over the state-of-the-art approaches.

Submitted 23 October, 2018; v1 submitted 17 November, 2017; originally announced November 2017.

Comments: 11 pages, 6 figures, 6 tables
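For reference, the Wing loss is commonly written as wing(x) = w ln(1 + |x|/ε) for |x| < w and |x| - C otherwise, with C = w - w ln(1 + w/ε) chosen so the two pieces join continuously at |x| = w. A minimal Python sketch follows; the values `w=10.0` and `epsilon=2.0` are illustrative defaults, and summing over landmark coordinates is an assumption about how per-coordinate errors are aggregated.

```python
import math
from typing import Sequence


def wing(x: float, w: float = 10.0, epsilon: float = 2.0) -> float:
    """Wing loss for a single coordinate error x."""
    ax = abs(x)
    if ax < w:
        # Logarithmic region: amplifies small and medium errors relative to L1.
        return w * math.log(1.0 + ax / epsilon)
    # L1 region, shifted by C so the function is continuous at |x| = w.
    c = w - w * math.log(1.0 + w / epsilon)
    return ax - c


def wing_loss(pred: Sequence[float], target: Sequence[float],
              w: float = 10.0, epsilon: float = 2.0) -> float:
    """Sum of per-coordinate Wing losses between predicted and ground-truth landmark coordinates."""
    return sum(wing(p - t, w, epsilon) for p, t in zip(pred, target))
```

Near zero the slope of the logarithmic piece is w/ε (5 with these values), so small errors produce larger gradients than under L1, while beyond |x| = w the loss grows with slope 1 exactly like L1, which keeps large outlier errors from dominating training.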

arXiv:1710.01202 [pdf, other]

Person Re-Identification with Vision and Language

Authors: Fei Yan, Krystian Mikolajczyk, Josef Kittler

Abstract: In this paper we propose a new approach to person re-identification using images and natural language descriptions. We propose a joint vision and language model based on CCA and CNN architectures to match across the two modalities as well as to enrich visual examples for which there are no language descriptions. We also introduce new annotations in the form of natural language descriptions for two standard Re-ID benchmarks, namely CUHK03 and VIPeR. We perform experiments on these two datasets with techniques based on CNN, hand-crafted features as well as LSTM for analysing visual and natural description data. We investigate and demonstrate the advantages of using natural language descriptions compared to attributes as well as CNN compared to LSTM in the context of Re-ID. We show that the joint use of language and vision can significantly improve the state-of-the-art performance on standard Re-ID benchmarks.

Submitted 3 October, 2017; originally announced October 2017.

arXiv:1708.07199 [pdf, other]

doi 10.1109/ICCVW.2017.110

3D Morphable Models as Spatial Transformer Networks

Authors: Anil Bas, Patrik Huber, William A. P. Smith, Muhammad Awais, Josef Kittler

Abstract: In this paper, we show how a 3D Morphable Model (i.e. a statistical model of the 3D shape of a class of objects such as faces) can be used to spatially transform input data as a module (a 3DMM-STN) within a convolutional neural network. This is an extension of the original spatial transformer network in that we are able to interpret and normalise 3D pose changes and self-occlusions. The trained localisation part of the network is independently useful since it learns to fit a 3D morphable model to a single image. We show that the localiser can be trained using only simple geometric loss functions on a relatively small dataset yet is able to perform robust normalisation on highly uncontrolled images including occlusion, self-occlusion and large pose changes.

Submitted 23 August, 2017; originally announced August 2017.

Comments: Accepted to ICCV 2017 2nd Workshop on Geometry Meets Deep Learning

MSC Class: 68T45 ACM Class: I.4.8; I.2.10

arXiv:1708.02337 [pdf, other]

doi 10.1109/BTAS.2017.8272759

Unconstrained Face Detection and Open-Set Face Recognition Challenge

Authors: Manuel Günther, Peiyun Hu, Christian Herrmann, Chi Ho Chan, Min Jiang, Shufan Yang, Akshay Raj Dhamija, Deva Ramanan, Jürgen Beyerer, Josef Kittler, Mohamad Al Jazaery, Mohammad Iqbal Nouyed, Guodong Guo, Cezary Stankiewicz, Terrance E. Boult

Abstract: Face detection and recognition benchmarks have shifted toward more difficult environments. The challenge presented in this paper addresses the next step in the direction of automatic detection and identification of people from outdoor surveillance cameras. While face detection has shown remarkable success in images collected from the web, surveillance cameras include more diverse occlusions, poses, weather conditions and image blur. Although face verification or closed-set face identification have surpassed human capabilities on some datasets, open-set identification is much more complex as it needs to reject both unknown identities and false accepts from the face detector. We show that unconstrained face detection can approach high detection rates albeit with moderate false accept rates. By contrast, open-set face recognition is currently weak and requires much more attention.

Submitted 25 September, 2018; v1 submitted 7 August, 2017; originally announced August 2017.

Comments: This is an ERRATA version of the paper originally presented at the International Joint Conference on Biometrics. Due to a bug in our evaluation code, the results of the participants changed. The final conclusion, however, is still the same.
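
The challenge abstract above distinguishes closed-set from open-set identification. That distinction can be sketched in a few lines of Python. This is a hypothetical illustration, not the challenge's evaluation protocol. It assumes faces are already encoded as feature vectors compared by cosine similarity, and that a single global threshold decides rejection. A probe from an unknown identity, or a detector false accept, should fall below the threshold and be rejected rather than being forced onto the closest gallery identity.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def closed_set_identify(probe, gallery):
    """Closed-set: always return the most similar enrolled identity."""
    return max(gallery, key=lambda name: cosine(probe, gallery[name]))


def open_set_identify(probe, gallery, threshold):
    """Open-set: return the best match only if its similarity reaches the threshold,
    otherwise None (unknown identity or non-face detection)."""
    best = closed_set_identify(probe, gallery)
    return best if cosine(probe, gallery[best]) >= threshold else None
```

In this sketch the threshold trades correct identifications against false accepts of unknown probes. The closed-set variant has no way to answer "unknown".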

arXiv:1705.02402 [pdf, other]

Face Detection, Bounding Box Aggregation and Pose Estimation for Robust Facial Landmark Localisation in the Wild

Authors: Zhen-Hua Feng, Josef Kittler, Muhammad Awais, Patrik Huber, Xiao-Jun Wu

Abstract: We present a framework for robust face detection and landmark localisation of faces in the wild, which has been evaluated as part of "the 2nd Facial Landmark Localisation Competition". The framework has four stages: face detection, bounding box aggregation, pose estimation and landmark localisation. To achieve a high detection rate, we use two publicly available CNN-based face detectors and two proprietary detectors. We aggregate the detected face bounding boxes of each input image to reduce false positives and improve face detection accuracy. A cascaded shape regressor, trained using faces with a variety of pose variations, is then employed for pose estimation and image pre-processing. Last, we train the final cascaded shape regressor for fine-grained landmark localisation, using a large number of training samples with limited pose variations. The experimental results obtained on the 300W and Menpo benchmarks demonstrate the superiority of our framework over state-of-the-art methods.

Submitted 1 June, 2017; v1 submitted 5 May, 2017; originally announced May 2017.
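
The landmark localisation abstract above mentions aggregating bounding boxes from several face detectors. One simple way to do this is greedy IoU clustering followed by score-weighted averaging, with clusters discarded when too few detectors support them. The Python sketch below is an assumption for illustration only: the abstract does not specify the authors' aggregation rule, and the thresholds `iou_thr` and `min_votes` are hypothetical parameters.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def aggregate_boxes(detections, iou_thr=0.5, min_votes=2):
    """detections: list of (detector_id, box, score) with positive scores.
    Returns a list of (fused_box, mean_score)."""
    clusters = []
    # Visit detections from most to least confident; each cluster is keyed
    # by its first (highest-scoring) member.
    for det in sorted(detections, key=lambda d: d[2], reverse=True):
        for cluster in clusters:
            if iou(cluster[0][1], det[1]) >= iou_thr:
                cluster.append(det)
                break
        else:
            clusters.append([det])
    fused = []
    for cluster in clusters:
        if len({d[0] for d in cluster}) < min_votes:
            continue  # supported by too few detectors: treated as a false positive
        total = sum(d[2] for d in cluster)
        box = tuple(sum(d[1][k] * d[2] for d in cluster) / total for k in range(4))
        fused.append((box, total / len(cluster)))
    return fused
```

Requiring agreement between distinct detectors is what suppresses isolated false positives in this sketch. Averaging the agreeing boxes also smooths out the localisation differences between detectors.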

arXiv:1612.09548 [pdf, other]

A Unified Tensor-based Active Appearance Face Model

Authors: Zhen-Hua Feng, Josef Kittler, William Christmas, Xiao-Jun Wu

Abstract: Appearance variations result in many difficulties in face image analysis. To deal with this challenge, we present a Unified Tensor-based Active Appearance Model (UT-AAM) for jointly modelling the geometry and texture information of 2D faces. For each type of face information, namely shape and texture, we construct a unified tensor model capturing all relevant appearance variations. This contrasts with the variation-specific models of the classical tensor AAM. To achieve the unification across pose variations, a strategy for dealing with self-occluded faces is proposed to obtain consistent shape and texture representations of pose-varied faces. In addition, our UT-AAM is capable of constructing the model from an incomplete training dataset, using tensor completion methods. Last, we use an effective cascaded-regression-based method for UT-AAM fitting. With these advancements, the utility of UT-AAM in practice is considerably enhanced. As an example, we demonstrate the improvements in training facial landmark detectors through the use of UT-AAM to synthesise a large number of virtual samples. Experimental results obtained using the Multi-PIE and 300-W face datasets demonstrate the merits of the proposed approach.

Submitted 13 June, 2017; v1 submitted 30 December, 2016; originally announced December 2016.

arXiv:1611.05396 [pdf, other]

Dynamic Attention-controlled Cascaded Shape Regression Exploiting Training Data Augmentation and Fuzzy-set Sample Weighting

Authors: Zhen-Hua Feng, Josef Kittler, William Christmas, Patrik Huber, Xiao-Jun Wu

Abstract: We present a new Cascaded Shape Regression (CSR) architecture, namely Dynamic Attention-Controlled CSR (DAC-CSR), for robust facial landmark detection on unconstrained faces. Our DAC-CSR divides facial landmark detection into three cascaded sub-tasks: face bounding box refinement, general CSR and attention-controlled CSR. The first two stages refine initial face bounding boxes and output intermediate facial landmarks. Then, an online dynamic model selection method is used to choose appropriate domain-specific CSRs for further landmark refinement. The key innovation of our DAC-CSR is the fault-tolerant mechanism, using fuzzy set sample weighting for attention-controlled domain-specific model training. Moreover, we advocate data augmentation with a simple but effective 2D profile face generator, and context-aware feature extraction for better facial feature representation. Experimental results obtained on challenging datasets demonstrate the merits of our DAC-CSR over the state of the art.

Submitted 4 April, 2017; v1 submitted 16 November, 2016; originally announced November 2016.
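
DAC-CSR, described above, builds on generic cascaded shape regression. That general technique can be sketched as a sequence of linear regressors, each mapping shape-indexed features at the current estimate to an update of that estimate. The toy Python sketch below assumes a single 1D landmark and uses a hypothetical tanh feature in place of real image features. It omits everything specific to DAC-CSR: bounding box refinement, dynamic model selection and fuzzy-set sample weighting.

```python
import math
import random


def features(true_pos, estimate):
    """Hypothetical shape-indexed feature. A real system extracts it from the image
    around the current estimate; in this toy the 'image' is fully described by the
    true landmark position, so the feature is a saturating response to the offset
    plus a bias term."""
    return (math.tanh(true_pos - estimate), 1.0)


def fit_linear(X, y):
    """Least-squares weights (w0, w1) for y ~ w0 * f0 + w1 * f1 via 2x2 normal equations."""
    a = sum(f0 * f0 for f0, _ in X)
    b = sum(f0 * f1 for f0, f1 in X)
    d = sum(f1 * f1 for _, f1 in X)
    e = sum(f0 * t for (f0, _), t in zip(X, y))
    f = sum(f1 * t for (_, f1), t in zip(X, y))
    det = a * d - b * b
    return ((d * e - b * f) / det, (a * f - b * e) / det)


def train_cascade(truths, inits, n_stages=4):
    """Fit one linear regressor per stage on the residuals left by earlier stages."""
    estimates = list(inits)
    stages = []
    for _ in range(n_stages):
        X = [features(t, x) for t, x in zip(truths, estimates)]
        targets = [t - x for t, x in zip(truths, estimates)]
        w0, w1 = fit_linear(X, targets)
        stages.append((w0, w1))
        estimates = [x + w0 * f0 + w1 * f1 for x, (f0, f1) in zip(estimates, X)]
    return stages


def apply_cascade(stages, true_pos, init):
    """Run the learned stages from an initial estimate, re-extracting features each time."""
    x = init
    for w0, w1 in stages:
        f0, f1 = features(true_pos, x)
        x = x + w0 * f0 + w1 * f1
    return x


def mse(truths, estimates):
    """Mean squared landmark error."""
    return sum((t - x) ** 2 for t, x in zip(truths, estimates)) / len(truths)
```

Each stage is fitted by least squares on the residuals left by the previous stages. Because the zero update is always one of the candidates, the training error cannot increase from one stage to the next.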

arXiv:1611.00284 [pdf, other]

Dictionary Integration using 3D Morphable Face Models for Pose-invariant Collaborative-representation-based Classification

Authors: Xiaoning Song, Zhen-Hua Feng, Guosheng Hu, Josef Kittler, William Christmas, Xiao-Jun Wu

Abstract: The paper presents a dictionary integration algorithm using 3D morphable face models (3DMM) for pose-invariant collaborative-representation-based face classification. To this end, we first fit a 3DMM to the 2D face images of a dictionary to reconstruct the 3D shape and texture of each image. The 3D faces are used to render a number of virtual 2D face images with arbitrary pose variations to augment the training data, by merging the original and rendered virtual samples to create an extended dictionary. Second, to reduce the information redundancy of the extended dictionary and improve the sparsity of reconstruction coefficient vectors using collaborative-representation-based classification (CRC), we exploit an on-line elimination scheme to optimise the extended dictionary by identifying the most representative training samples for a given query. The final goal is to perform pose-invariant face classification using the proposed dictionary integration method and the on-line pruning strategy under the CRC framework. Experimental results obtained for a set of well-known face datasets demonstrate the merits of the proposed method, especially its robustness to pose variations.

Submitted 25 November, 2016; v1 submitted 1 November, 2016; originally announced November 2016.
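
The dictionary integration abstract above relies on the CRC framework. In the commonly used form of a CRC classifier, a query is coded over the whole dictionary with ridge-regularised least squares. The query is then assigned to the class whose atoms give the smallest residual divided by the norm of that class's coefficients. The dependency-free Python sketch below assumes that formulation with a hypothetical regularisation weight `lam`. It does not include the paper's 3DMM-based dictionary augmentation or the on-line elimination scheme.

```python
import math


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system A x = b."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def crc_classify(train, labels, y, lam=1e-3):
    """train: dictionary atoms (training vectors); labels: class of each atom.
    Codes y with alpha = (D^T D + lam*I)^-1 D^T y, then picks the class with the
    smallest ||y - D_c alpha_c|| / ||alpha_c||. Returns (label, alpha)."""
    n = len(train)

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    gram = [[dot(train[i], train[j]) + (lam if i == j else 0.0) for j in range(n)]
            for i in range(n)]
    alpha = solve(gram, [dot(atom, y) for atom in train])
    best, best_r = None, float("inf")
    for c in sorted(set(labels)):
        idx = [i for i in range(n) if labels[i] == c]
        recon = [sum(alpha[i] * train[i][k] for i in idx) for k in range(len(y))]
        err = math.sqrt(sum((a - b) ** 2 for a, b in zip(y, recon)))
        norm = math.sqrt(sum(alpha[i] ** 2 for i in idx))
        r = err / norm if norm > 0 else float("inf")
        if r < best_r:
            best, best_r = c, r
    return best, alpha
```

With `lam > 0` the regularised Gram matrix is positive definite, so the elimination never meets a zero pivot. Dividing by the coefficient norm penalises classes that explain the query only through tiny, incidental coefficients.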

arXiv:1605.06764 [pdf, other]

doi 10.1109/LSP.2016.2643284

3D Face Tracking and Texture Fusion in the Wild

Authors: Patrik Huber, Philipp Kopp, Matthias Rätsch, William Christmas, Josef Kittler

Abstract: We present a fully automatic approach to real-time 3D face reconstruction from monocular in-the-wild videos. Using cascaded-regressor-based face tracking and 3D Morphable Face Model shape fitting, we obtain a semi-dense 3D face shape. We further use the texture information from multiple frames to build a holistic 3D face representation from the video frames. Our system is able to capture facial expressions and does not require any person-specific training. We demonstrate the robustness of our approach on the challenging 300 Videos in the Wild (300-VW) dataset. Our real-time fitting framework is available as an open source library at http://4dface.org.

Submitted 22 May, 2016; originally announced May 2016.

MSC Class: 68T45 ACM Class: I.4.8; I.4.9; I.2.10

Journal ref: IEEE Signal Processing Letters (Volume: 24, Issue: 4, April 2017)

arXiv:1604.04451 [pdf, other]

Delta divergence: A novel decision cognizant measure of classifier incongruence

Authors: Josef Kittler, Cemre Zor

Abstract: Disagreement between two classifiers regarding the class membership of an observation in pattern recognition can be indicative of an anomaly and its nuance. As classifiers generally base their decisions on class a posteriori probabilities, the most natural approach to detecting classifier incongruence is to use divergence. However, existing divergences are not particularly suitable to gauge classifier incongruence. In this paper, we postulate the properties that a divergence measure should satisfy and propose a novel divergence measure, referred to as Delta divergence. In contrast to existing measures, it is decision cognizant. The focus of Delta divergence on the dominant hypotheses has a clutter-reducing property, the significance of which grows with an increasing number of classes. The proposed measure satisfies other important properties such as symmetry and independence of classifier confidence. The relationship of the proposed divergence to some baseline measures is demonstrated experimentally, showing its superiority.

Submitted 4 July, 2016; v1 submitted 15 April, 2016; originally announced April 2016.

Comments: 12 pages, 12 figures

arXiv:1504.02351 [pdf, other]

When Face Recognition Meets with Deep Learning: an Evaluation of Convolutional Neural Networks for Face Recognition

Authors: Guosheng Hu, Yongxin Yang, Dong Yi, Josef Kittler, William Christmas, Stan Z. Li, Timothy Hospedales

Abstract: Deep learning, in particular Convolutional Neural Networks (CNNs), has recently achieved promising results in face recognition. However, it remains an open question why CNNs work well and how to design a 'good' architecture. Existing works tend to focus on reporting CNN architectures that work well for face recognition rather than investigating the reason. In this work, we conduct an extensive evaluation of CNN-based face recognition systems (CNN-FRS) on a common ground to make our work easily reproducible. Specifically, we use the public LFW (Labeled Faces in the Wild) database to train CNNs, unlike most existing CNNs, which are trained on private databases. We propose three CNN architectures which are the first reported architectures trained using LFW data. This paper quantitatively compares the architectures of CNNs and evaluates the effect of different implementation choices. We identify several useful properties of CNN-FRS. For instance, the dimensionality of the learned features can be significantly reduced without adverse effect on face recognition accuracy. In addition, traditional metric learning methods exploiting CNN-learned features are evaluated. Experiments show that two crucial factors for good CNN-FRS performance are the fusion of multiple CNNs and metric learning. To make our work reproducible, source code and models will be made publicly available.

Submitted 9 April, 2015; originally announced April 2015.

Comments: 7 pages, 4 figures, 7 tables

arXiv:1503.02330 [pdf, other]

doi 10.1109/ICIP.2015.7350989

Fitting 3D Morphable Models using Local Features

Authors: Patrik Huber, Zhen-Hua Feng, William Christmas, Josef Kittler, Matthias Rätsch

Abstract: In this paper, we propose a novel fitting method that uses local image features to fit a 3D Morphable Model to 2D images. To overcome the obstacle of optimising a cost function that contains a non-differentiable feature extraction operator, we use a learning-based cascaded regression method that learns the gradient direction from data. The method allows shape and pose parameters to be solved for simultaneously. Our method is thoroughly evaluated on Morphable Model generated data and first results on real data are presented. Compared to traditional fitting methods, which use simple raw features like pixel colour or edge maps, local features have been shown to be much more robust against variations in imaging conditions. Our approach is unique in that we are the first to use local features to fit a Morphable Model. Because our method is fast, it is suitable for real-time applications. Our cascaded regression framework is available as an open source library (https://github.com/patrikhuber).

Submitted 8 March, 2015; originally announced March 2015.

Comments: Submitted to ICIP 2015; 4 pages, 4 figures

MSC Class: 68T45 ACM Class: I.4.8; I.2.10

Journal ref: Proceedings of the IEEE International Conference on Image Processing (ICIP) 2015, pages 1195-1199

arXiv:1411.5825 [pdf]

doi 10.1016/j.media.2014.11.010

Assessment of algorithms for mitosis detection in breast cancer histopathology images

Authors: Mitko Veta, Paul J. van Diest, Stefan M. Willems, Haibo Wang, Anant Madabhushi, Angel Cruz-Roa, Fabio Gonzalez, Anders B. L. Larsen, Jacob S. Vestergaard, Anders B. Dahl, Dan C. Cireşan, Jürgen Schmidhuber, Alessandro Giusti, Luca M. Gambardella, F. Boray Tek, Thomas Walter, Ching-Wei Wang, Satoshi Kondo, Bogdan J. Matuszewski, Frederic Precioso, Violet Snell, Josef Kittler, Teofilo E. de Campos, Adnan M. Khan, Nasir M. Rajpoot, et al. (4 additional authors not shown)

Abstract: The proliferative activity of breast tumors, which is routinely estimated by counting mitotic figures in hematoxylin and eosin stained histology sections, is considered to be one of the most important prognostic markers. However, mitosis counting is laborious, subjective and may suffer from low inter-observer agreement. With the wider acceptance of whole slide images in pathology labs, automatic image analysis has been proposed as a potential solution for these issues. In this paper, the results from the Assessment of Mitosis Detection Algorithms 2013 (AMIDA13) challenge are described. The challenge was based on a data set consisting of 12 training and 11 testing subjects, with more than one thousand mitotic figures annotated by multiple observers. Short descriptions and results from the evaluation of eleven methods are presented. The top-performing method has an error rate that is comparable to the inter-observer agreement among pathologists.

Submitted 21 November, 2014; originally announced November 2014.

Comments: 23 pages, 5 figures, accepted for publication in the journal Medical Image Analysis

Showing 51–93 of 93 results for author: Kittler, J