-
On-Demand Routing in LEO Mega-Constellations with Dynamic Laser Inter-Satellite Links
Authors:
Dhiraj Bhattacharjee,
Pablo G. Madoery,
Aizaz U. Chaudhry,
Halim Yanikomeroglu,
Gunes Karabulut Kurt,
Peng Hu,
Khaled Ahmed,
Stephane Martel
Abstract:
Low Earth orbit (LEO) satellite mega constellations are beginning to include laser inter-satellite links (LISLs) to extend the Internet to the most remote locations on Earth. Since the process of establishing these links incurs a setup delay on the order of seconds, a static network topology is generally established well in advance, which is then used for the routing calculations. However, this involves keeping links active even when they are not being used to forward traffic, leading to poor energy efficiency. Motivated by technological advances that are gradually decreasing the LISL setup delays, we foresee scenarios where it will be possible to compute routes and establish dynamic LISLs on demand. This will require considering setup delays as penalties that will affect the end-to-end latency. In this paper, we present a nonlinear optimization model that considers these penalties in the cost function and propose three heuristic algorithms that solve the problem in a tractable way. The algorithms establish different trade-offs in terms of performance and computational complexity. We extensively analyze metrics including average latency, route change rate, outage probability, and jitter in Starlink's Phase I version 2 constellation. The results show the benefit of adaptive routing schemes according to the link setup delay. In particular, more complex schemes can decrease the average end-to-end latency in exchange for an increase in execution time. On the other hand, depending on the maximum tolerated latency, it is possible to use less computationally complex schemes which will be more scalable for the satellite mega constellations of the future.
Submitted 4 June, 2024;
originally announced June 2024.
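For the on-demand routing entry above, the core idea of charging LISL setup delay as a latency penalty can be sketched with a plain shortest-path search. This is a minimal illustration under an assumed additive cost model, not the paper's nonlinear optimization model or any of its three heuristics. Each link costs its propagation delay, plus a fixed setup delay (`setup_delay_s`, hypothetical) if the link is not already active. The function name, graph format, and distances are all hypothetical.

```python
import heapq

C_KM_PER_S = 299_792.458  # speed of light in vacuum, km/s


def route_with_setup_penalty(links, active, src, dst, setup_delay_s):
    """Dijkstra search where inactive links carry an extra setup-delay penalty.

    links: directed adjacency dict, node -> list of (neighbor, distance_km).
    active: set of frozenset({u, v}) pairs for LISLs that are already up.
    Returns (latency_s, path), or (inf, []) if dst is unreachable.
    """
    best = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        cost, u = heapq.heappop(heap)
        if u == dst:
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return cost, path[::-1]
        if cost > best.get(u, float("inf")):
            continue  # stale heap entry
        for v, dist_km in links.get(u, []):
            edge_s = dist_km / C_KM_PER_S  # propagation delay
            if frozenset((u, v)) not in active:
                edge_s += setup_delay_s  # penalty for establishing a new LISL
            new_cost = cost + edge_s
            if new_cost < best.get(v, float("inf")):
                best[v] = new_cost
                prev[v] = u
                heapq.heappush(heap, (new_cost, v))
    return float("inf"), []


# Hypothetical 4-satellite example: A-B-D is longer but already active.
links = {"A": [("B", 3000), ("C", 1000)], "B": [("D", 3000)], "C": [("D", 1000)]}
active = {frozenset(("A", "B")), frozenset(("B", "D"))}
print(route_with_setup_penalty(links, active, "A", "D", setup_delay_s=0.1))
print(route_with_setup_penalty(links, active, "A", "D", setup_delay_s=0.0))
```

With a 0.1 s setup delay the longer but already-active A-B-D route wins; with zero setup delay the shorter A-C-D route is chosen. This is the kind of trade-off between setup penalties and path length that the abstract describes.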
-
Full-stack evaluation of Machine Learning inference workloads for RISC-V systems
Authors:
Debjyoti Bhattacharjee,
Anmol,
Tommaso Marinelli,
Karan Pathak,
Peter Kourzanov
Abstract:
Architectural simulators hold a vital role in RISC-V research, providing a crucial platform for workload evaluation without the need for costly physical prototypes. They serve as a dynamic environment for exploring innovative architectural concepts, enabling swift iteration and thorough analysis of performance metrics. As deep learning algorithms become increasingly pervasive, it is essential to benchmark new architectures with machine learning workloads. The diverse computational kernels used in deep learning algorithms highlight the necessity for a comprehensive compilation toolchain to map to target hardware platforms. This study evaluates the performance of a wide array of machine learning workloads on RISC-V architectures using gem5, an open-source architectural simulator. Leveraging an open-source compilation toolchain based on Multi-Level Intermediate Representation (MLIR), the research presents benchmarking results specifically focused on deep learning inference workloads. Additionally, the study sheds light on current limitations of gem5 when simulating RISC-V architectures, offering insights for future development and refinement.
Submitted 24 May, 2024;
originally announced May 2024.
-
CoDA: Instructive Chain-of-Domain Adaptation with Severity-Aware Visual Prompt Tuning
Authors:
Ziyang Gong,
Fuhao Li,
Yupeng Deng,
Deblina Bhattacharjee,
Xianzheng Ma,
Xiangwei Zhu,
Zhenming Ji
Abstract:
Unsupervised Domain Adaptation (UDA) aims to adapt models from labeled source domains to unlabeled target domains. When adapting to adverse scenes, existing UDA methods perform poorly because they lack instructions, which leads their models to overlook the discrepancies among adverse scenes. To tackle this, we propose CoDA, which instructs models to distinguish, focus on, and learn from these discrepancies at the scene and image levels. Specifically, CoDA consists of a Chain-of-Domain (CoD) strategy and a Severity-Aware Visual Prompt Tuning (SAVPT) mechanism. CoD focuses on scene-level instructions: it divides all adverse scenes into easy and hard scenes, guiding models to adapt from the source domain to easy domains with easy scene images, and then to hard domains with hard scene images, thereby laying a solid foundation for the whole adaptation. Building upon this foundation, we employ SAVPT to provide more detailed image-level instructions that boost performance. SAVPT features a novel metric, Severity, that divides all adverse scene images into low-severity and high-severity images. Severity then directs visual prompts and adapters, instructing models to concentrate on unified severity features instead of scene-specific features, without adding complexity to the model architecture. CoDA achieves state-of-the-art performance on widely used benchmarks under all adverse scenes. Notably, CoDA outperforms existing methods by 4.6% and 10.3% mIoU on the Foggy Driving and Foggy Zurich benchmarks, respectively. Our code is available at https://github.com/Cuzyoung/CoDA
Submitted 15 July, 2024; v1 submitted 26 March, 2024;
originally announced March 2024.
-
OMH: Structured Sparsity via Optimally Matched Hierarchy for Unsupervised Semantic Segmentation
Authors:
Baran Ozaydin,
Tong Zhang,
Deblina Bhattacharjee,
Sabine Süsstrunk,
Mathieu Salzmann
Abstract:
Unsupervised Semantic Segmentation (USS) involves segmenting images without relying on predefined labels, aiming to alleviate the burden of extensive human labeling. Existing methods utilize features generated by self-supervised models and specific priors for clustering. However, their clustering objectives are not involved in the optimization of the features during training. Additionally, due to the lack of clear class definitions in USS, the resulting segments may not align well with the clustering objective. In this paper, we introduce a novel approach called Optimally Matched Hierarchy (OMH) to simultaneously address the above issues. The core of our method lies in imposing structured sparsity on the feature space, which allows the features to encode information with different levels of granularity. The structure of this sparsity stems from our hierarchy (OMH). To achieve this, we learn a soft but sparse hierarchy among parallel clusters through Optimal Transport. Our OMH yields better unsupervised segmentation performance compared to existing USS methods. Our extensive experiments demonstrate the benefits of OMH when utilizing our differentiable paradigm. We will make our code publicly available.
Submitted 5 April, 2024; v1 submitted 11 March, 2024;
originally announced March 2024.
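For the OMH entry above, the abstract says the soft hierarchy among parallel clusters is learned through Optimal Transport. As a generic illustration only (the paper's actual formulation is not given here), the sketch below uses entropy-regularized OT solved with Sinkhorn iterations. It softly assigns hypothetical fine-level cluster centroids to coarse-level ones, using cosine distance as the transport cost. The centroids, `eps`, and iteration count are assumptions.

```python
import numpy as np


def sinkhorn(cost, a, b, eps=0.05, n_iters=200):
    """Entropy-regularized OT plan via Sinkhorn iterations.

    cost: (n, m) cost matrix; a: (n,), b: (m,) marginals, each summing to 1.
    Returns an (n, m) plan P whose row sums equal a and column sums approach b.
    """
    K = np.exp(-cost / eps)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]


def unit_rows(x):
    """L2-normalize each row so dot products become cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


# Hypothetical centroids: 4 fine clusters, 2 coarse (parent) clusters.
fine = unit_rows(np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]))
coarse = unit_rows(np.array([[1.0, 0.0], [0.0, 1.0]]))
cost = 1.0 - fine @ coarse.T  # cosine distance
plan = sinkhorn(cost, np.full(4, 0.25), np.full(2, 0.5))
soft_parent = plan / plan.sum(axis=1, keepdims=True)  # each row sums to 1
print(soft_parent.argmax(axis=1))  # → [0 0 1 1]
```

Each row of `soft_parent` is a soft parent assignment for one fine cluster. Lowering `eps` pushes the plan toward the unregularized, sparser solution.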
-
Hand Biometrics in Digital Forensics
Authors:
Asish Bera,
Debotosh Bhattacharjee,
Mita Nasipuri
Abstract:
Digital forensics is now an indispensable part of securing the digital world against identity theft. Handling higher-order crimes that involve massive databases is a very challenging problem for any intelligent system. Biometrics offers a better solution to the problems encountered in digital forensics. Many biometric characteristics have played significant roles in forensics over the decades. The potential benefits and scope of hand-based modalities in forensics are investigated, with an illustration of a hand geometry verification method. It can be applied when effective biometric evidence is unavailable, for example when gloves are damaged, or when dirt or any kind of liquid reduces the accessibility and reliability of the fingerprint or palmprint. Because hand features are not sufficiently unique for a very large database, the approach may be relevant for verification only. Several unimodal and multimodal hand-based biometrics (e.g., hand geometry, palmprint, and hand vein), with various feature extraction methods, databases, and verification methods, are discussed for 2D, 3D, and infrared images.
Submitted 17 February, 2024;
originally announced February 2024.
-
AM^2-EmoJE: Adaptive Missing-Modality Emotion Recognition in Conversation via Joint Embedding Learning
Authors:
Naresh Kumar Devulapally,
Sidharth Anand,
Sreyasee Das Bhattacharjee,
Junsong Yuan
Abstract:
Human emotion can be expressed through different modes, i.e., audio, video, and text. However, the contribution of each mode to exhibiting each emotion is not uniform. Furthermore, the availability of complete mode-specific details is not always guaranteed at test time. In this work, we propose AM^2-EmoJE, a model for Adaptive Missing-Modality Emotion Recognition in Conversation via Joint Embedding Learning, which is grounded on two contributions. First, a query-adaptive fusion that automatically learns the relative importance of its mode-specific representations in a query-specific manner. With this, the model aims to prioritize the mode-invariant spatial query details of the emotion patterns while also retaining their mode-exclusive aspects within the learned multimodal query descriptor. Second, a multimodal joint embedding learning module that explicitly addresses various missing-modality scenarios at test time. With this, the model learns to emphasize the correlated patterns across modalities, which may help align the cross-attended mode-specific descriptors pairwise within a joint embedding space and thereby compensate for missing modalities during inference. By leveraging spatio-temporal details at the dialogue level, the proposed AM^2-EmoJE not only outperforms the best-performing state-of-the-art multimodal methods but also, by effectively leveraging body language in place of facial expressions, offers an enhanced privacy feature. With around 2-5% improvement in weighted-F1 score, the proposed multimodal joint embedding module delivers an impressive performance gain across a variety of missing-modality query scenarios at test time.
Submitted 26 January, 2024;
originally announced February 2024.
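For the AM^2-EmoJE entry above, query-adaptive fusion under missing modalities can be illustrated by a softmax over per-modality relevance scores in which absent modalities are masked out. This is a minimal sketch under stated assumptions: the dot-product scoring against a `query` vector stands in for whatever learned scoring the model actually uses, and all names and values are hypothetical.

```python
import numpy as np


def adaptive_fusion(mode_descs, query, available):
    """Fuse per-modality descriptors with query-dependent softmax weights.

    mode_descs: (M, d) array, one descriptor per modality (e.g. audio, video, text).
    query: (d,) vector standing in for a learned, query-specific scorer (hypothetical).
    available: (M,) boolean mask; missing modalities receive zero weight.
    Returns (fused descriptor of shape (d,), weights of shape (M,)).
    """
    available = np.asarray(available, dtype=bool)
    if not available.any():
        raise ValueError("at least one modality must be available")
    scores = mode_descs @ query / np.sqrt(mode_descs.shape[1])
    scores = np.where(available, scores, -np.inf)
    weights = np.exp(scores - scores[available].max())  # exp(-inf) -> 0
    weights /= weights.sum()
    return weights @ mode_descs, weights


descs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])  # audio, video, text
query = np.array([2.0, 1.0, 0.0])
fused_all, w_all = adaptive_fusion(descs, query, [True, True, True])
fused_no_video, w_no_video = adaptive_fusion(descs, query, [True, False, True])
```

Masking before the softmax means the remaining modalities are renormalized automatically when one is missing, so the fused descriptor always has the same shape.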
-
AMuSE: Adaptive Multimodal Analysis for Speaker Emotion Recognition in Group Conversations
Authors:
Naresh Kumar Devulapally,
Sidharth Anand,
Sreyasee Das Bhattacharjee,
Junsong Yuan,
Yu-Ping Chang
Abstract:
Analyzing individual emotions during group conversations is crucial for developing intelligent agents capable of natural human-machine interaction. While reliable emotion recognition techniques depend on different modalities (text, audio, video), the inherent heterogeneity between these modalities and the dynamic cross-modal interactions influenced by an individual's unique behavioral patterns make emotion recognition very challenging. This difficulty is compounded in group settings, where an emotion and its temporal evolution are influenced not only by the individual but also by external context such as audience reaction and the context of the ongoing conversation. To meet this challenge, we propose a Multimodal Attention Network (MAN) that captures cross-modal interactions at various levels of spatial abstraction by jointly learning a set of interacting mode-specific Peripheral and Central networks. The proposed MAN injects cross-modal attention via its Peripheral key-value pairs within each layer of a mode-specific Central query network. The resulting cross-attended mode-specific descriptors are then combined using an Adaptive Fusion technique that enables the model to integrate the discriminative and complementary mode-specific data patterns within an instance-specific multimodal descriptor. Given a dialogue represented by a sequence of utterances, the proposed AMuSE model condenses both spatial and temporal features into two dense descriptors: speaker-level and utterance-level. This not only delivers better classification performance (3-5% improvement in weighted-F1 and 5-7% improvement in accuracy) on large-scale public datasets but also helps users understand the reasoning behind each emotion prediction made by the model via its Multimodal Explainability Visualization module.
Submitted 26 January, 2024;
originally announced January 2024.
-
Finger Biometric Recognition With Feature Selection
Authors:
Asish Bera,
Debotosh Bhattacharjee,
Mita Nasipuri
Abstract:
Biometrics is indispensable in the modern digital era for secure automated human authentication in various fields of machine learning and pattern recognition. Hand geometry is a promising physiological biometric trait with ample deployed application areas for identity verification. Due to the intricate anatomical structure of the thumb and substantial inter-finger posture variation, satisfactory performance cannot be achieved when the thumb is included in a contact-free environment. To overcome the hindrances associated with the thumb, four-finger-based (thumb-excluded) biometric approaches have been devised. In this chapter, a four-finger-based biometric method is presented. In addition, selecting salient features is essential to reduce feature dimensionality by eliminating insignificant features. Weights are assigned according to the discriminative efficiency of the features to emphasize the essential ones. Two strategies, namely global and local feature selection, are adopted based on the adaptive forward-selection and backward-elimination (FoBa) algorithm. Identification performance is evaluated using the weighted k-nearest neighbor (wk-NN) and random forest (RF) classifiers. The experiments are conducted using the selected feature subsets over the 300 subjects of the Bosphorus hand database. The best identification accuracy of 98.67% and an equal error rate (EER) of 4.6% are achieved using a subset of 25 features selected by the rank-based local FoBa algorithm.
Submitted 19 December, 2023; v1 submitted 16 December, 2023;
originally announced December 2023.
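For the four-finger entry above, the abstract combines per-feature weights (assigned by discriminative efficiency) with a weighted k-NN classifier. One common wk-NN variant is sketched below, assuming a feature-weighted Euclidean distance and inverse-distance voting. The chapter's exact weighting and voting scheme are not given in the abstract, and the function name and toy data are hypothetical.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train_X, train_y, query, feature_weights, k=3, eps=1e-9):
    """Classify `query` with a distance-weighted k-NN.

    Distances use per-feature weights (a hypothetical stand-in for weights derived
    from discriminative efficiency); each of the k nearest neighbors votes with
    weight 1 / (distance + eps).
    """
    dists = []
    for x, label in zip(train_X, train_y):
        d = math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(feature_weights, x, query)))
        dists.append((d, label))
    dists.sort(key=lambda t: t[0])
    votes = defaultdict(float)
    for d, label in dists[:k]:
        votes[label] += 1.0 / (d + eps)
    return max(votes, key=votes.get)


# Hypothetical toy data: two features, two classes.
train_X = [[0, 0], [0, 1], [5, 5], [5, 6]]
train_y = ["a", "a", "b", "b"]
print(weighted_knn_predict(train_X, train_y, [0, 6], feature_weights=[1, 1]))  # → b
print(weighted_knn_predict(train_X, train_y, [0, 6], feature_weights=[1, 0]))  # → a
```

Zeroing the weight of the second feature flips the prediction for the same query, which is the intended effect of feature weighting: insignificant features stop shaping the neighborhood.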
-
Vision Transformer Adapters for Generalizable Multitask Learning
Authors:
Deblina Bhattacharjee,
Sabine Süsstrunk,
Mathieu Salzmann
Abstract:
We introduce the first multitasking vision transformer adapters that learn generalizable task affinities which can be applied to novel tasks and domains. Integrated into an off-the-shelf vision transformer backbone, our adapters can simultaneously solve multiple dense vision tasks in a parameter-efficient manner, unlike existing multitasking transformers that are parametrically expensive. In contrast to concurrent methods, we do not require retraining or fine-tuning whenever a new task or domain is added. We introduce a task-adapted attention mechanism within our adapter framework that combines gradient-based task similarities with attention-based ones. The learned task affinities generalize to the following settings: zero-shot task transfer, unsupervised domain adaptation, and generalization without fine-tuning to novel domains. We demonstrate that our approach outperforms not only the existing convolutional neural network-based multitasking methods but also the vision transformer-based ones. Our project page is at https://ivrl.github.io/VTAGML.
Submitted 23 August, 2023;
originally announced August 2023.
-
Deep Neural Networks Fused with Textures for Image Classification
Authors:
Asish Bera,
Debotosh Bhattacharjee,
Mita Nasipuri
Abstract:
Fine-grained image classification (FGIC) is a challenging task in computer vision due to small visual differences among subcategories but large intra-class variations. Deep learning methods have achieved remarkable success in solving FGIC. In this paper, we propose a fusion approach to address FGIC by combining global texture with local patch-based information. The first pipeline extracts deep features from various fixed-size non-overlapping patches and encodes them by sequential modelling using long short-term memory (LSTM). The other pipeline computes image-level textures at multiple scales using local binary patterns (LBP). The advantages of both streams are integrated into an efficient feature vector for image classification. The method is tested on eight datasets representing human faces, skin lesions, food dishes, marine life, etc., using four standard backbone CNNs. Our method attains better classification accuracy than existing methods by notable margins.
Submitted 31 March, 2024; v1 submitted 3 August, 2023;
originally announced August 2023.
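For the texture-fusion entry above, the LBP stream can be illustrated with the classic 8-neighbor LBP operator: each pixel is encoded by thresholding its neighbors against it, and the codes are pooled into histograms at several scales. This is a simplified sketch under stated assumptions (neighbors taken at integer offsets scaled by `radius`, no interpolation, 256-bin histograms); the paper's exact LBP configuration is not specified in the abstract.

```python
import numpy as np

# Neighbor offsets (dy, dx) in clockwise order starting at the top-left.
_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(img, radius=1):
    """8-neighbor LBP codes for interior pixels of a 2-D grayscale array.

    Neighbors are sampled at integer offsets scaled by `radius` (a simplified,
    non-interpolated variant); bit i is set when neighbor i >= center.
    """
    img = np.asarray(img, dtype=np.float64)
    h, w = img.shape
    r = radius
    center = img[r:h - r, r:w - r]
    codes = np.zeros(center.shape, dtype=np.int64)
    for bit, (dy, dx) in enumerate(_OFFSETS):
        neigh = img[r + dy * r:h - r + dy * r, r + dx * r:w - r + dx * r]
        codes |= (neigh >= center).astype(np.int64) << bit
    return codes


def lbp_histogram(img, radii=(1, 2)):
    """Concatenate normalized 256-bin LBP histograms computed at several radii."""
    feats = []
    for r in radii:
        hist = np.bincount(lbp_codes(img, r).ravel(), minlength=256).astype(np.float64)
        feats.append(hist / hist.sum())
    return np.concatenate(feats)


rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(32, 32))
texture_vector = lbp_histogram(image, radii=(1, 2, 3))  # 3 x 256 = 768 values
```

Concatenating per-radius histograms is one simple way to obtain a multi-scale, image-level texture descriptor that could then be fused with patch-level deep features.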
-
Fine-Grained Sports, Yoga, and Dance Postures Recognition: A Benchmark Analysis
Authors:
Asish Bera,
Mita Nasipuri,
Ondrej Krejcar,
Debotosh Bhattacharjee
Abstract:
Human body-pose estimation is a complex problem in computer vision. Recent research interest has widened specifically to Sports, Yoga, and Dance (SYD) postures for maintaining health. SYD pose categorization is regarded as a fine-grained image classification task due to the complex movement of body parts. Deep Convolutional Neural Networks (CNNs) have significantly improved performance on various human body-pose estimation problems. Though decent progress has been achieved in yoga posture recognition using deep learning techniques, fine-grained sports and dance recognition still requires more research attention. However, no public benchmark image dataset with sufficient inter-class and intra-class variations is yet available for sports and dance posture classification. To address this limitation, we propose two image datasets, one with 102 sport categories and another with 12 dance styles. Two public datasets, Yoga-82 with 82 classes and Yoga-107 with 107 classes, are used for yoga postures. These four SYD datasets are evaluated with the proposed deep model, SYD-Net, which integrates a patch-based attention (PbA) mechanism on top of standard backbone CNNs. The PbA module leverages a self-attention mechanism that learns contextual information from a set of uniform and multi-scale patches and emphasizes discriminative features to capture the semantic correlation among patches. Moreover, random erasing data augmentation is applied to improve performance. The proposed SYD-Net achieves state-of-the-art accuracy on Yoga-82 using five base CNNs. SYD-Net's accuracy on the other datasets is also remarkable, implying its efficiency. Our Sports-102 and Dance-12 datasets are publicly available at https://sites.google.com/view/syd-net/home.
Submitted 1 August, 2023;
originally announced August 2023.
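For the SYD-Net entry above, the patch-based attention idea (self-attention over a set of patch features) can be sketched as scaled dot-product self-attention in NumPy. This is a generic illustration, not SYD-Net's PbA module: the patch features and projection matrices are random placeholders for what would be learned CNN features and weights, and the mean pooling at the end is an assumption.

```python
import numpy as np


def patch_self_attention(patches, w_q, w_k, w_v):
    """Scaled dot-product self-attention over patch descriptors.

    patches: (N, d) array, one row per patch (e.g. pooled CNN features).
    w_q, w_k, w_v: (d, d_k) projection matrices (learned in practice; random here).
    Returns (attended, weights) with shapes (N, d_k) and (N, N).
    """
    q, k, v = patches @ w_q, patches @ w_k, patches @ w_v
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ v, weights


rng = np.random.default_rng(0)
patches = rng.normal(size=(6, 16))  # e.g. 6 patches from a uniform grid
w_q, w_k, w_v = (rng.normal(size=(16, 8)) for _ in range(3))
attended, weights = patch_self_attention(patches, w_q, w_k, w_v)
image_descriptor = attended.mean(axis=0)  # hypothetical pooling into one vector
```

Each row of `weights` shows how strongly one patch attends to every other patch, which is how contextual correlation among patches enters the final descriptor.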
-
Validating polyp and instrument segmentation methods in colonoscopy through Medico 2020 and MedAI 2021 Challenges
Authors:
Debesh Jha,
Vanshali Sharma,
Debapriya Banik,
Debayan Bhattacharya,
Kaushiki Roy,
Steven A. Hicks,
Nikhil Kumar Tomar,
Vajira Thambawita,
Adrian Krenzer,
Ge-Peng Ji,
Sahadev Poudel,
George Batchkala,
Saruar Alam,
Awadelrahman M. A. Ahmed,
Quoc-Huy Trinh,
Zeshan Khan,
Tien-Phat Nguyen,
Shruti Shrestha,
Sabari Nathan,
Jeonghwan Gwak,
Ritika K. Jha,
Zheyuan Zhang,
Alexander Schlaefer,
Debotosh Bhattacharjee,
M. K. Bhuyan, et al. (8 additional authors not shown)
Abstract:
Automatic analysis of colonoscopy images has been an active field of research motivated by the importance of early detection of precancerous polyps. However, detecting polyps during a live examination can be challenging due to factors such as variation in skill and experience among endoscopists, lack of attentiveness, and fatigue, which lead to a high polyp miss rate. Deep learning has emerged as a promising solution to this challenge, as it can assist endoscopists in detecting and classifying overlooked polyps and abnormalities in real time. In addition to an algorithm's accuracy, transparency and interpretability are crucial for explaining the whys and hows of its predictions. Further, most algorithms are developed on private data, as closed-source or proprietary software, and the methods lack reproducibility. Therefore, to promote the development of efficient and transparent methods, we organized the "Medico automatic polyp segmentation (Medico 2020)" and "MedAI: Transparency in Medical Image Segmentation (MedAI 2021)" competitions. We present a comprehensive summary, analyze each contribution, highlight the strengths of the best-performing methods, and discuss the possibility of translating such methods into clinical practice. For the transparency task, a multi-disciplinary team, including expert gastroenterologists, assessed each submission and evaluated the teams based on open-source practices, failure case analysis, ablation studies, and the usability and understandability of evaluations, to gain a deeper understanding of the models' credibility for clinical deployment. Through the comprehensive analysis of the challenge, we not only highlight the advancements in polyp and surgical instrument segmentation but also encourage qualitative evaluation for building more transparent and understandable AI-based colonoscopy systems.
Submitted 6 May, 2024; v1 submitted 30 July, 2023;
originally announced July 2023.
-
Dense Multitask Learning to Reconfigure Comics
Authors:
Deblina Bhattacharjee,
Sabine Süsstrunk,
Mathieu Salzmann
Abstract:
In this paper, we develop a MultiTask Learning (MTL) model to achieve dense predictions for comics panels, which in turn facilitates the transfer of comics from one publication channel to another by assisting authors in reconfiguring their narratives. Our MTL method can successfully identify the semantic units as well as the embedded notion of 3D in comic panels. This is a significantly challenging problem because comics comprise disparate artistic styles, illustrations, layouts, and object scales that depend on the authors' creative process. Typically, dense image-based prediction techniques require a large corpus of data. Finding an automated solution for dense prediction in the comics domain therefore becomes more difficult given the lack of ground-truth dense annotations for comics images. To address these challenges, we develop the following solutions: 1) we leverage a commonly used strategy known as unsupervised image-to-image translation, which allows us to utilize a large corpus of real-world annotations; 2) we use the results of the translations to develop our multitasking approach, which is based on a vision transformer backbone and a domain-transferable attention module; 3) we study the feasibility of integrating our MTL dense-prediction method with an existing retargeting method, thereby reconfiguring comics.
Submitted 16 July, 2023;
originally announced July 2023.
-
Laser Inter-Satellite Link Setup Delay: Quantification, Impact, and Tolerable Value
Authors:
Dhiraj Bhattacharjee,
Aizaz U. Chaudhry,
Halim Yanikomeroglu,
Peng Hu,
Guillaume Lamontagne
Abstract:
Dynamic laser inter-satellite links (LISLs) provide the flexibility of connecting a pair of satellites as required (dynamically), whereas static LISLs need to be active continuously between the energy-constrained satellites. However, because the LISL establishment time (termed herein the LISL setup delay) is on the order of seconds, realizing dynamic LISLs is currently infeasible. Towards the realization of dynamic LISLs, we first quantify the LISL setup delay; then we calculate the end-to-end latency of a free-space optical satellite network (FSOSN) with the LISL setup delay; subsequently, we analyze the impact of the LISL setup delay on the end-to-end latency of the FSOSN. We also provide design guidelines for laser communication terminal manufacturers in the form of the maximum tolerable value of LISL setup delay for which an FSOSN based on Starlink's Phase I satellite constellation remains worthwhile for low-latency, long-distance inter-continental data communications.
Submitted 12 January, 2023;
originally announced January 2023.
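For the setup-delay entry above, the notion of a maximum tolerable setup delay reduces to simple arithmetic under a strong simplifying assumption. Suppose a route needs n new LISLs that are set up one after another, and propagation at the speed of light over a path of length d is the only other latency term. Then the end-to-end latency stays within a budget L_max as long as d/c + n·t_setup ≤ L_max, i.e., t_setup ≤ (L_max − d/c)/n. The function name, path length, link count, and budget below are hypothetical, not values from the paper.

```python
C_KM_PER_S = 299_792.458  # speed of light in vacuum, km/s


def max_tolerable_setup_delay(path_km, new_links, latency_budget_s):
    """Largest per-link setup delay that keeps end-to-end latency within budget.

    Assumes (hypothetically) that each of `new_links` LISLs is set up one after
    another, so the total penalty is new_links * t_setup, and that propagation
    over `path_km` at the speed of light is the only other latency term.
    Returns a negative value if propagation alone already exceeds the budget.
    """
    if new_links <= 0:
        raise ValueError("need at least one link to set up")
    propagation_s = path_km / C_KM_PER_S
    return (latency_budget_s - propagation_s) / new_links


# Hypothetical route: 10,000 km, 4 new LISLs, 50 ms latency budget.
print(max_tolerable_setup_delay(10_000, 4, 0.05))  # ≈ 0.00416 s
```

If links could be set up in parallel instead, the penalty would be counted once and the divisor n would drop to 1; which model applies depends on how the network actually schedules link establishment.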
-
MulT: An End-to-End Multitask Learning Transformer
Authors:
Deblina Bhattacharjee,
Tong Zhang,
Sabine Süsstrunk,
Mathieu Salzmann
Abstract:
We propose an end-to-end Multitask Learning Transformer framework, named MulT, to simultaneously learn multiple high-level vision tasks, including depth estimation, semantic segmentation, reshading, surface normal estimation, 2D keypoint detection, and edge detection. Based on the Swin transformer model, our framework encodes the input image into a shared representation and makes predictions for each vision task using task-specific transformer-based decoder heads. At the heart of our approach is a shared attention mechanism modeling the dependencies across the tasks. We evaluate our model on several multitask benchmarks, showing that our MulT framework outperforms both the state-of-the-art multitask convolutional neural network models and all the respective single-task transformer models. Our experiments further highlight the benefits of sharing attention across all the tasks, and demonstrate that our MulT model is robust and generalizes well to new domains. Our project website is at https://ivrl.github.io/MulT/.
Submitted 17 May, 2022;
originally announced May 2022.
-
Spoofing Detection on Hand Images Using Quality Assessment
Authors:
Asish Bera,
Ratnadeep Dey,
Debotosh Bhattacharjee,
Mita Nasipuri,
Hubert P. H. Shum
Abstract:
Recent research on biometrics focuses on achieving a high success rate of authentication and addressing the concern of various spoofing attacks. Although hand geometry recognition provides adequate security against unauthorized access, it is susceptible to presentation attacks. This paper presents an anti-spoofing method for hand biometrics. A presentation attack detection approach is developed by assessing the visual quality of genuine and fake hand images. A threshold-based gradient magnitude similarity quality metric is proposed to discriminate between real and spoofed hand samples. The visual hand images of 255 subjects from the Bogazici University hand database are considered as original samples. Correspondingly, from each genuine sample, we acquire a forged image using a Canon EOS 700D camera. Such fake hand images with natural degradation are considered for electronic screen display-based spoofing attack detection. Furthermore, we create another fake hand dataset with artificial degradation by introducing additional Gaussian blur, salt-and-pepper, and speckle noise to the original images. Ten quality metrics are measured from each sample for classification between original and fake hand images. The classification experiments are performed using the k-nearest neighbors, random forest, and support vector machine classifiers, as well as deep convolutional neural networks. The proposed gradient similarity-based quality metric achieves 1.5% average classification error using the k-nearest neighbors and random forest classifiers. An average classification error of 2.5% is obtained using the baseline evaluation with the MobileNetV2 deep network for discriminating original and different types of fake hand samples.
Submitted 22 October, 2021;
originally announced October 2021.
-
Estimating Image Depth in the Comics Domain
Authors:
Deblina Bhattacharjee,
Martin Everaert,
Mathieu Salzmann,
Sabine Süsstrunk
Abstract:
Estimating the depth of comics images is challenging as such images a) are monocular; b) lack ground-truth depth annotations; c) vary across artistic styles; d) are sparse and noisy. We thus use an off-the-shelf unsupervised image-to-image translation method to translate the comics images to natural ones and then use an attention-guided monocular depth estimator to predict their depth. This lets us leverage the depth annotations of existing natural images to train the depth estimator. Furthermore, our model learns to distinguish between text and images in the comics panels to reduce text-based artefacts in the depth estimates. Our method consistently outperforms the existing state-of-the-art approaches across all metrics on both the DCM and eBDtheque images. Finally, we introduce a dataset to evaluate depth prediction on comics. Our project website can be accessed at https://github.com/IVRL/ComicsDepth.
Submitted 15 August, 2022; v1 submitted 7 October, 2021;
originally announced October 2021.
-
Fidelity Estimation Improves Noisy-Image Classification With Pretrained Networks
Authors:
Xiaoyu Lin,
Deblina Bhattacharjee,
Majed El Helou,
Sabine Süsstrunk
Abstract:
Image classification has significantly improved using deep learning. This is mainly due to convolutional neural networks (CNNs) that are capable of learning rich feature extractors from large datasets. However, most deep learning classification methods are trained on clean images and are not robust when handling noisy ones, even if a restoration preprocessing step is applied. While novel methods address this problem, they rely on modified feature extractors and thus require retraining. We instead propose a method that can be applied to a pretrained classifier. Our method exploits a fidelity map estimate that is fused into the internal representations of the feature extractor, thereby guiding the attention of the network and making it more robust to noisy data. We improve the noisy-image classification (NIC) results by large margins, especially at high noise levels, and come close to the fully retrained approaches. Furthermore, as a proof of concept, we show that when using our oracle fidelity map we even outperform the fully retrained methods, whether trained on noisy or restored images.
Submitted 4 October, 2021; v1 submitted 1 June, 2021;
originally announced June 2021.
-
Modeling Object Dissimilarity for Deep Saliency Prediction
Authors:
Bahar Aydemir,
Deblina Bhattacharjee,
Tong Zhang,
Seungryong Kim,
Mathieu Salzmann,
Sabine Süsstrunk
Abstract:
Saliency prediction has made great strides over the past two decades, with current techniques modeling low-level information, such as color, intensity and size contrasts, and high-level ones, such as attention and gaze direction for entire objects. Despite this, these methods fail to account for the dissimilarity between objects, which affects human visual attention. In this paper, we introduce a detection-guided saliency prediction network that explicitly models the differences between multiple objects, such as their appearance and size dissimilarities. Our approach allows us to fuse our object dissimilarities with features extracted by any deep saliency prediction network. As evidenced by our experiments, this consistently boosts the accuracy of the baseline networks, enabling us to outperform the state-of-the-art models on three saliency benchmarks, namely SALICON, MIT300 and CAT2000. Our project page is at https://github.com/IVRL/DisSal.
Submitted 24 November, 2022; v1 submitted 8 April, 2021;
originally announced April 2021.
-
CONTRA: Area-Constrained Technology Mapping Framework For Memristive Memory Processing Unit
Authors:
Debjyoti Bhattacharjee,
Anupam Chattopadhyay,
Srijit Dutta,
Ronny Ronen,
Shahar Kvatinsky
Abstract:
Data-intensive applications are poised to benefit directly from processing-in-memory platforms, such as memristive Memory Processing Units, which allow leveraging data locality and performing stateful logic operations. Developing design automation flows for such platforms is a challenging and highly relevant research problem. In this work, we investigate the problem of minimizing delay under an arbitrary area constraint for MAGIC-based in-memory computing platforms. We propose an end-to-end area-constrained technology mapping framework, CONTRA. CONTRA uses Look-Up Table (LUT) based mapping of the input function on the crossbar array to maximize parallel operations and uses a novel search technique to move data optimally inside the array. CONTRA supports benchmarks in a variety of formats, along with crossbar dimensions, as input to generate MAGIC instructions. CONTRA scales to large benchmarks, as demonstrated by our experiments. CONTRA allows mapping benchmarks to smaller crossbar dimensions than any previous technique, while allowing a wide variety of area-delay trade-offs. CONTRA improves the composite metric of area-delay product by 2.1x to 13.1x compared to seven existing technology mapping approaches.
Submitted 2 September, 2020;
originally announced September 2020.
-
New techniques for fault-tolerant decomposition of Multi-Controlled Toffoli gate
Authors:
Laxmidhar Biswal,
Debjyoti Bhattacharjee,
Anupam Chattopadhyay,
Hafizur Rahaman
Abstract:
Physical implementation of scalable quantum architectures faces an immense challenge in the form of fragile quantum states. To overcome it, fault-tolerant quantum architectures are desirable. This is currently achieved by using the surface code along with a transversal gate set, which dictates the need to decompose universal Multi-Controlled Toffoli (MCT) gates using a transversal gate set. Additionally, the transversal non-Clifford phase gate incurs high latency, which makes it an important factor to consider during decomposition. Besides, decomposing large MCT gates without ancillae presents an additional hurdle. In this manuscript, we address both of these issues by introducing the Clifford+$Z_N$ gate library. We present an ancilla-free decomposition of MCT gates with linear phase depth and quadratic phase count. Furthermore, we provide a technique for decomposing MCT gates in unit phase depth using the Clifford+$Z_N$ library, albeit at the cost of ancillary lines and quadratic phase count.
Submitted 28 April, 2019; v1 submitted 15 April, 2019;
originally announced April 2019.
-
Crossbar-Constrained Technology Mapping for ReRAM based In-Memory Computing
Authors:
Debjyoti Bhattacharjee,
Yaswanth Tavva,
Arvind Easwaran,
Anupam Chattopadhyay
Abstract:
In recent times, Resistive RAMs (ReRAMs) have gained significant prominence due to their unique feature of supporting both non-volatile storage and logic capabilities. ReRAM is also reported to provide extremely low power consumption compared to standard CMOS storage devices. As a result, researchers have explored the mapping and design of diverse applications, ranging from arithmetic to neuromorphic computing structures, onto ReRAM-based platforms. ReVAMP, a general-purpose ReRAM computing platform, has been proposed recently to leverage the parallelism exhibited in a crossbar structure. However, technology mapping on ReVAMP remains an open challenge. Although technology mapping with device and area constraints has been proposed, crossbar constraints have not been considered so far. In this work, we address this problem. Two technology mapping flows are proposed, offering different runtime-efficiency trade-offs. Both mapping flows take crossbar constraints into account and generate feasible mappings for a variety of crossbar dimensions. Our proposed algorithms are highly scalable and reveal important design hints for ReRAM-based implementations.
Submitted 21 September, 2018;
originally announced September 2018.
-
Quantum Circuits for Toom-Cook Multiplication
Authors:
Srijit Dutta,
Debjyoti Bhattacharjee,
Anupam Chattopadhyay
Abstract:
In this paper, we report efficient quantum circuits for integer multiplication using the Toom-Cook algorithm. By analysing the recursive tree structure of the algorithm, we obtain bounds on the Toffoli gate count and the qubit count. These bounds are further improved by employing reversible pebble games that uncompute intermediate results. The asymptotic bounds for different performance metrics of the proposed quantum circuit are superior to those of prior multiplier circuits based on the schoolbook and Karatsuba algorithms.
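As a rough classical illustration of the recursion tree mentioned above, the sketch below multiplies Python integers with Toom-3: each operand is split into three limbs, five products are formed at the evaluation points 0, 1, -1, -2 and infinity, and the result is interpolated back. This is ordinary arithmetic, not the paper's quantum circuit; the function name, the base-case threshold, and the particular interpolation sequence (the one usually attributed to Bodrato) are assumptions of this sketch.

```python
def toom3(a: int, b: int, threshold: int = 1 << 64) -> int:
    """Multiply integers by Toom-3 splitting; fall back to built-in '*' below threshold."""
    sign = -1 if (a < 0) != (b < 0) else 1
    a, b = abs(a), abs(b)
    if a < threshold or b < threshold:
        return sign * a * b
    k = (max(a.bit_length(), b.bit_length()) + 2) // 3  # limb size in bits
    mask = (1 << k) - 1
    a0, a1, a2 = a & mask, (a >> k) & mask, a >> (2 * k)
    b0, b1, b2 = b & mask, (b >> k) & mask, b >> (2 * k)
    # Evaluate both degree-2 limb polynomials at 0, 1, -1, -2, infinity and multiply pointwise.
    r0 = toom3(a0, b0, threshold)
    r1 = toom3(a0 + a1 + a2, b0 + b1 + b2, threshold)
    rm1 = toom3(a0 - a1 + a2, b0 - b1 + b2, threshold)
    rm2 = toom3(a0 - 2 * a1 + 4 * a2, b0 - 2 * b1 + 4 * b2, threshold)
    rinf = toom3(a2, b2, threshold)
    # Interpolate the product coefficients c0..c4; every division below is exact.
    c0, c4 = r0, rinf
    c3 = (rm2 - r1) // 3
    c1 = (r1 - rm1) // 2
    c2 = rm1 - r0
    c3 = (c2 - c3) // 2 + 2 * rinf
    c2 = c2 + c1 - c4
    c1 = c1 - c3
    # Recompose by evaluating the product polynomial at x = 2**k.
    return sign * (c0 + (c1 << k) + (c2 << 2 * k) + (c3 << 3 * k) + (c4 << 4 * k))

# Usage: compare against Python's built-in multiplication.
x, y = 3 ** 300, 7 ** 250 + 12345
print(toom3(x, y) == x * y)
```

Each level replaces one product by five products of one-third size, so the cost follows T(n) = 5T(n/3) + O(n), i.e. O(n^(log_3 5)) ≈ O(n^1.465). Counting gates and qubits for a reversible version means analysing a recursion tree of this shape.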
Submitted 23 June, 2018; v1 submitted 7 May, 2018;
originally announced May 2018.
-
Depth-Optimal Quantum Circuit Placement for Arbitrary Topologies
Authors:
Debjyoti Bhattacharjee,
Anupam Chattopadhyay
Abstract:
A significant hurdle towards the realization of practical and scalable quantum computing is protecting quantum states from inherent noise during computation. In the physical implementation of quantum circuits, a long-distance interaction between two qubits is undesirable since it can be interpreted as noise. Therefore, multiple quantum technologies and quantum error correcting codes strongly require the interacting qubits to be arranged in a nearest neighbor (NN) fashion. The current literature on converting a given quantum circuit into an NN-arranged one has mainly considered chained qubit topologies, or the Linear Nearest Neighbor (LNN) topology. However, practical quantum circuit realizations, such as Nuclear Magnetic Resonance (NMR), may not have an LNN topology. To address this gap, we consider an arbitrary qubit topology. We present an Integer Linear Programming (ILP) formulation for achieving minimal logical depth while guaranteeing the nearest neighbor arrangement between the interacting qubits. We substantiate our claim with studies on diverse network topologies and prominent quantum circuit benchmarks.
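The ILP itself is not reproduced here. As a hedged illustration of the two quantities it constrains and optimizes, the sketch below checks whether a fixed placement of logical qubits on an arbitrary coupling graph keeps every two-qubit gate nearest-neighbor, and computes the logical depth of a gate list by as-soon-as-possible layering. The function names, the static placement (no SWAP insertion), and the gate representation are assumptions.

```python
from typing import Dict, Iterable, List, Tuple

Gate = Tuple[str, str]  # a two-qubit gate acting on two logical qubits

def is_nn_compliant(gates: Iterable[Gate], placement: Dict[str, int],
                    coupling: Iterable[Tuple[int, int]]) -> bool:
    """True if every gate acts on physical qubits joined by an edge of the topology."""
    edges = {frozenset(e) for e in coupling}
    return all(frozenset((placement[p], placement[q])) in edges for p, q in gates)

def logical_depth(gates: List[Gate]) -> int:
    """ASAP layering: each gate goes one layer after the latest gate on either of its qubits."""
    last: Dict[str, int] = {}
    depth = 0
    for p, q in gates:
        layer = max(last.get(p, 0), last.get(q, 0)) + 1
        last[p] = last[q] = layer
        depth = max(depth, layer)
    return depth

# Usage: a star topology (not LNN) with physical qubit 0 at the centre.
star = [(0, 1), (0, 2), (0, 3)]
gates = [("a", "b"), ("a", "c"), ("a", "d")]
print(is_nn_compliant(gates, {"a": 0, "b": 1, "c": 2, "d": 3}, star))
print(logical_depth(gates))
```

An optimizer would search over placements (and, in general, inserted SWAPs) so that the first check holds while the second quantity is minimized.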
Submitted 23 March, 2017;
originally announced March 2017.
-
A Novel Approach for Human Action Recognition from Silhouette Images
Authors:
Satyabrata Maity,
Debotosh Bhattacharjee,
Amlan Chakrabarti
Abstract:
In this paper, a novel technique for human action recognition from video is presented. Any human action is a combination of several micro-action sequences performed by one or more body parts. The proposed approach uses spatio-temporal body parts movement (STBPM) features extracted from the foreground silhouettes of human objects. The newly proposed STBPM feature estimates the movements of different body parts for any given time segment to classify actions. We also propose a rule-based logic named rule action classifier (RAC), which uses a series of condition-action rules based on prior knowledge and hence requires no training to classify an action. Since no training is required to classify actions, the proposed approach is view independent. Experimental results on the publicly available Weizmann and MuHAVi datasets are compared with those of related research in terms of human action detection accuracy, and the proposed technique outperforms the others.
Submitted 15 October, 2015;
originally announced October 2015.
-
Robust 3D face recognition in presence of pose and partial occlusions or missing parts
Authors:
Parama Bagchi,
Debotosh Bhattacharjee,
Mita Nasipuri
Abstract:
In this paper, we propose a robust 3D face recognition system that can handle pose variations as well as occlusions in real-world conditions. The system first takes a 3D range image as input and registers it using the Iterative Closest Point (ICP) algorithm. The ICP used in this work registers facial surfaces to a common model by minimizing distances between a probe model and a gallery model. However, the performance of ICP relies heavily on the initial conditions. Hence, it is necessary to provide an initial registration, which is improved iteratively and finally converges to the best alignment possible. Once the faces are registered, the occlusions are automatically extracted by thresholding the depth map values of the 3D image. After the occluded regions are detected, restoration is performed using Principal Component Analysis (PCA). The restored images, after the removal of occlusions, are then fed to the recognition system for classification. Features are extracted from the reconstructed non-occluded face images in the form of face normals. Experimental results on occluded facial images from the Bosphorus 3D face database show that our occlusion compensation scheme attains a recognition accuracy of 91.30%.
Submitted 16 August, 2014;
originally announced August 2014.
-
Human Face Recognition using Gabor based Kernel Entropy Component Analysis
Authors:
Arindam Kar,
Debotosh Bhattacharjee,
Dipak Kumar Basu,
Mita Nasipuri,
Mahantapas Kundu
Abstract:
In this paper, we present a novel Gabor wavelet based Kernel Entropy Component Analysis (KECA) method that integrates the Gabor wavelet transformation (GWT) of facial images with the KECA method for enhanced face recognition performance. First, the most discriminative facial features, characterized by spatial frequency, spatial locality, and orientation selectivity, are derived from the Gabor wavelet transformed images to cope with variations due to illumination and facial expression changes. KECA, which is related to the Rényi entropy, is then extended to include a cosine kernel function. KECA with cosine kernels is then applied to the extracted discriminating feature vectors of facial images to obtain only those real kernel ECA eigenvectors that are associated with eigenvalues having a positive entropy contribution. Finally, these real KECA features are used for image classification using the L1 and L2 distance measures, the Mahalanobis distance measure, and the cosine similarity measure. The feasibility of the Gabor based KECA method with the cosine kernel has been successfully tested on both frontal and pose-angled face recognition, using datasets from the ORL, FRAV2D, and FERET databases.
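A minimal NumPy sketch of the entropy-based selection step follows. It assumes the "cosine kernel" is the normalized inner product k(x, y) = x·y / (‖x‖‖y‖) and uses a random matrix in place of the Gabor feature vectors; the function names are hypothetical. Unlike kernel PCA, the eigenpairs are ranked by their contribution λ_i (1ᵀe_i)² to the Rényi quadratic entropy estimate rather than by eigenvalue alone, and only positive contributions are kept.

```python
import numpy as np

def cosine_kernel(X: np.ndarray) -> np.ndarray:
    """K[i, j] = <x_i, x_j> / (||x_i|| ||x_j||) for the rows x_i of X."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Xn @ Xn.T

def keca(X: np.ndarray, n_components: int):
    """Keep the eigenpairs of K with the largest positive entropy contribution."""
    K = cosine_kernel(X)
    eigvals, eigvecs = np.linalg.eigh(K)  # K is symmetric
    ones = np.ones(K.shape[0])
    # Contribution of pair i to 1^T K 1 (the entropy estimate up to a 1/N^2 factor).
    contrib = eigvals * (eigvecs.T @ ones) ** 2
    ranked = [i for i in np.argsort(contrib)[::-1] if contrib[i] > 0][:n_components]
    # Projected training samples: column j is sqrt(lambda) * e for the j-th kept pair.
    Z = eigvecs[:, ranked] * np.sqrt(eigvals[ranked])
    return Z, contrib

# Usage with stand-in features: 6 samples, 4 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
Z, contrib = keca(X, 2)
print(Z.shape)
```

The contributions sum to 1ᵀK1, that is, N² times the entropy estimate, which makes a convenient sanity check.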
Submitted 5 December, 2013;
originally announced December 2013.
-
High Performance Human Face Recognition using Gabor based Pseudo Hidden Markov Model
Authors:
Arindam Kar,
Debotosh Bhattacharjee,
Dipak Kumar Basu,
Mita Nasipuri,
Mahantapas Kundu
Abstract:
This paper introduces a novel methodology that combines the multi-resolution feature of the Gabor wavelet transformation (GWT) with the local interactions of facial structures expressed through the Pseudo Hidden Markov Model (PHMM). Instead of the traditional zigzag scanning for feature extraction, a continuous scanning method is proposed for better feature selection: it runs from the top-left corner to the right, then top-down, then right to left, and so on until the bottom-right of the image, i.e., a spiral scanning technique. Unlike traditional HMMs, the proposed PHMM does not assume that the visible observations are conditionally independent given the states. This is achieved via the concept of local structures introduced by the PHMM, which is used to extract facial bands and automatically select the most informative features of a face image. Thus, the long-range dependency problem inherent to traditional HMMs is drastically reduced. Moreover, using only the most informative pixels rather than the whole image makes the proposed method faster for face recognition. This method has been successfully tested on frontal face images from the ORL, FRAV2D, and FERET face databases, where the images vary in pose, illumination, expression, and scale. The FERET data set contains 2200 frontal face images of 200 subjects, the FRAV2D data set consists of 1100 images of 100 subjects, and the full ORL database is used. The reported results are far better than those of recent and widely cited systems.
Submitted 5 December, 2013;
originally announced December 2013.
-
Face Recognition using Hough Peaks extracted from the significant blocks of the Gradient Image
Authors:
Arindam Kar,
Debotosh Bhattacharjee,
Dipak Kumar Basu,
Mita Nasipuri,
Mahantapas Kundu
Abstract:
This paper proposes a new technique for automatic face recognition using integrated peaks of the Hough-transformed significant blocks of the binary gradient image. First, the gradient of an image is calculated and thresholded to obtain a binary gradient image, which is less sensitive to noise and illumination changes. Second, significant blocks are extracted from the absolute gradient image to retain pertinent information while reducing dimensionality. Finally, the best-fitting Hough peaks are extracted from the Hough-transformed significant blocks and concatenated to form the feature vector used for classification. The efficiency of the proposed method is demonstrated by experiments on 1100 images from the FRAV2D face database and 2200 images from the FERET database, where the images vary in pose, expression, illumination, and scale, and on 400 images from the ORL face database, where the images vary slightly in pose. Our method achieves 93.3%, 88.5%, and 99% recognition accuracy on the FRAV2D, FERET, and ORL databases, respectively.
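A minimal sketch of the two building blocks named above, a thresholded (binary) gradient image and Hough-peak extraction, is given below. The central-difference gradient, the threshold value, the angular resolution, and the function names are assumptions; the block partitioning and the selection of "significant" blocks are not reproduced.

```python
import math

def binary_gradient(img, thresh):
    """Threshold the central-difference gradient magnitude (border pixels stay 0)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2
            gy = (img[y + 1][x] - img[y - 1][x]) / 2
            out[y][x] = 1 if math.hypot(gx, gy) > thresh else 0
    return out

def hough_peaks(binary, n_theta=180, k=3):
    """Vote in (rho, theta) space for every foreground pixel; return the k strongest cells."""
    acc = {}
    for y, row in enumerate(binary):
        for x, v in enumerate(row):
            if not v:
                continue
            for t in range(n_theta):
                theta = math.pi * t / n_theta
                rho = round(x * math.cos(theta) + y * math.sin(theta))
                acc[(rho, t)] = acc.get((rho, t), 0) + 1
    # sorted() is stable, so equal vote counts keep their first-seen order.
    best = sorted(acc.items(), key=lambda item: item[1], reverse=True)[:k]
    return [(rho, t, votes) for (rho, t), votes in best]

# Usage: a vertical step edge, then the peaks of a single vertical line at x = 4.
step = [[0] * 5 + [100] * 5 for _ in range(10)]
edges = binary_gradient(step, thresh=10)
line = [[1 if x == 4 else 0 for x in range(10)] for _ in range(10)]
print(hough_peaks(line))  # (rho, theta index, votes) triples
```

Concatenating the (rho, theta) pairs of the top peaks from each block would give a fixed-length feature vector, in the spirit of the description above.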
Submitted 5 December, 2013;
originally announced December 2013.
-
An Approach: Modality Reduction and Face-Sketch Recognition
Authors:
Sourav Pramanik,
Dr. Debotosh Bhattacharjee
Abstract:
Recognizing face sketches against a face photo database is a challenging task because the face photos in the training set and the face sketches in the testing set belong to different modalities. The difference between face photos of two different persons is smaller than the difference between a face photo and a face sketch of the same person. In this paper, to reduce the modality gap between face photos and face sketches, we first transform both into a new representation using a 2D discrete Haar wavelet transform at scale 3 followed by a negative approach. Features are then extracted from the transformed images using Principal Component Analysis (PCA), and SVM and K-NN classifiers are used for classification. The proposed method is experimentally verified on faces captured under good lighting conditions and in a frontal pose. The experiments use 100 male and female face photos as the training set and 100 male and female face sketches as the testing set, collected from the CUHK training and testing cropped photos and cropped sketches.
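The modality-reduction step can be sketched as a three-level 2-D Haar transform whose approximation band is then inverted. This is a minimal sketch under stated assumptions: averaging normalization, keeping only the approximation band, reading the "negative approach" as an 8-bit image negative, and the helper names are all choices made here, not details taken from the paper.

```python
def haar2d_level(img):
    """One 2-D Haar level with averaging normalisation.
    Returns (approximation, col_diff, row_diff, diag_diff) at half resolution."""
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("image sides must be even")
    bands = ([], [], [], [])
    for y in range(0, h, 2):
        rows = ([], [], [], [])
        for x in range(0, w, 2):
            a, b = img[y][x], img[y][x + 1]          # top-left, top-right
            c, d = img[y + 1][x], img[y + 1][x + 1]  # bottom-left, bottom-right
            rows[0].append((a + b + c + d) / 4)      # local average
            rows[1].append((a - b + c - d) / 4)      # left minus right
            rows[2].append((a + b - c - d) / 4)      # top minus bottom
            rows[3].append((a - b - c + d) / 4)      # diagonal difference
        for band, row in zip(bands, rows):
            band.append(row)
    return bands

def haar_approximation(img, scale=3):
    """Apply `scale` Haar levels, keeping only the approximation band each time."""
    for _ in range(scale):
        img = haar2d_level(img)[0]
    return img

def negative(img, max_value=255):
    """Image negative, assuming 8-bit intensities."""
    return [[max_value - v for v in row] for row in img]

# Usage: an 8x8 image shrinks to 1x1 after three levels, then is inverted.
photo = [[(3 * x + 5 * y) % 256 for x in range(8)] for y in range(8)]
print(negative(haar_approximation(photo, scale=3)))
```

The image sides must be divisible by 2 to the power of the scale; PCA features would then be computed on the transformed images.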
Submitted 5 December, 2013;
originally announced December 2013.
-
A Face Recognition approach based on entropy estimate of the nonlinear DCT features in the Logarithm Domain together with Kernel Entropy Component Analysis
Authors:
Arindam Kar,
Debotosh Bhattacharjee,
Dipak Kumar Basu,
Mita Nasipuri,
Mahantapas Kundu
Abstract:
This paper exploits the feature extraction capabilities of the discrete cosine transform (DCT) together with an illumination normalization approach in the logarithm domain that increases robustness to variations in facial geometry and illumination. Second, in the same domain, entropy measures are applied to the DCT coefficients so that the maximum-entropy-preserving pixels can be extracted as the feature vector. Thus, the informative features of a face can be extracted in a low-dimensional space. Finally, kernel entropy component analysis (KECA), extended with arc cosine kernels, is applied to the extracted DCT coefficients that contribute most to the entropy estimate, to obtain only those real kernel ECA eigenvectors that are associated with eigenvalues having a high positive entropy contribution. The resulting system was successfully tested on real image sequences and, as validated by experiments on the FERET, AR, FRAV2D, and ORL face databases, is robust to significant partial occlusion and illumination changes. In terms of specificity and sensitivity, the best results are achieved when the Rényi entropy is applied to the DCT coefficients. Extensive experimental comparisons demonstrate the superiority of the proposed approach with respect to recognition accuracy. Moreover, the proposed approach is very simple, computationally fast, and can be implemented in any real-time face recognition system.
Submitted 5 December, 2013;
originally announced December 2013.
-
A Gabor block based Kernel Discriminative Common Vector (KDCV) approach using cosine kernels for Human Face Recognition
Authors:
Arindam Kar,
Debotosh Bhattacharjee,
Dipak Kumar Basu,
Mita Nasipuri,
Mahantapas Kundu
Abstract:
In this paper, a nonlinear Gabor Wavelet Transform (GWT) discriminant feature extraction approach for enhanced face recognition is proposed. First, the low-energized blocks are extracted from the Gabor wavelet transformed images. Second, the nonlinear discriminating features are analyzed and extracted from the selected low-energized blocks by the generalized Kernel Discriminative Common Vector (KDCV) method. The KDCV method is extended to include a cosine kernel function in the discriminating method. KDCV with cosine kernels is then applied to the extracted low-energized discriminating feature vectors to obtain the real component of a complex quantity for face recognition. To derive positive kernel discriminative vectors, we use only those kernel discriminative eigenvectors that are associated with non-zero eigenvalues. The feasibility of the low-energized Gabor block based generalized KDCV method with cosine kernel function models has been successfully tested for image classification using the L1 and L2 distance measures and the cosine similarity measure on both frontal and pose-angled face recognition. Experimental results on the FRAV2D and FERET databases demonstrate the effectiveness of this new approach.
Submitted 5 December, 2013;
originally announced December 2013.
-
An adaptive block based integrated LDP,GLCM,and Morphological features for Face Recognition
Authors:
Arindam Kar,
Debotosh Bhattacharjee,
Dipak Kumar Basu,
Mita Nasipuri,
Mahantapas Kundu
Abstract:
This paper proposes a technique for automatic face recognition using integrated multiple feature sets extracted from the significant blocks of a gradient image. We discuss the use of novel morphological, local directional pattern (LDP), and gray-level co-occurrence matrix (GLCM) based feature extraction techniques to recognize human faces. First, the new morphological features, i.e., features based on the number of runs of pixels in four directions (N, NE, E, NW), are extracted from the significant blocks of the gradient image, together with GLCM-based statistical features and LDP features, which are less sensitive to noise and non-monotonic illumination changes. These features are then concatenated to take full advantage of the three approaches. The basis of the work is the extraction of significant blocks from the absolute gradient image, and hence from the original image, to retain pertinent information while reducing dimensionality. The efficiency of our method is demonstrated by experiments on 1100 images from the FRAV2D face database and 2200 images from the FERET database, where the images vary in pose, expression, illumination, and scale, and on 400 images from the ORL face database, where the images vary slightly in pose. Our method achieves 90.3%, 93%, and 98.75% recognition accuracy on the FRAV2D, FERET, and ORL databases, respectively.
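Two of the three feature families can be sketched briefly. The block below builds a gray-level co-occurrence matrix for one pixel offset, derives three common Haralick-style statistics from it, and counts runs of foreground pixels along rows (the E direction) in a binary block. Which GLCM statistics the paper uses, the exact run definition, and the helper names are assumptions; LDP features are omitted.

```python
def glcm(img, dx, dy, levels):
    """m[i][j] counts pixel pairs with grey level i and grey level j at offset (dx, dy)."""
    m = [[0] * levels for _ in range(levels)]
    h, w = len(img), len(img[0])
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                m[img[y][x]][img[y2][x2]] += 1
    return m

def glcm_features(m):
    """Contrast, energy (angular second moment) and homogeneity of a normalised GLCM."""
    total = sum(map(sum, m))
    contrast = energy = homogeneity = 0.0
    for i, row in enumerate(m):
        for j, count in enumerate(row):
            p = count / total
            contrast += p * (i - j) ** 2
            energy += p * p
            homogeneity += p / (1 + (i - j) ** 2)
    return {"contrast": contrast, "energy": energy, "homogeneity": homogeneity}

def count_runs(binary_rows):
    """Number of maximal runs of foreground (1) pixels scanned along rows (E direction)."""
    runs = 0
    for row in binary_rows:
        prev = 0
        for v in row:
            if v and not prev:
                runs += 1
            prev = v
    return runs

# Usage on a tiny 4-level image, offset (dx, dy) = (1, 0), i.e. the east neighbour.
block = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 2, 2, 2], [2, 2, 3, 3]]
print(glcm_features(glcm(block, 1, 0, levels=4)))
```

Run counts for the N, NE, and NW directions would follow by scanning columns and diagonals instead of rows.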
Submitted 5 December, 2013;
originally announced December 2013.
-
Geometric Feature Based Face-Sketch Recognition
Authors:
Sourav Pramanik,
Debotosh Bhattacharjee
Abstract:
This paper presents a novel face-sketch recognition approach based on facial feature extraction. Because photos and sketches belong to two different modalities and are therefore difficult to match directly, we concentrate on a set of geometric facial features, such as the eyes, nose, eyebrows, and lips, and their length-to-width ratios. The system first extracts the facial components from the training images, then computes ratios of length, width, area, etc., and stores them as a feature vector for each image. The mean feature vector is then computed and subtracted from each feature vector to center the feature vectors. Next, the feature vector for an incoming probe face-sketch is computed in the same fashion, and a K-NN classifier is used to recognize it. Experiments verify that the proposed method is robust for faces in a frontal pose, with normal lighting, a neutral expression, and no occlusions. The experiments use 80 male and female face images from different face databases. The method has useful applications in both law enforcement and digital entertainment.
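The centering and matching steps can be sketched as follows, assuming each image is already reduced to a small vector of geometric ratios (the two-element vectors in the usage example are invented for illustration) and that K-NN uses Euclidean distance with majority voting. The function names are hypothetical.

```python
import math
from collections import Counter

def center(vectors):
    """Subtract the per-dimension mean of the gallery vectors; return (centred, mean)."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(d)]
    return [[v[i] - mean[i] for i in range(d)] for v in vectors], mean

def knn_predict(train, labels, mean, probe, k=1):
    """Centre the probe with the gallery mean, then majority-vote over the k nearest."""
    p = [x - m for x, m in zip(probe, mean)]
    dists = sorted((math.dist(t, p), lab) for t, lab in zip(train, labels))
    votes = Counter(lab for _, lab in dists[:k])
    return votes.most_common(1)[0][0]

# Usage with made-up ratio features (e.g. eye width/height, nose length/width).
gallery = [[1.0, 0.5], [1.2, 0.6], [2.0, 1.5]]
labels = ["A", "A", "B"]
centred, mean = center(gallery)
print(knn_predict(centred, labels, mean, [1.9, 1.4], k=1))
```

Subtracting the same mean from gallery and probe vectors leaves Euclidean distances unchanged, so centering matters mainly if the features are later rescaled or projected.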
Submitted 5 December, 2013;
originally announced December 2013.
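The centering and K-NN steps described above can be sketched as follows, assuming each sketch is already reduced to a short vector of geometric ratios (the example values are hypothetical, e.g. eye width/length). The probe is centered with the training mean, and the label is a majority vote among the k nearest training vectors under Euclidean distance; the function names are illustrative.

```python
from collections import Counter
from math import dist
from typing import List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Component-wise mean of the training feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def center(v: Vector, mean: Vector) -> Vector:
    """Subtract the mean vector from a feature vector."""
    return [a - m for a, m in zip(v, mean)]


def knn_classify(train: Sequence[Vector], labels: Sequence[str],
                 probe: Vector, k: int = 1) -> str:
    """Label a probe sketch by majority vote among its k nearest training vectors."""
    mu = mean_vector(train)
    centered = [center(v, mu) for v in train]
    q = center(probe, mu)  # the probe is centered with the *training* mean
    nearest = sorted(range(len(train)), key=lambda i: dist(centered[i], q))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]
```

With Euclidean distance, subtracting the same mean from every vector does not change which neighbors are nearest; centering starts to matter when the distance depends on the origin, for example cosine similarity.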
-
Multi-Sensor Image Fusion Based on Moment Calculation
Authors:
Sourav Pramanik,
Debotosh Bhattacharjee
Abstract:
This paper proposes an image fusion method based on salient features. We concentrate on the salient features of the images in order to preserve all relevant information contained in the input images, while enhancing the contrast of the fused image and suppressing noise as much as possible. First, a mask is applied to the two input images to preserve the high-frequency information along with some low-frequency information while suppressing noise. Next, to identify salient features in the source images, a local moment is computed in the neighborhood of each coefficient. Finally, a decision map based on the local moments is generated to obtain the fused image. To verify the proposed algorithm, we tested it on 120 sensor image pairs collected from the Manchester University (UK) database. The experimental results show that the proposed method produces superior fused images in terms of several quantitative fusion evaluation indices.
Submitted 5 December, 2013;
originally announced December 2013.
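The decision-map step above can be sketched as a pixel-wise selection rule. This minimal version assumes the "local moment" is the second central moment (local variance) in a square window and applies it directly to pixel intensities of two registered images; the paper computes it around coefficients and also applies a mask first, neither of which is reproduced. The names `local_moment` and `fuse` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]


def local_moment(img: Image, r: int, c: int, radius: int = 1) -> float:
    """Second central moment (variance) of the window around (r, c).

    Windows are clipped at the image border."""
    rows, cols = len(img), len(img[0])
    vals = [img[i][j]
            for i in range(max(0, r - radius), min(rows, r + radius + 1))
            for j in range(max(0, c - radius), min(cols, c + radius + 1))]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def fuse(a: Image, b: Image, radius: int = 1) -> Tuple[Image, List[List[int]]]:
    """Take each pixel from the source whose local moment is larger.

    Returns the fused image and the decision map (1 = from a, 0 = from b)."""
    rows, cols = len(a), len(a[0])
    decision = [[1 if local_moment(a, r, c, radius) >= local_moment(b, r, c, radius) else 0
                 for c in range(cols)]
                for r in range(rows)]
    fused = [[a[r][c] if decision[r][c] else b[r][c] for c in range(cols)]
             for r in range(rows)]
    return fused, decision
```

Ties go to the first image; a real system would typically also smooth the decision map to avoid isolated switches.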
-
Medical Aid for Automatic Detection of Malaria
Authors:
Pramit Ghosh,
Debotosh Bhattacharjee,
Mita Nasipuri,
Dipak Kumar Basu
Abstract:
The analysis and counting of blood cells in a microscope image can provide useful information about the health of a person. In particular, morphological analysis of red blood cell deformations can effectively detect important diseases such as malaria. Blood images obtained by a microscope coupled with a digital camera can be analyzed by computer for diagnosis, and they can be transmitted to clinical centers more easily than liquid blood samples. An automatic system that analyzes microscopic blood images for the presence of Plasmodium can greatly help pathologists and doctors, who typically inspect blood films manually. Unfortunately, manual analysis by human experts is slow and not yet standardized, since it depends on the operators' skill and fatigue. This paper shows how Plasmodium can be identified in a blood film effectively and accurately. In particular, it presents how to enhance the microscopic image, filter out unnecessary segments, apply threshold-based segmentation, and recognize the presence of Plasmodium. The proposed system can be deployed in remote areas as a supporting aid for telemedicine technology, and only basic training is needed to operate it. The system achieved more than 98% accuracy on the samples collected for testing.
Submitted 3 December, 2013;
originally announced December 2013.
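The abstract mentions threshold-based segmentation without naming the thresholding rule. As a stand-in, the sketch below uses Otsu's method, which picks the gray level that maximizes the between-class variance of the two resulting classes. It assumes 8-bit grayscale pixels and, for the `segment` helper, that stained objects are darker than the background; both are assumptions, and neither function is the authors' code.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int], levels: int = 256) -> int:
    """Return t maximizing between-class variance for classes <= t and > t."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w_low, sum_low = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w_low += hist[t]
        sum_low += t * hist[t]
        w_high = total - w_low
        if w_low == 0:
            continue  # no pixels in the low class yet
        if w_high == 0:
            break  # every pixel is in the low class; nothing left to split
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        # Proportional to the between-class variance (total**2 factor omitted).
        between = w_low * w_high * (mean_low - mean_high) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def segment(pixels: Sequence[int], t: int) -> List[bool]:
    """Mark pixels at or below the threshold (assumed: darker stained regions)."""
    return [p <= t for p in pixels]
```

Which side of the threshold counts as foreground depends on the stain and on how the image was enhanced beforehand.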
-
Automatic White Blood Cell Measuring Aid for Medical Diagnosis
Authors:
Pramit Ghosh,
Debotosh Bhattacharjee,
Mita Nasipuri,
Dipak Kumar Basu
Abstract:
Blood-related invasive pathological investigations play a major role in the diagnosis of diseases. However, India and other developing countries lack sufficient pathological infrastructure for medical diagnosis. Moreover, most remote places in those countries have neither pathologists nor physicians. Telemedicine partially addresses the shortage of physicians, but pathological investigation infrastructure cannot be integrated with telemedicine technology. The objective of this work is to automate the blood-related pathological investigation process. In this work, the detection of different types of white blood cells has been automated. The system can be deployed in remote areas as a supporting aid for telemedicine technology, and only a high school education is needed to operate it. The proposed system achieved 97.33% accuracy on the samples collected for testing.
Submitted 3 December, 2013;
originally announced December 2013.
-
A novel approach to nose-tip and eye corners detection using H-K Curvature Analysis in case of 3D images
Authors:
Parama Bagchi,
Debotosh Bhattacharjee,
Mita Nasipuri,
Dipak Kumar Basu
Abstract:
In this paper, we present a novel method that uses an HK curvature-based approach for three-dimensional (3D) face detection in different poses (rotations about the X, Y, and Z axes). Salient face features, such as the eyes and nose, are detected through an analysis of the curvature of the entire facial surface. All experiments were performed on the FRAV3D database. Applied to the 3D facial surface, the proposed algorithm gave considerably good results: on 752 3D face images, our method detected the eye corners in 543 images, an eye-corner detection rate of 72.20%, and the nose tip in 743 images, a nose-tip localization rate of 98.80%.
Submitted 18 September, 2013;
originally announced September 2013.
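HK curvature analysis labels each surface point by the signs of its mean curvature H and Gaussian curvature K. The sketch below assumes the face is a depth grid z(x, y) with unit spacing, estimates H and K with central finite differences, and applies the usual sign table. Under this sign convention a local maximum of z (a peak) has H < 0 and K > 0; many HK-based methods look for the nose tip among peaks and the inner eye corners among pits, but whether this paper uses exactly that rule, and the threshold `eps`, are assumptions.

```python
from typing import List, Tuple

Grid = List[List[float]]


def hk_at(z: Grid, r: int, c: int, h: float = 1.0) -> Tuple[float, float]:
    """Mean (H) and Gaussian (K) curvature of the graph z(x, y) at an interior
    grid point, using central finite differences (x = column, y = row)."""
    fx = (z[r][c + 1] - z[r][c - 1]) / (2 * h)
    fy = (z[r + 1][c] - z[r - 1][c]) / (2 * h)
    fxx = (z[r][c + 1] - 2 * z[r][c] + z[r][c - 1]) / h ** 2
    fyy = (z[r + 1][c] - 2 * z[r][c] + z[r - 1][c]) / h ** 2
    fxy = (z[r + 1][c + 1] - z[r + 1][c - 1]
           - z[r - 1][c + 1] + z[r - 1][c - 1]) / (4 * h * h)
    g = 1 + fx * fx + fy * fy
    K = (fxx * fyy - fxy * fxy) / g ** 2
    H = ((1 + fx * fx) * fyy - 2 * fx * fy * fxy + (1 + fy * fy) * fxx) / (2 * g ** 1.5)
    return H, K


def hk_label(H: float, K: float, eps: float = 1e-6) -> str:
    """Surface type from the signs of H and K (peak: H < 0, K > 0 here)."""
    if K > eps:
        return "peak" if H < 0 else "pit"
    if K < -eps:
        return "saddle"
    if H < -eps:
        return "ridge"
    if H > eps:
        return "valley"
    return "flat"
```

If the depth axis points away from the viewer instead of toward it, the sign of H flips and peaks and pits swap labels.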
-
Detection of pose orientation across single and multiple axes in case of 3D face images
Authors:
Parama Bagchi,
Debotosh Bhattacharjee,
Mita Nasipuri,
Dipak Kumar Basu
Abstract:
In this paper, we propose a new approach that takes as input a 3D face image rotated about the X, Y, or Z axis, or about both the Y and X axes, and outputs its pose, i.e., it determines whether the face is rotated about the X, Y, or Z axis or about multiple axes, with rotation angles of up to 42 degrees. All experiments were performed on the FRAV3D and GAVADB databases and on the Bosphorus database, which has two images of each individual rotated across multiple axes. Applied to 848 3D faces from FRAV3D, the algorithm correctly recognized the pose of 566 faces, a correct identification rate of 67%. On 420 images from the GAVADB database, the pose was correctly identified for 336 images (80%), and on 560 images from the Bosphorus database, for 448 images (80%).
Submitted 18 September, 2013;
originally announced September 2013.
-
A novel approach for nose tip detection using smoothing by weighted median filtering applied to 3D face images in variant poses
Authors:
Parama Bagchi,
Debotosh Bhattacharjee,
Mita Nasipuri,
Dipak Kumar Basu
Abstract:
This paper applies smoothing to 3D face images followed by feature detection, namely nose-tip detection. The present method uses a weighted mesh median filtering technique for smoothing. In this technique, we build the neighborhood surrounding a particular point of the 3D face and replace that point with the weighted value of the surrounding points. Experimental results show that applying this smoothing to the 3D face images yields a considerable improvement over the algorithm without smoothing. We use the maximum intensity algorithm to detect the nose tip, and this method correctly detects the nose tip in any pose, i.e., for rotations about the X, Y, and Z axes. The present technique worked successfully on 535 of 542 3D face images, compared with 521 of 542 for the method without smoothing. This corresponds to a 98.70% performance rate versus 96.12% for the algorithm without smoothing. All experiments were performed on the FRAV3D database.
Submitted 18 September, 2013;
originally announced September 2013.
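The two steps above, weighted median smoothing and maximum-intensity nose-tip detection, can be sketched on a range image. This minimal version assumes a regular depth grid rather than a mesh, a 3×3 window in which the center point has weight 3 and its neighbors weight 1 (hypothetical weights), and that larger depth values are closer to the sensor, so the nose tip is the maximum.

```python
from typing import List, Sequence, Tuple

Grid = List[List[float]]


def weighted_median(values: Sequence[float], weights: Sequence[float]) -> float:
    """Smallest value whose cumulative weight reaches half the total weight."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2
    acc = 0.0
    for v, w in pairs:
        acc += w
        if acc >= half:
            return v
    return pairs[-1][0]  # not reached for positive weights


def smooth(z: Grid, centre_weight: int = 3) -> Grid:
    """3x3 weighted median filter on a depth grid (border windows are clipped)."""
    rows, cols = len(z), len(z[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            vals, wts = [], []
            for i in range(max(0, r - 1), min(rows, r + 2)):
                for j in range(max(0, c - 1), min(cols, c + 2)):
                    vals.append(z[i][j])
                    wts.append(centre_weight if (i, j) == (r, c) else 1)
            out[r][c] = weighted_median(vals, wts)
    return out


def nose_tip(z: Grid) -> Tuple[int, int]:
    """Maximum-intensity rule: the grid point with the largest depth value
    (the first one in row-major order on ties)."""
    return max(((r, c) for r in range(len(z)) for c in range(len(z[0]))),
               key=lambda rc: z[rc[0]][rc[1]])
```

For example, on a smooth bump with a single spike of sensor noise, the raw maximum lands on the spike, while after smoothing the maximum moves back to the bump. A median filter also flattens the apex slightly, so the detected point can be a neighbor of the true apex.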
-
A method for nose-tip based 3D face registration using maximum intensity algorithm
Authors:
Parama Bagchi,
Debotosh Bhattacharjee,
Mita Nasipuri,
Dipak kr. Basu
Abstract:
In this paper, we present a novel technique for registering 3D images across pose. We consider images that are rotated about the X, Y, and Z axes. We first determine the angle by which the image is rotated with respect to the X, Y, and Z axes and then perform translation on the images. Tested on 472 images from the FRAV3D database, the method correctly registers 358 images, a performance rate of 75.84%.
Submitted 13 September, 2013;
originally announced September 2013.
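The rotate-then-translate registration described above can be sketched for a point cloud, assuming the pose angle about one axis has already been estimated (the estimation step is not reproduced) and that translation moves the nose tip to the origin. Both the nose-tip anchoring and the helper names `rotate` and `register` are illustrative assumptions.

```python
from math import cos, radians, sin
from typing import List, Tuple

Point = Tuple[float, float, float]


def rotate(p: Point, axis: str, theta: float) -> Point:
    """Rotate point p by theta radians about the X, Y or Z axis (right-handed)."""
    x, y, z = p
    c, s = cos(theta), sin(theta)
    if axis == "x":
        return (x, c * y - s * z, s * y + c * z)
    if axis == "y":
        return (c * x + s * z, y, -s * x + c * z)
    if axis == "z":
        return (c * x - s * y, s * x + c * y, z)
    raise ValueError(f"unknown axis {axis!r}")


def register(points: List[Point], nose_index: int, axis: str,
             angle_deg: float) -> List[Point]:
    """Undo an estimated pose rotation, then translate the nose tip to the origin."""
    theta = -radians(angle_deg)  # rotate back by the estimated angle
    upright = [rotate(p, axis, theta) for p in points]
    nx, ny, nz = upright[nose_index]
    return [(x - nx, y - ny, z - nz) for x, y, z in upright]
```

Faces rotated about several axes would need the per-axis rotations undone in the reverse of the order in which they were applied.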
-
A Novel Approach in detecting pose orientation of a 3D face required for face
Authors:
Parama Bagchi,
Debotosh Bhattacharjee,
Mita Nasipuri,
Dipak Kumar Basu
Abstract:
In this paper, we present a novel approach that takes a 3D image as input and outputs its pose, i.e., it determines whether the face is rotated about the X, Y, or Z axis, with rotation angles of up to 40 degrees. All experiments were performed on the FRAV3D database. Applied to 848 3D face images, our method detected the pose correctly for 566 of them, a correct pose detection rate of approximately 67%.
Submitted 13 September, 2013;
originally announced September 2013.
-
Thermal Human face recognition based on Haar wavelet transform and series matching technique
Authors:
Ayan Seal,
Suranjan Ganguly,
Debotosh Bhattacharjee,
Mita Nasipuri,
Dipak kr. Basu
Abstract:
Thermal infrared (IR) images represent the heat patterns emitted by a hot object and do not capture the energy reflected from it. Living and non-living objects emit different amounts of IR energy according to their temperature and characteristics. Humans are homeothermic and can therefore maintain a constant body temperature under different ambient temperatures. Face recognition from thermal IR images should focus on changes in the temperature of facial blood vessels. These temperature changes can be regarded as texture features of the images, and the wavelet transform is a very good tool for analyzing multi-scale, multi-directional texture. The wavelet transform is also used for image dimensionality reduction, removing redundancies while preserving the original features of the image. Because facial images are normally large, the wavelet transform is applied before image similarity is measured. This paper therefore describes an efficient approach to human face recognition from thermal IR images based on the wavelet transform. The system consists of three steps. First, the human thermal IR face image is preprocessed, and only the face region is cropped from the entire image. Second, the Haar wavelet is used to extract the low-frequency band from the cropped face region. Finally, the test images are classified against the training images based on the low-frequency components. The proposed approach is tested on a number of human thermal infrared face images created in our own laboratory and on the Terravic Facial IR Database. Experimental results indicate that the proposed system recognizes thermal infrared face images effectively, with a maximum recognition rate of 95%.
Submitted 4 September, 2013;
originally announced September 2013.
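The Haar low-frequency band used above can be computed by averaging non-overlapping 2×2 blocks, which is the LL band of a one-level 2-D Haar transform up to a constant scale factor (the orthonormal version divides the block sum by 2 instead of 4). The sketch below uses that averaging normalization and, since the abstract does not detail its matching step, substitutes a plain Euclidean nearest-neighbor match; the names and the matching rule are assumptions.

```python
from math import dist
from typing import Dict, List

Image = List[List[float]]


def haar_ll(img: Image) -> Image:
    """One level of the 2-D Haar transform, keeping only the LL band:
    each output pixel is the mean of a 2x2 input block. An odd last
    row or column is dropped."""
    rows = len(img) - len(img) % 2
    cols = len(img[0]) - len(img[0]) % 2
    return [[(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4
             for c in range(0, cols, 2)]
            for r in range(0, rows, 2)]


def ll_features(img: Image, levels: int = 1) -> List[float]:
    """Apply haar_ll `levels` times and flatten the result row by row."""
    for _ in range(levels):
        img = haar_ll(img)
    return [v for row in img for v in row]


def match(probe: Image, gallery: Dict[str, Image], levels: int = 1) -> str:
    """Nearest gallery identity by Euclidean distance between LL features."""
    q = ll_features(probe, levels)
    return min(gallery, key=lambda name: dist(q, ll_features(gallery[name], levels)))
```

Each level shrinks the feature vector by a factor of four, which is the dimensionality reduction the abstract refers to.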
-
Minutiae Based Thermal Human Face Recognition using Label Connected Component Algorithm
Authors:
Ayan Seal,
Suranjan Ganguly,
Debotosh Bhattacharjee,
Mita Nasipuri,
Dipak Kumar Basu
Abstract:
This paper proposes a thermal infrared face recognition system for human identification and verification using blood perfusion data and a back-propagation feed-forward neural network. The system consists of three steps. First, the face region is cropped from the 24-bit colour input images. Second, face features are extracted from the cropped region. In the third step, these features are fed to the back-propagation feed-forward neural network, which carries out classification and recognition. The proposed approaches are tested on a number of human thermal infrared face images created in our own laboratory. Experimental results indicate a high degree of performance.
Submitted 4 September, 2013;
originally announced September 2013.
-
A Comparative Study of Human thermal face recognition based on Haar wavelet transform (HWT) and Local Binary Pattern (LBP)
Authors:
Ayan Seal,
Suranjan Ganguly,
Debotosh Bhattacharjee,
Mita Nasipuri,
Dipak Kumar Basu
Abstract:
Thermal infrared (IR) images capture changes in the temperature distribution of facial muscles and blood vessels. These temperature changes can be regarded as texture features of the images. This paper presents a comparative study of face recognition methods working in the thermal spectrum. In this study, two local-matching methods, based on the Haar wavelet transform and on Local Binary Patterns (LBP), are analyzed. The wavelet transform is a good tool for analyzing multi-scale, multi-directional changes in texture. Local binary patterns (LBP) are a type of feature used for classification in computer vision. First, the human thermal IR face image is preprocessed, and only the face region is cropped from the entire image. Second, two different approaches are used to extract features from the cropped face region. In the first approach, the training and test images are processed with the Haar wavelet transform, and for each face image an LL-band sub-image and a sub-image averaging the LH, HL, and HH bands are created. A total confidence matrix is then formed for each face image by taking a weighted sum of the corresponding pixel values of the LL band and the average band. For LBP feature extraction, each face image in the training and test datasets is divided into 161 sub-images of 8×8 pixels each. LBP features are extracted from each sub-image and concatenated row-wise. PCA is then applied separately to each feature set for dimensionality reduction. Finally, two different classifiers are used to classify the face images: a multi-layer feed-forward neural network and a minimum distance classifier. Experiments were performed on the database created in our own laboratory and on the Terravic Facial IR Database.
Submitted 4 September, 2013;
originally announced September 2013.
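The LBP feature extraction described above can be sketched with the basic 3×3 operator: each neighbor that is at least as bright as the center pixel sets one bit of an 8-bit code. This version splits the image into 8×8 sub-images row by row and concatenates one 256-bin code histogram per sub-image; the neighbor ordering, the use of per-block histograms, and the function names are assumptions, since the abstract does not state which LBP variant was used.

```python
from typing import List

Image = List[List[int]]

# 8 neighbors, clockwise from the top-left; bit i corresponds to NEIGHBOURS[i].
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img: Image, r: int, c: int) -> int:
    """Basic 3x3 LBP: set a bit for each neighbor >= the center pixel."""
    centre = img[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(NEIGHBOURS):
        if img[r + dr][c + dc] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(block: Image) -> List[int]:
    """256-bin histogram of LBP codes over the block's interior pixels."""
    hist = [0] * 256
    for r in range(1, len(block) - 1):
        for c in range(1, len(block[0]) - 1):
            hist[lbp_code(block, r, c)] += 1
    return hist


def lbp_features(img: Image, size: int = 8) -> List[int]:
    """Split the image into size x size sub-images (row by row) and
    concatenate their histograms."""
    feats: List[int] = []
    for r0 in range(0, len(img) - size + 1, size):
        for c0 in range(0, len(img[0]) - size + 1, size):
            block = [row[c0:c0 + size] for row in img[r0:r0 + size]]
            feats.extend(lbp_histogram(block))
    return feats
```

The concatenated vector is long (256 values per sub-image), which is why a reduction step such as PCA usually follows.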
-
Automated Thermal Face recognition based on Minutiae Extraction
Authors:
Ayan Seal,
Suranjan Ganguly,
Debotosh Bhattacharjee,
Mita Nasipuri,
Dipak Kr. Basu
Abstract:
This paper proposes an efficient approach to human face recognition based on minutiae points in thermal face images. The thermogram of the human face is captured by a thermal infrared camera. Image processing methods are used to preprocess the captured thermogram, from which different physiological features based on blood perfusion data are extracted. Blood perfusion data are related to the distribution of blood vessels under the facial skin. In the present work, three different methods are used to obtain the blood perfusion image: bit-plane slicing with the medial axis transform, morphological erosion with the medial axis transform, and Sobel edge operators. The distribution of blood vessels is unique to each person, so a set of minutiae points extracted from the blood perfusion data of a human face should be unique to that face. Two different methods are discussed for extracting minutiae points from blood perfusion data. For feature extraction, the entire face image is partitioned into equal-size blocks, and the number of minutiae points in each block is computed to construct the final feature vector. The length of the feature vector is therefore equal to the total number of blocks. A five-layer feed-forward back-propagation neural network is used as the classifier. A number of experiments were conducted to evaluate the performance of the proposed face recognition methodologies with varying block sizes on the database created in our own laboratory. The first method outperforms the other two, achieving an accuracy of 97.62% with a block size of 16×16 for bit-plane 4.
Submitted 4 September, 2013;
originally announced September 2013.
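Two of the steps above are simple enough to sketch directly: bit-plane slicing and the block-wise minutiae count that forms the feature vector. The sketch assumes the minutiae have already been detected and are given as a binary map (1 = minutia), which skips the medial axis and minutiae extraction stages. It also assumes 0-based bit indexing, so the paper's "bit-plane 4" may correspond to `k = 3` or `k = 4` depending on its convention. Function names are hypothetical.

```python
from typing import List

Grid = List[List[int]]


def bit_plane(img: Grid, k: int) -> Grid:
    """Binary image of bit k of each 8-bit pixel (k = 0 is the least significant bit)."""
    return [[(p >> k) & 1 for p in row] for row in img]


def block_minutiae_counts(minutiae: Grid, block: int) -> List[int]:
    """Count minutiae (1-pixels) in each block x block tile, row by row.
    The feature length equals the number of blocks."""
    rows, cols = len(minutiae), len(minutiae[0])
    if rows % block or cols % block:
        raise ValueError("image size must be a multiple of the block size")
    return [sum(minutiae[r][c]
                for r in range(br, br + block)
                for c in range(bc, bc + block))
            for br in range(0, rows, block)
            for bc in range(0, cols, block)]
```

Smaller blocks give longer, more localized feature vectors; larger blocks tolerate small misalignments between images better.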
-
Minutiae Based Thermal Face Recognition using Blood Perfusion Data
Authors:
Ayan Seal,
Mita Nasipuri,
Debotosh Bhattacharjee,
Dipak Kumar Basu
Abstract:
This paper describes an efficient approach to human face recognition based on blood perfusion data from infrared face images. Blood perfusion data are characterized by the regional blood flow in human tissue and therefore do not depend entirely on ambient temperature. These data have great potential for deriving discriminating facial thermograms, enabling better classification and recognition of face images than optical image data. Blood perfusion data are related to the distribution of blood vessels under the facial skin. The distribution of blood vessels is unique to each person, so a set of minutiae points extracted from the blood perfusion data of a human face should be unique to that face. There may be several such minutiae point sets for a single face, but all of them correspond to that particular face only. The entire face image is partitioned into equal blocks, and the number of minutiae points in each block is computed to construct the final feature vector. The length of the feature vector is therefore equal to the total number of blocks. For classification, a five-layer feed-forward back-propagation neural network is used. A number of experiments were conducted to evaluate the performance of the proposed face recognition system with varying block sizes. Experiments were performed on the database created in our own laboratory. The maximum recognition rate of 91.47% was achieved with a block size of 8×8.
Submitted 4 September, 2013;
originally announced September 2013.
-
A Rough Computing based Performance Evaluation Approach for Educational Institutions
Authors:
Debi Prasanna Acharjya,
Debarati Bhattacharjee
Abstract:
Performance evaluation of organizations, especially educational institutions, is an important area of research that deserves further attention. In this paper, we propose a performance evaluation approach for educational institutions using rough sets on fuzzy approximation spaces with ordering rules, together with information entropy. To measure the performance of educational institutions, we construct an evaluation index system. Rough sets on fuzzy approximation spaces with ordering are applied to explore the evaluation index data at each level. Furthermore, information entropy is used to determine the weighting coefficients of the evaluation indexes. We also identify the most important indexes that influence the weighting coefficients. The proposed approach is validated and shown to be practically viable. Moreover, it can be applied to any organization.
Submitted 3 August, 2013;
originally announced August 2013.
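The entropy-based weighting of evaluation indexes mentioned above can be sketched with the standard entropy weight method: normalize each index column into proportions, compute its entropy, and give more weight to indexes whose scores vary more across institutions. This covers only the weighting step; the rough-set analysis is not reproduced, and whether the paper uses exactly this normalization is an assumption. Scores are assumed non-negative, with at least two institutions and a positive sum for each index.

```python
from math import log
from typing import List, Sequence


def entropy_weights(x: Sequence[Sequence[float]]) -> List[float]:
    """Entropy weight method. x[i][j] >= 0 is the score of institution i on
    index j (m >= 2 institutions). Indexes whose scores vary more across
    institutions have lower entropy and receive larger weights."""
    m, n = len(x), len(x[0])
    k = 1 / log(m)  # scales each entropy into [0, 1]
    divergence = []
    for j in range(n):
        col_sum = sum(row[j] for row in x)
        p = [row[j] / col_sum for row in x]
        e = -k * sum(pi * log(pi) for pi in p if pi > 0)  # 0*log(0) taken as 0
        divergence.append(max(0.0, 1 - e))  # clamp tiny negative rounding errors
    total = sum(divergence)
    if total < 1e-12:  # every index is equally uninformative
        return [1 / n] * n
    return [d / total for d in divergence]
```

An index on which every institution scores the same carries no discriminating information, so its weight comes out as (numerically) zero.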
-
High Performance Human Face Recognition using Independent High Intensity Gabor Wavelet Responses: A Statistical Approach
Authors:
Arindam Kar,
Debotosh Bhattacharjee,
Dipak Kumar Basu,
Mita Nasipuri,
Mahantapas Kundu
Abstract:
In this paper, we present a technique in which high-intensity feature vectors extracted from the Gabor wavelet transformation of frontal face images are combined with Independent Component Analysis (ICA) for enhanced face recognition. First, the high-intensity feature vectors are automatically extracted from the Gabor-transformed images using the local characteristics of each individual face. ICA is then applied to these locally extracted high-intensity feature vectors to obtain the independent high-intensity feature (IHIF) vectors. These IHIF vectors form the basis of the work. Finally, the images are classified using these IHIF vectors, which serve as representatives of the images. The motivation for combining ICA with the high-intensity features of the Gabor wavelet transformation is twofold. On the one hand, the selected peaks of the Gabor-transformed face images exhibit strong spatial locality, scale selectivity, and orientation selectivity. These images therefore yield salient local features that are most suitable for face recognition. On the other hand, because ICA operates on locally salient features from highly informative facial regions, it reduces redundancy and represents independent features explicitly. These independent features are most useful for subsequent facial discrimination and associative recall. The efficiency of the IHIF method is demonstrated by experiments on frontal facial images selected from the FERET, FRAV2D, and ORL databases.
Submitted 17 June, 2011;
originally announced June 2011.
-
Next Level of Data Fusion for Human Face Recognition
Authors:
Mrinal Kanti Bhowmik,
Gautam Majumdar,
Debotosh Bhattacharjee,
Dipak Kumar Basu,
Mita Nasipuri
Abstract:
This paper demonstrates two different fusion techniques at two different levels of a human face recognition process. The first is data fusion at a lower level, and the second is decision fusion toward the end of the recognition process. First, data fusion is applied to visual images and their corresponding thermal images to generate fused images. Data fusion is performed in the wavelet domain after decomposing the images with the Daubechies (db2) wavelet. During data fusion, the approximation coefficients and the three sets of detail coefficients are merged by taking their maximum. Principal Component Analysis (PCA) is then applied to the fused coefficients, and finally two different artificial neural networks, a Multilayer Perceptron (MLP) and a Radial Basis Function (RBF) network, are used separately to classify the images. For decision fusion, the decisions of both classifiers are then combined using a Bayesian formulation. The experiments use the IRIS Thermal/Visible Face Database. Experimental results show that the multiple classifier system with decision fusion outperforms the single classifier system.
Submitted 17 June, 2011;
originally announced June 2011.
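The abstract says the two classifiers' decisions are combined with a Bayesian formulation but does not give it. One standard Bayesian combination is the product rule: if the two classifiers' inputs are conditionally independent given the class, the fused posterior is proportional to the product of the individual posteriors divided by the class prior. The sketch below assumes each network outputs a class posterior (raw network outputs may need calibration first) and defaults to equal priors; `bayes_fuse` and the class names are hypothetical.

```python
from typing import Dict, Optional

Posterior = Dict[str, float]


def bayes_fuse(p_a: Posterior, p_b: Posterior,
               prior: Optional[Posterior] = None) -> Posterior:
    """Combine two classifiers' class posteriors, assuming their inputs are
    conditionally independent given the class:
        P(c | a, b) is proportional to P(c | a) * P(c | b) / P(c)."""
    classes = list(p_a)
    if prior is None:
        prior = {c: 1 / len(classes) for c in classes}  # equal priors
    scores = {c: p_a[c] * p_b[c] / prior[c] for c in classes}
    z = sum(scores.values())  # normalize so the fused posterior sums to 1
    return {c: s / z for c, s in scores.items()}
```

For example, if the MLP gives {"s1": 0.6, "s2": 0.4} and the RBF network gives {"s1": 0.3, "s2": 0.7}, the RBF network's stronger preference wins and the fused decision is "s2", with a fused posterior of about 0.61.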