
A Timely Survey on Vision Transformer for Deepfake Detection

Authors: Zhikan Wang, Zhongyao Cheng, Jiajie Xiong, Xun Xu, Tianrui Li, Bharadwaj Veeravalli, Xulei Yang

Abstract: In recent years, the rapid advancement of deepfake technology has revolutionized content creation, lowering forgery costs while elevating quality. However, this progress brings forth pressing concerns such as infringements on individual rights, national security threats, and risks to public safety. To counter these challenges, various detection methodologies have emerged, with Vision Transformer (ViT)-based approaches showcasing superior performance in generality and efficiency. This survey presents a timely overview of ViT-based deepfake detection models, categorized into standalone, sequential, and parallel architectures. Furthermore, it succinctly delineates the structure and characteristics of each model. By analyzing existing research and addressing future directions, this survey aims to equip researchers with a nuanced understanding of ViT's pivotal role in deepfake detection, serving as a valuable reference for both academic and practical pursuits in this domain.

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2403.19278

CAT: Exploiting Inter-Class Dynamics for Domain Adaptive Object Detection

Authors: Mikhail Kennerley, Jian-Gang Wang, Bharadwaj Veeravalli, Robby T. Tan

Abstract: Domain adaptive object detection aims to adapt detection models to domains where annotated data is unavailable. Existing methods have been proposed to address the domain gap using the semi-supervised student-teacher framework. However, a fundamental issue arises from the class imbalance in the labelled training set, which can result in inaccurate pseudo-labels. The relationship between classes, especially where one class is a majority and the other a minority, has a large impact on class bias. We propose Class-Aware Teacher (CAT) to address the class bias issue in the domain adaptation setting. In our work, we approximate the class relationships with our Inter-Class Relation module (ICRm) and exploit it to reduce the bias within the model. In this way, we are able to apply augmentations to highly related classes, both inter- and intra-domain, to boost the performance of minority classes while having minimal impact on majority classes. We further reduce the bias by applying a class-relation weight to our classification loss. Experiments conducted on various datasets and ablation studies show that our method is able to address the class bias in the domain adaptation setting. On the Cityscapes to Foggy Cityscapes dataset, we attained a 52.5 mAP, a substantial improvement over the 51.2 mAP achieved by the state-of-the-art method.

Submitted 28 March, 2024; originally announced March 2024.

Comments: Accepted into CVPR 2024
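The CAT entry above reduces class bias partly by weighting the classification loss per class. The abstract does not give how the ICRm-derived weights are computed, so the Python sketch below swaps in plain inverse class frequency as a clearly hypothetical stand-in; it only illustrates how a per-class weight rescales a cross-entropy term, not the authors' method.

```python
import math


def inverse_frequency_weights(class_counts):
    """Hypothetical weighting rule: w_c proportional to 1 / count_c, rescaled to mean 1.

    Stand-in for CAT's relation-derived weights, which the abstract does not specify.
    """
    inverse = [1.0 / count for count in class_counts]
    scale = len(inverse) / sum(inverse)
    return [w * scale for w in inverse]


def weighted_cross_entropy(logits, target, class_weights):
    """Cross-entropy of one sample, scaled by the weight of its ground-truth class."""
    # Numerically stable log-softmax: subtract the largest logit before exponentiating.
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in logits))
    log_prob = logits[target] - log_sum_exp
    return -class_weights[target] * log_prob


# A 900:100 majority/minority split gives weights of roughly [0.2, 1.8].
weights = inverse_frequency_weights([900, 100])
majority_loss = weighted_cross_entropy([0.0, 0.0], 0, weights)
minority_loss = weighted_cross_entropy([0.0, 0.0], 1, weights)
```

With identical logits, the minority-class sample contributes nine times the loss of the majority-class sample, which is the intended effect of any class-aware weight.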

arXiv:2401.02764

Fus-MAE: A cross-attention-based data fusion approach for Masked Autoencoders in remote sensing

Authors: Hugo Chan-To-Hing, Bharadwaj Veeravalli

Abstract: Self-supervised frameworks for representation learning have recently stirred up interest among the remote sensing community, given their potential to mitigate the high labeling costs associated with curating large satellite image datasets. In the realm of multimodal data fusion, while the often-used contrastive learning methods can help bridge the domain gap between different sensor types, they rely on data augmentation techniques that require expertise and careful design, especially for multispectral remote sensing data. A possible but rather scarcely studied way to circumvent these limitations is to use a pretraining strategy based on masked image modelling. In this paper, we introduce Fus-MAE, a self-supervised learning framework based on masked autoencoders that uses cross-attention to perform early and feature-level data fusion between synthetic aperture radar (SAR) and multispectral optical data, two modalities with a significant domain gap. Our empirical findings demonstrate that Fus-MAE can effectively compete with contrastive learning strategies tailored for SAR-optical data fusion and outperforms other masked-autoencoder frameworks trained on a larger corpus.

Submitted 5 January, 2024; originally announced January 2024.
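For readers unfamiliar with the two ingredients named in the Fus-MAE entry, here is a minimal pure-Python sketch of MAE-style random patch masking and single-head cross-attention between two token sets. The 0.75 masking ratio is borrowed from the original MAE setup as an assumption, the attention has no learned projections, and how Fus-MAE actually wires cross-attention for early and feature-level fusion is described in the paper, not here.

```python
import math
import random


def random_patch_mask(num_patches, mask_ratio, seed=None):
    """MAE-style masking: hide a random subset of patch indices from the encoder."""
    rng = random.Random(seed)
    num_masked = int(num_patches * mask_ratio)
    order = list(range(num_patches))
    rng.shuffle(order)  # random permutation of patch positions
    return sorted(order[num_masked:]), sorted(order[:num_masked])  # (visible, masked)


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention without learned projections.

    Queries come from one modality (e.g. SAR tokens); keys and values come from
    the other (e.g. optical tokens), so each output row mixes information across modalities.
    """
    d = len(queries[0])
    fused = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]  # numerically stable softmax
        total = sum(exps)
        attn = [e / total for e in exps]
        fused.append([sum(a * v[j] for a, v in zip(attn, values))
                      for j in range(len(values[0]))])
    return fused


visible, masked = random_patch_mask(196, 0.75, seed=0)  # 14 x 14 patch grid
```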

arXiv:2303.13853

2PCNet: Two-Phase Consistency Training for Day-to-Night Unsupervised Domain Adaptive Object Detection

Authors: Mikhail Kennerley, Jian-Gang Wang, Bharadwaj Veeravalli, Robby T. Tan

Abstract: Object detection at night is a challenging problem due to the absence of night image annotations. Despite several domain adaptation methods, achieving high-precision results remains an issue. False-positive error propagation is still observed in methods using the well-established student-teacher framework, particularly for small-scale and low-light objects. This paper proposes a two-phase consistency unsupervised domain adaptation network, 2PCNet, to address these issues. The network employs high-confidence bounding-box predictions from the teacher in the first phase and appends them to the student's region proposals for the teacher to re-evaluate in the second phase, resulting in a combination of high- and low-confidence pseudo-labels. The night images and pseudo-labels are scaled down before being used as input to the student, providing stronger small-scale pseudo-labels. To address errors that arise from low-light regions and other night-related attributes in images, we propose a night-specific augmentation pipeline called NightAug. This pipeline involves applying random augmentations, such as glare, blur, and noise, to daytime images. Experiments on publicly available datasets demonstrate that our method outperforms state-of-the-art methods by 20% and also surpasses supervised models trained directly on the target data.

Submitted 24 March, 2023; originally announced March 2023.

Comments: Accepted into CVPR'23
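The NightAug pipeline in the 2PCNet entry applies random glare, blur, and noise to daytime images. A minimal sketch on a grayscale image stored as nested lists follows; the 3x3 blur kernel, noise level, glare radius and strength, application probability, and ordering are all assumptions for illustration rather than the authors' settings.

```python
import math
import random

# A grayscale image is a list of rows; each pixel is a float in [0, 1].


def _clamp(v):
    return min(1.0, max(0.0, v))


def add_noise(img, rng, sigma=0.05):
    """Additive Gaussian sensor noise, clipped back to [0, 1]."""
    return [[_clamp(p + rng.gauss(0.0, sigma)) for p in row] for row in img]


def box_blur(img):
    """3x3 mean filter; edge pixels average over the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        out_row = []
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out_row.append(sum(window) / len(window))
        out.append(out_row)
    return out


def add_glare(img, cy, cx, radius, strength=0.8):
    """Brighten a disc around (cy, cx) with linear fall-off, like a headlight or street lamp."""
    return [[_clamp(p + strength * max(0.0, 1.0 - math.hypot(y - cy, x - cx) / radius))
             for x, p in enumerate(row)]
            for y, row in enumerate(img)]


def night_aug(img, rng, prob=0.5):
    """Apply glare, blur, and noise, each independently with probability `prob`."""
    h, w = len(img), len(img[0])
    if rng.random() < prob:
        img = add_glare(img, rng.randrange(h), rng.randrange(w), radius=max(h, w) / 4)
    if rng.random() < prob:
        img = box_blur(img)
    if rng.random() < prob:
        img = add_noise(img, rng)
    return img
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible for a given seed.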

arXiv:2303.08526

HeRAFC: Heuristic Resource Allocation and Optimization in MultiFog-Cloud Environment

Authors: Chinmaya Kumar Dehury, Bharadwaj Veeravalli, Satish Narayana Srirama

Abstract: Fog computing brings computing capacity from a remote cloud environment closer to the user. As a result, users can access services from nearby computing environments, resulting in better quality of service and lower network latency. From the service providers' point of view, this addresses network latency and congestion issues. This is achieved by deploying services in both cloud and fog computing environments, and service providers are responsible for managing the heterogeneous resources available in both. In recent years, resource management strategies have made it possible to efficiently allocate resources from nearby fog nodes and clouds to users' applications. Unfortunately, these existing strategies fail to give the desired result when service providers can allocate resources to users' applications from fog nodes that are multiple hops away from the nearby fog node. The complexity of this resource management problem increases drastically in a MultiFog-Cloud environment. This problem motivates us to revisit it and present a novel Heuristic Resource Allocation and Optimization algorithm in a MultiFog-Cloud (HeRAFC) environment. Taking users' application priority, execution time, and communication latency into account, HeRAFC optimizes resource utilization and minimizes cloud load. The proposed algorithm is evaluated and compared with related algorithms, and the simulation results show the efficiency of HeRAFC over the other algorithms.

Submitted 21 March, 2023; v1 submitted 15 March, 2023; originally announced March 2023.

Comments: Currently under review with Elsevier JPDC journal
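To make the allocation problem in the HeRAFC entry concrete, the sketch below is a hypothetical greedy heuristic, not HeRAFC itself: it serves applications in priority order and places each on the lowest-latency fog node with enough free capacity, falling back to the cloud. It ignores execution time, which HeRAFC also considers, and the `Node`/`App` types and resource units are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    capacity: int       # free resource units (hypothetical unit)
    latency_ms: float   # user-to-node latency; larger for fog nodes several hops away


@dataclass
class App:
    name: str
    priority: int       # higher value = served first
    demand: int         # resource units the application needs


def allocate(apps, fog_nodes, cloud_name="cloud"):
    """Greedy placement: highest-priority apps first, each on the lowest-latency
    fog node (nearby or multi-hop) that still has room; otherwise the cloud."""
    placement = {}
    for app in sorted(apps, key=lambda a: a.priority, reverse=True):
        candidates = [n for n in fog_nodes if n.capacity >= app.demand]
        if candidates:
            best = min(candidates, key=lambda n: n.latency_ms)
            best.capacity -= app.demand  # reserve the resources on that node
            placement[app.name] = best.name
        else:
            placement[app.name] = cloud_name  # cloud load grows only when fog is full
    return placement
```

Sending work to a multi-hop fog node before the cloud is what keeps cloud load low in this toy version; a real scheduler would also weigh execution time and link congestion.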

arXiv:2207.01900

doi 10.1109/ICIP46576.2022.9897494

ACT-Net: Asymmetric Co-Teacher Network for Semi-supervised Memory-efficient Medical Image Segmentation

Authors: Ziyuan Zhao, Andong Zhu, Zeng Zeng, Bharadwaj Veeravalli, Cuntai Guan

Abstract: While deep models have shown promising performance in medical image segmentation, they rely heavily on large amounts of well-annotated data, which are difficult to obtain, especially in clinical practice. On the other hand, high-accuracy deep models usually come with large model sizes, limiting their deployment in real-world scenarios. In this work, we propose a novel asymmetric co-teacher framework, ACT-Net, to alleviate the burden of both expensive annotation and computational cost for semi-supervised knowledge distillation. We advance teacher-student learning with a co-teacher network that facilitates asymmetric knowledge distillation from large models to small ones by alternating student and teacher roles, obtaining tiny but accurate models for clinical deployment. To verify the effectiveness of ACT-Net, we use the ACDC dataset for cardiac substructure segmentation in our experiments. Extensive experimental results demonstrate that ACT-Net outperforms other knowledge distillation methods and achieves lossless segmentation performance with 250x fewer parameters.

Submitted 5 July, 2022; originally announced July 2022.

Journal ref: 2022 IEEE International Conference on Image Processing (ICIP)
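ACT-Net builds on teacher-student knowledge distillation. As background only, a minimal sketch of the standard soft-target distillation loss (Hinton-style KL divergence on temperature-softened outputs), averaged over the pixels of a segmentation map, is shown below. The temperature of 4 is an assumption, and ACT-Net's asymmetric co-teacher scheme with alternating roles is not reproduced.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]  # numerically stable
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """KL(teacher || student) between temperature-softened class distributions,
    scaled by T^2 so gradient magnitudes stay comparable across temperatures."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return kl * temperature ** 2


def pixelwise_distillation(student_map, teacher_map, temperature=4.0):
    """Average the per-pixel loss over a map given as a list of per-pixel logit lists."""
    losses = [distillation_loss(s, t, temperature) for s, t in zip(student_map, teacher_map)]
    return sum(losses) / len(losses)
```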

arXiv:2207.01883

doi 10.1109/ICIP46576.2022.9897591

MMGL: Multi-Scale Multi-View Global-Local Contrastive learning for Semi-supervised Cardiac Image Segmentation

Authors: Ziyuan Zhao, Jinxuan Hu, Zeng Zeng, Xulei Yang, Peisheng Qian, Bharadwaj Veeravalli, Cuntai Guan

Abstract: With large-scale well-labeled datasets, deep learning has shown significant success in medical image segmentation. However, it is challenging to acquire abundant annotations in clinical practice due to extensive expertise requirements and costly labeling efforts. Recently, contrastive learning has shown a strong capacity for visual representation learning on unlabeled data, achieving impressive performance rivaling supervised learning in many domains. In this work, we propose a novel multi-scale multi-view global-local contrastive learning (MMGL) framework to thoroughly explore global and local features from different scales and views for robust contrastive learning performance, thereby improving segmentation performance with limited annotations. Extensive experiments on the MM-WHS dataset demonstrate the effectiveness of the MMGL framework on semi-supervised cardiac image segmentation, outperforming state-of-the-art contrastive learning methods by a large margin.

Submitted 5 July, 2022; originally announced July 2022.

Comments: Accepted by IEEE International Conference on Image Processing (ICIP 2022)

Journal ref: 2022 IEEE International Conference on Image Processing (ICIP)
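MMGL extends contrastive learning across scales, views, and global/local features. The basic building block of most such frameworks is an InfoNCE-style loss, sketched below under the assumptions of cosine similarity and a temperature of 0.1. The abstract does not say how MMGL forms its positive and negative pairs or combines its terms, so this is only the generic loss.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(anchor, positive, negatives, tau=0.1):
    """-log( exp(s_pos / tau) / sum_j exp(s_j / tau) ), with s = cosine similarity.

    The positive pair sits at index 0 of the logits; the loss is small when the
    anchor is much closer to its positive than to every negative.
    """
    logits = [cosine(anchor, positive) / tau] + [cosine(anchor, n) / tau for n in negatives]
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_denominator - logits[0]
```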

arXiv:2205.07028

Object-Aware Self-supervised Multi-Label Learning

Authors: Xu Kaixin, Liu Liyang, Zhao Ziyuan, Zeng Zeng, Bharadwaj Veeravalli

Abstract: Multi-label learning on image data has been widely studied using deep learning models. However, supervised training of deep CNN models often cannot discover sufficiently discriminative features for classification. As a result, numerous self-supervision methods have been proposed to learn more robust image representations. However, most self-supervised approaches focus on single-instance single-label data and fall short on more complex images with multiple objects. Therefore, we propose an Object-Aware Self-Supervision (OASS) method that obtains more fine-grained representations for multi-label learning by dynamically generating auxiliary tasks based on object locations. Furthermore, the robust representation learned by OASS can be leveraged to efficiently generate Class-Specific Instances (CSI) in a proposal-free fashion, to better guide the transfer of multi-label supervision signals to instances. Extensive experiments on the VOC2012 dataset for multi-label classification demonstrate the effectiveness of the proposed method against state-of-the-art counterparts.

Submitted 13 July, 2022; v1 submitted 14 May, 2022; originally announced May 2022.

Comments: Accepted by IEEE International Conference on Image Processing (ICIP 2022)

arXiv:2205.07021

doi 10.1109/EMBC48229.2022.9871734

Self-supervised Assisted Active Learning for Skin Lesion Segmentation

Authors: Ziyuan Zhao, Wenjing Lu, Zeng Zeng, Kaixin Xu, Bharadwaj Veeravalli, Cuntai Guan

Abstract: Label scarcity has been a long-standing issue for biomedical image segmentation, due to high annotation costs and professional requirements. Recently, active learning (AL) strategies, which strive to reduce annotation costs by querying only a small portion of the data for annotation, have gained much traction in medical imaging. However, most existing AL methods have to initialize models with some randomly selected samples, followed by active selection based on various criteria such as uncertainty and diversity. Such random-start initialization inevitably introduces low-value redundant samples and unnecessary annotation costs. To address this issue, we propose a novel self-supervised assisted active learning framework in the cold-start setting, in which the segmentation model is first warmed up with self-supervised learning (SSL), and SSL features are then used for sample selection via latent feature clustering without accessing labels. We assess the proposed method on a skin lesion segmentation task. Extensive experiments demonstrate that our approach achieves promising performance with substantial improvements over existing baselines.

Submitted 14 May, 2022; originally announced May 2022.

Comments: Accepted by the 44th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC 2022)

Journal ref: 2022 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)
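The cold-start selection step in the entry above (cluster SSL features, then pick samples without looking at labels) can be sketched as k-means followed by choosing the sample nearest each centroid. Using k-means with one pick per cluster is an assumption for illustration, since the abstract only says latent feature clustering is used; features are assumed to have been extracted already by the SSL-warmed encoder.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on tuples of floats; returns the k centroids."""
    rng = random.Random(seed)
    centroids = list(rng.sample(points, k))  # initialise from k distinct samples
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids


def cold_start_select(features, budget, seed=0):
    """Return `budget` distinct sample indices to annotate, one per cluster; no labels used."""
    selected = []
    for c in kmeans(features, budget, seed=seed):
        idx = min((i for i in range(len(features)) if i not in selected),
                  key=lambda i: math.dist(features[i], c))
        selected.append(idx)
    return selected
```

Picking the sample closest to each centroid favours representative, non-redundant samples, which is the property the cold-start setting needs.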

arXiv:2203.00497

A predictive analytics approach for stroke prediction using machine learning and neural networks

Authors: Soumyabrata Dev, Hewei Wang, Chidozie Shamrock Nwosu, Nishtha Jain, Bharadwaj Veeravalli, Deepu John

Abstract: The negative impact of stroke on society has led to concerted efforts to improve the management and diagnosis of stroke. With an increased synergy between technology and medical diagnosis, caregivers create opportunities for better patient management by systematically mining and archiving patients' medical records. Therefore, it is vital to study the interdependency of risk factors in patients' health records and understand their relative contribution to stroke prediction. This paper systematically analyzes the various factors in electronic health records for effective stroke prediction. Using various statistical techniques and principal component analysis, we identify the most important factors for stroke prediction. We conclude that age, heart disease, average glucose level, and hypertension are the most important factors for detecting stroke in patients. Furthermore, a perceptron neural network using these four attributes provides the highest accuracy rate and lowest miss rate compared to using all available input features and other benchmarking algorithms. As the dataset is highly imbalanced with respect to the occurrence of stroke, we report our results on a balanced dataset created via sub-sampling techniques.

Submitted 1 March, 2022; originally announced March 2022.

Comments: Published in Healthcare Analytics, 2022
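The stroke entry trains a perceptron on four attributes after balancing the data by sub-sampling. A minimal sketch of both steps follows, assuming random undersampling of the majority class and a single-layer perceptron (the abstract does not state the network's depth). Each row is assumed to hold age, heart disease, average glucose level, and hypertension, scaled to [0, 1].

```python
import random


def undersample(rows, labels, seed=0):
    """Balance a binary dataset by randomly dropping majority-class rows."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    keep = minority + rng.sample(majority, len(minority))
    rng.shuffle(keep)
    return [rows[i] for i in keep], [labels[i] for i in keep]


def predict(w, b, x):
    """Threshold unit: 1 (stroke) if w.x + b > 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_perceptron(rows, labels, epochs=50, lr=0.1):
    """Classic perceptron rule: update weights only on misclassified rows."""
    w, b = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = y - predict(w, b, x)  # -1, 0 or +1
            if err:
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
    return w, b
```

Balancing before training matters here: on a heavily imbalanced set, a classifier that always predicts "no stroke" already scores high accuracy while missing every positive case.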

arXiv:2111.00579

doi 10.1109/TCC.2021.3126677

RRFT: A Rank-Based Resource Aware Fault Tolerant Strategy for Cloud Platforms

Authors: Chinmaya Kumar Dehury, Prasan Kumar Sahoo, Bharadwaj Veeravalli

Abstract: Applications deployed in the cloud to provide services to users comprise a large number of interconnected, dependent cloud components. Multiple identical components are scheduled to run concurrently to handle unexpected failures and provide uninterrupted service to the end user, which introduces a resource overhead problem for the cloud service provider. Furthermore, such resource-intensive fault-tolerant strategies bring extra monetary overhead to the cloud service provider and eventually to cloud users. To address these issues, a novel fault-tolerant strategy based on the significance level of each component is developed. The communication topology among the application components, their historical performance, failure rate, failure impact on other components, dependencies among them, etc., are used to rank the application components and thereby decide the importance of one component over others. Based on the rank, a Markov Decision Process (MDP) model is presented to determine the number of replicas, which varies from one component to another. A rigorous performance evaluation is carried out against similar component-ranking and fault-tolerant strategies, using common, practically useful metrics such as recovery time upon a fault, average number of components needed, and number of parallel components successfully executed. Simulation results demonstrate that the proposed algorithm reduces the required number of virtual and physical machines by approximately 10% and 4.2%, respectively, compared to other similar algorithms.

Submitted 31 October, 2021; originally announced November 2021.

Comments: Accepted in IEEE TCC. The preprint version will be uploaded soon.

arXiv:2010.04351

doi 10.1145/3477145.3477157

Connection Pruning for Deep Spiking Neural Networks with On-Chip Learning

Authors: Thao N. N. Nguyen, Bharadwaj Veeravalli, Xuanyao Fong

Abstract: Long training time hinders the potential of deep, large-scale Spiking Neural Networks (SNNs) with on-chip learning capability to be realized on embedded systems hardware. Our work proposes a novel connection pruning approach that can be applied during on-chip Spike Timing Dependent Plasticity (STDP)-based learning to optimize the learning time and the network connectivity of the deep SNN. We applied our approach to a deep SNN with Time To First Spike (TTFS) coding and successfully achieved a 2.1x speed-up and 64% energy savings in on-chip learning, and reduced network connectivity by 92.83%, without incurring any accuracy loss. Moreover, the connectivity reduction results in a 2.83x speed-up and 78.24% energy savings in inference. Evaluation of the proposed approach on a Field Programmable Gate Array (FPGA) platform revealed that a 0.56% power overhead was needed to implement the pruning algorithm.

Submitted 31 July, 2021; v1 submitted 8 October, 2020; originally announced October 2020.

Comments: 8 pages, 9 figures. This paper has been accepted for publication in the International Conference on Neuromorphic Systems (ICONS) 2021

Journal ref: International Conference on Neuromorphic Systems 2021
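The pruning entry removes synaptic connections during on-chip STDP learning. The abstract does not give the pruning criterion, so the sketch below uses weight-magnitude thresholding on a connection mask as an assumed stand-in, and shows how a connectivity-reduction figure such as 92.83% would be computed from that mask.

```python
def prune_connections(weights, mask, threshold):
    """Return a new mask with every active connection whose |weight| < threshold disabled.

    Pruned connections stay pruned, so calling this repeatedly during learning only
    ever removes connections. The caller is assumed to skip masked synapses in
    later weight updates and spike propagation.
    """
    return [[active and abs(w) >= threshold for w, active in zip(w_row, m_row)]
            for w_row, m_row in zip(weights, mask)]


def connectivity_reduction(mask):
    """Fraction of connections removed, e.g. 0.9283 for a 92.83% reduction."""
    total = sum(len(row) for row in mask)
    active = sum(1 for row in mask for m in row if m)
    return 1.0 - active / total
```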

arXiv:2005.04069

doi 10.1109/EMBC44109.2020.9176677

Multi-Phase Cross-modal Learning for Noninvasive Gene Mutation Prediction in Hepatocellular Carcinoma

Authors: Jiapan Gu, Ziyuan Zhao, Zeng Zeng, Yuzhe Wang, Zhengyiren Qiu, Bharadwaj Veeravalli, Brian Kim Poh Goh, Glenn Kunnath Bonney, Krishnakumar Madhavan, Chan Wan Ying, Lim Kheng Choon, Thng Choon Hua, Pierce KH Chow

Abstract: Hepatocellular carcinoma (HCC) is the most common type of primary liver cancer and the fourth most common cause of cancer-related death worldwide. Understanding the underlying gene mutations in HCC provides great prognostic value for treatment planning and targeted therapy. Radiogenomics has revealed an association between non-invasive imaging features and molecular genomics. However, imaging feature identification is laborious and error-prone. In this paper, we propose an end-to-end deep learning framework for mutation prediction in APOB, COL11A1 and ATRX genes using multiphasic CT scans. Considering intra-tumour heterogeneity (ITH) in HCC, multi-region sampling technology is implemented to generate the dataset for experiments. Experimental results demonstrate the effectiveness of the proposed model.

Submitted 8 May, 2020; originally announced May 2020.

Comments: Accepted version to be published in the 42nd IEEE Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2020, Montreal, Canada

Journal ref: 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)

arXiv:2005.03225

doi 10.1109/EMBC44109.2020.9176662

Deeply Supervised Active Learning for Finger Bones Segmentation

Authors: Ziyuan Zhao, Xiaoyan Yang, Bharadwaj Veeravalli, Zeng Zeng

Abstract: Segmentation is a prerequisite yet challenging task for medical image analysis. In this paper, we introduce a novel deeply supervised active learning approach for finger bones segmentation. The proposed architecture is fine-tuned in an iterative and incremental learning manner. In each step, the deep supervision mechanism guides the learning process of hidden layers and selects samples to be labeled. Extensive experiments demonstrated that our method achieves competitive segmentation results using fewer labeled samples than full annotation.

Submitted 6 May, 2020; originally announced May 2020.

Comments: Accepted version to be published in the 42nd IEEE Annual International Conference of the IEEE Engineering in Medicine and Biology Society, EMBC 2020, Montreal, Canada

Journal ref: 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC)
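In the finger bone entry, the deep supervision mechanism also selects which samples to label. One plausible reading, assumed here rather than taken from the paper, is to score each unlabeled image by the mean pixel-wise entropy of the foreground probabilities from its deeply supervised side outputs, and send the most uncertain images for annotation. The pool format below is hypothetical.

```python
import math


def binary_entropy(p):
    """Entropy (in nats) of a foreground probability; 0 at p = 0 or 1, max at 0.5."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def uncertainty_score(side_outputs):
    """side_outputs: one flattened probability map per supervised head."""
    per_head = [sum(binary_entropy(p) for p in head) / len(head) for head in side_outputs]
    return sum(per_head) / len(per_head)


def select_for_labeling(pool, k):
    """pool: dict mapping image id -> side outputs. Return the k most uncertain ids."""
    ranked = sorted(pool, key=lambda i: uncertainty_score(pool[i]), reverse=True)
    return ranked[:k]
```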

Showing 1–14 of 14 results for author: Veeravalli, B