-
Mitigating Backdoor Attacks using Activation-Guided Model Editing
Authors:
Felix Hsieh,
Huy H. Nguyen,
AprilPyone MaungMaung,
Dmitrii Usynin,
Isao Echizen
Abstract:
Backdoor attacks compromise the integrity and reliability of machine learning models by embedding a hidden trigger during the training process, which can later be activated to cause unintended misbehavior. We propose a novel backdoor mitigation approach via machine unlearning to counter such backdoor attacks. The proposed method uses the model's activations on domain-equivalent unseen data to guide the editing of the model's weights. Unlike previous unlearning-based mitigation methods, ours is computationally inexpensive and achieves state-of-the-art performance while requiring only a handful of unseen samples for unlearning. In addition, we point out that unlearning the backdoor may cause the whole targeted class to be unlearned, and we therefore introduce an additional repair step to preserve the model's utility after editing. Experimental results show that the proposed method is effective in unlearning the backdoor on different datasets and trigger patterns.
Submitted 10 July, 2024;
originally announced July 2024.
-
Language-Driven Closed-Loop Grasping with Model-Predictive Trajectory Replanning
Authors:
Huy Hoang Nguyen,
Minh Nhat Vu,
Florian Beck,
Gerald Ebmer,
Anh Nguyen,
Andreas Kugi
Abstract:
Combining a vision module inside a closed-loop control system for seamless movement of a robot in a manipulation task is challenging due to the inconsistent update rates between the modules used. This task is even more difficult in a dynamic environment, e.g., when objects are moving. This paper presents a modular zero-shot framework for language-driven manipulation of (dynamic) objects through a closed-loop control system with real-time trajectory replanning and online 6D object pose localization. We segment an object within 0.5 s by leveraging a vision language model via language commands. Then, guided by natural language commands, a closed-loop system, including unified pose estimation and tracking and online trajectory planning, is used to continuously track this object and compute the optimal trajectory in real time. Our proposed zero-shot framework provides a smooth trajectory that avoids jerky movements and ensures the robot can grasp a non-stationary object. Experimental results demonstrate the real-time capability of the proposed zero-shot modular framework to accurately and efficiently grasp moving objects, with update rates of up to 30 Hz for the online 6D pose localization module and 10 Hz for the receding-horizon trajectory optimization. These advantages highlight the modular framework's potential applications in robotics and human-robot interaction; see the video at https://www.acin.tuwien.ac.at/en/6e64/.
Submitted 19 June, 2024; v1 submitted 13 June, 2024;
originally announced June 2024.
-
Image-level Regression for Uncertainty-aware Retinal Image Segmentation
Authors:
Trung Dang,
Huy Hoang Nguyen,
Aleksei Tiulpin
Abstract:
Accurate retinal vessel segmentation is a crucial step in the quantitative assessment of retinal vasculature, which is needed for the early detection of retinal diseases and other conditions. Numerous studies have been conducted to tackle the problem of segmenting vessels automatically using a pixel-wise classification approach. The common practice of creating ground truth labels is to categorize pixels as foreground and background. This approach is, however, biased, and it ignores the uncertainty of a human annotator when it comes to annotating, e.g., thin vessels. In this work, we propose a simple and effective method that casts the retinal image segmentation task as an image-level regression. For this purpose, we first introduce a novel Segmentation Annotation Uncertainty-Aware (SAUNA) transform, which adds pixel uncertainty to the ground truth using the pixel's closeness to the annotation boundary and vessel thickness. To train our model with soft labels, we generalize the earlier proposed Jaccard metric loss to arbitrary hypercubes, which is a second contribution of this work. The proposed SAUNA transform and the new theoretical results allow us to directly train a standard U-Net-like architecture at the image level, outperforming all recently published methods. We conduct thorough experiments and compare our method to a diverse set of baselines across 5 retinal image datasets. Our implementation is available at https://github.com/Oulu-IMEDS/SAUNA.
Submitted 27 May, 2024;
originally announced May 2024.
-
SiNGR: Brain Tumor Segmentation via Signed Normalized Geodesic Transform Regression
Authors:
Trung Dang,
Huy Hoang Nguyen,
Aleksei Tiulpin
Abstract:
One of the primary challenges in brain tumor segmentation arises from the uncertainty of voxels close to tumor boundaries. However, the conventional process of generating ground truth segmentation masks fails to treat such uncertainties properly. Those "hard labels" with 0s and 1s conceptually influenced the majority of prior studies on brain image segmentation. As a result, tumor segmentation is often solved through voxel classification. In this work, we instead view this problem as a voxel-level regression, where the ground truth represents a certainty mapping from any pixel based on the distance to the tumor border. We propose a novel ground truth label transformation, which is based on a signed geodesic transform, to capture the uncertainty in brain tumors' vicinity while maintaining a margin between positive and negative samples. We combine this idea with a Focal-like regression L1-loss that enables effective regression learning in high-dimensional output space by appropriately weighting voxels according to their difficulty. We conduct a thorough experimental evaluation to validate the components of our proposed method, compare it to a diverse array of state-of-the-art segmentation models, and show that it is architecture-agnostic. The code of our method is publicly available at https://github.com/Oulu-IMEDS/SiNGR/.
Submitted 27 May, 2024;
originally announced May 2024.
-
Exploring Self-Supervised Vision Transformers for Deepfake Detection: A Comparative Analysis
Authors:
Huy H. Nguyen,
Junichi Yamagishi,
Isao Echizen
Abstract:
This paper investigates the effectiveness of self-supervised pre-trained transformers compared to supervised pre-trained transformers and conventional neural networks (ConvNets) for detecting various types of deepfakes. We focus on their potential for improved generalization, particularly when training data is limited. Despite the notable success of large vision-language models utilizing transformer architectures in various tasks, including zero-shot and few-shot learning, the deepfake detection community has still shown some reluctance to adopt pre-trained vision transformers (ViTs), especially large ones, as feature extractors. One concern is their perceived excessive capacity, which often demands extensive data, and the resulting suboptimal generalization when training or fine-tuning data is small or less diverse. This contrasts poorly with ConvNets, which have already established themselves as robust feature extractors. Additionally, training and optimizing transformers from scratch requires significant computational resources, making this accessible primarily to large companies and hindering broader investigation within the academic community. Recent advancements in using self-supervised learning (SSL) in transformers, such as DINO and its derivatives, have showcased significant adaptability across diverse vision tasks and possess explicit semantic segmentation capabilities. By leveraging DINO for deepfake detection with modest training data and implementing partial fine-tuning, we observe comparable adaptability to the task and the natural explainability of the detection result via the attention mechanism. Moreover, partial fine-tuning of transformers for deepfake detection offers a more resource-efficient alternative, requiring significantly fewer computational resources.
Submitted 1 May, 2024;
originally announced May 2024.
-
CORI: CJKV Benchmark with Romanization Integration -- A step towards Cross-lingual Transfer Beyond Textual Scripts
Authors:
Hoang H. Nguyen,
Chenwei Zhang,
Ye Liu,
Natalie Parde,
Eugene Rohrbaugh,
Philip S. Yu
Abstract:
Naively assuming English as a source language may hinder cross-lingual transfer for many languages by failing to consider the importance of language contact. Some languages are more well-connected than others, and target languages can benefit from transferring from closely related languages; for many languages, the set of closely related languages does not include English. In this work, we study the impact of source language for cross-lingual transfer, demonstrating the importance of selecting source languages that have high contact with the target language. We also construct a novel benchmark dataset for close contact Chinese-Japanese-Korean-Vietnamese (CJKV) languages to further encourage in-depth studies of language contact. To comprehensively capture contact between these languages, we propose to integrate Romanized transcription beyond textual scripts via Contrastive Learning objectives, leading to enhanced cross-lingual representations and effective zero-shot cross-lingual transfer.
Submitted 19 April, 2024;
originally announced April 2024.
-
Stochastic Constrained Decentralized Optimization for Machine Learning with Fewer Data Oracles: a Gradient Sliding Approach
Authors:
Hoang Huy Nguyen,
Yan Li,
Tuo Zhao
Abstract:
In modern decentralized applications, ensuring communication efficiency and privacy for the users is a key challenge. In order to train machine-learning models, the algorithm has to communicate with the data center and sample data for its gradient computation, thus exposing the data and increasing the communication cost. This gives rise to the need for a decentralized optimization algorithm that is communication-efficient and minimizes the number of gradient computations. To this end, we propose the primal-dual sliding with conditional gradient sliding framework, which is communication-efficient and achieves an $\varepsilon$-approximate solution with the optimal gradient complexity of $O(1/\sqrt{\varepsilon}+\sigma^2/\varepsilon^2)$ and $O(\log(1/\varepsilon)+\sigma^2/\varepsilon)$ for the convex and strongly convex settings, respectively, and an LO (Linear Optimization) complexity of $O(1/\varepsilon^2)$ for both settings given a stochastic gradient oracle with variance $\sigma^2$. Compared with the prior work \cite{wai-fw-2017}, our framework relaxes the assumption of the optimal solution being a strict interior point of the feasible set and enjoys wider applicability for large-scale training using a stochastic gradient oracle. We also demonstrate the efficiency of our algorithms with various numerical experiments.
Submitted 3 April, 2024;
originally announced April 2024.
-
Image-Text Out-Of-Context Detection Using Synthetic Multimodal Misinformation
Authors:
Fatma Shalabi,
Huy H. Nguyen,
Hichem Felouat,
Ching-Chun Chang,
Isao Echizen
Abstract:
Misinformation has become a major challenge in the era of increasing digital information, requiring the development of effective detection methods. We have investigated a novel approach to Out-Of-Context detection (OOCD) that uses synthetic data generation. We created a dataset specifically designed for OOCD and developed an efficient detector for accurate classification. Our experimental findings validate the use of synthetic data generation and demonstrate its efficacy in addressing the data limitations associated with OOCD. The dataset and detector should serve as valuable resources for future research and the development of robust misinformation detection systems.
Submitted 29 January, 2024;
originally announced March 2024.
-
Leveraging Chat-Based Large Vision Language Models for Multimodal Out-Of-Context Detection
Authors:
Fatma Shalabi,
Hichem Felouat,
Huy H. Nguyen,
Isao Echizen
Abstract:
Out-of-context (OOC) detection is a challenging task involving identifying images and texts that are irrelevant to the context in which they are presented. Large vision-language models (LVLMs) are effective at various tasks, including image classification and text generation. However, the extent of their proficiency in multimodal OOC detection tasks is unclear. In this paper, we investigate the ability of LVLMs to detect multimodal OOC and show that these models cannot achieve high accuracy on OOC detection tasks without fine-tuning. However, we demonstrate that fine-tuning LVLMs on multimodal OOC datasets can further improve their OOC detection accuracy. To evaluate the performance of LVLMs on OOC detection tasks, we fine-tune MiniGPT-4 on the NewsCLIPpings dataset, a large dataset of multimodal OOC. Our results show that fine-tuning MiniGPT-4 on the NewsCLIPpings dataset significantly improves the OOC detection accuracy in this dataset. This suggests that fine-tuning can significantly improve the performance of LVLMs on OOC detection tasks.
Submitted 22 January, 2024;
originally announced March 2024.
-
Fine-Tuning Text-To-Image Diffusion Models for Class-Wise Spurious Feature Generation
Authors:
AprilPyone MaungMaung,
Huy H. Nguyen,
Hitoshi Kiya,
Isao Echizen
Abstract:
We propose a method for generating spurious features by leveraging large-scale text-to-image diffusion models. Although the previous work detects spurious features in a large-scale dataset like ImageNet and introduces Spurious ImageNet, we found that not all spurious images are spurious across different classifiers. Although spurious images help measure the reliance of a classifier, filtering many images from the Internet to find more spurious features is time-consuming. To this end, we utilize an existing approach of personalizing large-scale text-to-image diffusion models with available discovered spurious images and propose a new spurious feature similarity loss based on neural features of an adversarially robust model. Precisely, we fine-tune Stable Diffusion with several reference images from Spurious ImageNet with a modified objective incorporating the proposed spurious-feature similarity loss. Experiment results show that our method can generate spurious images that are consistently spurious across different classifiers. Moreover, the generated spurious images are visually similar to reference images from Spurious ImageNet.
Submitted 12 February, 2024;
originally announced February 2024.
-
Enhancing Robustness of LLM-Synthetic Text Detectors for Academic Writing: A Comprehensive Analysis
Authors:
Zhicheng Dou,
Yuchen Guo,
Ching-Chun Chang,
Huy H. Nguyen,
Isao Echizen
Abstract:
The emergence of large language models (LLMs), such as Generative Pre-trained Transformer 4 (GPT-4) used by ChatGPT, has profoundly impacted the academic and broader community. While these models offer numerous advantages in terms of revolutionizing work and study methods, they have also garnered significant attention due to their potential negative consequences. One example is generating academic reports or papers with little to no human contribution. Consequently, researchers have focused on developing detectors to address the misuse of LLMs. However, most existing methods prioritize achieving higher accuracy on restricted datasets, neglecting the crucial aspect of generalizability. This limitation hinders their practical application in real-life scenarios where reliability is paramount. In this paper, we present a comprehensive analysis of the impact of prompts on the text generated by LLMs and highlight the potential lack of robustness in one of the current state-of-the-art GPT detectors. To mitigate these issues concerning the misuse of LLMs in academic writing, we propose a reference-based Siamese detector named Synthetic-Siamese which takes a pair of texts, one as the inquiry and the other as the reference. Our method effectively addresses the lack of robustness of previous detectors (OpenAI detector and DetectGPT) and significantly improves the baseline performances in realistic academic writing scenarios by approximately 67% to 95%.
Submitted 15 January, 2024;
originally announced January 2024.
-
Cross-Attention Watermarking of Large Language Models
Authors:
Folco Bertini Baldassini,
Huy H. Nguyen,
Ching-Chung Chang,
Isao Echizen
Abstract:
A new approach to linguistic watermarking of language models is presented in which information is imperceptibly inserted into the output text while preserving its readability and original meaning. A cross-attention mechanism is used to embed watermarks in the text during inference. Two methods using cross-attention are presented that minimize the effect of watermarking on the performance of a pretrained model. Exploration of different training strategies for optimizing the watermarking and of the challenges and implications of applying this approach in real-world scenarios clarified the tradeoff between watermark robustness and text quality. Watermark selection substantially affects the generated output for high entropy sentences. This proactive watermarking approach has potential application in future model development.
Submitted 12 January, 2024;
originally announced January 2024.
-
Surface Normal Estimation with Transformers
Authors:
Barry Shichen Hu,
Siyun Liang,
Johannes Paetzold,
Huy H. Nguyen,
Isao Echizen,
Jiapeng Tang
Abstract:
We propose the use of a Transformer to accurately predict normals from point clouds with noise and density variations. Previous learning-based methods utilize PointNet variants to explicitly extract multi-scale features at different input scales, then focus on a surface fitting method by which local point cloud neighborhoods are fitted to a geometric surface approximated by either a polynomial function or a multi-layer perceptron (MLP). However, fitting surfaces to fixed-order polynomial functions can suffer from overfitting or underfitting, and learning MLP-represented hyper-surfaces requires pre-generated per-point weights. To avoid these limitations, we first unify the design choices in previous works and then propose a simplified Transformer-based model to extract richer and more robust geometric features for the surface normal estimation task. Through extensive experiments, we demonstrate that our Transformer-based method achieves state-of-the-art performance on both the synthetic shape dataset PCPNet, and the real-world indoor scene dataset SceneNN, exhibiting more noise-resilient behavior and significantly faster inference. Most importantly, we demonstrate that the sophisticated hand-designed modules in existing works are not necessary to excel at the task of surface normal estimation.
Submitted 11 January, 2024;
originally announced January 2024.
-
UAV Trajectory Planning for AoI-Minimal Data Collection in UAV-Aided IoT Networks by Transformer
Authors:
Botao Zhu,
Ebrahim Bedeer,
Ha H. Nguyen,
Robert Barton,
Zhen Gao
Abstract:
Maintaining the freshness of collected data in Internet-of-Things (IoT) networks has attracted increasing attention. By taking into account age-of-information (AoI), we investigate the trajectory planning problem of an unmanned aerial vehicle (UAV) that is used to aid a cluster-based IoT network. An optimization problem is formulated to minimize the total AoI of the data collected by the UAV from the ground IoT network. Since the total AoI of the IoT network depends on the flight time of the UAV and the data collection time at hovering points, we jointly optimize the selection of hovering points and the visiting order of these points. We exploit the state-of-the-art transformer and the weighted A*, which is a path search algorithm, to design a machine learning algorithm to solve the formulated problem. The whole UAV-IoT system is fed into the encoder network of the proposed algorithm, and the algorithm's decoder network outputs the visiting order of the ground clusters. Then, the weighted A* is used to find the hovering point for each cluster in the ground IoT network. Simulation results show that the model trained by the proposed algorithm generalizes well, generating solutions for IoT networks with different numbers of ground clusters without the need to retrain the model. Furthermore, results show that our proposed algorithm can find better UAV trajectories with the minimum total AoI when compared to other algorithms.
Submitted 8 November, 2023;
originally announced January 2024.
-
On Partial Optimal Transport: Revising the Infeasibility of Sinkhorn and Efficient Gradient Methods
Authors:
Anh Duc Nguyen,
Tuan Dung Nguyen,
Quang Minh Nguyen,
Hoang H. Nguyen,
Lam M. Nguyen,
Kim-Chuan Toh
Abstract:
This paper studies the Partial Optimal Transport (POT) problem between two unbalanced measures with at most $n$ supports and its applications in various AI tasks such as color transfer or domain adaptation. There is hence the need for fast approximations of POT with increasingly large problem sizes in arising applications. We first theoretically and experimentally investigate the infeasibility of the state-of-the-art Sinkhorn algorithm for POT due to its incompatible rounding procedure, which consequently degrades its qualitative performance in real world applications like point-cloud registration. To this end, we propose a novel rounding algorithm for POT, and then provide a feasible Sinkhorn procedure with a revised computation complexity of $\mathcal{\widetilde O}(n^2/\varepsilon^4)$. Our rounding algorithm also permits the development of two first-order methods to approximate the POT problem. The first algorithm, Adaptive Primal-Dual Accelerated Gradient Descent (APDAGD), finds an $\varepsilon$-approximate solution to the POT problem in $\mathcal{\widetilde O}(n^{2.5}/\varepsilon)$, which is better in $\varepsilon$ than revised Sinkhorn. The second method, Dual Extrapolation, achieves the computation complexity of $\mathcal{\widetilde O}(n^2/\varepsilon)$, thereby being the best in the literature. We further demonstrate the flexibility of POT compared to standard OT as well as the practicality of our algorithms on real applications where two marginal distributions are unbalanced.
Submitted 22 December, 2023; v1 submitted 21 December, 2023;
originally announced December 2023.
-
Generalized Deepfakes Detection with Reconstructed-Blended Images and Multi-scale Feature Reconstruction Network
Authors:
Yuyang Sun,
Huy H. Nguyen,
Chun-Shien Lu,
ZhiYong Zhang,
Lu Sun,
Isao Echizen
Abstract:
The growing diversity of digital face manipulation techniques has led to an urgent need for a universal and robust detection technology to mitigate the risks posed by malicious forgeries. We present a blended-based detection approach that has robust applicability to unseen datasets. It combines a method for generating synthetic training samples, i.e., reconstructed blended images, that incorporate potential deepfake generator artifacts and a detection model, a multi-scale feature reconstruction network, for capturing the generic boundary artifacts and noise distribution anomalies brought about by digital face manipulations. Experiments demonstrated that this approach results in better performance in both cross-manipulation detection and cross-dataset detection on unseen data.
Submitted 13 December, 2023;
originally announced December 2023.
-
CoF-CoT: Enhancing Large Language Models with Coarse-to-Fine Chain-of-Thought Prompting for Multi-domain NLU Tasks
Authors:
Hoang H. Nguyen,
Ye Liu,
Chenwei Zhang,
Tao Zhang,
Philip S. Yu
Abstract:
While Chain-of-Thought prompting is popular in reasoning tasks, its application to Large Language Models (LLMs) in Natural Language Understanding (NLU) is under-explored. Motivated by the multi-step reasoning of LLMs, we propose the Coarse-to-Fine Chain-of-Thought (CoF-CoT) approach, which breaks down NLU tasks into multiple reasoning steps where LLMs can learn to acquire and leverage essential concepts to solve tasks at different granularities. Moreover, we propose leveraging semantic-based Abstract Meaning Representation (AMR) structured knowledge as an intermediate step to capture the nuances and diverse structures of utterances, and to understand connections between their varying levels of granularity. Our proposed approach is shown to be effective in helping LLMs adapt to multi-grained NLU tasks under both zero-shot and few-shot multi-domain settings.
Submitted 23 October, 2023;
originally announced October 2023.
-
Quality Assurance of A GPT-based Sentiment Analysis System: Adversarial Review Data Generation and Detection
Authors:
Tinghui Ouyang,
Hoang-Quoc Nguyen-Son,
Huy H. Nguyen,
Isao Echizen,
Yoshiki Seo
Abstract:
Large Language Models (LLMs) have been garnering significant attention from AI researchers, especially following the widespread popularity of ChatGPT. However, due to LLMs' intricate architecture and vast number of parameters, several concerns and challenges regarding their quality assurance need to be addressed. In this paper, a fine-tuned GPT-based sentiment analysis model is first constructed and studied as the reference in AI quality analysis. Then, the quality analysis related to data adequacy is implemented, including employing a content-based approach to generate reasonable adversarial review comments as wrongly annotated data, and developing surprise adequacy (SA)-based techniques to detect these abnormal data. Experiments based on Amazon.com review data and a fine-tuned GPT model were conducted. Results are thoroughly discussed from the perspective of AI quality assurance to present the quality analysis of an LLM on generated adversarial textual data and the effectiveness of using SA for anomaly detection in data quality assurance.
Submitted 8 October, 2023;
originally announced October 2023.
-
How Close are Other Computer Vision Tasks to Deepfake Detection?
Authors:
Huy H. Nguyen,
Junichi Yamagishi,
Isao Echizen
Abstract:
In this paper, we challenge the conventional belief that supervised ImageNet-trained models have strong generalizability and are suitable for use as feature extractors in deepfake detection. We present a new measurement, "model separability," for visually and quantitatively assessing a model's raw capacity to separate data in an unsupervised manner. We also present a systematic benchmark for determining the correlation between deepfake detection and other computer vision tasks using pre-trained models. Our analysis shows that pre-trained face recognition models are more closely related to deepfake detection than other models. Additionally, models trained using self-supervised methods are more effective in separation than those trained using supervised methods. After fine-tuning all models on a small deepfake dataset, we found that self-supervised models deliver the best results, but there is a risk of overfitting. Our results provide valuable insights that should help researchers and practitioners develop more effective deepfake detection models.
Submitted 2 October, 2023;
originally announced October 2023.
-
Circular-Line Trajectory Tracking Controller for Mobile Robot using Multi-Pixy2 Sensors
Authors:
Xuan Quang Ngo,
Tri Duc Tran,
Huy Hung Nguyen,
Van Dong Nguyen,
Van Tu Duong,
Tan Tien Nguyen
Abstract:
This study proposes a novel tracking method that employs three Pixy2 sensors to identify the desired line trajectories instead of traditional sensing methods. First, the kinematic model of the mobile robot is derived from the information gathered by the three Pixy2 sensors. Second, a sliding mode controller is implemented to regulate the tracking error. Finally, simulation results are analyzed to show the effectiveness of the proposed method.
Submitted 12 August, 2023;
originally announced September 2023.
-
Defending Against Physical Adversarial Patch Attacks on Infrared Human Detection
Authors:
Lukas Strack,
Futa Waseda,
Huy H. Nguyen,
Yinqiang Zheng,
Isao Echizen
Abstract:
Infrared detection is an emerging technique for safety-critical tasks owing to its remarkable anti-interference capability. However, recent studies have revealed that it is vulnerable to physically-realizable adversarial patches, posing risks in its real-world applications. To address this problem, we are the first to investigate defense strategies against adversarial patch attacks on infrared detection, especially human detection. We propose a straightforward defense strategy, patch-based occlusion-aware detection (POD), which efficiently augments training samples with random patches and subsequently detects them. POD not only robustly detects people but also identifies adversarial patch locations. Surprisingly, while being extremely computationally efficient, POD easily generalizes to state-of-the-art adversarial patch attacks that are unseen during training. Furthermore, POD improves detection precision even in a clean (i.e., no-attack) situation due to the data augmentation effect. Our evaluation demonstrates that POD is robust to adversarial patches of various shapes and sizes. Our baseline approach is shown to be a viable defense mechanism for real-world infrared human detection systems, paving the way for future research directions.
Submitted 10 June, 2024; v1 submitted 27 September, 2023;
originally announced September 2023.
-
Universal Graph Continual Learning
Authors:
Thanh Duc Hoang,
Do Viet Tung,
Duy-Hung Nguyen,
Bao-Sinh Nguyen,
Huy Hoang Nguyen,
Hung Le
Abstract:
We address catastrophic forgetting issues in graph learning as incoming data transitions from one graph distribution to another. Whereas prior studies primarily tackle one setting of graph continual learning such as incremental node classification, we focus on a universal approach wherein each data point in a task can be a node or a graph, and the task varies from node to graph classification. We propose a novel method that enables graph neural networks to excel in this universal setting. Our approach preserves knowledge about past tasks through a rehearsal mechanism that maintains local and global structure consistency across the graphs. We benchmark our method against various continual learning baselines on real-world graph datasets and achieve significant improvement in average performance and forgetting across tasks.
Submitted 26 August, 2023;
originally announced August 2023.
-
Slot Induction via Pre-trained Language Model Probing and Multi-level Contrastive Learning
Authors:
Hoang H. Nguyen,
Chenwei Zhang,
Ye Liu,
Philip S. Yu
Abstract:
Recent advanced methods in Natural Language Understanding for Task-oriented Dialogue (TOD) Systems (e.g., intent detection and slot filling) require a large amount of annotated data to achieve competitive performance. In reality, token-level annotations (slot labels) are time-consuming and difficult to acquire. In this work, we study the Slot Induction (SI) task, whose objective is to induce slot boundaries without explicit knowledge of token-level slot annotations. We propose leveraging Unsupervised Pre-trained Language Model (PLM) Probing and a Contrastive Learning mechanism to exploit (1) unsupervised semantic knowledge extracted from PLM, and (2) additional sentence-level intent label signals available from TOD. Our approach is shown to be effective in the SI task and capable of bridging the gap with token-level supervised models on two NLU benchmark datasets. When generalized to emerging intents, our SI objectives also provide enhanced slot label representations, leading to improved performance on Slot Filling tasks.
Submitted 9 August, 2023;
originally announced August 2023.
-
Enhancing Cross-lingual Transfer via Phonemic Transcription Integration
Authors:
Hoang H. Nguyen,
Chenwei Zhang,
Tao Zhang,
Eugene Rohrbaugh,
Philip S. Yu
Abstract:
Previous cross-lingual transfer methods are restricted to orthographic representation learning via textual scripts. This limitation hampers cross-lingual transfer and is biased towards languages sharing similar well-known scripts. To alleviate the gap between languages from different writing scripts, we propose PhoneXL, a framework incorporating phonemic transcriptions as an additional linguistic modality beyond the traditional orthographic transcriptions for cross-lingual transfer. Particularly, we propose unsupervised alignment objectives to capture (1) local one-to-one alignment between the two different modalities, (2) alignment via multi-modality contexts to leverage information from additional modalities, and (3) alignment via multilingual contexts where additional bilingual dictionaries are incorporated. We also release the first phonemic-orthographic alignment dataset on two token-level tasks (Named Entity Recognition and Part-of-Speech Tagging) among the understudied but interconnected Chinese-Japanese-Korean-Vietnamese (CJKV) languages. Our pilot study reveals phonemic transcription provides essential information beyond the orthography to enhance cross-lingual transfer and bridge the gap among CJKV languages, leading to consistent improvements on cross-lingual token-level tasks over orthographic-based multilingual PLMs.
Submitted 10 July, 2023;
originally announced July 2023.
-
Research Impact of Solar Panel Cleaning Robot on Photovoltaic Panel's Deflection
Authors:
Trung Dat Phan,
Minh Duc Nguyen,
Maxence Auffray,
Nhut Thang Le,
Cong Toai Truong,
Van Tu Duong,
Huy Hung Nguyen,
Tan Tien Nguyen
Abstract:
In the last few decades, solar panel cleaning robots (SPCR) have been widely used for cleaning photovoltaic (PV) panels as an effective solution for ensuring PV efficiency. However, the dynamic load generated by the SPCR during operation might have a negative impact on PV panels. To reduce these effects, this paper uses ANSYS software to simulate multiple scenarios involving the impact of SPCR on PV panels. The simulation scenarios provided in the paper are derived from the typical movements of SPCR observed during practical operations. The simulation results show the deformation process of PV panels, and a second-order polynomial is established to describe the deformed amplitude along the centerline of PV panels. This second-order polynomial contributes to the design process of a damper system for SPCR aiming to reduce the influence of SPCR on PV panels. Moreover, experiments are conducted to examine the correlation between the simulation and experimental results.
Submitted 8 June, 2023; v1 submitted 8 June, 2023;
originally announced June 2023.
-
Design and Detection of Unitary Constellations in Non-Coherent SIMO Systems for Short Packet Communications
Authors:
Son T. Duong,
Ha H. Nguyen,
Ebrahim Bedeer,
Robert Barton
Abstract:
This paper proposes a novel design of multi-symbol unitary constellation for non-coherent single-input multiple-output (SIMO) communications over block Rayleigh fading channels. To facilitate the design and the detection of large unitary constellations at reduced complexity, the proposed constellations are constructed as the Cartesian product of independent amplitude and phase-shift-keying (PSK) vectors, and hence, can be iteratively detected. The amplitude vector is detected by exhaustive search, whose complexity is sufficiently low in short packet transmission scenarios. To detect the PSK vector, we use the posterior probability as a reliability criterion in the sorted decision-feedback differential detection (sort-DFDD), which results in near-optimal error performance for PSK symbols with equal modulation orders. This detector is called posteriori-based-reliability-sort-DFDD (PR-sort-DFDD) and has polynomial complexity. We also propose an improved detector called improved-PR-sort-DFDD to detect a more generalized PSK structure, i.e., PSK symbols with unequal modulation orders. This detector also approaches the optimal error performance with polynomial complexity. Simulation results show the merits of our proposed multi-symbol unitary constellation when compared to competing low-complexity unitary constellations.
Submitted 6 November, 2023; v1 submitted 4 May, 2023;
originally announced May 2023.
-
LMI-based Data-Driven Robust Model Predictive Control
Authors:
Hoang Hai Nguyen,
Maurice Friedel,
Rolf Findeisen
Abstract:
Predictive control, which uses a model of the system to compute the applied input that optimizes the future system behavior, is by now widely used. If the nominal models are not given or are very uncertain, data-driven model predictive control approaches can be employed, where the system model or input is directly obtained from past measured trajectories. Using a data informativity framework and Finsler's lemma, we propose a data-driven robust linear matrix inequality-based model predictive control scheme that considers input and state constraints. Using these data, we formulate the problem as a semi-definite optimization problem, whose solution provides the matrix gain for the linear feedback, while the decision variables are independent of the length of the measurement data. The designed controller stabilizes the closed-loop system asymptotically and guarantees constraint satisfaction. Numerical examples are presented to illustrate the method.
Submitted 8 March, 2023;
originally announced March 2023.
-
Cyber Vaccine for Deepfake Immunity
Authors:
Ching-Chun Chang,
Huy Hong Nguyen,
Junichi Yamagishi,
Isao Echizen
Abstract:
Deepfakes pose an evolving threat to cybersecurity, which calls for the development of automated countermeasures. While considerable forensic research has been devoted to the detection and localisation of deepfakes, solutions for reversing fake to real are yet to be developed. In this study, we introduce cyber vaccination for conferring immunity to deepfakes. Analogous to biological vaccination that injects antigens to induce immunity prior to infection by an actual pathogen, cyber vaccination simulates deepfakes and performs adversarial training to build a defensive immune system. Aiming at building up attack-agnostic immunity with limited computational resources, we propose to simulate various deepfakes with one single overpowered attack: face masking. The proposed immune system consists of a vaccinator for inducing immunity and a neutraliser for recovering facial content. Experimental evaluations demonstrate effective immunity to face replacement, face reenactment and various types of corruptions.
Submitted 5 March, 2023;
originally announced March 2023.
-
i2LQR: Iterative LQR for Iterative Tasks in Dynamic Environments
Authors:
Yifan Zeng,
Suiyi He,
Han Hoang Nguyen,
Yihan Li,
Zhongyu Li,
Koushil Sreenath,
Jun Zeng
Abstract:
This work introduces a novel control strategy called Iterative Linear Quadratic Regulator for Iterative Tasks (i2LQR), which aims to improve closed-loop performance with local trajectory optimization for iterative tasks in a dynamic environment. The proposed algorithm is reference-free and utilizes historical data from previous iterations to enhance the performance of the autonomous system. Unlike existing algorithms, the i2LQR computes the optimal solution in an iterative manner at each timestamp, rendering it well-suited for iterative tasks with changing constraints at different iterations. To evaluate the performance of the proposed algorithm, we conduct numerical simulations for an iterative task aimed at minimizing completion time. The results show that i2LQR achieves an optimized performance with respect to learning-based MPC (LMPC) as the benchmark in static environments, and outperforms LMPC in dynamic environments with both static and dynamic obstacles.
Submitted 6 September, 2023; v1 submitted 27 February, 2023;
originally announced February 2023.
-
Design of Mobile Manipulator for Fire Extinguisher Testing. Part II: Design and Simulation
Authors:
Thai Nguyen Chau,
Xuan Quang Ngo,
Van Tu Duong,
Trong Trung Nguyen,
Huy Hung Nguyen,
Tan Tien Nguyen
Abstract:
All flames must be extinguished as early as possible; otherwise, fire services have to deal with major conflagrations. For this reason, the quality of fire extinguishers has become a very sensitive and important issue in firefighting. Inspired by the development of automatic firefighting systems, this paper presents a mobile manipulator for evaluating the power of fire extinguishers, designed according to the fire extinguisher standards ISO 7165:2009 and ISO 11601:2008. A detailed discussion of the key specifications, solutions, and mechanical design of the chassis of the mobile manipulator has been presented in Part I: Key Specifications and Conceptual Design. The focus of this part is on the rest of the mechanical design and the controller design of the mobile manipulator.
Submitted 26 January, 2023;
originally announced January 2023.
-
Face Forgery Detection Based on Facial Region Displacement Trajectory Series
Authors:
YuYang Sun,
ZhiYong Zhang,
Isao Echizen,
Huy H. Nguyen,
ChangZhen Qiu,
Lu Sun
Abstract:
Deep-learning-based technologies such as deepfakes have been attracting widespread attention in both society and academia, particularly those used to synthesize forged face images. These automatic and professional-skill-free face manipulation technologies can be used to replace the face in an original image or video with any target object while maintaining the expression and demeanor. Since human faces are closely related to identity characteristics, maliciously disseminated identity-manipulated videos could trigger a crisis of public trust in the media and could even have serious political, social, and legal implications. To effectively detect manipulated videos, we focus on the position offset in the face blending process, resulting from the forced affine transformation of the normalized forged face. We introduce a method for detecting manipulated videos that is based on the trajectory of the facial region displacement. Specifically, we develop a virtual-anchor-based method for extracting the facial trajectory, which can robustly represent displacement information. This information was used to construct a network for exposing multidimensional artifacts in the trajectory sequences of manipulated videos that is based on dual-stream spatial-temporal graph attention and a gated recurrent unit backbone. Testing of our method on various manipulation datasets demonstrated that its accuracy and generalization ability are competitive with those of the leading detection methods.
Submitted 7 December, 2022;
originally announced December 2022.
-
Bit-Interleaved Coded Energy-Based Modulation with Iterative Decoding
Authors:
Ali Fazeli,
Ha H. Nguyen,
Halim Yanikomeroglu
Abstract:
This paper develops a low-complexity near-optimal non-coherent receiver for a multi-level energy-based coded modulation system. Inspired by the turbo processing principle, we incorporate the fundamentals of bit-interleaved coded modulation with iterative decoding (BICM-ID) into the proposed receiver design. The resulting system is called bit-interleaved coded energy-based modulation with iterative decoding (BICEM-ID) and its error performance is analytically studied. Specifically, we derive upper bounds on the average pairwise error probability (PEP) of the non-coherent BICEM-ID system in the feedback-free (FF) and error-free feedback (EFF) scenarios. It is revealed that the definition of the nearest neighbors, which is important in the performance analysis in the FF scenario, is very different from that in the coherent BICM-ID counterpart. The analysis also reveals how the mapping from coded bits to energy levels influences the diversity order and coding gain of the BICEM-ID systems. A design criterion for good mappings is then formulated and an algorithm is proposed to find a set of best mappings for BICEM-ID. Finally, simulation results corroborate the main analytical findings.
Submitted 22 November, 2022;
originally announced November 2022.
-
Multi-level Design for Multiple-Symbol Non-Coherent Unitary Constellations for Massive SIMO Systems
Authors:
Son T. Duong,
Ha H. Nguyen,
Ebrahim Bedeer
Abstract:
This paper investigates non-coherent detection of single-input multiple-output (SIMO) systems over block Rayleigh fading channels. Using the Kullback-Leibler divergence as the design criterion, we formulate a multiple-symbol constellation optimization problem, which turns out to have high computational complexity to construct and detect. We exploit the structure of the formulated problem and decouple it into a unitary constellation design and a multi-level design. The proposed multi-level design has low complexity in both construction and detection. Simulation results show that our multi-level design has better performance than traditional pilot-based schemes and other existing low-complexity multi-level designs.
Submitted 17 November, 2022;
originally announced November 2022.
-
A Stronger Baseline For Automatic Pfirrmann Grading Of Lumbar Spine MRI Using Deep Learning
Authors:
Narasimharao Kowlagi,
Huy Hoang Nguyen,
Terence McSweeney,
Simo Saarakkala,
Juhani Määttä,
Jaro Karppinen,
Aleksei Tiulpin
Abstract:
This paper addresses the challenge of grading visual features in lumbar spine MRI using Deep Learning. Such a method is essential for the automatic quantification of structural changes in the spine, which is valuable for understanding low back pain. Multiple recent studies investigated different architecture designs, and the most recent success has been attributed to the use of transformer architectures. In this work, we argue that with a well-tuned three-stage pipeline comprising semantic segmentation, localization, and classification, convolutional networks outperform the state-of-the-art approaches. We conducted an ablation study of the existing methods in a population cohort, and report performance generalization across various subgroups. Our code is publicly available to advance research on disc degeneration and low back pain.
Submitted 26 October, 2022;
originally announced October 2022.
-
Clinically-Inspired Multi-Agent Transformers for Disease Trajectory Forecasting from Multimodal Data
Authors:
Huy Hoang Nguyen,
Matthew B. Blaschko,
Simo Saarakkala,
Aleksei Tiulpin
Abstract:
Deep neural networks are often applied to medical images to automate the problem of medical diagnosis. However, a more clinically relevant question that practitioners usually face is how to predict the future trajectory of a disease. Current methods for prognosis or disease trajectory forecasting often require domain knowledge and are complicated to apply. In this paper, we formulate the prognosis prediction problem as a one-to-many prediction problem. Inspired by a clinical decision-making process with two agents -- a radiologist and a general practitioner -- we predict prognosis with two transformer-based components that share information with each other. The first transformer in this framework aims to analyze the imaging data, and the second one leverages its internal states as inputs, also fusing them with auxiliary clinical data. The temporal nature of the problem is modeled within the transformer states, allowing us to treat the forecasting problem as a multi-task classification, for which we propose a novel loss. We show the effectiveness of our approach in predicting the development of structural knee osteoarthritis changes and forecasting Alzheimer's disease clinical status directly from raw multi-modal data. The proposed method outperforms multiple state-of-the-art baselines with respect to performance and calibration, both of which are needed for real-world applications. An open-source implementation of our method is made publicly available at https://github.com/Oulu-IMEDS/CLIMATv2.
Submitted 19 September, 2023; v1 submitted 25 October, 2022;
originally announced October 2022.
-
Analysis of Master Vein Attacks on Finger Vein Recognition Systems
Authors:
Huy H. Nguyen,
Trung-Nghia Le,
Junichi Yamagishi,
Isao Echizen
Abstract:
Finger vein recognition (FVR) systems have been commercially used, especially in ATMs, for customer verification. Thus, it is essential to measure their robustness against various attack methods, especially when a hand-crafted FVR system is used without any countermeasure methods. In this paper, we are the first in the literature to introduce master vein attacks in which we craft a vein-looking image so that it can falsely match with as many identities as possible by the FVR systems. We present two methods for generating master veins for use in attacking these systems. The first uses an adaptation of the latent variable evolution algorithm with a proposed generative model (a multi-stage combination of beta-VAE and WGAN-GP models). The second uses an adversarial machine learning attack method to attack a strong surrogate CNN-based recognition system. The two methods can be easily combined to boost their attack ability. Experimental results demonstrated that the proposed methods alone and together achieved false acceptance rates up to 73.29% and 88.79%, respectively, against Miura's hand-crafted FVR system. We also point out that Miura's system is easily compromised by non-vein-looking samples generated by a WGAN-GP model with false acceptance rates up to 94.21%. The results raise the alarm about the robustness of such systems and suggest that master vein attacks should be considered an important security measure.
Submitted 18 October, 2022;
originally announced October 2022.
-
MANDO: Multi-Level Heterogeneous Graph Embeddings for Fine-Grained Detection of Smart Contract Vulnerabilities
Authors:
Hoang H. Nguyen,
Nhat-Minh Nguyen,
Chunyao Xie,
Zahra Ahmadi,
Daniel Kudendo,
Thanh-Nam Doan,
Lingxiao Jiang
Abstract:
Learning heterogeneous graphs consisting of different types of nodes and edges enhances the results of homogeneous graph techniques. An interesting example of such graphs is control-flow graphs representing possible software code execution flows. As such graphs represent more semantic information of code, developing techniques and tools for such graphs can be highly beneficial for detecting vulnerabilities in software to ensure its reliability. However, existing heterogeneous graph techniques are still insufficient in handling complex graphs where the number of different types of nodes and edges is large and variable. This paper concentrates on Ethereum smart contracts as a sample of software code represented by heterogeneous contract graphs built upon both control-flow graphs and call graphs containing different types of nodes and links. We propose MANDO, a new heterogeneous graph representation to learn such heterogeneous contract graphs' structures. MANDO extracts customized metapaths, which compose relational connections between different types of nodes and their neighbors. Moreover, it develops a multi-metapath heterogeneous graph attention network to learn multi-level embeddings of different types of nodes and their metapaths in the heterogeneous contract graphs, which can capture the code semantics of smart contracts more accurately and facilitate both fine-grained line-level and coarse-grained contract-level vulnerability detection. Our extensive evaluation on large smart contract datasets shows that MANDO improves the vulnerability detection results of other techniques at the coarse-grained contract level. More importantly, it is the first learning-based approach capable of identifying vulnerabilities at the fine-grained line level, and it significantly improves on traditional code analysis-based vulnerability detection approaches by 11.35% to 70.81% in terms of F1-score.
Submitted 7 September, 2022; v1 submitted 28 August, 2022;
originally announced August 2022.
-
Rethinking Adversarial Examples for Location Privacy Protection
Authors:
Trung-Nghia Le,
Ta Gu,
Huy H. Nguyen,
Isao Echizen
Abstract:
We have investigated a new application of adversarial examples, namely location privacy protection against landmark recognition systems. We introduce mask-guided multimodal projected gradient descent (MM-PGD), in which adversarial examples are trained on different deep models. Image contents are protected by analyzing the properties of regions to identify the ones most suitable for blending in adversarial examples. We investigated two region identification strategies: class activation map-based MM-PGD, in which the internal behaviors of trained deep models are targeted; and human-vision-based MM-PGD, in which regions that attract less human attention are targeted. Experiments on the Places365 dataset demonstrated that these strategies are potentially effective in defending against black-box landmark recognition systems without the need for much image manipulation.
Submitted 28 June, 2022;
originally announced June 2022.
-
AdaTriplet: Adaptive Gradient Triplet Loss with Automatic Margin Learning for Forensic Medical Image Matching
Authors:
Khanh Nguyen,
Huy Hoang Nguyen,
Aleksei Tiulpin
Abstract:
This paper tackles the challenge of forensic medical image matching (FMIM) using deep neural networks (DNNs). FMIM is a particular case of content-based image retrieval (CBIR). The main challenge in FMIM, compared to the general case of CBIR, is that the subject to whom a query image belongs may be affected by aging and progressive degenerative disorders, making it difficult to match data on a subject level. CBIR with DNNs is generally solved by minimizing a ranking loss, such as Triplet loss (TL), computed on image representations extracted by a DNN from the original data. TL, in particular, operates on triplets: anchor, positive (similar to anchor) and negative (dissimilar to anchor). Although TL has been shown to perform well in many CBIR tasks, it still has limitations, which we identify and analyze in this work. In this paper, we introduce (i) the AdaTriplet loss -- an extension of TL whose gradients adapt to different difficulty levels of negative samples, and (ii) the AutoMargin method -- a technique to dynamically adjust hyperparameters of margin-based losses such as TL and our proposed loss. Our results are evaluated on two large-scale benchmarks for FMIM based on the Osteoarthritis Initiative and Chest X-ray-14 datasets. The code allowing replication of this study has been made publicly available at https://github.com/Oulu-IMEDS/AdaTriplet.
Submitted 10 May, 2022; v1 submitted 5 May, 2022;
originally announced May 2022.
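For readers unfamiliar with the baseline this abstract builds on, the standard margin-based Triplet loss can be sketched as below. This is only the textbook formulation, not the AdaTriplet loss or the AutoMargin method; the function name, the Euclidean distance, and the default margin are assumptions chosen for illustration.

```python
import math


def triplet_loss(anchor, positive, negative, margin=0.5):
    """Textbook triplet loss on three embedding vectors (hypothetical helper).

    The loss is zero once the negative is farther from the anchor than the
    positive by at least `margin`; otherwise it grows linearly.
    """
    d_ap = math.dist(anchor, positive)  # anchor-positive distance
    d_an = math.dist(anchor, negative)  # anchor-negative distance
    return max(0.0, d_ap - d_an + margin)


# An "easy" triplet (negative far away) gives zero loss,
# while a "hard" one (negative closer than positive) is penalized.
easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0])
hard = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])
```

Because easy triplets contribute nothing to the loss, how the remaining negatives are weighted matters, which is the aspect the abstract says AdaTriplet adapts.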
-
Robust Deepfake On Unrestricted Media: Generation And Detection
Authors:
Trung-Nghia Le,
Huy H Nguyen,
Junichi Yamagishi,
Isao Echizen
Abstract:
Recent advances in deep learning have led to substantial improvements in deepfake generation, resulting in fake media with a more realistic appearance. Although deepfake media have potential applications in a wide range of areas and are drawing much attention from both the academic and industrial communities, they also lead to serious social and criminal concerns. This chapter explores the evolution of and challenges in deepfake generation and detection. It also discusses possible ways to improve the robustness of deepfake detection for a wide variety of media (e.g., in-the-wild images and videos). Finally, it suggests a focus for future fake media research.
Submitted 13 February, 2022;
originally announced February 2022.
-
On Unbalanced Optimal Transport: Gradient Methods, Sparsity and Approximation Error
Authors:
Quang Minh Nguyen,
Hoang H. Nguyen,
Yi Zhou,
Lam M. Nguyen
Abstract:
We study the Unbalanced Optimal Transport (UOT) between two measures of possibly different masses with at most $n$ components, where the marginal constraints of standard Optimal Transport (OT) are relaxed via Kullback-Leibler divergence with regularization factor $τ$. Although only Sinkhorn-based UOT solvers have been analyzed in the literature with the iteration complexity of ${O}\big(\tfrac{τ\log(n)}{\varepsilon} \log\big(\tfrac{\log(n)}{\varepsilon}\big)\big)$ and per-iteration cost of $O(n^2)$ for achieving the desired error $\varepsilon$, their positively dense output transportation plans strongly hinder their practicality. On the other hand, while being widely used as heuristics for computing UOT in modern deep learning applications and having shown success in sparse OT problems, gradient methods applied to UOT have not been formally studied. In this paper, we propose a novel algorithm based on the Gradient Extrapolation Method (GEM-UOT) to find an $\varepsilon$-approximate solution to the UOT problem in $O\big( κ\log\big(\frac{τn}{\varepsilon}\big) \big)$ iterations with $\widetilde{O}(n^2)$ per-iteration cost, where $κ$ is the condition number depending on only the two input measures. Our proof technique is based on a novel dual formulation of the squared $\ell_2$-norm UOT objective, which fills a gap in the sparse UOT literature and also leads to a new characterization of the approximation error between UOT and OT. To this end, we further present a novel approach of OT retrieval from UOT, which is based on GEM-UOT with fine-tuned $τ$ and a post-process projection step. Extensive experiments on synthetic and real datasets validate our theories and demonstrate the favorable performance of our methods in practice.
Submitted 8 January, 2024; v1 submitted 7 February, 2022;
originally announced February 2022.
-
Hybrid Adaptive Control for Series Elastic Actuator of Humanoid Robot
Authors:
Anh Khoa Lanh Luu,
Van Tu Duong,
Huy Hung Nguyen,
Sang Bong Kim,
Tan Tien Nguyen
Abstract:
Humanoid robots usually suffer significant impact forces when walking or running in a non-predefined environment, which can easily damage the actuators due to their high stiffness. In recent years, passive, compliant series elastic actuators (SEAs) used to drive a humanoid's joints have proved their capability in many aspects. However, despite SEAs being widely applied in the biped robot research field, the stable control problem for a humanoid powered by SEAs, especially in the walking process, is still a challenge. This paper proposes a model reference adaptive control (MRAC) combined with the backstepping algorithm to deal with the parameter uncertainties in a humanoid's lower limb driven by the SEA system. This is also an extension of our previous research (Lanh et al., 2021). Firstly, a dynamic model of the SEA is obtained. Secondly, since there are unknown and uncertain parameters in the SEA model, a model reference adaptive controller is employed to guarantee the robust performance of the humanoid's lower limb. Finally, an experiment is carried out to evaluate the effectiveness of the proposed controller and the SEA mechanism.
Submitted 23 January, 2022;
originally announced January 2022.
-
Closer Look at the Transferability of Adversarial Examples: How They Fool Different Models Differently
Authors:
Futa Waseda,
Sosuke Nishikawa,
Trung-Nghia Le,
Huy H. Nguyen,
Isao Echizen
Abstract:
Deep neural networks are vulnerable to adversarial examples (AEs), which have adversarial transferability: AEs generated for the source model can mislead another (target) model's predictions. However, the transferability has not been understood in terms of which class the target model's predictions are misled to (i.e., class-aware transferability). In this paper, we differentiate the cases in which a target model predicts the same wrong class as the source model ("same mistake") or a different wrong class ("different mistake") to analyze and provide an explanation of the mechanism. We find that (1) AEs tend to cause same mistakes, which correlates with "non-targeted transferability"; however, (2) different mistakes occur even between similar models, regardless of the perturbation size. Furthermore, we present evidence that the difference between same mistakes and different mistakes can be explained by non-robust features, i.e., predictive but human-uninterpretable patterns: different mistakes occur when non-robust features in AEs are used differently by models. Non-robust features can thus provide consistent explanations for the class-aware transferability of AEs.
Submitted 19 October, 2022; v1 submitted 28 December, 2021;
originally announced December 2021.
-
RIS-Aided Cell-Free Massive MIMO Systems: Joint Design of Transmit Beamforming and Phase Shifts
Authors:
Si-Nian Jin,
Dian-Wu Yue,
Ha H. Nguyen
Abstract:
This paper studies RIS-aided cell-free massive MIMO systems, where multiple RISs are deployed to assist the communication between multiple access points (APs) and multiple users, with either continuous or discrete phase shifts at the RISs. We formulate the max-min fairness problem that maximizes the minimum achievable rate among all users by jointly optimizing the transmit beamforming at active APs and the phase shifts at passive RISs, subject to power constraints at the APs. To address such a challenging problem, we first study the special single-user scenario and propose an algorithm that can transform the optimization problem into a semidefinite program (SDP) or an integer linear program (ILP) for the cases of continuous and discrete phase shifts, respectively. By solving the resulting SDP and ILP, we first obtain the optimal phase shifts, and then design the optimal transmit beamforming accordingly. To solve the optimization problem for the multi-user scenario and continuous phase shifts at RISs, we extend the single-user algorithm and propose an alternating optimization algorithm, which first decomposes the max-min fairness problem into two subproblems related to transmit beamforming and phase shifts, and then transforms the two subproblems into a second-order cone program and an SDP, respectively. For the multi-user scenario and discrete phase shifts, the max-min fairness problem is shown to be a mixed-integer non-linear program (MINLP). To tackle it, we design a ZF-based successive refinement algorithm, which can find suboptimal transmit beamforming and phase shifts by means of alternating optimization. Numerical results show that, compared with benchmark schemes of random phase shifts and without using RISs, the proposed algorithms can significantly increase the minimum achievable rate among all users, especially when the number of reflecting elements at each RIS is large.
Submitted 13 December, 2021;
originally announced December 2021.
-
Joint Cluster Head Selection and Trajectory Planning in UAV-Aided IoT Networks by Reinforcement Learning with Sequential Model
Authors:
Botao Zhu,
Ebrahim Bedeer,
Ha H. Nguyen,
Robert Barton,
Jerome Henry
Abstract:
Employing unmanned aerial vehicles (UAVs) has attracted growing interest and emerged as the state-of-the-art technology for data collection in Internet-of-Things (IoT) networks. In this paper, with the objective of minimizing the total energy consumption of the UAV-IoT system, we formulate the problem of jointly designing the UAV's trajectory and selecting cluster heads in the IoT network as a constrained combinatorial optimization problem, which is classified as NP-hard and challenging to solve. We propose a novel deep reinforcement learning (DRL) with a sequential model strategy that can effectively learn the policy represented by a sequence-to-sequence neural network for the UAV's trajectory design in an unsupervised manner. Through extensive simulations, the obtained results show that the proposed DRL method can find a UAV trajectory that requires much less energy consumption when compared to other baseline algorithms and achieves close-to-optimal performance. In addition, simulation results show that the model trained by our proposed DRL algorithm has an excellent generalization ability to larger problem sizes without the need to retrain the model.
Submitted 1 December, 2021;
originally announced December 2021.
-
Effectiveness of Detection-based and Regression-based Approaches for Estimating Mask-Wearing Ratio
Authors:
Khanh-Duy Nguyen,
Huy H. Nguyen,
Trung-Nghia Le,
Junichi Yamagishi,
Isao Echizen
Abstract:
Estimating the mask-wearing ratio in public places is important as it enables health authorities to promptly analyze and implement policies. Methods for estimating the mask-wearing ratio on the basis of image analysis have been reported. However, there is still a lack of comprehensive research on both methodologies and datasets. Most recent reports straightforwardly propose estimating the ratio by applying conventional object detection and classification methods. It is feasible to use regression-based approaches to estimate the number of people wearing masks, especially for congested scenes with tiny and occluded faces, but this has not been well studied. A large-scale and well-annotated dataset is still in demand. In this paper, we present two methods for ratio estimation that leverage either a detection-based or regression-based approach. For the detection-based approach, we improved the state-of-the-art face detector, RetinaFace, used to estimate the ratio. For the regression-based approach, we fine-tuned the baseline network, CSRNet, used to estimate the density maps for masked and unmasked faces. We also present the first large-scale dataset, the "NFM dataset," which contains 581,108 face annotations extracted from 18,088 video frames in 17 street-view videos. Experiments demonstrated that the RetinaFace-based method has higher accuracy under various situations and that the CSRNet-based method has a shorter operation time thanks to its compactness.
Submitted 3 December, 2021; v1 submitted 24 November, 2021;
originally announced November 2021.
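To make the regression-based idea in this abstract concrete, the sketch below shows how a mask-wearing ratio could be read off two density maps, one for masked and one for unmasked faces. It assumes the common crowd-counting convention that a density map sums to the estimated count; the function names and the nested-list map format are hypothetical, and this is not the authors' CSRNet pipeline.

```python
def estimated_count(density_map):
    """Sum a 2-D density map (list of rows) to get an estimated face count."""
    return sum(sum(row) for row in density_map)


def mask_wearing_ratio(masked_map, unmasked_map):
    """Ratio of masked faces to all faces, from two density maps.

    Returns 0.0 for an empty scene to avoid dividing by zero
    (an assumption made for this sketch).
    """
    masked = estimated_count(masked_map)
    unmasked = estimated_count(unmasked_map)
    total = masked + unmasked
    return masked / total if total > 0 else 0.0


# Toy 2x2 maps: two masked and two unmasked faces are estimated.
ratio = mask_wearing_ratio([[0.5, 0.5], [1.0, 0.0]],
                           [[1.0, 1.0], [0.0, 0.0]])
```

A detection-based approach would instead count classified bounding boxes; the final ratio is computed the same way.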
-
Concurrent Transmission and Multiuser Detection of LoRa Signals
Authors:
The Khai Nguyen,
Ha H. Nguyen,
Ebrahim Bedeer
Abstract:
This paper investigates a new model to improve the scalability of low-power long-range (LoRa) networks by allowing multiple end devices (EDs) to simultaneously communicate with multiple multi-antenna gateways on the same frequency band and using the same spreading factor. The maximum likelihood (ML) decision rule is first derived for non-coherent detection of information bits transmitted by multiple devices. To overcome the high complexity of the ML detection, we propose a sub-optimal two-stage detection algorithm to balance the computational complexity and error performance. In the first stage, we identify transmit chirps (without knowing which EDs transmit them). In the second stage, we determine the EDs that transmit the specific chirps identified from the first stage. To improve the detection performance in the second stage, we also optimize the transmit powers of EDs to minimize the similarity, measured by the Jaccard coefficient, between the received powers of any pair of EDs. As the power control optimization problem is non-convex, we use concepts from successive convex approximation to transform it into an approximate convex optimization problem that can be solved iteratively and is guaranteed to reach a sub-optimal solution. Simulation results demonstrate and justify the tradeoff between transmit power penalties and network scalability of the proposed LoRa network model. In particular, by allowing concurrent transmission of 2 or 3 EDs, the uplink capacity of the proposed network can be doubled or tripled over that of a conventional LoRa network, albeit at the expense of an additional 3.0 or 4.7 dB of transmit power.
Submitted 18 November, 2021;
originally announced November 2021.
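The abstract measures similarity between the received powers of two EDs with the Jaccard coefficient. One common way to extend that coefficient to non-negative real-valued vectors is the weighted (Ruzicka) form sketched below. Whether the paper uses exactly this definition is an assumption, and the function name and the per-gateway power vectors are hypothetical.

```python
def weighted_jaccard(p, q):
    """Weighted Jaccard (Ruzicka) similarity of two non-negative vectors.

    Computed as sum(min(p_i, q_i)) / sum(max(p_i, q_i)); it is 1.0 for
    identical vectors and approaches 0.0 as they become dissimilar.
    """
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    numerator = sum(min(a, b) for a, b in zip(p, q))
    denominator = sum(max(a, b) for a, b in zip(p, q))
    return numerator / denominator if denominator > 0 else 1.0


# Received powers of two EDs at two gateways (arbitrary linear units).
similarity = weighted_jaccard([1.0, 2.0], [2.0, 1.0])
```

Lower similarity between any pair of EDs makes their received-power signatures easier to tell apart, which is the stated goal of the power control step.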
-
Master Face Attacks on Face Recognition Systems
Authors:
Huy H. Nguyen,
Sébastien Marcel,
Junichi Yamagishi,
Isao Echizen
Abstract:
Face authentication is now widely used, especially on mobile devices, rather than authentication using a personal identification number or an unlock pattern, due to its convenience. It has thus become a tempting target for attackers using a presentation attack. Traditional presentation attacks use facial images or videos of the victim. Previous work has proven the existence of master faces, i.e., faces that match multiple enrolled templates in face recognition systems, and their existence extends the ability of presentation attacks. In this paper, we perform an extensive study on latent variable evolution (LVE), a method commonly used to generate master faces. We run an LVE algorithm for various scenarios and with more than one database and/or face recognition system to study the properties of the master faces and to understand in which conditions strong master faces could be generated. Moreover, through analysis, we hypothesize that master faces come from some dense areas in the embedding spaces of the face recognition systems. Last but not least, simulated presentation attacks using generated master faces generally preserve the false-matching ability of their original digital forms, thus demonstrating that the existence of master faces poses an actual threat.
Submitted 7 September, 2021;
originally announced September 2021.
-
Angle Estimation for Terahertz Ultra-Massive MIMO-Based Space-to-Air Communications
Authors:
Anwen Liao,
Zhen Gao,
Yang Yang,
Ha H. Nguyen,
Hua Wang,
Hao Yin
Abstract:
This paper investigates terahertz (THz) ultra-massive (UM)-MIMO-based angle estimation for space-to-air communications, which can solve the performance degradation problem caused by the dual delay-beam squint effects of terahertz UM-MIMO channels. Specifically, we first design a grouping true-time delay unit module that can significantly mitigate the impact of delay-beam squint effects to establish the space-to-air THz link. Based on the subarray selection scheme, the UM hybrid array can be equivalently considered as a low-dimensional fully-digital array, and then the fine estimates of azimuth/elevation angles at both the UAVs and the satellite can be separately acquired using the proposed prior-aided iterative angle estimation algorithm. The simulation results, which are close to the Cramér-Rao lower bounds, verify the effectiveness of our solution.
Submitted 2 August, 2021;
originally announced August 2021.
-
UAV Trajectory Planning in Wireless Sensor Networks for Energy Consumption Minimization by Deep Reinforcement Learning
Authors:
Botao Zhu,
Ebrahim Bedeer,
Ha H. Nguyen,
Robert Barton,
Jerome Henry
Abstract:
Unmanned aerial vehicles (UAVs) have emerged as a promising candidate solution for data collection in large-scale wireless sensor networks (WSNs). In this paper, we investigate a UAV-aided WSN, where cluster heads (CHs) receive data from their member nodes, and a UAV is dispatched to collect data from CHs along the planned trajectory. We aim to minimize the total energy consumption of the UAV-WSN system in a complete round of data collection. Toward this end, we formulate the energy consumption minimization problem as a constrained combinatorial optimization problem by jointly selecting CHs from nodes within clusters and planning the UAV's visiting order to the selected CHs. The formulated energy consumption minimization problem is NP-hard, and hence, hard to solve optimally. In order to tackle this challenge, we propose a novel deep reinforcement learning (DRL) technique, pointer network-A* (Ptr-A*), which can efficiently learn from experience the UAV trajectory policy for minimizing the energy consumption. The UAV's start point and the WSN with a set of pre-determined clusters are fed into the Ptr-A*, and the Ptr-A* outputs a group of CHs and the visiting order to these CHs, i.e., the UAV's trajectory. For faster training, the parameters of the Ptr-A* are trained on small-scale cluster problem instances by using the actor-critic algorithm in an unsupervised manner. At inference, three search strategies are also proposed to improve the quality of solutions. Simulation results show that the models trained on 20-cluster and 40-cluster instances have a good generalization ability to solve the UAV's trajectory planning problem in WSNs with different numbers of clusters, without the need to retrain the models. Furthermore, the results show that our proposed DRL algorithm outperforms two baseline techniques.
Submitted 31 July, 2021;
originally announced August 2021.