ESGNN: Towards Equivariant Scene Graph Neural Network for 3D Scene Understanding

Authors: Quang P. M. Pham, Khoi T. N. Nguyen, Lan C. Ngo, Truong Do, Truong Son Hy

Abstract: Scene graphs have been proven to be useful for various scene understanding tasks due to their compact and explicit nature. However, existing approaches often neglect the importance of maintaining the symmetry-preserving property when generating scene graphs from 3D point clouds. This oversight can diminish the accuracy and robustness of the resulting scene graphs, especially when handling noisy, multi-view 3D data. This work, to the best of our knowledge, is the first to implement an Equivariant Graph Neural Network in semantic scene graph generation from 3D point clouds for scene understanding. Our proposed method, ESGNN, outperforms existing state-of-the-art approaches, demonstrating a significant improvement in scene estimation with faster convergence. ESGNN demands low computational resources and is easy to implement from available frameworks, paving the way for real-time applications such as robotics and computer vision.

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2406.03820 [pdf, other]

A Survey on Intelligent Internet of Things: Applications, Security, Privacy, and Future Directions

Authors: Ons Aouedi, Thai-Hoc Vu, Alessio Sacco, Dinh C. Nguyen, Kandaraj Piamrat, Guido Marchetto, Quoc-Viet Pham

Abstract: The rapid advances in the Internet of Things (IoT) have promoted a revolution in communication technology and offered various customer services. Artificial intelligence (AI) techniques have been exploited to facilitate IoT operations and maximize their potential in modern application scenarios. In particular, the convergence of IoT and AI has led to a new networking paradigm called Intelligent IoT (IIoT), which has the potential to significantly transform businesses and industrial domains. This paper presents a comprehensive survey of IIoT by investigating its significant applications in mobile networks, as well as its associated security and privacy issues. Specifically, we explore and discuss the roles of IIoT in a wide range of key application domains, from smart healthcare and smart cities to smart transportation and smart industries. Through such extensive discussions, we investigate important security issues in IIoT networks, where network attacks, confidentiality, integrity, and intrusion are analyzed, along with a discussion of potential countermeasures. Privacy issues in IIoT networks were also surveyed and discussed, including data, location, and model privacy leakage. Finally, we outline several key challenges and highlight potential research directions in this important area.

Submitted 21 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

Comments: This work has been accepted by IEEE Communications Surveys & Tutorials

arXiv:2405.20024 [pdf, other]

Applications of Generative AI (GAI) for Mobile and Wireless Networking: A Survey

Authors: Thai-Hoc Vu, Senthil Kumar Jagatheesaperumal, Minh-Duong Nguyen, Nguyen Van Huynh, Sunghwan Kim, Quoc-Viet Pham

Abstract: The success of Artificial Intelligence (AI) in multiple disciplines and vertical domains in recent years has promoted the evolution of mobile networking and the future Internet toward an AI-integrated Internet-of-Things (IoT) era. Nevertheless, most AI techniques rely on data generated by physical devices (e.g., mobile devices and network nodes) or specific applications (e.g., fitness trackers and mobile gaming). To circumvent this limitation, Generative AI (GAI), a.k.a. AI-generated content (AIGC), has emerged as a powerful AI paradigm, thanks to its ability to efficiently learn complex data distributions and generate synthetic data to represent the original data in various forms. This impressive feature is projected to transform the management of mobile networking and diversify the current services and applications provided. On this basis, this work presents a concise tutorial on the role of GAIs in mobile and wireless networking. In particular, this survey first provides the fundamentals of GAI and representative GAI models, serving as an essential preliminary to the understanding of the applications of GAI in mobile and wireless networking. Then, this work provides a comprehensive review of state-of-the-art studies and GAI applications in network management, wireless security, semantic communication, and lessons learned from the open literature. Finally, this work summarizes the current research on GAI for mobile and wireless networking by outlining important challenges that need to be resolved to facilitate the development and applicability of GAI in this cutting-edge area.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.17002 [pdf, other]

UIT-DarkCow team at ImageCLEFmedical Caption 2024: Diagnostic Captioning for Radiology Images Efficiency with Transformer Models

Authors: Quan Van Nguyen, Huy Quang Pham, Dan Quang Tran, Thang Kien-Bao Nguyen, Nhat-Hao Nguyen-Dang, Bao-Thien Nguyen-Tat

Abstract: Purpose: This study focuses on the development of automated text generation from radiology images, termed diagnostic captioning, to assist medical professionals in reducing clinical errors and improving productivity. The aim is to provide tools that enhance report quality and efficiency, which can significantly impact both clinical practice and deep learning research in the biomedical field. Methods: In our participation in the ImageCLEFmedical2024 Caption evaluation campaign, we explored caption prediction tasks using advanced Transformer-based models. We developed methods incorporating Transformer encoder-decoder and Query Transformer architectures. These models were trained and evaluated to generate diagnostic captions from radiology images. Results: Experimental evaluations demonstrated the effectiveness of our models, with the VisionDiagnostor-BioBART model achieving the highest BERTScore of 0.6267. This performance contributed to our team, DarkCow, achieving third place on the leaderboard. Conclusion: Our diagnostic captioning models show great promise in aiding medical professionals by generating high-quality reports efficiently. This approach can facilitate better data processing and performance optimization in medical imaging departments, ultimately benefiting healthcare delivery.

Submitted 27 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.03206 [pdf, other]

Vietnamese AI Generated Text Detection

Authors: Quang-Dan Tran, Van-Quan Nguyen, Quang-Huy Pham, K. B. Thang Nguyen, Trong-Hop Do

Abstract: In recent years, Large Language Models (LLMs) have become integrated into our daily lives, serving as invaluable assistants in completing tasks. Because LLMs are widely embraced by users, their abuse is inevitable, particularly in using them to generate text content for various purposes, leading to difficulties in distinguishing between text generated by LLMs and that written by humans. In this study, we present a dataset named ViDetect, comprising 6,800 samples of Vietnamese essays, with 3,400 samples authored by humans and the remainder generated by LLMs, serving the purpose of detecting text generated by AI. We conducted evaluations using state-of-the-art methods, including ViT5, BartPho, PhoBERT, mDeberta V3, and mBERT. These results not only contribute to the growing body of research on detecting text generated by AI but also demonstrate the adaptability and effectiveness of different methods in the Vietnamese language context. This research lays the foundation for future advancements in AI-generated text detection and provides valuable insights for researchers in the field of natural language processing.

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2404.18397 [pdf, other]

ViOCRVQA: Novel Benchmark Dataset and Vision Reader for Visual Question Answering by Understanding Vietnamese Text in Images

Authors: Huy Quang Pham, Thang Kien-Bao Nguyen, Quan Van Nguyen, Dan Quang Tran, Nghia Hieu Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

Abstract: Optical Character Recognition - Visual Question Answering (OCR-VQA) is the task of answering questions about text contained in images, and it has developed significantly for the English language in recent years. However, there are limited studies of this task in low-resource languages such as Vietnamese. To this end, we introduce a novel dataset, ViOCRVQA (Vietnamese Optical Character Recognition - Visual Question Answering dataset), consisting of 28,000+ images and 120,000+ question-answer pairs. In this dataset, all the images contain text, and the questions ask about information relevant to the text in the images. We deploy ideas from state-of-the-art methods proposed for English to conduct experiments on our dataset, revealing the challenges and difficulties inherent in a Vietnamese dataset. Furthermore, we introduce a novel approach, called VisionReader, which achieved 0.4116 in EM and 0.6990 in the F1-score on the test set. Through the results, we found that the OCR system plays a very important role in VQA models on the ViOCRVQA dataset. In addition, the objects in the image also play a role in improving model performance. Our dataset is openly available at https://github.com/qhnhynmm/ViOCRVQA.git for further research on the OCR-VQA task in Vietnamese.

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.10652 [pdf, other]

ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images

Authors: Quan Van Nguyen, Dan Quang Tran, Huy Quang Pham, Thang Kien-Bao Nguyen, Nghia Hieu Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

Abstract: Visual Question Answering (VQA) is a complicated task that requires the capability of simultaneously processing natural language and images. Early research on this task focused on methods to help machines understand objects and scene contexts in images. However, text appearing in the image, which carries explicit information about the full content of the image, was not considered. Along with the continuous development of the AI era, there have been many studies worldwide on the reading comprehension ability of VQA models. In Vietnam, a developing country where conditions are still limited, this task remains open. Therefore, we introduce the first large-scale dataset in Vietnamese specializing in the ability to understand text appearing in images, which we call ViTextVQA (Vietnamese Text-based Visual Question Answering dataset). It contains over 16,000 images and over 50,000 questions with answers. Through meticulous experiments with various state-of-the-art models, we uncover the significance of the order in which tokens in OCR text are processed and selected to formulate answers. This finding helped us significantly improve the performance of the baseline models on the ViTextVQA dataset. Our dataset is available at https://github.com/minhquan6203/ViTextVQA-Dataset for research purposes.

Submitted 16 April, 2024; originally announced April 2024.

Comments: Preprint submitted to IJCV

arXiv:2404.06257 [pdf, other]

DDPG-E2E: A Novel Policy Gradient Approach for End-to-End Communication Systems

Authors: Bolun Zhang, Nguyen Van Huynh, Dinh Thai Hoang, Diep N. Nguyen, Quoc-Viet Pham

Abstract: The End-to-end (E2E) learning-based approach has great potential to reshape the existing communication systems by replacing the transceivers with deep neural networks. To this end, the E2E learning approach needs to assume the availability of prior channel information to mathematically formulate a differentiable channel layer for the backpropagation (BP) of the error gradients, thereby jointly optimizing the transmitter and the receiver. However, accurate and instantaneous channel state information is hardly obtained in practical wireless communication scenarios. Moreover, the existing E2E learning-based solutions exhibit limited performance in data transmissions with large block lengths. In this article, these practical issues are addressed by our proposed deep deterministic policy gradient-based E2E communication system. In particular, the proposed solution utilizes a reward feedback mechanism to train both the transmitter and the receiver, which alleviates the information loss of error gradients during BP. In addition, a convolutional neural network (CNN)-based architecture is developed to mitigate the curse of dimensionality problem when transmitting messages with large block lengths. Extensive simulations then demonstrate that our proposed solution can not only jointly train the transmitter and the receiver simultaneously without requiring the prior channel knowledge but also can obtain significant performance improvement on block error rate compared to state-of-the-art solutions.

Submitted 9 April, 2024; originally announced April 2024.

arXiv:2404.05641 [pdf, other]

3D-COCO: extension of MS-COCO dataset for image detection and 3D reconstruction modules

Authors: Maxence Bideaux, Alice Phe, Mohamed Chaouch, Bertrand Luvison, Quoc-Cuong Pham

Abstract: We introduce 3D-COCO, an extension of the original MS-COCO dataset providing 3D models and 2D-3D alignment annotations. 3D-COCO was designed to achieve computer vision tasks such as 3D reconstruction or image detection configurable with textual, 2D image, and 3D CAD model queries. We complete the existing MS-COCO dataset with 28K 3D models collected on ShapeNet and Objaverse. By using an IoU-based method, we match each MS-COCO annotation with the best 3D models to provide a 2D-3D alignment. The open-source nature of 3D-COCO is a first that should pave the way for new research on 3D-related topics. The dataset and its source code are available at https://kalisteo.cea.fr/index.php/coco3d-object-detection-and-reconstruction/

Submitted 16 July, 2024; v1 submitted 8 April, 2024; originally announced April 2024.
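
The IoU-based matching mentioned in the 3D-COCO abstract above can be illustrated with a small sketch. This is not the authors' pipeline: it assumes each candidate 3D model has already been rendered to a 2D silhouette in the image frame, and it represents masks as sets of pixel coordinates. The names `iou`, `best_match`, `rect`, and the model IDs are hypothetical.

```python
from typing import Dict, Set, Tuple

Pixel = Tuple[int, int]


def iou(mask_a: Set[Pixel], mask_b: Set[Pixel]) -> float:
    """Intersection over union of two binary masks given as pixel sets."""
    union = len(mask_a | mask_b)
    return len(mask_a & mask_b) / union if union else 0.0


def best_match(annotation: Set[Pixel],
               renders: Dict[str, Set[Pixel]]) -> Tuple[str, float]:
    """Return (model_id, IoU) for the silhouette that best overlaps the annotation."""
    scored = ((model_id, iou(annotation, mask)) for model_id, mask in renders.items())
    return max(scored, key=lambda pair: pair[1])


def rect(x0: int, y0: int, x1: int, y1: int) -> Set[Pixel]:
    """Helper for the toy example: filled axis-aligned rectangle [x0, x1) x [y0, y1)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


# Toy data (assumed, not from the dataset): one annotation, two rendered candidates.
annotation = rect(0, 0, 4, 4)
renders = {"model_a": rect(0, 0, 4, 2), "model_b": rect(1, 0, 5, 4)}
model_id, score = best_match(annotation, renders)
print(model_id, round(score, 3))
```

Here `model_b` wins because its silhouette shares 12 of 20 union pixels with the annotation, against 8 of 16 for `model_a`.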

arXiv:2403.19102 [pdf, other]

Automatic Fingerpad Customization for Precise and Stable Grasping of 3D-Print Parts

Authors: Joyce Xin-Yan Lim, Quang-Cuong Pham

Abstract: The rise in additive manufacturing comes with unique opportunities and challenges. Massive part customization and rapid design changes are made possible with additive manufacturing; however, manufacturing industries that desire the implementation of robotics automation to improve production efficiency could face challenges in the gripper design and grasp planning due to highly complex geometrical shapes resulting from massive part customization. Yet, current gripper designs for such objects are often manual and rely on ad-hoc design intuition. This would be limiting as such grippers would lack the ability to grasp different objects or grasp points, which is important for practical implementations. Hence, we introduce a fast, end-to-end approach to customize rigid gripper fingerpads that could achieve precise and stable grasping for different objects at multiple grasp points. Our approach relies on two key components: (i) a method based on set Boolean operations, e.g. intersections, subtractions, and unions, to extract object features and synthesize gripper surfaces that conform to different local shapes to form caging grasps; (ii) a method to evaluate the grasp quality of synthesized grippers. We experimentally demonstrate the validity of our approach by synthesizing fingerpads that, once mounted on a physical robot gripper, are able to grasp different objects at multiple grasp points, all with tightly constrained grasps.

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2402.12035 [pdf, other]

Class-incremental Learning for Time Series: Benchmark and Evaluation

Authors: Zhongzheng Qiao, Quang Pham, Zhen Cao, Hoang H Le, P. N. Suganthan, Xudong Jiang, Ramasamy Savitha

Abstract: Real-world environments are inherently non-stationary, frequently introducing new classes over time. This is especially common in time series classification, such as the emergence of new disease classification in healthcare or the addition of new activities in human activity recognition. In such cases, a learning system is required to assimilate novel classes effectively while avoiding catastrophic forgetting of the old ones, which gives rise to the Class-incremental Learning (CIL) problem. However, despite the encouraging progress in the image and language domains, CIL for time series data remains relatively understudied. Existing studies suffer from inconsistent experimental designs, necessitating a comprehensive evaluation and benchmarking of methods across a wide range of datasets. To this end, we first present an overview of the Time Series Class-incremental Learning (TSCIL) problem, highlight its unique challenges, and cover the advanced methodologies. Further, based on standardized settings, we develop a unified experimental framework that supports the rapid development of new algorithms, easy integration of new datasets, and standardization of the evaluation process. Using this framework, we conduct a comprehensive evaluation of various generic and time-series-specific CIL methods in both standard and privacy-sensitive scenarios. Our extensive experiments not only provide a standard baseline to support future research but also shed light on the impact of various design factors such as normalization layers or memory budget thresholds. Code is available at https://github.com/zqiao11/TSCIL.

Submitted 19 February, 2024; originally announced February 2024.

Comments: Currently under review for KDD 2024 (ADS track)

arXiv:2402.02526 [pdf, other]

CompeteSMoE -- Effective Training of Sparse Mixture of Experts via Competition

Authors: Quang Pham, Giang Do, Huy Nguyen, TrungTin Nguyen, Chenghao Liu, Mina Sartipi, Binh T. Nguyen, Savitha Ramasamy, Xiaoli Li, Steven Hoi, Nhat Ho

Abstract: Sparse mixture of experts (SMoE) offers an appealing solution to scale up the model complexity beyond the means of increasing the network's depth or width. However, effective training of SMoE has proven to be challenging due to the representation collapse issue, which causes parameter redundancy and limited representation potentials. In this work, we propose a competition mechanism to address this fundamental challenge of representation collapse. By routing inputs only to experts with the highest neural response, we show that, under mild assumptions, competition enjoys the same convergence rate as the optimal estimator. We further propose CompeteSMoE, an effective and efficient algorithm to train large language models by deploying a simple router that predicts the competition outcomes. Consequently, CompeteSMoE enjoys strong performance gains from the competition routing policy while having low computation overheads. Our extensive empirical evaluations on two transformer architectures and a wide range of tasks demonstrate the efficacy, robustness, and scalability of CompeteSMoE compared to state-of-the-art SMoE strategies.

Submitted 4 February, 2024; originally announced February 2024.
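
The competition idea in the CompeteSMoE abstract above, routing an input only to the experts with the highest neural response, can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: the "neural response" is taken to be the L2 norm of each expert's output, winners are combined with weights proportional to their responses, and the learned router that predicts competition outcomes is omitted. All names (`compete`, `scale`, `l2_norm`) are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Sequence[float]], Vector]


def l2_norm(v: Sequence[float]) -> float:
    """Euclidean norm, used here as an assumed proxy for neural response."""
    return math.sqrt(sum(x * x for x in v))


def compete(x: Sequence[float], experts: Sequence[Expert],
            k: int = 1) -> Tuple[List[int], Vector]:
    """Run every expert on x, keep the k strongest responders, and mix their outputs."""
    outputs = [expert(x) for expert in experts]
    responses = [l2_norm(out) for out in outputs]
    # Indices of the k experts with the largest response, strongest first.
    winners = sorted(range(len(experts)), key=lambda i: responses[i], reverse=True)[:k]
    total = sum(responses[i] for i in winners)
    mixed = [0.0] * len(outputs[0])
    for i in winners:
        # Assumed weighting: proportional to response; uniform if all responses are zero.
        weight = responses[i] / total if total else 1.0 / len(winners)
        for d, value in enumerate(outputs[i]):
            mixed[d] += weight * value
    return winners, mixed


def scale(factor: float) -> Expert:
    """Toy expert that multiplies its input by a constant."""
    return lambda x: [factor * value for value in x]


experts = [scale(0.5), scale(2.0), scale(1.0)]
winners, mixed = compete([3.0, 4.0], experts, k=2)
print(winners, mixed)
```

Note that this sketch evaluates every expert for every input, which is exactly the cost a predictive router is meant to avoid at scale.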

arXiv:2312.17650 [pdf, other]

Grasping, Part Identification, and Pose Refinement in One Shot with a Tactile Gripper

Authors: Joyce Xin-Yan Lim, Quang-Cuong Pham

Abstract: The rise in additive manufacturing comes with unique opportunities and challenges. Rapid changes to part design and massive part customization distinctive to 3D-Print (3DP) can be easily achieved. Customized parts that are unique, yet exhibit similar features such as dental moulds, shoe insoles, or engine vanes could be industrially manufactured with 3DP. However, the opportunity for massive part customization comes with unique challenges for the existing production paradigm of robotics applications, as the current robotics paradigm for part identification and pose refinement is repetitive, where data-driven and object-dependent approaches are often used. Thus, a bottleneck exists in robotics applications for 3DP parts where massive customization is involved, as it is difficult for feature-based deep learning approaches to distinguish between similar parts such as shoe insoles belonging to different people. As such, we propose a method that augments patterns on 3DP parts so that grasping, part identification, and pose refinement can be executed in one shot with a tactile gripper. We also experimentally evaluate our approach from three perspectives, including real insertion tasks that mimic robotic sorting and packing, and achieved excellent classification results, a high insertion success rate of 95%, and a sub-millimeter pose refinement accuracy.

Submitted 29 December, 2023; originally announced December 2023.

Comments: 6 pages, 5 figures

arXiv:2312.13975 [pdf, other]

A Joint Communication and Computation Design for Semantic Wireless Communication with Probability Graph

Authors: Zhouxiang Zhao, Zhaohui Yang, Xu Gan, Quoc-Viet Pham, Chongwen Huang, Wei Xu, Zhaoyang Zhang

Abstract: In this paper, we delve into the challenge of optimizing joint communication and computation for semantic communication over wireless networks using a probability graph framework. In the considered model, the base station (BS) extracts the small-sized compressed semantic information by removing redundant messages based on the stored knowledge base. Specifically, the knowledge base is encapsulated in a probability graph that captures statistical relations. At the user side, the compressed information is accurately deduced using the same probability graph employed by the BS. While this approach introduces an additional computational overhead for semantic information extraction, it significantly curtails communication resource consumption by transmitting concise data. We derive both communication and computation cost models based on the inference process of the probability graph. Building upon these models, we introduce a joint communication and computation resource allocation problem aimed at minimizing the overall energy consumption of the network, while accounting for latency, power, and semantic constraints. To address this problem, we obtain a closed-form solution for transmission power under a fixed semantic compression ratio. Subsequently, we propose an efficient linear search-based algorithm to attain the optimal solution for the considered problem with low computational complexity. Simulation results underscore the effectiveness of our proposed system, showcasing notable improvements compared to conventional non-semantic schemes.

Submitted 22 December, 2023; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: arXiv admin note: substantial text overlap with arXiv:2310.00015

arXiv:2312.07035 [pdf, other]

HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts

Authors: Giang Do, Khiem Le, Quang Pham, TrungTin Nguyen, Thanh-Nam Doan, Binh T. Nguyen, Chenghao Liu, Savitha Ramasamy, Xiaoli Li, Steven Hoi

Abstract: By routing input tokens to only a few split experts, Sparse Mixture-of-Experts has enabled efficient training of large language models. Recent findings suggest that fixing the routers can achieve competitive performance by alleviating the collapsing problem, where all experts eventually learn similar representations. However, this strategy has two key limitations: (i) the policy derived from random routers might be sub-optimal, and (ii) it requires extensive resources during training and evaluation, leading to limited efficiency gains. This work introduces HyperRouter, which dynamically generates the router's parameters through a fixed hypernetwork and trainable embeddings to achieve a balance between training the routers and freezing them to learn an improved routing policy. Extensive experiments across a wide range of tasks demonstrate the superior performance and efficiency gains of HyperRouter compared to existing routing methods. Our implementation is publicly available at https://github.com/giangdip2410/HyperRouter.

Submitted 12 December, 2023; originally announced December 2023.

arXiv:2311.14762 [pdf, other]

The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024

Authors: Benjamin Kiefer, Lojze Žust, Matej Kristan, Janez Perš, Matija Teršek, Arnold Wiliem, Martin Messmer, Cheng-Yen Yang, Hsiang-Wei Huang, Zhongyu Jiang, Heng-Cheng Kuo, Jie Mei, Jenq-Neng Hwang, Daniel Stadler, Lars Sommer, Kaer Huang, Aiguo Zheng, Weitu Chong, Kanokphan Lertniphonphan, Jun Xie, Feng Chen, Jian Li, Zhepeng Wang, Luca Zedda, Andrea Loddo, et al. (24 additional authors not shown)

Abstract: The 2nd Workshop on Maritime Computer Vision (MaCVi) 2024 addresses maritime computer vision for Unmanned Aerial Vehicles (UAV) and Unmanned Surface Vehicles (USV). Three challenge categories are considered: (i) UAV-based Maritime Object Tracking with Re-identification, (ii) USV-based Maritime Obstacle Segmentation and Detection, (iii) USV-based Maritime Boat Tracking. The USV-based Maritime Obstacle Segmentation and Detection features three sub-challenges, including a new embedded challenge addressing efficient inference on real-world embedded devices. This report offers a comprehensive overview of the findings from the challenges. We provide both statistical and qualitative analyses, evaluating trends from over 195 submissions. All datasets, evaluation code, and the leaderboard are available to the public at https://macvi.org/workshop/macvi24.

Submitted 23 November, 2023; originally announced November 2023.

Comments: Part of 2nd Workshop on Maritime Computer Vision (MaCVi) 2024 IEEE Xplore submission as part of WACV 2024

arXiv:2311.11096 [pdf, other]

On the Out of Distribution Robustness of Foundation Models in Medical Image Segmentation

Authors: Duy Minh Ho Nguyen, Tan Ngoc Pham, Nghiem Tuong Diep, Nghi Quoc Phan, Quang Pham, Vinh Tong, Binh T. Nguyen, Ngan Hoang Le, Nhat Ho, Pengtao Xie, Daniel Sonntag, Mathias Niepert

Abstract: Constructing a robust model that can effectively generalize to test samples under distribution shifts remains a significant challenge in the field of medical imaging. The foundational models for vision and language, pre-trained on extensive sets of natural image and text data, have emerged as a promising approach. They showcase impressive learning abilities across different tasks with the need for only a limited amount of annotated samples. While numerous techniques have focused on developing better fine-tuning strategies to adapt these models for specific domains, we instead examine their robustness to domain shifts in the medical image segmentation task. To this end, we compare the generalization performance to unseen domains of various pre-trained models after being fine-tuned on the same in-distribution dataset and show that foundation-based models enjoy better robustness than other architectures. From here, we further developed a new Bayesian uncertainty estimation for frozen models and used it as an indicator to characterize the model's performance on out-of-distribution (OOD) data, proving particularly beneficial for real-world applications. Our experiments not only reveal the limitations of current indicators like accuracy on the line or agreement on the line commonly used in natural image applications but also emphasize the promise of the introduced Bayesian uncertainty. Specifically, lower uncertainty predictions usually correspond to higher out-of-distribution (OOD) performance.

Submitted 18 November, 2023; originally announced November 2023.

Comments: Advances in Neural Information Processing Systems (NeurIPS) 2023, Workshop on robustness of zero/few-shot learning in foundation models

arXiv:2311.03669 [pdf, other]

Stable Modular Control via Contraction Theory for Reinforcement Learning

Authors: Bing Song, Jean-Jacques Slotine, Quang-Cuong Pham

Abstract: We propose a novel way to integrate control techniques with reinforcement learning (RL) for stability, robustness, and generalization: leveraging contraction theory to realize modularity in neural control, which ensures that combining stable subsystems can automatically preserve the stability. We realize such modularity via signal composition and dynamic decomposition. Signal composition creates the latent space, within which RL applies to maximizing rewards. Dynamic decomposition is realized by coordinate transformation that creates an auxiliary space, within which the latent signals are coupled in the way that their combination can preserve stability provided each signal, that is, each subsystem, has stable self-feedbacks. Leveraging modularity, the nonlinear stability problem is deconstructed into algebraically solvable ones, the stability of the subsystems in the auxiliary space, yielding linear constraints on the input gradients of control networks that can be as simple as switching the signs of network weights. This minimally invasive method for stability allows arguably easy integration into the modular neural architectures in machine learning, like hierarchical RL, and improves their performance. We demonstrate in simulation the necessity and the effectiveness of our method: the necessity for robustness and generalization, and the effectiveness in improving hierarchical RL for manipulation learning.

Submitted 6 November, 2023; originally announced November 2023.

arXiv:2311.02633 [pdf, other]

The Background Also Matters: Background-Aware Motion-Guided Objects Discovery

Authors: Sandra Kara, Hejer Ammar, Florian Chabot, Quoc-Cuong Pham

Abstract: Recent works have shown that objects discovery can largely benefit from the inherent motion information in video data. However, these methods lack proper background processing, resulting in an over-segmentation of the non-object regions into random segments. This is a critical limitation given the unsupervised setting, where object segments and noise are not distinguishable. To address this limitation, we propose BMOD, a Background-aware Motion-guided Objects Discovery method. Concretely, we leverage masks of moving objects extracted from optical flow and design a learning mechanism to extend them to the true foreground composed of both moving and static objects. The background, a complementary concept of the learned foreground class, is then isolated in the object discovery process. This enables joint learning of the objects discovery task and the object/non-object separation. The conducted experiments on synthetic and real-world datasets show that integrating our background handling with various cutting-edge methods brings a considerable improvement each time. Specifically, we improve the objects discovery performance by a large margin, while establishing a strong baseline for object/non-object separation.

Submitted 5 November, 2023; originally announced November 2023.

Comments: accepted at WACV2024 (IEEE/CVF Winter conference on Applications of Computer Vision)

arXiv:2311.02415 [pdf, other]

Time-Division Based Integrated Sensing, Communication, and Computing in Integrated Satellite-Terrestrial Networks

Authors: Xiangming Zhu, Hua Wang, Zhaohui Yang, Quoc-Viet Pham

Abstract: In this paper, we investigate a time-division based framework for integrated sensing, communication, and computing in integrated satellite-terrestrial networks. We consider a scenario where Internet-of-Things devices on the ground operate with sensing and communication in a time-division manner, and can process the sensing results locally, at the edge, or in the cloud via the satellite communication link. Based on the proposed framework, we formulate a multi-dimensional optimization problem to maximize the utility performance of sensing, communication, and computing abilities. After decomposing the original optimization problem into two subproblems, we first derive the closed-form solution of the optimal task partitioning strategy for terrestrial users and satellite users. Then, we develop the joint subframe allocation and task partitioning strategy to optimize the overall performance, by means of which the Pareto optimal solutions can be obtained along the Pareto frontier. Extensive simulations are provided to demonstrate the effectiveness of the proposed strategy, which is 10% to 60% superior compared with the benchmarks. Also, the trade-off between the multidimensional resource and multi-functional performance is analyzed from the perspective of network design.

Submitted 4 November, 2023; originally announced November 2023.

arXiv:2310.10549 [pdf, other]

Applications of Distributed Machine Learning for the Internet-of-Things: A Comprehensive Survey

Authors: Mai Le, Thien Huynh-The, Tan Do-Duy, Thai-Hoc Vu, Won-Joo Hwang, Quoc-Viet Pham

Abstract: The emergence of new services and applications in emerging wireless networks (e.g., beyond 5G and 6G) has shown a growing demand for the usage of artificial intelligence (AI) in the Internet of Things (IoT). However, the proliferation of massive IoT connections and the availability of computing resources distributed across future IoT systems have strongly demanded the development of distributed AI for better IoT services and applications. Therefore, existing AI-enabled IoT systems can be enhanced by implementing distributed machine learning (aka distributed learning) approaches. This work aims to provide a comprehensive survey on distributed learning for IoT services and applications in emerging networks. In particular, we first provide a background of machine learning and present a preliminary to typical distributed learning approaches, such as federated learning, multi-agent reinforcement learning, and distributed inference. Then, we provide an extensive review of distributed learning for critical IoT services (e.g., data sharing and computation offloading, localization, mobile crowdsensing, and security and privacy) and IoT applications (e.g., smart healthcare, smart grid, autonomous vehicle, aerial IoT networks, and smart industry). From the reviewed literature, we also present critical challenges of distributed learning for IoT and propose several promising solutions and research directions in this emerging area.

Submitted 16 October, 2023; originally announced October 2023.

arXiv:2310.07497 [pdf, other]

Sample-Driven Federated Learning for Energy-Efficient and Real-Time IoT Sensing

Authors: Minh Ngoc Luu, Minh-Duong Nguyen, Ebrahim Bedeer, Van Duc Nguyen, Dinh Thai Hoang, Diep N. Nguyen, Quoc-Viet Pham

Abstract: In the domain of Federated Learning (FL) systems, recent cutting-edge methods heavily rely on ideal conditions convergence analysis. Specifically, these approaches assume that the training datasets on IoT devices possess similar attributes to the global data distribution. However, this approach fails to capture the full spectrum of data characteristics in real-time sensing FL systems. In order to overcome this limitation, we suggest a new approach specifically designed for IoT networks with real-time sensing capabilities. Our approach takes into account the generalization gap due to the user's data sampling process. By effectively controlling this sampling process, we can mitigate the overfitting issue and improve overall accuracy. In particular, we first formulate an optimization problem that harnesses the sampling process to concurrently reduce overfitting while maximizing accuracy. In pursuit of this objective, our surrogate optimization problem is adept at handling energy efficiency while optimizing the accuracy with high generalization. To solve the optimization problem with high complexity, we introduce an online reinforcement learning algorithm, named Sample-driven Control for Federated Learning (SCFL) built on the Soft Actor-Critic (A2C) framework. This enables the agent to dynamically adapt and find the global optima even in changing environments. By leveraging the capabilities of SCFL, our system offers a promising solution for resource allocation in FL systems with real-time sensing capabilities.

Submitted 11 October, 2023; originally announced October 2023.

Comments: 17 pages, 5 figures

MSC Class: 68-00 ACM Class: I.2.11

arXiv:2310.00911 [pdf, other]

Dynamic Manipulation of a Deformable Linear Object: Simulation and Learning

Authors: Qi Jing Chen, Timothy Bretl, Nghia Vuong, Quang-Cuong Pham

Abstract: We show that it is possible to learn an open-loop policy in simulation for the dynamic manipulation of a deformable linear object (DLO) -- e.g., a rope, wire, or cable -- that can be executed by a real robot without additional training. Our method is enabled by integrating an existing state-of-the-art DLO model (Discrete Elastic Rods) with MuJoCo, a robot simulator. We describe how this integration was done, check that validation results produced in simulation match what we expect from analysis of the physics, and apply policy optimization to train an open-loop policy from data collected only in simulation that uses a robot arm to fling a wire precisely between two obstacles. This policy achieves a success rate of 76.7% when executed by a real robot in hardware experiments without additional training on the real task.

Submitted 2 October, 2023; originally announced October 2023.

Comments: 7 pages, 8 figures

arXiv:2310.00015 [pdf, other]

Semantic Communication with Probability Graph: A Joint Communication and Computation Design

Authors: Zhouxiang Zhao, Zhaohui Yang, Quoc-Viet Pham, Qianqian Yang, Zhaoyang Zhang

Abstract: In this paper, we present a probability graph-based semantic information compression system for scenarios where the base station (BS) and the user share common background knowledge. We employ probability graphs to represent the shared knowledge between the communicating parties. During the transmission of specific text data, the BS first extracts semantic information from the text, which is represented by a knowledge graph. Subsequently, the BS omits certain relational information based on the shared probability graph to reduce the data size. Upon receiving the compressed semantic data, the user can automatically restore missing information using the shared probability graph and predefined rules. This approach brings additional computational resource consumption while effectively reducing communication resource consumption. Considering the limitations of wireless resources, we address the problem of joint communication and computation resource allocation design, aiming at minimizing the total communication and computation energy consumption of the network while adhering to latency, transmit power, and semantic constraints. Simulation results demonstrate the effectiveness of the proposed system.

Submitted 5 October, 2023; v1 submitted 16 September, 2023; originally announced October 2023.

arXiv:2309.16219 [pdf, other]

Sensorless Estimation of Contact Using Deep-Learning for Human-Robot Interaction

Authors: Shilin Shan, Quang-Cuong Pham

Abstract: Physical human-robot interaction has been an area of interest for decades. Collaborative tasks, such as joint compliance, demand high-quality joint torque sensing. While external torque sensors are reliable, they come with the drawbacks of being expensive and vulnerable to impacts. To address these issues, studies have been conducted to estimate external torques using only internal signals, such as joint states and current measurements. However, insufficient attention has been given to friction hysteresis approximation, which is crucial for tasks involving extensive dynamic to static state transitions. In this paper, we propose a deep-learning-based method that leverages a novel long-term memory scheme to achieve dynamics identification, accurately approximating the static hysteresis. We also introduce modifications to the well-known Residual Learning architecture, retaining high accuracy while reducing inference time. The robustness of the proposed method is illustrated through a joint compliance and task compliance experiment.

Submitted 5 March, 2024; v1 submitted 28 September, 2023; originally announced September 2023.

Comments: Final version accepted to ICRA 2024, 7 pages

arXiv:2309.14587 [pdf, other]

Joint Communication and Computation Framework for Goal-Oriented Semantic Communication with Distortion Rate Resilience

Authors: Minh-Duong Nguyen, Quang-Vinh Do, Zhaohui Yang, Quoc-Viet Pham, Won-Joo Hwang

Abstract: Recent research efforts on semantic communication have mostly considered accuracy as a main problem for optimizing goal-oriented communication systems. However, these approaches introduce a paradox: the accuracy of artificial intelligence (AI) tasks should naturally emerge through training rather than being dictated by network constraints. Acknowledging this dilemma, this work introduces an innovative approach that leverages the rate-distortion theory to analyze distortions induced by communication and semantic compression, thereby analyzing the learning process. Specifically, we examine the distribution shift between the original data and the distorted data, thus assessing its impact on the AI model's performance. Building upon this analysis, we can preemptively estimate the empirical accuracy of AI tasks, making the goal-oriented semantic communication problem feasible. To achieve this objective, we present the theoretical foundation of our approach, accompanied by simulations and experiments that demonstrate its effectiveness. The experimental results indicate that our proposed method enables accurate AI task performance while adhering to network constraints, establishing it as a valuable contribution to the field of signal processing. Furthermore, this work advances research in goal-oriented semantic communication and highlights the significance of data-driven approaches in optimizing the performance of intelligent systems.

Submitted 25 September, 2023; originally announced September 2023.

Comments: 15 pages; 11 figures, 2 tables

MSC Class: 68T05 ACM Class: F.1.3

arXiv:2309.14053 [pdf, other]

Revisiting LARS for Large Batch Training Generalization of Neural Networks

Authors: Khoi Do, Duong Nguyen, Hoa Nguyen, Long Tran-Thanh, Nguyen-Hoang Tran, Quoc-Viet Pham

Abstract: This paper explores Large Batch Training techniques using layer-wise adaptive scaling ratio (LARS) across diverse settings, uncovering insights. LARS algorithms with warm-up tend to be trapped in sharp minimizers early on due to redundant ratio scaling. Additionally, a fixed steep decline in the latter phase restricts deep neural networks from effectively navigating early-phase sharp minimizers. Building on these findings, we propose Time Varying LARS (TVLARS), a novel algorithm that replaces warm-up with a configurable sigmoid-like function for robust training in the initial phase. TVLARS promotes gradient exploration early on, surpassing sharp optimizers and gradually transitioning to LARS for robustness in later phases. Extensive experiments demonstrate that TVLARS consistently outperforms LARS and LAMB in most cases, with up to 2% improvement in classification scenarios. Notably, in all self-supervised learning cases, TVLARS dominates LARS and LAMB with performance improvements of up to 10%.

Submitted 15 February, 2024; v1 submitted 25 September, 2023; originally announced September 2023.

arXiv:2309.12668 [pdf, other]

UWA360CAM: A 360$^{\circ}$ 24/7 Real-Time Streaming Camera System for Underwater Applications

Authors: Quan-Dung Pham, Yipeng Zhu, Tan-Sang Ha, K. H. Long Nguyen, Binh-Son Hua, Sai-Kit Yeung

Abstract: An omnidirectional camera is a cost-effective and information-rich sensor highly suitable for many marine applications and the ocean scientific community, encompassing several domains such as augmented reality, mapping, motion estimation, visual surveillance, and simultaneous localization and mapping. However, designing and constructing such a high-quality 360$^{\circ}$ real-time streaming camera system for underwater applications is a challenging problem due to the technical complexity in several aspects including sensor resolution, wide field of view, power supply, optical design, system calibration, and overheating management. This paper presents a novel and comprehensive system that addresses the complexities associated with the design, construction, and implementation of a fully functional 360$^{\circ}$ real-time streaming camera system specifically tailored for underwater environments. Our proposed system, UWA360CAM, can stream video in real time, operate 24/7, and capture 360$^{\circ}$ underwater panorama images. Notably, our work is the pioneering effort in providing a detailed and replicable account of this system. The experiments provide a comprehensive analysis of our proposed system.

Submitted 30 September, 2023; v1 submitted 22 September, 2023; originally announced September 2023.

arXiv:2309.12543 [pdf, other]

Real-time Batched Distance Computation for Time-Optimal Safe Path Tracking

Authors: Shohei Fujii, Quang-Cuong Pham

Abstract: In human-robot collaboration, there has been a trade-off relationship between the speed of collaborative robots and the safety of human workers. In our previous paper, we introduced a time-optimal path tracking algorithm designed to maximize speed while ensuring safety for human workers. This algorithm runs in real-time and provides the safe and fastest control input for every cycle with respect to ISO standards. However, true optimality has not been achieved due to inaccurate distance computation resulting from conservative model simplification. To attain true optimality, we require a method that can compute distances (1) at many robot configurations to examine along a trajectory, (2) in real-time for online robot control, and (3) as precisely as possible for optimal control. In this paper, we propose a batched, fast and precise distance checking method based on precomputed link-local SDFs. Our method can check distances for 500 waypoints along a trajectory within less than 1 millisecond using a GPU at runtime, making it suited for time-critical robotic control. Additionally, a neural approximation has been proposed to accelerate preprocessing by a factor of 2. Finally, we experimentally demonstrate that our method can navigate a 6-DoF robot earlier than a geometric-primitives-based distance checker in a dynamic and collaborative environment.

Submitted 5 March, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

Comments: 7 pages. Accepted to ICRA2024

arXiv:2309.12251 [pdf, other]

Planning Optimal Trajectories for Mobile Manipulators under End-effector Trajectory Continuity Constraint

Authors: Quang-Nam Nguyen, Quang-Cuong Pham

Abstract: Mobile manipulators have been employed in many applications that are traditionally performed by either multiple fixed-base robots or a large robotic system. This capability is enabled by the mobility of the mobile base. However, the mobile base also brings redundancy to the system, which makes mobile manipulator motion planning more challenging. In this paper, we tackle the mobile manipulator motion planning problem under the end-effector trajectory continuity constraint in which the end-effector is required to traverse a continuous task-space trajectory (time-parametrized path), such as in mobile printing or spraying applications. Our method decouples the problem into: (1) planning an optimal base trajectory subject to geometric task constraints, end-effector trajectory continuity constraint, collision avoidance, and base velocity constraint; which ensures that (2) a manipulator trajectory is computed subsequently based on the obtained base trajectory. To validate our method, we propose a discrete optimal base trajectory planning algorithm to solve several mobile printing tasks in hardware experiment and simulations.

Submitted 6 March, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

Comments: Accepted for ICRA 2024

arXiv:2309.06006 [pdf, ps, other]

SoccerNet 2023 Challenges Results

Authors: Anthony Cioppa, Silvio Giancola, Vladimir Somers, Floriane Magera, Xin Zhou, Hassan Mkhallati, Adrien Deliège, Jan Held, Carlos Hinojosa, Amir M. Mansourian, Pierre Miralles, Olivier Barnich, Christophe De Vleeschouwer, Alexandre Alahi, Bernard Ghanem, Marc Van Droogenbroeck, Abdullah Kamal, Adrien Maglo, Albert Clapés, Amr Abdelaziz, Artur Xarles, Astrid Orcesi, Atom Scott, Bin Liu, Byoungkwon Lim, et al. (77 additional authors not shown)

Abstract: The SoccerNet 2023 challenges were the third annual video understanding challenges organized by the SoccerNet team. For this third edition, the challenges were composed of seven vision-based tasks split into three main themes. The first theme, broadcast video understanding, is composed of three high-level tasks related to describing events occurring in the video broadcasts: (1) action spotting, focusing on retrieving all timestamps related to global actions in soccer, (2) ball action spotting, focusing on retrieving all timestamps related to the soccer ball change of state, and (3) dense video captioning, focusing on describing the broadcast with natural language and anchored timestamps. The second theme, field understanding, relates to the single task of (4) camera calibration, focusing on retrieving the intrinsic and extrinsic camera parameters from images. The third and last theme, player understanding, is composed of three low-level tasks related to extracting information about the players: (5) re-identification, focusing on retrieving the same players across multiple views, (6) multiple object tracking, focusing on tracking players and the ball through unedited video streams, and (7) jersey number recognition, focusing on recognizing the jersey number of players from tracklets. Compared to the previous editions of the SoccerNet challenges, tasks (2-3-7) are novel, including new annotations and data, task (4) was enhanced with more data and annotations, and task (6) now focuses on end-to-end approaches.
More information on the tasks, challenges, and leaderboards is available on https://www.soccer-net.org. Baselines and development kits can be found on https://github.com/SoccerNet.

Submitted 12 September, 2023; originally announced September 2023.

arXiv:2308.04953 [pdf, other]

Wirelessly Powered Federated Learning Networks: Joint Power Transfer, Data Sensing, Model Training, and Resource Allocation

Authors: Mai Le, Dinh Thai Hoang, Diep N. Nguyen, Won-Joo Hwang, Quoc-Viet Pham

Abstract: Federated learning (FL) has found many successes in wireless networks; however, the implementation of FL has been hindered by the energy limitation of mobile devices (MDs) and the availability of training data at MDs. How to integrate wireless power transfer and mobile crowdsensing towards sustainable FL solutions is a research topic entirely missing from the open literature. This work for the first time investigates a resource allocation problem in collaborative sensing-assisted sustainable FL (S2FL) networks with the goal of minimizing the total completion time. We investigate a practical harvesting-sensing-training-transmitting protocol in which energy-limited MDs first harvest energy from RF signals, use it to gain a reward for user participation, sense the training data from the environment, train the local models at MDs, and transmit the model updates to the server. The total completion time minimization problem of jointly optimizing power transfer, transmit power allocation, data sensing, bandwidth allocation, local model training, and data transmission is complicated due to the non-convex objective function, highly non-convex constraints, and strongly coupled variables. We propose a computationally-efficient path-following algorithm to obtain the optimal solution via the decomposition technique. In particular, inner convex approximations are developed for the resource allocation subproblem, and the subproblems are solved alternately in an iterative fashion.
Simulation results are provided to evaluate the effectiveness of the proposed S2FL algorithm in reducing the completion time by up to 21.45% in comparison with other benchmark schemes. Further, we investigate an extension of our work from frequency division multiple access (FDMA) to non-orthogonal multiple access (NOMA) and show that NOMA can reduce the total completion time of the considered FL system by 8.36% on average.

Submitted 9 August, 2023; originally announced August 2023.

arXiv:2308.03415 [pdf, other]

End-to-End Evaluation for Low-Latency Simultaneous Speech Translation

Authors: Christian Huber, Tu Anh Dinh, Carlos Mullov, Ngoc Quan Pham, Thai Binh Nguyen, Fabian Retkowski, Stefan Constantin, Enes Yavuz Ugan, Danni Liu, Zhaolin Li, Sai Koneru, Jan Niehues, Alexander Waibel

Abstract: The challenge of low-latency speech translation has recently drawn significant interest in the research community, as shown by several publications and shared tasks. Therefore, it is essential to evaluate these different approaches in realistic scenarios. However, currently only specific aspects of the systems are evaluated and often it is not possible to compare different approaches. In this work, we propose the first framework to perform and evaluate the various aspects of low-latency speech translation under realistic conditions. The evaluation is carried out in an end-to-end fashion. This includes the segmentation of the audio as well as the run-time of the different components. Secondly, we compare different approaches to low-latency speech translation using this framework. We evaluate models with the option to revise the output as well as methods with fixed output. Furthermore, we directly compare state-of-the-art cascaded as well as end-to-end systems. Finally, the framework allows automatic evaluation of the translation quality as well as latency and also provides a web interface to show the low-latency model outputs to the user.

Submitted 23 October, 2023; v1 submitted 7 August, 2023; originally announced August 2023.

arXiv:2308.02677 [pdf]

Metaverse for Industry 5.0 in NextG Communications: Potential Applications and Future Challenges

Authors: B. Prabadevi, N. Deepa, Nancy Victor, Thippa Reddy Gadekallu, Praveen Kumar Reddy Maddikunta, Gokul Yenduri, Wei Wang, Quoc Viet Pham, Thien Huynh-The, Madhusanka Liyanage

Abstract: With the advent of new technologies and endeavors for automation in almost all day-to-day activities, the recent discussions on the metaverse life have a greater expectation. Furthermore, we are in the era of the fifth industrial revolution, where machines and humans collaborate to maximize productivity with the effective utilization of human intelligence and other resources. Hence, Industry 5.0 in the metaverse may have tremendous technological integration for a more immersive experience and enhanced communication. These technological amalgamations are suitable for the present environment and entirely different from the previous perception of virtual technologies. This work presents a comprehensive review of the applications of the metaverse in Industry 5.0 (the so-called industrial metaverse). In particular, we first provide a preliminary to the metaverse and Industry 5.0 and discuss key enabling technologies of the industrial metaverse, including virtual and augmented reality, 3D modeling, artificial intelligence, edge computing, digital twin, blockchain, and 6G communication networks. This work then explores diverse metaverse applications in Industry 5.0 vertical domains like Society 5.0, agriculture, supply chain management, healthcare, education, and transportation. A number of research projects are presented to showcase the conceptualization and implementation of the industrial metaverse. Furthermore, various challenges in realizing the industrial metaverse, feasible solutions, and future directions for further research have been presented.

Submitted 31 July, 2023; originally announced August 2023.

Comments: Submitted for peer review

arXiv:2307.10705 [pdf, other]

doi 10.1109/MAPR59823.2023.10288710

TwinLiteNet: An Efficient and Lightweight Model for Driveable Area and Lane Segmentation in Self-Driving Cars

Authors: Quang Huy Che, Dinh Phuc Nguyen, Minh Quan Pham, Duc Khai Lam

Abstract: Semantic segmentation is a common task in autonomous driving to understand the surrounding environment. Driveable Area Segmentation and Lane Detection are particularly important for safe and efficient navigation on the road. However, original semantic segmentation models are computationally expensive and require high-end hardware, which is not feasible for embedded systems in autonomous vehicles. This paper proposes a lightweight model for the driveable area and lane line segmentation. TwinLiteNet is designed cheaply but achieves accurate and efficient segmentation results. We evaluate TwinLiteNet on the BDD100K dataset and compare it with modern models. Experimental results show that our TwinLiteNet performs similarly to existing approaches, requiring significantly fewer computational resources. Specifically, TwinLiteNet achieves an mIoU score of 91.3% for the Drivable Area task and 31.08% IoU for the Lane Detection task with only 0.4 million parameters and achieves 415 FPS on GPU RTX A5000. Furthermore, TwinLiteNet can run in real-time on embedded devices with limited computing power, especially since it achieves 60 FPS on Jetson Xavier NX, making it an ideal solution for self-driving vehicles. Code is available: https://github.com/chequanghuy/TwinLiteNet.

Submitted 13 December, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

Comments: Accepted by MAPR 2023

arXiv:2306.08537 [pdf, other]

VIBR: Learning View-Invariant Value Functions for Robust Visual Control

Authors: Tom Dupuis, Jaonary Rabarisoa, Quoc-Cuong Pham, David Filliat

Abstract: End-to-end reinforcement learning on images has shown significant progress in recent years. Data-based approaches leverage data augmentation and domain randomization, while representation learning methods use auxiliary losses to learn task-relevant features. Yet, reinforcement learning still struggles in visually diverse environments full of distractions and spurious noise. In this work, we tackle the problem of robust visual control at its core and present VIBR (View-Invariant Bellman Residuals), a method that combines multi-view training and invariant prediction to reduce the out-of-distribution (OOD) generalization gap for RL-based visuomotor control. Our model-free approach improves baseline performance without the need for additional representation learning objectives and with limited additional computational cost. We show that VIBR outperforms existing methods on complex visuomotor control environments with high visual perturbation. Our approach achieves state-of-the-art results on the Distracting Control Suite benchmark, a challenging benchmark still not solved by current methods, where we evaluate the robustness to a number of visual perturbators, as well as OOD generalization and extrapolation capabilities.

Submitted 14 June, 2023; originally announced June 2023.

Journal ref: Proceedings of The 2nd Conference on Lifelong Learning Agents, PMLR 232 (2023) 658-682

arXiv:2306.06679 [pdf, other]

Reinforcement Learning with Parameterized Manipulation Primitives for Robotic Assembly

Authors: Nghia Vuong, Quang-Cuong Pham

Abstract: A common theme in robot assembly is the adoption of Manipulation Primitives as the atomic motion to compose assembly strategy, typically in the form of a state machine or a graph. While this approach has shown great performance and robustness in increasingly complex assembly tasks, the state machine has to be engineered manually in most cases. Such hard-coded strategies will fail to handle unexpected situations that are not considered in the design. To address this issue, we propose to find a dynamic sequence of manipulation primitives through Reinforcement Learning. Leveraging parameterized manipulation primitives, the proposed method greatly improves both assembly performance and sample efficiency of Reinforcement Learning compared to a previous work using non-parameterized manipulation primitives. In practice, our method achieves good zero-shot sim-to-real performance on high-precision peg insertion tasks with different geometry, clearance, and material.

Submitted 11 June, 2023; originally announced June 2023.

Comments: arXiv admin note: text overlap with arXiv:2011.00778

arXiv:2306.06675 [pdf, other]

Contact Reduction with Bounded Stiffness for Robust Sim-to-Real Transfer of Robot Assembly

Authors: Nghia Vuong, Quang-Cuong Pham

Abstract: In sim-to-real Reinforcement Learning (RL), a policy is trained in a simulated environment and then deployed on the physical system. The main challenge of sim-to-real RL is to overcome the reality gap - the discrepancies between the real world and its simulated counterpart. Using general geometric representations, such as convex decomposition, triangular mesh, and signed distance field, can improve simulation fidelity, and thus potentially narrow the reality gap. Common to these approaches is that many contact points are generated for geometrically-complex objects, which slows down simulation and may cause numerical instability. Contact reduction methods address these issues by limiting the number of contact points, but the validity of these methods for sim-to-real RL has not been confirmed. In this paper, we present a contact reduction method with bounded stiffness to improve the simulation accuracy. Our experiments show that the proposed method critically enables training an RL policy for a tight-clearance double pin insertion task and successfully deploying the policy on a rigid, position-controlled physical robot.

Submitted 11 June, 2023; originally announced June 2023.

arXiv:2306.05197 [pdf, other]

doi 10.1109/IROS55552.2023.10342287

Time-Optimal Path Tracking with ISO Safety Guarantees

Authors: Shohei Fujii, Quang-Cuong Pham

Abstract: One way of ensuring the operator's safety during human-robot collaboration is through Speed and Separation Monitoring (SSM), as defined in ISO standard ISO/TS 15066. In general, it is impossible to avoid all human-robot collisions: consider for instance the case when the robot does not move at all; a human operator can still collide with it by hitting it of her own voluntary motion. In the SSM framework, it is possible however to minimize harm by requiring this: \emph{if} a collision ever occurs, then the robot must be in a \emph{stationary state} (all links have zero velocity) at the time instant of the collision. In this paper, we propose a time-optimal control policy based on Time-Optimal Path Parameterization (TOPP) to guarantee such a behavior. Specifically, we show that: for any robot motion that is strictly faster than the motion recommended by our policy, there exists a human motion that results in a collision with the robot in a non-stationary state. Correlatively, we show, in simulation, that our policy is strictly less conservative than state-of-the-art safe robot control methods. Additionally, we propose a parallelization method to reduce the computation time of our pre-computation phase (down to 0.5 sec, practically), which enables the whole pipeline (including the pre-computation) to be executed at runtime, nearly in real-time. Finally, we demonstrate the application of our method in a scenario: time-optimal, safe control of a 6-dof industrial robot.

Submitted 12 September, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

Comments: 8 pages, accepted to IROS 2023

arXiv:2305.17345 [pdf, other]

doi 10.1109/ICRA48891.2023.10161293

Task-Space Clustering for Mobile Manipulator Task Sequencing

Authors: Quang-Nam Nguyen, Nicholas Adrian, Quang-Cuong Pham

Abstract: Mobile manipulators have gained attention for their potential in performing large-scale tasks which are beyond the reach of fixed-base manipulators. The Robotic Task Sequencing Problem for mobile manipulators often requires optimizing the motion sequence of the robot to visit multiple targets while reducing the number of base placements. A two-step approach to this problem is clustering the task-space into clusters of targets before sequencing the robot motion. In this paper, we propose a task-space clustering method which formulates the clustering step as a Set Cover Problem using a bipartite graph and reachability analysis, then solves it to obtain the minimum number of target clusters with corresponding base placements. We demonstrated the practical usage of our method in a mobile drilling experiment containing hundreds of targets. Multiple simulations were conducted to benchmark the algorithm and also showed that our proposed method found, in practical time, better solutions than the existing state-of-the-art methods.

Submitted 26 May, 2023; originally announced May 2023.

arXiv:2304.11790 [pdf, other]

Adaptive-saturated RNN: Remember more with less instability

Authors: Khoi Minh Nguyen-Duy, Quang Pham, Binh T. Nguyen

Abstract: Orthogonal parameterization is a compelling solution to the vanishing gradient problem (VGP) in recurrent neural networks (RNNs). With orthogonal parameters and non-saturated activation functions, gradients in such models are constrained to unit norms. On the other hand, although the traditional vanilla RNNs are seen to have higher memory capacity, they suffer from the VGP and perform badly in many applications. This work proposes Adaptive-Saturated RNNs (asRNN), a variant that dynamically adjusts its saturation level between the two mentioned approaches. Consequently, asRNN enjoys both the capacity of a vanilla RNN and the training stability of orthogonal RNNs. Our experiments show encouraging results of asRNN on challenging sequence learning benchmarks compared to several strong competitors. The research code is accessible at https://github.com/ndminhkhoi46/asRNN/.

Submitted 23 April, 2023; originally announced April 2023.

Comments: 8 pages, 2 figures, 5 tables, ICLR 2023 Tiny Paper Track

ACM Class: I.2

Journal ref: ICLR 2023 Tiny Paper Track

arXiv:2304.00524 [pdf, other]

A Survey on Federated Learning for the Healthcare Metaverse: Concepts, Applications, Challenges, and Future Directions

Authors: Ali Kashif Bashir, Nancy Victor, Sweta Bhattacharya, Thien Huynh-The, Rajeswari Chengoden, Gokul Yenduri, Praveen Kumar Reddy Maddikunta, Quoc-Viet Pham, Thippa Reddy Gadekallu, Madhusanka Liyanage

Abstract: Recent technological advancements have considerably improved healthcare systems to provide various intelligent healthcare services and improve the quality of life. Federated learning (FL), a new branch of artificial intelligence (AI), opens opportunities to deal with privacy issues in healthcare systems and exploit data and computing resources available at distributed devices. Additionally, the Metaverse, through integrating emerging technologies, such as AI, cloud edge computing, Internet of Things (IoT), blockchain, and semantic communications, has transformed many vertical domains in general and the healthcare sector in particular. Obviously, FL shows many benefits and provides new opportunities for conventional and Metaverse healthcare, motivating us to provide a survey on the usage of FL for Metaverse healthcare systems. First, we present preliminaries to IoT-based healthcare systems, FL in conventional healthcare, and Metaverse healthcare. The benefits of FL in Metaverse healthcare are then discussed, from improved privacy and scalability, better interoperability, better data management, and extra security to automation and low-latency healthcare services. Subsequently, we discuss several applications pertaining to FL-enabled Metaverse healthcare, including medical diagnosis, patient monitoring, medical education, infectious disease, and drug discovery. Finally, we highlight significant challenges and potential solutions toward the realization of FL in Metaverse healthcare.

Submitted 4 April, 2023; v1 submitted 2 April, 2023; originally announced April 2023.

Comments: Submitted to peer review

arXiv:2303.18162 [pdf, other]

A Multiple Choices Reading Comprehension Corpus for Vietnamese Language Education

Authors: Son T. Luu, Khoi Trong Hoang, Tuong Quang Pham, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen

Abstract: Machine reading comprehension has been an interesting and challenging task in recent years, with the purpose of extracting useful information from texts. To give computers the ability to understand a reading text and answer relevant questions, we introduce ViMMRC 2.0 - an extension of the previous ViMMRC for the task of multiple-choice reading comprehension in Vietnamese textbooks, which contain reading articles for students from Grade 1 to Grade 12. This dataset has 699 reading passages, which are prose and poems, and 5,273 questions. The questions in the new dataset are not fixed with four options as in the previous version. Moreover, the difficulty of questions is increased, which challenges the models to find the correct choice. The computer must understand the whole context of the reading passage, the question, and the content of each choice to extract the right answers. Hence, we propose a multi-stage approach that combines the multi-step attention network (MAN) with the natural language inference (NLI) task to enhance the performance of the reading comprehension model. Then, we compare the proposed methodology with the baseline BERTology models on the new dataset and the ViMMRC 1.0. Our multi-stage models achieved an accuracy of 58.81% on the test set, which is 5.34% better than the best-performing BERTology models. From the results of the error analysis, we found that the challenge for reading comprehension models is understanding the implicit context in texts and linking them together in order to find the correct answers. Finally, we hope our new dataset will motivate further research in enhancing the language understanding ability of computers in the Vietnamese language.

Submitted 31 March, 2023; originally announced March 2023.

arXiv:2303.09115 [pdf, other]

Learning for Amalgamation: A Multi-Source Transfer Learning Framework For Sentiment Classification

Authors: Cuong V. Nguyen, Khiem H. Le, Anh M. Tran, Quang H. Pham, Binh T. Nguyen

Abstract: Transfer learning plays an essential role in Deep Learning, which can remarkably improve the performance of the target domain, whose training data is not sufficient. Our work explores beyond the common practice of transfer learning with a single pre-trained model. We focus on the task of Vietnamese sentiment classification and propose LIFA, a framework to learn a unified embedding from several pre-trained models. We further propose two more LIFA variants that encourage the pre-trained models to either cooperate or compete with one another. Studying these variants sheds light on the success of LIFA by showing that sharing knowledge among the models is more beneficial for transfer learning. Moreover, we construct the AISIA-VN-Review-F dataset, the first large-scale Vietnamese sentiment classification database. We conduct extensive experiments on the AISIA-VN-Review-F and existing benchmarks to demonstrate the efficacy of LIFA compared to other techniques. To contribute to the Vietnamese NLP research, we publish our source code and datasets to the research community upon acceptance.

Submitted 16 March, 2023; originally announced March 2023.

Comments: Information Sciences

arXiv:2303.04518 [pdf, other]

Monte-Carlo Tree Search with Prioritized Node Expansion for Multi-Goal Task Planning

Authors: Kai Pfeiffer, Leonardo Edgar, Quang-Cuong Pham

Abstract: Symbolic task planning for robots is computationally challenging due to the combinatorial complexity of the possible action space. This fact is amplified if there are several sub-goals to be achieved due to the increased length of the action sequences. In this work, we propose a multi-goal symbolic task planner for deterministic decision processes based on Monte Carlo Tree Search. We augment the algorithm by prioritized node expansion which prioritizes nodes that already have fulfilled some sub-goals. Due to its linear complexity in the number of sub-goals, our algorithm is able to identify symbolic action sequences of 145 elements to reach the desired goal state with up to 48 sub-goals while the search tree is limited to under 6500 nodes. We use action reduction based on a kinematic reachability criterion to further ease computational complexity. We combine our algorithm with object localization and motion planning and apply it to a real-robot demonstration with two manipulators in an industrial bearing inspection setting.

Submitted 24 July, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

arXiv:2303.04516 [pdf, other]

Time-Optimal Control via Heaviside Step-Function Approximation

Authors: Kai Pfeiffer, Quang-Cuong Pham

Abstract: Least-squares programming is a popular tool in robotics due to its simplicity and availability of open-source solvers. However, certain problems like sparse programming in the $\ell_0$- or $\ell_1$-norm for time-optimal control are not equivalently solvable. In this work, we propose a non-linear hierarchical least-squares programming (NL-HLSP) for time-optimal control of non-linear discrete dynamic systems. We use a continuous approximation of the Heaviside step function with an additional term that avoids vanishing gradients. We use a simple discretization method by keeping states and controls piece-wise constant between discretization steps. This way, we obtain a comparatively easily implementable NL-HLSP in contrast to direct transcription approaches of optimal control. We show that the NL-HLSP indeed recovers the discrete time-optimal control in the limit for resting goal points. We confirm the results in simulation for linear and non-linear control scenarios.

Submitted 9 October, 2023; v1 submitted 8 March, 2023; originally announced March 2023.

arXiv:2303.03616 [pdf, other]

doi 10.1109/LRA.2023.3296943

Geometry-Aware Coverage Path Planning for Depowdering on Complex 3D Surfaces

Authors: Van-Thach Do, Quang-Cuong Pham

Abstract: This paper presents a new approach to obtaining nearly complete coverage paths (CP) with low overlapping on 3D general surfaces using mesh models. The CP is obtained by segmenting the mesh model into a given number of clusters using constrained centroidal Voronoi tessellation (CCVT) and finding the shortest path from cluster centroids using the geodesic metric efficiently. We introduce a new cost function to harmoniously achieve uniform areas of the obtained clusters and a restriction on the variation of triangle normals during the construction of CCVTs. The obtained clusters can be used to construct high-quality viewpoints (VP) for visual coverage tasks. Here, we utilize the planned VPs as cleaning configurations to perform residual powder removal in additive manufacturing using manipulator robots. The self-occlusion of VPs and ensuring collision-free robot configurations are addressed by integrating a proposed optimization-based strategy to find a set of candidate rays for each VP into the motion planning phase. CP planning benchmarks and physical experiments are conducted to demonstrate the effectiveness of the proposed approach. We show that our approach can compute the CPs and VPs of various mesh models with a massive number of triangles within a reasonable time.

Submitted 7 June, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

Comments: 8 pages, 8 figures

Journal ref: IEEE ROBOTICS AND AUTOMATION LETTERS, VOL. 8, NO. 9, SEPTEMBER 2023

arXiv:2301.13413 [pdf, other]

doi 10.1109/LRA.2023.3341770

Fine Robotic Manipulation without Force/Torque Sensor

Authors: Shilin Shan, Quang-Cuong Pham

Abstract: Force Sensing and Force Control are essential to many industrial applications. Typically, a 6-axis Force/Torque (F/T) sensor is mounted between the robot's wrist and the end-effector in order to measure the forces and torques exerted by the environment onto the robot (the external wrench). Although a typical 6-axis F/T sensor can provide highly accurate measurements, it is expensive and vulnerable to drift and external impacts. Existing methods aiming at estimating the external wrench using only the robot's internal signals are limited in scope: for example, wrench estimation accuracy was mostly validated in free-space motions and simple contacts as opposed to tasks like assembly that require high-precision force control. Here we present a Neural Network based method and argue that by devoting particular attention to the training data structure, it is possible to accurately estimate the external wrench in a wide range of scenarios based solely on internal signals. As an illustration, we demonstrate a pin insertion experiment with 100-micron clearance and a hand-guiding experiment, both performed without external F/T sensors or joint torque sensors. Our result opens the possibility of equipping the existing 2.7 million industrial robots with Force Sensing and Force Control capabilities without any additional hardware.

Submitted 5 March, 2024; v1 submitted 31 January, 2023; originally announced January 2023.

Comments: Accepted to Robotics and Automation Letters (RA-L), 8 pages

arXiv:2301.00912 [pdf, ps, other]

Distributed Machine Learning for UAV Swarms: Computing, Sensing, and Semantics

Authors: Yahao Ding, Zhaohui Yang, Quoc-Viet Pham, Zhaoyang Zhang, Mohammad Shikh-Bahaei

Abstract: Unmanned aerial vehicle (UAV) swarms are considered a promising technique for next-generation communication networks due to their flexibility, mobility, low cost, and ability to collaboratively and autonomously provide services. Distributed learning (DL) enables UAV swarms to intelligently provide communication services, multi-directional remote surveillance, and target tracking. In this survey, we first introduce several popular DL algorithms such as federated learning (FL), multi-agent reinforcement learning (MARL), distributed inference, and split learning, and present a comprehensive overview of their applications for UAV swarms, such as trajectory design, power control, wireless resource allocation, user assignment, perception, and satellite communications. Then, we present several state-of-the-art applications of UAV swarms in wireless communication systems, such as reconfigurable intelligent surface (RIS), virtual reality (VR), and semantic communications, and discuss the problems and challenges that DL-enabled UAV swarms can solve in these applications. Finally, we describe open problems of using DL in UAV swarms and future research directions of DL-enabled UAV swarms. In summary, this paper provides a comprehensive survey of various DL applications for UAV swarms in extensive scenarios.

Submitted 2 January, 2023; originally announced January 2023.

arXiv:2212.10124 [pdf, other]

Image Segmentation-based Unsupervised Multiple Objects Discovery

Authors: Sandra Kara, Hejer Ammar, Florian Chabot, Quoc-Cuong Pham

Abstract: Unsupervised object discovery aims to localize objects in images, while removing the dependence on annotations required by most deep learning-based methods. To address this problem, we propose a fully unsupervised, bottom-up approach, for multiple objects discovery. The proposed approach is a two-stage framework. First, instances of object parts are segmented by using the intra-image similarity between self-supervised local features. The second step merges and filters the object parts to form complete object instances. The latter is performed by two CNN models that capture semantic information on objects from the entire dataset. We demonstrate that the pseudo-labels generated by our method provide a better precision-recall trade-off than existing single and multiple objects discovery methods. In particular, we provide state-of-the-art results for both unsupervised class-agnostic object detection and unsupervised image segmentation.

Submitted 20 December, 2022; originally announced December 2022.

Comments: WACV 2023

Showing 1–50 of 171 results for author: Pham, Q