-
Active Sensing Strategy: Multi-Modal, Multi-Robot Source Localization and Mapping in Real-World Settings with Fixed One-Way Switching
Authors:
Vu Phi Tran,
Asanka G. Perera,
Matthew A. Garratt,
Kathryn Kasmarik,
Sreenatha G. Anavatti
Abstract:
This paper introduces a state-machine model for a multi-modal, multi-robot environmental sensing algorithm tailored to dynamic real-world settings. The algorithm uniquely combines two exploration strategies for gas source localization and mapping: (1) an initial exploration phase using multi-robot coverage path planning with variable formations for early gas field indication; and (2) a subsequent active sensing phase employing multi-robot swarms for precise field estimation. The state machine governs the transition between these two phases. During exploration, a coverage path maximizes the visited area while measuring gas concentration and estimating the initial gas field at predefined sample times. In the active sensing phase, mobile robots in a swarm collaborate to select the next measurement point, ensuring coordinated and efficient sensing. System validation involves hardware-in-the-loop experiments and real-time tests with a radio source emulating a gas field. The approach is benchmarked against state-of-the-art single-mode active sensing and gas source localization techniques. Evaluation highlights the multi-modal switching approach's ability to expedite convergence, navigate obstacles in dynamic environments, and significantly enhance gas source location accuracy. The findings show a 43% reduction in turnaround time, a 50% increase in estimation accuracy, and improved robustness of multi-robot environmental sensing in cluttered scenarios without collisions, surpassing the performance of conventional active sensing strategies.
Submitted 1 July, 2024;
originally announced July 2024.
-
Building a temperature forecasting model for the city with the regression neural network (RNN)
Authors:
Nguyen Phuc Tran,
Duy Thanh Tran,
Thi Thuy Nga Duong
Abstract:
In recent years, a study by environmental organizations around the world and in Vietnam shows that weather change is quite complex. Global warming has become a serious problem in the modern world and is a concern for scientists. In the last century, it was difficult to forecast the weather due to missing weather monitoring stations and technological limitations, which made it hard to collect data for building predictive models and running accurate simulations. In Vietnam, research on weather forecast models is a recent development, having only begun around 2000. Along with advancements in computer science, mathematical models are being built and applied with machine learning techniques to create more accurate and reliable predictive models. This article summarizes the research and solutions for applying recurrent neural networks to forecast urban temperatures.
Submitted 27 May, 2024;
originally announced May 2024.
-
VOODOO XP: Expressive One-Shot Head Reenactment for VR Telepresence
Authors:
Phong Tran,
Egor Zakharov,
Long-Nhat Ho,
Liwen Hu,
Adilbek Karmanov,
Aviral Agarwal,
McLean Goldwhite,
Ariana Bermudez Venegas,
Anh Tuan Tran,
Hao Li
Abstract:
We introduce VOODOO XP: a 3D-aware one-shot head reenactment method that can generate highly expressive facial expressions from any input driver video and a single 2D portrait. Our solution is real-time, view-consistent, and can be instantly used without calibration or fine-tuning. We demonstrate our solution on a monocular video setting and an end-to-end VR telepresence system for two-way communication. Compared to 2D head reenactment methods, 3D-aware approaches aim to preserve the identity of the subject and ensure view-consistent facial geometry for novel camera poses, which makes them suitable for immersive applications. While various facial disentanglement techniques have been introduced, cutting-edge 3D-aware neural reenactment techniques still lack expressiveness and fail to reproduce complex and fine-scale facial expressions. We present a novel cross-reenactment architecture that directly transfers the driver's facial expressions to transformer blocks of the input source's 3D lifting module. We show that highly effective disentanglement is possible using an innovative multi-stage self-supervision approach, which is based on a coarse-to-fine strategy, combined with an explicit face neutralization and 3D lifted frontalization during its initial training stage. We further integrate our novel head reenactment solution into an accessible high-fidelity VR telepresence system, where any person can instantly build a personalized neural head avatar from any photo and bring it to life using the headset. We demonstrate state-of-the-art performance in terms of expressiveness and likeness preservation on a large set of diverse subjects and capture conditions.
Submitted 28 May, 2024; v1 submitted 25 May, 2024;
originally announced May 2024.
-
Symmetric Linear Bandits with Hidden Symmetry
Authors:
Nam Phuong Tran,
The Anh Ta,
Debmalya Mandal,
Long Tran-Thanh
Abstract:
High-dimensional linear bandits with low-dimensional structure have received considerable attention in recent studies due to their practical significance. The most common structure in the literature is sparsity. However, it may not be available in practice. Symmetry, where the reward is invariant under certain groups of transformations on the set of arms, is another important inductive bias in the high-dimensional case that covers many standard structures, including sparsity. In this work, we study high-dimensional symmetric linear bandits where the symmetry is hidden from the learner, and the correct symmetry needs to be learned in an online setting. We examine the structure of a collection of hidden symmetry and provide a method based on model selection within the collection of low-dimensional subspaces. Our algorithm achieves a regret bound of $ O(d_0^{1/3} T^{2/3} \log(d))$, where $d$ is the ambient dimension which is potentially very large, and $d_0$ is the dimension of the true low-dimensional subspace such that $d_0 \ll d$. With an extra assumption on well-separated models, we can further improve the regret to $ O(d_0\sqrt{T\log(d)} )$.
Submitted 22 May, 2024;
originally announced May 2024.
-
Long-term Human Participation Assessment In Collaborative Learning Environments Using Dynamic Scene Analysis
Authors:
Wenjing Shi,
Phuong Tran,
Sylvia Celedón-Pattichis,
Marios S. Pattichis
Abstract:
The paper develops datasets and methods to assess student participation in real-life collaborative learning environments. In collaborative learning environments, students are organized into small groups where they are free to interact within their group. Thus, students can move around freely, causing issues with strong pose variation, move out of and re-enter the camera scene, or face away from the camera. We decompose the problem of assessing student participation into two subproblems: (i) student group detection against strong background interference from other groups, and (ii) dynamic participant tracking within the group. A massive independent testing dataset of 12,518,250 student label instances, with a total duration of 21 hours and 22 minutes of real-life videos, is used for evaluating the performance of our proposed method for student group detection. The proposed method of using multiple image representations is shown to perform as well as or better than YOLO on all video instances. Over the entire dataset, the proposed method achieved an F1 score of 0.85 compared to 0.80 for YOLO. Following student group detection, the paper presents the development of a dynamic participant tracking system for assessing student group participation through long video sessions. The proposed dynamic participant tracking system is shown to perform exceptionally well, missing a student in just one out of 35 testing videos. In comparison, a state-of-the-art method fails to track students in 14 out of the 35 testing videos. The proposed method achieves 82.3% accuracy on an independent set of long, real-life collaborative videos.
Submitted 14 April, 2024;
originally announced May 2024.
-
Deep-learning Optical Flow Outperforms PIV in Obtaining Velocity Fields from Active Nematics
Authors:
Phu N. Tran,
Sattvic Ray,
Linnea Lemma,
Yunrui Li,
Reef Sweeney,
Aparna Baskaran,
Zvonimir Dogic,
Pengyu Hong,
Michael F. Hagan
Abstract:
Deep learning-based optical flow (DLOF) extracts features in adjacent video frames with deep convolutional neural networks. It uses those features to estimate the inter-frame motions of objects at the pixel level. In this article, we evaluate the ability of optical flow to quantify the spontaneous flows of MT-based active nematics under different labeling conditions. We compare DLOF against the commonly used technique, particle imaging velocimetry (PIV). We obtain flow velocity ground truths either by performing semi-automated particle tracking on samples with sparsely labeled filaments, or from passive tracer beads. We find that DLOF produces significantly more accurate velocity fields than PIV for densely labeled samples. We show that the breakdown of PIV arises because the algorithm cannot reliably distinguish contrast variations at high densities, particularly in directions parallel to the nematic director. DLOF overcomes this limitation. For sparsely labeled samples, DLOF and PIV produce results with similar accuracy, but DLOF gives higher-resolution fields. Our work establishes DLOF as a versatile tool for measuring fluid flows in a broad class of active, soft, and biophysical systems.
Submitted 26 April, 2024; v1 submitted 23 April, 2024;
originally announced April 2024.
-
TrafficVLM: A Controllable Visual Language Model for Traffic Video Captioning
Authors:
Quang Minh Dinh,
Minh Khoi Ho,
Anh Quan Dang,
Hung Phong Tran
Abstract:
Traffic video description and analysis have received much attention recently due to the growing demand for efficient and reliable urban surveillance systems. Most existing methods only focus on locating traffic event segments, which severely lack descriptive details related to the behaviour and context of all the subjects of interest in the events. In this paper, we present TrafficVLM, a novel multi-modal dense video captioning model for vehicle ego camera view. TrafficVLM models traffic video events at different levels of analysis, both spatially and temporally, and generates long fine-grained descriptions for the vehicle and pedestrian at different phases of the event. We also propose a conditional component for TrafficVLM to control the generation outputs and a multi-task fine-tuning paradigm to enhance TrafficVLM's learning capability. Experiments show that TrafficVLM performs well on both vehicle and overhead camera views. Our solution achieved outstanding results in Track 2 of the AI City Challenge 2024, ranking us third in the challenge standings. Our code is publicly available at https://github.com/quangminhdinh/TrafficVLM.
Submitted 14 April, 2024;
originally announced April 2024.
-
ML KPI Prediction in 5G and B5G Networks
Authors:
Nguyen Phuc Tran,
Oscar Delgado,
Brigitte Jaumard,
Fadi Bishay
Abstract:
Network operators are facing new challenges when meeting the needs of their customers. The challenges arise due to the rise of new services, such as HD video streaming, IoT, autonomous driving, etc., and the exponential growth of network traffic. In this context, 5G and B5G networks have been evolving to accommodate a wide range of applications and use cases. Additionally, this evolution brings new features, like the ability to create multiple end-to-end isolated virtual networks using network slicing. Nevertheless, to ensure the quality of service, operators must maintain and optimize their networks in accordance with the key performance indicators (KPIs) and the slice service-level agreements (SLAs).
In this paper, we introduce a machine learning (ML) model used to estimate throughput in 5G and B5G networks with end-to-end (E2E) network slices. Then, we combine the predicted throughput with the current network state to derive an estimate of other network KPIs, which can be used to further improve service assurance. To assess the efficiency of our solution, a performance metric was proposed. Numerical evaluations demonstrate that our KPI prediction model outperforms those derived from other methods with the same or nearly the same computational time.
Submitted 1 April, 2024;
originally announced April 2024.
-
Proactive Service Assurance in 5G and B5G Networks: A Closed-Loop Algorithm for End-to-End Network Slicing
Authors:
Nguyen Phuc Tran,
Oscar Delgado,
Brigitte Jaumard
Abstract:
The customization of services in Fifth-generation (5G) and Beyond 5G (B5G) networks relies heavily on network slicing, which creates multiple virtual networks on a shared physical infrastructure, tailored to meet specific requirements of distinct applications, using Software Defined Networking (SDN) and Network Function Virtualization (NFV). It is imperative to ensure that network services meet the performance and reliability requirements of various applications and users; thus, service assurance is one of the critical components in network slicing. One of the key functionalities of network slicing is the ability to scale Virtualized Network Functions (VNFs) in response to changing resource demand and to meet customer Service Level Agreements (SLAs). In this paper, we introduce a proactive closed-loop algorithm for end-to-end network orchestration, designed to provide service assurance in 5G and B5G networks. The algorithm dynamically scales resources to meet key performance indicators (KPIs) specific to each network slice and operates in parallel across multiple slices, making it scalable and capable of managing real-time service assurance fully automatically. Through our experiments, we demonstrate that the proposed algorithm effectively fulfills service assurance requirements for different network slice types, thereby minimizing network resource utilization and reducing the over-provisioning of spare resources.
Submitted 24 June, 2024; v1 submitted 1 April, 2024;
originally announced April 2024.
-
Blur2Blur: Blur Conversion for Unsupervised Image Deblurring on Unknown Domains
Authors:
Bang-Dang Pham,
Phong Tran,
Anh Tran,
Cuong Pham,
Rang Nguyen,
Minh Hoai
Abstract:
This paper presents an innovative framework designed to train an image deblurring algorithm tailored to a specific camera device. This algorithm works by transforming a blurry input image, which is challenging to deblur, into another blurry image that is more amenable to deblurring. The transformation process, from one blurry state to another, leverages unpaired data consisting of sharp and blurry images captured by the target camera device. Learning this blur-to-blur transformation is inherently simpler than direct blur-to-sharp conversion, as it primarily involves modifying blur patterns rather than the intricate task of reconstructing fine image details. The efficacy of the proposed approach has been demonstrated through comprehensive experiments on various benchmarks, where it significantly outperforms state-of-the-art methods both quantitatively and qualitatively. Our code and data are available at https://zero1778.github.io/blur2blur/
Submitted 24 March, 2024;
originally announced March 2024.
-
Learning the Expected Core of Strictly Convex Stochastic Cooperative Games
Authors:
Nam Phuong Tran,
The Anh Ta,
Shuqing Shi,
Debmalya Mandal,
Yali Du,
Long Tran-Thanh
Abstract:
Reward allocation, also known as the credit assignment problem, has been an important topic in economics, engineering, and machine learning. An important concept in reward allocation is the core, which is the set of stable allocations where no agent has the motivation to deviate from the grand coalition. In previous works, computing the core requires either knowledge of the reward function in deterministic games or the reward distribution in stochastic games. However, this is unrealistic, as the reward function or distribution is often only partially known and may be subject to uncertainty. In this paper, we consider the core learning problem in stochastic cooperative games, where the reward distribution is unknown. Our goal is to learn the expected core, that is, the set of allocations that are stable in expectation, given an oracle that returns a stochastic reward for an enquired coalition each round. Within the class of strictly convex games, we present an algorithm named \texttt{Common-Points-Picking} that returns a point in the expected core given a polynomial number of samples, with high probability. To analyse the algorithm, we develop a new extension of the separation hyperplane theorem for multiple convex sets.
Submitted 22 May, 2024; v1 submitted 10 February, 2024;
originally announced February 2024.
-
A Solution for Commercializing, Decentralizing and Storing Electronic Medical Records by Integrating Proxy Re-Encryption, IPFS, and Blockchain
Authors:
Phong Tran,
Thong Nguyen,
Long Chu,
Nhi Tran,
Hang Ta
Abstract:
The rapid expansion of user medical records across global systems presents not only opportunities but also new challenges in maintaining effective application models that ensure user privacy, controllability, and the ability to commercialize patient medical records. Moreover, the proliferation of data analysis models in healthcare institutions necessitates the decentralization and restorability of medical record data. It is imperative that user medical data collected from these systems can be easily analyzed and utilized even years after collection, without the risk of data loss due to numerous factors. Additionally, medical information must be authorized by the data owner, granting patients the right to accept or decline data usage requests from medical research agencies. In response, we propose an innovative solution for implementing a decentralized system utilizing an EVM-compatible blockchain and IPFS for decentralized storage. To ensure privacy and control, we employ Proxy Re-Encryption (PRE), a cryptographic authorized method, within the medical data marketplace. Our proposed architecture significantly reduces costs associated with granting read access to healthcare research agencies by minimizing the encryption and decryption time of stored records. Furthermore, it empowers users with enhanced control over their health data through tamperproof blockchain smart contracts and IPFS, safeguarding the integrity and privacy of their medical records.
Submitted 4 June, 2024; v1 submitted 8 February, 2024;
originally announced February 2024.
-
Context-Aware Stress Monitoring using Wearable and Mobile Technologies in Everyday Settings
Authors:
Seyed Amir Hossein Aqajari,
Sina Labbaf,
Phuc Hoang Tran,
Brenda Nguyen,
Milad Asgari Mehrabadi,
Marco Levorato,
Nikil Dutt,
Amir M. Rahmani
Abstract:
Daily monitoring of stress is a critical component of maintaining optimal physical and mental health. Physiological signals and contextual information have recently emerged as promising indicators for detecting instances of heightened stress. Nonetheless, developing a real-time monitoring system that utilizes both physiological and contextual data to anticipate stress levels in everyday settings while also gathering stress labels from participants represents a significant challenge. We present a monitoring system that objectively tracks daily stress levels by utilizing both physiological and contextual data in a daily-life environment. Additionally, we have integrated a smart labeling approach to optimize the ecological momentary assessment (EMA) collection, which is required for building machine learning models for stress detection. We propose a three-tier Internet-of-Things-based system architecture to address the challenges. We utilized a cross-validation technique to accurately estimate the performance of our stress models. We achieved an F1-score of 70% with a Random Forest classifier using both PPG and contextual data, which is considered an acceptable score in models built for everyday settings. Using PPG data alone, by contrast, the highest F1-score achieved is approximately 56%, emphasizing the significance of incorporating both PPG and contextual data in stress detection tasks.
Submitted 14 December, 2023;
originally announced January 2024.
-
VOODOO 3D: Volumetric Portrait Disentanglement for One-Shot 3D Head Reenactment
Authors:
Phong Tran,
Egor Zakharov,
Long-Nhat Ho,
Anh Tuan Tran,
Liwen Hu,
Hao Li
Abstract:
We present a 3D-aware one-shot head reenactment method based on a fully volumetric neural disentanglement framework for source appearance and driver expressions. Our method is real-time and produces high-fidelity and view-consistent output, suitable for 3D teleconferencing systems based on holographic displays. Existing cutting-edge 3D-aware reenactment methods often use neural radiance fields or 3D meshes to produce view-consistent appearance encoding, but, at the same time, they rely on linear face models, such as 3DMM, to achieve its disentanglement with facial expressions. As a result, their reenactment results often exhibit identity leakage from the driver or have unnatural expressions. To address these problems, we propose a neural self-supervised disentanglement approach that lifts both the source image and driver video frame into a shared 3D volumetric representation based on tri-planes. This representation can then be freely manipulated with expression tri-planes extracted from the driving images and rendered from an arbitrary view using neural radiance fields. We achieve this disentanglement via self-supervised learning on a large in-the-wild video dataset. We further introduce a highly effective fine-tuning approach to improve the generalizability of the 3D lifting using the same real-world data. We demonstrate state-of-the-art performance on a wide range of datasets, and also showcase high-quality 3D-aware head reenactment on highly challenging and diverse subjects, including non-frontal head poses and complex expressions for both source and driver.
Submitted 7 December, 2023;
originally announced December 2023.
-
Radio Source Localization using Sparse Signal Measurements from Uncrewed Ground Vehicles
Authors:
Asanka Perera,
Vu Phi Tran,
Sreenatha Anavatti,
Kathryn Kasmarik,
Matthew Garratt
Abstract:
Radio source localization can benefit many fields, including wireless communications, radar, radio astronomy, wireless sensor networks, positioning systems, and surveillance systems. However, accurately estimating the position of a radio transmitter using a remote sensor is not an easy task, as many factors contribute to the highly dynamic behavior of radio signals. In this study, we investigate techniques to use a mobile robot to explore an outdoor area and localize the radio source using sparse Received Signal Strength Indicator (RSSI) measurements. We propose a novel radio source localization method with fast turnaround times and reduced complexity compared to the state-of-the-art. Our technique uses RSSI measurements collected while the robot completed a sparse trajectory using a coverage path planning map. The mean RSSI within each grid cell was used to find the most likely cell containing the source. Three techniques were analyzed with the data from eight field tests using a mobile robot. The proposed method can localize a radio source in a basketball field with a 1.2 m accuracy and within three minutes of convergence time, whereas the state-of-the-art active sensing technique took more than 30 minutes to reach a source estimation accuracy below 1 m.
Submitted 6 December, 2023;
originally announced December 2023.
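The grid-cell step described in the abstract above (take the mean RSSI in each cell and pick the most likely cell) can be illustrated with a minimal sketch. This is not the authors' implementation: the sample format `(x, y, rssi)`, the square cells, the `cell_size` parameter, and the choice of "highest mean RSSI" as the selection rule are all assumptions made for illustration.

```python
from collections import defaultdict


def locate_source(samples, cell_size):
    """Return the centre (x, y) of the grid cell with the highest mean RSSI.

    samples   -- iterable of (x_m, y_m, rssi_dbm) tuples (hypothetical format)
    cell_size -- edge length of a square grid cell in metres (assumed)
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, rssi in samples:
        # Map each position to the integer index of the cell that contains it.
        cell = (int(x // cell_size), int(y // cell_size))
        sums[cell] += rssi
        counts[cell] += 1
    if not counts:
        raise ValueError("at least one sample is required")
    # Assumption: the cell with the strongest mean signal holds the source.
    best = max(counts, key=lambda c: sums[c] / counts[c])
    return ((best[0] + 0.5) * cell_size, (best[1] + 0.5) * cell_size)


samples = [
    (0.5, 0.5, -80.0), (0.7, 0.2, -78.0),   # weak readings far from the source
    (2.5, 2.5, -40.0), (2.2, 2.9, -44.0),   # strong readings in cell (2, 2)
    (4.1, 0.3, -70.0),
]
estimate = locate_source(samples, cell_size=1.0)
```

Averaging per cell smooths out individual noisy readings, at the cost of limiting the resolution of the estimate to the cell size.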
-
Simple Transferability Estimation for Regression Tasks
Authors:
Cuong N. Nguyen,
Phong Tran,
Lam Si Tung Ho,
Vu Dinh,
Anh T. Tran,
Tal Hassner,
Cuong V. Nguyen
Abstract:
We consider transferability estimation, the problem of estimating how well deep learning models transfer from a source to a target task. We focus on regression tasks, which received little previous attention, and propose two simple and computationally efficient approaches that estimate transferability based on the negative regularized mean squared error of a linear regression model. We prove novel theoretical results connecting our approaches to the actual transferability of the optimal target models obtained from the transfer learning process. Despite their simplicity, our approaches significantly outperform existing state-of-the-art regression transferability estimators in both accuracy and efficiency. On two large-scale keypoint regression benchmarks, our approaches yield 12% to 36% better results on average while being at least 27% faster than previous state-of-the-art methods.
Submitted 3 December, 2023; v1 submitted 1 December, 2023;
originally announced December 2023.
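As a rough illustration of the score named in the abstract above (the negative regularized mean squared error of a linear regression model), the sketch below fits a ridge regression from extracted features to scalar targets and returns the negated objective. It is not the authors' estimator: the scalar targets, the absence of an intercept, the default `lam`, and the helper names `ridge_fit` and `transfer_score` are assumptions.

```python
def ridge_fit(X, y, lam):
    """Minimise (1/n) * sum((x.w - y)^2) + lam * ||w||^2 via Gauss-Jordan."""
    n, d = len(X), len(X[0])
    # Augmented normal equations: [X^T X + n*lam*I | X^T y].
    A = [
        [sum(X[k][i] * X[k][j] for k in range(n)) + (n * lam if i == j else 0.0)
         for j in range(d)]
        + [sum(X[k][i] * y[k] for k in range(n))]
        for i in range(d)
    ]
    for col in range(d):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(d):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][d] for i in range(d)]


def transfer_score(features, targets, lam=1.0):
    """Negative regularized MSE; higher means features explain targets better."""
    w = ridge_fit(features, targets, lam)
    mse = sum(
        (sum(wi * xi for wi, xi in zip(w, x)) - t) ** 2
        for x, t in zip(features, targets)
    ) / len(features)
    return -(mse + lam * sum(wi * wi for wi in w))


targets = [2.0, 4.0, 6.0, 8.0]
good = transfer_score([[1.0], [2.0], [3.0], [4.0]], targets, lam=1e-6)
bad = transfer_score([[1.0], [-1.0], [1.0], [-1.0]], targets, lam=1e-6)
```

In this toy case the first feature set is linearly related to the targets and receives a score close to zero, while the second is not and scores far lower.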
-
Individually Rational Collaborative Vehicle Routing through Give-And-Take Exchanges
Authors:
Paul Mingzheng Tang,
Ba Phong Tran,
Hoong Chuin Lau
Abstract:
In this paper, we are concerned with the automated exchange of orders between logistics companies in a marketplace platform to optimize total revenues. We introduce a novel multi-agent approach to this problem, focusing on the Collaborative Vehicle Routing Problem (CVRP) through the lens of individual rationality. Our proposed algorithm applies the principles of the Vehicle Routing Problem (VRP) to pairs of vehicles from different logistics companies, optimizing the overall routes while considering standard VRP constraints plus individual rationality constraints. By facilitating cooperation among competing logistics agents through a Give-and-Take approach, we show that it is possible to reduce travel distance and increase operational efficiency system-wide. More importantly, our approach ensures individual rationality and faster convergence, which are important properties for ensuring the long-term sustainability of the marketplace platform. We demonstrate the efficacy of our approach through extensive experiments using real-world test data from major logistics companies. The results reveal our algorithm's ability to rapidly identify numerous optimal solutions, underscoring its practical applicability and potential to transform the logistics industry.
Submitted 31 August, 2023;
originally announced August 2023.
-
What Truly Matters in Trajectory Prediction for Autonomous Driving?
Authors:
Phong Tran,
Haoran Wu,
Cunjun Yu,
Panpan Cai,
Sifa Zheng,
David Hsu
Abstract:
Trajectory prediction plays a vital role in the performance of autonomous driving systems, and prediction accuracy, such as average displacement error (ADE) or final displacement error (FDE), is widely used as a performance metric. However, a significant disparity exists between the accuracy of predictors on fixed datasets and driving performance when the predictors are used downstream for vehicle control, because of a dynamics gap. In the real world, the prediction algorithm influences the behavior of the ego vehicle, which, in turn, influences the behaviors of other vehicles nearby. This interaction results in predictor-specific dynamics that directly impact prediction results. In fixed datasets, since other vehicles' responses are predetermined, this interaction effect is lost, leading to a significant dynamics gap. This paper studies the overlooked significance of this dynamics gap. We also examine several other factors contributing to the disparity between prediction performance and driving performance. The findings highlight the trade-off between the predictor's computational efficiency and prediction accuracy in determining real-world driving performance. In summary, an interactive, task-driven evaluation protocol for trajectory prediction is crucial to capture its effectiveness for autonomous driving. Source code along with experimental settings is available online.
Submitted 6 November, 2023; v1 submitted 26 June, 2023;
originally announced June 2023.
-
Coverage Path Planning with Budget Constraints for Multiple Unmanned Ground Vehicles
Authors:
Vu Phi Tran,
Asanka Perera,
Matthew A. Garratt,
Kathryn Kasmarik,
Sreenatha Anavatti
Abstract:
This paper proposes a state-machine model for a multi-modal, multi-robot environmental sensing algorithm. This multi-modal algorithm integrates two different exploration algorithms: (1) coverage path planning using variable formations and (2) collaborative active sensing using multi-robot swarms. The state machine provides the logic for when to switch between these different sensing algorithms. We evaluate the performance of the proposed approach on a gas source localisation and mapping task. We use hardware-in-the-loop experiments and real-time experiments with a radio source simulating a real gas field. We compare the proposed approach with a single-mode, state-of-the-art collaborative active sensing approach. Our results indicate that our multi-modal switching approach can converge more rapidly than single-mode active sensing.
Submitted 6 June, 2023;
originally announced June 2023.
-
HyperCUT: Video Sequence from a Single Blurry Image using Unsupervised Ordering
Authors:
Bang-Dang Pham,
Phong Tran,
Anh Tran,
Cuong Pham,
Rang Nguyen,
Minh Hoai
Abstract:
We consider the challenging task of training models for image-to-video deblurring, which aims to recover a sequence of sharp images corresponding to a given blurry image input. A critical issue disturbing the training of an image-to-video model is the ambiguity of the frame ordering since both the forward and backward sequences are plausible solutions. This paper proposes an effective self-supervised ordering scheme that allows training high-quality image-to-video deblurring models. Unlike previous methods that rely on order-invariant losses, we assign an explicit order for each video sequence, thus avoiding the order-ambiguity issue. Specifically, we map each video sequence to a vector in a latent high-dimensional space so that there exists a hyperplane such that for every video sequence, the vectors extracted from it and its reversed sequence are on different sides of the hyperplane. The side of the vectors will be used to define the order of the corresponding sequence. Last but not least, we propose a real-image dataset for the image-to-video deblurring problem that covers a variety of popular domains, including face, hand, and street. Extensive experimental results confirm the effectiveness of our method. Code and data are available at https://github.com/VinAIResearch/HyperCUT.git
Submitted 5 April, 2023; v1 submitted 4 April, 2023;
originally announced April 2023.
-
Ensemble Learning of Myocardial Displacements for Myocardial Infarction Detection in Echocardiography
Authors:
Nguyen Tuan,
Phi Nguyen,
Dai Tran,
Hung Pham,
Quang Nguyen,
Thanh Le,
Hanh Van,
Bach Do,
Phuong Tran,
Vinh Le,
Thuy Nguyen,
Long Tran,
Hieu Pham
Abstract:
Early detection and localization of myocardial infarction (MI) can reduce the severity of cardiac damage through timely treatment interventions. In recent years, deep learning techniques have shown promise for detecting MI in echocardiographic images. However, there has been no examination of how segmentation accuracy affects MI classification performance and the potential benefits of using ensemble learning approaches. Our study investigates this relationship and introduces a robust method that combines features from multiple segmentation models to improve MI classification performance by leveraging ensemble learning. Our method combines myocardial segment displacement features from multiple segmentation models, which are then input into a typical classifier to estimate the risk of MI. We validated the proposed approach on two datasets: the public HMC-QU dataset (109 echocardiograms) for training and validation, and an E-Hospital dataset (60 echocardiograms) from a local clinical site in Vietnam for independent testing. Model performance was evaluated based on accuracy, sensitivity, and specificity. The proposed approach demonstrated excellent performance in detecting MI. The results showed that the proposed approach outperformed the state-of-the-art feature-based method. Further research is necessary to determine its potential use in clinical settings as a tool to assist cardiologists and technicians with objective assessments and reduce dependence on operator subjectivity. Our research codes are available on GitHub at https://github.com/vinuni-vishc/mi-detection-echo.
Submitted 12 March, 2023;
originally announced March 2023.
-
LEDetection: A Simple Framework for Semi-Supervised Few-Shot Object Detection
Authors:
Phi Vu Tran
Abstract:
Few-shot object detection (FSOD) is a challenging problem aimed at detecting novel concepts from few exemplars. Existing approaches to FSOD all assume abundant base labels to adapt to novel objects. This paper studies the new task of semi-supervised FSOD by considering a realistic scenario in which both base and novel labels are simultaneously scarce. We explore the utility of unlabeled data within our proposed label-efficient detection framework and discover its remarkable ability to boost semi-supervised FSOD by way of region proposals. Motivated by this finding, we introduce SoftER Teacher, a robust detector combining pseudo-labeling with consistency learning on region proposals, to harness unlabeled data for improved FSOD without relying on abundant labels. Rigorous experiments show that SoftER Teacher surpasses the novel performance of a strong supervised detector using only 10% of required base labels, without catastrophic forgetting observed in prior approaches. Our work also sheds light on a potential relationship between semi-supervised and few-shot detection suggesting that a stronger semi-supervised detector leads to a more effective few-shot detector.
Submitted 14 February, 2024; v1 submitted 10 March, 2023;
originally announced March 2023.
-
Proof of Swarm Based Ensemble Learning for Federated Learning Applications
Authors:
Ali Raza,
Kim Phuc Tran,
Ludovic Koehl,
Shujun Li
Abstract:
Ensemble learning combines results from multiple machine learning models in order to provide a better and optimised predictive model with reduced bias and variance and improved predictions. However, in federated learning it is not feasible to apply centralised ensemble learning directly due to privacy concerns. Hence, a mechanism is required to combine the results of local models to produce a global model. Most distributed consensus algorithms, such as Byzantine fault tolerance (BFT), do not normally perform well in such applications. This is because, in such methods, the predictions of some peers are disregarded, so a majority of peers can win without even considering other peers' decisions. Additionally, the confidence score of the result of each peer is not normally taken into account, although it is an important feature to consider for ensemble learning. Moreover, the problem of a tie event is often left unaddressed by methods such as BFT. To fill these research gaps, we propose PoSw (Proof of Swarm), a novel distributed consensus algorithm for ensemble learning in a federated setting, which was inspired by particle swarm based algorithms for solving optimisation problems. The proposed algorithm is theoretically proved to always converge in a relatively small number of steps and has mechanisms to resolve tie events while trying to achieve sub-optimum solutions. We experimentally validated the performance of the proposed algorithm using ECG classification as an example application in healthcare, showing that the ensemble learning model outperformed all local models and even the FL-based global model. To the best of our knowledge, the proposed algorithm is the first attempt to make consensus over the output results of distributed models trained using federated learning.
Submitted 2 January, 2023; v1 submitted 28 December, 2022;
originally announced December 2022.
-
Invariant Lipschitz Bandits: A Side Observation Approach
Authors:
Nam Phuong Tran,
Long Tran-Thanh
Abstract:
Symmetry arises in many optimization and decision-making problems, and has attracted considerable attention from the optimization community: by utilizing the existence of such symmetries, the process of searching for optimal solutions can be improved significantly. Despite its success in (offline) optimization, the utilization of symmetries has not been well examined within online optimization settings, especially in the bandit literature. As such, in this paper we study the invariant Lipschitz bandit setting, a subclass of the Lipschitz bandits where the reward function and the set of arms are preserved under a group of transformations. We introduce an algorithm named UniformMesh-N, which naturally integrates side observations using group orbits into the UniformMesh algorithm (Kleinberg, 2005), which uniformly discretizes the set of arms. Using the side-observation approach, we prove an improved regret upper bound, which depends on the cardinality of the group, given that the group is finite. We also prove a matching regret lower bound for the invariant Lipschitz bandit class (up to logarithmic factors). We hope that our work will ignite further investigation of symmetry in bandit theory and sequential decision-making theory in general.
Submitted 28 August, 2023; v1 submitted 14 December, 2022;
originally announced December 2022.
-
Access to care: analysis of the geographical distribution of healthcare using Linked Open Data
Authors:
Selene Baez Santamaria,
Emmanouil Manousogiannis,
Guusje Boomgaard,
Linh P. Tran,
Zoltan Szlavik,
Robert-Jan Sips
Abstract:
Background: Access to medical care is strongly dependent on resource allocation, such as the geographical distribution of medical facilities. Nevertheless, this data is usually restricted to country official documentation, not available to the public. While some medical facilities' data is accessible as semantic resources on the Web, it is not consistent in its modeling and has yet to be integrated into a complete, open, and specialized repository. This work focuses on generating a comprehensive semantic dataset of medical facilities worldwide containing extensive information about such facilities' geo-location.
Results: For this purpose, we collect, align, and link various open-source databases where medical facilities' information may be present. This work allows us to evaluate each data source along various dimensions, such as completeness, correctness, and interlinking with other sources, all critical aspects of current knowledge representation technologies.
Conclusions: Our contributions directly benefit stakeholders in the biomedical and health domain (patients, healthcare professionals, companies, regulatory authorities, and researchers), who will now have a better overview of the access to and distribution of medical facilities.
Submitted 26 September, 2022; v1 submitted 11 April, 2022;
originally announced April 2022.
-
Robust Fuzzy Q-Learning-Based Strictly Negative Imaginary Tracking Controllers for the Uncertain Quadrotor Systems
Authors:
Vu Phi Tran,
M. A. Mabrok,
Sreenatha G. Anavatti,
Matthew A. Garratt,
Ian R. Petersen
Abstract:
Quadrotors are among the most popular unmanned aerial vehicles (UAVs) due to their versatility and simple design. However, tuning the gains of quadrotor flight controllers can be laborious, and accurate, stable trajectory control can be difficult to maintain under exogenous disturbances and uncertain system parameters. This paper introduces a novel robust and adaptive control synthesis methodology for a quadrotor robot's attitude and altitude stabilization. The developed method is based on fuzzy reinforcement learning and the Strictly Negative Imaginary (SNI) property. The first stage of our control approach is to transform a nonlinear quadrotor system into an equivalent Negative-Imaginary (NI) linear model by means of the feedback linearization (FL) technique. The second stage is to design a control scheme that adapts the SNI controller gains online via fuzzy Q-learning, inspired by biological learning. The proposed controller does not require any prior training. The performance of the designed controller is compared with that of a fixed-gain SNI controller, a fuzzy-SNI controller, and a conventional PID controller in a series of numerical simulations. Furthermore, the stability of the proposed controller and the adaptive laws is proven using the NI theorem.
Submitted 25 March, 2022;
originally announced March 2022.
-
Frontier-led Swarming: Robust Multi-Robot Coverage of Unknown Environments
Authors:
Vu Phi Tran,
Matthew A. Garratt,
Kathryn Kasmarik,
Sreenatha G. Anavatti
Abstract:
This paper proposes a novel swarm-based control algorithm for exploration and coverage of unknown environments, while maintaining a formation that permits short-range communication. The algorithm combines two elements: swarm rules for maintaining a close-knit formation and frontier search for driving exploration and coverage. Inspired by natural systems in which large numbers of simple agents (e.g., schooling fish, flocking birds, swarming insects) perform complicated collective behaviors for efficiency and safety, the first element uses three simple rules to maintain a swarm formation. The second element provides a means to select promising regions to explore (and cover) by minimising a cost function involving robots' relative distance to frontier cells and the frontier's size. We tested the performance of our approach on heterogeneous and homogeneous groups of mobile robots in different environments. We measure both coverage performance and swarm formation statistics as indicators of the robots' ability to explore effectively while maintaining a formation conducive to short-range communication. Through a series of comparison experiments, we demonstrate that our proposed strategy has superior performance to recently presented map coverage methodologies and conventional swarming methods.
Submitted 25 January, 2022; v1 submitted 28 November, 2021;
originally announced November 2021.
-
Facial Recognition in Collaborative Learning Videos
Authors:
Phuong Tran,
Marios Pattichis,
Sylvia Celedón-Pattichis,
Carlos LópezLeiva
Abstract:
Face recognition in collaborative learning videos presents many challenges. In collaborative learning videos, students sit around a typical table at different positions relative to the recording camera, come and go, move around, and get partially or fully occluded. Furthermore, the videos tend to be very long, requiring the development of fast and accurate methods. We develop a dynamic system for recognizing participants in collaborative learning videos. We address occlusion and recognition failures by using past information about the face detection history. We address the need for detecting faces from different poses and the need for speed by associating each participant with a collection of prototype faces computed through sampling or K-means clustering. Our results show that the proposed system is fast and accurate. We also compare our system against a baseline system that uses InsightFace [2] and the original training video segments. We achieved an average accuracy of 86.2% compared to 70.8% for the baseline system. On average, our recognition rate was 28.1 times faster than the baseline system.
Submitted 25 October, 2021;
originally announced October 2021.
-
Lightweight Transformer in Federated Setting for Human Activity Recognition
Authors:
Ali Raza,
Kim Phuc Tran,
Ludovic Koehl,
Shujun Li,
Xianyi Zeng,
Khaled Benzaidi
Abstract:
Human activity recognition (HAR) is a machine learning task with important applications in healthcare, especially in the context of home care of patients and older adults. HAR is often based on data collected from smart sensors, particularly smart home IoT devices such as smartphones, wearables and other body sensors. Deep learning techniques like convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been used for HAR, both in centralized and federated settings. However, these techniques have certain limitations: RNNs cannot be easily parallelized, CNNs are limited in sequence length, and both are computationally expensive. Moreover, in home healthcare applications the centralized approach can raise serious privacy concerns, since the sensors used by a HAR classifier collect a lot of highly personal and sensitive data about people in the home. In this paper, to address some of the challenges facing HAR, we propose a novel lightweight (one-patch) transformer, which can combine the advantages of RNNs and CNNs without their major limitations, and also TransFed, a more privacy-friendly, federated learning-based HAR classifier using our proposed lightweight transformer. We designed a testbed to construct a new HAR dataset from five recruited human participants, and used the new dataset to evaluate the performance of the proposed HAR classifier in both federated and centralized settings. Additionally, we use another public dataset to evaluate the performance of the proposed HAR classifier in a centralized setting to compare it with existing HAR classifiers. The experimental results showed that our proposed new solution outperformed state-of-the-art HAR classifiers based on CNNs and RNNs, while being more computationally efficient.
Submitted 4 November, 2022; v1 submitted 1 October, 2021;
originally announced October 2021.
-
Designing ECG Monitoring Healthcare System with Federated Transfer Learning and Explainable AI
Authors:
Ali Raza,
Kim Phuc Tran,
Ludovic Koehl,
Shujun Li
Abstract:
Deep learning plays a vital role in classifying different arrhythmias using electrocardiography (ECG) data. Nevertheless, training deep learning models normally requires a large amount of data, and it can lead to privacy concerns. Unfortunately, a large amount of healthcare data cannot be easily collected from a single silo. Additionally, deep learning models are black boxes, with no explainability of the predicted results, which is often required in clinical healthcare. This limits the application of deep learning in real-world health systems. In this paper, we design a new explainable artificial intelligence (XAI) based deep learning framework in a federated setting for ECG-based healthcare applications. The federated setting is used to solve issues such as data availability and privacy concerns. Furthermore, the proposed framework effectively classifies arrhythmias using an autoencoder and a classifier, both based on a convolutional neural network (CNN). Additionally, we propose an XAI-based module on top of the proposed classifier to explain the classification results, which helps clinical practitioners make quick and reliable decisions. The proposed framework was trained and tested using the MIT-BIH Arrhythmia database. The classifier achieved accuracy up to 94% and 98% for arrhythmia detection using noisy and clean data, respectively, with five-fold cross-validation.
Submitted 10 January, 2022; v1 submitted 26 May, 2021;
originally announced May 2021.
-
Explore Image Deblurring via Blur Kernel Space
Authors:
Phong Tran,
Anh Tran,
Quynh Phung,
Minh Hoai
Abstract:
This paper introduces a method to encode the blur operators of an arbitrary dataset of sharp-blur image pairs into a blur kernel space. Assuming the encoded kernel space is close enough to in-the-wild blur operators, we propose an alternating optimization algorithm for blind image deblurring. It approximates an unseen blur operator by a kernel in the encoded space and searches for the corresponding sharp image. Unlike recent deep-learning-based methods, our system can handle unseen blur kernels, while avoiding the complicated handcrafted priors on the blur operator often found in classical methods. Due to the method's design, the encoded kernel space is fully differentiable and thus can be easily adopted in deep neural network models. Moreover, our method can be used for blur synthesis by transferring existing blur operators from a given dataset into a new domain. Finally, we provide experimental results to confirm the effectiveness of the proposed method.
Submitted 3 April, 2021; v1 submitted 1 April, 2021;
originally announced April 2021.
-
SSLayout360: Semi-Supervised Indoor Layout Estimation from 360-Degree Panorama
Authors:
Phi Vu Tran
Abstract:
Recent years have seen flourishing research on both semi-supervised learning and 3D room layout reconstruction. In this work, we explore the intersection of these two fields to advance the research objective of enabling more accurate 3D indoor scene modeling with less labeled data. We propose the first approach to learn representations of room corners and boundaries by using a combination of labeled and unlabeled data for improved layout estimation in a 360-degree panoramic scene. Through extensive comparative experiments, we demonstrate that our approach can advance layout estimation of complex indoor scenes using as few as 20 labeled examples. When coupled with a layout predictor pre-trained on synthetic data, our semi-supervised method matches the fully supervised counterpart using only 12% of the labels. Our work takes an important first step towards robust semi-supervised layout estimation that can enable many applications in 3D perception with limited labeled data.
Submitted 16 May, 2021; v1 submitted 25 March, 2021;
originally announced March 2021.
-
FineNet: Frame Interpolation and Enhancement for Face Video Deblurring
Authors:
Phong Tran,
Anh Tran,
Thao Nguyen,
Minh Hoai
Abstract:
The objective of this work is to deblur face videos. We propose a method that tackles this problem from two directions: (1) enhancing the blurry frames, and (2) treating the blurry frames as missing values and estimating them by interpolation. These approaches are complementary to each other, and their combination outperforms individual ones. We also introduce a novel module that leverages the structure of faces for finding positional offsets between video frames. This module can be integrated into the processing pipelines of both approaches, improving the quality of the final outcome. Experiments on three real and synthetically generated blurry video datasets show that our method outperforms the previous state-of-the-art methods by a large margin in terms of both quantitative and qualitative results.
Submitted 1 March, 2021;
originally announced March 2021.
-
A Deep Reinforcement Learning Based Multi-Criteria Decision Support System for Textile Manufacturing Process Optimization
Authors:
Zhenglei He,
Kim Phuc Tran,
Sebastien Thomassey,
Xianyi Zeng,
Jie Xu,
Chang Haiyi
Abstract:
Textile manufacturing is a typical traditional industry involving highly complex, interconnected processes and limited capacity for applying modern technologies. Decision-making in this domain generally takes multiple criteria into consideration, which usually adds further complexity. To address this issue, the present paper proposes a decision support system that combines intelligent data-based random forest (RF) models with a human knowledge based analytical hierarchical process (AHP) multi-criteria structure, in accordance with the objective and subjective factors of the textile manufacturing process. More importantly, the textile manufacturing process is formulated as a Markov decision process (MDP), and a deep reinforcement learning scheme, Deep Q-networks (DQN), is employed to optimize it. The effectiveness of this system has been validated in a case study of optimizing a textile ozonation process, showing that it can better handle the challenging decision-making tasks in textile manufacturing processes.
Submitted 29 December, 2020;
originally announced December 2020.
-
Multi-Objective Optimization of the Textile Manufacturing Process Using Deep-Q-Network Based Multi-Agent Reinforcement Learning
Authors:
Zhenglei He,
Kim Phuc Tran,
Sebastien Thomassey,
Xianyi Zeng,
Jie Xu,
Changhai Yi
Abstract:
Multi-objective optimization of the textile manufacturing process is an increasing challenge because of the growing complexity involved in the development of the textile industry. The use of intelligent techniques has often been discussed in this domain. Although significant improvements from certain successful applications have been reported, the traditional methods failed to work with high-as well as human intervention. To address this, this paper proposed a multi-agent reinforcement learning (MARL) framework to transform the optimization process into a stochastic game and introduced the deep Q-networks algorithm to train the multiple agents. A utilitarian selection mechanism was employed in the stochastic game, which (ε-greedy policy) in each state to avoid the interruption of multiple equilibria and achieve the correlated equilibrium optimal solutions of the optimizing process. The case study results show that the proposed MARL system can achieve the optimal solutions for the textile ozonation process and performs better than the traditional approaches.
Submitted 2 December, 2020;
originally announced December 2020.
-
Deep4Air: A Novel Deep Learning Framework for Airport Airside Surveillance
Authors:
Phat Thai,
Sameer Alam,
Nimrod Lilith,
Phu N. Tran,
Binh Nguyen Thanh
Abstract:
An airport runway and taxiway (airside) area is a highly dynamic and complex environment featuring interactions between different types of vehicles (speed and dimension), under varying visibility and traffic conditions. Airport ground movements are deemed safety-critical activities, and safe-separation procedures must be maintained by Air Traffic Controllers (ATCs). Large airports with complicated runway-taxiway systems use advanced ground surveillance systems. However, these systems have inherent limitations and a lack of real-time analytics. In this paper, we propose a novel computer-vision based framework, namely "Deep4Air", which can not only augment the ground surveillance systems via the automated visual monitoring of runways and taxiways for aircraft location, but also provide real-time speed and distance analytics for aircraft on runways and taxiways. The proposed framework includes an adaptive deep neural network for efficiently detecting and tracking aircraft. The experimental results show an average precision of detection and tracking of up to 99.8% on simulated data with validations on surveillance videos from the digital tower at George Bush Intercontinental Airport. The results also demonstrate that "Deep4Air" can locate aircraft positions relative to the airport runway and taxiway infrastructure with high accuracy. Furthermore, aircraft speed and separation distance are monitored in real-time, providing enhanced safety management.
Submitted 21 July, 2021; v1 submitted 2 October, 2020;
originally announced October 2020.
-
A reinforcement learning based decision support system in textile manufacturing process
Authors:
Zhenglei He,
Kim Phuc Tran,
Sébastien Thomassey,
Xianyi Zeng,
Changhai Yi
Abstract:
This paper introduces a reinforcement learning based decision support system for the textile manufacturing process. A solution optimization problem of color fading ozonation is discussed and set up as a Markov Decision Process (MDP) in terms of the tuple {S, A, P, R}. Q-learning is used to train an agent through interaction with this environment by accumulating the reward R. The application results show that the proposed MDP model expresses the optimization problem of the textile manufacturing process well; reinforcement learning is therefore shown to be applicable to decision support in this sector, with promising prospects.
Submitted 20 May, 2020;
originally announced May 2020.
-
Continuous Deep Hierarchical Reinforcement Learning for Ground-Air Swarm Shepherding
Authors:
Hung The Nguyen,
Tung Duy Nguyen,
Vu Phi Tran,
Matthew Garratt,
Kathryn Kasmarik,
Sreenatha Anavatti,
Michael Barlow,
Hussein A. Abbass
Abstract:
The control and guidance of multi-robots (swarm) is a non-trivial problem due to the complexity inherent in the coupled interaction among the group. Whether the swarm is cooperative or non-cooperative, lessons can be learnt from sheepdogs herding sheep. Biomimicry of shepherding offers computational methods for swarm control with the potential to generalize and scale in different environments. However, learning to shepherd is complex due to the large search space that a machine learner faces. We present a deep hierarchical reinforcement learning approach for shepherding, whereby an unmanned aerial vehicle (UAV) learns to act as an aerial sheepdog to control and guide a swarm of unmanned ground vehicles (UGVs). The approach extends our previous work on machine education to decompose the search space into a hierarchically organized curriculum. Each lesson in the curriculum is learnt by a deep reinforcement learning model. The hierarchy is formed by fusing the outputs of the model. The approach is demonstrated first in a high-fidelity robotic-operating-system (ROS)-based simulation environment, then with physical UGVs and a UAV in an indoor testing facility. We investigate the ability of the method to generalize as the models move from simulation to the real world and as the models move from one scale to another.
Submitted 26 August, 2020; v1 submitted 24 April, 2020;
originally announced April 2020.
-
ModelHub.AI: Dissemination Platform for Deep Learning Models
Authors:
Ahmed Hosny,
Michael Schwier,
Christoph Berger,
Evin P Örnek,
Mehmet Turan,
Phi V Tran,
Leon Weninger,
Fabian Isensee,
Klaus H Maier-Hein,
Richard McKinley,
Michael T Lu,
Udo Hoffmann,
Bjoern Menze,
Spyridon Bakas,
Andriy Fedorov,
Hugo JWL Aerts
Abstract:
Recent advances in artificial intelligence research have led to a profusion of studies that apply deep learning to problems in image analysis and natural language processing, among others. Additionally, the availability of open-source computational frameworks has lowered the barriers to implementing state-of-the-art methods across multiple domains. Although these advances have led to major performance breakthroughs in some tasks, effective dissemination of deep learning algorithms remains challenging, inhibiting reproducibility and benchmarking studies, impeding further validation, and ultimately hindering their effectiveness in the cumulative scientific progress. In developing a platform for sharing research outputs, we present ModelHub.AI (www.modelhub.ai), a community-driven container-based software engine and platform for the structured dissemination of deep learning models. For contributors, the engine controls data flow throughout the inference cycle, while the contributor-facing standard template exposes model-specific functions including inference, as well as pre- and post-processing. Python and RESTful application programming interfaces (APIs) enable users to interact with models hosted on ModelHub.AI and allow both researchers and developers to utilize models out-of-the-box. ModelHub.AI is domain-, data-, and framework-agnostic, catering to different workflows and contributors' preferences.
Submitted 26 November, 2019;
originally announced November 2019.
-
Exploring Self-Supervised Regularization for Supervised and Semi-Supervised Learning
Authors:
Phi Vu Tran
Abstract:
Recent advances in semi-supervised learning have shown tremendous potential in overcoming a major barrier to the success of modern machine learning algorithms: access to vast amounts of human-labeled training data. Previous algorithms based on consistency regularization can harness the abundance of unlabeled data to produce impressive results on a number of semi-supervised benchmarks, approaching the performance of strong supervised baselines using only a fraction of the available labeled data. In this work, we challenge the long-standing success of consistency regularization by introducing self-supervised regularization as the basis for combining semantic feature representations from unlabeled data. We perform extensive comparative experiments to demonstrate the effectiveness of self-supervised regularization for supervised and semi-supervised image classification on SVHN, CIFAR-10, and CIFAR-100 benchmark datasets. We present two main results: (1) models augmented with self-supervised regularization significantly improve upon traditional supervised classifiers without the need for unlabeled data; (2) together with unlabeled data, our models yield semi-supervised performance competitive with, and in many cases exceeding, prior state-of-the-art consistency baselines. Lastly, our models have the practical utility of being efficiently trained end-to-end and require no additional hyper-parameters to tune for optimal performance beyond the standard set for training neural networks. Reference code and data are available at https://github.com/vuptran/sesemi
Submitted 21 November, 2019; v1 submitted 25 June, 2019;
originally announced June 2019.
-
Wearable Sensor Data Based Human Activity Recognition using Machine Learning: A new approach
Authors:
H. D. Nguyen,
K. P. Tran,
X. Zeng,
L. Koehl,
G. Tartare
Abstract:
Recent years have witnessed the rapid development of human activity recognition (HAR) based on wearable sensor data. One can find many practical applications in this area, especially in the field of health care. Many machine learning algorithms such as Decision Trees, Support Vector Machine, Naive Bayes, K-Nearest Neighbor, and Multilayer Perceptron are successfully used in HAR. Although these methods are fast and easy to implement, they still have some limitations due to poor performance in a number of situations. In this paper, we propose a novel method based on ensemble learning to boost the performance of these machine learning methods for HAR.
Submitted 9 May, 2019;
originally announced May 2019.
-
An extended polygonal finite element method for large deformation fracture analysis
Authors:
Hai D. Huynh,
Phuong Tran,
Xiaoying Zhuang,
H. Nguyen-Xuan
Abstract:
The modeling of large deformation fracture mechanics has been a challenging problem with regard to the accuracy of numerical methods and their ability to deal with considerable changes in the deformation of meshes in the presence of cracks. This paper further investigates the extended finite element method (XFEM) for the simulation of large strain fracture for hyper-elastic materials, in particular rubber ones. A crucial idea is to first use a polygonal mesh to discretize the domain, and then to establish a local refinement of structured meshes in the vicinity of the discontinuities. Due to differences in the size and type of elements at the boundaries of those two regions, hanging nodes produced in the modified mesh are treated as normal nodes in an arbitrary polygonal element. Conforming these special elements becomes straightforward through the flexible use of basis functions over polygonal elements. The results of this study are shown through several numerical examples that demonstrate its efficiency and accuracy in comparison with earlier results.
Submitted 20 March, 2019; v1 submitted 9 March, 2019;
originally announced March 2019.
-
Time-Varying Formation Control of a Collaborative Multi-Agent System Using Negative-Imaginary Systems Theory
Authors:
Vu Phi Tran,
Matthew Garratt,
Ian R. Petersen
Abstract:
The movement of cooperative robots in a densely cluttered environment may not be possible if the formation type is invariant. Hence, we investigate a new method for time-varying formation control for a group of heterogeneous autonomous vehicles, which may include Unmanned Ground Vehicles (UGV) and Unmanned Aerial Vehicles (UAV). We have extended a Negative-Imaginary (NI) consensus control approach to switch the formation shape of the robots whilst only using the relative distance between agents and between agents and obstacles. All agents can automatically create a new safe formation to overcome obstacles based on a novel geometric method, then restore the prototype formation once the obstacles are cleared. Furthermore, we improve the position consensus at sharp corners by achieving yaw consensus between robots. Simulation and experimental results are then analyzed to validate the feasibility of our proposed approach.
Submitted 15 November, 2018;
originally announced November 2018.
-
Distributed Obstacle and Multi-Robot Collision Avoidance in Uncertain Environments
Authors:
Vu Phi Tran,
Matthew Garratt,
Ian R. Petersen
Abstract:
This paper tackles the distributed leader-follower (L-F) control problem for heterogeneous mobile robots in unknown environments requiring obstacle avoidance, inter-robot collision avoidance, and reliable robot communications. To prevent inter-robot collisions, we employ a virtual propulsive force between robots. For obstacle avoidance, we present a novel distributed Negative-Imaginary (NI) variant formation tracking control approach and a dynamic network topology methodology which allows the formation to change its shape and the robots to switch their roles. In the case of communication or sensor loss, a UAV, controlled by a Strictly-Negative-Imaginary (SNI) controller with good wind resistance characteristics, is utilized to track the position of the UGV formation using its camera. Simulations and indoor experiments have been conducted to validate the proposed methods.
Submitted 15 November, 2018;
originally announced November 2018.
-
Multi-Task Graph Autoencoders
Authors:
Phi Vu Tran
Abstract:
We examine two fundamental tasks associated with graph representation learning: link prediction and node classification. We present a new autoencoder architecture capable of learning a joint representation of local graph structure and available node features for the simultaneous multi-task learning of unsupervised link prediction and semi-supervised node classification. Our simple, yet effective and versatile model is efficiently trained end-to-end in a single stage, whereas previous related deep graph embedding methods require multiple training steps that are difficult to optimize. We provide an empirical evaluation of our model on five benchmark relational, graph-structured datasets and demonstrate significant improvement over three strong baselines for graph representation learning. Reference code and data are available at https://github.com/vuptran/graph-representation-learning
Submitted 7 November, 2018;
originally announced November 2018.
-
Transparency by Design: Closing the Gap Between Performance and Interpretability in Visual Reasoning
Authors:
David Mascharka,
Philip Tran,
Ryan Soklaski,
Arjun Majumdar
Abstract:
Visual question answering requires high-order reasoning about an image, which is a fundamental capability needed by machine systems to follow complex directives. Recently, modular networks have been shown to be an effective framework for performing visual reasoning tasks. While modular networks were initially designed with a degree of model transparency, their performance on complex visual reasoning benchmarks was lacking. Current state-of-the-art approaches do not provide an effective mechanism for understanding the reasoning process. In this paper, we close the performance gap between interpretable models and state-of-the-art visual reasoning methods. We propose a set of visual-reasoning primitives which, when composed, manifest as a model capable of performing complex reasoning tasks in an explicitly-interpretable manner. The fidelity and interpretability of the primitives' outputs enable an unparalleled ability to diagnose the strengths and weaknesses of the resulting model. Critically, we show that these primitives are highly performant, achieving state-of-the-art accuracy of 99.1% on the CLEVR dataset. We also show that our model is able to effectively learn generalized representations when provided a small amount of data containing novel object attributes. Using the CoGenT generalization task, we show more than a 20 percentage point improvement over the current state of the art.
Submitted 2 July, 2018; v1 submitted 14 March, 2018;
originally announced March 2018.
-
Learning to Make Predictions on Graphs with Autoencoders
Authors:
Phi Vu Tran
Abstract:
We examine two fundamental tasks associated with graph representation learning: link prediction and semi-supervised node classification. We present a novel autoencoder architecture capable of learning a joint representation of both local graph structure and available node features for the multi-task learning of link prediction and node classification. Our autoencoder architecture is efficiently trained end-to-end in a single learning stage to simultaneously perform link prediction and node classification, whereas previous related methods require multiple training steps that are difficult to optimize. We provide a comprehensive empirical evaluation of our models on nine benchmark graph-structured datasets and demonstrate significant improvement over related methods for graph representation learning. Reference code and data are available at https://github.com/vuptran/graph-representation-learning
Submitted 29 July, 2018; v1 submitted 22 February, 2018;
originally announced February 2018.
-
A Fully Convolutional Neural Network for Cardiac Segmentation in Short-Axis MRI
Authors:
Phi Vu Tran
Abstract:
Automated cardiac segmentation from magnetic resonance imaging datasets is an essential step in the timely diagnosis and management of cardiac pathologies. We propose to tackle the problem of automated left and right ventricle segmentation through the application of a deep fully convolutional neural network architecture. Our model is efficiently trained end-to-end in a single learning stage from whole-image inputs and ground truths to make inference at every pixel. To our knowledge, this is the first application of a fully convolutional neural network architecture for pixel-wise labeling in cardiac magnetic resonance imaging. Numerical experiments demonstrate that our model is robust and outperforms previous fully automated methods across multiple evaluation measures on a range of cardiac datasets. Moreover, our model is fast and can leverage commodity compute resources such as the graphics processing unit to enable state-of-the-art cardiac segmentation at massive scales. The models and code are available at https://github.com/vuptran/cardiac-segmentation
Submitted 26 April, 2017; v1 submitted 2 April, 2016;
originally announced April 2016.
-
Novel Intrusion Detection using Probabilistic Neural Network and Adaptive Boosting
Authors:
Tich Phuoc Tran,
Longbing Cao,
Dat Tran,
Cuong Duc Nguyen
Abstract:
This article applies Machine Learning techniques to solve Intrusion Detection problems within computer networks. Due to the complex and dynamic nature of computer networks and hacking techniques, detecting malicious activities remains a challenging task for security experts; that is, currently available defense systems suffer from low detection capability and a high number of false alarms. To overcome such performance limitations, we propose a novel Machine Learning algorithm, namely Boosted Subspace Probabilistic Neural Network (BSPNN), which integrates an adaptive boosting technique and a semi-parametric neural network to obtain a good tradeoff between accuracy and generality. As a result, learning bias and generalization variance can be significantly reduced. Substantial experiments on the KDD 99 intrusion benchmark indicate that our model outperforms other state-of-the-art learning algorithms, with significantly improved detection accuracy, minimal false alarms, and relatively small computational complexity.
Submitted 2 November, 2009;
originally announced November 2009.