
Advancing UWF-SLO Vessel Segmentation with Source-Free Active Domain Adaptation and a Novel Multi-Center Dataset

Authors: Hongqiu Wang, Xiangde Luo, Wu Chen, Qingqing Tang, Mei Xin, Qiong Wang, Lei Zhu

Abstract: Accurate vessel segmentation in Ultra-Wide-Field Scanning Laser Ophthalmoscopy (UWF-SLO) images is crucial for diagnosing retinal diseases. Although recent techniques have shown encouraging results in vessel segmentation, models trained on one medical dataset often underperform on others due to domain shifts. Meanwhile, manually labeling high-resolution UWF-SLO images is an extremely challenging, time-consuming, and expensive task. In response, this study introduces a pioneering framework that leverages a patch-based active domain adaptation approach. By actively recommending a few valuable image patches for labeling and model fine-tuning, selected with the devised Cascade Uncertainty-Predominance (CUP) strategy, our method significantly improves the accuracy of UWF-SLO vessel segmentation across diverse medical centers. In addition, we annotate and construct the first Multi-center UWF-SLO Vessel Segmentation (MU-VS) dataset, comprising data from multiple institutions, to promote research on this topic. This dataset serves as a valuable resource for cross-center evaluation, verifying the effectiveness and robustness of our approach. Experimental results demonstrate that our approach surpasses existing domain adaptation and active learning methods, considerably reducing the gap between the Upper and Lower bounds with minimal annotations, which highlights our method's practical clinical value. We will release our dataset and code to facilitate relevant research: https://github.com/whq-xxh/SFADA-UWF-SLO.

Submitted 19 June, 2024; originally announced June 2024.

Comments: MICCAI 2024 Early Accept
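The abstract does not define the Cascade Uncertainty-Predominance (CUP) criterion, so the sketch below is not CUP. It only illustrates the generic idea behind uncertainty-driven patch recommendation: tile a predicted vessel-probability map into patches, score each patch by its mean per-pixel binary entropy, and return the k most uncertain patches as labeling candidates. The function names, the non-overlapping tiling, and the toy map are hypothetical.

```python
import math
from typing import List, Tuple

def binary_entropy(p: float) -> float:
    """Entropy (in bits) of a per-pixel vessel probability; 0 when p is 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def rank_patches_by_entropy(prob_map: List[List[float]], patch: int,
                            k: int) -> List[Tuple[int, int, float]]:
    """Hypothetical helper: split a 2D probability map into non-overlapping
    patch x patch tiles, score each tile by mean pixel entropy, and return the
    k most uncertain tiles as (top_row, left_col, score)."""
    h, w = len(prob_map), len(prob_map[0])
    scores = []
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            total = sum(binary_entropy(prob_map[r + i][c + j])
                        for i in range(patch) for j in range(patch))
            scores.append((r, c, total / (patch * patch)))
    # Most uncertain first; Python's sort is stable, so ties keep scan order.
    scores.sort(key=lambda t: t[2], reverse=True)
    return scores[:k]

if __name__ == "__main__":
    # Toy 4x4 map: the top-right 2x2 tile is maximally uncertain (p = 0.5).
    toy = [
        [0.0, 0.0, 0.5, 0.5],
        [0.0, 0.0, 0.5, 0.5],
        [1.0, 1.0, 0.9, 0.9],
        [1.0, 1.0, 0.9, 0.9],
    ]
    print(rank_patches_by_entropy(toy, patch=2, k=1))  # -> [(0, 2, 1.0)]
```

Pure entropy ranking tends to pick redundant, visually similar patches; the "cascade" and "predominance" parts of CUP presumably address something beyond this, but the abstract gives no details.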

arXiv:2402.18811

BFRFormer: Transformer-based generator for Real-World Blind Face Restoration

Authors: Guojing Ge, Qi Song, Guibo Zhu, Yuting Zhang, Jinglu Chen, Miao Xin, Ming Tang, Jinqiao Wang

Abstract: Blind face restoration is a challenging task due to unknown and complex degradations. Although face prior-based and reference-based methods have recently demonstrated high-quality results, the restored images tend to be over-smoothed and lose identity-preserving details when the degradation is severe. We observe that this is attributable to short-range dependencies, an intrinsic limitation of convolutional neural networks. To model long-range dependencies, we propose a Transformer-based blind face restoration method, named BFRFormer, that reconstructs images with more identity-preserving details in an end-to-end manner. In BFRFormer, a wavelet discriminator and an aggregated attention module are developed to remove blocking artifacts, and spectral normalization and balanced consistency regulation are adaptively applied to address training instability and over-fitting, respectively. Extensive experiments show that our method outperforms state-of-the-art methods on a synthetic dataset and four real-world datasets. The source code, Casia-Test dataset, and pre-trained models are released at https://github.com/s8Znk/BFRFormer.

Submitted 28 February, 2024; originally announced February 2024.

Comments: Accepted by ICASSP 2024
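Spectral normalization, which the abstract lists as the remedy for training instability, divides a weight matrix W by an estimate of its largest singular value sigma(W), bounding the layer's Lipschitz constant at about 1. The minimal sketch below estimates sigma(W) by power iteration on plain Python lists; the function names, the deterministic start vector, and the iteration count are assumptions, and the abstract does not say how BFRFormer configures the technique.

```python
import math
from typing import List

Matrix = List[List[float]]

def _matvec(W: Matrix, v: List[float]) -> List[float]:
    """Compute W v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def _matTvec(W: Matrix, u: List[float]) -> List[float]:
    """Compute W^T u without building the transpose."""
    return [sum(W[i][j] * u[i] for i in range(len(W))) for j in range(len(W[0]))]

def _normalize(v: List[float]) -> List[float]:
    n = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against a zero vector
    return [x / n for x in v]

def spectral_norm(W: Matrix, iters: int = 50) -> float:
    """Estimate the largest singular value of W by power iteration."""
    # Deterministic, non-uniform start vector (an assumption; library
    # implementations typically start from a random vector). Power iteration
    # fails only if the start is orthogonal to the top right singular vector.
    v = _normalize([1.0 / (j + 1) for j in range(len(W[0]))])
    for _ in range(iters):
        u = _normalize(_matvec(W, v))    # u <- W v / ||W v||
        v = _normalize(_matTvec(W, u))   # v <- W^T u / ||W^T u||
    # For the converged unit vector v, sigma = u^T W v = ||W v||.
    return math.sqrt(sum(x * x for x in _matvec(W, v)))

def spectrally_normalize(W: Matrix, iters: int = 50) -> Matrix:
    """Return W / sigma(W), whose largest singular value is approximately 1."""
    sigma = spectral_norm(W, iters)
    return [[w / sigma for w in row] for row in W]

if __name__ == "__main__":
    W = [[1.0, 2.0], [2.0, 1.0]]  # singular values are 3 and 1
    print(spectral_norm(W))
    print(spectral_norm(spectrally_normalize(W)))
```

In a GAN discriminator this rescaling is typically applied to every layer's weights on each forward pass, often with a single power-iteration step that reuses the previous vector.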

arXiv:2402.14299

We Choose to Go to Space: Agent-driven Human and Multi-Robot Collaboration in Microgravity

Authors: Miao Xin, Zhongrui You, Zihan Zhang, Taoran Jiang, Tingjia Xu, Haotian Liang, Guojing Ge, Yuchen Ji, Shentong Mo, Jian Cheng

Abstract: We present SpaceAgents-1, a system for learning human and multi-robot collaboration (HMRC) strategies under microgravity conditions. Future space exploration requires humans to work together with robots. However, acquiring proficient robot skills and adept collaboration under microgravity conditions poses significant challenges within ground laboratories. To address this issue, we develop a microgravity simulation environment and present three typical configurations of intra-cabin robots. We propose a hierarchical heterogeneous multi-agent collaboration architecture: guided by foundation models, a Decision-Making Agent serves as a task planner for human-robot collaboration, while individual Skill-Expert Agents manage the embodied control of robots. This mechanism empowers the SpaceAgents-1 system to execute a range of intricate long-horizon HMRC tasks.

Submitted 22 February, 2024; originally announced February 2024.

arXiv:2312.10885

A novel diffusion recommendation algorithm based on multi-scale cnn and residual lstm

Authors: Yong Niu, Xing Xing, Zhichun Jia, Ruidi Liu, Mindong Xin

Abstract: Sequential recommendation aims to infer user preferences from historical interaction sequences and predict the next item a user is likely to be interested in. The current mainstream design approach represents items as fixed vectors, capturing the underlying relationships between items and user preferences based on the order of interactions. However, relying on a single fixed item embedding may weaken the modeling capability of the system, and the global dynamics and local saliency exhibited by user preferences need to be distinguished. To address these issues, this paper proposes a novel diffusion recommendation algorithm based on multi-scale CNN and residual LSTM (AREAL). We introduce diffusion models into the recommender system, representing items as probability distributions instead of fixed vectors. This approach adaptively reflects multiple aspects of the items and generates item distributions in a denoising manner. We use multi-scale CNN and residual LSTM methods to extract the local and global dependency features of user history interactions, and use an attention mechanism to weight these features, which then serve as guide features for the reverse diffusion recovery. The effectiveness of the proposed method is validated through experiments on two real-world datasets. Specifically, AREAL improves over the best baselines by 2.63% and 4.25% in terms of HR@20 and by 5.05% and 3.94% in terms of NDCG@20 on the two datasets, respectively.

Submitted 20 December, 2023; v1 submitted 17 December, 2023; originally announced December 2023.

Comments: This paper needs further revision, including the ablation experiments, the model framework, and other information in Chapter 5. Two datasets are used instead of three, and there are many inaccuracies in the presentation that need to be corrected.
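For next-item prediction with a single held-out target item per user, the HR@20 and NDCG@20 metrics reported above reduce to simple formulas: a hit if the target appears in the top 20, and a discounted gain of 1/log2(rank + 1) at the 1-based position where it appears (the ideal DCG is 1 with one relevant item). The sketch assumes that single-target protocol and a precomputed ranked list per user; the abstract does not specify AREAL's exact evaluation setup (for example, full ranking versus sampled negatives), and the function names are illustrative.

```python
import math
from typing import Hashable, List, Sequence, Tuple

def hr_at_k(ranked: Sequence[Hashable], target: Hashable, k: int) -> float:
    """Hit Ratio@k for one user: 1 if the target is in the top k, else 0."""
    return 1.0 if target in ranked[:k] else 0.0

def ndcg_at_k(ranked: Sequence[Hashable], target: Hashable, k: int) -> float:
    """NDCG@k for one user with a single relevant item (ideal DCG = 1)."""
    top = list(ranked[:k])
    if target not in top:
        return 0.0
    rank = top.index(target) + 1          # 1-based position of the target
    return 1.0 / math.log2(rank + 1)

def evaluate(rankings: List[Sequence[Hashable]], targets: List[Hashable],
             k: int = 20) -> Tuple[float, float]:
    """Average HR@k and NDCG@k over users."""
    n = len(targets)
    hr = sum(hr_at_k(r, t, k) for r, t in zip(rankings, targets)) / n
    ndcg = sum(ndcg_at_k(r, t, k) for r, t in zip(rankings, targets)) / n
    return hr, ndcg

if __name__ == "__main__":
    # Three users, k = 2: hit at rank 1, hit at rank 2, miss.
    rankings = [["a", "b", "c"], ["c", "a", "b"], ["b", "c", "a"]]
    targets = ["a", "a", "z"]
    print(evaluate(rankings, targets, k=2))
```

HR only checks membership in the top k, while NDCG also rewards placing the target higher, which is why papers usually report both.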

arXiv:2310.20009

Nash or Stackelberg? -- A comparative study for game-theoretic AV decision-making

Authors: Brady Bateman, Ming Xin, H. Eric Tseng, Mushuang Liu

Abstract: This paper studies game-theoretic decision-making for autonomous vehicles (AVs). A receding-horizon multi-player game is formulated to model the AV decision-making problem. Two classes of games, Nash games and Stackelberg games, are developed. For each class, two solution settings, pairwise games and multi-player games, are introduced to solve the game in multi-agent scenarios. Comparative studies are conducted via statistical simulations to understand the performance of the two classes of games and of the two solution settings. The simulations are conducted in intersection-crossing scenarios, and game performance is quantified by three metrics: safety, travel efficiency, and computational time.

Submitted 30 October, 2023; originally announced October 2023.

Comments: 8 pages, submitted to ECC24
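To make the Nash/Stackelberg distinction concrete, the sketch below uses a toy two-vehicle, two-action intersection game with invented payoffs (they have nothing to do with the paper's receding-horizon cost functions). A pure Nash equilibrium is an action pair where neither player gains by deviating unilaterally; in a Stackelberg game the leader commits first, anticipating the follower's best response. All names and numbers here are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

Action = str
Payoffs = Dict[Tuple[Action, Action], Tuple[float, float]]

ACTIONS: List[Action] = ["go", "yield"]

# Hypothetical payoffs (row = leader, column = follower); higher is better.
PAYOFFS: Payoffs = {
    ("go", "go"): (-10.0, -10.0),      # collision
    ("go", "yield"): (2.0, 0.0),
    ("yield", "go"): (0.0, 2.0),
    ("yield", "yield"): (-1.0, -1.0),  # both wait and lose time
}

def pure_nash(payoffs: Payoffs, actions: List[Action]) -> List[Tuple[Action, Action]]:
    """All action pairs where each player's action is a best response to the other's."""
    eqs = []
    for a1, a2 in product(actions, actions):
        u1, u2 = payoffs[(a1, a2)]
        best1 = all(u1 >= payoffs[(b1, a2)][0] for b1 in actions)
        best2 = all(u2 >= payoffs[(a1, b2)][1] for b2 in actions)
        if best1 and best2:
            eqs.append((a1, a2))
    return eqs

def stackelberg(payoffs: Payoffs, actions: List[Action]) -> Tuple[Action, Action]:
    """Leader commits to the action that is best given the follower's best response.
    Ties in the follower's response are broken by list order (an assumption)."""
    best = None
    for a1 in actions:
        a2 = max(actions, key=lambda b: payoffs[(a1, b)][1])
        if best is None or payoffs[(a1, a2)][0] > payoffs[best][0]:
            best = (a1, a2)
    return best

if __name__ == "__main__":
    print(pure_nash(PAYOFFS, ACTIONS))    # -> [('go', 'yield'), ('yield', 'go')]
    print(stackelberg(PAYOFFS, ACTIONS))  # -> ('go', 'yield')
```

In this toy case the Nash formulation admits two equilibria, so the vehicles still face a coordination problem, whereas the leader-follower structure selects a single outcome.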

arXiv:2309.07694

Tree of Uncertain Thoughts Reasoning for Large Language Models

Authors: Shentong Mo, Miao Xin

Abstract: While the recently introduced Tree of Thoughts (ToT) has heralded advancements in allowing Large Language Models (LLMs) to reason through foresight and backtracking for global decision-making, it has overlooked the inherent local uncertainties in intermediate decision points or "thoughts". These local uncertainties, intrinsic to LLMs given their potential for diverse responses, remain a significant concern in the reasoning process. Addressing this pivotal gap, we introduce the Tree of Uncertain Thoughts (TouT), a reasoning framework tailored for LLMs. Our TouT effectively leverages Monte Carlo Dropout to quantify uncertainty scores associated with LLMs' diverse local responses at these intermediate steps. By marrying this local uncertainty quantification with global search algorithms, TouT enhances the model's precision in response generation. We substantiate our approach with rigorous experiments on two demanding planning tasks: Game of 24 and Mini Crosswords. The empirical evidence underscores TouT's superiority over both ToT and chain-of-thought prompting methods.

Submitted 14 September, 2023; originally announced September 2023.
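The TouT abstract says it uses Monte Carlo Dropout to attach uncertainty scores to intermediate thoughts. The sketch below shows only the MC Dropout mechanism, on a stand-in: a hypothetical toy linear "value head" (not an LLM) evaluated many times with dropout left on, so the mean of its outputs is a value estimate and their standard deviation is an uncertainty score. The risk-averse ranking rule `mean - lam * std` and all names are assumptions for illustration, not TouT's published scoring.

```python
import random
import statistics
from typing import List, Sequence, Tuple

def dropout_forward(weights: Sequence[float], features: Sequence[float],
                    p: float, rng: random.Random) -> float:
    """One stochastic pass of a toy linear value head with inverted dropout:
    each input unit is dropped with probability p, and kept units are scaled
    by 1 / (1 - p) so the expected output equals the deterministic pass."""
    keep = 1.0 - p
    return sum(w * x / keep if rng.random() >= p else 0.0
               for w, x in zip(weights, features))

def mc_dropout_score(weights: Sequence[float], features: Sequence[float],
                     p: float = 0.2, samples: int = 100,
                     seed: int = 0) -> Tuple[float, float]:
    """Keep dropout active at inference and run several passes; return the
    sample mean (value estimate) and sample standard deviation (uncertainty)."""
    rng = random.Random(seed)
    outs = [dropout_forward(weights, features, p, rng) for _ in range(samples)]
    return statistics.mean(outs), statistics.stdev(outs)

def rank_thoughts(thoughts: List[Tuple[str, Sequence[float]]],
                  weights: Sequence[float], lam: float = 1.0,
                  **kwargs) -> List[Tuple[str, float, float, float]]:
    """Rank candidate thoughts by mean - lam * std (assumed rule).
    Returns (name, score, mean, std) tuples, best first."""
    scored = []
    for name, feats in thoughts:
        mu, sd = mc_dropout_score(weights, feats, **kwargs)
        scored.append((name, mu - lam * sd, mu, sd))
    return sorted(scored, key=lambda t: t[1], reverse=True)

if __name__ == "__main__":
    # Both thoughts have the same expected value (2.0), but B's evidence rests
    # on a single feature, so dropout makes its output far more variable.
    thoughts = [("A", [0.5, 0.5, 0.5, 0.5]), ("B", [2.0, 0.0, 0.0, 0.0])]
    for name, score, mu, sd in rank_thoughts(thoughts, [1.0] * 4, samples=2000):
        print(f"{name}: score={score:.3f} mean={mu:.3f} std={sd:.3f}")
```

A search procedure such as breadth-first or depth-first search over the thought tree could then expand higher-scoring thoughts first; how TouT actually couples the scores with its global search is not stated in the abstract.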

arXiv:2207.05523

Integrating Vehicle Slip and Yaw in Overarching Multi-Tiered Automated Vehicle Steering Control to Balance Path Following Accuracy, Gracefulness, and Safety

Authors: Ming Xin, Mark A. Minor

Abstract: Balancing path-following accuracy and error convergence with graceful motion in steering control is challenging due to the competing nature of these requirements, especially across a range of operating speeds and conditions. This paper demonstrates that an integrated multi-tiered steering controller that considers the impact of slip on kinematic control, dynamic control, and steering actuator rate commands achieves accurate and graceful path following. This work is founded on multi-tiered sideslip and yaw-based models, which allow the derivation of controllers that account for error due to sideslip and for the mapping between steering commands and graceful lateral motion. Observer-based sideslip estimates are combined with heading error in the kinematic controller to provide feedforward slip compensation. Path-following error is compensated by a continuous Variable Structure Controller (VSC) using speed-based path manifolds to balance graceful motion and error convergence. The resulting yaw rate commands are used by a backstepping dynamic controller to generate steering rate commands. A High Gain Observer (HGO) estimates sideslip and yaw rate for output feedback control. Stability analysis of the output feedback controller is provided, and peaking is resolved. The work focuses on lateral control alone so that the steering controller can be combined with other speed controllers. Field results provide comparisons to related approaches, demonstrating gracefulness and accuracy in different complex scenarios with varied weather conditions and perturbations.

Submitted 12 July, 2022; originally announced July 2022.

arXiv:2107.09856

Firmware Re-hosting Through Static Binary-level Porting

Authors: Mingfeng Xin, Hui Wen, Liting Deng, Hong Li, Qiang Li, Limin Sun

Abstract: The rapid growth of the Industrial Internet of Things (IIoT) has brought embedded systems into focus as major targets for both security analysts and malicious adversaries. Due to non-standard hardware and diverse software, embedded devices present unique challenges to security analysts seeking to analyze firmware binaries accurately. The diversity of hardware components and the tight coupling between firmware and hardware make it hard to perform dynamic analysis, which requires executing firmware code in virtualized environments. However, emulating the wide range of hardware peripherals forces analysts to modify the emulator frequently to execute various firmware in different virtualized environments, greatly limiting security analysis. In this work, we explore the problem of firmware re-hosting for real-time operating systems (RTOS). Specifically, developers create a Board Support Package (BSP) and develop device drivers to make an RTOS run on their platform. By providing high-level replacements for BSP routines and device drivers, we can migrate firmware from its original hardware environment into a virtualized one with minimal modification. We present an approach that can execute firmware at scale by patching it automatically, without modifying existing emulators.
Our approach, called static binary-level porting, first identifies the BSP and device drivers in the target firmware, then patches the firmware with pre-built BSP routines and drivers that can be adapted to existing emulators. Finally, we demonstrate the practicality of the proposed method on multiple hardware platforms and firmware samples for security analysis. The results show that the approach is flexible enough to emulate firmware for vulnerability assessment and exploit development.

Submitted 29 July, 2021; v1 submitted 20 July, 2021; originally announced July 2021.

arXiv:2107.01189

NTIRE 2021 Multi-modal Aerial View Object Classification Challenge

Authors: Jerrick Liu, Nathan Inkawhich, Oliver Nina, Radu Timofte, Sahil Jain, Bob Lee, Yuru Duan, Wei Wei, Lei Zhang, Songzheng Xu, Yuxuan Sun, Jiaqi Tang, Xueli Geng, Mengru Ma, Gongzhe Li, Xueli Geng, Huanqia Cai, Chengxue Cai, Sol Cummings, Casian Miron, Alexandru Pasarica, Cheng-Yen Yang, Hung-Min Hsu, Jiarui Cai, Jie Mei, et al. (9 additional authors not shown)

Abstract: In this paper, we introduce the first Challenge on Multi-modal Aerial View Object Classification (MAVOC) in conjunction with the NTIRE 2021 workshop at CVPR. This challenge is composed of two tracks using EO and SAR imagery. EO and SAR sensors each have distinct advantages and drawbacks. The purpose of this competition is to analyze how to use both sets of sensory information in complementary ways. We discuss the top methods submitted to this competition and evaluate their results on our blind test set. Our challenge results show an accuracy improvement of more than 15% over our current baselines for each track of the competition.

Submitted 6 April, 2022; v1 submitted 2 July, 2021; originally announced July 2021.

Comments: The paper needs to be withdrawn since it did not properly go through the public release process. We will soon release a new version to replace this one.

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2021, 588-595

arXiv:2011.12105

Learning of Long-Horizon Sparse-Reward Robotic Manipulator Tasks with Base Controllers

Authors: Guangming Wang, Minjian Xin, Wenhua Wu, Zhe Liu, Hesheng Wang

Abstract: Deep Reinforcement Learning (DRL) enables robots to perform some intelligent tasks end-to-end. However, many challenges remain for long-horizon sparse-reward robotic manipulator tasks. On the one hand, a sparse-reward setting makes exploration inefficient. On the other hand, exploration using physical robots is costly and unsafe. In this paper, we propose a method for learning long-horizon sparse-reward tasks that utilizes one or more existing traditional controllers, which we call base controllers. Built upon Deep Deterministic Policy Gradients (DDPG), our algorithm incorporates the existing base controllers into the stages of exploration, value learning, and policy update. Furthermore, we present a straightforward way of synthesizing different base controllers to integrate their strengths. Through experiments ranging from stacking blocks to stacking cups, we demonstrate that the learned state-based or image-based policies consistently outperform the base controllers. Compared to previous work on learning from demonstrations, our method improves sample efficiency by orders of magnitude and also improves performance. Overall, our method has the potential to leverage existing industrial robot manipulation systems to build more flexible and intelligent controllers.

Submitted 4 December, 2021; v1 submitted 24 November, 2020; originally announced November 2020.

Comments: 10 pages, 6 figures, under review

arXiv:2004.10876

Flexible and Efficient Long-Range Planning Through Curious Exploration

Authors: Aidan Curtis, Minjian Xin, Dilip Arumugam, Kevin Feigelis, Daniel Yamins

Abstract: Identifying algorithms that flexibly and efficiently discover temporally-extended multi-phase plans is an essential step for the advancement of robotics and model-based reinforcement learning. The core problem of long-range planning is finding an efficient way to search through the tree of possible action sequences. Existing non-learned planning solutions from the Task and Motion Planning (TAMP) literature rely on the existence of logical descriptions for the effects and preconditions for actions. This constraint allows TAMP methods to efficiently reduce the tree search problem but limits their ability to generalize to unseen and complex physical environments. In contrast, deep reinforcement learning (DRL) methods use flexible neural-network-based function approximators to discover policies that generalize naturally to unseen circumstances. However, DRL methods struggle to handle the very sparse reward landscapes inherent to long-range multi-step planning situations. Here, we propose the Curious Sample Planner (CSP), which fuses elements of TAMP and DRL by combining a curiosity-guided sampling strategy with imitation learning to accelerate planning. We show that CSP can efficiently discover interesting and complex temporally-extended plans for solving a wide range of physically realistic 3D tasks. In contrast, standard planning and learning methods often fail to solve these tasks at all or do so only with a huge and highly variable number of training samples.
We explore the use of a variety of curiosity metrics with CSP and analyze the types of solutions that CSP discovers. Finally, we show that CSP supports task transfer so that the exploration policies learned during experience with one task can help improve efficiency on related tasks.

Submitted 8 July, 2020; v1 submitted 22 April, 2020; originally announced April 2020.

Showing 1–11 of 11 results for author: Xin, M