subscribe to arXiv mailings

RoboMorph: Evolving Robot Morphology using Large Language Models

Authors: Kevin Qiu, Krzysztof Ciebiera, Paweł Fijałkowski, Marek Cygan, Łukasz Kuciński

Abstract: We introduce RoboMorph, an automated approach for generating and optimizing modular robot designs using large language models (LLMs) and evolutionary algorithms. In this framework, we represent each robot design as a grammar and leverage the capabilities of LLMs to navigate the extensive robot design space, which is traditionally time-consuming and computationally demanding. By integrating automat… ▽ More We introduce RoboMorph, an automated approach for generating and optimizing modular robot designs using large language models (LLMs) and evolutionary algorithms. In this framework, we represent each robot design as a grammar and leverage the capabilities of LLMs to navigate the extensive robot design space, which is traditionally time-consuming and computationally demanding. By integrating automatic prompt design and a reinforcement learning based control algorithm, RoboMorph iteratively improves robot designs through feedback loops. Our experimental results demonstrate that RoboMorph can successfully generate nontrivial robots that are optimized for a single terrain while showcasing improvements in morphology over successive evolutions. Our approach demonstrates the potential of using LLMs for data-driven and modular robot design, providing a promising methodology that can be extended to other domains with similar design frameworks. △ Less

Submitted 11 July, 2024; originally announced July 2024.

arXiv:2406.10856 [pdf, other]

LEO Satellite Networks Assisted Geo-distributed Data Processing

Authors: Zhiyuan Zhao, Zhe Chen, Zheng Lin, Wenjun Zhu, Kun Qiu, Chaoqun You, Yue Gao

Abstract: Nowadays, the increasing deployment of edge clouds globally provides users with low-latency services. However, connecting an edge cloud to a core cloud via optic cables in terrestrial networks poses significant barriers due to the prohibitively expensive building cost of optic cables. Fortunately, emerging Low Earth Orbit (LEO) satellite networks (e.g., Starlink) offer a more cost-effective soluti… ▽ More Nowadays, the increasing deployment of edge clouds globally provides users with low-latency services. However, connecting an edge cloud to a core cloud via optic cables in terrestrial networks poses significant barriers due to the prohibitively expensive building cost of optic cables. Fortunately, emerging Low Earth Orbit (LEO) satellite networks (e.g., Starlink) offer a more cost-effective solution for increasing edge clouds, and hence large volumes of data in edge clouds can be transferred to a core cloud via those networks for time-sensitive big data tasks processing, such as attack detection. However, the state-of-the-art satellite selection algorithms bring poor performance for those processing via our measurements. Therefore, we propose a novel data volume aware satellite selection algorithm, named DVA, to support such big data processing tasks. DVA first takes into account both the data size in edge clouds and satellite capacity to finalize the selection, thereby preventing congestion in the access network and reducing transmitting duration. Extensive simulations validate that DVA has a significantly lower average access network duration than the state-of-the-art satellite selection algorithms in a LEO satellite emulation platform. △ Less

Submitted 16 June, 2024; originally announced June 2024.

Comments: 6 pages, 5 figures

arXiv:2406.09750 [pdf, other]

ControlVAR: Exploring Controllable Visual Autoregressive Modeling

Authors: Xiang Li, Kai Qiu, Hao Chen, Jason Kuen, Zhe Lin, Rita Singh, Bhiksha Raj

Abstract: Conditional visual generation has witnessed remarkable progress with the advent of diffusion models (DMs), especially in tasks like control-to-image generation. However, challenges such as expensive computational cost, high inference latency, and difficulties of integration with large language models (LLMs) have necessitated exploring alternatives to DMs. This paper introduces ControlVAR, a novel… ▽ More Conditional visual generation has witnessed remarkable progress with the advent of diffusion models (DMs), especially in tasks like control-to-image generation. However, challenges such as expensive computational cost, high inference latency, and difficulties of integration with large language models (LLMs) have necessitated exploring alternatives to DMs. This paper introduces ControlVAR, a novel framework that explores pixel-level controls in visual autoregressive (VAR) modeling for flexible and efficient conditional generation. In contrast to traditional conditional models that learn the conditional distribution, ControlVAR jointly models the distribution of image and pixel-level conditions during training and imposes conditional controls during testing. To enhance the joint modeling, we adopt the next-scale AR prediction paradigm and unify control and image representations. A teacher-forcing guidance strategy is proposed to further facilitate controllable generation with joint modeling. Extensive experiments demonstrate the superior efficacy and flexibility of ControlVAR across various conditional generation tasks against popular conditional DMs, \eg, ControlNet and T2I-Adaptor. △ Less

Submitted 14 June, 2024; originally announced June 2024.

Comments: 24 pages, 19 figures, 4 tables

arXiv:2406.03159 [pdf, other]

Hurry: Dynamic Collaborative Framework For Low-orbit Mega-Constellation Data Downloading

Authors: Handong Luo, Wenhao Liu, Qi Zhang, Ziheng Yang, Quanwei Lin, Wenjun Zhu, Kun Qiu, Zhe Chen, Yue Gao

Abstract: Low-orbit mega-constellation network, which utilize thousands of satellites to provide a variety of network services and collect a wide range of space information, is a rapidly growing field. Each satellite collects TB-level data daily, including delay-sensitive data used for crucial tasks, such as military surveillance, natural disaster monitoring, and weather forecasting. According to NASA's sta… ▽ More Low-orbit mega-constellation network, which utilize thousands of satellites to provide a variety of network services and collect a wide range of space information, is a rapidly growing field. Each satellite collects TB-level data daily, including delay-sensitive data used for crucial tasks, such as military surveillance, natural disaster monitoring, and weather forecasting. According to NASA's statement, these data need to be downloaded to the ground for processing within 3 to 5 hours. To reduce the time required for satellite data downloads, the state-of-the-art solution known as CoDld, which is only available for small constellations, uses an iterative method for cooperative downloads via inter-satellite links. However, in LMCN, the time required to download the same amount of data using CoDld will exponentially increase compared to downloading the same amount of data in a small constellation. We have identified and analyzed the reasons for this degradation phenomenon and propose a new satellite data download framework, named Hurry. By modeling and mapping satellite topology changes and data transmission to Time-Expanded Graphs, we implement our algorithm within the Hurry framework to avoid degradation effects. In the fixed data volume download evaluation, Hurry achieves 100% completion of the download task while the CoDld only reached 44% of download progress. In continuous data generation evaluation, the Hurry flow algorithm improves throughput from 11% to 66% compared to the CoDld in different scenarios. △ Less

Submitted 5 June, 2024; originally announced June 2024.

Comments: 15 pages, 7 figures

arXiv:2405.12731 [pdf, other]

From Today's Code to Tomorrow's Symphony: The AI Transformation of Developer's Routine by 2030

Authors: Matteo Ciniselli, Niccolò Puccinelli, Ketai Qiu, Luca Di Grazia

Abstract: In the rapidly evolving landscape of software engineering, the integration of Artificial Intelligence (AI) into the Software Development Life-Cycle (SDLC) heralds a transformative era for developers. Recently, we have assisted to a pivotal shift towards AI-assisted programming, exemplified by tools like GitHub Copilot and OpenAI's ChatGPT, which have become a crucial element for coding, debugging,… ▽ More In the rapidly evolving landscape of software engineering, the integration of Artificial Intelligence (AI) into the Software Development Life-Cycle (SDLC) heralds a transformative era for developers. Recently, we have assisted to a pivotal shift towards AI-assisted programming, exemplified by tools like GitHub Copilot and OpenAI's ChatGPT, which have become a crucial element for coding, debugging, and software design. In this paper we provide a comparative analysis between the current state of AI-assisted programming in 2024 and our projections for 2030, by exploring how AI advancements are set to enhance the implementation phase, fundamentally altering developers' roles from manual coders to orchestrators of AI-driven development ecosystems. We envision HyperAssistant, an augmented AI tool that offers comprehensive support to 2030 developers, addressing current limitations in mental health support, fault detection, code optimization, team interaction, and skill development. We emphasize AI as a complementary force, augmenting developers' capabilities rather than replacing them, leading to the creation of sophisticated, reliable, and secure software solutions. Our vision seeks to anticipate the evolution of programming practices, challenges, and future directions, shaping a new paradigm where developers and AI collaborate more closely, promising a significant leap in SE efficiency, security and creativity. △ Less

Submitted 21 May, 2024; originally announced May 2024.

arXiv:2404.12141 [pdf, other]

MolCRAFT: Structure-Based Drug Design in Continuous Parameter Space

Authors: Yanru Qu, Keyue Qiu, Yuxuan Song, Jingjing Gong, Jiawei Han, Mingyue Zheng, Hao Zhou, Wei-Ying Ma

Abstract: Generative models for structure-based drug design (SBDD) have shown promising results in recent years. Existing works mainly focus on how to generate molecules with higher binding affinity, ignoring the feasibility prerequisites for generated 3D poses and resulting in false positives. We conduct thorough studies on key factors of ill-conformational problems when applying autoregressive methods and… ▽ More Generative models for structure-based drug design (SBDD) have shown promising results in recent years. Existing works mainly focus on how to generate molecules with higher binding affinity, ignoring the feasibility prerequisites for generated 3D poses and resulting in false positives. We conduct thorough studies on key factors of ill-conformational problems when applying autoregressive methods and diffusion to SBDD, including mode collapse and hybrid continuous-discrete space. In this paper, we introduce MolCRAFT, the first SBDD model that operates in the continuous parameter space, together with a novel noise reduced sampling strategy. Empirical results show that our model consistently achieves superior performance in binding affinity with more stable 3D structure, demonstrating our ability to accurately model interatomic interactions. To our best knowledge, MolCRAFT is the first to achieve reference-level Vina Scores (-6.59 kcal/mol) with comparable molecular size, outperforming other strong baselines by a wide margin (-0.84 kcal/mol). Code is available at https://github.com/AlgoMole/MolCRAFT. △ Less

Submitted 27 May, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

Comments: Accepted to ICML 2024

arXiv:2403.08515 [pdf, other]

Plotinus: A Satellite Internet Digital Twin System

Authors: Yue Gao, Kun Qiu, Zhe Chen, Wenjun Zhu, Qi Zhang, Handong Luo, Quanwei Lin, Ziheng Yang, Wenhao Liu

Abstract: The development of an integrated space-air-ground network (SAGIN) requires sophisticated satellite Internet emulation tools that can handle complex, dynamic topologies and offer in-depth analysis. Existing emulation platforms struggle with challenges like the need for detailed implementation across all network layers, real-time response, and scalability. This paper proposes a digital twin system b… ▽ More The development of an integrated space-air-ground network (SAGIN) requires sophisticated satellite Internet emulation tools that can handle complex, dynamic topologies and offer in-depth analysis. Existing emulation platforms struggle with challenges like the need for detailed implementation across all network layers, real-time response, and scalability. This paper proposes a digital twin system based on microservices for satellite Internet emulation, namely Plotinus, which aims to solve these problems. Plotinus features a modular design, allowing for easy replacement of the physical layer to emulate different aerial vehicles and analyze channel interference. It also enables replacing path computation methods to simplify testing and deploying algorithms. In particular, Plotinus allows for real-time emulation with live network traffic, enhancing practical network models. The evaluation result shows Plotinus's effective emulation of dynamic satellite networks with real-world devices. Its adaptability for various communication models and algorithm testing highlights Plotinus's role as a vital tool for developing and analyzing SAGIN systems, offering a cross-layer, real-time and scalable digital twin system. △ Less

Submitted 24 March, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

arXiv:2403.04924 [pdf, other]

$\text{R}^2$-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations

Authors: Xiang Li, Kai Qiu, Jinglu Wang, Xiaohao Xu, Rita Singh, Kashu Yamazak, Hao Chen, Xiaonan Huang, Bhiksha Raj

Abstract: Referring perception, which aims at grounding visual objects with multimodal referring guidance, is essential for bridging the gap between humans, who provide instructions, and the environment where intelligent systems perceive. Despite progress in this field, the robustness of referring perception models (RPMs) against disruptive perturbations is not well explored. This work thoroughly assesses t… ▽ More Referring perception, which aims at grounding visual objects with multimodal referring guidance, is essential for bridging the gap between humans, who provide instructions, and the environment where intelligent systems perceive. Despite progress in this field, the robustness of referring perception models (RPMs) against disruptive perturbations is not well explored. This work thoroughly assesses the resilience of RPMs against various perturbations in both general and specific contexts. Recognizing the complex nature of referring perception tasks, we present a comprehensive taxonomy of perturbations, and then develop a versatile toolbox for synthesizing and evaluating the effects of composite disturbances. Employing this toolbox, we construct $\text{R}^2$-Bench, a benchmark for assessing the Robustness of Referring perception models under noisy conditions across five key tasks. Moreover, we propose the $\text{R}^2$-Agent, an LLM-based agent that simplifies and automates model evaluation via natural language instructions. Our investigation uncovers the vulnerabilities of current RPMs to various perturbations and provides tools for assessing model robustness, potentially promoting the safe and resilient integration of intelligent systems into complex real-world scenarios. △ Less

Submitted 7 March, 2024; originally announced March 2024.

Comments: Code and dataset will be released at https://github.com/lxa9867/r2bench

arXiv:2401.05771 [pdf, ps, other]

Learn From Zoom: Decoupled Supervised Contrastive Learning For WCE Image Classification

Authors: Kunpeng Qiu, Zhiying Zhou, Yongxin Guo

Abstract: Accurate lesion classification in Wireless Capsule Endoscopy (WCE) images is vital for early diagnosis and treatment of gastrointestinal (GI) cancers. However, this task is confronted with challenges like tiny lesions and background interference. Additionally, WCE images exhibit higher intra-class variance and inter-class similarities, adding complexity. To tackle these challenges, we propose Deco… ▽ More Accurate lesion classification in Wireless Capsule Endoscopy (WCE) images is vital for early diagnosis and treatment of gastrointestinal (GI) cancers. However, this task is confronted with challenges like tiny lesions and background interference. Additionally, WCE images exhibit higher intra-class variance and inter-class similarities, adding complexity. To tackle these challenges, we propose Decoupled Supervised Contrastive Learning for WCE image classification, learning robust representations from zoomed-in WCE images generated by Saliency Augmentor. Specifically, We use uniformly down-sampled WCE images as anchors and WCE images from the same class, especially their zoomed-in images, as positives. This approach empowers the Feature Extractor to capture rich representations from various views of the same image, facilitated by Decoupled Supervised Contrastive Learning. Training a linear Classifier on these representations within 10 epochs yields an impressive 92.01% overall accuracy, surpassing the prior state-of-the-art (SOTA) by 0.72% on a blend of two publicly accessible WCE datasets. Code is available at: https://github.com/Qiukunpeng/DSCL. △ Less

Submitted 11 January, 2024; originally announced January 2024.

Comments: Accepted by ICASSP2024

arXiv:2312.09020 [pdf, other]

Exploring Transferability for Randomized Smoothing

Authors: Kai Qiu, Huishuai Zhang, Zhirong Wu, Stephen Lin

Abstract: Training foundation models on extensive datasets and then finetuning them on specific tasks has emerged as the mainstream approach in artificial intelligence. However, the model robustness, which is a critical aspect for safety, is often optimized for each specific task rather than at the pretraining stage. In this paper, we propose a method for pretraining certifiably robust models that can be re… ▽ More Training foundation models on extensive datasets and then finetuning them on specific tasks has emerged as the mainstream approach in artificial intelligence. However, the model robustness, which is a critical aspect for safety, is often optimized for each specific task rather than at the pretraining stage. In this paper, we propose a method for pretraining certifiably robust models that can be readily finetuned for adaptation to a particular task. A key challenge is dealing with the compromise between semantic learning and robustness. We address this with a simple yet highly effective strategy based on significantly broadening the pretraining data distribution, which is shown to greatly benefit finetuning for downstream tasks. Through pretraining on a mixture of clean and various noisy images, we find that surprisingly strong certified accuracy can be achieved even when finetuning on only clean images. Furthermore, this strategy requires just a single model to deal with various noise levels, thus substantially reducing computational costs in relation to previous works that employ multiple models. Despite using just one model, our method can still yield results that are on par with, or even superior to, existing multi-model methods. △ Less

Submitted 14 December, 2023; originally announced December 2023.

arXiv:2311.18834 [pdf, other]

ART$\boldsymbol{\cdot}$V: Auto-Regressive Text-to-Video Generation with Diffusion Models

Authors: Wenming Weng, Ruoyu Feng, Yanhui Wang, Qi Dai, Chunyu Wang, Dacheng Yin, Zhiyuan Zhao, Kai Qiu, Jianmin Bao, Yuhui Yuan, Chong Luo, Yueyi Zhang, Zhiwei Xiong

Abstract: We present ART$\boldsymbol{\cdot}$V, an efficient framework for auto-regressive video generation with diffusion models. Unlike existing methods that generate entire videos in one-shot, ART$\boldsymbol{\cdot}$V generates a single frame at a time, conditioned on the previous ones. The framework offers three distinct advantages. First, it only learns simple continual motions between adjacent frames,… ▽ More We present ART$\boldsymbol{\cdot}$V, an efficient framework for auto-regressive video generation with diffusion models. Unlike existing methods that generate entire videos in one-shot, ART$\boldsymbol{\cdot}$V generates a single frame at a time, conditioned on the previous ones. The framework offers three distinct advantages. First, it only learns simple continual motions between adjacent frames, therefore avoiding modeling complex long-range motions that require huge training data. Second, it preserves the high-fidelity generation ability of the pre-trained image diffusion models by making only minimal network modifications. Third, it can generate arbitrarily long videos conditioned on a variety of prompts such as text, image or their combinations, making it highly versatile and flexible. To combat the common drifting issue in AR models, we propose masked diffusion model which implicitly learns which information can be drawn from reference images rather than network predictions, in order to reduce the risk of generating inconsistent appearances that cause drifting. Moreover, we further enhance generation coherence by conditioning it on the initial frame, which typically contains minimal noise. This is particularly useful for long video generation. When trained for only two weeks on four GPUs, ART$\boldsymbol{\cdot}$V already can generate videos with natural motions, rich details and a high level of aesthetic quality. Besides, it enables various appealing applications, e.g., composing a long video from multiple text prompts. △ Less

Submitted 30 November, 2023; originally announced November 2023.

Comments: 24 pages, 21 figures. Project page at https://warranweng.github.io/art.v

arXiv:2311.18829 [pdf, other]

MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation

Authors: Yanhui Wang, Jianmin Bao, Wenming Weng, Ruoyu Feng, Dacheng Yin, Tao Yang, Jingxu Zhang, Qi Dai Zhiyuan Zhao, Chunyu Wang, Kai Qiu, Yuhui Yuan, Chuanxin Tang, Xiaoyan Sun, Chong Luo, Baining Guo

Abstract: We present MicroCinema, a straightforward yet effective framework for high-quality and coherent text-to-video generation. Unlike existing approaches that align text prompts with video directly, MicroCinema introduces a Divide-and-Conquer strategy which divides the text-to-video into a two-stage process: text-to-image generation and image\&text-to-video generation. This strategy offers two signific… ▽ More We present MicroCinema, a straightforward yet effective framework for high-quality and coherent text-to-video generation. Unlike existing approaches that align text prompts with video directly, MicroCinema introduces a Divide-and-Conquer strategy which divides the text-to-video into a two-stage process: text-to-image generation and image\&text-to-video generation. This strategy offers two significant advantages. a) It allows us to take full advantage of the recent advances in text-to-image models, such as Stable Diffusion, Midjourney, and DALLE, to generate photorealistic and highly detailed images. b) Leveraging the generated image, the model can allocate less focus to fine-grained appearance details, prioritizing the efficient learning of motion dynamics. To implement this strategy effectively, we introduce two core designs. First, we propose the Appearance Injection Network, enhancing the preservation of the appearance of the given image. Second, we introduce the Appearance Noise Prior, a novel mechanism aimed at maintaining the capabilities of pre-trained 2D diffusion models. These design elements empower MicroCinema to generate high-quality videos with precise motion, guided by the provided text prompts. Extensive experiments demonstrate the superiority of the proposed framework. Concretely, MicroCinema achieves SOTA zero-shot FVD of 342.86 on UCF-101 and 377.40 on MSR-VTT. See https://wangyanhui666.github.io/MicroCinema.github.io/ for video samples. △ Less

Submitted 29 December, 2023; v1 submitted 30 November, 2023; originally announced November 2023.

Comments: Project page: https://wangyanhui666.github.io/MicroCinema.github.io/

arXiv:2311.03761 [pdf, other]

Augmenting Radio Signals with Wavelet Transform for Deep Learning-Based Modulation Recognition

Authors: Tao Chen, Shilian Zheng, Kunfeng Qiu, Luxin Zhang, Qi Xuan, Xiaoniu Yang

Abstract: The use of deep learning for radio modulation recognition has become prevalent in recent years. This approach automatically extracts high-dimensional features from large datasets, facilitating the accurate classification of modulation schemes. However, in real-world scenarios, it may not be feasible to gather sufficient training data in advance. Data augmentation is a method used to increase the d… ▽ More The use of deep learning for radio modulation recognition has become prevalent in recent years. This approach automatically extracts high-dimensional features from large datasets, facilitating the accurate classification of modulation schemes. However, in real-world scenarios, it may not be feasible to gather sufficient training data in advance. Data augmentation is a method used to increase the diversity and quantity of training dataset and to reduce data sparsity and imbalance. In this paper, we propose data augmentation methods that involve replacing detail coefficients decomposed by discrete wavelet transform for reconstructing to generate new samples and expand the training set. Different generation methods are used to generate replacement sequences. Simulation results indicate that our proposed methods significantly outperform the other augmentation methods. △ Less

Submitted 7 November, 2023; originally announced November 2023.

arXiv:2306.08055 [pdf, other]

Tune As You Scale: Hyperparameter Optimization For Compute Efficient Training

Authors: Abraham J. Fetterman, Ellie Kitanidis, Joshua Albrecht, Zachary Polizzi, Bryden Fogelman, Maksis Knutins, Bartosz Wróblewski, James B. Simon, Kanjun Qiu

Abstract: Hyperparameter tuning of deep learning models can lead to order-of-magnitude performance gains for the same amount of compute. Despite this, systematic tuning is uncommon, particularly for large models, which are expensive to evaluate and tend to have many hyperparameters, necessitating difficult judgment calls about tradeoffs, budgets, and search bounds. To address these issues and propose a prac… ▽ More Hyperparameter tuning of deep learning models can lead to order-of-magnitude performance gains for the same amount of compute. Despite this, systematic tuning is uncommon, particularly for large models, which are expensive to evaluate and tend to have many hyperparameters, necessitating difficult judgment calls about tradeoffs, budgets, and search bounds. To address these issues and propose a practical method for robustly tuning large models, we present Cost-Aware Pareto Region Bayesian Search (CARBS), a Bayesian optimization algorithm that performs local search around the performance-cost Pareto frontier. CARBS does well even in unbounded search spaces with many hyperparameters, learns scaling relationships so that it can tune models even as they are scaled up, and automates much of the "black magic" of tuning. Among our results, we effectively solve the entire ProcGen benchmark just by tuning a simple baseline (PPO, as provided in the original ProcGen paper). We also reproduce the model size vs. training tokens scaling result from the Chinchilla project (Hoffmann et al. 2022), while simultaneously discovering scaling laws for every other hyperparameter, via an easy automated process that uses significantly less compute and is applicable to any deep learning problem (not just language models). △ Less

Submitted 13 June, 2023; originally announced June 2023.

arXiv:2305.10596 [pdf, other]

Towards Invisible Backdoor Attacks in the Frequency Domain against Deep Neural Networks

Authors: Xinrui Liu, Yajie Wang, Yu-an Tan, Kefan Qiu, Yuanzhang Li

Abstract: Deep neural networks (DNNs) have made tremendous progress in the past ten years and have been applied in various critical applications. However, recent studies have shown that deep neural networks are vulnerable to backdoor attacks. By injecting malicious data into the training set, an adversary can plant the backdoor into the original model. The backdoor can remain hidden indefinitely until activ… ▽ More Deep neural networks (DNNs) have made tremendous progress in the past ten years and have been applied in various critical applications. However, recent studies have shown that deep neural networks are vulnerable to backdoor attacks. By injecting malicious data into the training set, an adversary can plant the backdoor into the original model. The backdoor can remain hidden indefinitely until activated by a sample with a specific trigger, which is hugely concealed, bringing serious security risks to critical applications. However, one main limitation of current backdoor attacks is that the trigger is often visible to human perception. Therefore, it is crucial to study the stealthiness of backdoor triggers. In this paper, we propose a novel frequency-domain backdooring technique. In particular, our method aims to add a backdoor trigger in the frequency domain of original images via Discrete Fourier Transform, thus hidding the trigger. We evaluate our method on three benchmark datasets: MNIST, CIFAR-10 and Imagenette. Our experiments show that we can simultaneously fool human inspection and DNN models. We further apply two image similarity evaluation metrics to illustrate that our method adds the most subtle perturbation without compromising attack success rate and clean sample accuracy. △ Less

Submitted 10 May, 2023; originally announced May 2023.

Comments: arXiv admin note: text overlap with arXiv:2305.09677

arXiv:2305.09677 [pdf, other]

Stealthy Low-frequency Backdoor Attack against Deep Neural Networks

Authors: Xinrui Liu, Yu-an Tan, Yajie Wang, Kefan Qiu, Yuanzhang Li

Abstract: Deep neural networks (DNNs) have gain its popularity in various scenarios in recent years. However, its excellent ability of fitting complex functions also makes it vulnerable to backdoor attacks. Specifically, a backdoor can remain hidden indefinitely until activated by a sample with a specific trigger, which is hugely concealed. Nevertheless, existing backdoor attacks operate backdoors in spatia… ▽ More Deep neural networks (DNNs) have gain its popularity in various scenarios in recent years. However, its excellent ability of fitting complex functions also makes it vulnerable to backdoor attacks. Specifically, a backdoor can remain hidden indefinitely until activated by a sample with a specific trigger, which is hugely concealed. Nevertheless, existing backdoor attacks operate backdoors in spatial domain, i.e., the poisoned images are generated by adding additional perturbations to the original images, which are easy to detect. To bring the potential of backdoor attacks into full play, we propose low-pass attack, a novel attack scheme that utilizes low-pass filter to inject backdoor in frequency domain. Unlike traditional poisoned image generation methods, our approach reduces high-frequency components and preserve original images' semantic information instead of adding additional perturbations, improving the capability of evading current defenses. Besides, we introduce "precision mode" to make our backdoor triggered at a specified level of filtering, which further improves stealthiness. We evaluate our low-pass attack on four datasets and demonstrate that even under pollution rate of 0.01, we can perform stealthy attack without trading off attack performance. Besides, our backdoor attack can successfully bypass state-of-the-art defending mechanisms. We also compare our attack with existing backdoor attacks and show that our poisoned images are nearly invisible and retain higher image quality. △ Less

Submitted 10 May, 2023; originally announced May 2023.

arXiv:2305.01458 [pdf, other]

An Efficient Multi-solution Solver for the Inverse Kinematics of 3-Section Constant-Curvature Robots

Authors: Ke Qiu, Jingyu Zhang, Danying Sun, Rong Xiong, Haojian Lu, Yue Wang

Abstract: Piecewise constant curvature is a popular kinematics framework for continuum robots. Computing the model parameters from the desired end pose, known as the inverse kinematics problem, is fundamental in manipulation, tracking and planning tasks. In this paper, we propose an efficient multi-solution solver to address the inverse kinematics problem of 3-section constant-curvature robots by bridging b… ▽ More Piecewise constant curvature is a popular kinematics framework for continuum robots. Computing the model parameters from the desired end pose, known as the inverse kinematics problem, is fundamental in manipulation, tracking and planning tasks. In this paper, we propose an efficient multi-solution solver to address the inverse kinematics problem of 3-section constant-curvature robots by bridging both the theoretical reduction and numerical correction. We derive analytical conditions to simplify the original problem into a one-dimensional problem. Further, the equivalence of the two problems is formalised. In addition, we introduce an approximation with bounded error so that the one dimension becomes traversable while the remaining parameters analytically solvable. With the theoretical results, the global search and numerical correction are employed to implement the solver. The experiments validate the better efficiency and higher success rate of our solver than the numerical methods when one solution is required, and demonstrate the ability of obtaining multiple solutions with optimal path planning in a space with obstacles. △ Less

Submitted 2 May, 2023; originally announced May 2023.

Comments: Robotics: Science and Systems 2023

arXiv:2212.08132 [pdf, ps, other]

WEKA-Based: Key Features and Classifier for French of Five Countries

Authors: Zeqian Li, Keyu Qiu, Chenxu Jiao, Wen Zhu, Haoran Tang

Abstract: This paper describes a French dialect recognition system that will appropriately distinguish between different regional French dialects. A corpus of five regions - Monaco, French-speaking, Belgium, French-speaking Switzerland, French-speaking Canada and France, which is targeted forconstruction by the Sketch Engine. The content of the corpus is related to the four themes of eating, drinking, sleep… ▽ More This paper describes a French dialect recognition system that will appropriately distinguish between different regional French dialects. A corpus of five regions - Monaco, French-speaking, Belgium, French-speaking Switzerland, French-speaking Canada and France, which is targeted forconstruction by the Sketch Engine. The content of the corpus is related to the four themes of eating, drinking, sleeping and living, which are closely linked to popular life. The experimental results were obtained through the processing of a python coded pre-processor and Waikato Environment for Knowledge Analysis (WEKA) data analytic tool which contains many filters and classifiers for machine learning. △ Less

Submitted 10 November, 2022; originally announced December 2022.

arXiv:2212.00352 [pdf, ps, other]

doi 10.1038/s41597-022-01854-w

A Dataset with Multibeam Forward-Looking Sonar for Underwater Object Detection

Authors: Kaibing Xie, Jian Yang, Kang Qiu

Abstract: Multibeam forward-looking sonar (MFLS) plays an important role in underwater detection. There are several challenges to the research on underwater object detection with MFLS. Firstly, the research is lack of available dataset. Secondly, the sonar image, generally processed at pixel level and transformed to sector representation for the visual habits of human beings, is disadvantageous to the resea… ▽ More Multibeam forward-looking sonar (MFLS) plays an important role in underwater detection. There are several challenges to the research on underwater object detection with MFLS. Firstly, the research is lack of available dataset. Secondly, the sonar image, generally processed at pixel level and transformed to sector representation for the visual habits of human beings, is disadvantageous to the research in artificial intelligence (AI) areas. Towards these challenges, we present a novel dataset, the underwater acoustic target detection (UATD) dataset, consisting of over 9000 MFLS images captured using Tritech Gemini 1200ik sonar. Our dataset provides raw data of sonar images with annotation of 10 categories of target objects (cube, cylinder, tyres, etc). The data was collected from lake and shallow water. To verify the practicality of UATD, we apply the dataset to the state-of-the-art detectors and provide corresponding benchmarks for its accuracy and efficiency. △ Less

Submitted 1 December, 2022; v1 submitted 1 December, 2022; originally announced December 2022.

arXiv:2211.11983 [pdf, other]

Weakly-supervised Pre-training for 3D Human Pose Estimation via Perspective Knowledge

Authors: Zhongwei Qiu, Kai Qiu, Jianlong Fu, Dongmei Fu

Abstract: Modern deep learning-based 3D pose estimation approaches require plenty of 3D pose annotations. However, existing 3D datasets lack diversity, which limits the performance of current methods and their generalization ability. Although existing methods utilize 2D pose annotations to help 3D pose estimation, they mainly focus on extracting 2D structural constraints from 2D poses, ignoring the 3D infor… ▽ More Modern deep learning-based 3D pose estimation approaches require plenty of 3D pose annotations. However, existing 3D datasets lack diversity, which limits the performance of current methods and their generalization ability. Although existing methods utilize 2D pose annotations to help 3D pose estimation, they mainly focus on extracting 2D structural constraints from 2D poses, ignoring the 3D information hidden in the images. In this paper, we propose a novel method to extract weak 3D information directly from 2D images without 3D pose supervision. Firstly, we utilize 2D pose annotations and perspective prior knowledge to generate the relationship of that keypoint is closer or farther from the camera, called relative depth. We collect a 2D pose dataset (MCPC) and generate relative depth labels. Based on MCPC, we propose a weakly-supervised pre-training (WSP) strategy to distinguish the depth relationship between two points in an image. WSP enables the learning of the relative depth of two keypoints on lots of in-the-wild images, which is more capable of predicting depth and generalization ability for 3D human pose estimation. After fine-tuning on 3D pose datasets, WSP achieves state-of-the-art results on two widely-used benchmarks. △ Less

Submitted 21 November, 2022; originally announced November 2022.

arXiv:2210.13417 [pdf, other]

Avalon: A Benchmark for RL Generalization Using Procedurally Generated Worlds

Authors: Joshua Albrecht, Abraham J. Fetterman, Bryden Fogelman, Ellie Kitanidis, Bartosz Wróblewski, Nicole Seo, Michael Rosenthal, Maksis Knutins, Zachary Polizzi, James B. Simon, Kanjun Qiu

Abstract: Despite impressive successes, deep reinforcement learning (RL) systems still fall short of human performance on generalization to new tasks and environments that differ from their training. As a benchmark tailored for studying RL generalization, we introduce Avalon, a set of tasks in which embodied agents in highly diverse procedural 3D worlds must survive by navigating terrain, hunting or gatheri… ▽ More Despite impressive successes, deep reinforcement learning (RL) systems still fall short of human performance on generalization to new tasks and environments that differ from their training. As a benchmark tailored for studying RL generalization, we introduce Avalon, a set of tasks in which embodied agents in highly diverse procedural 3D worlds must survive by navigating terrain, hunting or gathering food, and avoiding hazards. Avalon is unique among existing RL benchmarks in that the reward function, world dynamics, and action space are the same for every task, with tasks differentiated solely by altering the environment; its 20 tasks, ranging in complexity from eat and throw to hunt and navigate, each create worlds in which the agent must perform specific skills in order to survive. This setup enables investigations of generalization within tasks, between tasks, and to compositional tasks that require combining skills learned from previous tasks. Avalon includes a highly efficient simulator, a library of baselines, and a benchmark with scoring metrics evaluated against hundreds of hours of human performance, all of which are open-source and publicly available. We find that standard RL baselines make progress on most tasks but are still far from human performance, suggesting Avalon is challenging enough to advance the quest for generalizable RL. △ Less

Submitted 24 October, 2022; originally announced October 2022.

Comments: Accepted to NeurIPS Datasets and Benchmarks 2022. Video and links to all code, data, etc can be found at https://generallyintelligent.com/avalon/

arXiv:2208.07558 [pdf, other]

doi 10.1109/ICUFN55119.2022.9829628

Traffic Analytics Development Kits (TADK): Enable Real-Time AI Inference in Networking Apps

Authors: Kun Qiu, Harry Chang, Ying Wang, Xiahui Yu, Wenjun Zhu, Yingqi Liu, Jianwei Ma, Weigang Li, Xiaobo Liu, Shuo Dai

Abstract: Sophisticated traffic analytics, such as the encrypted traffic analytics and unknown malware detection, emphasizes the need for advanced methods to analyze the network traffic. Traditional methods of using fixed patterns, signature matching, and rules to detect known patterns in network traffic are being replaced with AI (Artificial Intelligence) driven algorithms. However, the absence of a high-p… ▽ More Sophisticated traffic analytics, such as the encrypted traffic analytics and unknown malware detection, emphasizes the need for advanced methods to analyze the network traffic. Traditional methods of using fixed patterns, signature matching, and rules to detect known patterns in network traffic are being replaced with AI (Artificial Intelligence) driven algorithms. However, the absence of a high-performance AI networking-specific framework makes deploying real-time AI-based processing within networking workloads impossible. In this paper, we describe the design of Traffic Analytics Development Kits (TADK), an industry-standard framework specific for AI-based networking workloads processing. TADK can provide real-time AI-based networking workload processing in networking equipment from the data center out to the edge without the need for specialized hardware (e.g., GPUs, Neural Processing Unit, and so on). We have deployed TADK in commodity WAF and 5G UPF, and the evaluation result shows that TADK can achieve a throughput up to 35.3Gbps per core on traffic feature extraction, 6.5Gbps per core on traffic classification, and can decrease SQLi/XSS detection down to 4.5us per request with higher accuracy than fixed pattern solution. △ Less

Submitted 16 August, 2022; originally announced August 2022.

Comments: Published in: 2022 Thirteenth International Conference on Ubiquitous and Future Networks (ICUFN)

arXiv:2207.12579 [pdf, other]

RenderNet: Visual Relocalization Using Virtual Viewpoints in Large-Scale Indoor Environments

Authors: Jiahui Zhang, Shitao Tang, Kejie Qiu, Rui Huang, Chuan Fang, Le Cui, Zilong Dong, Siyu Zhu, Ping Tan

Abstract: Visual relocalization has been a widely discussed problem in 3D vision: given a pre-constructed 3D visual map, the 6 DoF (Degrees-of-Freedom) pose of a query image is estimated. Relocalization in large-scale indoor environments enables attractive applications such as augmented reality and robot navigation. However, appearance changes fast in such environments when the camera moves, which is challe… ▽ More Visual relocalization has been a widely discussed problem in 3D vision: given a pre-constructed 3D visual map, the 6 DoF (Degrees-of-Freedom) pose of a query image is estimated. Relocalization in large-scale indoor environments enables attractive applications such as augmented reality and robot navigation. However, appearance changes fast in such environments when the camera moves, which is challenging for the relocalization system. To address this problem, we propose a virtual view synthesis-based approach, RenderNet, to enrich the database and refine poses regarding this particular scenario. Instead of rendering real images which requires high-quality 3D models, we opt to directly render the needed global and local features of virtual viewpoints and apply them in the subsequent image retrieval and feature matching operations respectively. The proposed method can largely improve the performance in large-scale indoor environments, e.g., achieving an improvement of 7.1\% and 12.2\% on the Inloc dataset. △ Less

Submitted 25 July, 2022; originally announced July 2022.

arXiv:2205.02842 [pdf, other]

InvNorm: Domain Generalization for Object Detection in Gastrointestinal Endoscopy

Authors: Weichen Fan, Yuanbo Yang, Kunpeng Qiu, Shuo Wang, Yongxin Guo

Abstract: Domain Generalization is a challenging topic in computer vision, especially in Gastrointestinal Endoscopy image analysis. Due to several device limitations and ethical reasons, current open-source datasets are typically collected on a limited number of patients using the same brand of sensors. Different brands of devices and individual differences will significantly affect the model's generalizabi… ▽ More Domain Generalization is a challenging topic in computer vision, especially in Gastrointestinal Endoscopy image analysis. Due to several device limitations and ethical reasons, current open-source datasets are typically collected on a limited number of patients using the same brand of sensors. Different brands of devices and individual differences will significantly affect the model's generalizability. Therefore, to address the generalization problem in GI(Gastrointestinal) endoscopy, we propose a multi-domain GI dataset and a light, plug-in block called InvNorm(Invertible Normalization), which could achieve a better generalization performance in any structure. Previous DG(Domain Generalization) methods fail to achieve invertible transformation, which would lead to some misleading augmentation. Moreover, these models would be more likely to lead to medical ethics issues. Our method utilizes normalizing flow to achieve invertible and explainable style normalization to address the problem. The effectiveness of InvNorm is demonstrated on a wide range of tasks, including GI recognition, GI object detection, and natural image recognition. △ Less

Submitted 4 May, 2022; originally announced May 2022.

arXiv:2205.01631 [pdf, ps, other]

doi 10.1080/17445760.2022.2060977

A general approach to deriving diagnosability results of interconnection networks

Authors: Eddie Cheng, Yaping Mao, Ke Qiu, Zhizhang Shen

Abstract: We generalize an approach to deriving diagnosability results of various interconnection networks in terms of the popular $g$-good-neighbor and $g$-extra fault-tolerant models, as well as mainstream diagnostic models such as the PMC and the MM* models. As demonstrative examples, we show how to follow this constructive, and effective, process to derive the $g$-extra diagnosabilities of the hypercu… ▽ More We generalize an approach to deriving diagnosability results of various interconnection networks in terms of the popular $g$-good-neighbor and $g$-extra fault-tolerant models, as well as mainstream diagnostic models such as the PMC and the MM* models. As demonstrative examples, we show how to follow this constructive, and effective, process to derive the $g$-extra diagnosabilities of the hypercube, the $(n, k)$-star, and the arrangement graph. These results agree with those achieved individually, without duplicating structure independent technical details. Some of them come with a larger applicable range than those already known, and the result for the arrangement graph in terms of the MM* model is new. △ Less

Submitted 6 April, 2022; originally announced May 2022.

Comments: Preliminary versions of some results of this paper were announced (without proofs) at 2019 International Conference on Modeling, Simulation, Optimization and Algorithm (ICMSOA 2019), November 9-10, 2019, Sanya, China. J. Phys: Conf. Ser. 1409 (2019) 012024

arXiv:2106.08564 [pdf, other]

Adaptive Visibility Graph Neural Network and its Application in Modulation Classification

Authors: Qi Xuan, Kunfeng Qiu, Jinchao Zhou, Zhuangzhi Chen, Dongwei Xu, Shilian Zheng, Xiaoniu Yang

Abstract: Our digital world is full of time series and graphs which capture the various aspects of many complex systems. Traditionally, there are respective methods in processing these two different types of data, e.g., Recurrent Neural Network (RNN) and Graph Neural Network (GNN), while in recent years, time series could be mapped to graphs by using the techniques such as Visibility Graph (VG), so that res… ▽ More Our digital world is full of time series and graphs which capture the various aspects of many complex systems. Traditionally, there are respective methods in processing these two different types of data, e.g., Recurrent Neural Network (RNN) and Graph Neural Network (GNN), while in recent years, time series could be mapped to graphs by using the techniques such as Visibility Graph (VG), so that researchers can use graph algorithms to mine the knowledge in time series. Such mapping methods establish a bridge between time series and graphs, and have high potential to facilitate the analysis of various real-world time series. However, the VG method and its variants are just based on fixed rules and thus lack of flexibility, largely limiting their application in reality. In this paper, we propose an Adaptive Visibility Graph (AVG) algorithm that can adaptively map time series into graphs, based on which we further establish an end-to-end classification framework AVGNet, by utilizing GNN model DiffPool as the classifier. We then adopt AVGNet for radio signal modulation classification which is an important task in the field of wireless communication. The simulations validate that AVGNet outperforms a series of advanced deep learning methods, achieving the state-of-the-art performance in this task. △ Less

Submitted 16 June, 2021; originally announced June 2021.

arXiv:2105.09543 [pdf, other]

Manual Evaluation Matters: Reviewing Test Protocols of Distantly Supervised Relation Extraction

Authors: Tianyu Gao, Xu Han, Keyue Qiu, Yuzhuo Bai, Zhiyu Xie, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou

Abstract: Distantly supervised (DS) relation extraction (RE) has attracted much attention in the past few years as it can utilize large-scale auto-labeled data. However, its evaluation has long been a problem: previous works either took costly and inconsistent methods to manually examine a small sample of model predictions, or directly test models on auto-labeled data -- which, by our check, produce as much… ▽ More Distantly supervised (DS) relation extraction (RE) has attracted much attention in the past few years as it can utilize large-scale auto-labeled data. However, its evaluation has long been a problem: previous works either took costly and inconsistent methods to manually examine a small sample of model predictions, or directly test models on auto-labeled data -- which, by our check, produce as much as 53% wrong labels at the entity pair level in the popular NYT10 dataset. This problem has not only led to inaccurate evaluation, but also made it hard to understand where we are and what's left to improve in the research of DS-RE. To evaluate DS-RE models in a more credible way, we build manually-annotated test sets for two DS-RE datasets, NYT10 and Wiki20, and thoroughly evaluate several competitive models, especially the latest pre-trained ones. The experimental results show that the manual evaluation can indicate very different conclusions from automatic ones, especially some unexpected observations, e.g., pre-trained models can achieve dominating performance while being more susceptible to false-positives compared to previous methods. We hope that both our manual test sets and novel observations can help advance future DS-RE research. △ Less

Submitted 20 May, 2021; originally announced May 2021.

Comments: ACL 2021 Findings

arXiv:2104.13772 [pdf, other]

doi 10.1063/5.0048243

CLPVG: Circular limited penetrable visibility graph as a new network model for time series

Authors: Qi Xuan, Jinchao Zhou, Kunfeng Qiu, Dongwei Xu, Shilian Zheng, Xiaoniu Yang

Abstract: Visibility Graph (VG) transforms time series into graphs, facilitating signal processing by advanced graph data mining algorithms. In this paper, based on the classic Limited Penetrable Visibility Graph (LPVG) method, we propose a novel nonlinear mapping method named Circular Limited Penetrable Visibility Graph (CLPVG). The testing on degree distribution and clustering coefficient on the generated… ▽ More Visibility Graph (VG) transforms time series into graphs, facilitating signal processing by advanced graph data mining algorithms. In this paper, based on the classic Limited Penetrable Visibility Graph (LPVG) method, we propose a novel nonlinear mapping method named Circular Limited Penetrable Visibility Graph (CLPVG). The testing on degree distribution and clustering coefficient on the generated graphs of typical time series validates that our CLPVG is able to effectively capture the important features of time series and has better anti-noise ability than traditional LPVG. The experiments on real-world time-series datasets of radio signal and electroencephalogram (EEG) also suggest that the structural features provided by CLPVG, rather than LPVG, are more useful for time-series classification, leading to higher accuracy. And this classification performance can be further enhanced through structural feature expansion by adopting Subgraph Networks (SGN). All of these results validate the effectiveness of our CLPVG model. △ Less

Submitted 28 February, 2021; originally announced April 2021.

Comments: 9 pages, 9 figures

arXiv:2104.04718 [pdf]

Use of Metamorphic Relations as Knowledge Carriers to Train Deep Neural Networks

Authors: Tsong Yueh Chen, Pak-Lok Poon, Kun Qiu, Zheng Zheng, Jinyi Zhou

Abstract: Training multiple-layered deep neural networks (DNNs) is difficult. The standard practice of using a large number of samples for training often does not improve the performance of a DNN to a satisfactory level. Thus, a systematic training approach is needed. To address this need, we introduce an innovative approach of using metamorphic relations (MRs) as "knowledge carriers" to train DNNs. Based o… ▽ More Training multiple-layered deep neural networks (DNNs) is difficult. The standard practice of using a large number of samples for training often does not improve the performance of a DNN to a satisfactory level. Thus, a systematic training approach is needed. To address this need, we introduce an innovative approach of using metamorphic relations (MRs) as "knowledge carriers" to train DNNs. Based on the concept of metamorphic testing and MRs (which play the role of a test oracle in software testing), we make use of the notion of metamorphic group of inputs as concrete instances of MRs (which are abstractions of knowledge) to train a DNN in a systematic and effective manner. To verify the viability of our training approach, we have conducted a preliminary experiment to compare the performance of two DNNs: one trained with MRs and the other trained without MRs. We found that the DNN trained with MRs has delivered a better performance, thereby confirming that our approach of using MRs as knowledge carriers to train DNNs is promising. More work and studies, however, are needed to solidify and leverage this approach to generate widespread impact on effective DNN training. △ Less

Submitted 11 May, 2021; v1 submitted 10 April, 2021; originally announced April 2021.

arXiv:2103.14846 [pdf, other]

AR Mapping: Accurate and Efficient Mapping for Augmented Reality

Authors: Rui Huang, Chuan Fang, Kejie Qiu, Le Cui, Zilong Dong, Siyu Zhu, Ping Tan

Abstract: Augmented reality (AR) has gained increasingly attention from both research and industry communities. By overlaying digital information and content onto the physical world, AR enables users to experience the world in a more informative and efficient manner. As a major building block for AR systems, localization aims at determining the device's pose from a pre-built "map" consisting of visual and d… ▽ More Augmented reality (AR) has gained increasingly attention from both research and industry communities. By overlaying digital information and content onto the physical world, AR enables users to experience the world in a more informative and efficient manner. As a major building block for AR systems, localization aims at determining the device's pose from a pre-built "map" consisting of visual and depth information in a known environment. While the localization problem has been widely studied in the literature, the "map" for AR systems is rarely discussed. In this paper, we introduce the AR Map for a specific scene to be composed of 1) color images with 6-DOF poses; 2) dense depth maps for each image and 3) a complete point cloud map. We then propose an efficient end-to-end solution to generating and evaluating AR Maps. Firstly, for efficient data capture, a backpack scanning device is presented with a unified calibration pipeline. Secondly, we propose an AR mapping pipeline which takes the input from the scanning device and produces accurate AR Maps. Finally, we present an approach to evaluating the accuracy of AR Maps with the help of the highly accurate reconstruction result from a high-end laser scanner. To the best of our knowledge, it is the first time to present an end-to-end solution to efficient and accurate mapping for AR applications. △ Less

Submitted 27 March, 2021; originally announced March 2021.

Comments: 8 pages, 14 figures

arXiv:2103.14826 [pdf, other]

Compact 3D Map-Based Monocular Localization Using Semantic Edge Alignment

Authors: Kejie Qiu, Shenzhou Chen, Jiahui Zhang, Rui Huang, Le Cui, Siyu Zhu, Ping Tan

Abstract: Accurate localization is fundamental to a variety of applications, such as navigation, robotics, autonomous driving, and Augmented Reality (AR). Different from incremental localization, global localization has no drift caused by error accumulation, which is desired in many application scenarios. In addition to GPS used in the open air, 3D maps are also widely used as alternative global localizatio… ▽ More Accurate localization is fundamental to a variety of applications, such as navigation, robotics, autonomous driving, and Augmented Reality (AR). Different from incremental localization, global localization has no drift caused by error accumulation, which is desired in many application scenarios. In addition to GPS used in the open air, 3D maps are also widely used as alternative global localization references. In this paper, we propose a compact 3D map-based global localization system using a low-cost monocular camera and an IMU (Inertial Measurement Unit). The proposed compact map consists of two types of simplified elements with multiple semantic labels, which is well adaptive to various man-made environments like urban environments. Also, semantic edge features are used for the key image-map registration, which is robust against occlusion and long-term appearance changes in the environments. To further improve the localization performance, the key semantic edge alignment is formulated as an optimization problem based on initial poses predicted by an independent VIO (Visual-Inertial Odometry) module. The localization system is realized with modular design in real time. We evaluate the localization accuracy through real-world experimental results compared with ground truth, long-term localization performance is also demonstrated. △ Less

Submitted 27 March, 2021; originally announced March 2021.

arXiv:2012.10658 [pdf, other]

Generalize a Small Pre-trained Model to Arbitrarily Large TSP Instances

Authors: Zhang-Hua Fu, Kai-Bin Qiu, Hongyuan Zha

Abstract: For the traveling salesman problem (TSP), the existing supervised learning based algorithms suffer seriously from the lack of generalization ability. To overcome this drawback, this paper tries to train (in supervised manner) a small-scale model, which could be repetitively used to build heat maps for TSP instances of arbitrarily large size, based on a series of techniques such as graph sampling,… ▽ More For the traveling salesman problem (TSP), the existing supervised learning based algorithms suffer seriously from the lack of generalization ability. To overcome this drawback, this paper tries to train (in supervised manner) a small-scale model, which could be repetitively used to build heat maps for TSP instances of arbitrarily large size, based on a series of techniques such as graph sampling, graph converting and heat maps merging. Furthermore, the heat maps are fed into a reinforcement learning approach (Monte Carlo tree search), to guide the search of high-quality solutions. Experimental results based on a large number of instances (with up to 10,000 vertices) show that, this new approach clearly outperforms the existing machine learning based TSP algorithms, and significantly improves the generalization ability of the trained model. △ Less

Submitted 23 February, 2021; v1 submitted 19 December, 2020; originally announced December 2020.

Journal ref: AAAI2021

arXiv:2011.03525 [pdf, other]

SigNet: A Novel Deep Learning Framework for Radio Signal Classification

Authors: Zhuangzhi Chen, Hui Cui, Jingyang Xiang, Kunfeng Qiu, Liang Huang, Shilian Zheng, Shichuan Chen, Qi Xuan, Xiaoniu Yang

Abstract: Deep learning methods achieve great success in many areas due to their powerful feature extraction capabilities and end-to-end training mechanism, and recently they are also introduced for radio signal modulation classification. In this paper, we propose a novel deep learning framework called SigNet, where a signal-to-matrix (S2M) operator is adopted to convert the original signal into a square ma… ▽ More Deep learning methods achieve great success in many areas due to their powerful feature extraction capabilities and end-to-end training mechanism, and recently they are also introduced for radio signal modulation classification. In this paper, we propose a novel deep learning framework called SigNet, where a signal-to-matrix (S2M) operator is adopted to convert the original signal into a square matrix first and is co-trained with a follow-up CNN architecture for classification. This model is further accelerated by integrating 1D convolution operators, leading to the upgraded model SigNet2.0. The simulations on two signal datasets show that both SigNet and SigNet2.0 outperform a number of well-known baselines. More interestingly, our proposed models behave extremely well in small-sample learning when only a small training dataset is provided. They can achieve a relatively high accuracy even when 1\% training data are kept, while other baseline models may lose their effectiveness much more quickly as the datasets get smaller. Such result suggests that SigNet/SigNet2.0 could be extremely useful in the situations where labeled signal data are difficult to obtain. The visualization of the output features of our models demonstrates that our model can well divide different modulation types of signals in the feature hyper-space. △ Less

Submitted 18 October, 2021; v1 submitted 28 October, 2020; originally announced November 2020.

Comments: 13 pages, 8 figures

arXiv:2003.05138 [pdf, other]

doi 10.1109/ICRA40945.2020.9196944

Decentralized Visual-Inertial-UWB Fusion for Relative State Estimation of Aerial Swarm

Authors: Hao Xu, Luqi Wang, Yichen Zhang, Kejie Qiu, Shaojie Shen

Abstract: The collaboration of unmanned aerial vehicles (UAVs) has become a popular research topic for its practicability in multiple scenarios. The collaboration of multiple UAVs, which is also known as aerial swarm is a highly complex system, which still lacks a state-of-art decentralized relative state estimation method. In this paper, we present a novel fully decentralized visual-inertial-UWB fusion fra… ▽ More The collaboration of unmanned aerial vehicles (UAVs) has become a popular research topic for its practicability in multiple scenarios. The collaboration of multiple UAVs, which is also known as aerial swarm is a highly complex system, which still lacks a state-of-art decentralized relative state estimation method. In this paper, we present a novel fully decentralized visual-inertial-UWB fusion framework for relative state estimation and demonstrate the practicability by performing extensive aerial swarm flight experiments. The comparison result with ground truth data from the motion capture system shows the centimeter-level precision which outperforms all the Ultra-WideBand (UWB) and even vision based method. The system is not limited by the field of view (FoV) of the camera or Global Positioning System (GPS), meanwhile on account of its estimation consistency, we believe that the proposed relative state estimation framework has the potential to be prevalently adopted by aerial swarm applications in different scenarios in multiple scales. △ Less

Submitted 11 March, 2020; originally announced March 2020.

Comments: Accepted ICRA 2020

Journal ref: H. Xu, L. Wang, Y. Zhang, K. Qiu and S. Shen, "Decentralized Visual-Inertial-UWB Fusion for Relative State Estimation of Aerial Swarm," 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 2020, pp. 8776-8782

arXiv:1912.09121 [pdf, other]

doi 10.1109/LGRS.2020.2988294

SCAttNet: Semantic Segmentation Network with Spatial and Channel Attention Mechanism for High-Resolution Remote Sensing Images

Authors: Haifeng Li, Kaijian Qiu, Li Chen, Xiaoming Mei, Liang Hong, Chao Tao

Abstract: High-resolution remote sensing images (HRRSIs) contain substantial ground object information, such as texture, shape, and spatial location. Semantic segmentation, which is an important task for element extraction, has been widely used in processing mass HRRSIs. However, HRRSIs often exhibit large intraclass variance and small interclass variance due to the diversity and complexity of ground object… ▽ More High-resolution remote sensing images (HRRSIs) contain substantial ground object information, such as texture, shape, and spatial location. Semantic segmentation, which is an important task for element extraction, has been widely used in processing mass HRRSIs. However, HRRSIs often exhibit large intraclass variance and small interclass variance due to the diversity and complexity of ground objects, thereby bringing great challenges to a semantic segmentation task. In this paper, we propose a new end-to-end semantic segmentation network, which integrates lightweight spatial and channel attention modules that can refine features adaptively. We compare our method with several classic methods on the ISPRS Vaihingen and Potsdam datasets. Experimental results show that our method can achieve better semantic segmentation results. The source codes are available at https://github.com/lehaifeng/SCAttNet. △ Less

Submitted 7 May, 2020; v1 submitted 19 December, 2019; originally announced December 2019.

Comments: 5 pages, 3 figures, 2 tables

Journal ref: IEEE Geoscience and Remote Sensing Letters 2020

arXiv:1907.12428 [pdf, other]

Learn to Scale: Generating Multipolar Normalized Density Maps for Crowd Counting

Authors: Chenfeng Xu, Kai Qiu, Jianlong Fu, Song Bai, Yongchao Xu, Xiang Bai

Abstract: Dense crowd counting aims to predict thousands of human instances from an image, by calculating integrals of a density map over image pixels. Existing approaches mainly suffer from the extreme density variances. Such density pattern shift poses challenges even for multi-scale model ensembling. In this paper, we propose a simple yet effective approach to tackle this problem. First, a patch-level de… ▽ More Dense crowd counting aims to predict thousands of human instances from an image, by calculating integrals of a density map over image pixels. Existing approaches mainly suffer from the extreme density variances. Such density pattern shift poses challenges even for multi-scale model ensembling. In this paper, we propose a simple yet effective approach to tackle this problem. First, a patch-level density map is extracted by a density estimation model and further grouped into several density levels which are determined over full datasets. Second, each patch density map is automatically normalized by an online center learning strategy with a multipolar center loss. Such a design can significantly condense the density distribution into several clusters, and enable that the density variance can be learned by a single model. Extensive experiments demonstrate the superiority of the proposed method. Our work outperforms the state-of-the-art by 4.2%, 14.3%, 27.1% and 20.1% in MAE, on ShanghaiTech Part A, ShanghaiTech Part B, UCF_CC_50 and UCF-QNRF datasets, respectively. △ Less

Submitted 8 August, 2019; v1 submitted 29 July, 2019; originally announced July 2019.

Comments: Accepted to ICCV 2019

arXiv:1808.06753 [pdf, other]

Estimating Metric Poses of Dynamic Objects Using Monocular Visual-Inertial Fusion

Authors: Kejie Qiu, Tong Qin, Hongwen Xie, Shaojie Shen

Abstract: A monocular 3D object tracking system generally has only up-to-scale pose estimation results without any prior knowledge of the tracked object. In this paper, we propose a novel idea to recover the metric scale of an arbitrary dynamic object by optimizing the trajectory of the objects in the world frame, without motion assumptions. By introducing an additional constraint in the time domain, our mo… ▽ More A monocular 3D object tracking system generally has only up-to-scale pose estimation results without any prior knowledge of the tracked object. In this paper, we propose a novel idea to recover the metric scale of an arbitrary dynamic object by optimizing the trajectory of the objects in the world frame, without motion assumptions. By introducing an additional constraint in the time domain, our monocular visual-inertial tracking system can obtain continuous six degree of freedom (6-DoF) pose estimation without scale ambiguity. Our method requires neither fixed multi-camera nor depth sensor settings for scale observability, instead, the IMU inside the monocular sensing suite provides scale information for both camera itself and the tracked object. We build the proposed system on top of our monocular visual-inertial system (VINS) to obtain accurate state estimation of the monocular camera in the world frame. The whole system consists of a 2D object tracker, an object region-based visual bundle adjustment (BA), VINS and a correlation analysis-based metric scale estimator. Experimental comparisons with ground truth demonstrate the tracking accuracy of our 3D tracking performance while a mobile augmented reality (AR) demo shows the feasibility of potential applications. △ Less

Submitted 21 August, 2018; originally announced August 2018.

Comments: IROS 2018

arXiv:1112.0463 [pdf, ps, other]

Mask Iterative Hard Thresholding Algorithms for Sparse Image Reconstruction of Objects with Known Contour

Authors: Aleksandar Dogandzic, Renliang Gu, Kun Qiu

Abstract: We develop mask iterative hard thresholding algorithms (mask IHT and mask DORE) for sparse image reconstruction of objects with known contour. The measurements follow a noisy underdetermined linear model common in the compressive sampling literature. Assuming that the contour of the object that we wish to reconstruct is known and that the signal outside the contour is zero, we formulate a constrai… ▽ More We develop mask iterative hard thresholding algorithms (mask IHT and mask DORE) for sparse image reconstruction of objects with known contour. The measurements follow a noisy underdetermined linear model common in the compressive sampling literature. Assuming that the contour of the object that we wish to reconstruct is known and that the signal outside the contour is zero, we formulate a constrained residual squared error minimization problem that incorporates both the geometric information (i.e. the knowledge of the object's contour) and the signal sparsity constraint. We first introduce a mask IHT method that aims at solving this minimization problem and guarantees monotonically non-increasing residual squared error for a given signal sparsity level. We then propose a double overrelaxation scheme for accelerating the convergence of the mask IHT algorithm. We also apply convex mask reconstruction approaches that employ a convex relaxation of the signal sparsity constraint. In X-ray computed tomography (CT), we propose an automatic scheme for extracting the convex hull of the inspected object from the measured sinograms; the obtained convex hull is used to capture the object contour information. We compare the proposed mask reconstruction schemes with the existing large-scale sparse signal reconstruction methods via numerical simulations and demonstrate that, by exploiting both the geometric contour information of the underlying image and sparsity of its wavelet coefficients, we can reconstruct this image using a significantly smaller number of measurements than the existing methods. △ Less

Submitted 2 December, 2011; originally announced December 2011.

Comments: 6 pages, 19 figures, 2011 45th Asilomar Conf. Signals, Syst. Comput., Pacific Grove, CA, Nov. 2011

arXiv:1004.4880 [pdf, ps, other]

ECME Thresholding Methods for Sparse Signal Reconstruction

Authors: Kun Qiu, Aleksandar Dogandzic

Abstract: We propose a probabilistic framework for interpreting and developing hard thresholding sparse signal reconstruction methods and present several new algorithms based on this framework. The measurements follow an underdetermined linear model, where the regression-coefficient vector is the sum of an unknown deterministic sparse signal component and a zero-mean white Gaussian component with an unknown… ▽ More We propose a probabilistic framework for interpreting and developing hard thresholding sparse signal reconstruction methods and present several new algorithms based on this framework. The measurements follow an underdetermined linear model, where the regression-coefficient vector is the sum of an unknown deterministic sparse signal component and a zero-mean white Gaussian component with an unknown variance. We first derive an expectation-conditional maximization either (ECME) iteration that guarantees convergence to a local maximum of the likelihood function of the unknown parameters for a given signal sparsity level. To analyze the reconstruction accuracy, we introduce the minimum sparse subspace quotient (SSQ), a more flexible measure of the sampling operator than the well-established restricted isometry property (RIP). We prove that, if the minimum SSQ is sufficiently large, ECME achieves perfect or near-optimal recovery of sparse or approximately sparse signals, respectively. We also propose a double overrelaxation (DORE) thresholding scheme for accelerating the ECME iteration. If the signal sparsity level is unknown, we introduce an unconstrained sparsity selection (USS) criterion for its selection and show that, under certain conditions, applying this criterion is equivalent to finding the sparsest solution of the underlying underdetermined linear system. Finally, we present our automatic double overrelaxation (ADORE) thresholding method that utilizes the USS criterion to select the signal sparsity level. We apply the proposed schemes to reconstruct sparse and approximately sparse signals from tomographic projections and compressive samples. △ Less

Submitted 5 November, 2010; v1 submitted 27 April, 2010; originally announced April 2010.

Comments: 39 pages, 4 figures

Showing 1–39 of 39 results for author: Qiu, K