Gravitational orbital Hall effect of vortex photons in Lense-Thirring metric

Authors: Wei-Si Qiu, Dan-Dan Lian, Peng-Ming Zhang

Abstract: Vortex photons, possessing an intrinsic orbital angular momentum (OAM) aligned with the direction of propagation, are described using vortex electromagnetic wave packets. Similar to the gravitational spin Hall effect (SHE), these vortex photons are expected to exhibit intrinsic OAM-dependent trajectories and separations when propagating through a gravitational field, a phenomenon termed the gravitational orbital Hall effect (OHE). In this work, we construct a vortex Laguerre-Gaussian electromagnetic wave packet and analyze its motion by solving covariant Maxwell equations within the Lense-Thirring metric. Our findings reveal that vortex photons with different intrinsic OAM not only separate perpendicular to the null geodesic plane but also within it. This behavior contrasts with the gravitational SHE, where photons of opposite spins separate primarily perpendicular to the null geodesic plane. Moreover, the relationship between the separation and intrinsic OAM differs significantly from that between the separation and spin. These results suggest a unique interaction between intrinsic OAM and gravity, distinct from the spin-gravity coupling, indicating that the gravitational OHE might not be precisely predicted by merely substituting spin with intrinsic OAM in the gravitational SHE.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.05350 [pdf]

Multiple boundary states in bilayer and decorated Su-Schrieffer-Heeger-like models

Authors: Shengqun Guo, Jinke Huang, Ruimin Huang, Fengjiang Zhuang, Zhili Lin, Weibin Qiu

Abstract: Topological boundary states have attracted widespread fascination due to their series of intriguing properties. In this paper, we investigate the multiple boundary states within the two kinds of extended Su-Schrieffer-Heeger (SSH) models. The coexistence of boundary states that exist both in the bulk and band gaps is realized based on the bilayer SSH-like model, which consists of two conventional square-root SSH models that are directly coupled. We further show the square-root topology within the decorated SSH-like model, which supports multiple boundary states that could be embedded into the bulk continuum by tuning the hopping parameters. In addition, the connection between the decorated SSH-like model and its effectively decomposed counterparts is revealed. Our results broaden insight into the multiple boundary states and open up an exciting avenue for the future exploration of square-root topology.

Submitted 7 July, 2024; originally announced July 2024.

arXiv:2407.03856 [pdf, other]

Q-Adapter: Training Your LLM Adapter as a Residual Q-Function

Authors: Yi-Chen Li, Fuxiang Zhang, Wenjie Qiu, Lei Yuan, Chengxing Jia, Zongzhang Zhang, Yang Yu

Abstract: We consider the problem of adapting Large Language Models (LLMs) pre-trained with Reinforcement Learning from Human Feedback (RLHF) to downstream preference data. Naive approaches to achieve this could be supervised fine-tuning on preferred responses or reinforcement learning with a learned reward model. However, the LLM runs the risk of forgetting its initial knowledge as the fine-tuning progresses. To customize the LLM while preserving its existing capabilities, this paper proposes a novel method named Q-Adapter. We start by formalizing LLM adaptation as a problem of maximizing the linear combination of two rewards, one of which corresponds to the reward optimized by the pre-trained LLM and the other to the downstream preference data. Although both rewards are unknown, we show that this can be solved by directly learning a new module from the preference data that approximates the residual Q-function. We consider this module to be an adapter because the original pre-trained LLM, together with it, can form the optimal customised LLM. Empirically, experiments on a range of domain-specific tasks and safety alignment tasks illustrate the superiority of Q-Adapter in both anti-forgetting and learning from new preferences.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.03317 [pdf, other]

Quantum Geometry Probed by Chiral Excitonic Optical Response of Chern Insulators

Authors: Wen-Xuan Qiu, Fengcheng Wu

Abstract: We theoretically derive the sum rule for the negative first moment of the absorptive optical conductivity with excitonic effects and establish its connection to the quantum weight $K$ and Chern number $C$ of the ground state. Applying this framework, we investigate the excitonic optical response of the Chern insulator at hole filling factor $ν=1$ in twisted bilayer MoTe$_2$. A single chiral exciton state, which selectively absorbs circularly polarized light of a specific handedness, dominates the optical sum rule. The chiral exciton state comprises two types of interlayer electron-hole transitions, which cancel out the total out-of-plane dipole moment. The absorption spectrum shows nearly perfect magnetic circular dichroism, which can be attributed to the nearly saturated bound $K \ge |C|$ of the Chern insulator under study. Our work illustrates the potential of using excitonic optical responses to probe quantum geometry encoded by $K$ and $C$.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 6+5 pages, 4+2 figures

arXiv:2406.18259 [pdf, other]

Detecting Machine-Generated Texts: Not Just "AI vs Humans" and Explainability is Complicated

Authors: Jiazhou Ji, Ruizhe Li, Shujun Li, Jie Guo, Weidong Qiu, Zheng Huang, Chiyu Chen, Xiaoyu Jiang, Xinru Lu

Abstract: As LLMs rapidly advance, increasing concerns arise regarding risks about the actual authorship of texts we see online and in the real world. The task of distinguishing LLM-authored texts is complicated by the nuanced and overlapping behaviors of both machines and humans. In this paper, we challenge the current practice of considering LLM-generated text detection a binary classification task of differentiating human from AI. Instead, we introduce a novel ternary text classification scheme, adding an "undecided" category for texts that could be attributed to either source, and we show that this new category is crucial to understand how to make the detection result more explainable to lay users. This research shifts the paradigm from merely classifying to explaining machine-generated texts, emphasizing the need for detectors to provide clear and understandable explanations to users. Our study involves creating four new datasets comprised of texts from various LLMs and human authors. Based on the new datasets, we performed binary classification tests to ascertain the most effective SOTA detection methods and identified SOTA LLMs capable of producing harder-to-detect texts. We constructed a new dataset of texts generated by two top-performing LLMs and human authors, and asked three human annotators to produce ternary labels with explanation notes. This dataset was used to investigate how three top-performing SOTA detectors behave in the new ternary classification context. Our results highlight why the "undecided" category is much needed from the viewpoint of explainability. Additionally, we conducted an analysis of explainability of the three best-performing detectors and the explanation notes of the human annotators, revealing insights about the complexity of explainable detection of machine-generated texts. Finally, we propose guidelines for developing future detection systems with improved explanatory power.

Submitted 26 June, 2024; originally announced June 2024.

Comments: 19 pages, 2 figures

arXiv:2406.15407 [pdf]

Preliminary Design of a General Electronics Platform for Accelerator Facilities

Authors: Jinfu Zhu, Hongli Ding, Haokui Li, Qiaoye Ran, Xiwen Dai, Wei Li, Jiawei Han, Yue Li, Zhiyuan Zhang, Weixin Qiu, Weiqing Zhang

Abstract: Many accelerators require considerable electronic systems for tests, verification, and operation. In Shenzhen Superconducting Soft X-ray Free Electron Laser (S3FEL), to meet the early tests and verification of various systems, save development expenses, and improve the reusability of hardware, firmware, and software systems, we have considered the needs of each system and preliminarily designed a general electronics platform based on MicroTCA.4. The Advanced Mezzanine Card (AMC) will place an FPGA Mezzanine Card (FMC) that supports 500 MSPS to 2 GSPS ADC/DAC. We will design two FMC cards on the Rear Transition Module (RTM), which can be used for analog signal conditioning and waveform digitization by 10 MSPS to 250 MSPS ADC/DAC or motor control. The commercial MCH, CPU, power module, and MTCA crate are deployed. This platform can also be applied to other accelerator facilities.

Submitted 11 May, 2024; originally announced June 2024.

Comments: 3 pages, 4 figures, 2024 IEEE Real-Time Conference

arXiv:2406.14898 [pdf, other]

Safely Learning with Private Data: A Federated Learning Framework for Large Language Model

Authors: JiaYing Zheng, HaiNan Zhang, LingXiang Wang, WangJie Qiu, HongWei Zheng, ZhiMing Zheng

Abstract: Private data, being larger and of higher quality than public data, can greatly improve large language models (LLM). However, due to privacy concerns, this data is often dispersed in multiple silos, making its secure utilization for LLM training a challenge. Federated learning (FL) is an ideal solution for training models with distributed private data, but traditional frameworks like FedAvg are unsuitable for LLM due to their high computational demands on clients. An alternative, split learning, offloads most training parameters to the server while training embedding and output layers locally, making it more suitable for LLM. Nonetheless, it faces significant challenges in security and efficiency. Firstly, the gradients of embeddings are prone to attacks, leading to potential reverse engineering of private data. Furthermore, the server's limitation of handling only one client's training request at a time hinders parallel training, severely impacting training efficiency. In this paper, we propose a Federated Learning framework for LLM, named FL-GLM, which prevents data leakage caused by both server-side and peer-client attacks while improving training efficiency. Specifically, we first place the input block and output block on the local client to prevent embedding gradient attacks from the server. Secondly, we employ key-encryption during client-server communication to prevent reverse engineering attacks from peer-clients. Lastly, we employ optimization methods like client-batching or server-hierarchical, adopting different acceleration methods based on the actual computational capabilities of the server. Experimental results on NLU and generation tasks demonstrate that FL-GLM achieves comparable metrics to the centralized chatGLM model, validating the effectiveness of our federated learning framework.

Submitted 26 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.14217 [pdf, other]

Defending Against Sophisticated Poisoning Attacks with RL-based Aggregation in Federated Learning

Authors: Yujing Wang, Hainan Zhang, Sijia Wen, Wangjie Qiu, Binghui Guo

Abstract: Federated learning is highly susceptible to model poisoning attacks, especially those meticulously crafted for servers. Traditional defense methods mainly focus on updating assessments or robust aggregation against manually crafted myopic attacks. When facing advanced attacks, their defense stability is notably insufficient. Therefore, it is imperative to develop adaptive defenses against such advanced poisoning attacks. We find that benign clients exhibit significantly higher data distribution stability than malicious clients in federated learning in both CV and NLP tasks. Therefore, the malicious clients can be recognized by observing the stability of their data distribution. In this paper, we propose AdaAggRL, an RL-based Adaptive Aggregation method, to defend against sophisticated poisoning attacks. Specifically, we first utilize distribution learning to simulate the clients' data distributions. Then, we use the maximum mean discrepancy (MMD) to calculate the pairwise similarity of the current local model data distribution, its historical data distribution, and global model data distribution. Finally, we use policy learning to adaptively determine the aggregation weights based on the above similarities. Experiments on four real-world datasets demonstrate that the proposed defense model significantly outperforms widely adopted defense models for sophisticated attacks.

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.08523 [pdf, other]

A Plug-and-Play Untrained Neural Network for Full Waveform Inversion in Reconstructing Sound Speed Images of Ultrasound Computed Tomography

Authors: Weicheng Yan, Qiude Zhang, Yun Wu, Zhaohui Liu, Liang Zhou, Mingyue Ding, Ming Yuchi, Wu Qiu

Abstract: Ultrasound computed tomography (USCT), as an emerging technology, can provide multiple quantitative parametric images of human tissue, such as sound speed and attenuation images, distinguishing it from conventional B-mode (reflection) ultrasound imaging. Full waveform inversion (FWI) is acknowledged as a technique with the greatest potential for reconstructing high-resolution sound speed images in USCT. However, traditional FWI for sound speed image reconstruction suffers from high sensitivity to the initial model caused by its strong non-convex nonlinearity, resulting in poor performance when ultrasound signals are at high frequencies. This limitation significantly restricts the application of FWI in the USCT imaging field. In this paper, we propose an untrained neural network (UNN) that can be integrated into the traditional iteration-based FWI framework as an implicit regularization prior. This integration allows for seamless deployment as a plug-and-play module within existing FWI algorithms or their variants. Notably, the proposed UNN method can be trained in an unsupervised fashion, a vital aspect in medical imaging where ground truth data is often unavailable. Evaluations of the numerical simulation and phantom experiment of the breast demonstrate that the proposed UNN improves the robustness of image reconstruction, reduces image artifacts, and achieves great image contrast. To the best of our knowledge, this study represents the first attempt to propose an implicit UNN for FWI in reconstructing sound speed images for USCT.

Submitted 13 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.02941 [pdf, ps, other]

Numerical approximation for variable-exponent fractional diffusion-wave equation

Authors: Xiangcheng Zheng, Hong Wang, Wenlin Qiu

Abstract: This work considers the variable-exponent fractional diffusion-wave equation, which describes, e.g. the propagation of mechanical diffusive waves in viscoelastic media with varying material properties. Rigorous mathematical and numerical analysis for this model is not available in the literature, partly because the variable-exponent Abel kernel may not be positive definite or monotonic. We overcome these difficulties to design two numerical schemes and derive their stability and error estimate based on the proved solution regularity, with $α(0)$-order and second-order accuracy in time, respectively. Numerical experiments are presented to substantiate the theoretical findings.

Submitted 2 July, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

MSC Class: 35R11; 65M12; 65M60

arXiv:2405.08816 [pdf, other]

The RoboDrive Challenge: Drive Anytime Anywhere in Any Condition

Authors: Lingdong Kong, Shaoyuan Xie, Hanjiang Hu, Yaru Niu, Wei Tsang Ooi, Benoit R. Cottereau, Lai Xing Ng, Yuexin Ma, Wenwei Zhang, Liang Pan, Kai Chen, Ziwei Liu, Weichao Qiu, Wei Zhang, Xu Cao, Hao Lu, Ying-Cong Chen, Caixin Kang, Xinning Zhou, Chengyang Ying, Wentao Shang, Xingxing Wei, Yinpeng Dong, Bo Yang, Shengyin Jiang, et al. (66 additional authors not shown)

Abstract: In the realm of autonomous driving, robust perception under out-of-distribution conditions is paramount for the safe deployment of vehicles. Challenges such as adverse weather, sensor malfunctions, and environmental unpredictability can severely impact the performance of autonomous systems. The 2024 RoboDrive Challenge was crafted to propel the development of driving perception technologies that can withstand and adapt to these real-world variabilities. Focusing on four pivotal tasks -- BEV detection, map segmentation, semantic occupancy prediction, and multi-view depth estimation -- the competition laid down a gauntlet to innovate and enhance system resilience against typical and atypical disturbances. This year's challenge consisted of five distinct tracks and attracted 140 registered teams from 93 institutes across 11 countries, resulting in nearly one thousand submissions evaluated through our servers. The competition culminated in 15 top-performing solutions, which introduced a range of innovative approaches including advanced data augmentation, multi-sensor fusion, self-supervised learning for error correction, and new algorithmic strategies to enhance sensor robustness. These contributions significantly advanced the state of the art, particularly in handling sensor inconsistencies and environmental variability. Participants, through collaborative efforts, pushed the boundaries of current technologies, showcasing their potential in real-world scenarios. Extensive evaluations and analyses provided insights into the effectiveness of these solutions, highlighting key trends and successful strategies for improving the resilience of driving perception systems. This challenge has set a new benchmark in the field, providing a rich repository of techniques expected to guide future research in this field.

Submitted 29 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

Comments: ICRA 2024; 32 pages, 24 figures, 5 tables; Code at https://robodrive-24.github.io/

arXiv:2404.11934 [pdf]

doi 10.1103/PhysRevB.109.L161102

Quantum simulation of honeycomb lattice model by high-order moiré pattern

Authors: Qiang Wan, Chunlong Wu, Xun-Jiang Luo, Shenghao Dai, Cao Peng, Renzhe Li, Shangkun Mo, Keming Zhao, Wen-Xuan Qiu, Hao Zhong, Yiwei Li, Chendong Zhang, Fengcheng Wu, Nan Xu

Abstract: Moiré superlattices have become an emergent solid-state platform for simulating quantum lattice models. However, in a single moiré device, Hamiltonian parameters like lattice constant, hopping and interaction terms can hardly be manipulated, limiting the controllability and accessibility of the moiré quantum simulator. Here, by combining angle-resolved photoemission spectroscopy and theoretical analysis, we demonstrate that high-order moiré patterns in graphene-monolayered xenon/krypton heterostructures can simulate the honeycomb model in mesoscale, with in-situ tunable Hamiltonian parameters. The length scale of the simulated lattice constant can be tuned by annealing processes, which in-situ adjusts intervalley interaction and hopping parameters in the simulated honeycomb lattice. The sign of the lattice constant can be switched by choosing xenon or krypton monolayer deposited on graphene, which controls the sublattice degree of freedom and valley arrangement of Dirac fermions. Our work establishes a novel path for experimentally simulating the honeycomb model with tunable parameters by high-order moiré patterns.

Submitted 18 April, 2024; originally announced April 2024.

Comments: 19 pages, 5 figures

Journal ref: Phys. Rev. B 109, L161102 (2024)

arXiv:2404.02617 [pdf, other]

Neural Radiance Fields with Torch Units

Authors: Bingnan Ni, Huanyu Wang, Dongfeng Bai, Minghe Weng, Dexin Qi, Weichao Qiu, Bingbing Liu

Abstract: Neural Radiance Fields (NeRF) give rise to learning-based 3D reconstruction methods widely used in industrial applications. Although prevalent methods achieve considerable improvements in small-scale scenes, accomplishing reconstruction in complex and large-scale scenes is still challenging. First, the background in complex scenes shows a large variance among different views. Second, the current inference pattern, i.e., a pixel only relies on an individual camera ray, fails to capture contextual information. To solve these problems, we propose to enlarge the ray perception field and build up the sample points interactions. In this paper, we design a novel inference pattern that encourages a single camera ray to possess more contextual information, and models the relationship among sample points on each camera ray. To hold contextual information, a camera ray in our proposed method can render a patch of pixels simultaneously. Moreover, we replace the MLP in neural radiance field models with distance-aware convolutions to enhance the feature propagation among sample points from the same camera ray. To summarize, like a torchlight, a ray in our proposed method renders a patch of the image. Thus, we call the proposed method Torch-NeRF. Extensive experiments on KITTI-360 and LLFF show that the Torch-NeRF exhibits excellent performance.

Submitted 3 April, 2024; originally announced April 2024.

arXiv:2404.00293 [pdf, ps, other]

Some super-Poincaré inequalities for gaussian-like measures on stratified Lie groups

Authors: Yaozhong W. Qiu

Abstract: We continue the $U$-bound program initiated in [J. Funct. Anal. 258, 814-851 (2010)] and prove super-Poincaré inequalities for a class of subelliptic probability measures defined on Métivier groups, the main ingredient in the proof being a Hardy-type inequality. In doing so, we recover and extend some previous results from the probabilistic viewpoint.

Submitted 25 May, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

Comments: 19 pages, major changes to proofs, some corrections, and some examples added

MSC Class: 26D10; 60J60

arXiv:2403.12722 [pdf, other]

HUGS: Holistic Urban 3D Scene Understanding via Gaussian Splatting

Authors: Hongyu Zhou, Jiahao Shao, Lu Xu, Dongfeng Bai, Weichao Qiu, Bingbing Liu, Yue Wang, Andreas Geiger, Yiyi Liao

Abstract: Holistic understanding of urban scenes based on RGB images is a challenging yet important problem. It encompasses understanding both the geometry and appearance to enable novel view synthesis, parsing semantic labels, and tracking moving objects. Despite considerable progress, existing approaches often focus on specific aspects of this task and require additional inputs such as LiDAR scans or manually annotated 3D bounding boxes. In this paper, we introduce a novel pipeline that utilizes 3D Gaussian Splatting for holistic urban scene understanding. Our main idea involves the joint optimization of geometry, appearance, semantics, and motion using a combination of static and dynamic 3D Gaussians, where moving object poses are regularized via physical constraints. Our approach offers the ability to render new viewpoints in real-time, yielding 2D and 3D semantic information with high accuracy, and reconstruct dynamic scenes, even in scenarios where 3D bounding box detections are highly noisy. Experimental results on KITTI, KITTI-360, and Virtual KITTI 2 demonstrate the effectiveness of our approach.

Submitted 19 March, 2024; originally announced March 2024.

Comments: Our project page is at https://xdimlab.github.io/hugs_website

arXiv:2403.12695 [pdf, other]

Federated Semi-supervised Learning for Medical Image Segmentation with intra-client and inter-client Consistency

Authors: Yubin Zheng, Peng Tang, Tianjie Ju, Weidong Qiu, Bo Yan

Abstract: Medical image segmentation plays a vital role in clinical disease diagnosis and medical image analysis. However, labeling medical images for the segmentation task is tough due to the indispensable domain expertise of radiologists. Furthermore, considering the privacy and sensitivity of medical images, it is impractical to build a centralized segmentation dataset from different medical institutions. Federated learning aims to train a shared model of isolated clients without local data exchange, which aligns well with the scarcity and privacy characteristics of medical data. To solve the problem of difficult labeling, many advanced semi-supervised methods have been proposed in a centralized data setting. As for federated learning, how to conduct semi-supervised learning under this distributed scenario is worth investigating. In this work, we propose a novel federated semi-supervised learning framework for medical image segmentation. The intra-client and inter-client consistency learning are introduced to smooth predictions at the data level and avoid confirmation bias of local models. They are achieved with the assistance of a Variational Autoencoder (VAE) trained collaboratively by clients. The added VAE model plays three roles: 1) extracting latent low-dimensional features of all labeled and unlabeled data; 2) performing a novel type of data augmentation in calculating intra-client consistency loss; 3) utilizing the generative ability of itself to conduct inter-client consistency distillation. The proposed framework is compared with other federated semi-supervised or self-supervised learning methods. The experimental results illustrate that our method outperforms the state-of-the-art method while avoiding a lot of computation and communication overhead.

Submitted 19 March, 2024; originally announced March 2024.

Comments: Work in progress

arXiv:2403.04880 [pdf, other]

An Item is Worth a Prompt: Versatile Image Editing with Disentangled Control

Authors: Aosong Feng, Weikang Qiu, Jinbin Bai, Xiao Zhang, Zhen Dong, Kaicheng Zhou, Rex Ying, Leandros Tassiulas

Abstract: Building on the success of text-to-image diffusion models (DPMs), image editing is an important application to enable human interaction with AI-generated content. Among various editing methods, editing within the prompt space gains more attention due to its capacity and simplicity of controlling semantics. However, since diffusion models are commonly pretrained on descriptive text captions, direct editing of words in text prompts usually leads to completely different generated images, violating the requirements for image editing. On the other hand, existing editing methods usually consider introducing spatial masks to preserve the identity of unedited regions, which are usually ignored by DPMs and therefore lead to inharmonic editing results. Targeting these two challenges, in this work, we propose to disentangle the comprehensive image-prompt interaction into several item-prompt interactions, with each item linked to a special learned prompt. The resulting framework, named D-Edit, is based on pretrained diffusion models with cross-attention layers disentangled and adopts a two-step optimization to build item-prompt associations. Versatile image editing can then be applied to specific items by manipulating the corresponding prompts. We demonstrate state-of-the-art results in four types of editing operations including image-based, text-based, mask-based editing, and item removal, covering most types of editing applications, all within a single unified framework. Notably, D-Edit is the first framework that can (1) achieve item editing through mask editing and (2) combine image and text-based editing. We demonstrate the quality and versatility of the editing results for a diverse collection of images through both qualitative and quantitative evaluations.

Submitted 28 May, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

arXiv:2402.11317 [pdf, other]

Debiased Offline Representation Learning for Fast Online Adaptation in Non-stationary Dynamics

Authors: Xinyu Zhang, Wenjie Qiu, Yi-Chen Li, Lei Yuan, Chengxing Jia, Zongzhang Zhang, Yang Yu

Abstract: Developing policies that can adjust to non-stationary environments is essential for real-world reinforcement learning applications. However, learning such adaptable policies in offline settings, with only a limited set of pre-collected trajectories, presents significant challenges. A key difficulty arises because the limited offline data makes it hard for the context encoder to differentiate between changes in the environment dynamics and shifts in the behavior policy, often leading to context misassociations. To address this issue, we introduce a novel approach called Debiased Offline Representation for fast online Adaptation (DORA). DORA incorporates an information bottleneck principle that maximizes mutual information between the dynamics encoding and the environmental data, while minimizing mutual information between the dynamics encoding and the actions of the behavior policy. We present a practical implementation of DORA, leveraging tractable bounds of the information bottleneck principle. Our experimental evaluation across six benchmark MuJoCo tasks with variable parameters demonstrates that DORA not only achieves a more precise dynamics encoding but also significantly outperforms existing baselines in terms of performance.

Submitted 17 February, 2024; originally announced February 2024.

arXiv:2402.01723 [pdf, other]

An Empirical Study on Large Language Models in Accuracy and Robustness under Chinese Industrial Scenarios

Authors: Zongjie Li, Wenying Qiu, Pingchuan Ma, Yichen Li, You Li, Sijia He, Baozheng Jiang, Shuai Wang, Weixi Gu

Abstract: Recent years have witnessed the rapid development of large language models (LLMs) in various domains. To better serve the large number of Chinese users, many commercial vendors in China have adopted localization strategies, training and providing local LLMs specifically customized for Chinese users. Furthermore, looking ahead, one of the key future applications of LLMs will be practical deployment in industrial production by enterprises and users in those sectors. However, the accuracy and robustness of LLMs in industrial scenarios have not been well studied. In this paper, we present a comprehensive empirical study on the accuracy and robustness of LLMs in the context of the Chinese industrial production area. We manually collected 1,200 domain-specific problems from 8 different industrial sectors to evaluate LLM accuracy. Furthermore, we designed a metamorphic testing framework containing four industrial-specific stability categories with eight abilities, totaling 13,631 questions with variants to evaluate LLM robustness. In total, we evaluated 9 different LLMs developed by Chinese vendors, as well as four different LLMs developed by global vendors. Our major findings include: (1) Current LLMs exhibit low accuracy in Chinese industrial contexts, with all LLMs scoring less than 0.6. (2) The robustness scores vary across industrial sectors, and local LLMs overall perform worse than global ones. (3) LLM robustness differs significantly across abilities. Global LLMs are more robust under logical-related variants, while advanced local LLMs perform better on problems related to understanding Chinese industrial terminology. Our study results provide valuable guidance for understanding and promoting the industrial domain capabilities of LLMs from both development and industrial enterprise perspectives. The results further motivate possible research directions and tooling support.

Submitted 26 January, 2024; originally announced February 2024.

arXiv:2401.16885 [pdf, ps, other]

Local modification of subdiffusion by initial Fickian diffusion: Multiscale modeling, analysis and computation

Authors: Xiangcheng Zheng, Yiqun Li, Wenlin Qiu

Abstract: We propose a local modification of the standard subdiffusion model by introducing the initial Fickian diffusion, which results in a multiscale diffusion model. The developed model resolves the incompatibility between the nonlocal operators in subdiffusion and the local initial conditions and thus eliminates the initial singularity of the solutions of the subdiffusion, while retaining its heavy tail behavior away from the initial time. The well-posedness of the model and high-order regularity estimates of its solutions are analyzed by resolvent estimates, based on which the numerical discretization and analysis are performed. Numerical experiments are carried out to substantiate the theoretical findings.

Submitted 30 January, 2024; originally announced January 2024.

MSC Class: 35R11; 65M12

arXiv:2401.14597 [pdf]

Superconducting flux qubit with ferromagnetic Josephson pi junction operating at zero magnetic field

Authors: Sunmi Kim, Leonid V. Abdurakhimov, Duong Pham, Wei Qiu, Hirotaka Terai, Sahel Ashhab, Shiro Saito, Taro Yamashita, Kouichi Semba

Abstract: The operation of a conventional superconducting flux qubit requires the application of a precisely tuned magnetic field to set the operation point at half a flux quantum through the qubit loop, which makes the scaling of quantum circuits based on this type of qubits difficult. It has been proposed that, by inducing a pi phase shift in the superconducting order parameter using a precisely controlled nanoscale-thickness superconductor/ferromagnet/superconductor Josephson junction, commonly referred to as pi-junction, it is possible to realize a flux qubit operating at zero magnetic flux. We report the realization of a zero-flux-biased flux qubit based on three NbN/AlN/NbN Josephson junctions and a NbN/PdNi/NbN ferromagnetic pi-junction. The qubit lifetime is in the microsecond range, which we argue is limited by quasiparticle excitations in the metallic ferromagnet layer. With further improvements in the materials of the ferromagnetic junction, the zero-flux-biased flux qubits can become a promising platform for quantum computing.

Submitted 15 February, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

Comments: 22 pages, 4 figures

arXiv:2401.08045 [pdf, other]

Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities

Authors: Xu Yan, Haiming Zhang, Yingjie Cai, Jingming Guo, Weichao Qiu, Bin Gao, Kaiqiang Zhou, Yue Zhao, Huan Jin, Jiantao Gao, Zhen Li, Lihui Jiang, Wei Zhang, Hongbo Zhang, Dengxin Dai, Bingbing Liu

Abstract: The rise of large foundation models, trained on extensive datasets, is revolutionizing the field of AI. Models such as SAM, DALL-E2, and GPT-4 showcase their adaptability by extracting intricate patterns and performing effectively across diverse tasks, thereby serving as potent building blocks for a wide range of AI applications. Autonomous driving, a vibrant front in AI applications, remains challenged by the lack of dedicated vision foundation models (VFMs). The scarcity of comprehensive training data, the need for multi-sensor integration, and the diverse task-specific architectures pose significant obstacles to the development of VFMs in this field. This paper delves into the critical challenge of forging VFMs tailored specifically for autonomous driving, while also outlining future directions. Through a systematic analysis of over 250 papers, we dissect essential techniques for VFM development, including data preparation, pre-training strategies, and downstream task adaptation. Moreover, we explore key advancements such as NeRF, diffusion models, 3D Gaussian Splatting, and world models, presenting a comprehensive roadmap for future research. To empower researchers, we have built and maintained https://github.com/zhanghm1995/Forge_VFM4AD, an open-access repository constantly updated with the latest advancements in forging VFMs for autonomous driving.

Submitted 15 January, 2024; originally announced January 2024.

Comments: Github Repo: https://github.com/zhanghm1995/Forge_VFM4AD

arXiv:2401.08038 [pdf, other]

Calpric: Inclusive and Fine-grain Labeling of Privacy Policies with Crowdsourcing and Active Learning

Authors: Wenjun Qiu, David Lie, Lisa Austin

Abstract: A significant challenge to training accurate deep learning models on privacy policies is the cost and difficulty of obtaining a large and comprehensive set of training data. To address these challenges, we present Calpric, which combines automatic text selection and segmentation, active learning and the use of crowdsourced annotators to generate a large, balanced training set for privacy policies at low cost. Automated text selection and segmentation simplifies the labeling task, enabling untrained annotators from crowdsourcing platforms, like Amazon's Mechanical Turk, to be competitive with trained annotators, such as law students, and also reduces inter-annotator agreement, which decreases labeling cost. Having reliable labels for training enables the use of active learning, which uses fewer training samples to efficiently cover the input space, further reducing cost and improving class and data category balance in the data set. The combination of these techniques allows Calpric to produce models that are accurate over a wider range of data categories, and provide more detailed, fine-grain labels than previous work. Our crowdsourcing process enables Calpric to attain reliable labeled data at a cost of roughly $0.92-$1.71 per labeled text segment. Calpric's training process also generates a labeled data set of 16K privacy policy text segments across 9 Data categories with balanced positive and negative samples.

Submitted 15 January, 2024; originally announced January 2024.

Comments: published at USENIX Security 2023; associated website: https://www.usenix.org/conference/usenixsecurity23/presentation/qiu

arXiv:2312.06951 [pdf, other]

Feature Norm Regularized Federated Learning: Transforming Skewed Distributions into Global Insights

Authors: Ke Hu, WeiDong Qiu, Peng Tang

Abstract: In the field of federated learning, addressing non-independent and identically distributed (non-i.i.d.) data remains a quintessential challenge for improving global model performance. This work introduces the Feature Norm Regularized Federated Learning (FNR-FL) algorithm, which uniquely incorporates class average feature norms to enhance model accuracy and convergence in non-i.i.d. scenarios. Our comprehensive analysis reveals that FNR-FL not only accelerates convergence but also significantly surpasses other contemporary federated learning algorithms in test accuracy, particularly under feature distribution skew scenarios. The novel modular design of FNR-FL facilitates seamless integration with existing federated learning frameworks, reinforcing its adaptability and potential for widespread application. We substantiate our claims through rigorous empirical evaluations, demonstrating FNR-FL's exceptional performance across various skewed data distributions. Relative to FedAvg, FNR-FL exhibits a substantial 66.24\% improvement in accuracy and a significant 11.40\% reduction in training time, underscoring its enhanced effectiveness and efficiency.

Submitted 11 December, 2023; originally announced December 2023.

arXiv:2312.02203 [pdf, other]

Learning High-Order Relationships of Brain Regions

Authors: Weikang Qiu, Huangrui Chu, Selena Wang, Haolan Zuo, Xiaoxiao Li, Yize Zhao, Rex Ying

Abstract: Discovering reliable and informative relationships among brain regions from functional magnetic resonance imaging (fMRI) signals is essential in phenotypic predictions. Most of the current methods fail to accurately characterize those interactions because they only focus on pairwise connections and overlook the high-order relationships of brain regions. We propose that these high-order relationships should be maximally informative and minimally redundant (MIMR). However, identifying such high-order relationships is challenging and under-explored due to the exponential search space and the absence of a tractable objective. In response to this gap, we propose a novel method named HYBRID which aims to extract MIMR high-order relationships from fMRI data. HYBRID employs a CONSTRUCTOR to identify hyperedge structures, and a WEIGHTER to compute a weight for each hyperedge, which avoids searching in exponential space. HYBRID achieves the MIMR objective through an innovative information bottleneck framework named multi-head drop-bottleneck with theoretical guarantees. Our comprehensive experiments demonstrate the effectiveness of our model. Our model outperforms the state-of-the-art predictive model by an average of 11.2%, regarding the quality of hyperedges measured by CPM, a standard protocol for studying brain connections.

Submitted 8 June, 2024; v1 submitted 2 December, 2023; originally announced December 2023.

Comments: Accepted at ICML 2024, Camera Ready Version

arXiv:2311.17080 [pdf, other]

Combating the "Sameness" in AI Art: Reflections on the Interactive AI Installation Fencing Hallucination

Authors: Weihao Qiu, George Legrady

Abstract: The article summarizes three types of "sameness" issues in Artificial Intelligence (AI) art, each occurring at different stages of development in AI image creation tools. Through the Fencing Hallucination project, the article reflects on the design of AI art production in alleviating the sense of uniformity, maintaining the uniqueness of images from an AI image synthesizer, and enhancing the connection between the artworks and the audience. This paper endeavors to stimulate the creation of distinctive AI art by recounting the efforts and insights derived from the Fencing Hallucination project, all dedicated to addressing the issue of "sameness".

Submitted 27 November, 2023; originally announced November 2023.

Comments: Paper for NeurIPS 2023 Workshop, Machine Learning for Creativity and Design

arXiv:2311.15917 [pdf, other]

When Graph Convolution Meets Double Attention: Online Privacy Disclosure Detection with Multi-Label Text Classification

Authors: Zhanbo Liang, Jie Guo, Weidong Qiu, Zheng Huang, Shujun Li

Abstract: With the rise of Web 2.0 platforms such as online social media, people's private information, such as their location, occupation and even family information, is often inadvertently disclosed through online discussions. Therefore, it is important to detect such unwanted privacy disclosures to help alert people affected and the online platform. In this paper, privacy disclosure detection is modeled as a multi-label text classification (MLTC) problem, and a new privacy disclosure detection model is proposed to construct an MLTC classifier for detecting online privacy disclosures. This classifier takes an online post as the input and outputs multiple labels, each reflecting a possible privacy disclosure. The proposed presentation method combines three different sources of information, the input text itself, the label-to-text correlation and the label-to-label correlation. A double-attention mechanism is used to combine the first two sources of information, and a graph convolutional network (GCN) is employed to extract the third source of information that is then used to help fuse features extracted from the first two sources of information. Our extensive experimental results, obtained on a public dataset of privacy-disclosing posts on Twitter, demonstrated that our proposed privacy disclosure detection method significantly and consistently outperformed other state-of-the-art methods in terms of all key performance indicators.

Submitted 20 December, 2023; v1 submitted 27 November, 2023; originally announced November 2023.

Comments: The manuscript is accepted by Data Mining and Knowledge Discovery (ECML PKDD Journal track)

arXiv:2311.13588 [pdf, other]

User-guided Page Merging for Memory Deduplication in Serverless Systems

Authors: Wei Qiu, Marcin Copik, Yun Wang, Alexandru Calotoiu, Torsten Hoefler

Abstract: Serverless computing is an emerging cloud paradigm that offers an elastic and scalable allocation of computing resources with pay-as-you-go billing. In the Function-as-a-Service (FaaS) programming model, applications comprise short-lived and stateless serverless functions executed in isolated containers or microVMs, which can quickly scale to thousands of instances and process terabytes of data. This flexibility comes at the cost of duplicated runtimes, libraries, and user data spread across many function instances, and cloud providers do not utilize this redundancy. The memory footprint of serverless forces removing idle containers to make space for new ones, which decreases performance through more cold starts and fewer data caching opportunities. We address this issue by proposing deduplicating memory pages of serverless workers with identical content, based on the content-based page-sharing concept of Linux Kernel Same-page Merging (KSM). We replace the background memory scanning process of KSM, as it is too slow to locate sharing candidates in short-lived functions. Instead, we design User-Guided Page Merging (UPM), a built-in Linux kernel module that leverages the madvise system call: we enable users to advise the kernel of memory areas that can be shared with others. We show that UPM reduces memory consumption by up to 55% on 16 concurrent containers executing a typical image recognition function, more than doubling the density for containers of the same function that can run on a system.

Submitted 22 November, 2023; originally announced November 2023.

Comments: Accepted at IEEE BigData 2023

arXiv:2311.09836 [pdf, other]

PELMS: Pre-training for Effective Low-Shot Multi-Document Summarization

Authors: Joseph J. Peper, Wenzhao Qiu, Lu Wang

Abstract: We investigate pre-training techniques for abstractive multi-document summarization (MDS), which is much less studied than summarizing single documents. Though recent work has demonstrated the effectiveness of highlighting information salience for pre-training strategy design, it struggles to generate abstractive and reflective summaries, which are critical properties for MDS. To this end, we present PELMS, a pre-trained model that uses objectives based on semantic coherence heuristics and faithfulness constraints with un-labeled multi-document inputs, to promote the generation of concise, fluent, and faithful summaries. To support the training of PELMS, we compile MultiPT, a multi-document pre-training corpus containing over 93 million documents to form more than 3 million unlabeled topic-centric document clusters, covering diverse genres such as product reviews, news, and general knowledge. We perform extensive evaluation of PELMS in low-shot settings on a wide range of MDS datasets. Our approach consistently outperforms competitive comparisons with respect to overall informativeness, abstractiveness, coherence, and faithfulness.

Submitted 16 November, 2023; originally announced November 2023.

arXiv:2311.01766 [pdf, other]

Support or Refute: Analyzing the Stance of Evidence to Detect Out-of-Context Mis- and Disinformation

Authors: Xin Yuan, Jie Guo, Weidong Qiu, Zheng Huang, Shujun Li

Abstract: Mis- and disinformation online have become a major societal problem as major sources of online harms of different kinds. One common form of mis- and disinformation is out-of-context (OOC) information, where different pieces of information are falsely associated, e.g., a real image combined with a false textual caption or a misleading textual description. Although some past studies have attempted to defend against OOC mis- and disinformation through external evidence, they tend to disregard the role of different pieces of evidence with different stances. Motivated by the intuition that the stance of evidence represents a bias towards different detection results, we propose a stance extraction network (SEN) that can extract the stances of different pieces of multi-modal evidence in a unified framework. Moreover, we introduce a support-refutation score calculated based on the co-occurrence relations of named entities into the textual SEN. Extensive experiments on a public large-scale dataset demonstrated that our proposed method outperformed the state-of-the-art baselines, with the best model achieving a performance gain of 3.2% in accuracy.

Submitted 9 December, 2023; v1 submitted 3 November, 2023; originally announced November 2023.

Comments: Accepted and published by EMNLP 2023. Details can be found in https://aclanthology.org/2023.emnlp-main.259

Journal ref: In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 4268-4280, Singapore. Association for Computational Linguistics

arXiv:2311.01150 [pdf, other]

Revisiting the Knowledge Injection Frameworks

Authors: Peng Fu, Yiming Zhang, Haobo Wang, Weikang Qiu, Junbo Zhao

Abstract: In recent years, large language models (LLMs), such as GPTs, have attained great impact worldwide. However, how to adapt these LLMs to better suit the vertical domain-specific tasks by utilizing external knowledge remains not completely solved. Indeed, there have emerged a few works on this line where most of them rely on an alignment heuristic that is built to inject the corresponding knowledge tuple into the associated text sample. However, despite the promise, we identify a pivotal problem in this work ubiquitously. Simply put, we find that injecting unaligned (i.e., random) knowledge tuple into the LLMs achieves comparable (and sometimes better) results than the aligned knowledge being injected. We therefore take a thorough investigation of this frustrating finding on a variety of related prior work and further provide a chain of potential interpretations for the phenomenon. Based on all that, we offer a simple remediated technique. Briefly, the core of this technique is rooted in an ideological emphasis on the pruning and purification of the external knowledge base to be injected into LLMs. At last, we show that by integrating this technique into most (if not all) knowledge injection frameworks and recent LLMs, it manages to overcome the aforementioned sanity problem and further pushes the boundary of the performance of the domain-adaptive LLMs.

Submitted 2 November, 2023; originally announced November 2023.

Comments: 9 pages, 6 figures, accepted by EMNLP 2023 Main

arXiv:2310.02217 [pdf, other]

doi 10.1103/PhysRevB.109.L041106

Electrically tuned topology and magnetism in twisted bilayer MoTe$_2$ at $ν_h=1$

Authors: Bohao Li, Wen-Xuan Qiu, Fengcheng Wu

Abstract: We present a theoretical study of an interaction-driven quantum phase diagram of twisted bilayer MoTe$_2$ at hole filling factor $ν_h=1$ as a function of twist angle $θ$ and layer potential difference $V_z$, where $V_z$ is generated by an applied out-of-plane electric field. At $V_z=0$, the phase diagram includes quantum anomalous Hall insulators in the intermediate $θ$ regime and topologically trivial multiferroic states with coexisting ferroelectricity and magnetism in both small and large $θ$ regimes. There can be two transitions from the quantum anomalous Hall insulator phase to topologically trivial out-of-plane ferromagnetic phase, and finally to in-plane 120$^\circ$ antiferromagnetic phase as $|V_z|$ increases, or a single transition without the intervening ferromagnetic phase. We show explicitly that the spin vector chirality of various 120$^\circ$ antiferromagnetic states can be electrically switched. We discuss the connection between the experimentally measured Curie-Weiss temperature and the low-temperature magnetic order based on an effective Heisenberg model with magnetic anisotropy.

Submitted 18 January, 2024; v1 submitted 3 October, 2023; originally announced October 2023.

Comments: 7 pages, 4 figures

Journal ref: Phys. Rev. B 109, L041106 (2024)

arXiv:2308.09707 [pdf, other]

doi 10.1103/PhysRevB.109.L041103

Majorana zero modes in twisted transition metal dichalcogenides homobilayers

Authors: Xun-Jiang Luo, Wen-Xuan Qiu, Fengcheng Wu

Abstract: Semiconductor moiré superlattices provide a highly tunable platform to study the interplay between electron correlation and band topology. For example, the generalized Kane-Mele-Hubbard model can be simulated by the topological moiré flat bands in twisted transition metal dichalcogenides homobilayers. For this system, we obtain the filling factor, twist angle, and electric field-dependent quantum phase diagrams with a plethora of phases, including the quantum spin Hall insulator, the in-plane antiferromagnetic state, the out-of-plane antiferromagnetic Chern insulator, the spin-polarized Chern insulator, the in-plane ferromagnetic state, and the 120$^\circ$ antiferromagnetic state. We predict that a gate-defined junction formed between the quantum spin Hall insulator phase with proximitized superconductivity and magnetic phases with in-plane magnetization (either ferromagnetic or antiferromagnetic) can realize one-dimensional topological superconductor with Majorana zero modes. Our proposal introduces semiconductor moiré homobilayers as an electrically tunable Majorana platform with no need of an external magnetic field.

Submitted 9 January, 2024; v1 submitted 18 August, 2023; originally announced August 2023.

Comments: 12 pages, 9 figures

Journal ref: Phys. Rev. B 109, L041103 (2024)

arXiv:2306.06824 [pdf, other]

SE#PCFG: Semantically Enhanced PCFG for Password Analysis and Cracking

Authors: Yangde Wang, Weidong Qiu, Weicheng Zhang, Hao Tian, Shujun Li

Abstract: Much research has been done on user-generated textual passwords. Surprisingly, semantic information in such passwords remain underinvestigated, with passwords created by English- and/or Chinese-speaking users being more studied with limited semantics. This paper fills this gap by proposing a general framework based on semantically enhanced PCFG (probabilistic context-free grammars) named SE#PCFG. It allowed us to consider 43 types of semantic information, the richest set considered so far, for semantic password analysis. Applying SE#PCFG to 17 large leaked password databases of user speaking four languages (English, Chinese, German and French), we demonstrate its usefulness and report a wide range of new insights about password semantics at different levels such as cross-website password correlations. Furthermore, based on SE#PCFG and a new systematic smoothing method, we proposed the Semantically Enhanced Password Cracking Architecture (SEPCA). To compare the performance of SEPCA against three state-of-the-art (SOTA) benchmarks in terms of the password coverage rate: two other PCFG variants and FLA. Our experimental results showed that SEPCA outperformed all the three benchmarks consistently and significantly across 52 test cases, by up to 21.53%, 52.55% and 7.86%, respectively, at the user level (with duplicate passwords). At the level of unique passwords, SEPCA also beats the three benchmarks by up to 33.32%, 86.19% and 10.46%, respectively. The results demonstrated the power of SEPCA as a new password cracking framework.

Submitted 11 June, 2023; originally announced June 2023.

arXiv:2306.04928 [pdf, other]

Underwater Intention Recognition using Head Motion and Throat Vibration for Supernumerary Robotic Assistance

Authors: Yuqin Guo, Rongzheng Zhang, Wanghongjie Qiu, Harry Asada, Fang Wan, Chaoyang Song

Abstract: This study presents a multi-modal mechanism for recognizing human intentions while diving underwater, aiming to achieve natural human-robot interactions through an underwater superlimb for diving assistance. The underwater environment severely limits the divers' capabilities in intention expression, which becomes more challenging when they intend to operate tools while keeping control of body postures in 3D with the various diving suits and gears. The current literature is limited in underwater intention recognition, impeding the development of intelligent wearable systems for human-robot interactions underwater. Here, we present a novel solution to simultaneously detect head motion and throat vibrations under the water in a compact, wearable design. Experiment results show that using machine learning algorithms, we achieved high performance in integrating these two modalities to translate human intentions to robot control commands for an underwater superlimb system. This study's results paved the way for future development in underwater intention recognition and underwater human-robot interactions with supernumerary support.

Submitted 16 August, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

Comments: 6 pages, 9 figures, 3 tables, accepted to IEEE CASE 2023

arXiv:2305.06281 [pdf, ps, other]

doi 10.1090/proc/16765

Weyl asymptotics for functional difference operators with power to quadratic exponential potential

Authors: Yaozhong W. Qiu

Abstract: We continue the program first initiated in [Geom. Funct. Anal. 26, 288-305 (2016)] and develop a modification of the technique introduced in that paper to study the spectral asymptotics, namely the Riesz means and eigenvalue counting functions, of functional difference operators $\smash{H_0 = \mathcal F^{-1} M_{\cosh(ξ)} \mathcal F}$ with potentials of the form $\smash{W(x) = \lvert{x\rvert}^pe^{\lvert{x\rvert}^β}}$ for either $β= 0$ and $p > 0$ or $β\in (0, 2]$ and $p \geq 0$. We provide a new method for studying general potentials which includes the potentials studied in [Geom. Funct. Anal. 26, 288-305 (2016)] and [J. Math. Phys. 60, 103505 (2019)]. The proof involves dilating the variance of the gaussian defining the coherent state transform in a controlled manner preserving the expected asymptotics.

Submitted 3 April, 2024; v1 submitted 10 May, 2023; originally announced May 2023.

Comments: 14 pages, changed title, some changes made according to referee recommendations, to appear in Proc. Amer. Math. Soc

MSC Class: 34K08; 47A75

arXiv:2305.01006 [pdf, other]

doi 10.1103/PhysRevX.13.041026

Interaction-driven topological phase diagram of twisted bilayer MoTe$_2$

Authors: Wen-Xuan Qiu, Bohao Li, Xun-Jiang Luo, Fengcheng Wu

Abstract: Twisted bilayer MoTe$_2$ is a promising platform to investigate the interplay between band topology and many-body interaction. We present a theoretical study of its interaction-driven quantum phase diagrams based on a three-orbital model, which can be viewed as a generalization of the Kane-Mele-Hubbard model with one additional orbital and long-range Coulomb repulsion. We predict a cascade of phase transitions tuned by the twist angle $θ$. At the hole filling factor $ν=1$ (one hole per moiré unit cell), the ground state can be in the multiferroic phase with coexisting spontaneous layer polarization and magnetism, the quantum anomalous Hall phase, and finally the topologically trivial magnetic phases, as $θ$ increases from $1.5^{\circ}$ to $5^{\circ}$. At $ν=2$, the ground state can have a second-order phase transition between an antiferromagnetic phase and the quantum spin Hall phase as $θ$ passes through a critical value. The dependence of the phase boundaries on model parameters such as the gate-to-sample distance, the dielectric constant, and the moiré potential amplitude is examined. The predicted phase diagrams can guide the search for topological phases in twisted transition metal dichalcogenide homobilayers.

Submitted 15 November, 2023; v1 submitted 1 May, 2023; originally announced May 2023.

Comments: 16 pages, 13 figures

Journal ref: Phys. Rev. X 13, 041026 (2023)

arXiv:2304.06287 [pdf, other]

NeRFVS: Neural Radiance Fields for Free View Synthesis via Geometry Scaffolds

Authors: Chen Yang, Peihao Li, Zanwei Zhou, Shanxin Yuan, Bingbing Liu, Xiaokang Yang, Weichao Qiu, Wei Shen

Abstract: We present NeRFVS, a novel neural radiance fields (NeRF) based method to enable free navigation in a room. NeRF achieves impressive performance in rendering images for novel views similar to the input views while suffering for novel views that are significantly different from the training views. To address this issue, we utilize the holistic priors, including pseudo depth maps and view coverage information, from neural reconstruction to guide the learning of implicit neural representations of 3D indoor scenes. Concretely, an off-the-shelf neural reconstruction method is leveraged to generate a geometry scaffold. Then, two loss functions based on the holistic priors are proposed to improve the learning of NeRF: 1) A robust depth loss that can tolerate the error of the pseudo depth map to guide the geometry learning of NeRF; 2) A variance loss to regularize the variance of implicit neural representations to reduce the geometry and color ambiguity in the learning procedure. These two loss functions are modulated during NeRF optimization according to the view coverage information to reduce the negative influence brought by the view coverage imbalance. Extensive results demonstrate that our NeRFVS outperforms state-of-the-art view synthesis methods quantitatively and qualitatively on indoor scenes, achieving high-fidelity free navigation results.

Submitted 23 May, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

Comments: 10 pages, 7 figures

arXiv:2303.15494 [pdf, other]

Semantic-visual Guided Transformer for Few-shot Class-incremental Learning

Authors: Wenhao Qiu, Sichao Fu, Jingyi Zhang, Chengxiang Lei, Qinmu Peng

Abstract: Few-shot class-incremental learning (FSCIL) has recently attracted extensive attention in various areas. Existing FSCIL methods highly depend on the robustness of the feature backbone pre-trained on base classes. In recent years, different Transformer variants have obtained significant processes in the feature representation learning of massive fields. Nevertheless, the progress of the Transformer in FSCIL scenarios has not achieved the potential promised in other fields so far. In this paper, we develop a semantic-visual guided Transformer (SV-T) to enhance the feature extracting capacity of the pre-trained feature backbone on incremental classes. Specifically, we first utilize the visual (image) labels provided by the base classes to supervise the optimization of the Transformer. And then, a text encoder is introduced to automatically generate the corresponding semantic (text) labels for each image from the base classes. Finally, the constructed semantic labels are further applied to the Transformer for guiding its hyperparameters updating. Our SV-T can take full advantage of more supervision information from base classes and further enhance the training robustness of the feature backbone. More importantly, our SV-T is an independent method, which can directly apply to the existing FSCIL architectures for acquiring embeddings of various incremental classes. Extensive experiments on three benchmarks, two FSCIL architectures, and two Transformer variants show that our proposed SV-T obtains a significant improvement in comparison to the existing state-of-the-art FSCIL methods.

Submitted 27 March, 2023; originally announced March 2023.

Comments: Accepted by IEEE International Conference on Multimedia and Expo (ICME 2023)

arXiv:2303.13567 [pdf]

AI Models Close to your Chest: Robust Federated Learning Strategies for Multi-site CT

Authors: Edward H. Lee, Brendan Kelly, Emre Altinmakas, Hakan Dogan, Maryam Mohammadzadeh, Errol Colak, Steve Fu, Olivia Choudhury, Ujjwal Ratan, Felipe Kitamura, Hernan Chaves, Jimmy Zheng, Mourad Said, Eduardo Reis, Jaekwang Lim, Patricia Yokoo, Courtney Mitchell, Golnaz Houshmand, Marzyeh Ghassemi, Ronan Killeen, Wendy Qiu, Joel Hayden, Farnaz Rafiee, Chad Klochko, Nicholas Bevins , et al. (5 additional authors not shown)

Abstract: While it is well known that population differences from genetics, sex, race, and environmental factors contribute to disease, AI studies in medicine have largely focused on locoregional patient cohorts with less diverse data sources. Such limitation stems from barriers to large-scale data share and ethical concerns over data privacy. Federated learning (FL) is one potential pathway for AI development that enables learning across hospitals without data share. In this study, we show the results of various FL strategies on one of the largest and most diverse COVID-19 chest CT datasets: 21 participating hospitals across five continents that comprise >10,000 patients with >1 million images. We also propose an FL strategy that leverages synthetically generated data to overcome class and size imbalances. We also describe the sources of data heterogeneity in the context of FL, and show how even among the correctly labeled populations, disparities can arise due to these biases.

Submitted 13 April, 2023; v1 submitted 23 March, 2023; originally announced March 2023.

arXiv:2303.09143 [pdf, ps, other]

Weak discrete maximum principle of isoparametric finite element methods in curvilinear polyhedra

Authors: Buyang Li, Weifeng Qiu, Yupei Xie, Wenshan Yu

Abstract: The weak maximum principle of the isoparametric finite element method is proved for the Poisson equation under the Dirichlet boundary condition in a (possibly concave) curvilinear polyhedral domain with edge openings smaller than $π$, which include smooth domains and smooth deformations of convex polyhedra. The proof relies on the analysis of a dual elliptic problem with a discontinuous coefficient matrix arising from the isoparametric finite elements. Therefore, the standard $H^2$ elliptic regularity which is required in the proof of the weak maximum principle in the literature does not hold for this dual problem. To overcome this difficulty, we have decomposed the solution into a smooth part and a nonsmooth part, and estimated the two parts by $H^2$ and $W^{1,p}$ estimates, respectively. As an application of the weak maximum principle, we have proved a maximum-norm best approximation property of the isoparametric finite element method for the Poisson equation in a curvilinear polyhedron. The proof contains non-trivial modifications of Schatz's argument due to the non-conformity of the iso-parametric finite elements, which requires us to construct a globally smooth flow map which maps the curvilinear polyhedron to a perturbed larger domain on which we can establish the $W^{1,\infty}$ regularity estimate of the Poisson equation uniformly with respect to the perturbation.

Submitted 16 March, 2023; originally announced March 2023.

arXiv:2302.08250 [pdf, other]

Self-supervised Guided Hypergraph Feature Propagation for Semi-supervised Classification with Missing Node Features

Authors: Chengxiang Lei, Sichao Fu, Yuetian Wang, Wenhao Qiu, Yachen Hu, Qinmu Peng, Xinge You

Abstract: Graph neural networks (GNNs) with missing node features have recently received increasing interest. Such missing node features seriously hurt the performance of the existing GNNs. Some recent methods have been proposed to reconstruct the missing node features by the information propagation among nodes with known and unknown attributes. Although these methods have achieved superior performance, how to exactly exploit the complex data correlations among nodes to reconstruct missing node features is still a great challenge. To solve the above problem, we propose a self-supervised guided hypergraph feature propagation (SGHFP). Specifically, the feature hypergraph is first generated according to the node features with missing information. And then, the reconstructed node features produced by the previous iteration are fed to a two-layer GNNs to construct a pseudo-label hypergraph. Before each iteration, the constructed feature hypergraph and pseudo-label hypergraph are fused effectively, which can better preserve the higher-order data correlations among nodes. After then, we apply the fused hypergraph to the feature propagation for reconstructing missing features. Finally, the reconstructed node features by multi-iteration optimization are applied to the downstream semi-supervised classification task. Extensive experiments demonstrate that the proposed SGHFP outperforms the existing semi-supervised classification with missing node feature methods.

Submitted 16 February, 2023; originally announced February 2023.

Comments: Accepted by 48th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023)

arXiv:2302.05014 [pdf, ps, other]

A Non-gradient DG method for second-order Elliptic Equations in the Non-divergence Form

Authors: Weifeng Qiu, Jin Ren, Ke Shi, Yuesheng Xu

Abstract: $L^1$ based optimization is widely used in image denoising, machine learning and related applications. One of the main features of such approach is that it naturally provide a sparse structure in the numerical solutions. In this paper, we study an $L^1$ based mixed DG method for second-order elliptic equations in the non-divergence form. The elliptic PDE in nondivergence form arises in the linearization of fully nonlinear PDEs. Due to the nature of the equations, classical finite element methods based on variational forms can not be employed directly. In this work, we propose a new optimization scheme coupling the classical DG framework with recently developed $L^1$ optimization technique. Convergence analysis in both energy norm and $L^{\infty}$ norm are obtained under weak regularity assumption. Such $L^1$ models are nondifferentiable and therefore invalidate traditional gradient methods. Therefore all existing gradient based solvers are no longer feasible under this setting. To overcome this difficulty, we characterize solutions of $L^1$ optimization as fixed-points of proximity equations and utilize matrix splitting technique to obtain a class of fixed-point proximity algorithms with convergence analysis. Various numerical examples are displayed to illustrate the numerical solution has sparse structure with careful choice of the bases of the finite dimensional spaces. Numerical examples in both smooth and nonsmooth settings are provided to validate the theoretical results.

Submitted 9 February, 2023; originally announced February 2023.

arXiv:2302.03429 [pdf, other]

Towards Skilled Population Curriculum for Multi-Agent Reinforcement Learning

Authors: Rundong Wang, Longtao Zheng, Wei Qiu, Bowei He, Bo An, Zinovi Rabinovich, Yujing Hu, Yingfeng Chen, Tangjie Lv, Changjie Fan

Abstract: Recent advances in multi-agent reinforcement learning (MARL) allow agents to coordinate their behaviors in complex environments. However, common MARL algorithms still suffer from scalability and sparse reward issues. One promising approach to resolving them is automatic curriculum learning (ACL). ACL involves a student (curriculum learner) training on tasks of increasing difficulty controlled by a teacher (curriculum generator). Despite its success, ACL's applicability is limited by (1) the lack of a general student framework for dealing with the varying number of agents across tasks and the sparse reward problem, and (2) the non-stationarity of the teacher's task due to ever-changing student strategies. As a remedy for ACL, we introduce a novel automatic curriculum learning framework, Skilled Population Curriculum (SPC), which adapts curriculum learning to multi-agent coordination. Specifically, we endow the student with population-invariant communication and a hierarchical skill set, allowing it to learn cooperation and behavior skills from distinct tasks with varying numbers of agents. In addition, we model the teacher as a contextual bandit conditioned by student policies, enabling a team of agents to change its size while still retaining previously acquired skills. We also analyze the inherent non-stationarity of this multi-agent automatic curriculum teaching problem and provide a corresponding regret bound. Empirical results show that our method improves the performance, scalability and sample efficiency in several MARL environments.

Submitted 7 February, 2023; originally announced February 2023.

arXiv:2301.00557 [pdf, other]

Learning to Maximize Mutual Information for Dynamic Feature Selection

Authors: Ian Covert, Wei Qiu, Mingyu Lu, Nayoon Kim, Nathan White, Su-In Lee

Abstract: Feature selection helps reduce data acquisition costs in ML, but the standard approach is to train models with static feature subsets. Here, we consider the dynamic feature selection (DFS) problem where a model sequentially queries features based on the presently available information. DFS is often addressed with reinforcement learning, but we explore a simpler approach of greedily selecting features based on their conditional mutual information. This method is theoretically appealing but requires oracle access to the data distribution, so we develop a learning approach based on amortized optimization. The proposed method is shown to recover the greedy policy when trained to optimality, and it outperforms numerous existing feature selection methods in our experiments, thus validating it as a simple but powerful approach for this problem.

Submitted 8 June, 2023; v1 submitted 2 January, 2023; originally announced January 2023.

Comments: ICML 2023 camera-ready

arXiv:2212.12878 [pdf, other]

A Lightweight Reconstruction Network for Surface Defect Inspection

Authors: Chao Hu, Jian Yao, Weijie Wu, Weibin Qiu, Liqiang Zhu

Abstract: Currently, most deep learning methods cannot solve the problem of scarcity of industrial product defect samples and significant differences in characteristics. This paper proposes an unsupervised defect detection algorithm based on a reconstruction network, which is realized using only a large number of easily obtained defect-free sample data. The network includes two parts: image reconstruction and surface defect area detection. The reconstruction network is designed through a fully convolutional autoencoder with a lightweight structure. Only a small number of normal samples are used for training so that the reconstruction network can be A defect-free reconstructed image is generated. A function combining structural loss and $\mathit{L}1$ loss is proposed as the loss function of the reconstruction network to solve the problem of poor detection of irregular texture surface defects. Further, the residual of the reconstructed image and the image to be tested is used as the possible region of the defect, and conventional image operations can realize the location of the fault. The unsupervised defect detection algorithm of the proposed reconstruction network is used on multiple defect image sample sets. Compared with other similar algorithms, the results show that the unsupervised defect detection algorithm of the reconstructed network has strong robustness and accuracy.

Submitted 25 December, 2022; originally announced December 2022.

Comments: Journal of Mathematical Imaging and Vision(JMIV)

Journal ref: 2023 Journal of Mathematical Imaging and Vision(2023 JMIV)

arXiv:2212.03978 [pdf, other]

Learning Graph Search Heuristics

Authors: Michal Pándy, Weikang Qiu, Gabriele Corso, Petar Veličković, Rex Ying, Jure Leskovec, Pietro Liò

Abstract: Searching for a path between two nodes in a graph is one of the most well-studied and fundamental problems in computer science. In numerous domains such as robotics, AI, or biology, practitioners develop search heuristics to accelerate their pathfinding algorithms. However, it is a laborious and complex process to hand-design heuristics based on the problem and the structure of a given use case. Here we present PHIL (Path Heuristic with Imitation Learning), a novel neural architecture and a training algorithm for discovering graph search and navigation heuristics from data by leveraging recent advances in imitation learning and graph representation learning. At training time, we aggregate datasets of search trajectories and ground-truth shortest path distances, which we use to train a specialized graph neural network-based heuristic function using backpropagation through steps of the pathfinding process. Our heuristic function learns graph embeddings useful for inferring node distances, runs in constant time independent of graph sizes, and can be easily incorporated in an algorithm such as A* at test time. Experiments show that PHIL reduces the number of explored nodes compared to state-of-the-art methods on benchmark datasets by 58.5\% on average, can be directly applied in diverse graphs ranging from biological networks to road networks, and allows for fast planning in time-critical robotics domains.

Submitted 10 January, 2023; v1 submitted 7 December, 2022; originally announced December 2022.

arXiv:2212.01528

IDMS: Instance Depth for Multi-scale Monocular 3D Object Detection

Authors: Chao Hu, Liqiang Zhu, Weibing Qiu, Weijie Wu

Abstract: Due to the lack of depth information of images and poor detection accuracy in monocular 3D object detection, we proposed the instance depth for multi-scale monocular 3D object detection method. Firstly, to enhance the model's processing ability for different scale targets, a multi-scale perception module based on dilated convolution is designed, and the depth features containing multi-scale information are re-refined from both spatial and channel directions considering the inconsistency between feature maps of different scales. Firstly, we designed a multi-scale perception module based on dilated convolution to enhance the model's processing ability for different scale targets. The depth features containing multi-scale information are re-refined from spatial and channel directions considering the inconsistency between feature maps of different scales. Secondly, so as to make the model obtain better 3D perception, this paper proposed to use the instance depth information as an auxiliary learning task to enhance the spatial depth feature of the 3D target and use the sparse instance depth to supervise the auxiliary task. Finally, by verifying the proposed algorithm on the KITTI test set and evaluation set, the experimental results show that compared with the baseline method, the proposed method improves by 5.27\% in AP40 in the car category, effectively improving the detection performance of the monocular 3D object detection algorithm.

Submitted 13 February, 2023; v1 submitted 2 December, 2022; originally announced December 2022.

Comments: This submission has been withdrawn by arXiv administrators due to inappropriate text overlap with external sources

Journal ref: Journal of Machine Learning Research 2023

arXiv:2211.17228 [pdf, other]

doi 10.1609/aaai.v37i8.26101

AIO-P: Expanding Neural Performance Predictors Beyond Image Classification

Authors: Keith G. Mills, Di Niu, Mohammad Salameh, Weichen Qiu, Fred X. Han, Puyuan Liu, Jialin Zhang, Wei Lu, Shangling Jui

Abstract: Evaluating neural network performance is critical to deep neural network design but a costly procedure. Neural predictors provide an efficient solution by treating architectures as samples and learning to estimate their performance on a given task. However, existing predictors are task-dependent, predominantly estimating neural network performance on image classification benchmarks. They are also search-space dependent; each predictor is designed to make predictions for a specific architecture search space with predefined topologies and set of operations. In this paper, we propose a novel All-in-One Predictor (AIO-P), which aims to pretrain neural predictors on architecture examples from multiple, separate computer vision (CV) task domains and multiple architecture spaces, and then transfer to unseen downstream CV tasks or neural architectures. We describe our proposed techniques for general graph representation, efficient predictor pretraining and knowledge infusion techniques, as well as methods to transfer to downstream tasks/spaces. Extensive experimental results show that AIO-P can achieve Mean Absolute Error (MAE) and Spearman's Rank Correlation (SRCC) below 1% and above 0.5, respectively, on a breadth of target downstream CV tasks with or without fine-tuning, outperforming a number of baselines. Moreover, AIO-P can directly transfer to new architectures not seen during training, accurately rank them and serve as an effective performance estimator when paired with an algorithm designed to preserve performance while reducing FLOPs.

Submitted 24 April, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

Comments: AAAI 2023 Oral Presentation; version includes supplementary material; 16 Pages, 4 Figures, 22 Tables

arXiv:2211.13555 [pdf, other]

Error estimates for the scalar auxiliary variable (SAV) scheme to the Cahn-Hilliard equation

Authors: Shu Ma, Weifeng Qiu, Xiaofeng Yang

Abstract: An optimal error estimate that depends only on the polynomial degree of $\varepsilon^{-1}$ is established for the temporal semi-discrete scheme of the Cahn-Hilliard equation, which is based on the scalar auxiliary variable (SAV) formulation. The key to our analysis is to convert the structure of the SAV time-stepping scheme back to a form compatible with the original format of the Cahn-Hilliard equation, which makes it feasible to use spectral estimates to handle the nonlinear term. Based on this transformation of the SAV numerical scheme, the optimal error estimate for the temporal semi-discrete scheme, which depends only on a low polynomial order of $\varepsilon^{-1}$ instead of an exponential order, is derived by using mathematical induction, spectral arguments, and the superconvergence properties of some nonlinear terms. Numerical examples are provided to illustrate the discrete energy decay property and validate our theoretical convergence analysis.

Submitted 7 December, 2022; v1 submitted 24 November, 2022; originally announced November 2022.

Comments: 25 pages, 8 figures

Showing 1–50 of 219 results for author: Qiu, W