
Proof of Lew's conjecture on the spectral gap of simplicial complex

Authors: Xiongfeng Zhan, Xueyi Huang, Huiqiu Lin

Abstract: Let $X$ be a simplicial complex on vertex set $V$ of size $n$. Let $X(k)$ denote the set of all $k$-dimensional simplices of $X$, and $\mathrm{deg}_X(σ)=|\{η\in X(k+1):σ\subseteq η\}|$ denote the degree of $σ\in X$. A missing face in $X$ is a subset $σ$ of $V$ such that $σ\notin X$ but $τ\in X$ for any proper subset $τ$ of $σ$. Let $d$ denote the maximal dimension of a missing face of $X$, and $μ_k(X)$ denote the $k$-th spectral gap of $X$, i.e., the smallest eigenvalue of the reduced $k$-dimensional Laplacian of $X$. In [J. Combin. Theory Ser. A 169 (2020) 105127], Lew established a lower bound for $μ_k(X)$: $$μ_k(X)\geq (d+1)\left(\min_{σ\in X(k)}\mathrm{deg}_X(σ)+k+1\right)-dn\geq (d+1)(k+1)-dn,$$ and further conjectured that if $μ_k(X)=(d+1)(k+1)-dn$ for some $k$, then $X\cong (Δ_d^{(d-1)})^{*(n-k-1)}*Δ_{(d+1)(k+1)-dn-1}$. In this paper, we confirm Lew's conjecture.

Submitted 14 July, 2024; originally announced July 2024.

Comments: 14 pages

MSC Class: 05E45
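
As a sanity check on the bound quoted in the abstract above, the following sketch evaluates both sides for the boundary of a triangle ($n=3$, $d=2$). The helper names (`faces`, `boundary`, `spectral_gap`, `lew_bound`) are illustrative and do not come from the paper. The reduced Laplacian is assembled from unweighted boundary matrices, which is assumed to match the paper's convention.

```python
import itertools
import numpy as np

def faces(facets, k):
    """All k-dimensional faces (sorted vertex tuples) of the complex generated by `facets`."""
    out = set()
    for f in facets:
        out.update(itertools.combinations(sorted(f), k + 1))
    return sorted(out)

def boundary(facets, k):
    """Boundary matrix from k-faces to (k-1)-faces; for k = 0 the target is the empty face."""
    rows, cols = faces(facets, k - 1), faces(facets, k)
    index = {s: i for i, s in enumerate(rows)}
    B = np.zeros((len(rows), len(cols)))
    for j, s in enumerate(cols):
        for i in range(len(s)):
            B[index[s[:i] + s[i + 1:]], j] = (-1) ** i
    return B

def spectral_gap(facets, k):
    """Smallest eigenvalue of the reduced k-dimensional Laplacian."""
    Bk, Bk1 = boundary(facets, k), boundary(facets, k + 1)
    L = Bk.T @ Bk + Bk1 @ Bk1.T
    return float(np.linalg.eigvalsh(L).min())

def lew_bound(facets, k, d, n):
    """Right-hand side (d+1)(min deg + k + 1) - dn of the first inequality."""
    upper = [set(t) for t in faces(facets, k + 1)]
    min_deg = min(sum(set(s) <= t for t in upper) for s in faces(facets, k))
    return (d + 1) * (min_deg + k + 1) - d * n

# Boundary of a triangle: the only missing face is {1, 2, 3}, so d = 2 and n = 3.
triangle_boundary = [(1, 2), (1, 3), (2, 3)]
for k in (0, 1):
    print(k, spectral_gap(triangle_boundary, k), lew_bound(triangle_boundary, k, d=2, n=3))
```

For this small complex the first inequality holds with equality at both $k=0$ and $k=1$, up to floating-point error. At $k=1$ the value also equals $(d+1)(k+1)-dn=0$, which is consistent with the conjectured extremal form reducing to a single copy of $Δ_2^{(1)}$.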

arXiv:2407.08763 [pdf, ps, other]

On distance-regular Cayley graphs over abelian groups of rank $2$

Authors: Xiongfeng Zhan, Xueyi Huang, Lu Lu

Abstract: In 2007, Miklavič and Potočnik proposed the problem of characterizing distance-regular Cayley graphs over specified groups, which can be viewed as a natural extension of the problem of characterizing strongly regular Cayley graphs, or equivalently, regular partial difference sets. In this paper, we consider the Miklavič-Potočnik problem for abelian groups of rank $2$. More specifically, we determine all distance-regular Cayley graphs over the group $\mathbb{Z}_n\oplus \mathbb{Z}_p$, where $p$ is an odd prime. Our proof uses some new tools, such as polynomial addition sets, the Desarguesian affine plane, and the duality of Schur rings over abelian groups.

Submitted 10 July, 2024; originally announced July 2024.

Comments: 33 pages. arXiv admin note: text overlap with arXiv:2308.14368, arXiv:2311.08128

MSC Class: 05E30; 05C25; 05C50

arXiv:2406.18053 [pdf, other]

Bidirectional-Reachable Hierarchical Reinforcement Learning with Mutually Responsive Policies

Authors: Yu Luo, Fuchun Sun, Tianying Ji, Xianyuan Zhan

Abstract: Hierarchical reinforcement learning (HRL) addresses complex long-horizon tasks by skillfully decomposing them into subgoals. Therefore, the effectiveness of HRL is greatly influenced by subgoal reachability. Typical HRL methods only consider subgoal reachability from the unilateral level, where a dominant level enforces compliance to the subordinate level. However, we observe that when the dominant level becomes trapped in local exploration or generates unattainable subgoals, the subordinate level is negatively affected and cannot follow the dominant level's actions. This can potentially make both levels stuck in local optima, ultimately hindering subsequent subgoal reachability. Allowing real-time bilateral information sharing and error correction would be a natural cure for this issue, which motivates us to propose a mutual response mechanism. Based on this, we propose the Bidirectional-reachable Hierarchical Policy Optimization (BrHPO)--a simple yet effective algorithm that also enjoys computation efficiency. Experiment results on a variety of long-horizon tasks showcase that BrHPO outperforms other state-of-the-art HRL baselines, coupled with a significantly higher exploration efficiency and robustness.

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.08899 [pdf, other]

ESND: An Embedding-based Framework for Signed Network Dismantling

Authors: Chenwei Xie, Chuang Liu, Cong Li, Xiu-Xiu Zhan, Xiang Li

Abstract: Network dismantling aims to maximize the disintegration of a network by removing a specific set of nodes or edges and is applied to various tasks in diverse domains, such as cracking down on crime organizations, delaying the propagation of rumors, and blocking the transmission of viruses. Most of the current network dismantling methods are tailored for unsigned networks, which only consider the connection between nodes without evaluating the nature of the relationships, such as friendship/hostility, enhancing/repressing, and trust/distrust. We here propose an embedding-based algorithm, namely ESND, to solve the signed network dismantling problem. The algorithm generally iterates the following four steps, i.e., giant component detection, network embedding, node clustering, and removal node selection. To illustrate the efficacy and stability of ESND, we conduct extensive experiments on six signed network datasets as well as null models, and compare the performance of our method with baselines. Experimental results consistently show that the proposed ESND is superior to the baselines and displays stable performance with the change in the network structure. Additionally, we examine the impact of sign proportions on network robustness via ESND, observing that networks with a high ratio of negative edges are generally easier to dismantle than networks with high positive edges.

Submitted 21 June, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.08756 [pdf, other]

Optimizing Large Model Training through Overlapped Activation Recomputation

Authors: Ping Chen, Wenjie Zhang, Shuibing He, Yingjie Gu, Zhuwei Peng, Kexin Huang, Xuan Zhan, Weijian Chen, Yi Zheng, Zhefeng Wang, Yanlong Yin, Gang Chen

Abstract: Large model training has been using recomputation to alleviate the memory pressure and pipelining to exploit the parallelism of data, tensor, and devices. The existing recomputation approaches may incur up to 40% overhead when training real-world models, e.g., the GPT model with 22B parameters. This is because they are executed on demand in the critical training path. In this paper, we design a new recomputation framework, Lynx, to reduce the overhead by overlapping the recomputation with communication occurring in training pipelines. It consists of an optimal scheduling algorithm (OPT) and a heuristic-based scheduling algorithm (HEU). OPT achieves a global optimum but suffers from a long search time. HEU was designed based on our observation that there are identical structures in large DNN models so that we can apply the same scheduling policy to all identical structures. HEU achieves a local optimum but reduces the search time by 99% compared to OPT. Our comprehensive evaluation using GPT models with 1.3B-20B parameters shows that both OPT and HEU outperform the state-of-the-art recomputation approaches (e.g., Megatron-LM and Checkmake) by 1.02-1.53x. HEU achieves performance similar to OPT with a search time of 0.16s on average.

Submitted 27 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: 13 pages

arXiv:2406.08386 [pdf, other]

Banal Deception Human-AI Ecosystems: A Study of People's Perceptions of LLM-generated Deceptive Behaviour

Authors: Xiao Zhan, Yifan Xu, Noura Abdi, Joe Collenette, Ruba Abu-Salma, Stefan Sarkadi

Abstract: Large language models (LLMs) can provide users with false, inaccurate, or misleading information, and we consider the output of this type of information as what Natale (2021) calls 'banal' deceptive behaviour. Here, we investigate people's perceptions of ChatGPT-generated deceptive behaviour and how this affects people's own behaviour and trust. To do this, we use a mixed-methods approach comprising (i) an online survey with 220 participants and (ii) semi-structured interviews with 12 participants. Our results show that (i) the most common types of deceptive information encountered were over-simplifications and outdated information; (ii) humans' perceptions of trust and 'worthiness' of talking to ChatGPT are impacted by 'banal' deceptive behaviour; (iii) the perceived responsibility for deception is influenced by education level and the frequency of deceptive information; and (iv) users become more cautious after encountering deceptive information, but they come to trust the technology more when they identify advantages of using it. Our findings contribute to the understanding of human-AI interaction dynamics in the context of Deceptive AI Ecosystems, and highlight the importance of user-centric approaches to mitigating the potential harms of deceptive AI technologies.

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2405.19783 [pdf, other]

Instruction-Guided Visual Masking

Authors: Jinliang Zheng, Jianxiong Li, Sijie Cheng, Yinan Zheng, Jiaming Li, Jihao Liu, Yu Liu, Jingjing Liu, Xianyuan Zhan

Abstract: Instruction following is crucial in contemporary LLMs. However, when extended to a multimodal setting, it often suffers from misalignment between a specific textual instruction and the targeted local region of an image. To achieve more accurate and nuanced multimodal instruction following, we introduce Instruction-guided Visual Masking (IVM), a new versatile visual grounding model that is compatible with diverse multimodal models, such as LMMs and robot models. By constructing visual masks for instruction-irrelevant regions, IVM-enhanced multimodal models can effectively focus on task-relevant image regions to better align with complex instructions. Specifically, we design a visual masking data generation pipeline and create an IVM-Mix-1M dataset with 1 million image-instruction pairs. We further introduce a new learning technique, Discriminator Weighted Supervised Learning (DWSL), for preferential IVM training that prioritizes high-quality data samples. Experimental results on generic multimodal tasks such as VQA and embodied robotic control demonstrate the versatility of IVM, which, as a plug-and-play tool, significantly boosts the performance of diverse multimodal models, yielding new state-of-the-art results across challenging multimodal benchmarks. Code is available at https://github.com/2toinf/IVM.

Submitted 30 May, 2024; originally announced May 2024.

Comments: preprint, 21 pages

arXiv:2405.19283 [pdf, other]

Programmable Motion Generation for Open-Set Motion Control Tasks

Authors: Hanchao Liu, Xiaohang Zhan, Shaoli Huang, Tai-Jiang Mu, Ying Shan

Abstract: Character animation in real-world scenarios necessitates a variety of constraints, such as trajectories, key-frames, interactions, etc. Existing methodologies typically treat a single constraint or a finite set of these constraints as separate control tasks. They are often specialized, and the tasks they address are rarely extendable or customizable. We categorize these as solutions to the closed-set motion control problem. In response to the complexity of practical motion control, we propose and attempt to solve the open-set motion control problem. This problem is characterized by an open and fully customizable set of motion control tasks. To address this, we introduce a new paradigm, programmable motion generation. In this paradigm, any given motion control task is broken down into a combination of atomic constraints. These constraints are then programmed into an error function that quantifies the degree to which a motion sequence adheres to them. We utilize a pre-trained motion generation model and optimize its latent code to minimize the error function of the generated motion. Consequently, the generated motion not only inherits the prior of the generative model but also satisfies the required constraints. Experiments show that we can generate high-quality motions when addressing a wide range of unseen tasks. These tasks encompass motion control by motion dynamics, geometric constraints, physical laws, interactions with scenes, objects or the character's own body parts, etc. All of these are achieved in a unified approach, without the need for ad-hoc paired training data collection or specialized network designs. During the programming of novel tasks, we observed the emergence of new skills beyond those of the prior model. With the assistance of large language models, we also achieved automatic programming. We hope that this work will pave the way for the motion control of general AI agents.

Submitted 29 May, 2024; originally announced May 2024.

Comments: Accepted by CVPR 2024
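
The paradigm described in the abstract above (atomic constraints combined into an error function, then minimized over a latent code) can be illustrated with a toy sketch. Everything here is an assumption made for illustration. A random linear map `W` stands in for the pre-trained motion model, the two constraints pin the first and last frames of a 2D root trajectory, and plain gradient descent replaces whatever optimizer the authors use. Unlike the real method, this toy generator carries no learned motion prior.

```python
import numpy as np

rng = np.random.default_rng(0)
T, LATENT_DIM = 20, 8                      # frames and latent size (toy values)
W = rng.normal(size=(2 * T, LATENT_DIM))   # toy linear "generator" weights (hypothetical)

def generate(z):
    """Stand-in for a pre-trained motion model: latent code -> (T, 2) trajectory."""
    return (W @ z).reshape(T, 2)

# Atomic constraints as (frame index, target 2D position) pairs.
constraints = [(0, np.array([0.0, 0.0])), (T - 1, np.array([1.0, 2.0]))]

def error(z):
    """Sum of squared distances between the constrained frames and their targets."""
    motion = generate(z)
    return sum(float(np.sum((motion[f] - p) ** 2)) for f, p in constraints)

def error_grad(z):
    """Analytic gradient of error() with respect to z for the linear generator."""
    motion = generate(z)
    g = np.zeros(LATENT_DIM)
    for f, p in constraints:
        rows = W[2 * f: 2 * f + 2]         # the two rows of W that produce frame f
        g += 2.0 * rows.T @ (motion[f] - p)
    return g

def optimize_latent(z0, steps=5000):
    """Gradient descent on the latent code with a step size of 1 / Lipschitz constant."""
    A = np.vstack([W[2 * f: 2 * f + 2] for f, _ in constraints])
    lr = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)
    z = z0.copy()
    for _ in range(steps):
        z -= lr * error_grad(z)
    return z

z0 = rng.normal(size=LATENT_DIM)
z_star = optimize_latent(z0)
print(error(z0), error(z_star))
```

Adding a new control task in this sketch only means appending another entry to `constraints`, or another term to `error`. The generator itself is never retrained.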

arXiv:2405.19080 [pdf, other]

OMPO: A Unified Framework for RL under Policy and Dynamics Shifts

Authors: Yu Luo, Tianying Ji, Fuchun Sun, Jianwei Zhang, Huazhe Xu, Xianyuan Zhan

Abstract: Training reinforcement learning policies using environment interaction data collected from varying policies or dynamics presents a fundamental challenge. Existing works often overlook the distribution discrepancies induced by policy or dynamics shifts, or rely on specialized algorithms with task priors, thus often resulting in suboptimal policy performances and high learning variances. In this paper, we identify a unified strategy for online RL policy learning under diverse settings of policy and dynamics shifts: transition occupancy matching. In light of this, we introduce a surrogate policy learning objective by considering the transition occupancy discrepancies and then cast it into a tractable min-max optimization problem through dual reformulation. Our method, dubbed Occupancy-Matching Policy Optimization (OMPO), features a specialized actor-critic structure equipped with a distribution discriminator and a small-size local buffer. We conduct extensive experiments based on the OpenAI Gym, Meta-World, and Panda Robots environments, encompassing policy shifts under stationary and nonstationary dynamics, as well as domain adaptation. The results demonstrate that OMPO outperforms the specialized baselines from different categories in all settings. We also find that OMPO exhibits particularly strong performance when combined with domain randomization, highlighting its potential in RL-based robotics applications.

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.18520 [pdf, other]

Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL

Authors: Yu Luo, Tianying Ji, Fuchun Sun, Jianwei Zhang, Huazhe Xu, Xianyuan Zhan

Abstract: Off-policy reinforcement learning (RL) has achieved notable success in tackling many complex real-world tasks, by leveraging previously collected data for policy learning. However, most existing off-policy RL algorithms fail to maximally exploit the information in the replay buffer, limiting sample efficiency and policy performance. In this work, we discover that concurrently training an offline RL policy based on the shared online replay buffer can sometimes outperform the original online learning policy, though the occurrence of such performance gains remains uncertain. This motivates a new possibility of harnessing the emergent outperforming offline optimal policy to improve online policy learning. Based on this insight, we present Offline-Boosted Actor-Critic (OBAC), a model-free online RL framework that elegantly identifies the outperforming offline policy through value comparison, and uses it as an adaptive constraint to guarantee stronger policy learning performance. Our experiments demonstrate that OBAC outperforms other popular model-free RL baselines and rivals advanced model-based RL methods in terms of sample efficiency and asymptotic performance across 53 tasks spanning 6 task suites.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.09819 [pdf]

Automating the Training and Deployment of Models in MLOps by Integrating Systems with Machine Learning

Authors: Penghao Liang, Bo Song, Xiaoan Zhan, Zhou Chen, Jiaqiang Yuan

Abstract: This article introduces the importance of machine learning in real-world applications and explores the rise of MLOps (Machine Learning Operations) and its importance for solving challenges such as model deployment and performance monitoring. By reviewing the evolution of MLOps and its relationship to traditional software development methods, the paper proposes ways to integrate the system into machine learning to solve the problems faced by existing MLOps and improve productivity. This paper focuses on the importance of automated model training and on methods to ensure the transparency and repeatability of the training process through a version control system. In addition, the challenges of integrating machine learning components into traditional CI/CD pipelines are discussed, and solutions such as versioning environments and containerization are proposed. Finally, the paper emphasizes the importance of continuous monitoring and feedback loops after model deployment to maintain model performance and reliability. Using case studies and best practices from Netflix, the article presents key strategies and lessons learned for successful implementation of MLOps practices, providing valuable references for other organizations to build and optimize their own MLOps practices.

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2405.07479 [pdf, other]

Enhancing 3D Object Detection by Using Neural Network with Self-adaptive Thresholding

Authors: Houze Liu, Chongqing Wang, Xiaoan Zhan, Haotian Zheng, Chang Che

Abstract: Robust 3D object detection remains a pivotal concern in the domain of autonomous field robotics. Despite notable enhancements in detection accuracy across standard datasets, real-world urban environments, characterized by their unstructured and dynamic nature, frequently precipitate an elevated incidence of false positives, thereby undermining the reliability of existing detection paradigms. In this context, our study introduces an advanced post-processing algorithm that modulates detection thresholds dynamically relative to the distance from the ego object. Traditional perception systems typically utilize a uniform threshold, which often leads to decreased efficacy in detecting distant objects. In contrast, our proposed methodology employs a Neural Network with a self-adaptive thresholding mechanism that significantly attenuates false negatives while concurrently diminishing false positives, particularly in complex urban settings. Empirical results substantiate that our algorithm not only augments the performance of 3D object detection models in diverse urban and adverse weather scenarios but also establishes a new benchmark for adaptive thresholding techniques in field robotics.

Submitted 13 May, 2024; originally announced May 2024.

Comments: This paper has been accepted by the CONF-SEML 2024
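
To make the contrast with a uniform threshold concrete, here is a minimal sketch of distance-dependent score filtering. It is not the paper's method: the abstract describes a neural network that sets the threshold, whereas this sketch substitutes a hand-written linear schedule. The `Detection` fields, the function names, and all numeric values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str
    score: float       # detector confidence in [0, 1]
    distance_m: float  # range from the ego vehicle, in metres

def adaptive_threshold(distance_m, near=0.6, far=0.3, max_range_m=80.0):
    """Hypothetical schedule: interpolate linearly from `near` to `far` over max_range_m."""
    frac = min(max(distance_m / max_range_m, 0.0), 1.0)
    return near + (far - near) * frac

def filter_detections(detections, threshold_fn=adaptive_threshold):
    """Keep a detection only if its score clears the threshold for its own distance."""
    return [d for d in detections if d.score >= threshold_fn(d.distance_m)]

detections = [
    Detection("car", score=0.50, distance_m=5.0),         # weak score at close range
    Detection("pedestrian", score=0.40, distance_m=70.0), # modest score far away
]
kept_adaptive = filter_detections(detections)
kept_uniform = filter_detections(detections, threshold_fn=lambda _d: 0.5)
print([d.label for d in kept_adaptive], [d.label for d in kept_uniform])
```

With these made-up numbers, the adaptive schedule rejects the weak nearby detection and keeps the distant one, while the uniform threshold does the opposite.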

arXiv:2405.05565 [pdf, other]

doi 10.1109/TGRS.2024.3406711

Array SAR 3D Sparse Imaging Based on Regularization by Denoising Under Few Observed Data

Authors: Yangyang Wang, Xu Zhan, Jing Gao, Jinjie Yao, Shunjun Wei, JianSheng Bai

Abstract: Array synthetic aperture radar (SAR) three-dimensional (3D) imaging can obtain 3D information of the target region, which is widely used in environmental monitoring and scattering information measurement. In recent years, with the development of compressed sensing (CS) theory, sparse signal processing is used in array SAR 3D imaging. Compared with matched filter (MF), sparse SAR imaging can effectively improve image quality. However, sparse imaging based on handcrafted regularization functions suffers from target information loss when few SAR data are observed. Therefore, in this article, a general 3D sparse imaging framework based on Regularization by Denoising (RED) and proximal gradient descent type methods for array SAR is presented. First, we construct explicit prior terms via state-of-the-art denoising operators instead of regularization functions, which can improve the accuracy of sparse reconstruction and preserve the structure information of the target. Then, different proximal gradient descent type methods are presented, including a generalized alternating projection (GAP) and an alternating direction method of multipliers (ADMM), which are suitable for high-dimensional data processing. Additionally, the proposed method has robust convergence and can achieve sparse reconstruction of 3D SAR from few observed SAR data. Extensive simulations and real data experiments are conducted to analyze the performance of the proposed method. The experimental results show that the proposed method has superior sparse reconstruction performance.

Submitted 26 May, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

arXiv:2404.10178 [pdf, other]

CryoMAE: Few-Shot Cryo-EM Particle Picking with Masked Autoencoders

Authors: Chentianye Xu, Xueying Zhan, Min Xu

Abstract: Cryo-electron microscopy (cryo-EM) emerges as a pivotal technology for determining the architecture of cells, viruses, and protein assemblies at near-atomic resolution. Traditional particle picking, a key step in cryo-EM, struggles with manual effort and automated methods' sensitivity to low signal-to-noise ratio (SNR) and varied particle orientations. Furthermore, existing neural network (NN)-based approaches often require extensive labeled datasets, limiting their practicality. To overcome these obstacles, we introduce cryoMAE, a novel approach based on few-shot learning that harnesses the capabilities of Masked Autoencoders (MAE) to enable efficient selection of single particles in cryo-EM images. Contrary to conventional NN-based techniques, cryoMAE requires only a minimal set of positive particle images for training yet demonstrates high performance in particle detection. Furthermore, the implementation of a self-cross similarity loss ensures distinct features for particle and background regions, thereby enhancing the discrimination capability of cryoMAE. Experiments on large-scale cryo-EM datasets show that cryoMAE outperforms existing state-of-the-art (SOTA) methods, improving 3D reconstruction resolution by up to 22.4%.

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.01046 [pdf, ps, other]

Locating influential nodes in hypergraphs via fuzzy collective influence

Authors: Su-Su Zhang, Xiaoyan Yu, Gui-Quan Sun, Chuang Liu, Xiu-Xiu Zhan

Abstract: Complex contagion phenomena, such as the spread of information or contagious diseases, often occur among the population due to higher-order interactions between individuals. Individuals who can be represented by nodes in a network may play different roles in the spreading process, and thus finding the most influential nodes in a network has become a crucial topic in network science for applications such as viral marketing, rumor suppression, and disease control. To solve the problem of identifying nodes that have high influence in a complex system, we propose higher-order distance-based fuzzy centrality methods (HDF and EHDF) that are customized for a hypergraph, which can characterize higher-order interactions between nodes via hyperedges. The proposed methods assume that the influence of a node is reliant on the neighboring nodes with a certain higher-order distance. We compare the proposed methods with the baseline centrality methods to verify their effectiveness. Experimental results on six empirical hypergraphs show that the proposed methods could better identify influential nodes, especially showing plausible performance in finding the top influential nodes. Our proposed theoretical framework for identifying influential nodes could provide insights into how higher-order topological structure can be used for tasks such as vital node identification, influence maximization, and network dismantling.

Submitted 1 April, 2024; originally announced April 2024.

arXiv:2403.19417 [pdf, other]

OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion

Authors: Xinyu Zhan, Lixin Yang, Yifei Zhao, Kangrui Mao, Hanlin Xu, Zenan Lin, Kailin Li, Cewu Lu

Abstract: We present OAKINK2, a dataset of bimanual object manipulation tasks for complex daily activities. In pursuit of constructing the complex tasks into a structured representation, OAKINK2 introduces three levels of abstraction to organize the manipulation tasks: Affordance, Primitive Task, and Complex Task. OAKINK2 features an object-centric perspective for decoding the complex tasks, treating them as a sequence of object affordance fulfillment. The first level, Affordance, outlines the functionalities that objects in the scene can afford; the second level, Primitive Task, describes the minimal interaction units that humans interact with the object to achieve its affordance; and the third level, Complex Task, illustrates how Primitive Tasks are composed and interdependent. The OAKINK2 dataset provides multi-view image streams and precise pose annotations for the human body, hands and various interacting objects. This extensive collection supports applications such as interaction reconstruction and motion synthesis. Based on the 3-level abstraction of OAKINK2, we explore a task-oriented framework for Complex Task Completion (CTC). CTC aims to generate a sequence of bimanual manipulations to achieve task objectives. Within the CTC framework, we employ Large Language Models (LLMs) to decompose the complex task objectives into sequences of Primitive Tasks and have developed a Motion Fulfillment Model that generates bimanual hand motion for each Primitive Task. OAKINK2 datasets and models are available at https://oakink.net/v2.

Submitted 28 March, 2024; originally announced March 2024.

Comments: To appear in CVPR 2024. 26 pages

arXiv:2403.12847 [pdf, other]

Policy Bifurcation in Safe Reinforcement Learning

Authors: Wenjun Zou, Yao Lyu, Jie Li, Yujie Yang, Shengbo Eben Li, Jingliang Duan, Xianyuan Zhan, Jingjing Liu, Yaqin Zhang, Keqiang Li

Abstract: Safe reinforcement learning (RL) offers advanced solutions to constrained optimal control problems. Existing studies in safe RL implicitly assume continuity in policy functions, where policies map states to actions in a smooth, uninterrupted manner; however, our research finds that in some scenarios the feasible policy should be discontinuous or multi-valued, and interpolating between discontinuous local optima inevitably leads to constraint violations. We are the first to identify the generating mechanism of such a phenomenon, and employ topological analysis to rigorously prove the existence of policy bifurcation in safe RL, which corresponds to the contractibility of the reachable tuple. Our theorem reveals that in scenarios where the obstacle-free state space is non-simply connected, a feasible policy is required to be bifurcated, meaning its output action needs to change abruptly in response to the varying state. To train such a bifurcated policy, we propose a safe RL algorithm called multimodal policy optimization (MUPO), which utilizes a Gaussian mixture distribution as the policy output. The bifurcated behavior can be achieved by selecting the Gaussian component with the highest mixing coefficient. In addition, MUPO integrates spectral normalization and forward KL divergence to enhance the policy's capability of exploring different modes. Experiments with vehicle control tasks show that our algorithm successfully learns the bifurcated policy and ensures satisfactory safety, while a continuous policy suffers from inevitable constraint violations.

Submitted 28 March, 2024; v1 submitted 19 March, 2024; originally announced March 2024.
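
The mode-selection step mentioned in the abstract above (picking the Gaussian component with the highest mixing coefficient) can be illustrated with a minimal sketch. The function name and the list-based representation of the mixture are assumptions made for illustration only; they are not taken from the MUPO paper or its code.

```python
def select_dominant_mode(mixing_coeffs, component_means):
    """Return the mean of the mixture component with the largest mixing coefficient.

    mixing_coeffs   -- hypothetical list of non-negative weights, one per component
    component_means -- hypothetical list of action means, one per component
    """
    if len(mixing_coeffs) != len(component_means):
        raise ValueError("one mixing coefficient is needed per component")
    # Index of the component with the highest weight (first one wins on ties).
    best = max(range(len(mixing_coeffs)), key=lambda i: mixing_coeffs[i])
    return component_means[best]


# Two modes, e.g. "steer left" and "steer right" around an obstacle.
action = select_dominant_mode([0.3, 0.7], [[-1.0], [1.0]])
```

Because the output is the mean of a single component rather than a weighted average of the means, the action can jump from one mode to the other as the weights change, which is the kind of abrupt change the abstract calls bifurcation.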

arXiv:2403.09326 [pdf, other]

HeadEvolver: Text to Head Avatars via Expressive and Attribute-Preserving Mesh Deformation

Authors: Duotun Wang, Hengyu Meng, Zeyu Cai, Zhijing Shao, Qianxi Liu, Lin Wang, Mingming Fan, Xiaohang Zhan, Zeyu Wang

Abstract: We present HeadEvolver, a novel framework to generate stylized head avatars from text guidance. HeadEvolver uses locally learnable mesh deformation from a template head mesh, producing high-quality digital assets for detail-preserving editing and animation. To address the lack of fine-grained and semantic-aware local shape control in global deformation through Jacobians, we introduce a trainable parameter as a weighting factor for the Jacobian at each triangle to adaptively change local shapes while maintaining global correspondences and facial features. Moreover, to ensure the coherence of the resulting shape and appearance from different viewpoints, we use pretrained image diffusion models for differentiable rendering with regularization terms to refine the deformation under text guidance. Extensive experiments demonstrate that our method can generate diverse head avatars with an articulated mesh that can be edited seamlessly in 3D graphics software, facilitating downstream applications such as more efficient animation with inherited blend shapes and semantic consistency.

Submitted 10 June, 2024; v1 submitted 14 March, 2024; originally announced March 2024.

Comments: 12 pages, 17 figures

ACM Class: I.2.6; I.3.8

arXiv:2403.05159 [pdf, other]

LVIC: Multi-modality segmentation by Lifting Visual Info as Cue

Authors: Zichao Dong, Bowen Pang, Xufeng Huang, Hang Ji, Xin Zhan, Junbo Chen

Abstract: Multi-modality fusion has proven an effective method for 3D perception in autonomous driving. However, most current multi-modality fusion pipelines for LiDAR semantic segmentation have complicated fusion mechanisms. Point painting is a straightforward method that directly binds LiDAR points with visual information. Unfortunately, previous point-painting-like methods suffer from projection error between camera and LiDAR. In our experiments, we find that this projection error is the devil in point painting. We therefore propose a depth-aware point painting mechanism, which significantly boosts the multi-modality fusion. In addition, we take a deeper look at the visual features LiDAR needs to perform semantic segmentation. By Lifting Visual Information as Cue, LVIC ranks 1st on the nuScenes LiDAR semantic segmentation benchmark. Our experiments show its robustness and effectiveness. Code will be made publicly available soon.

Submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.02561 [pdf, other]

Semantic Human Mesh Reconstruction with Textures

Authors: Xiaoyu Zhan, Jianxin Yang, Yuanqi Li, Jie Guo, Yanwen Guo, Wenping Wang

Abstract: The field of 3D detailed human mesh reconstruction has made significant progress in recent years. However, current methods still face challenges when used in industrial applications due to unstable results, low-quality meshes, and a lack of UV unwrapping and skinning weights. In this paper, we present SHERT, a novel pipeline that can reconstruct semantic human meshes with textures and high-precision details. SHERT applies semantic- and normal-based sampling between the detailed surface (e.g. mesh and SDF) and the corresponding SMPL-X model to obtain a partially sampled semantic mesh and then generates the complete semantic mesh by our specifically designed self-supervised completion and refinement networks. Using the complete semantic mesh as a basis, we employ a texture diffusion model to create human textures that are driven by both images and texts. Our reconstructed meshes have stable UV unwrapping, high-quality triangle meshes, and consistent semantic information. The given SMPL-X model provides semantic information and shape priors, allowing SHERT to perform well even with incorrect and incomplete inputs. The semantic information also makes it easy to substitute and animate different body parts such as the face, body, and hands. Quantitative and qualitative experiments demonstrate that SHERT is capable of producing high-fidelity and robust semantic meshes that outperform state-of-the-art methods.

Submitted 3 April, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Comments: Accepted to CVPR 2024. Project page: https://zhanxy.xyz/projects/shert/

arXiv:2402.18137 [pdf, other]

DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning

Authors: Jianxiong Li, Jinliang Zheng, Yinan Zheng, Liyuan Mao, Xiao Hu, Sijie Cheng, Haoyi Niu, Jihao Liu, Yu Liu, Jingjing Liu, Ya-Qin Zhang, Xianyuan Zhan

Abstract: Multimodal pretraining is an effective strategy for the trinity of goals of representation learning in autonomous robots: 1) extracting both local and global task progressions; 2) enforcing temporal consistency of visual representation; 3) capturing trajectory-level language grounding. Most existing methods approach these via separate objectives, which often reach sub-optimal solutions. In this paper, we propose a universal unified objective that can simultaneously extract meaningful task progression information from image sequences and seamlessly align them with language instructions. We discover that via implicit preferences, where a visual trajectory inherently aligns better with its corresponding language instruction than mismatched pairs, the popular Bradley-Terry model can transform into representation learning through proper reward reparameterizations. The resulting framework, DecisionNCE, mirrors an InfoNCE-style objective but is distinctively tailored for decision-making tasks, providing an embodied representation learning framework that elegantly extracts both local and global task progression features, with temporal consistency enforced through implicit time contrastive learning, while ensuring trajectory-level instruction grounding via multimodal joint encoding. Evaluation on both simulated and real robots demonstrates that DecisionNCE effectively facilitates diverse downstream policy learning tasks, offering a versatile solution for unified representation and reward learning. Project Page: https://2toinf.github.io/DecisionNCE/

Submitted 23 May, 2024; v1 submitted 28 February, 2024; originally announced February 2024.

Comments: ICML 2024

arXiv:2402.15580 [pdf, other]

CharacterMixer: Rig-Aware Interpolation of 3D Characters

Authors: Xiao Zhan, Rao Fu, Daniel Ritchie

Abstract: We present CharacterMixer, a system for blending two rigged 3D characters with different mesh and skeleton topologies while maintaining a rig throughout interpolation. CharacterMixer also enables interpolation during motion for such characters, a novel feature. Interpolation is an important shape editing operation, but prior methods have limitations when applied to rigged characters: they either ignore the rig (making interpolated characters no longer posable) or use a fixed rig and mesh topology. To handle different mesh topologies, CharacterMixer uses a signed distance field (SDF) representation of character shapes, with one SDF per bone. To handle different skeleton topologies, it computes a hierarchical correspondence between source and target character skeletons and interpolates the SDFs of corresponding bones. This correspondence also allows the creation of a single "unified skeleton" for posing and animating interpolated characters. We show that CharacterMixer produces qualitatively better interpolation results than two state-of-the-art methods while preserving a rig throughout interpolation.

Submitted 23 February, 2024; originally announced February 2024.

arXiv:2402.04580 [pdf, other]

A Comprehensive Survey of Cross-Domain Policy Transfer for Embodied Agents

Authors: Haoyi Niu, Jianming Hu, Guyue Zhou, Xianyuan Zhan

Abstract: The burgeoning fields of robot learning and embodied AI have triggered an increasing demand for large quantities of data. However, collecting sufficient unbiased data from the target domain remains a challenge due to costly data collection processes and stringent safety requirements. Consequently, researchers often resort to data from easily accessible source domains, such as simulation and laboratory environments, for cost-effective data acquisition and rapid model iteration. Nevertheless, the environments and embodiments of these source domains can be quite different from their target domain counterparts, underscoring the need for effective cross-domain policy transfer approaches. In this paper, we conduct a systematic review of existing cross-domain policy transfer methods. Through a nuanced categorization of domain gaps, we encapsulate the overarching insights and design considerations of each problem setting. We also provide a high-level discussion about the key methodologies used in cross-domain policy transfer problems. Lastly, we summarize the open challenges that lie beyond the capabilities of current paradigms and discuss potential future directions in this field.

Submitted 6 February, 2024; originally announced February 2024.

arXiv:2402.04572 [pdf, ps, other]

A conjecture implying Thomassen's chord conjecture in graph theory

Authors: Xingzhi Zhan

Abstract: Thomassen's chord conjecture from 1976 states that every longest cycle in a $3$-connected graph has a chord. This is one of the most important unsolved problems in graph theory. We pose a new conjecture which implies Thomassen's conjecture. It involves bound vertices in a longest path between two vertices in a $k$-connected graph. We also give supporting evidence and analyze a special case. The purpose of making this new conjecture is to explore the surroundings of Thomassen's conjecture.

Submitted 6 February, 2024; originally announced February 2024.

MSC Class: 05C38; 05C40; 05C35

arXiv:2402.00348 [pdf, other]

ODICE: Revealing the Mystery of Distribution Correction Estimation via Orthogonal-gradient Update

Authors: Liyuan Mao, Haoran Xu, Weinan Zhang, Xianyuan Zhan

Abstract: In this study, we investigate the DIstribution Correction Estimation (DICE) methods, an important line of work in offline reinforcement learning (RL) and imitation learning (IL). DICE-based methods impose state-action-level behavior constraint, which is an ideal choice for offline learning. However, they typically perform much worse than current state-of-the-art (SOTA) methods that solely use action-level behavior constraint. After revisiting DICE-based methods, we find there exist two gradient terms when learning the value function using true-gradient update: forward gradient (taken on the current state) and backward gradient (taken on the next state). Using forward gradient bears a large similarity to many offline RL methods, and thus can be regarded as applying action-level constraint. However, directly adding the backward gradient may degenerate or cancel out its effect if these two gradients have conflicting directions. To resolve this issue, we propose a simple yet effective modification that projects the backward gradient onto the normal plane of the forward gradient, resulting in an orthogonal-gradient update, a new learning rule for DICE-based methods. We conduct thorough theoretical analyses and find that the projected backward gradient brings state-level behavior regularization, which reveals the mystery of DICE-based methods: the value learning objective does try to impose state-action-level constraint, but needs to be used in a corrected way. Through toy examples and extensive experiments on complex offline RL and IL tasks, we demonstrate that DICE-based methods using orthogonal-gradient updates (O-DICE) achieve SOTA performance and great robustness.

Submitted 1 February, 2024; originally announced February 2024.

Comments: Spotlight @ ICLR 2024, first two authors contribute equally
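
The projection described in the abstract above (removing from the backward gradient its component along the forward gradient, so that what remains lies in the forward gradient's normal plane) can be sketched in plain Python. This is an illustration of the vector projection only, not the authors' implementation; the function names and the weighting factor `eta` are assumptions introduced here.

```python
def project_onto_normal_plane(backward, forward, eps=1e-12):
    """Remove from `backward` its component parallel to `forward`."""
    dot_bf = sum(b * f for b, f in zip(backward, forward))
    dot_ff = sum(f * f for f in forward)
    if dot_ff < eps:
        # A (near-)zero forward gradient defines no direction; nothing to remove.
        return list(backward)
    scale = dot_bf / dot_ff
    return [b - scale * f for b, f in zip(backward, forward)]


def orthogonal_gradient(forward, backward, eta=1.0):
    """Forward gradient plus eta times the projected backward gradient.

    `eta` is a hypothetical weight on the projected term.
    """
    projected = project_onto_normal_plane(backward, forward)
    return [f + eta * p for f, p in zip(forward, projected)]


# The backward gradient partly opposes the forward one (negative first entry).
forward_grad = [1.0, 0.0]
backward_grad = [-2.0, 3.0]
update = orthogonal_gradient(forward_grad, backward_grad)
# The conflicting component is gone: update == [1.0, 3.0]
```

After projection, the inner product of the two terms is zero, so the backward term can no longer cancel the forward gradient, which is the conflict the abstract describes.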

arXiv:2401.14328 [pdf, ps, other]

Tripartite entanglement and tripartite steering in three-qubit pure states induced by vacuum-one-photon superpositions

Authors: Jian Wang, Huan Liu, Xue-feng Zhan, Xue-xiang Xu

Abstract: Utilizing a tritter with variable parameter $T$ and induced by vacuum-one-photon superpositions $\left\vert 0\right\rangle +α\left\vert 1\right\rangle $ with $α=\left\vert α\right\vert e^{iφ}$, we generate a class of three-qubit pure states. These states take the form of $\left\vert ψ\right\rangle _{123}=c_{0}\left\vert 000\right\rangle +c_{1}\left\vert 100\right\rangle +c_{2}\left\vert 010\right\rangle +c_{3}\left\vert 001\right\rangle $. The coefficients ($c_{0}$, $c_{1}$, $c_{2}$, and $c_{3}$) can be manipulated through interaction parameters ($\left\vert α\right\vert $, $φ$, and $T$). In line with Xie and Eberly's work [Phys. Rev. Lett. 127, 040403 (2021)], we investigate the genuine tripartite entanglement for $\left\vert ψ\right\rangle _{123}$ using the concurrence triangle measure. Drawing on Hao et al.'s research [Phys. Rev. Lett. 128, 120402 (2021)], we examine tripartite steering for $\left\vert ψ\right\rangle _{123}$ under certain measurements based on the uncertainty relations criterion. We identify nine potential configurations exhibiting varying steerability across different parameter spaces. It is important to highlight that while the state $\left\vert ψ\right\rangle _{123}$ exhibits entanglement, steering remains unattainable in a substantial portion of the parameter space.

Submitted 25 January, 2024; originally announced January 2024.

Comments: 10 pages, 8 figures, comments are welcome

arXiv:2401.10700 [pdf, other]

Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion Model

Authors: Yinan Zheng, Jianxiong Li, Dongjie Yu, Yujie Yang, Shengbo Eben Li, Xianyuan Zhan, Jingjing Liu

Abstract: Safe offline RL is a promising way to bypass risky online interactions towards safe policy learning. Most existing methods only enforce soft constraints, i.e., constraining safety violations in expectation below predetermined thresholds. This can lead to potentially unsafe outcomes, which is unacceptable in safety-critical scenarios. An alternative is to enforce the hard constraint of zero violation. However, this can be challenging in the offline setting, as it needs to strike the right balance among three highly intricate and correlated aspects: safety constraint satisfaction, reward maximization, and behavior regularization imposed by offline datasets. Interestingly, we discover that via reachability analysis of safe-control theory, the hard safety constraint can be equivalently translated to identifying the largest feasible region given the offline dataset. This seamlessly converts the original trilogy problem to a feasibility-dependent objective, i.e., maximizing reward value within the feasible region while minimizing safety risks in the infeasible region. Inspired by these, we propose FISOR (FeasIbility-guided Safe Offline RL), which allows safety constraint adherence, reward maximization, and offline policy learning to be realized via three decoupled processes, while offering strong safety performance and stability. In FISOR, the optimal policy for the translated optimization problem can be derived in a special form of weighted behavior cloning. Thus, we propose a novel energy-guided diffusion model that does not require training a complicated time-dependent classifier to extract the policy, greatly simplifying the training. We compare FISOR against baselines on the DSRL benchmark for safe offline RL. Evaluation results show that FISOR is the only method that can guarantee safety satisfaction in all tasks, while achieving top returns in most tasks.

Submitted 19 January, 2024; originally announced January 2024.

Comments: ICLR 2024, 30 pages, 11 figures

arXiv:2401.08120 [pdf]

Operation Scheme Optimizations to Achieve Ultra-high Endurance ($10^{10}$) in Flash Memory with Robust Reliabilities

Authors: Yang Feng, Zhaohui Sun, Chengcheng Wang, Xinyi Guo, Junyao Mei, Yueran Qi, Jing Liu, Junyu Zhang, Jixuan Wu, Xuepeng Zhan, Jiezhi Chen

Abstract: Flash memory has been widely adopted as stand-alone memory and embedded memory due to its robust reliability. However, its limited endurance hinders further application in storage class memory (SCM) and in endurance-demanding computing-in-memory (CIM) tasks. In this work, optimization strategies are studied to tackle this concern. It is shown that by adopting channel hot electron injection (CHEI) and hot hole injection (HHI) to implement program/erase (PE) cycling, together with a balanced memory window (MW) at the high-Vth (HV) mode, the endurance can be greatly extended to $10^{10}$ PE cycles, which is a record-high value in flash memory. Moreover, by using the proposed electric-field-assisted relaxation (EAR) scheme, the degradation of flash cells can be well suppressed, with better subthreshold swings (SS) and lower leakage currents (sub-10 pA after $10^{10}$ PE cycles). Our results shed light on the optimization strategy for flash memory to serve as SCM and to implement endurance-demanding CIM tasks.

Submitted 16 January, 2024; originally announced January 2024.

arXiv:2401.06445 [pdf, other]

Directed network comparison using motifs

Authors: Chenwei Xie, Qiao Ke, Haoyu Chen, Chuang Liu, Xiu-Xiu Zhan

Abstract: Analyzing and characterizing the differences between networks is a fundamental and challenging problem in network science. Previously, most network comparison methods that rely on topological properties have been restricted to measuring differences between two undirected networks. However, many networks, such as biological networks, social networks, and transportation networks, exhibit inherent directionality and higher-order attributes that should not be ignored when comparing networks. Therefore, we propose a motif-based directed network comparison method that captures local, global, and higher-order differences between two directed networks. Specifically, we first construct a motif distribution vector for each node, which captures the information of a node's involvement in different directed motifs. Then, the dissimilarity between two directed networks is defined on the basis of a matrix which is composed of the motif distribution vector of every node and Jensen-Shannon divergence. The performance of our method is evaluated via the comparison of six real directed networks with their null models as well as their perturbed networks based on edge perturbation. Our method is superior to the state-of-the-art baselines and is robust with different parameter settings.

Submitted 12 January, 2024; originally announced January 2024.
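
The abstract above builds its dissimilarity on the Jensen-Shannon divergence between motif distribution vectors. As a rough illustration of that divergence alone (not of the paper's full network dissimilarity, whose matrix construction the abstract does not spell out), here is a minimal sketch. The example vectors are made up, and each is assumed to be already normalized to sum to 1.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jensen_shannon_divergence(p, q):
    """Symmetric, bounded divergence between two discrete distributions."""
    # Mixture distribution; m_i > 0 wherever p_i > 0 or q_i > 0.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical motif distributions of two nodes over three motif types.
node_a = [0.5, 0.5, 0.0]
node_b = [0.0, 0.5, 0.5]
jsd = jensen_shannon_divergence(node_a, node_b)
```

With base-2 logarithms the value lies in [0, 1]: it is 0 for identical distributions and 1 for distributions with disjoint support, which gives a bounded score for comparing nodes or networks.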

arXiv:2401.05606 [pdf]

Weiss-Weinstein bound of frequency estimation error for very weak GNSS signals

Authors: Xin Zhang, Xingqun Zhan, Jihong Huang, Jiahui Liu, Yingchao Xiao

Abstract: Tightness remains the central quest in all modern estimation bounds. For very weak signals, this is made possible with judicious choices of prior probability distribution and bound family. While current bounds in GNSS assess performance of carrier frequency estimators under Gaussian or uniform assumptions, the circular nature of frequency is overlooked. In addition, of all bounds in the Bayesian framework, the Weiss-Weinstein bound (WWB) stands out since it is free from regularity conditions or requirements on the prior distribution. Therefore, WWB is extended for the current frequency estimation problem. A divide-and-conquer type of hyperparameter tuning method is developed to level off the curse of computational complexity for the WWB family while enhancing tightness. Synthetic results show that with von Mises as prior probability distribution, WWB provides a bound up to 22.5% tighter than the Ziv-Zakaï bound (ZZB) when SNR varies between -3.5 dB and -20 dB, where the GNSS signal is deemed extremely weak.

Submitted 10 January, 2024; originally announced January 2024.

Comments: 35 pages, 13 figures, submitted to NAVIGATION, Journal of the Institute of Navigation

arXiv:2401.04543 [pdf, other]

doi 10.1145/3637339

Healthcare Voice AI Assistants: Factors Influencing Trust and Intention to Use

Authors: Xiao Zhan, Noura Abdi, William Seymour, Jose Such

Abstract: AI assistants such as Alexa, Google Assistant, and Siri are making their way into the healthcare sector, offering a convenient way for users to access different healthcare services. Trust is a vital factor in the uptake of healthcare services, but the factors affecting trust in voice assistants used for healthcare are under-explored and this specialist domain introduces additional requirements. This study explores the effects of different functional, personal, and risk factors on trust in and adoption of healthcare voice AI assistants (HVAs), generating a partial least squares structural model from a survey of 300 voice assistant users. Our results indicate that trust in HVAs can be significantly explained by functional factors (usefulness, content credibility, quality of service relative to a healthcare professional), together with security and privacy risks and personal stance in technology. We also discuss differences in terms of trust between HVAs and general-purpose voice assistants as well as implications that are unique to HVAs.

Submitted 11 January, 2024; v1 submitted 9 January, 2024; originally announced January 2024.

Comments: 37 pages. This is a preprint of the paper accepted for the 27th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW'24)

arXiv:2312.16892 [pdf, other]

FlexSSL: A Generic and Efficient Framework for Semi-Supervised Learning

Authors: Huiling Qin, Xianyuan Zhan, Yuanxun Li, Yu Zheng

Abstract: Semi-supervised learning holds great promise for many real-world applications, due to its ability to leverage both unlabeled and expensive labeled data. However, most semi-supervised learning algorithms still heavily rely on the limited labeled data to infer and utilize the hidden information from unlabeled data. We note that any semi-supervised learning task under the self-training paradigm also hides an auxiliary task of discriminating label observability. Jointly solving these two tasks allows full utilization of information from both labeled and unlabeled data, thus alleviating the problem of over-reliance on labeled data. This naturally leads to a new generic and efficient learning framework without the reliance on any domain-specific information, which we call FlexSSL. The key idea of FlexSSL is to construct a semi-cooperative "game", which forges cooperation between a main self-interested semi-supervised learning task and a companion task that infers label observability to facilitate main task training. We show, through theoretical derivation, its connection to loss re-weighting on noisy labels. Through evaluations on a diverse range of tasks, we demonstrate that FlexSSL can consistently enhance the performance of semi-supervised learning algorithms.

Submitted 28 December, 2023; originally announced December 2023.

arXiv:2312.11013 [pdf, other]

PPT4J: Patch Presence Test for Java Binaries

Authors: Zhiyuan Pan, Xing Hu, Xin Xia, Xian Zhan, David Lo, Xiaohu Yang

Abstract: The number of vulnerabilities reported in open source software has increased substantially in recent years. Security patches provide the necessary measures to protect software from attacks and vulnerabilities. In practice, it is difficult to identify whether patches have been integrated into software, especially if we only have binary files. Therefore, the ability to test whether a patch is applied to the target binary, a.k.a. patch presence test, is crucial for practitioners. However, it is challenging to obtain accurate semantic information from patches, which could lead to incorrect results. In this paper, we propose a new patch presence test framework named PPT4J ($\textbf{P}$atch $\textbf{P}$resence $\textbf{T}$est $\textbf{for}$ $\textbf{J}$ava Binaries). PPT4J is designed for open-source Java libraries. It takes Java binaries (i.e. bytecode files) as input, extracts semantic information from patches, and uses feature-based techniques to identify patch lines in the binaries. To evaluate the effectiveness of our proposed approach PPT4J, we construct a dataset with binaries that include 110 vulnerabilities. The results show that PPT4J achieves an F1 score of 98.5% with reasonable efficiency, improving the baseline by 14.2%. Furthermore, we conduct an in-the-wild evaluation of PPT4J on JetBrains IntelliJ IDEA. The results suggest that a third-party library included in the software is not patched for two CVEs, and we have reported this potential security problem to the vendor.

Submitted 15 January, 2024; v1 submitted 18 December, 2023; originally announced December 2023.

Comments: 12 pages

arXiv:2312.04276 [pdf]

Unusual Sign Reversal of Field-like Spin-Orbit Torque in Pt/Ni/Py with an Ultrathin Ni Spacer

Authors: Zishuang Li, Wenqiang Wang, Kaiyuan Zhou, Xiang Zhan, Tiejun Zhou, Ronghua Liu

Abstract: The magnetization manipulation by spin-orbit torques (SOTs) in nonmagnetic-metal (NM)/ferromagnet (FM) heterostructures has provided great opportunities for spin devices. Besides the conventional spin Hall effect (SHE) in heavy metals with strong spin-orbit coupling, the orbital currents have been proposed to be another promising approach to generate strong SOTs. Here, we systematically study the SOTs efficiency and its dependence on the FM thickness and different NM/FM interfaces in two prototypical Pt/Py and Ta/Py systems by inserting an ultrathin magnetic layer (0.4 nm thick ML = Co, Fe, Gd, and Ni). The dampinglike (DL) torque efficiency $ξ_{DL}$ is significantly enhanced by inserting ultrathin Co, Fe, and Ni layers and is noticeably suppressed for the Gd insertion. Moreover, the Ni insertion results in a sign change of the field-like (FL) torque in Pt/Py and substantially reduces $ξ_{DL}$ in Ta/Py. These results are likely related to the additional spin currents generated by combining the orbital Hall effect (OHE) in the NM and orbital-to-spin conversion in the ML insertion layer and/or their interfaces, especially for the Ni insertion. Our results demonstrate that inserting ultrathin ML can effectively manipulate the strength and sign of the SOTs, which would be helpful for spintronics applications.

Submitted 7 December, 2023; v1 submitted 7 December, 2023; originally announced December 2023.

arXiv:2312.01353 [pdf, other]

The minimum number of detours in graphs

Authors: Xingzhi Zhan

Abstract: A longest path in a graph is called a detour. It is easy to see that a connected graph of minimum degree at least $2$ and order at least $4$ has at least $4$ detours. We prove that if the number of detours in such a graph of order at least $9$ is odd, then it is at least $9,$ and this lower bound can be attained for every order. Thus the possibilities $3,$ $5$ and $7$ are excluded. Two open problems are posed.

Submitted 3 December, 2023; originally announced December 2023.

Comments: 8 pages

MSC Class: 05C30; 05C35; 05C38

arXiv:2312.01298 [pdf, ps, other]

Noiselessly amplified thermal states and after multi-photon addition or subtraction

Authors: Xue-feng Zhan, Xue-xiang Xu

Abstract: In this paper, we introduce a noiselessly amplified thermal state (ATS), by operating the noiseless amplification operator ($g^{\hat{n}}$) on the thermal state (TS) with corresponding mean photon number (MPN) $\bar{n}$. Actually, the ATS is a new TS with MPN $\bar{N}=g^{2}\bar{n}/[1-\bar{n}\left(g^{2}-1\right)]$. Furthermore, we introduce photon-added-ATS (PAATS) and photon-subtracted-ATS (PSATS) by operating $m$-photon addition ($\hat{a}^{†m}$) and $m$-photon subtraction ($\hat{a}^{m}$) on the ATS, respectively. We study photon number distributions (PNDs), purities, and Wigner functions (WFs) for all these states.

Submitted 3 December, 2023; originally announced December 2023.

Comments: 8 pages, 8 figures

arXiv:2311.17061 [pdf, other]

HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting

Authors: Xian Liu, Xiaohang Zhan, Jiaxiang Tang, Ying Shan, Gang Zeng, Dahua Lin, Xihui Liu, Ziwei Liu

Abstract: Realistic 3D human generation from text prompts is a desirable yet challenging task. Existing methods optimize 3D representations like mesh or neural fields via score distillation sampling (SDS), which suffers from inadequate fine details or excessive training time. In this paper, we propose an efficient yet effective framework, HumanGaussian, that generates high-quality 3D humans with fine-grained geometry and realistic appearance. Our key insight is that 3D Gaussian Splatting is an efficient renderer with periodic Gaussian shrinkage or growing, where such adaptive density control can be naturally guided by intrinsic human structures. Specifically, 1) we first propose a Structure-Aware SDS that simultaneously optimizes human appearance and geometry. The multi-modal score function from both RGB and depth space is leveraged to distill the Gaussian densification and pruning process. 2) Moreover, we devise an Annealed Negative Prompt Guidance by decomposing SDS into a noisier generative score and a cleaner classifier score, which well addresses the over-saturation issue. The floating artifacts are further eliminated based on Gaussian size in a prune-only phase to enhance generation smoothness. Extensive experiments demonstrate the superior efficiency and competitive quality of our framework, rendering vivid 3D humans under diverse scenarios. Project Page: https://alvinliu0.github.io/projects/HumanGaussian

Submitted 14 March, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

Comments: Accepted by CVPR 2024, camera-ready version. Project Page: https://alvinliu0.github.io/projects/HumanGaussian

arXiv:2311.15920 [pdf, other]

A Fully Data-Driven Approach for Realistic Traffic Signal Control Using Offline Reinforcement Learning

Authors: Jianxiong Li, Shichao Lin, Tianyu Shi, Chujie Tian, Yu Mei, Jian Song, Xianyuan Zhan, Ruimin Li

Abstract: The optimization of traffic signal control (TSC) is critical for an efficient transportation system. In recent years, reinforcement learning (RL) techniques have emerged as a popular approach for TSC and show promising results for highly adaptive control. However, existing RL-based methods suffer from notably poor real-world applicability and hardly have any successful deployments. The reasons for such failures are mostly due to the reliance on over-idealized traffic simulators for policy optimization, as well as using unrealistic fine-grained state observations and reward signals that are not directly obtainable from real-world sensors. In this paper, we propose a fully Data-Driven and simulator-free framework for realistic Traffic Signal Control (D2TSC). Specifically, we combine well-established traffic flow theory with machine learning to construct a reward inference model to infer the reward signals from coarse-grained traffic data. With the inferred rewards, we further propose a sample-efficient offline RL method to enable direct signal control policy learning from historical offline datasets of real-world intersections. To evaluate our approach, we collect historical traffic data from a real-world intersection, and develop a highly customized simulation environment that strictly follows real data characteristics. We demonstrate through extensive experiments that our approach achieves superior performance over conventional and offline RL baselines, and also enjoys much better real-world applicability.

Submitted 27 November, 2023; originally announced November 2023.

Comments: 15 pages, 6 figures

arXiv:2311.09375 [pdf, other]

Distributed Constrained Combinatorial Optimization leveraging Hypergraph Neural Networks

Authors: Nasimeh Heydaribeni, Xinrui Zhan, Ruisi Zhang, Tina Eliassi-Rad, Farinaz Koushanfar

Abstract: Scalable addressing of high dimensional constrained combinatorial optimization problems is a challenge that arises in several science and engineering disciplines. Recent work introduced novel application of graph neural networks for solving quadratic-cost combinatorial optimization problems. However, effective utilization of models such as graph neural networks to address general problems with higher order constraints is an unresolved challenge. This paper presents a framework, HypOp, which advances the state of the art for solving combinatorial optimization problems in several aspects: (i) it generalizes the prior results to higher order constrained problems with arbitrary cost functions by leveraging hypergraph neural networks; (ii) enables scalability to larger problems by introducing a new distributed and parallel training architecture; (iii) demonstrates generalizability across different problem formulations by transferring knowledge within the same hypergraph; (iv) substantially boosts the solution accuracy compared with the prior art by suggesting a fine-tuning step using simulated annealing; (v) shows a remarkable progress on numerous benchmark examples, including hypergraph MaxCut, satisfiability, and resource allocation problems, with notable run time improvements using a combination of fine-tuning and distributed training techniques. We showcase the application of HypOp in scientific discovery by solving a hypergraph MaxCut problem on NDC drug-substance hypergraph.
Through extensive experimentation on various optimization problems, HypOp demonstrates superiority over existing unsupervised learning-based solvers and generic optimization methods.

Submitted 16 May, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

arXiv:2311.08663 [pdf, ps, other]

Influence maximization in multilayer networks based on adaptive coupling degree

Authors: Su-Su Zhang, Ming Xie, Chuang Liu, Xiu-Xiu Zhan

Abstract: Influence Maximization (IM) aims to identify highly influential nodes to maximize influence spread in a network. Previous research on the IM problem has mainly concentrated on single-layer networks, disregarding the comprehension of the coupling structure that is inherent in multilayer networks. To solve the IM problem in multilayer networks, we first propose an independent cascade model (MIC) in a multilayer network where propagation occurs simultaneously across different layers. Consequently, a heuristic algorithm, i.e., Adaptive Coupling Degree (ACD), which selects seed nodes with high spread influence and a low degree of overlap of influence, is proposed to identify seed nodes for IM in a multilayer network. By conducting experiments based on MIC, we have demonstrated that our proposed method is superior to the baselines in terms of influence spread and time cost in 6 synthetic and 4 real-world multilayer networks.

Submitted 14 November, 2023; originally announced November 2023.

arXiv:2311.08128 [pdf, ps, other]

Distance-regular Cayley graphs over (pseudo-) semi-dihedral groups

Authors: Xueyi Huang, Lu Lu, Xiongfeng Zhan

Abstract: Distance-regular graphs are a class of regular graphs with pretty combinatorial symmetry. In 2007, Miklavič and Potočnik proposed the problem of characterizing distance-regular Cayley graphs, which can be viewed as a natural extension of the problem of characterizing strongly-regular Cayley graphs (or equivalently, regular partial difference sets). In this paper, we provide a partial characterization for distance-regular Cayley graphs over semi-dihedral groups and pseudo-semi-dihedral groups, both of which are $2$-groups with a cyclic subgroup of index $2$.

Submitted 14 November, 2023; originally announced November 2023.

Comments: 21 pages

MSC Class: 05E30; 05C25; 05C50

arXiv:2310.20323 [pdf, other]

SemanticBoost: Elevating Motion Generation with Augmented Textual Cues

Authors: Xin He, Shaoli Huang, Xiaohang Zhan, Chao Weng, Ying Shan

Abstract: Current techniques face difficulties in generating motions from intricate semantic descriptions, primarily due to insufficient semantic annotations in datasets and weak contextual understanding. To address these issues, we present SemanticBoost, a novel framework that tackles both challenges simultaneously. Our framework comprises a Semantic Enhancement module and a Context-Attuned Motion Denoiser (CAMD). The Semantic Enhancement module extracts supplementary semantics from motion data, enriching the dataset's textual description and ensuring precise alignment between text and motion data without depending on large language models. On the other hand, the CAMD approach provides an all-encompassing solution for generating high-quality, semantically consistent motion sequences by effectively capturing context information and aligning the generated motion with the given textual descriptions. Distinct from existing methods, our approach can synthesize accurate orientational movements, combined motions based on specific body part descriptions, and motions generated from complex, extended sentences. Our experimental results demonstrate that SemanticBoost, as a diffusion-based method, outperforms auto-regressive-based techniques, achieving cutting-edge performance on the Humanml3D dataset while maintaining realistic and smooth motion generation quality.

Submitted 28 November, 2023; v1 submitted 31 October, 2023; originally announced October 2023.

arXiv:2310.15465 [pdf, ps, other]

A universal meta-heuristic framework for influence maximization in hypergraphs

Authors: Ming Xie, Xiu-Xiu Zhan, Chuang Liu, Zi-Ke Zhang

Abstract: Influence maximization (IM) aims to select a small number of nodes that are able to maximize their influence in a network and covers a wide range of applications. Despite numerous attempts to provide effective solutions in ordinary networks, higher-order interactions between entities in various real-world systems are not usually taken into account. In this paper, we propose a versatile meta-heuristic approach, hyper genetic algorithm (HGA), to tackle the IM problem in hypergraphs, which is based on the concept of genetic evolution. Systematic validations in synthetic and empirical hypergraphs under both simple and complex contagion models indicate that HGA achieves universal and plausible performance compared to baseline methods. We explore the cause of the excellent performance of HGA through ablation studies and correlation analysis. The findings show that the solution of HGA is distinct from that of other prior methods. Moreover, a closer look at the local topological features of the seed nodes acquired by different algorithms reveals that the selection of seed nodes cannot be based on a single topological characteristic, but should involve a combination of multiple topological features to address the IM problem.

Submitted 23 October, 2023; originally announced October 2023.

arXiv:2310.14315 [pdf, ps, other]

Photoproduction of doubly heavy baryons at future $e^+e^-$ colliders

Authors: Xi-Jie Zhan, Xing-Gang Wu, Xu-Chang Zheng

Abstract: The photoproduction of doubly heavy baryons ($Ξ_{cc},Ξ_{bb},Ξ_{bc}$) is investigated in the context of future high-energy and high-luminosity $e^+e^-$ colliders. The study incorporates two sources of initial photons, namely the LBS photon and the WWA photon. Alongside the direct photoproduction via the sub-process $γ+γ\rightarrow Ξ_{QQ^{'}} +\bar{Q}+\bar{Q^{'}}$ ($Q^{(')}=c,b$), the resolved photoproduction channels are specifically considered, encompassing the sub-processes $γ+ g \rightarrow Ξ_{QQ^{'}} +\bar{Q}+\bar{Q^{'}}$, $g + g \rightarrow Ξ_{QQ^{'}} +\bar{Q}+\bar{Q^{'}}$, and $q + \bar{q} \rightarrow Ξ_{QQ^{'}} +\bar{Q}+\bar{Q^{'}}$ with $q=u,d,s$. Within the framework of non-relativistic QCD, two $(cc(bb))$-diquark configurations, ${}_{\bar{\textbf{3}}}[{}^3S_1]$ and ${}_{\textbf{6}}[{}^1S_0]$, and four $(bc)$-diquark configurations, $(bc)_{\bar{\textbf{3}}}[{}^3S_1]$, $(bc)_{\textbf{6}}[{}^1S_0]$, $(bc)_{\textbf{6}}[{}^3S_1]$ and $(bc)_{\bar{\textbf{3}}}[{}^1S_0]$, are considered in the calculations. Numerical results show that the single resolved photoproduction processes provide dominant contributions under certain collision configurations. At the future $e^+e^-$ colliders, the doubly heavy baryons generated via the photoproduction mechanism are promisingly observable and can be well studied.

Submitted 22 October, 2023; originally announced October 2023.

arXiv:2310.12678 [pdf, other]

TapMo: Shape-aware Motion Generation of Skeleton-free Characters

Authors: Jiaxu Zhang, Shaoli Huang, Zhigang Tu, Xin Chen, Xiaohang Zhan, Gang Yu, Ying Shan

Abstract: Previous motion generation methods are limited to the pre-rigged 3D human model, hindering their applications in the animation of various non-rigged characters. In this work, we present TapMo, a Text-driven Animation Pipeline for synthesizing Motion in a broad spectrum of skeleton-free 3D characters. The pivotal innovation in TapMo is its use of shape deformation-aware features as a condition to guide the diffusion model, thereby enabling the generation of mesh-specific motions for various characters. Specifically, TapMo comprises two main components - Mesh Handle Predictor and Shape-aware Diffusion Module. Mesh Handle Predictor predicts the skinning weights and clusters mesh vertices into adaptive handles for deformation control, which eliminates the need for traditional skeletal rigging. Shape-aware Motion Diffusion synthesizes motion with mesh-specific adaptations. This module employs text-guided motions and mesh features extracted during the first stage, preserving the geometric integrity of the animations by accounting for the character's shape and deformation. Trained in a weakly-supervised manner, TapMo can accommodate a multitude of non-human meshes, both with and without associated text motions. We demonstrate the effectiveness and generalizability of TapMo through rigorous qualitative and quantitative experiments. Our results reveal that TapMo consistently outperforms existing auto-animation methods, delivering superior-quality animations for both seen and unseen heterogeneous 3D characters.

Submitted 19 October, 2023; originally announced October 2023.

arXiv:2310.07591 [pdf, other]

PeP: a Point enhanced Painting method for unified point cloud tasks

Authors: Zichao Dong, Hang Ji, Xufeng Huang, Weikun Zhang, Xin Zhan, Junbo Chen

Abstract: The point encoder is of vital importance for point cloud recognition. As the very first step of the whole model pipeline, adding features from diverse sources and providing a stronger feature encoding mechanism would provide better input for downstream modules. In our work, we propose a novel PeP module to tackle the above issue. PeP contains two main parts, a refined point painting method and a LM-based point encoder. Experimental results on the nuScenes and KITTI datasets validate the superior performance of our PeP. The advantages lead to strong performance on both semantic segmentation and object detection, in both lidar and multi-modal settings. Notably, our PeP module is model agnostic and plug-and-play. Our code will be publicly available soon.

Submitted 28 February, 2024; v1 submitted 11 October, 2023; originally announced October 2023.

arXiv:2310.05026 [pdf, other]

Low-Resolution Self-Attention for Semantic Segmentation

Authors: Yu-Huan Wu, Shi-Chen Zhang, Yun Liu, Le Zhang, Xin Zhan, Daquan Zhou, Jiashi Feng, Ming-Ming Cheng, Liangli Zhen

Abstract: Semantic segmentation tasks naturally require high-resolution information for pixel-wise segmentation and global context information for class prediction. While existing vision transformers demonstrate promising performance, they often utilize high resolution context modeling, resulting in a computational bottleneck. In this work, we challenge conventional wisdom and introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost. Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution, with additional 3x3 depth-wise convolutions to capture fine details in the high-resolution space. We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure. Extensive experiments on the ADE20K, COCO-Stuff, and Cityscapes datasets demonstrate that LRFormer outperforms state-of-the-art models. The code will be made available at https://github.com/yuhuan-wu/LRFormer.

Submitted 8 October, 2023; originally announced October 2023.

Comments: 11 pages, 11 tables, 6 figures

arXiv:2309.12716 [pdf, other]

H2O+: An Improved Framework for Hybrid Offline-and-Online RL with Dynamics Gaps

Authors: Haoyi Niu, Tianying Ji, Bingqi Liu, Haocheng Zhao, Xiangyu Zhu, Jianying Zheng, Pengfei Huang, Guyue Zhou, Jianming Hu, Xianyuan Zhan

Abstract: Solving real-world complex tasks using reinforcement learning (RL) without high-fidelity simulation environments or large amounts of offline data can be quite challenging. Online RL agents trained in imperfect simulation environments can suffer from severe sim-to-real issues. Offline RL approaches, although they bypass the need for simulators, often pose demanding requirements on the size and quality of the offline datasets. The recently emerged hybrid offline-and-online RL provides an attractive framework that enables joint use of limited offline data and imperfect simulator for transferable policy learning. In this paper, we develop a new algorithm, called H2O+, which offers great flexibility to bridge various choices of offline and online learning methods, while also accounting for dynamics gaps between the real and simulation environment. Through extensive simulation and real-world robotics experiments, we demonstrate superior performance and flexibility over advanced cross-domain online and offline RL algorithms.

Submitted 22 September, 2023; originally announced September 2023.

arXiv:2309.11669 [pdf, other]

Construction of Paired Knowledge Graph-Text Datasets Informed by Cyclic Evaluation

Authors: Ali Mousavi, Xin Zhan, He Bai, Peng Shi, Theo Rekatsinas, Benjamin Han, Yunyao Li, Jeff Pound, Josh Susskind, Natalie Schluter, Ihab Ilyas, Navdeep Jaitly

Abstract: Datasets that pair Knowledge Graphs (KG) and text together (KG-T) can be used to train forward and reverse neural models that generate text from KG and vice versa. However, models trained on datasets where KG and text pairs are not equivalent can suffer from more hallucination and poorer recall. In this paper, we verify this empirically by generating datasets with different levels of noise and find that noisier datasets do indeed lead to more hallucination. We argue that the ability of forward and reverse models trained on a dataset to cyclically regenerate source KG or text is a proxy for the equivalence between the KG and the text in the dataset. Using cyclic evaluation we find that manually created WebNLG is much better than automatically created TeKGen and T-REx. Guided by these observations, we construct a new, improved dataset called LAGRANGE using heuristics meant to improve equivalence between KG and text and show the impact of each of the heuristics on cyclic evaluation. We also construct two synthetic datasets using large language models (LLMs), and observe that these are conducive to models that perform significantly well on cyclic generation of text, but less so on cyclic generation of KGs, probably because of a lack of a consistent underlying ontology.

Submitted 20 September, 2023; originally announced September 2023.

Comments: 16 pages

arXiv:2309.11235 [pdf, other]

OpenChat: Advancing Open-source Language Models with Mixed-Quality Data

Authors: Guan Wang, Sijie Cheng, Xianyuan Zhan, Xiangang Li, Sen Song, Yang Liu

Abstract: Nowadays, open-source large language models like LLaMA have emerged. Recent developments have incorporated supervised fine-tuning (SFT) and reinforcement learning fine-tuning (RLFT) to align these models with human goals. However, SFT methods treat all training data with mixed quality equally, while RLFT methods require high-quality pairwise or ranking-based preference data. In this study, we present a novel framework, named OpenChat, to advance open-source language models with mixed-quality data. Specifically, we consider the general SFT training data, consisting of a small amount of expert data mixed with a large proportion of sub-optimal data, without any preference labels. We propose the C(onditioned)-RLFT, which regards different data sources as coarse-grained reward labels and learns a class-conditioned policy to leverage complementary data quality information. Interestingly, the optimal policy in C-RLFT can be easily solved through single-stage, RL-free supervised learning, which is lightweight and avoids costly human preference labeling. Through extensive experiments on three standard benchmarks, our openchat-13b fine-tuned with C-RLFT achieves the highest average performance among all 13b open-source language models. Moreover, we use AGIEval to validate the model generalization performance, in which only openchat-13b surpasses the base model. Finally, we conduct a series of analyses to shed light on the effectiveness and robustness of OpenChat.
Our code, data, and models are publicly available at https://github.com/imoneoi/openchat and https://huggingface.co/openchat.

Submitted 16 March, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

Showing 1–50 of 324 results for author: Zhan, X