An Economic Framework for 6-DoF Grasp Detection

Authors: Xiao-Ming Wu, Jia-Feng Cai, Jian-Jian Jiang, Dian Zheng, Yi-Lin Wei, Wei-Shi Zheng

Abstract: Robotic grasping in clutter is a fundamental task in robotic manipulation. In this work, we propose an economic framework for 6-DoF grasp detection, aiming to economize the resource cost in training and meanwhile maintain effective grasp performance. To begin with, we discover that the dense supervision is the bottleneck of current SOTA methods that severely encumbers the entire training overload, meanwhile making the training difficult to converge. To solve the above problem, we first propose an economic supervision paradigm for efficient and effective grasping. This paradigm includes a well-designed supervision selection strategy, selecting key labels basically without ambiguity, and an economic pipeline to enable the training after selection. Furthermore, benefiting from the economic supervision, we can focus on a specific grasp, and thus we devise a focal representation module, which comprises an interactive grasp head and a composite score estimation to generate the specific grasp more accurately. Combining all together, the EconomicGrasp framework is proposed. Our extensive experiments show that EconomicGrasp surpasses the SOTA grasp method by about 3 AP on average, and with extremely low resource cost, for about 1/4 training time cost, 1/8 memory cost and 1/30 storage cost. Our code is available at https://github.com/iSEE-Laboratory/EconomicGrasp.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 19 pages, 7 figures. Accepted in ECCV 2024!

arXiv:2407.01904 [pdf, other]

From Directed Steiner Tree to Directed Polymatroid Steiner Tree in Planar Graphs

Authors: Chandra Chekuri, Rhea Jain, Shubhang Kulkarni, Da Wei Zheng, Weihao Zhu

Abstract: In the Directed Steiner Tree (DST) problem the input is a directed edge-weighted graph $G=(V,E)$, a root vertex $r$ and a set $S \subseteq V$ of $k$ terminals. The goal is to find a min-cost subgraph that connects $r$ to each of the terminals. DST admits an $O(\log^2 k/\log \log k)$-approximation in quasi-polynomial time, and an $O(k^{\varepsilon})$-approximation for any fixed $\varepsilon > 0$ in polynomial time. Resolving the existence of a polynomial-time poly-logarithmic approximation is a major open problem in approximation algorithms. In a recent work, Friggstad and Mousavi [ICALP 2023] obtained a simple and elegant polynomial-time $O(\log k)$-approximation for DST in planar digraphs via Thorup's shortest path separator theorem. We build on their work and obtain several new results on DST and related problems.
- We develop a tree embedding technique for rooted problems in planar digraphs via an interpretation of the recursion in Friggstad and Mousavi [ICALP 2023]. Using this we obtain polynomial-time poly-logarithmic approximations for Group Steiner Tree, Covering Steiner Tree, and the Polymatroid Steiner Tree problems in planar digraphs. All these problems are hard to approximate to within a factor of $\Omega(\log^2 n/\log \log n)$ even in trees.
- We prove that the natural cut-based LP relaxation for DST has an integrality gap of $O(\log^2 k)$ in planar graphs. This is in contrast to general graphs where the integrality gap of this LP is known to be $\Omega(k)$ and $\Omega(n^{\delta})$ for some fixed $\delta > 0$.
- We combine the preceding results with density based arguments to obtain poly-logarithmic approximations for the multi-rooted versions of the problems in planar digraphs. For DST our result improves the $O(R + \log k)$ approximation of Friggstad and Mousavi [ICALP 2023] when $R = \omega(\log^2 k)$.
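
To make the DST problem statement concrete, the following Python sketch computes a feasible solution with the folklore baseline of taking the union of shortest root-to-terminal paths. This is not the algorithm of the paper: each shortest path costs at most the optimum, so the union is only a $k$-approximation. The toy graph, vertex names, and function names are invented for illustration.

```python
import heapq


def shortest_path_tree(graph, root):
    """Dijkstra from root on a directed graph with non-negative weights.

    graph maps each vertex to a list of (neighbor, weight) out-edges.
    Returns (dist, parent), where parent[v] = (u, w) is the last edge on
    a shortest root-to-v path.
    """
    dist = {root: 0}
    parent = {}
    heap = [(0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = (u, w)
                heapq.heappush(heap, (nd, v))
    return dist, parent


def shortest_path_union(graph, root, terminals):
    """Union of shortest root-to-terminal paths: a feasible DST solution."""
    dist, parent = shortest_path_tree(graph, root)
    edges = {}
    for t in terminals:
        if t not in dist:
            raise ValueError(f"terminal {t!r} is unreachable from the root")
        v = t
        while v != root:
            u, w = parent[v]
            edges[(u, v)] = w  # shared edges are paid for only once
            v = u
    return edges, sum(edges.values())


# Hypothetical toy instance: root "r", terminals "t1" and "t2".
toy_graph = {
    "r": [("a", 1), ("b", 4)],
    "a": [("b", 1), ("t1", 5)],
    "b": [("t1", 1), ("t2", 2)],
}
edges, cost = shortest_path_union(toy_graph, "r", ["t1", "t2"])
print(sorted(edges), cost)
```

On this instance both terminals are reached through the shared prefix r, a, b, so the two paths overlap and the total cost is 5.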

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2406.11884 [pdf, other]

Hierarchical Compression of Text-Rich Graphs via Large Language Models

Authors: Shichang Zhang, Da Zheng, Jiani Zhang, Qi Zhu, Xiang Song, Soji Adeshina, Christos Faloutsos, George Karypis, Yizhou Sun

Abstract: Text-rich graphs, prevalent in data mining contexts like e-commerce and academic graphs, consist of nodes with textual features linked by various relations. Traditional graph machine learning models, such as Graph Neural Networks (GNNs), excel in encoding the graph structural information, but have limited capability in handling rich text on graph nodes. Large Language Models (LLMs), noted for their superior text understanding abilities, offer a solution for processing the text in graphs but face integration challenges due to their limitation for encoding graph structures and their computational complexities when dealing with extensive text in large neighborhoods of interconnected nodes. This paper introduces "Hierarchical Compression" (HiCom), a novel method to align the capabilities of LLMs with the structure of text-rich graphs. HiCom processes text in a node's neighborhood in a structured manner by organizing the extensive textual information into a more manageable hierarchy and compressing node text step by step. Therefore, HiCom not only preserves the contextual richness of the text but also addresses the computational challenges of LLMs, which presents an advancement in integrating the text processing power of LLMs with the structural complexities of text-rich graphs. Empirical results show that HiCom can outperform both GNNs and LLM backbones for node classification on e-commerce and citation graphs. HiCom is especially effective for nodes from a dense region in a graph, where it achieves a 3.48% average performance improvement on five datasets while being more efficient than LLM backbones.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.06918 [pdf, other]

Towards more realistic evaluation of LLM-based code generation: an experimental study and beyond

Authors: Dewu Zheng, Yanlin Wang, Ensheng Shi, Ruikai Zhang, Yuchi Ma, Hongyu Zhang, Zibin Zheng

Abstract: To evaluate the code generation capabilities of Large Language Models (LLMs) in complex real-world software development scenarios, many evaluation approaches have been developed. They typically leverage contextual code from the latest version of a project to facilitate LLMs in accurately generating the desired function. However, such evaluation approaches fail to consider the dynamic evolution of software projects over time, which we refer to as evolving-ignored situation, leading to issues of future context leakage and useful context missing. This in turn results in inaccurate evaluation of LLMs' performance. In this paper, we conduct an empirical study to deeply understand LLMs' code generation performance within settings that reflect the evolving nature of software development. To achieve this, we first construct an evolving-aware repository-level code generation dataset, namely HumanEvo, equipped with an automated execution-based evaluation tool. Second, we manually categorize HumanEvo according to dependency levels to more comprehensively analyze the model's performance in generating functions with different dependency levels. Third, we conduct extensive experiments on HumanEvo with seven representative and diverse LLMs to verify the effectiveness of the proposed benchmark. We obtain many important findings through our experimental study. For example, we find that previous evolving-ignored evaluation approaches lead to inflated performance of the LLMs, ranging from 10.0% to 61.1%. Based on the findings, we give actionable suggestions on more realistic evaluation of LLMs on code generation. We also build a shared evolving-aware code generation toolbox to facilitate future research. Replication package including source code, datasets and appendix is available at https://github.com/DeepSoftwareAnalytics/EvoEval.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.06022 [pdf, other]

GraphStorm: all-in-one graph machine learning framework for industry applications

Authors: Da Zheng, Xiang Song, Qi Zhu, Jian Zhang, Theodore Vasiloudis, Runjie Ma, Houyu Zhang, Zichen Wang, Soji Adeshina, Israt Nisa, Alejandro Mottini, Qingjun Cui, Huzefa Rangwala, Belinda Zeng, Christos Faloutsos, George Karypis

Abstract: Graph machine learning (GML) is effective in many business applications. However, making GML easy to use and applicable to industry applications with massive datasets remains challenging. We developed GraphStorm, which provides an end-to-end solution for scalable graph construction, graph model training and inference. GraphStorm has the following desirable properties: (a) Easy to use: it can perform graph construction and model training and inference with just a single command; (b) Expert-friendly: GraphStorm contains many advanced GML modeling techniques to handle complex graph data and improve model performance; (c) Scalable: every component in GraphStorm can operate on graphs with billions of nodes and can scale model training and inference to different hardware without changing any code. GraphStorm has been used and deployed for over a dozen billion-scale industry applications after its release in May 2023. It is open-sourced on GitHub: https://github.com/awslabs/graphstorm.

Submitted 10 June, 2024; originally announced June 2024.

Journal ref: KDD 2024

arXiv:2405.19596 [pdf, ps, other]

The weight hierarchies of three classes of linear codes

Authors: Wei Lu, Qingyao Wang, Xiaoqiang Wang, Dabin Zheng

Abstract: Studying the generalized Hamming weights of linear codes is a significant research area within coding theory, as it provides valuable structural information about the codes and plays a crucial role in determining their performance in various applications. However, determining the generalized Hamming weights of linear codes, particularly their weight hierarchy, is generally a challenging task. In this paper, we focus on investigating the generalized Hamming weights of three classes of linear codes over finite fields. These codes are constructed by different defining sets. By analysing the intersections between the defining sets and the duals of all $r$-dimensional subspaces, we get the inequalities on the sizes of these intersections. Then, by constructing subspaces that reach the upper bounds of these inequalities, we successfully determine the complete weight hierarchies of these codes.
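
For readers unfamiliar with the terminology: the $r$-th generalized Hamming weight $d_r(C)$ is the smallest support size of any $r$-dimensional subcode of $C$, and the weight hierarchy is the sequence $(d_1, \dots, d_k)$. The Python sketch below computes it by brute force for a small binary code. It is only a definitional illustration, not the defining-set technique of the paper, and the generator matrix (a systematic form of the [7,4] Hamming code) is chosen here as an example.

```python
from itertools import combinations


def span_codewords(generators):
    """All codewords (as int bitmasks) of the binary code spanned by generators."""
    words = {0}
    for g in generators:
        words |= {w ^ g for w in words}
    return words


def rank_gf2(vectors):
    """Rank over GF(2) of bitmask vectors, via elimination on leading bits."""
    pivots = {}  # leading bit position -> basis vector with that leading bit
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]
    return len(pivots)


def weight_hierarchy(generators):
    """Brute-force weight hierarchy (d_1, ..., d_k) of a small binary code.

    The support of a subcode is the union of the supports of any basis,
    so it suffices to minimize over linearly independent r-sets of codewords.
    """
    k = rank_gf2(generators)
    nonzero = sorted(span_codewords(generators) - {0})
    hierarchy = []
    for r in range(1, k + 1):
        best = None
        for combo in combinations(nonzero, r):
            if rank_gf2(combo) < r:
                continue  # does not span an r-dimensional subcode
            support = 0
            for w in combo:
                support |= w
            size = bin(support).count("1")
            if best is None or size < best:
                best = size
        hierarchy.append(best)
    return hierarchy


# Example choice: generator rows [I_4 | P] of the binary [7,4] Hamming code.
HAMMING_7_4 = [0b1000110, 0b0100101, 0b0010011, 0b0001111]
print(weight_hierarchy(HAMMING_7_4))  # [3, 5, 6, 7]
```

The enumeration is exponential in the code dimension, which is one reason closed-form results such as those in the abstract are of interest.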

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.13565 [pdf, other]

doi 10.1145/3664646.3665664

AI-Assisted Assessment of Coding Practices in Modern Code Review

Authors: Manushree Vijayvergiya, Małgorzata Salawa, Ivan Budiselić, Dan Zheng, Pascal Lamblin, Marko Ivanković, Juanjo Carin, Mateusz Lewko, Jovan Andonov, Goran Petrović, Daniel Tarlow, Petros Maniatis, René Just

Abstract: Modern code review is a process in which an incremental code contribution made by a code author is reviewed by one or more peers before it is committed to the version control system. An important element of modern code review is verifying that code contributions adhere to best practices. While some of these best practices can be automatically verified, verifying others is commonly left to human reviewers. This paper reports on the development, deployment, and evaluation of AutoCommenter, a system backed by a large language model that automatically learns and enforces coding best practices. We implemented AutoCommenter for four programming languages (C++, Java, Python, and Go) and evaluated its performance and adoption in a large industrial setting. Our evaluation shows that an end-to-end system for learning and enforcing coding best practices is feasible and has a positive impact on the developer workflow. Additionally, this paper reports on the challenges associated with deploying such a system to tens of thousands of developers and the corresponding lessons learned.

Submitted 22 May, 2024; originally announced May 2024.

Comments: To appear at the ACM International Conference on AI-Powered Software (AIware '24)

arXiv:2404.18271 [pdf, other]

Parameter-Efficient Tuning Large Language Models for Graph Representation Learning

Authors: Qi Zhu, Da Zheng, Xiang Song, Shichang Zhang, Bowen Jin, Yizhou Sun, George Karypis

Abstract: Text-rich graphs, which exhibit rich textual information on nodes and edges, are prevalent across a wide range of real-world business applications. Large Language Models (LLMs) have demonstrated remarkable abilities in understanding text, which also introduced the potential for more expressive modeling in text-rich graphs. Despite these capabilities, efficiently applying LLMs to representation learning on graphs presents significant challenges. Recently, parameter-efficient fine-tuning methods for LLMs have enabled efficient new task generalization with minimal time and memory consumption. Inspired by this, we introduce Graph-aware Parameter-Efficient Fine-Tuning - GPEFT, a novel approach for efficient graph representation learning with LLMs on text-rich graphs. Specifically, we utilize a graph neural network (GNN) to encode structural information from neighboring nodes into a graph prompt. This prompt is then inserted at the beginning of the text sequence. To improve the quality of graph prompts, we pre-trained the GNN to assist the frozen LLM in predicting the next token in the node text. Compared with existing joint GNN and LMs, our method directly generates the node embeddings from large language models with an affordable fine-tuning cost. We validate our approach through comprehensive experiments conducted on 8 different text-rich graphs, observing an average improvement of 2% in hit@1 and Mean Reciprocal Rank (MRR) in link prediction evaluations. Our results demonstrate the efficacy and efficiency of our model, showing that it can be smoothly integrated with various large language models, including OPT, LLaMA and Falcon.

Submitted 28 April, 2024; originally announced April 2024.

arXiv:2404.18135 [pdf, other]

Dexterous Grasp Transformer

Authors: Guo-Hao Xu, Yi-Lin Wei, Dian Zheng, Xiao-Ming Wu, Wei-Shi Zheng

Abstract: In this work, we propose a novel discriminative framework for dexterous grasp generation, named Dexterous Grasp TRansformer (DGTR), capable of predicting a diverse set of feasible grasp poses by processing the object point cloud with only one forward pass. We formulate dexterous grasp generation as a set prediction task and design a transformer-based grasping model for it. However, we identify that this set prediction paradigm encounters several optimization challenges in the field of dexterous grasping and results in restricted performance. To address these issues, we propose progressive strategies for both the training and testing phases. First, the dynamic-static matching training (DSMT) strategy is presented to enhance the optimization stability during the training phase. Second, we introduce the adversarial-balanced test-time adaptation (AB-TTA) with a pair of adversarial losses to improve grasping quality during the testing phase. Experimental results on the DexGraspNet dataset demonstrate the capability of DGTR to predict dexterous grasp poses with both high quality and diversity. Notably, while keeping high quality, the diversity of grasp poses predicted by DGTR significantly outperforms previous works in multiple metrics without any data pre-processing. Codes are available at https://github.com/iSEE-Laboratory/DGTR.

Submitted 28 April, 2024; originally announced April 2024.

Comments: Accepted to CVPR 2024

arXiv:2403.17502 [pdf, other]

SeNM-VAE: Semi-Supervised Noise Modeling with Hierarchical Variational Autoencoder

Authors: Dihan Zheng, Yihang Zou, Xiaowen Zhang, Chenglong Bao

Abstract: The data bottleneck has emerged as a fundamental challenge in learning-based image restoration methods. Researchers have attempted to generate synthesized training data using paired or unpaired samples to address this challenge. This study proposes SeNM-VAE, a semi-supervised noise modeling method that leverages both paired and unpaired datasets to generate realistic degraded data. Our approach is based on modeling the conditional distribution of degraded and clean images with a specially designed graphical model. Under the variational inference framework, we develop an objective function for handling both paired and unpaired data. We employ our method to generate paired training samples for real-world image denoising and super-resolution tasks. Our approach excels in the quality of synthetic degraded images compared to other unpaired and paired noise modeling methods. Furthermore, our approach demonstrates remarkable performance in downstream image restoration tasks, even with limited paired data. With more paired data, our method achieves the best performance on the SIDD dataset.

Submitted 26 March, 2024; originally announced March 2024.

arXiv:2403.12303 [pdf, other]

Semialgebraic Range Stabbing, Ray Shooting, and Intersection Counting in the Plane

Authors: Timothy M. Chan, Pingan Cheng, Da Wei Zheng

Abstract: Polynomial partitioning techniques have recently led to improved geometric data structures for a variety of fundamental problems related to semialgebraic range searching and intersection searching in 3D and higher dimensions (e.g., see [Agarwal, Aronov, Ezra, and Zahl, SoCG 2019; Ezra and Sharir, SoCG 2021; Agarwal, Aronov, Ezra, Katz, and Sharir, SoCG 2022]). They have also led to improved algorithms for offline versions of semialgebraic range searching in 2D, via lens-cutting [Sharir and Zahl (2017)]. In this paper, we show that these techniques can yield new data structures for a number of other 2D problems even for online queries:
1. Semialgebraic range stabbing. We present a data structure for $n$ semialgebraic ranges in 2D of constant description complexity with $O(n^{3/2+\varepsilon})$ preprocessing time and space, so that we can count the number of ranges containing a query point in $O(n^{1/4+\varepsilon})$ time, for an arbitrarily small constant $\varepsilon>0$.
2. Ray shooting amid algebraic arcs. We present a data structure for $n$ algebraic arcs in 2D of constant description complexity with $O(n^{3/2+\varepsilon})$ preprocessing time and space, so that we can find the first arc hit by a query (straight-line) ray in $O(n^{1/4+\varepsilon})$ time.
3. Intersection counting amid algebraic arcs. We present a data structure for $n$ algebraic arcs in 2D of constant description complexity with $O(n^{3/2+\varepsilon})$ preprocessing time and space, so that we can count the number of intersection points with a query algebraic arc of constant description complexity in $O(n^{1/2+\varepsilon})$ time. In particular, this implies an $O(n^{3/2+\varepsilon})$-time algorithm for counting intersections between two sets of $n$ algebraic arcs in 2D.

Submitted 18 March, 2024; originally announced March 2024.

Comments: SOCG 2024

arXiv:2403.11157 [pdf, other]

Selective Hourglass Mapping for Universal Image Restoration Based on Diffusion Model

Authors: Dian Zheng, Xiao-Ming Wu, Shuzhou Yang, Jian Zhang, Jian-Fang Hu, Wei-Shi Zheng

Abstract: Universal image restoration is a practical and potential computer vision task for real-world applications. The main challenge of this task is handling the different degradation distributions at once. Existing methods mainly utilize task-specific conditions (e.g., prompt) to guide the model to learn different distributions separately, named multi-partite mapping. However, it is not suitable for universal model learning as it ignores the shared information between different tasks. In this work, we propose an advanced selective hourglass mapping strategy based on diffusion model, termed DiffUIR. Two novel considerations make our DiffUIR non-trivial. Firstly, we equip the model with strong condition guidance to obtain accurate generation direction of diffusion model (selective). More importantly, DiffUIR integrates a flexible shared distribution term (SDT) into the diffusion algorithm elegantly and naturally, which gradually maps different distributions into a shared one. In the reverse process, combined with SDT and strong condition guidance, DiffUIR iteratively guides the shared distribution to the task-specific distribution with high image quality (hourglass). Without bells and whistles, by only modifying the mapping strategy, we achieve state-of-the-art performance on five image restoration tasks, 22 benchmarks in the universal setting and zero-shot generalization setting. Surprisingly, by only using a lightweight model (only 0.89M), we could achieve outstanding performance. The source code and pre-trained models are available at https://github.com/iSEE-Laboratory/DiffUIR

Submitted 17 March, 2024; originally announced March 2024.

Comments: Accepted to CVPR2024

arXiv:2403.09475 [pdf, other]

Covert Communication for Untrusted UAV-Assisted Wireless Systems

Authors: Chan Gao, Linying Tian, Dong Zheng

Abstract: Wireless systems are of paramount importance for providing ubiquitous data transmission for smart cities. However, due to the broadcasting and openness of wireless channels, such systems face potential security challenges. UAV-assisted covert communication is a supporting technology for improving covert performances and has become a hot issue in the research of wireless communication security. This paper investigates the performance of joint covert and security communication in a two-hop UAV-assisted wireless system, where a source transmits the covert message to a destination with the help of an untrusted UAV. We first design a transmission scheme that uses UAVs to assist in covert communications while ensuring the security of covert messages. Then, we develop a theoretical model to derive the expressions for the detection error probability of the warden and the covert and security rate, and the maximum covert and security rate is optimized by power control under given covertness and security requirements. Finally, numerical results are provided to illustrate our theoretical analysis and the performance of covert and security communication in such systems.

Submitted 14 March, 2024; originally announced March 2024.

arXiv:2403.05530 [pdf, other]

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Authors: Gemini Team, Petko Georgiev, Ving Ian Lei, Ryan Burnell, Libin Bai, Anmol Gulati, Garrett Tanzer, Damien Vincent, Zhufeng Pan, Shibo Wang, Soroosh Mariooryad, Yifan Ding, Xinyang Geng, Fred Alcober, Roy Frostig, Mark Omernick, Lexi Walker, Cosmin Paduraru, Christina Sorokin, Andrea Tacchetti, Colin Gaffney, Samira Daruki, Olcan Sercinoglu, Zach Gleicher, Juliette Love, et al. (1092 additional authors not shown)

Abstract: In this report, we introduce the Gemini 1.5 family of models, representing the next generation of highly compute-efficient multimodal models capable of recalling and reasoning over fine-grained information from millions of tokens of context, including multiple long documents and hours of video and audio. The family includes two new models: (1) an updated Gemini 1.5 Pro, which exceeds the February version on the great majority of capabilities and benchmarks; (2) Gemini 1.5 Flash, a more lightweight variant designed for efficiency with minimal regression in quality. Gemini 1.5 models achieve near-perfect recall on long-context retrieval tasks across modalities, improve the state-of-the-art in long-document QA, long-video QA and long-context ASR, and match or surpass Gemini 1.0 Ultra's state-of-the-art performance across a broad set of benchmarks. Studying the limits of Gemini 1.5's long-context ability, we find continued improvement in next-token prediction and near-perfect retrieval (>99%) up to at least 10M tokens, a generational leap over existing models such as Claude 3.0 (200k) and GPT-4 Turbo (128k). Finally, we highlight real-world use cases, such as Gemini 1.5 collaborating with professionals on completing their tasks achieving 26 to 75% time savings across 10 different job categories, as well as surprising new capabilities of large language models at the frontier; when given a grammar manual for Kalamang, a language with fewer than 200 speakers worldwide, the model learns to translate English to Kalamang at a similar level to a person who learned from the same content.

Submitted 14 June, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.02630 [pdf, other]

FedHCDR: Federated Cross-Domain Recommendation with Hypergraph Signal Decoupling

Authors: Hongyu Zhang, Dongyi Zheng, Lin Zhong, Xu Yang, Jiyuan Feng, Yunqing Feng, Qing Liao

Abstract: In recent years, Cross-Domain Recommendation (CDR) has drawn significant attention, which utilizes user data from multiple domains to enhance the recommendation performance. However, current CDR methods require sharing user data across domains, thereby violating the General Data Protection Regulation (GDPR). Consequently, numerous approaches have been proposed for Federated Cross-Domain Recommendation (FedCDR). Nevertheless, the data heterogeneity across different domains inevitably influences the overall performance of federated learning. In this study, we propose FedHCDR, a novel Federated Cross-Domain Recommendation framework with Hypergraph signal decoupling. Specifically, to address the data heterogeneity across domains, we introduce an approach called hypergraph signal decoupling (HSD) to decouple the user features into domain-exclusive and domain-shared features. The approach employs high-pass and low-pass hypergraph filters to decouple domain-exclusive and domain-shared user representations, which are trained by the local-global bi-directional transfer algorithm. In addition, a hypergraph contrastive learning (HCL) module is devised to enhance the learning of domain-shared user relationship information by perturbing the user hypergraph. Extensive experiments conducted on three real-world scenarios demonstrate that FedHCDR outperforms existing baselines significantly.

Submitted 10 June, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

Comments: 16 pages, 5 figures

arXiv:2403.00095 [pdf]

Solving Jigsaw Puzzles using Iterative Random Sampling: Parallels with Development of Skill Mastery

Authors: Neil Zhao, Diana Zheng

Abstract: Skill mastery is a priority for success in all fields. We present a parallel between the development of skill mastery and the process of solving jigsaw puzzles. We show that iterative random sampling solves jigsaw puzzles in two phases: a lag phase that is characterized by little change and occupies the majority of the time, and a growth phase that marks rapid and imminent puzzle completion. Changes in the proportions of the number of single pieces and larger pieces can be overlaid on the timeline and progression of skill mastery. An emphasis is placed on the development of connections between pieces, which serves as an indicator of increasing puzzle completion and increasing skill mastery. Our manuscript provides a straightforward visual of skill mastery in the context of a common recreational activity.

Submitted 29 February, 2024; originally announced March 2024.

Comments: 26 pages, 15 figures, 1 table

arXiv:2402.12554 [pdf, other]

Archer: A Human-Labeled Text-to-SQL Dataset with Arithmetic, Commonsense and Hypothetical Reasoning

Authors: Danna Zheng, Mirella Lapata, Jeff Z. Pan

Abstract: We present Archer, a challenging bilingual text-to-SQL dataset specific to complex reasoning, including arithmetic, commonsense and hypothetical reasoning. It contains 1,042 English questions and 1,042 Chinese questions, along with 521 unique SQL queries, covering 20 English databases across 20 domains. Notably, this dataset demonstrates a significantly higher level of complexity compared to existing publicly available datasets. Our evaluation shows that Archer challenges the capabilities of current state-of-the-art models, with a high-ranked model on the Spider leaderboard achieving only 6.73% execution accuracy on the Archer test set. Thus, Archer presents a significant challenge for future research in this field.

Submitted 24 February, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

Comments: EACL 2024

arXiv:2402.12545 [pdf, other]

TrustScore: Reference-Free Evaluation of LLM Response Trustworthiness

Authors: Danna Zheng, Danyang Liu, Mirella Lapata, Jeff Z. Pan

Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities across various domains, prompting a surge in their practical applications. However, concerns have arisen regarding the trustworthiness of LLM outputs, particularly in closed-book question-answering tasks, where non-experts may struggle to identify inaccuracies due to the absence of contextual or ground truth information. This paper introduces TrustScore, a framework based on the concept of Behavioral Consistency, which evaluates whether an LLM's response aligns with its intrinsic knowledge. Additionally, TrustScore can seamlessly integrate with fact-checking methods, which assess alignment with external knowledge sources. The experimental results show that TrustScore achieves strong correlations with human judgments, surpassing existing reference-free metrics, and achieving results on par with reference-based metrics.

Submitted 6 May, 2024; v1 submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.07999 [pdf, other]

NetInfoF Framework: Measuring and Exploiting Network Usable Information

Authors: Meng-Chieh Lee, Haiyang Yu, Jian Zhang, Vassilis N. Ioannidis, Xiang Song, Soji Adeshina, Da Zheng, Christos Faloutsos

Abstract: Given a node-attributed graph, and a graph task (link prediction or node classification), can we tell if a graph neural network (GNN) will perform well? More specifically, do the graph structure and the node features carry enough usable information for the task? Our goals are (1) to develop a fast tool to measure how much information is in the graph structure and in the node features, and (2) to exploit the information to solve the task, if there is enough. We propose NetInfoF, a framework including NetInfoF_Probe and NetInfoF_Act, for the measurement and the exploitation of network usable information (NUI), respectively. Given graph data, NetInfoF_Probe measures NUI without any model training, and NetInfoF_Act solves link prediction and node classification, while the two modules share the same backbone. In summary, NetInfoF has the following notable advantages: (a) General, handling both link prediction and node classification; (b) Principled, with theoretical guarantee and closed-form solution; (c) Effective, thanks to the proposed adjustment to node similarity; (d) Scalable, scaling linearly with the input size. In our carefully designed synthetic datasets, NetInfoF correctly identifies the ground truth of NUI and is the only method that is robust to all graph scenarios. Applied on real-world datasets, NetInfoF wins 11 out of 12 times on link prediction compared to general GNN baselines.

Submitted 20 March, 2024; v1 submitted 12 February, 2024; originally announced February 2024.

Comments: Accepted to ICLR 2024 (Spotlight)

arXiv:2401.16444 [pdf, other]

Enhancing Human Experience in Human-Agent Collaboration: A Human-Centered Modeling Approach Based on Positive Human Gain

Authors: Yiming Gao, Feiyu Liu, Liang Wang, Zhenjie Lian, Dehua Zheng, Weixuan Wang, Wenjin Yang, Siqin Li, Xianliang Wang, Wenhui Chen, Jing Dai, Qiang Fu, Wei Yang, Lanxiao Huang, Wei Liu

Abstract: Existing game AI research mainly focuses on enhancing agents' abilities to win games, but this does not inherently make humans have a better experience when collaborating with these agents. For example, agents may dominate the collaboration and exhibit unintended or detrimental behaviors, leading to poor experiences for their human partners. In other words, most game AI agents are modeled in a "self-centered" manner. In this paper, we propose a "human-centered" modeling scheme for collaborative agents that aims to enhance the experience of humans. Specifically, we model the experience of humans as the goals they expect to achieve during the task. We expect that agents should learn to enhance the extent to which humans achieve these goals while maintaining agents' original abilities (e.g., winning games). To achieve this, we propose the Reinforcement Learning from Human Gain (RLHG) approach. The RLHG approach introduces a "baseline", which corresponds to the extent to which humans primitively achieve their goals, and encourages agents to learn behaviors that can effectively enhance humans in achieving their goals better. We evaluate the RLHG agent in the popular Multi-player Online Battle Arena (MOBA) game, Honor of Kings, by conducting real-world human-agent tests. Both objective performance and subjective preference results show that the RLHG agent provides participants better gaming experience.

Submitted 28 January, 2024; originally announced January 2024.

Comments: Accepted at ICLR 2024. arXiv admin note: text overlap with arXiv:2304.11632

arXiv:2401.15560 [pdf]

An Analysis of Letter Dynamics in the English Alphabet

Authors: Neil Zhao, Diana Zheng

Abstract: The frequency with which the letters of the English alphabet appear in writings has been applied to the field of cryptography, the development of keyboard mechanics, and the study of linguistics. We expanded on the statistical analysis of the English alphabet by examining the average frequency with which each letter appears in different categories of writings. We evaluated news articles, novels, plays, and scientific publications and calculated the frequency of each letter of the alphabet, the information density of each letter, and the overall letter distribution. Furthermore, we developed a metric known as distance, d, that can be used to algorithmically recognize different categories of writings. The results of our study can be applied to information transmission, large data curation, and linguistics.

Submitted 27 January, 2024; originally announced January 2024.

Comments: 22 pages, 6 figures, 5 tables

MSC Class: 94A15

arXiv:2401.00283 [pdf, other]

Near-Space Communications: the Last Piece of 6G Space-Air-Ground-Sea Integrated Network Puzzle

Authors: Hongshan Liu, Tong Qin, Zhen Gao, Tianqi Mao, Keke Ying, Ziwei Wan, Li Qiao, Rui Na, Zhongxiang Li, Chun Hu, Yikun Mei, Tuan Li, Guanghui Wen, Lei Chen, Zhonghuai Wu, Ruiqi Liu, Gaojie Chen, Shuo Wang, Dezhi Zheng

Abstract: This article presents a comprehensive study on the emerging near-space communications (NS-COM) within the context of space-air-ground-sea integrated network (SAGSIN). Specifically, we firstly explore the recent technical developments of NS-COM, followed by the discussions about motivations behind integrating NS-COM into SAGSIN. To further demonstrate the necessity of NS-COM, a comparative analysis between the NS-COM network and other counterparts in SAGSIN is conducted, covering aspects of deployment, coverage, channel characteristics and unique problems of NS-COM network. Afterwards, the technical aspects of NS-COM, including channel modeling, random access, channel estimation, array-based beam management and joint network optimization, are examined in detail. Furthermore, we explore the potential applications of NS-COM, such as structural expansion in SAGSIN communication, civil aviation communication, remote and urgent communication, weather monitoring and carbon neutrality. Finally, some promising research avenues are identified, including stratospheric satellite (StratoSat) -to-ground direct links for mobile terminals, reconfigurable multiple-input multiple-output (MIMO) and holographic MIMO, federated learning in NS-COM networks, maritime communication, electromagnetic spectrum sensing and adversarial game, integrated sensing and communications, StratoSat-based radar detection and imaging, NS-COM assisted enhanced global navigation system, NS-COM assisted intelligent unmanned system and free space optical (FSO) communication. Overall, this paper highlights that the NS-COM plays an indispensable role in the SAGSIN puzzle, providing substantial performance and coverage enhancement to the traditional SAGSIN architecture.

Submitted 4 March, 2024; v1 submitted 30 December, 2023; originally announced January 2024.

Comments: 28 pages, 8 figures, 2 tables

arXiv:2312.08887 [pdf, other]

SpeedUpNet: A Plug-and-Play Hyper-Network for Accelerating Text-to-Image Diffusion Models

Authors: Weilong Chai, DanDan Zheng, Jiajiong Cao, Zhiquan Chen, Changbao Wang, Chenguang Ma

Abstract: Text-to-image diffusion models (SD) exhibit significant advancements while requiring extensive computational resources. Though many acceleration methods have been proposed, they suffer from generation quality degradation or extra training cost generalizing to new fine-tuned models. To address these limitations, we propose a novel and universal Stable-Diffusion (SD) acceleration module called SpeedUpNet (SUN). SUN can be directly plugged into various fine-tuned SD models without extra training. This technique utilizes cross-attention layers to learn the relative offsets in the generated image results between negative and positive prompts achieving classifier-free guidance distillation with negative prompts controllable, and introduces a Multi-Step Consistency (MSC) loss to ensure a harmonious balance between reducing inference steps and maintaining consistency in the generated output. Consequently, SUN significantly reduces the number of inference steps to just 4 steps and eliminates the need for classifier-free guidance. It leads to an overall speedup of more than 10 times for SD models compared to the state-of-the-art 25-step DPM-solver++, and offers two extra advantages: (1) classifier-free guidance distillation with controllable negative prompts and (2) seamless integration into various fine-tuned Stable-Diffusion models without training. The effectiveness of the SUN has been verified through extensive experimentation. Project Page: https://williechai.github.io/speedup-plugin-for-stable-diffusions.github.io

Submitted 20 December, 2023; v1 submitted 13 December, 2023; originally announced December 2023.

Comments: Table 1. shows the comparison with existing methods, but the lack of experimental data of the LCM method under 12-step makes the table incomplete. We need to temporarily withdraw the manuscript and conduct corresponding experiments before resubmitting it

arXiv:2312.06682 [pdf, other]

Learning to Denoise Unreliable Interactions for Link Prediction on Biomedical Knowledge Graph

Authors: Tengfei Ma, Yujie Chen, Wen Tao, Dashun Zheng, Xuan Lin, Patrick Cheong-lao Pang, Yiping Liu, Yijun Wang, Bosheng Song, Xiangxiang Zeng

Abstract: Link prediction in biomedical knowledge graphs (KGs) aims at predicting unknown interactions between entities, including drug-target interaction (DTI) and drug-drug interaction (DDI), which is critical for drug discovery and therapeutics. Previous methods prefer to utilize the rich semantic relations and topological structure of the KG to predict missing links, yielding promising outcomes. However, all these works only focus on improving the predictive performance without considering the inevitable noise and unreliable interactions existing in the KGs, which limits the development of KG-based computational methods. To address these limitations, we propose a Denoised Link Prediction framework, called DenoisedLP. DenoisedLP obtains reliable interactions based on the local subgraph by denoising noisy links in a learnable way, providing a universal module for mining underlying task-relevant relations. To collaborate with the smoothed semantic information, DenoisedLP introduces the semantic subgraph by blurring conflict relations around the predicted link. By maximizing the mutual information between the reliable structure and smoothed semantic relations, DenoisedLP emphasizes the informative interactions for predicting relation-specific links. Experimental results on real-world datasets demonstrate that DenoisedLP outperforms state-of-the-art methods on DTI and DDI prediction tasks, and verify the effectiveness and robustness of denoising unreliable interactions on the contaminated KGs.

Submitted 9 December, 2023; originally announced December 2023.

arXiv:2312.05474 [pdf, ps, other]

The duals of narrow-sense BCH codes with length $\frac{q^m-1}λ$

Authors: Xiaoqiang Wang, Chengliang Xiao, Dabin Zheng

Abstract: BCH codes are an interesting class of cyclic codes due to their efficient encoding and decoding algorithms. In the past sixty years, a lot of progress on the study of BCH codes has been made, but little is known about the properties of their duals. Recently, in order to study the duals of BCH codes and the lower bounds on their minimum distances, a new concept called dually-BCH code was proposed by the authors of \cite{GDL21}. In this paper, the lower bounds on the minimum distances of the duals of narrow-sense BCH codes with length $\frac{q^m-1}λ$ over $\mathbb{F}_q$ are developed, where $λ$ is a positive integer satisfying $λ\, |\, q-1$, or $λ=q^s-1$ and $s\, |\,m$. In addition, the sufficient and necessary conditions in terms of the designed distances for these codes being dually-BCH codes are presented. Many of the codes considered in \cite{GDL21} and \cite{Wang23} are special cases of the codes shown in this paper. Our lower bounds on the minimum distances of the duals of BCH codes include the bounds stated in \cite{GDL21} as a special case. Several examples show that the lower bounds are good in some cases.

Submitted 9 December, 2023; originally announced December 2023.

arXiv:2312.02010 [pdf, other]

Towards Learning a Generalist Model for Embodied Navigation

Authors: Duo Zheng, Shijia Huang, Lin Zhao, Yiwu Zhong, Liwei Wang

Abstract: Building a generalist agent that can interact with the world is the intriguing target of AI systems, thus spurring the research for embodied navigation, where an agent is required to navigate according to instructions or respond to queries. Despite the major progress attained, previous works primarily focus on task-specific agents and lack generalizability to unseen scenarios. Recently, LLMs have presented remarkable capabilities across various fields, and provided a promising opportunity for embodied navigation. Drawing on this, we propose the first generalist model for embodied navigation, NaviLLM. It adapts LLMs to embodied navigation by introducing schema-based instruction. The schema-based instruction flexibly casts various tasks into generation problems, thereby unifying a wide range of tasks. This approach allows us to integrate diverse data sources from various datasets into the training, equipping NaviLLM with a wide range of capabilities required by embodied navigation. We conduct extensive experiments to evaluate the performance and generalizability of our model. The experimental results demonstrate that our unified model achieves state-of-the-art performance on CVDN, SOON, and ScanQA. Specifically, it surpasses the previous state-of-the-art method by a significant margin of 29% in goal progress on CVDN. Moreover, our model also demonstrates strong generalizability and presents impressive results on unseen tasks, e.g., embodied question answering and 3D captioning.

Submitted 1 April, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

Comments: Accepted by CVPR 2024 (14 pages, 3 figures)

arXiv:2311.18432 [pdf, ps, other]

Three classes of new optimal cyclic $(r,δ)$ locally recoverable codes

Authors: Yaozong Zhang, Dabin Zheng, Xiaoqiang Wang

Abstract: An $(r, δ)$-locally repairable code ($(r, δ)$-LRC for short) was introduced by Prakash et al. for tolerating multiple failed nodes in distributed storage systems, and has garnered significant interest among researchers. An $(r,δ)$-LRC is called an optimal code if its parameters achieve the Singleton-like bound. In this paper, we construct three classes of $q$-ary optimal cyclic $(r,δ)$-LRCs with new parameters by investigating the defining sets of cyclic codes. Our results generalize the related work of \cite{Chen2022,Qian2020}, and the obtained optimal cyclic $(r, δ)$-LRCs have flexible parameters. A lot of numerical examples of optimal cyclic $(r, δ)$-LRCs are given to show that our constructions are capable of generating new optimal cyclic $(r, δ)$-LRCs.

Submitted 30 November, 2023; originally announced November 2023.

arXiv:2311.10372 [pdf, other]

A Survey of Large Language Models for Code: Evolution, Benchmarking, and Future Trends

Authors: Zibin Zheng, Kaiwen Ning, Yanlin Wang, Jingwen Zhang, Dewu Zheng, Mingxi Ye, Jiachi Chen

Abstract: General large language models (LLMs), represented by ChatGPT, have demonstrated significant potential in tasks such as code generation in software engineering. This has led to the development of specialized LLMs for software engineering, known as Code LLMs. A considerable portion of Code LLMs is derived from general LLMs through model fine-tuning. As a result, Code LLMs are often updated frequently and their performance can be influenced by the base LLMs. However, there is currently a lack of systematic investigation into Code LLMs and their performance. In this study, we conduct a comprehensive survey and analysis of the types of Code LLMs and their differences in performance compared to general LLMs. We aim to address three questions: (1) What LLMs are specifically designed for software engineering tasks, and what is the relationship between these Code LLMs? (2) Do Code LLMs really outperform general LLMs in software engineering tasks? (3) Which LLMs are more proficient in different software engineering tasks? To answer these questions, we first collect relevant literature and work from five major databases and open-source communities, resulting in 134 works for analysis. Next, we categorize the Code LLMs based on their publishers and examine their relationships with general LLMs and among themselves. Furthermore, we investigate the performance differences between general LLMs and Code LLMs in various software engineering tasks to demonstrate the impact of base models and Code LLMs. Finally, we comprehensively maintained the performance of LLMs across multiple mainstream benchmarks to identify the best-performing LLMs for each software engineering task. Our research not only assists developers of Code LLMs in choosing base models for the development of more advanced LLMs but also provides insights for practitioners to better understand key improvement directions for Code LLMs.

Submitted 8 January, 2024; v1 submitted 17 November, 2023; originally announced November 2023.

arXiv:2311.07993 [pdf, other]

Explicit Change Relation Learning for Change Detection in VHR Remote Sensing Images

Authors: Dalong Zheng, Zebin Wu, Jia Liu, Chih-Cheng Hung, Zhihui Wei

Abstract: Change detection has always been a concerned task in the interpretation of remote sensing images. It is essentially a unique binary classification task with two inputs, and there is a change relationship between these two inputs. At present, the mining of change relationship features is usually implicit in the network architectures that contain single-branch or two-branch encoders. However, due to the lack of artificial prior design for change relationship features, these networks cannot learn enough change semantic information and lose more accurate change detection performance. So we propose a network architecture NAME for the explicit mining of change relation features. In our opinion, the change features of change detection should be divided into pre-changed image features, post-changed image features and change relation features. In order to fully mine these three kinds of change features, we propose the triple branch network combining the transformer and convolutional neural network (CNN) to extract and fuse these change features from two perspectives of global information and local information, respectively. In addition, we design the continuous change relation (CCR) branch to further obtain the continuous and detail change relation features to improve the change discrimination capability of the model. The experimental results show that our network performs better, in terms of F1, IoU, and OA, than those of the existing advanced networks for change detection on four public very high-resolution (VHR) remote sensing datasets. Our source code is available at https://github.com/DalongZ/NAME.

Submitted 14 November, 2023; originally announced November 2023.

arXiv:2311.05697 [pdf, other]

doi 10.3390/cancers15235496

3DGAUnet: 3D generative adversarial networks with a 3D U-Net based generator to achieve the accurate and effective synthesis of clinical tumor image data for pancreatic cancer

Authors: Yu Shi, Hannah Tang, Michael Baine, Michael A. Hollingsworth, Huijing Du, Dandan Zheng, Chi Zhang, Hongfeng Yu

Abstract: Pancreatic ductal adenocarcinoma (PDAC) presents a critical global health challenge, and early detection is crucial for improving the 5-year survival rate. Recent medical imaging and computational algorithm advances offer potential solutions for early diagnosis. Deep learning, particularly in the form of convolutional neural networks (CNNs), has demonstrated success in medical image analysis tasks, including classification and segmentation. However, the limited availability of clinical data for training purposes continues to provide a significant obstacle. Data augmentation, generative adversarial networks (GANs), and cross-validation are potential techniques to address this limitation and improve model performance, but effective solutions are still rare for 3D PDAC, where contrast is especially poor owing to the high heterogeneity in both tumor and background tissues. In this study, we developed a new GAN-based model, named 3DGAUnet, for generating realistic 3D CT images of PDAC tumors and pancreatic tissue, which can generate the interslice connection data that the existing 2D CT image synthesis models lack. Our innovation is to develop a 3D U-Net architecture for the generator to improve shape and texture learning for PDAC tumors and pancreatic tissue. Our approach offers a promising path to tackle the urgent requirement for creative and synergistic methods to combat PDAC. The development of this GAN-based model has the potential to alleviate data scarcity issues, elevate the quality of synthesized data, and thereby facilitate the progression of deep learning models to enhance the accuracy and early detection of PDAC tumors, which could profoundly impact patient outcomes. Furthermore, this model has the potential to be adapted to other types of solid tumors, hence making significant contributions to the field of medical imaging in terms of image processing models.

Submitted 27 November, 2023; v1 submitted 9 November, 2023; originally announced November 2023.

Comments: Published on Cancers: Shi, Yu, Hannah Tang, Michael J. Baine, Michael A. Hollingsworth, Huijing Du, Dandan Zheng, Chi Zhang, and Hongfeng Yu. 2023. "3DGAUnet: 3D Generative Adversarial Networks with a 3D U-Net Based Generator to Achieve the Accurate and Effective Synthesis of Clinical Tumor Image Data for Pancreatic Cancer" Cancers 15, no. 23: 5496

arXiv:2311.05141 [pdf, other]

Differentiable Cloth Parameter Identification and State Estimation in Manipulation

Authors: Dongzhe Zheng, Siqiong Yao, Wenqiang Xu, Cewu Lu

Abstract: In the realm of robotic cloth manipulation, accurately estimating the cloth state during or post-execution is imperative. However, the inherent complexities in a cloth's dynamic behavior and its near-infinite degrees of freedom (DoF) pose significant challenges. Traditional methods have been restricted to using keypoints or boundaries as cues for cloth state, which do not holistically capture the cloth's structure, especially during intricate tasks like folding. Additionally, the critical influence of cloth physics has often been overlooked in past research. Addressing these concerns, we introduce DiffCP, a novel differentiable pipeline that leverages the Anisotropic Elasto-Plastic (A-EP) constitutive model, tailored for differentiable computation and robotic tasks. DiffCP adopts a "real-to-sim-to-real" methodology. By observing real-world cloth states through an RGB-D camera and projecting this data into a differentiable simulator, the system identifies physics parameters by minimizing the geometric variance between observed and target states. Extensive experiments demonstrate DiffCP's ability and stability to determine physics parameters under varying manipulations, grasping points, and speeds. Additionally, its applications extend to cloth material identification, manipulation trajectory generation, and more notably, enhancing cloth pose estimation accuracy. More experiments and videos can be found in the supplementary materials and on the website: https://sites.google.com/view/diffcp.

Submitted 8 November, 2023; originally announced November 2023.

arXiv:2311.05137 [pdf, other]

Differentiable Fluid Physics Parameter Identification Via Stirring

Authors: Wenqiang Xu, Dongzhe Zheng, Yutong Li, Jieji Ren, Cewu Lu

Abstract: Fluid interactions permeate daily human activities, with properties like density and viscosity playing pivotal roles in household tasks. While density estimation is straightforward through Archimedes' principle, viscosity poses a more intricate challenge, especially given the varied behaviors of Newtonian and non-Newtonian fluids. These fluids, which differ in their stress-strain relationships, are delineated by specific constitutive models such as the Carreau, Cross, and Herschel-Bulkley models, each possessing unique viscosity parameters. This study introduces a novel differentiable fitting framework, DiffStir, tailored to identify key physics parameters via the common daily operation of stirring. By employing a robotic arm for stirring and harnessing a differentiable Material Point Method (diffMPM)-based simulator, the framework can determine fluid parameters by matching observations from both the simulator and the real world. Recognizing the distinct preferences of the aforementioned constitutive models for specific fluids, an online strategy was adopted to adaptively select the most fitting model based on real-world data. Additionally, we propose a refining neural network to bridge the sim-to-real gap and mitigate sensor noise-induced inaccuracies. Comprehensive experiments were conducted to validate the efficacy of DiffStir, showcasing its precision in parameter estimation when benchmarked against reported literature values. More experiments and videos can be found in the supplementary materials and on the website: https://sites.google.com/view/diffstir.

Submitted 8 November, 2023; originally announced November 2023.

arXiv:2311.01267 [pdf, other]

UniFolding: Towards Sample-efficient, Scalable, and Generalizable Robotic Garment Folding

Authors: Han Xue, Yutong Li, Wenqiang Xu, Huanyu Li, Dongzhe Zheng, Cewu Lu

Abstract: This paper explores the development of UniFolding, a sample-efficient, scalable, and generalizable robotic system for unfolding and folding various garments. UniFolding employs the proposed UFONet neural network to integrate unfolding and folding decisions into a single policy model that is adaptable to different garment types and states. The design of UniFolding is based on a garment's partial point cloud, which aids in generalization and reduces sensitivity to variations in texture and shape. The training pipeline prioritizes low-cost, sample-efficient data collection. Training data is collected via a human-centric process with offline and online stages. The offline stage involves human unfolding and folding actions via Virtual Reality, while the online stage utilizes human-in-the-loop learning to fine-tune the model in a real-world setting. The system is tested on two garment types: long-sleeve and short-sleeve shirts. Performance is evaluated on 20 shirts with significant variations in textures, shapes, and materials. More experiments and videos can be found in the supplementary materials and on the website: https://unifolding.robotflow.ai

Submitted 2 November, 2023; originally announced November 2023.

Comments: CoRL 2023

arXiv:2310.17331 [pdf]

A novel solution for seepage problems using physics-informed neural networks

Authors: Tianfu Luo, Yelin Feng, Qingfu Huang, Zongliang Zhang, Mingjiao Yan, Zaihong Yang, Dawei Zheng, Yang Yang

Abstract: A Physics-Informed Neural Network (PINN) provides a distinct advantage by synergizing neural networks' capabilities with the problem's governing physical laws. In this study, we introduce an innovative approach for solving seepage problems by utilizing the PINN, harnessing the capabilities of Deep Neural Networks (DNNs) to approximate hydraulic head distributions in seepage analysis. To effectively train the PINN model, we introduce a comprehensive loss function comprising three components: one for evaluating differential operators, another for assessing boundary conditions, and a third for appraising initial conditions. The validation of the PINN involves solving four benchmark seepage problems. The results unequivocally demonstrate the exceptional accuracy of the PINN in solving seepage problems, surpassing the accuracy of FEM in addressing both steady-state and free-surface seepage problems. Hence, the presented approach highlights the robustness of the PINN and underscores its precision in effectively addressing a spectrum of seepage challenges. This amalgamation enables the derivation of accurate solutions, overcoming limitations inherent in conventional methods such as mesh generation and adaptability to complex geometries.

Submitted 25 November, 2023; v1 submitted 26 October, 2023; originally announced October 2023.
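
The three-part loss described in the abstract above can be written schematically as follows. This is only a sketch: the symbols and the plain unweighted sum are assumptions made for illustration, since the abstract does not state the exact form or any weighting of the terms.

```latex
\mathcal{L}(\theta)
  = \mathcal{L}_{\mathrm{PDE}}(\theta)   % residual of the governing differential operator
  + \mathcal{L}_{\mathrm{BC}}(\theta)    % mismatch on the boundary conditions
  + \mathcal{L}_{\mathrm{IC}}(\theta)    % mismatch on the initial conditions
```

Here $\theta$ denotes the network parameters; each term would typically be a mean squared error over sampled collocation, boundary, or initial points.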

arXiv:2310.15363 [pdf, other]

An Optimal Algorithm for Higher-Order Voronoi Diagrams in the Plane: The Usefulness of Nondeterminism

Authors: Timothy M. Chan, Pingan Cheng, Da Wei Zheng

Abstract: We present the first optimal randomized algorithm for constructing the order-$k$ Voronoi diagram of $n$ points in two dimensions. The expected running time is $O(n\log n + nk)$, which improves the previous, two-decades-old result of Ramos (SoCG'99) by a $2^{O(\log^*k)}$ factor. To obtain our result, we (i) use a recent decision-tree technique of Chan and Zheng (SODA'22) in combination with Ramos's cutting construction, to reduce the problem to verifying an order-$k$ Voronoi diagram, and (ii) solve the verification problem by a new divide-and-conquer algorithm using planar-graph separators. We also describe a deterministic algorithm for constructing the $k$-level of $n$ lines in two dimensions in $O(n\log n + nk^{1/3})$ time, and constructing the $k$-level of $n$ planes in three dimensions in $O(n\log n + nk^{3/2})$ time. These time bounds (ignoring the $n\log n$ term) match the current best upper bounds on the combinatorial complexity of the $k$-level. Previously, the same time bound in two dimensions was obtained by Chan (1999) but with randomization.

Submitted 23 October, 2023; originally announced October 2023.

Comments: To appear in SODA 2024. 16 pages, 1 figure

arXiv:2310.11873 [pdf, ps, other]

The Weight Hierarchies of Linear Codes from Simplicial Complexes

Authors: Chao Liu, Dabin Zheng, Wei Lu, Xiaoqiang Wang

Abstract: The study of the generalized Hamming weight of linear codes is a significant research topic in coding theory as it conveys the structural information of the codes and determines their performance in various applications. However, determining the generalized Hamming weights of linear codes, especially the weight hierarchy, is generally challenging. In this paper, we investigate the generalized Hamming weights of a class of linear codes $\mathcal{C}$ over $\mathbb{F}_q$, which are constructed from defining sets. These defining sets are either special simplicial complexes or their complements in $\mathbb{F}_q^m$. We determine the complete weight hierarchies of these codes by analyzing the maximum or minimum intersection of certain simplicial complexes and all $r$-dimensional subspaces of $\mathbb{F}_q^m$, where $1\leq r\leq {\rm dim}_{\mathbb{F}_q}(\mathcal{C})$.

Submitted 18 October, 2023; originally announced October 2023.

arXiv:2310.06328 [pdf, other]

Antenna Response Consistency Driven Self-supervised Learning for WIFI-based Human Activity Recognition

Authors: Ke Xu, Jiangtao Wang, Hongyuan Zhu, Dingchang Zheng

Abstract: Self-supervised learning (SSL) for WiFi-based human activity recognition (HAR) holds great promise due to its ability to address the challenge of insufficient labeled data. However, directly transplanting SSL algorithms, especially contrastive learning, originally designed for other domains to CSI data often fails to achieve the expected performance. We attribute this issue to inappropriate alignment criteria, which disrupt the semantic distance consistency between the feature space and the input space. To address this challenge, we introduce Antenna Response Consistency (ARC) as a solution to define proper alignment criteria. ARC is designed to retain semantic information from the input space while introducing robustness to real-world noise. Moreover, we substantiate the effectiveness of ARC through a comprehensive set of experiments, demonstrating its capability to enhance the performance of self-supervised learning for WiFi-based HAR by achieving an increase of over 5% in accuracy in most cases and achieving a best accuracy of 94.97%.

Submitted 28 November, 2023; v1 submitted 10 October, 2023; originally announced October 2023.

arXiv:2309.15668 [pdf, other]

A New Centralized Multi-Node Repair Scheme of MSR codes with Error-Correcting Capability

Authors: Shenghua Li, Maximilien Gadouleau, Jiaojiao Wang, Dabin Zheng

Abstract: Minimum storage regenerating (MSR) codes, with the MDS property and the optimal repair bandwidth, are widely used in distributed storage systems (DSS) for data recovery. In this paper, we consider the construction of $(n,k,l)$ MSR codes in the centralized model that can repair $h$ failed nodes simultaneously with $e$ out of $d$ helper nodes providing erroneous information. We first propose a new repair scheme and give a complete proof of the lower bound on the number of symbols downloaded from the helper nodes, provided that some of the helper nodes provide erroneous information. Then we focus on two explicit constructions with the proposed repair scheme. For $2\leq h\leq n-k$, $k+2e\leq d \leq n-h$ and $d\equiv k+2e \pmod{h}$, the first one has the UER $(h, d)$-optimal repair property, and the second one has the UER $(h, d)$-optimal access property. Compared with the original constructions (Ye and Barg, IEEE Trans. Inf. Theory, Vol. 63, April 2017), our constructions have improvements in three aspects: 1) The proposed repair scheme is more feasible than the one-by-one scheme presented by Ye and Barg in a parallel data system; 2) The sub-packetization is reduced from $\left(\operatorname{lcm}(d-k+1, d-k+2,\cdots, d-k+h)\right)^n$ to $\left((d-2e-k+h)/h\right)^n$, a reduction by a factor of at least $(h(d-k+h))^n$; 3) The field size of the first construction is reduced to $|\mathbb{F}| \geq n(d-2e-k+h)/h$, a reduction by a factor of at least $h(d-k+h)$. Small sub-packetization and small field size are preferred in practice due to the limited storage capacity and low computation complexity in the process of encoding, decoding and repairing.

Submitted 27 September, 2023; originally announced September 2023.

arXiv:2309.14954 [pdf, other]

Addressing preferred orientation in single-particle cryo-EM through AI-generated auxiliary particles

Authors: Hui Zhang, Dihan Zheng, Qiurong Wu, Nieng Yan, Zuoqiang Shi, Mingxu Hu, Chenglong Bao

Abstract: The single-particle cryo-EM field faces the persistent challenge of preferred orientation, lacking general computational solutions. We introduce cryoPROS, an AI-based approach designed to address this issue. By generating auxiliary particles with a conditional deep generative model, cryoPROS addresses the intrinsic bias in orientation estimation for the observed particles. We effectively employed cryoPROS in the cryo-EM single particle analysis of the hemagglutinin trimer, showing the ability to restore the near-atomic resolution structure on non-tilted data. Moreover, the enhanced version named cryoPROS-MP significantly improves the resolution of the membrane protein NaX using non-tilted data that contains the effects of micelles. Compared to the classical approaches, cryoPROS does not need special experimental or image acquisition techniques, providing a purely computational yet effective solution for the preferred orientation problem. Finally, we conduct extensive experiments that establish the low risk of model bias and the high robustness of cryoPROS.

Submitted 26 September, 2023; originally announced September 2023.

arXiv:2309.12645 [pdf, other]

KuaiSim: A Comprehensive Simulator for Recommender Systems

Authors: Kesen Zhao, Shuchang Liu, Qingpeng Cai, Xiangyu Zhao, Ziru Liu, Dong Zheng, Peng Jiang, Kun Gai

Abstract: Reinforcement Learning (RL)-based recommender systems (RSs) have garnered considerable attention due to their ability to learn optimal recommendation policies and maximize long-term user rewards. However, deploying RL models directly in online environments and generating authentic data through A/B tests can pose challenges and require substantial resources. Simulators offer an alternative approach by providing training and evaluation environments for RS models, reducing reliance on real-world data. Existing simulators have shown promising results but also have limitations such as simplified user feedback, lack of consistency with real-world data, the challenge of simulator evaluation, and difficulties in migration and expansion across RSs. To address these challenges, we propose KuaiSim, a comprehensive user environment that provides user feedback with multi-behavior and cross-session responses. The resulting simulator can support three levels of recommendation problems: the request level list-wise recommendation task, the whole-session level sequential recommendation task, and the cross-session level retention optimization task. For each task, KuaiSim also provides evaluation protocols and baseline recommendation algorithms that further serve as benchmarks for future research. We also restructure existing competitive simulators on the KuaiRand Dataset and compare them against KuaiSim to further assess their performance and behavioral differences. Furthermore, to showcase KuaiSim's flexibility in accommodating different datasets, we demonstrate its versatility and robustness when deploying it on the ML-1m dataset.

Submitted 19 October, 2023; v1 submitted 22 September, 2023; originally announced September 2023.

arXiv:2309.10722 [pdf, other]

LEA*: An A* Variant Algorithm with Improved Edge Efficiency for Robot Motion Planning

Authors: Dongliang Zheng, Panagiotis Tsiotras

Abstract: In this work, we introduce a new graph search algorithm, lazy edge-based A* (LEA*), for robot motion planning. By using an edge queue and exploiting the idea of lazy search, LEA* is optimally vertex efficient, similar to A*, and has improved edge efficiency compared to A*. LEA* is simple and easy to implement with minimal modification to A*, resulting in a very small overhead compared to previous lazy search algorithms. We also explore the effect of inflated heuristics, which results in the weighted LEA* (wLEA*). We show that the edge efficiency of wLEA* becomes close to that of LazySP and thus is near-optimal. We test LEA* and wLEA* on 2D planning problems and planning of a 7-DOF manipulator. We perform a thorough comparison with previous algorithms by considering sparse, medium, and cluttered random worlds and small, medium, and large graph sizes. Our results show that LEA* and wLEA* are the fastest algorithms to find the plan compared to previous algorithms.

Submitted 19 September, 2023; originally announced September 2023.

arXiv:2309.08420 [pdf, other]

doi 10.1137/1.9781611978032.62

FedDCSR: Federated Cross-domain Sequential Recommendation via Disentangled Representation Learning

Authors: Hongyu Zhang, Dongyi Zheng, Xu Yang, Jiyuan Feng, Qing Liao

Abstract: Cross-domain Sequential Recommendation (CSR), which leverages user sequence data from multiple domains, has received extensive attention in recent years. However, the existing CSR methods require sharing original user data across domains, which violates the General Data Protection Regulation (GDPR). Thus, it is necessary to combine federated learning (FL) and CSR to fully utilize knowledge from different domains while preserving data privacy. Nonetheless, the sequence feature heterogeneity across different domains significantly impacts the overall performance of FL. In this paper, we propose FedDCSR, a novel federated cross-domain sequential recommendation framework via disentangled representation learning. Specifically, to address the sequence feature heterogeneity across domains, we introduce an approach called inter-intra domain sequence representation disentanglement (SRD) to disentangle the user sequence features into domain-shared and domain-exclusive features. In addition, we design an intra domain contrastive infomax (CIM) strategy to learn richer domain-exclusive features of users by performing data augmentation on user sequences. Extensive experiments on three real-world scenarios demonstrate that FedDCSR achieves significant improvements over existing baselines.

Submitted 16 January, 2024; v1 submitted 15 September, 2023; originally announced September 2023.

arXiv:2309.04068 [pdf, ps, other]

Two classes of reducible cyclic codes with large minimum symbol-pair distances

Authors: Xiaoqiang Wang, Yue Su, Dabin Zheng, Wei Lu

Abstract: High-density data storage technology aims to provide high-capacity storage at a relatively low cost. To achieve this goal, symbol-pair codes were proposed by Cassuto and Blaum to handle channels that output pairs of overlapping symbols. Such a channel is called a symbol-pair read channel, which introduces the new concepts of symbol-pair weight and minimum symbol-pair distance. In this paper, we consider the parameters of two classes of reducible cyclic codes under the symbol-pair metric. Based on the theory of cyclotomic numbers and Gaussian periods over finite fields, we show the possible symbol-pair weights of these codes. Their minimum symbol-pair distances are twice the minimum Hamming distances under some conditions. Moreover, we obtain some three symbol-pair weight codes and determine their symbol-pair weight distribution. A class of MDS symbol-pair codes is also established. Among other results, we determine the values of some generalized cyclotomic numbers.

Submitted 7 September, 2023; originally announced September 2023.

arXiv:2308.15989 [pdf, other]

DiffuVolume: Diffusion Model for Volume based Stereo Matching

Authors: Dian Zheng, Xiao-Ming Wu, Zuhao Liu, Jingke Meng, Wei-shi Zheng

Abstract: Stereo matching is a significant part of many computer vision tasks and driving-based applications. Recently, cost volume-based methods have achieved great success, benefiting from the rich geometry information in paired images. However, the redundancy of the cost volume also interferes with model training and limits performance. To construct a more precise cost volume, we pioneer the application of the diffusion model to stereo matching. Our method, termed DiffuVolume, considers the diffusion model as a cost volume filter, which recurrently removes the redundant information from the cost volume. Two main designs make our method non-trivial. Firstly, to make the diffusion model more adaptive to stereo matching, we eschew the traditional manner of directly adding noise to the image and instead embed the diffusion model into a task-specific module. In this way, we outperform the traditional diffusion stereo matching method with a 22% EPE improvement and a 240-fold inference acceleration. Secondly, DiffuVolume can be easily embedded into any volume-based stereo matching network, boosting performance with only a slight increase in parameters (only 2%). By adding DiffuVolume to well-performing methods, we outperform all the published methods on the Scene Flow, KITTI2012, and KITTI2015 benchmarks and in the zero-shot generalization setting. It is worth mentioning that the proposed model has ranked 1st on the KITTI 2012 leaderboard and 2nd on the KITTI 2015 leaderboard since 15 July 2023.

Submitted 30 August, 2023; originally announced August 2023.

Comments: 17 pages, 11 figures

arXiv:2308.11159 [pdf, other]

SwinV2DNet: Pyramid and Self-Supervision Compounded Feature Learning for Remote Sensing Images Change Detection

Authors: Dalong Zheng, Zebin Wu, Jia Liu, Zhihui Wei

Abstract: Among the current mainstream change detection networks, the transformer is deficient in the ability to capture accurate low-level details, while the convolutional neural network (CNN) is wanting in the capacity to understand global information and establish remote spatial relationships. Meanwhile, neither of the widely used early fusion and late fusion frameworks is able to learn complete change features well. Therefore, based on swin transformer V2 (Swin V2) and VGG16, we propose an end-to-end compounded dense network, SwinV2DNet, to inherit the advantages of both transformer and CNN and overcome the shortcomings of existing networks in feature learning. Firstly, it captures the change relationship features through the densely connected Swin V2 backbone, and provides the low-level pre-changed and post-changed features through a CNN branch. Based on these three change features, we accomplish accurate change detection results. Secondly, combining transformer and CNN, we propose a mixed feature pyramid (MFP), which provides inter-layer interaction information and intra-layer multi-scale information for complete feature learning. MFP is a plug-and-play module which is experimentally proven to be effective in other change detection networks as well. Furthermore, we impose a self-supervision strategy to guide a new CNN branch, which solves the untrainable problem of the CNN branch and provides the semantic change information for the features of the encoder. State-of-the-art (SOTA) change detection scores and fine-grained change maps were obtained compared with other advanced methods on four commonly used public remote sensing datasets. The code is available at https://github.com/DalongZ/SwinV2DNet.

Submitted 21 August, 2023; originally announced August 2023.

arXiv:2308.06689 [pdf, other]

Estimator Meets Equilibrium Perspective: A Rectified Straight Through Estimator for Binary Neural Networks Training

Authors: Xiao-Ming Wu, Dian Zheng, Zuhao Liu, Wei-Shi Zheng

Abstract: Binarization of neural networks is a dominant paradigm in neural network compression. The pioneering work BinaryConnect uses the Straight Through Estimator (STE) to mimic the gradients of the sign function, but it also causes the crucial inconsistency problem. Most of the previous methods design different estimators instead of STE to mitigate it. However, they ignore the fact that when reducing the estimating error, the gradient stability will decrease concomitantly. These highly divergent gradients will harm the model training and increase the risk of gradient vanishing and gradient exploding. To fully take the gradient stability into consideration, we present a new perspective on BNN training, regarding it as the equilibrium between the estimating error and the gradient stability. In this view, we first design two indicators to quantitatively demonstrate the equilibrium phenomenon. In addition, in order to balance the estimating error and the gradient stability well, we revise the original straight through estimator and propose a power function based estimator, the Rectified Straight Through Estimator (ReSTE for short). Compared to other estimators, ReSTE is rational and capable of flexibly balancing the estimating error with the gradient stability. Extensive experiments on CIFAR-10 and ImageNet datasets show that ReSTE has excellent performance and surpasses the state-of-the-art methods without any auxiliary modules or losses.

Submitted 25 August, 2023; v1 submitted 13 August, 2023; originally announced August 2023.

Comments: 10 pages, 6 figures. Accepted in ICCV 2023
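
For readers unfamiliar with the straight-through estimator mentioned in the abstract above, the following is a minimal sketch of the commonly used clipped STE, not of the paper's ReSTE. The function names, the clipping threshold of 1.0, and the convention that 0 binarizes to +1 are all assumptions made for illustration.

```python
# Illustrative sketch of a clipped straight-through estimator (STE).
# All names and the clip threshold are hypothetical, not from the paper.

def binarize(x):
    """Forward pass: map a real value to +1.0 or -1.0 (0 maps to +1.0 here)."""
    return 1.0 if x >= 0 else -1.0


def ste_backward(x, grad_out, clip=1.0):
    """Backward pass: the true derivative of sign(x) is zero almost everywhere,
    so the STE pretends it is 1 inside [-clip, clip] and 0 outside."""
    return grad_out if abs(x) <= clip else 0.0


weights = [-1.7, -0.3, 0.0, 0.4, 2.5]
binary = [binarize(w) for w in weights]
grads = [ste_backward(w, 1.0) for w in weights]
```

The forward values are exact signs while the backward values are only a surrogate; that mismatch is the estimating error the abstract refers to.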

arXiv:2308.04813 [pdf, other]

CLEVA: Chinese Language Models EVAluation Platform

Authors: Yanyang Li, Jianqiao Zhao, Duo Zheng, Zi-Yuan Hu, Zhi Chen, Xiaohui Su, Yongfeng Huang, Shijia Huang, Dahua Lin, Michael R. Lyu, Liwei Wang

Abstract: With the continuous emergence of Chinese Large Language Models (LLMs), how to evaluate a model's capabilities has become an increasingly significant issue. The absence of a comprehensive Chinese benchmark that thoroughly assesses a model's performance, the unstandardized and incomparable prompting procedure, and the prevalent risk of contamination pose major challenges in the current evaluation of Chinese LLMs. We present CLEVA, a user-friendly platform crafted to holistically evaluate Chinese LLMs. Our platform employs a standardized workflow to assess LLMs' performance across various dimensions, regularly updating a competitive leaderboard. To alleviate contamination, CLEVA curates a significant proportion of new data and develops a sampling strategy that guarantees a unique subset for each leaderboard round. Empowered by an easy-to-use interface that requires just a few mouse clicks and a model API, users can conduct a thorough evaluation with minimal coding. Large-scale experiments featuring 23 Chinese LLMs have validated CLEVA's efficacy.

Submitted 16 October, 2023; v1 submitted 9 August, 2023; originally announced August 2023.

Comments: EMNLP 2023 System Demonstrations camera-ready

arXiv:2308.02412 [pdf, other]

Self-Supervised Learning for WiFi CSI-Based Human Activity Recognition: A Systematic Study

Authors: Ke Xu, Jiangtao Wang, Hongyuan Zhu, Dingchang Zheng

Abstract: Recently, with the advancement of the Internet of Things (IoT), WiFi CSI-based HAR has gained increasing attention from academic and industry communities. By integrating deep learning technology with CSI-based HAR, researchers achieve state-of-the-art performance without the need for expert knowledge. However, the scarcity of labeled CSI data remains the most prominent challenge when applying deep learning models in the context of CSI-based HAR due to the privacy and incomprehensibility of CSI-based HAR data. On the other hand, SSL has emerged as a promising approach for learning meaningful representations from data without heavy reliance on labeled examples. Therefore, considerable efforts have been made to address the challenge of insufficient data in deep learning by leveraging SSL algorithms. In this paper, we undertake a comprehensive inventory and analysis of the potential held by different categories of SSL algorithms, including those that have been previously studied and those that have not yet been explored, within the field. We provide an in-depth investigation of SSL algorithms in the context of WiFi CSI-based HAR. We evaluate four categories of SSL algorithms using three publicly available CSI HAR datasets, each encompassing different tasks and environmental settings. To ensure relevance to real-world applications, we design performance metrics that align with specific requirements. Furthermore, our experimental findings uncover several limitations and blind spots in existing work, highlighting the barriers that need to be addressed before SSL can be effectively deployed in real-world WiFi-based HAR applications. Our results also serve as a practical guideline for industry practitioners and provide valuable insights for future research endeavors in this field.

Submitted 19 July, 2023; originally announced August 2023.

arXiv:2308.00890 [pdf, other]

Tango: rethinking quantization for graph neural network training on GPUs

Authors: Shiyang Chen, Da Zheng, Caiwen Ding, Chengying Huan, Yuede Ji, Hang Liu

Abstract: Graph Neural Networks (GNNs) are becoming increasingly popular due to their superior performance in critical graph-related tasks. While quantization is widely used to accelerate GNN computation, quantized training faces unprecedented challenges. Current quantized GNN training systems often have longer training times than their full-precision counterparts for two reasons: (i) addressing the accuracy challenge leads to excessive overhead, and (ii) the optimization potential exposed by quantization is not adequately leveraged. This paper introduces Tango which re-thinks quantization challenges and opportunities for graph neural network training on GPUs with three contributions: Firstly, we introduce efficient rules to maintain accuracy during quantized GNN training. Secondly, we design and implement quantization-aware primitives and inter-primitive optimizations that can speed up GNN training. Finally, we integrate Tango with the popular Deep Graph Library (DGL) system and demonstrate its superior performance over state-of-the-art approaches on various GNN models and datasets.

Submitted 31 August, 2023; v1 submitted 1 August, 2023; originally announced August 2023.

arXiv:2307.12612 [pdf, other]

Less is More: Focus Attention for Efficient DETR

Authors: Dehua Zheng, Wenhui Dong, Hailin Hu, Xinghao Chen, Yunhe Wang

Abstract: DETR-like models have significantly boosted the performance of detectors and even outperformed classical convolutional models. However, all tokens are treated equally without discrimination brings a redundant computational burden in the traditional encoder structure. The recent sparsification strategies exploit a subset of informative tokens to reduce attention complexity maintaining performance through the sparse encoder. But these methods tend to rely on unreliable model statistics. Moreover, simply reducing the token population hinders the detection performance to a large extent, limiting the application of these sparse models. We propose Focus-DETR, which focuses attention on more informative tokens for a better trade-off between computation efficiency and model accuracy. Specifically, we reconstruct the encoder with dual attention, which includes a token scoring mechanism that considers both localization and category semantic information of the objects from multi-scale feature maps. We efficiently abandon the background queries and enhance the semantic interaction of the fine-grained object queries based on the scores. Compared with the state-of-the-art sparse DETR-like detectors under the same setting, our Focus-DETR gets comparable complexity while achieving 50.4AP (+2.2) on COCO. The code is available at https://github.com/huawei-noah/noah-research/tree/master/Focus-DETR and https://gitee.com/mindspore/models/tree/master/research/cv/Focus-DETR.

Submitted 24 July, 2023; originally announced July 2023.

Comments: 8 pages, 6 figures, accepted to ICCV2023

Showing 1–50 of 199 results for author: Zheng, D