PAS: Data-Efficient Plug-and-Play Prompt Augmentation System

Authors: Miao Zheng, Hao Liang, Fan Yang, Haoze Sun, Tianpeng Li, Lingchu Xiong, Yan Zhang, Youzhen Wu, Kun Li, Yanjun Shen, Mingan Lin, Tao Zhang, Guosheng Dong, Yujing Qiao, Kun Fang, Weipeng Chen, Bin Cui, Wentao Zhang, Zenan Zhou

Abstract: In recent years, the rise of Large Language Models (LLMs) has spurred a growing demand for plug-and-play AI systems. Among the various AI techniques, prompt engineering stands out as particularly significant. However, users often face challenges in writing prompts due to the steep learning curve and significant time investment, and existing automatic prompt engineering (APE) models can be difficult to use. To address this issue, we propose PAS, an LLM-based plug-and-play APE system. PAS utilizes LLMs trained on high-quality, automatically generated prompt complementary datasets, resulting in exceptional performance. In comprehensive benchmarks, PAS achieves state-of-the-art (SoTA) results compared to previous APE models, with an average improvement of 6.09 points. Moreover, PAS is highly efficient, achieving SoTA performance with only 9000 data points. Additionally, PAS can autonomously generate prompt augmentation data without requiring additional human labor. Its flexibility also allows it to be compatible with all existing LLMs and applicable to a wide range of tasks. PAS excels in human evaluations, underscoring its suitability as a plug-in for users. This combination of high performance, efficiency, and flexibility makes PAS a valuable system for enhancing the usability and effectiveness of LLMs through improved prompt engineering.

Submitted 12 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.03104 [pdf, other]

KeyVideoLLM: Towards Large-scale Video Keyframe Selection

Authors: Hao Liang, Jiapeng Li, Tianyi Bai, Chong Chen, Conghui He, Bin Cui, Wentao Zhang

Abstract: Recently, with the rise of web videos, managing and understanding large-scale video datasets has become increasingly important. Video Large Language Models (VideoLLMs) have emerged in recent years due to their strong video understanding capabilities. However, training and inference processes for VideoLLMs demand vast amounts of data, presenting significant challenges to data management, particularly regarding efficiency, robustness, and effectiveness. In this work, we present KeyVideoLLM, a text-video frame similarity-based keyframe selection method designed to manage VideoLLM data efficiently, robustly, and effectively. Specifically, KeyVideoLLM achieves a remarkable data compression rate of up to 60.9 times, substantially lowering disk space requirements, which proves its high efficiency. Additionally, it maintains a 100% selection success rate across all video formats and scales, enhances processing speed by up to 200 times compared to existing keyframe selection methods, and does not require hyperparameter tuning. Beyond its outstanding efficiency and robustness, KeyVideoLLM further improves model performance in video question-answering tasks during both training and inference stages. Notably, it consistently achieved the state-of-the-art (SoTA) experimental results on diverse datasets.

Submitted 3 July, 2024; originally announced July 2024.
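
The KeyVideoLLM abstract above describes keyframe selection driven by text-video frame similarity. A minimal sketch of that general idea follows, assuming the frames and the text have already been embedded into a shared vector space; the function names, the cosine score, and the top-k rule are illustrative assumptions and are not taken from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_keyframes(frame_embeddings, text_embedding, k):
    """Return the indices of the k frames most similar to the text, in temporal order.

    Hypothetical helper: the real method's scoring and selection rule may differ.
    """
    ranked = sorted(
        range(len(frame_embeddings)),
        key=lambda i: cosine(frame_embeddings[i], text_embedding),
        reverse=True,
    )
    return sorted(ranked[:k])


# Toy 2-D embeddings standing in for real frame and text features.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
text = [0.0, 1.0]
print(select_keyframes(frames, text, 2))
```

Keeping only the selected indices, rather than every frame, is what would yield the storage savings the abstract refers to; the actual compression rate depends on the paper's own pipeline.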

arXiv:2407.02398 [pdf, other]

Consistency Flow Matching: Defining Straight Flows with Velocity Consistency

Authors: Ling Yang, Zixiang Zhang, Zhilong Zhang, Xingchao Liu, Minkai Xu, Wentao Zhang, Chenlin Meng, Stefano Ermon, Bin Cui

Abstract: Flow matching (FM) is a general framework for defining probability paths via Ordinary Differential Equations (ODEs) to transform between noise and data samples. Recent approaches attempt to straighten these flow trajectories to generate high-quality samples with fewer function evaluations, typically through iterative rectification methods or optimal transport solutions. In this paper, we introduce Consistency Flow Matching (Consistency-FM), a novel FM method that explicitly enforces self-consistency in the velocity field. Consistency-FM directly defines straight flows starting from different times to the same endpoint, imposing constraints on their velocity values. Additionally, we propose a multi-segment training approach for Consistency-FM to enhance expressiveness, achieving a better trade-off between sampling quality and speed. Preliminary experiments demonstrate that our Consistency-FM significantly improves training efficiency by converging 4.4x faster than consistency models and 1.7x faster than rectified flow models while achieving better generation quality. Our code is available at: https://github.com/YangLing0818/consistency_flow_matching

Submitted 2 July, 2024; originally announced July 2024.

Comments: Code: https://github.com/YangLing0818/consistency_flow_matching
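
The Consistency-FM abstract above speaks of straight flows that start at different times, reach the same endpoint, and have constrained velocity values. The sketch below is one plausible scalar reading of that statement, not the paper's loss: it assumes time runs from 0 to 1, that a straight flow from (t, x_t) ends at x_t + (1 - t) * v, and that mismatches are penalised by squared differences with a hypothetical weight `alpha`.

```python
def endpoint(x_t, t, velocity):
    """Endpoint reached at time 1 by moving in a straight line from (t, x_t)."""
    return x_t + (1.0 - t) * velocity


def consistency_gap(x_t, t, v_t, x_s, s, v_s, alpha=1.0):
    """Squared endpoint mismatch plus alpha times squared velocity mismatch.

    Illustrative penalty only; the weighting and exact form are assumptions.
    """
    end_gap = endpoint(x_t, t, v_t) - endpoint(x_s, s, v_s)
    vel_gap = v_t - v_s
    return end_gap ** 2 + alpha * vel_gap ** 2


# A perfectly straight path from x0 = 0 to x1 = 2 has constant velocity 2.
x0, x1 = 0.0, 2.0
v = x1 - x0
t, s = 0.25, 0.75
x_t = x0 + t * v
x_s = x0 + s * v

gap_straight = consistency_gap(x_t, t, v, x_s, s, v)   # consistent velocities
gap_bent = consistency_gap(x_t, t, v, x_s, s, 1.0)     # velocity disagrees at time s
print(gap_straight, gap_bent)
```

On the straight path both points predict the same endpoint with the same velocity, so the gap vanishes; perturbing one velocity makes both terms positive.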

arXiv:2407.01937 [pdf, other]

Efficient-Empathy: Towards Efficient and Effective Selection of Empathy Data

Authors: Linzhuang Sun, Hao Liang, Jingxuan Wei, Linkun Sun, Bihui Yu, Bin Cui, Wentao Zhang

Abstract: In recent years, with the rapid advancements in large language models (LLMs), achieving excellent empathetic response capability has become a crucial prerequisite. Consequently, managing and understanding large-scale video datasets has gained increasing importance. However, empathetic data are typically trained without any quality selection, leading to inefficient data usage and wasted computational resources. Additionally, using raw data can result in low performance in empathetic dialogues. In this work, we present Efficient-Empathy, a sensibility and rationality score-based data selection algorithm that automatically selects sensibility and rationality data while discarding low-quality data. With only the sensibility data (59% of the full dataset), our trained sensibility model efficiently achieves state-of-the-art (SoTA) performance. Furthermore, with multiple data selection hyperparameters, the sensibility model demonstrates SoTA performance, showcasing the robustness of our method. By integrating sensibility and rationality data with a MoE structure, we achieve even higher performance, demonstrating the effectiveness of our Efficient-Empathy algorithm.

Submitted 9 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

arXiv:2406.13200 [pdf, other]

RobGC: Towards Robust Graph Condensation

Authors: Xinyi Gao, Hongzhi Yin, Tong Chen, Guanhua Ye, Wentao Zhang, Bin Cui

Abstract: Graph neural networks (GNNs) have attracted widespread attention for their impressive capability of graph representation learning. However, the increasing prevalence of large-scale graphs presents a significant challenge for GNN training due to their computational demands, limiting the applicability of GNNs in various scenarios. In response to this challenge, graph condensation (GC) is proposed as a promising acceleration solution, focusing on generating an informative compact graph that enables efficient training of GNNs while retaining performance. Despite the potential to accelerate GNN training, existing GC methods overlook the quality of large training graphs during both the training and inference stages. They indiscriminately emulate the training graph distributions, making the condensed graphs susceptible to noises within the training graph and significantly impeding the application of GC in intricate real-world scenarios. To address this issue, we propose robust graph condensation (RobGC), a plug-and-play approach for GC to extend the robustness and applicability of condensed graphs in noisy graph structure environments. Specifically, RobGC leverages the condensed graph as a feedback signal to guide the denoising process on the original training graph. A label propagation-based alternating optimization strategy is in place for the condensation and denoising processes, contributing to the mutual purification of the condensed graph and training graph. Additionally, as a GC method designed for inductive graph inference, RobGC facilitates test-time graph denoising by leveraging the noise-free condensed graph to calibrate the structure of the test graph. Extensive experiments show that RobGC is compatible with various GC methods, significantly boosting their robustness under different types and levels of graph structural noises.

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.13048 [pdf, other]

Head Pose Estimation and 3D Neural Surface Reconstruction via Monocular Camera in situ for Navigation and Safe Insertion into Natural Openings

Authors: Ruijie Tang, Beilei Cui, Hongliang Ren

Abstract: As the significance of simulation in medical care and intervention continues to grow, it is anticipated that a simplified and low-cost platform can be set up to execute personalized diagnoses and treatments. 3D Slicer can not only perform medical image analysis and visualization but can also provide surgical navigation and surgical planning functions. In this paper, we have chosen 3D Slicer as our base platform, and monocular cameras are used as sensors. Then, we used the neural radiance fields (NeRF) algorithm to complete the 3D model reconstruction of the human head. We compared the accuracy of the NeRF algorithm in generating 3D human head scenes and utilized the Marching Cubes algorithm to generate corresponding 3D mesh models. The individual's head pose, obtained through single-camera vision, is transmitted in real-time to the scene created within 3D Slicer. The demonstrations presented in this paper include real-time synchronization of transformations between the human head model in the 3D Slicer scene and the detected head posture. Additionally, we tested a scene where a tool, marked with an ArUco marker tracked by a single camera, synchronously points to the real-time transformation of the head posture. These demos indicate that our methodology can provide a feasible real-time simulation platform for nasopharyngeal swab collection or intubation.

Submitted 18 June, 2024; originally announced June 2024.

Comments: Accepted by ICBIR 2024

arXiv:2406.04277 [pdf, other]

VideoTetris: Towards Compositional Text-to-Video Generation

Authors: Ye Tian, Ling Yang, Haotian Yang, Yuan Gao, Yufan Deng, Jingmin Chen, Xintao Wang, Zhaochen Yu, Xin Tao, Pengfei Wan, Di Zhang, Bin Cui

Abstract: Diffusion models have demonstrated great success in text-to-video (T2V) generation. However, existing methods may face challenges when handling complex (long) video generation scenarios that involve multiple objects or dynamic changes in object numbers. To address these limitations, we propose VideoTetris, a novel framework that enables compositional T2V generation. Specifically, we propose spatio-temporal compositional diffusion to precisely follow complex textual semantics by manipulating and composing the attention maps of denoising networks spatially and temporally. Moreover, we propose an enhanced video data preprocessing to enhance the training data regarding motion dynamics and prompt understanding, equipped with a new reference frame attention mechanism to improve the consistency of auto-regressive video generation. Extensive experiments demonstrate that our VideoTetris achieves impressive qualitative and quantitative results in compositional T2V generation. Code is available at: https://github.com/YangLing0818/VideoTetris

Submitted 6 June, 2024; originally announced June 2024.

Comments: Code: https://github.com/YangLing0818/VideoTetris

arXiv:2406.04271 [pdf, other]

Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models

Authors: Ling Yang, Zhaochen Yu, Tianjun Zhang, Shiyi Cao, Minkai Xu, Wentao Zhang, Joseph E. Gonzalez, Bin Cui

Abstract: We introduce Buffer of Thoughts (BoT), a novel and versatile thought-augmented reasoning approach for enhancing accuracy, efficiency and robustness of large language models (LLMs). Specifically, we propose meta-buffer to store a series of informative high-level thoughts, namely thought-template, distilled from the problem-solving processes across various tasks. Then for each problem, we retrieve a relevant thought-template and adaptively instantiate it with specific reasoning structures to conduct efficient reasoning. To guarantee the scalability and stability, we further propose buffer-manager to dynamically update the meta-buffer, thus enhancing the capacity of meta-buffer as more tasks are solved. We conduct extensive experiments on 10 challenging reasoning-intensive tasks, and achieve significant performance improvements over previous SOTA methods: 11% on Game of 24, 20% on Geometric Shapes and 51% on Checkmate-in-One. Further analysis demonstrates the superior generalization ability and model robustness of our BoT, while requiring only 12% of the cost of multi-query prompting methods (e.g., tree/graph of thoughts) on average. Notably, we find that our Llama3-8B+BoT has the potential to surpass Llama3-70B model. Our project is available at: https://github.com/YangLing0818/buffer-of-thought-llm

Submitted 6 June, 2024; originally announced June 2024.

Comments: Project: https://github.com/YangLing0818/buffer-of-thought-llm

arXiv:2406.02419 [pdf, ps, other]

Quasi-two-body decays $B\to P f_0(500)\to Pπ^+π^-$ in the perturbative QCD approach

Authors: Jia-Wei Zhang, Bo-Yan Cui, Xing-Gang Wu, Hai-Bing Fu, Ya-Hui Chen

Abstract: In this paper, we study the quasi-two-body decays $B\to P f_0(500)\to Pπ^+π^-$ (with $P=(π, K, η, η^{\prime})$) within the framework of the perturbative QCD (PQCD) factorization approach. With the help of the $π$-$π$ distribution amplitude and the scalar form factor $F_{ππ}(ω^2)$, we calculate the CP averaged branching fraction and the CP asymmetry for the quasi-two-body decays $B\to P f_0(500)\to Pπ^+π^-$. Taking the quasi-two-body decay $B^+ \to π^+ f_0(500) \to π^+ π^+ π^-$ as an explicit example, we present the behaviour of the differential branching fraction and direct CP violation versus the $π$-$π$ invariant mass. The total branching fraction and direct CP violation are $\mathcal{B}(B^+\to π^+ [σ\to]π^+π^-) = (1.78 \pm 0.41\pm 0.51) \times 10^{-6}$ and $\mathcal{A}_{\rm CP}(B^+\to π^+ [σ\to]π^+π^-) = (29.8\pm 11.1\pm 13.0)\%$ respectively. Our results could be tested by further experiments.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 15 pages, 3 figures, comments welcome

arXiv:2405.16640 [pdf, other]

A Survey of Multimodal Large Language Model from A Data-centric Perspective

Authors: Tianyi Bai, Hao Liang, Binwang Wan, Ling Yang, Bozhou Li, Yifan Wang, Bin Cui, Conghui He, Binhang Yuan, Wentao Zhang

Abstract: Human beings perceive the world through diverse senses such as sight, smell, hearing, and touch. Similarly, multimodal large language models (MLLMs) enhance the capabilities of traditional large language models by integrating and processing data from multiple modalities including text, vision, audio, video, and 3D environments. Data plays a pivotal role in the development and refinement of these models. In this survey, we comprehensively review the literature on MLLMs from a data-centric perspective. Specifically, we explore methods for preparing multimodal data during the pretraining and adaptation phases of MLLMs. Additionally, we analyze the evaluation methods for datasets and review benchmarks for evaluating MLLMs. Our survey also outlines potential future research directions. This work aims to provide researchers with a detailed understanding of the data-driven aspects of MLLMs, fostering further exploration and innovation in this field.

Submitted 26 May, 2024; originally announced May 2024.

arXiv:2405.15193 [pdf, other]

CuckooGraph: A Scalable and Space-Time Efficient Data Structure for Large-Scale Dynamic Graphs

Authors: Zhuochen Fan, Yalun Cai, Zirui Liu, Jiarui Guo, Xin Fan, Tong Yang, Bin Cui

Abstract: Graphs play an increasingly important role in various big data applications. However, existing graph data structures cannot simultaneously address the performance bottlenecks caused by the dynamic updates, large scale, and high query complexity of current graphs. This paper proposes a novel data structure for large-scale dynamic graphs called CuckooGraph. It does not need to know the amount of graph data in advance, and can adaptively resize to the most memory-efficient form according to the data scale, realizing multiple graph analytic tasks faster. The key techniques of CuckooGraph include TRANSFORMATION and DENYLIST. TRANSFORMATION fully utilizes the limited memory by designing related data structures that allow flexible space transformations to smoothly expand/tighten the required space depending on the number of incoming items. DENYLIST efficiently handles item insertion failures and further improves processing speed. We conduct extensive experiments, and the results show that CuckooGraph significantly reduces query time by four orders of magnitude on 1-hop successor and precursor queries compared to the state-of-the-art.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.14578 [pdf, other]

Surge Phenomenon in Optimal Learning Rate and Batch Size Scaling

Authors: Shuaipeng Li, Penghao Zhao, Hailin Zhang, Xingwu Sun, Hao Wu, Dian Jiao, Weiyan Wang, Chengjun Liu, Zheng Fang, Jinbao Xue, Yangyu Tao, Bin Cui, Di Wang

Abstract: In current deep learning tasks, Adam style optimizers such as Adam, Adagrad, RMSProp, Adafactor, and Lion have been widely used as alternatives to SGD style optimizers. These optimizers typically update model parameters using the sign of gradients, resulting in more stable convergence curves. The learning rate and the batch size are the most critical hyperparameters for optimizers, which require careful tuning to enable effective convergence. Previous research has shown that the optimal learning rate increases linearly or follows similar rules with batch size for SGD style optimizers. However, this conclusion is not applicable to Adam style optimizers. In this paper, we elucidate the connection between optimal learning rates and batch sizes for Adam style optimizers through both theoretical analysis and extensive experiments. First, we raise the scaling law between batch sizes and optimal learning rates in the sign of gradient case, in which we prove that the optimal learning rate first rises and then falls as the batch size increases. Moreover, the peak value of the surge will gradually move toward the larger batch size as training progresses. Second, we conducted experiments on various CV and NLP tasks and verified the correctness of the scaling law.

Submitted 4 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.08672 [pdf, other]

EndoDAC: Efficient Adapting Foundation Model for Self-Supervised Depth Estimation from Any Endoscopic Camera

Authors: Beilei Cui, Mobarakol Islam, Long Bai, An Wang, Hongliang Ren

Abstract: Depth estimation plays a crucial role in various tasks within endoscopic surgery, including navigation, surface reconstruction, and augmented reality visualization. Despite the significant achievements of foundation models in vision tasks, including depth estimation, their direct application to the medical domain often results in suboptimal performance. This highlights the need for efficient adaptation methods to adapt these models to endoscopic depth estimation. We propose Endoscopic Depth Any Camera (EndoDAC) which is an efficient self-supervised depth estimation framework that adapts foundation models to endoscopic scenes. Specifically, we develop the Dynamic Vector-Based Low-Rank Adaptation (DV-LoRA) and employ Convolutional Neck blocks to tailor the foundational model to the surgical domain, utilizing remarkably few trainable parameters. Given that camera information is not always accessible, we also introduce a self-supervised adaptation strategy that estimates camera intrinsics using the pose encoder. Our framework is capable of being trained solely on monocular surgical videos from any camera, ensuring minimal training costs. Experiments demonstrate that our approach obtains superior performance even with fewer training epochs and unaware of the ground truth camera intrinsics. Code is available at https://github.com/BeileiCui/EndoDAC.

Submitted 14 May, 2024; originally announced May 2024.

Comments: early accepted by MICCAI 2024

arXiv:2405.04114 [pdf, other]

Acceleration Algorithms in GNNs: A Survey

Authors: Lu Ma, Zeang Sheng, Xunkai Li, Xinyi Gao, Zhezheng Hao, Ling Yang, Wentao Zhang, Bin Cui

Abstract: Graph Neural Networks (GNNs) have demonstrated effectiveness in various graph-based tasks. However, their inefficiency in training and inference presents challenges for scaling up to real-world and large-scale graph applications. To address the critical challenges, a range of algorithms have been proposed to accelerate training and inference of GNNs, attracting increasing attention from the research community. In this paper, we present a systematic review of acceleration algorithms in GNNs, which can be categorized into three main topics based on their purpose: training acceleration, inference acceleration, and execution acceleration. Specifically, we summarize and categorize the existing approaches for each main topic, and provide detailed characterizations of the approaches within each category. Additionally, we review several libraries related to acceleration algorithms in GNNs and discuss our Scalable Graph Learning (SGL) library. Finally, we propose promising directions for future research. A complete summary is presented in our GitHub repository: https://github.com/PKU-DAIR/SGL/blob/main/Awsome-GNN-Acceleration.md.

Submitted 7 May, 2024; originally announced May 2024.

Comments: 9 pages,3 figures

arXiv:2405.00263 [pdf, other]

Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge

Authors: Bin Xiao, Chunan Shi, Xiaonan Nie, Fan Yang, Xiangwei Deng, Lei Su, Weipeng Chen, Bin Cui

Abstract: Large language models (LLMs) suffer from low efficiency due to the mismatch between the requirement of auto-regressive decoding and the design of most contemporary GPUs. Specifically, billions to trillions of parameters must be loaded to the GPU cache through its limited memory bandwidth for computation, but only a small batch of tokens is actually computed. Consequently, the GPU spends most of its time on memory transfer instead of computation. Recently, parallel decoding, a type of speculative decoding algorithm, is becoming more popular and has demonstrated impressive efficiency improvement in generation. It introduces extra decoding heads to large models, enabling them to predict multiple subsequent tokens simultaneously and verify these candidate continuations in a single decoding step. However, this approach deviates from the training objective of next token prediction used during pre-training, resulting in a low hit rate for candidate tokens. In this paper, we propose a new speculative decoding algorithm, Clover, which integrates sequential knowledge into the parallel decoding process. This enhancement improves the hit rate of speculators and thus boosts the overall efficiency. Clover transmits the sequential knowledge from pre-speculated tokens via the Regressive Connection, then employs an Attention Decoder to integrate these speculated tokens. Additionally, Clover incorporates an Augmenting Block that modifies the hidden states to better align with the purpose of speculative generation rather than next token prediction. The experiment results demonstrate that Clover outperforms the baseline by up to 91% on Baichuan-Small and 146% on Baichuan-Large, respectively, and exceeds the performance of the previously top-performing method, Medusa, by up to 37% on Baichuan-Small and 57% on Baichuan-Large, respectively.

Submitted 30 April, 2024; originally announced May 2024.
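
The Clover abstract above mentions verifying speculated tokens in a single decoding step and the resulting hit rate. The sketch below shows a generic greedy verification rule for a single candidate sequence, which is a common simplification and not Clover's own procedure: it assumes the target model has already produced its preferred token at every draft position plus one extra position, and all names are hypothetical.

```python
def accept_speculated(draft, target_choices):
    """Keep the longest draft prefix the target model agrees with, then add one target token.

    target_choices[i] is the token the target model itself would emit after the
    prefix followed by draft[:i], so it must hold len(draft) + 1 entries.
    """
    if len(target_choices) != len(draft) + 1:
        raise ValueError("target_choices must have one more entry than draft")
    accepted = []
    for i, token in enumerate(draft):
        if token != target_choices[i]:
            break
        accepted.append(token)
    # The target's token at the first unverified position is always safe to emit.
    accepted.append(target_choices[len(accepted)])
    return accepted


def hit_rate(draft, target_choices):
    """Fraction of draft tokens that survived verification."""
    kept = len(accept_speculated(draft, target_choices)) - 1
    return kept / len(draft) if draft else 0.0


print(accept_speculated([5, 7, 9], [5, 7, 2, 4]))
print(hit_rate([5, 7, 9], [5, 7, 2, 4]))
```

Each verification step emits at least one token, and a higher hit rate means more tokens per step, which is the efficiency lever the abstract describes.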

arXiv:2404.17360 [pdf, other]

UniRGB-IR: A Unified Framework for Visible-Infrared Downstream Tasks via Adapter Tuning

Authors: Maoxun Yuan, Bo Cui, Tianyi Zhao, Xingxing Wei

Abstract: Semantic analysis on visible (RGB) and infrared (IR) images has gained attention for its ability to be more accurate and robust under low-illumination and complex weather conditions. Due to the lack of pre-trained foundation models on the large-scale infrared image datasets, existing methods prefer to design task-specific frameworks and directly fine-tune them with pre-trained foundation models on their RGB-IR semantic relevance datasets, which results in poor scalability and limited generalization. In this work, we propose a scalable and efficient framework called UniRGB-IR to unify RGB-IR downstream tasks, in which a novel adapter is developed to efficiently introduce richer RGB-IR features into the pre-trained RGB-based foundation model. Specifically, our framework consists of a vision transformer (ViT) foundation model, a Multi-modal Feature Pool (MFP) module and a Supplementary Feature Injector (SFI) module. The MFP and SFI modules cooperate with each other as an adapter to effectively complement the ViT features with the contextual multi-scale features. During the training process, we freeze the entire foundation model to inherit prior knowledge and only optimize the MFP and SFI modules. Furthermore, to verify the effectiveness of our framework, we utilize the ViT-Base as the pre-trained foundation model to perform extensive experiments. Experimental results on various RGB-IR downstream tasks demonstrate that our method can achieve state-of-the-art performance. The source code and results are available at https://github.com/PoTsui99/UniRGB-IR.git.

Submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.05648 [pdf, other]

Resistive Memory-based Neural Differential Equation Solver for Score-based Diffusion Model

Authors: Jichang Yang, Hegan Chen, Jia Chen, Songqi Wang, Shaocong Wang, Yifei Yu, Xi Chen, Bo Wang, Xinyuan Zhang, Binbin Cui, Yi Li, Ning Lin, Meng Xu, Yi Li, Xiaoxin Xu, Xiaojuan Qi, Zhongrui Wang, Xumeng Zhang, Dashan Shang, Han Wang, Qi Liu, Kwang-Ting Cheng, Ming Liu

Abstract: Human brains image complicated scenes when reading a novel. Replicating this imagination is one of the ultimate goals of AI-Generated Content (AIGC). However, current AIGC methods, such as score-based diffusion, are still deficient in terms of rapidity and efficiency. This deficiency is rooted in the difference between the brain and digital computers. Digital computers have physically separated storage and processing units, resulting in frequent data transfers during iterative calculations, incurring large time and energy overheads. This issue is further intensified by the conversion of inherently continuous and analog generation dynamics, which can be formulated by neural differential equations, into discrete and digital operations. Inspired by the brain, we propose a time-continuous and analog in-memory neural differential equation solver for score-based diffusion, employing emerging resistive memory. The integration of storage and computation within resistive memory synapses surmounts the von Neumann bottleneck, benefiting the generative speed and energy efficiency. The closed-loop feedback integrator is time-continuous, analog, and compact, physically implementing an infinite-depth neural network. Moreover, the software-hardware co-design is intrinsically robust to analog noise. We experimentally validate our solution with 180 nm resistive memory in-memory computing macros. Demonstrating equivalent generative quality to the software baseline, our system achieved remarkable enhancements in generative speed for both unconditional and conditional generation tasks, by factors of 64.8 and 156.5, respectively. Moreover, it accomplished reductions in energy consumption by factors of 5.2 and 4.1. Our approach heralds a new horizon for hardware solutions in edge computing for generative AI applications.

Submitted 8 April, 2024; originally announced April 2024.

arXiv:2403.07331 [pdf, other]

LIST: Learning to Index Spatio-Textual Data for Embedding based Spatial Keyword Queries

Authors: Ziqi Yin, Shanshan Feng, Shang Liu, Gao Cong, Yew Soon Ong, Bin Cui

Abstract: With the proliferation of spatio-textual data, Top-k KNN spatial keyword queries (TkQs), which return a list of objects based on a ranking function that evaluates both spatial and textual relevance, have found many real-life applications. Existing geo-textual indexes for TkQs use traditional retrieval models like BM25 to compute text relevance and usually exploit a simple linear function to compute spatial relevance, but their effectiveness is limited. To improve effectiveness, several deep learning models have recently been proposed, but they suffer severe efficiency issues. To the best of our knowledge, there are no efficient indexes specifically designed to accelerate the top-k search process for these deep learning models. To tackle these issues, we propose a novel technique, which Learns to Index the Spatio-Textual data for answering embedding based spatial keyword queries (called LIST). LIST features two novel components. Firstly, we propose a lightweight and effective relevance model that is capable of learning both textual and spatial relevance. Secondly, we introduce a novel machine learning based Approximate Nearest Neighbor Search (ANNS) index, which utilizes a new learning-to-cluster technique to group relevant queries and objects together while separating irrelevant queries and objects. Two key challenges in building an effective and efficient index are the absence of high-quality labels and unbalanced clustering results. We develop a novel pseudo-label generation technique to address the two challenges. Experimental results show that LIST significantly outperforms state-of-the-art methods on effectiveness, with improvements up to 19.21% and 12.79% in terms of NDCG@1 and Recall@10, and is three orders of magnitude faster than the most effective baseline.

Submitted 18 March, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

arXiv:2402.19473 [pdf, other]

Retrieval-Augmented Generation for AI-Generated Content: A Survey

Authors: Penghao Zhao, Hailin Zhang, Qinhan Yu, Zhengren Wang, Yunteng Geng, Fangcheng Fu, Ling Yang, Wentao Zhang, Jie Jiang, Bin Cui

Abstract: Advancements in model algorithms, the growth of foundational models, and access to high-quality datasets have propelled the evolution of Artificial Intelligence Generated Content (AIGC). Despite its notable successes, AIGC still faces hurdles such as updating knowledge, handling long-tail data, mitigating data leakage, and managing high training and inference costs. Retrieval-Augmented Generation (RAG) has recently emerged as a paradigm to address such challenges. In particular, RAG introduces the information retrieval process, which enhances the generation process by retrieving relevant objects from available data stores, leading to higher accuracy and better robustness. In this paper, we comprehensively review existing efforts that integrate RAG techniques into AIGC scenarios. We first classify RAG foundations according to how the retriever augments the generator, distilling the fundamental abstractions of the augmentation methodologies for various retrievers and generators. This unified perspective encompasses all RAG scenarios, illuminating advancements and pivotal technologies that help with potential future progress. We also summarize additional enhancement methods for RAG, facilitating effective engineering and implementation of RAG systems. Then, from another view, we survey practical applications of RAG across different modalities and tasks, offering valuable references for researchers and practitioners. Furthermore, we introduce the benchmarks for RAG, discuss the limitations of current RAG systems, and suggest potential directions for future research. Github: https://github.com/PKU-DAIR/RAG-Survey.

Submitted 21 June, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

Comments: Citing 353 papers, 22 pages, 1 table, 12 figures. Project: https://github.com/PKU-DAIR/RAG-Survey

arXiv:2402.17563 [pdf, other]

Structure-Guided Adversarial Training of Diffusion Models

Authors: Ling Yang, Haotian Qian, Zhilong Zhang, Jingwei Liu, Bin Cui

Abstract: Diffusion models have demonstrated exceptional efficacy in various generative applications. While existing models focus on minimizing a weighted sum of denoising score matching losses for data distribution modeling, their training primarily emphasizes instance-level optimization, overlooking valuable structural information within each mini-batch, indicative of pair-wise relationships among samples. To address this limitation, we introduce Structure-guided Adversarial training of Diffusion Models (SADM). In this pioneering approach, we compel the model to learn manifold structures between samples in each training batch. To ensure the model captures authentic manifold structures in the data distribution, we advocate adversarial training of the diffusion generator against a novel structure discriminator in a minimax game, distinguishing real manifold structures from the generated ones. SADM substantially improves existing diffusion transformers (DiT) and outperforms existing methods in image generation and cross-domain fine-tuning tasks across 12 datasets, establishing a new state-of-the-art FID of 1.58 and 2.11 on ImageNet for class-conditional image generation at resolutions of 256x256 and 512x512, respectively.

Submitted 4 March, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

Comments: Accepted by CVPR 2024

arXiv:2402.16627 [pdf, other]

Contextualized Diffusion Models for Text-Guided Image and Video Generation

Authors: Ling Yang, Zhilong Zhang, Zhaochen Yu, Jingwei Liu, Minkai Xu, Stefano Ermon, Bin Cui

Abstract: Conditional diffusion models have exhibited superior performance in high-fidelity text-guided visual generation and editing. Nevertheless, prevailing text-guided visual diffusion models primarily focus on incorporating text-visual relationships exclusively into the reverse process, often disregarding their relevance in the forward process. This inconsistency between forward and reverse processes may limit the precise conveyance of textual semantics in visual synthesis results. To address this issue, we propose a novel and general contextualized diffusion model (ContextDiff) by incorporating the cross-modal context encompassing interactions and alignments between text condition and visual sample into forward and reverse processes. We propagate this context to all timesteps in the two processes to adapt their trajectories, thereby facilitating cross-modal conditional modeling. We generalize our contextualized diffusion to both DDPMs and DDIMs with theoretical derivations, and demonstrate the effectiveness of our model in evaluations with two challenging tasks: text-to-image generation, and text-to-video editing. In each task, our ContextDiff achieves new state-of-the-art performance, significantly enhancing the semantic alignment between text condition and generated samples, as evidenced by quantitative and qualitative evaluations. Our code is available at https://github.com/YangLing0818/ContextDiff

Submitted 3 June, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

Comments: ICLR 2024. Project: https://github.com/YangLing0818/ContextDiff

arXiv:2402.12908 [pdf, other]

RealCompo: Balancing Realism and Compositionality Improves Text-to-Image Diffusion Models

Authors: Xinchen Zhang, Ling Yang, Yaqi Cai, Zhaochen Yu, Kai-Ni Wang, Jiake Xie, Ye Tian, Minkai Xu, Yong Tang, Yujiu Yang, Bin Cui

Abstract: Diffusion models have achieved remarkable advancements in text-to-image generation. However, existing models still have many difficulties when faced with multiple-object compositional generation. In this paper, we propose RealCompo, a new training-free and transfer-friendly text-to-image generation framework, which aims to leverage the respective advantages of text-to-image models and spatial-aware image diffusion models (e.g., layout, keypoints and segmentation maps) to enhance both realism and compositionality of the generated images. An intuitive and novel balancer is proposed to dynamically balance the strengths of the two models in the denoising process, allowing plug-and-play use of any model without extra training. Extensive experiments show that our RealCompo consistently outperforms state-of-the-art text-to-image models and spatial-aware image diffusion models in multiple-object compositional generation while keeping satisfactory realism and compositionality of the generated images. Notably, our RealCompo can be seamlessly extended with a wide range of spatial-aware image diffusion models and stylized diffusion models. Our code is available at: https://github.com/YangLing0818/RealCompo

Submitted 24 May, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

Comments: Project: https://github.com/YangLing0818/RealCompo

arXiv:2402.08519 [pdf, other]

Hyperballistic transport in dense ionized matter under external AC electric fields

Authors: Daniele Gamba, Bingyu Cui, Alessio Zaccone

Abstract: The Langevin equation is ubiquitously employed to numerically simulate plasmas and dusty plasmas. However, the usual assumption of white noise becomes untenable when the system is subject to an external AC electric field. This is because the charged particles in the plasma, which provide the thermal bath for the particle transport, become themselves responsive to the AC field and the thermal noise is field-dependent and non-Markovian. We theoretically study the particle diffusivity in a Langevin transport model for a tagged charged particle immersed in a dense plasma of charged particles that act as the thermal bath, under an external AC electric field, by properly accounting for the effects of the AC field on the thermal bath statistics. We analytically derive the time-dependent generalized diffusivity $D(t)$ for different initial conditions. The generalized diffusivity exhibits damped oscillatory-like behaviour with very large initial peaks, where the generalized diffusion coefficient is enhanced by orders of magnitude with respect to the infinite-time steady-state value. The latter coincides with the Stokes-Einstein diffusivity in the absence of external field. For initial conditions where the external field is already on at $t=0$ and the system is thermalized under DC conditions for $t \leq 0$, the short-time behaviour is hyperballistic, $MSD \sim t^4$ (where MSD is the mean-squared displacement), leading to giant enhancement of the particle transport. Finally, the theory elucidates the role of medium polarization on the local Lorentz field, and allows for estimates of the effective electric charge due to polarization by the surrounding charges.

Submitted 20 February, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

arXiv:2401.16416 [pdf, other]

Endo-4DGS: Endoscopic Monocular Scene Reconstruction with 4D Gaussian Splatting

Authors: Yiming Huang, Beilei Cui, Long Bai, Ziqi Guo, Mengya Xu, Mobarakol Islam, Hongliang Ren

Abstract: In the realm of robot-assisted minimally invasive surgery, dynamic scene reconstruction can significantly enhance downstream tasks and improve surgical outcomes. Neural Radiance Fields (NeRF)-based methods have recently risen to prominence for their exceptional ability to reconstruct scenes but are hampered by slow inference speed, prolonged training, and inconsistent depth estimation. Some previous work utilizes ground-truth depth for optimization, but such depth is hard to acquire in the surgical domain. To overcome these obstacles, we present Endo-4DGS, a real-time endoscopic dynamic reconstruction approach that utilizes 3D Gaussian Splatting (GS) for 3D representation. Specifically, we propose lightweight MLPs to capture temporal dynamics with Gaussian deformation fields. To obtain a satisfactory Gaussian initialization, we exploit a powerful depth estimation foundation model, Depth-Anything, to generate pseudo-depth maps as a geometry prior. We additionally propose confidence-guided learning to tackle the ill-posed problems in monocular depth estimation and enhance the depth-guided reconstruction with surface normal constraints and depth regularization. Our approach has been validated on two surgical datasets, where it can effectively render in real-time, compute efficiently, and reconstruct with remarkable accuracy.

Submitted 2 April, 2024; v1 submitted 29 January, 2024; originally announced January 2024.

arXiv:2401.14044 [pdf]

Electrical switching of the perpendicular Neel order in a collinear antiferromagnet

Authors: Wenqing He, Tianyi Zhang, Yongjian Zhou, Caihua Wan, Hao Wu, Baoshan Cui, Jihao Xia, Ran Zhang, Tengyu Guo, Peng Chen, Mingkun Zhao, Leina Jiang, Alexander Grutter, Purnima P. Balakrishnan, Andrew J. Caruana, Christy J. Kinane, Sean Langridge, Guoqiang Yu, Cheng Song, Xiufeng Han

Abstract: Electrical manipulation of magnetic order by current-induced spin torques lays the foundation for spintronics. One promising approach is encoding information in the Néel vector of antiferromagnetic (AFM) materials, particularly in collinear antiferromagnets with perpendicular magnetic anisotropy (PMA), as the negligible stray fields and terahertz spin dynamics can enable memory devices with higher integration density and ultrafast speed. Here we demonstrate that the Néel order information in a prototypical collinear AFM insulator with PMA, Cr2O3, can be reliably read out via the anomalous Hall effect and efficiently switched by the spin-orbit torque (SOT) effect with a low current density of $5.8 \times 10^{6}$ A/cm$^{2}$. Moreover, using Cr2O3 as a mediator, we electrically switch the magnetization of a Y3Fe5O12 film exchange-coupled to the Cr2O3 layer, unambiguously confirming the Néel order switching of the Cr2O3 layer. This work provides a significant basis for developing AFM memory devices based on collinear AFM materials with PMA.

Submitted 25 January, 2024; originally announced January 2024.

arXiv:2401.11708 [pdf, other]

Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs

Authors: Ling Yang, Zhaochen Yu, Chenlin Meng, Minkai Xu, Stefano Ermon, Bin Cui

Abstract: Diffusion models have exhibited exceptional performance in text-to-image generation and editing. However, existing methods often face challenges when handling complex text prompts that involve multiple objects with multiple attributes and relationships. In this paper, we propose a brand new training-free text-to-image generation/editing framework, namely Recaption, Plan and Generate (RPG), harnessing the powerful chain-of-thought reasoning ability of multimodal LLMs to enhance the compositionality of text-to-image diffusion models. Our approach employs the MLLM as a global planner to decompose the process of generating complex images into multiple simpler generation tasks within subregions. We propose complementary regional diffusion to enable region-wise compositional generation. Furthermore, we integrate text-guided image generation and editing within the proposed RPG in a closed-loop fashion, thereby enhancing generalization ability. Extensive experiments demonstrate our RPG outperforms state-of-the-art text-to-image diffusion models, including DALL-E 3 and SDXL, particularly in multi-category object composition and text-image semantic alignment. Notably, our RPG framework exhibits wide compatibility with various MLLM architectures (e.g., MiniGPT-4) and diffusion backbones (e.g., ControlNet). Our code is available at: https://github.com/YangLing0818/RPG-DiffusionMaster

Submitted 5 May, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

Comments: ICML 2024. Project: https://github.com/YangLing0818/RPG-DiffusionMaster

arXiv:2401.06013 [pdf, other]

Surgical-DINO: Adapter Learning of Foundation Models for Depth Estimation in Endoscopic Surgery

Authors: Beilei Cui, Mobarakol Islam, Long Bai, Hongliang Ren

Abstract: Purpose: Depth estimation in robotic surgery is vital in 3D reconstruction, surgical navigation and augmented reality visualization. Although the foundation model exhibits outstanding performance in many vision tasks, including depth estimation (e.g., DINOv2), recent works observed its limitations in medical and surgical domain-specific applications. This work presents a low-rank adaptation (LoRA) of the foundation model for surgical depth estimation. Methods: We design a foundation model-based depth estimation method, referred to as Surgical-DINO, a low-rank adaptation of the DINOv2 for depth estimation in endoscopic surgery. We build LoRA layers and integrate them into DINO to adapt with surgery-specific domain knowledge instead of conventional fine-tuning. During training, we freeze the DINO image encoder, which shows excellent visual representation capacity, and only optimize the LoRA layers and depth decoder to integrate features from the surgical scene. Results: Our model is extensively validated on a MICCAI challenge dataset of SCARED, which is collected from da Vinci Xi endoscope surgery. We empirically show that Surgical-DINO significantly outperforms all the state-of-the-art models in endoscopic depth estimation tasks. The analysis with ablation studies has shown evidence of the remarkable effect of our LoRA layers and adaptation. Conclusion: Surgical-DINO sheds some light on the successful adaptation of the foundation models into the surgical domain for depth estimation. There is clear evidence in the results that zero-shot prediction on pre-trained weights in computer vision datasets or naive fine-tuning is not sufficient to use the foundation model in the surgical domain directly. Code is available at https://github.com/BeileiCui/SurgicalDINO.

Submitted 12 January, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

Comments: Accepted by IPCAI 2024 (IJCAR Special Issue)

arXiv:2401.02015 [pdf, other]

Improving Diffusion-Based Image Synthesis with Context Prediction

Authors: Ling Yang, Jingwei Liu, Shenda Hong, Zhilong Zhang, Zhilin Huang, Zheming Cai, Wentao Zhang, Bin Cui

Abstract: Diffusion models are a new class of generative models, and have dramatically promoted image generation with unprecedented quality and diversity. Existing diffusion models mainly try to reconstruct the input image from a corrupted one with a pixel-wise or feature-wise constraint along spatial axes. However, such point-based reconstruction may fail to make each predicted pixel/feature fully preserve its neighborhood context, impairing diffusion-based image synthesis. As a powerful source of automatic supervisory signal, context has been well studied for learning representations. Inspired by this, we for the first time propose ConPreDiff to improve diffusion-based image synthesis with context prediction. We explicitly reinforce each point to predict its neighborhood context (i.e., multi-stride features/tokens/pixels) with a context decoder at the end of diffusion denoising blocks in the training stage, and remove the decoder for inference. In this way, each point can better reconstruct itself by preserving its semantic connections with neighborhood context. This new paradigm of ConPreDiff can generalize to arbitrary discrete and continuous diffusion backbones without introducing extra parameters in the sampling procedure. Extensive experiments are conducted on unconditional image generation, text-to-image generation and image inpainting tasks. Our ConPreDiff consistently outperforms previous methods and achieves new SOTA text-to-image generation results on MS-COCO, with a zero-shot FID score of 6.21.

Submitted 3 January, 2024; originally announced January 2024.

Comments: Accepted by NeurIPS 2023

arXiv:2312.10864 [pdf, ps, other]

On-Device Recommender Systems: A Tutorial on The New-Generation Recommendation Paradigm

Authors: Hongzhi Yin, Tong Chen, Liang Qu, Bin Cui

Abstract: Given the sheer volume of contemporary e-commerce applications, recommender systems (RSs) have gained significant attention in both academia and industry. However, traditional cloud-based RSs face inevitable challenges, such as resource-intensive computation, reliance on network access, and privacy breaches. In response, a new paradigm called on-device recommender systems (ODRSs) has emerged recently in various industries like Taobao, Google, and Kuaishou. ODRSs unleash the computational capacity of user devices with lightweight recommendation models tailored for resource-constrained environments, enabling real-time inference with users' local data. This tutorial aims to systematically introduce methodologies of ODRSs, including (1) an overview of existing research on ODRSs; (2) a comprehensive taxonomy of ODRSs, where the core technical content to be covered spans three major ODRS research directions, including on-device deployment and inference, on-device training, and privacy/security of ODRSs; (3) limitations and future directions of ODRSs. This tutorial expects to lay the foundation and spark new insights for follow-up research and applications concerning this new recommendation paradigm.

Submitted 17 December, 2023; originally announced December 2023.

Comments: Technical tutorial; to appear at The Web Conference 2024

arXiv:2312.03256 [pdf, other]

CAFE: Towards Compact, Adaptive, and Fast Embedding for Large-scale Recommendation Models

Authors: Hailin Zhang, Zirui Liu, Boxuan Chen, Yikai Zhao, Tong Zhao, Tong Yang, Bin Cui

Abstract: Recently, the growing memory demands of embedding tables in Deep Learning Recommendation Models (DLRMs) pose great challenges for model training and deployment. Existing embedding compression solutions cannot simultaneously meet three key design requirements: memory efficiency, low latency, and adaptability to dynamic data distribution. This paper presents CAFE, a Compact, Adaptive, and Fast Embedding compression framework that addresses the above requirements. The design philosophy of CAFE is to dynamically allocate more memory resources to important features (called hot features), and allocate less memory to unimportant ones. In CAFE, we propose a fast and lightweight sketch data structure, named HotSketch, to capture feature importance and report hot features in real time. For each reported hot feature, we assign it a unique embedding. For the non-hot features, we allow multiple features to share one embedding by using the hash embedding technique. Guided by our design philosophy, we further propose a multi-level hash embedding framework to optimize the embedding tables of non-hot features. We theoretically analyze the accuracy of HotSketch, and analyze the model convergence against deviation. Extensive experiments show that CAFE significantly outperforms existing embedding compression methods, yielding 3.92% and 3.68% superior testing AUC on Criteo Kaggle dataset and CriteoTB dataset at a compression ratio of 10000x. The source code of CAFE is available on GitHub.

Submitted 26 March, 2024; v1 submitted 5 December, 2023; originally announced December 2023.

arXiv:2311.18244 [pdf, other]

Unveiling Vulnerabilities of Contrastive Recommender Systems to Poisoning Attacks

Authors: Zongwei Wang, Junliang Yu, Min Gao, Hongzhi Yin, Bin Cui, Shazia Sadiq

Abstract: Contrastive learning (CL) has recently gained prominence in the domain of recommender systems due to its great ability to enhance recommendation accuracy and improve model robustness. Despite its advantages, this paper identifies a vulnerability of CL-based recommender systems: they are more susceptible to poisoning attacks aiming to promote individual items. Our analysis indicates that this vulnerability is attributed to the uniform spread of representations caused by the InfoNCE loss. Furthermore, theoretical and empirical evidence shows that optimizing this loss favors smooth spectral values of representations. This finding suggests that attackers could facilitate this optimization process of CL by encouraging a more uniform distribution of spectral values, thereby enhancing the degree of representation dispersion. With these insights, we attempt to reveal a potential poisoning attack against CL-based recommender systems, which encompasses a dual-objective framework: one that induces a smoother spectral value distribution to amplify the InfoNCE loss's inherent dispersion effect, named dispersion promotion; and the other that directly elevates the visibility of target items, named rank promotion. We validate the threats of our attack model through extensive experimentation on four datasets. By shedding light on these vulnerabilities, our goal is to advance the development of more robust CL-based recommender systems. The code is available at https://github.com/CoderWZW/ARLib.

Submitted 25 May, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

Comments: 12 pages, 7 figures

arXiv:2311.15578 [pdf, other]

Experimental Analysis of Large-scale Learnable Vector Storage Compression

Authors: Hailin Zhang, Penghao Zhao, Xupeng Miao, Yingxia Shao, Zirui Liu, Tong Yang, Bin Cui

Abstract: The learnable embedding vector is one of the most important applications in machine learning, and is widely used in various database-related domains. However, the high dimensionality of sparse data in recommendation tasks and the huge volume of corpus in retrieval-related tasks lead to a large memory consumption of the embedding table, which poses a great challenge to the training and deployment of models. Recent research has proposed various methods to compress the embeddings at the cost of a slight decrease in model quality or the introduction of other overheads. Nevertheless, the relative performance of these methods remains unclear. Existing experimental comparisons only cover a subset of these methods and focus on limited metrics. In this paper, we perform a comprehensive comparative analysis and experimental evaluation of embedding compression. We introduce a new taxonomy that categorizes these techniques based on their characteristics and methodologies, and further develop a modular benchmarking framework that integrates 14 representative methods. Under a uniform test environment, our benchmark fairly evaluates each approach, presents their strengths and weaknesses under different memory budgets, and recommends the best method based on the use case. In addition to providing useful guidelines, our study also uncovers the limitations of current methods and suggests potential directions for future research.

Submitted 13 February, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

arXiv:2311.15566 [pdf, other]

SpotServe: Serving Generative Large Language Models on Preemptible Instances

Authors: Xupeng Miao, Chunan Shi, Jiangfei Duan, Xiaoli Xi, Dahua Lin, Bin Cui, Zhihao Jia

Abstract: The high computational and memory requirements of generative large language models (LLMs) make it challenging to serve them cheaply. This paper aims to reduce the monetary cost for serving LLMs by leveraging preemptible GPU instances on modern clouds, which offer access to spare GPUs at a much cheaper price than regular instances but may be preempted by the cloud at any time. Serving LLMs on preemptible instances requires addressing challenges induced by frequent instance preemptions and the necessity of migrating instances to handle these preemptions. This paper presents SpotServe, the first distributed LLM serving system on preemptible instances. Several key techniques in SpotServe realize fast and reliable serving of generative LLMs on cheap preemptible instances. First, SpotServe dynamically adapts the LLM parallelization configuration for dynamic instance availability and fluctuating workload, while balancing the trade-off among the overall throughput, inference latency and monetary costs. Second, to minimize the cost of migrating instances for dynamic reparallelization, the task of migrating instances is formulated as a bipartite graph matching problem, which uses the Kuhn-Munkres algorithm to identify an optimal migration plan that minimizes communications. Finally, to take advantage of the grace period offered by modern clouds, we introduce stateful inference recovery, a new inference mechanism that commits inference progress at a much finer granularity and allows SpotServe to cheaply resume inference upon preemption. We evaluate on real spot instance preemption traces and various popular LLMs and show that SpotServe can reduce the P99 tail latency by 2.4-9.1x compared with the best existing LLM serving systems. We also show that SpotServe can leverage the price advantage of preemptible instances, saving 54% monetary cost compared with only using on-demand instances.

Submitted 27 November, 2023; originally announced November 2023.

Comments: ASPLOS 2024

arXiv:2311.15506 [pdf]

Compact Electrochromic Optical Recording of Bioelectric Potentials

Authors: Kenneth Nakasone, Chris Zavik, Erica Liu, Burhan Ahmed, Dana Griffith, Lothar Maisenbacher, Ashwin Singh, Yuecheng Zhou, Bianxiao Cui, Holger Müller

Abstract: Electrochromic optical recording (ECORE) is a label-free method that utilizes electrochromism to optically detect electrical signals in biological cells with a high signal-to-noise ratio and is suitable for long-term recording. However, ECORE usually requires a large and intricate optical setup, making it relatively difficult to transport and to study specimens on a large scale. Here, we present a compact ECORE apparatus that drastically reduces the spatial footprint and complexity of the ECORE setup whilst maintaining high sensitivity. An autobalancing differential photodetector automates common-mode noise rejection, removing the need for manually adjustable optics, and a compact laser module conserves space compared to a typical laser mount. The result is a simple, easy-to-use, and relatively low-cost system that achieves a sensitivity of 16.7 μV (within a factor of 5 of the shot noise limit), and reliably detects action potentials from human-induced pluripotent stem cell (HiPSC)-derived cardiomyocytes. This setup can be further improved to within 1.5 dB of the shot noise limit by filtering out power-line interference.

Submitted 26 November, 2023; originally announced November 2023.

Comments: 11 pages, 4 figures

arXiv:2311.11263 [pdf, ps, other]

Effect of the entropy on the shear viscosity of metallic glasses near the glass transition

Authors: A. S. Makarov, J. B. Cui, J. C. Qiao, G. V. Afonin, N. P. Kobelev, V. A. Khonik

Abstract: We measured the shear viscosity of 14 metallic glasses (MGs) differing in their mixing entropy $ΔS_{mix}$. It is found that the viscosity at the glass transition temperature $T_g$ significantly increases with $ΔS_{mix}$. Using calorimetric data, we calculated the excess entropy of all glasses $ΔS$ with respect to their maternal crystalline states as a function of temperature. It is shown that the excess entropy $ΔS$ both at room temperature and at $T_g$ decreases with $ΔS_{mix}$. It is concluded that glasses with "high mixing entropy" $ΔS_{mix}$ correspond to MGs with low excess entropy $ΔS$. The origin of the increased shear viscosity at $T_g$ of glasses with high $ΔS_{mix}$ is determined by their reduced excess entropy $ΔS$.

Submitted 19 November, 2023; originally announced November 2023.

Comments: 12 pages, 4 Figures

arXiv:2311.07858 [pdf]

doi 10.1038/s41467-024-47133-7

Large-area, freestanding single-crystal gold of single nanometer thickness

Authors: Chenxinyu Pan, Yuanbiao Tong, Haoliang Qian, Alexey V. Krasavin, Jialin Li, Jiajie Zhu, Yiyun Zhang, Bowen Cui, Zhiyong Li, Chenming Wu, Zhenxin Wang, Lufang Liu, Linjun Li, Xin Guo, Anatoly V. Zayats, Limin Tong, Pan Wang

Abstract: Two-dimensional single-crystal metals are highly sought after for next-generation technologies. Here, we report large-area (>10^4 μm^2), single-crystal two-dimensional gold with thicknesses down to a single-nanometer level, employing an atomic-level-precision chemical etching approach. The ultrathin thickness and single-crystal quality endow two-dimensional gold with unique properties including significantly quantum-confinement-augmented optical nonlinearity, low sheet resistance, high transparency and excellent mechanical flexibility. By patterning the two-dimensional gold into nanoribbon arrays, extremely-confined near-infrared plasmonic resonances are further demonstrated with quality factors up to 5. The freestanding nature of two-dimensional gold allows its straightforward manipulation and transfer-printing for integration with other structures. The developed two-dimensional gold provides an emerging platform for fundamental studies in various disciplines and opens up new opportunities for applications in high-performance ultrathin optoelectronic, photonic and quantum devices.

Submitted 13 November, 2023; originally announced November 2023.

Journal ref: Nature Commun. 15 (2024) 2840-2849

arXiv:2311.07164 [pdf, other]

Pruning random resistive memory for optimizing analogue AI

Authors: Yi Li, Songqi Wang, Yaping Zhao, Shaocong Wang, Woyu Zhang, Yangu He, Ning Lin, Binbin Cui, Xi Chen, Shiming Zhang, Hao Jiang, Peng Lin, Xumeng Zhang, Xiaojuan Qi, Zhongrui Wang, Xiaoxin Xu, Dashan Shang, Qi Liu, Kwang-Ting Cheng, Ming Liu

Abstract: The rapid advancement of artificial intelligence (AI) has been marked by the large language models exhibiting human-like intelligence. However, these models also present unprecedented challenges to energy consumption and environmental sustainability. One promising solution is to revisit analogue computing, a technique that predates digital computing and exploits emerging analogue electronic devices, such as resistive memory, which features in-memory computing, high scalability, and nonvolatility. However, analogue computing still faces the same challenges as before: programming nonidealities and expensive programming due to the underlying device physics. Here, we report a universal solution, software-hardware co-design using structural plasticity-inspired edge pruning to optimize the topology of a randomly weighted analogue resistive memory neural network. Software-wise, the topology of a randomly weighted neural network is optimized by pruning connections rather than precisely tuning resistive memory weights. Hardware-wise, we reveal the physical origin of the programming stochasticity using transmission electron microscopy, which is leveraged for large-scale and low-cost implementation of an overparameterized random neural network containing high-performance sub-networks. We implemented the co-design on a 40nm 256K resistive memory macro, observing 17.3% and 19.9% accuracy improvements in image and audio classification on FashionMNIST and Spoken digits datasets, as well as 9.8% (2%) improvement in PR (ROC) in image segmentation on DRIVE datasets, respectively. This is accompanied by 82.1%, 51.2%, and 99.8% improvement in energy efficiency thanks to analogue in-memory computing. By embracing the intrinsic stochasticity and in-memory computing, this work may solve the biggest obstacle of analogue computing systems and thus unleash their immense potential for next-generation AI hardware.

Submitted 13 November, 2023; originally announced November 2023.

arXiv:2310.10998 [pdf, other]

Accelerating Scalable Graph Neural Network Inference with Node-Adaptive Propagation

Authors: Xinyi Gao, Wentao Zhang, Junliang Yu, Yingxia Shao, Quoc Viet Hung Nguyen, Bin Cui, Hongzhi Yin

Abstract: Graph neural networks (GNNs) have exhibited exceptional efficacy in a diverse array of applications. However, the sheer size of large-scale graphs presents a significant challenge to real-time inference with GNNs. Although existing Scalable GNNs leverage linear propagation to preprocess the features and accelerate the training and inference procedure, these methods still suffer from scalability issues when making inferences on unseen nodes, as the feature preprocessing requires the graph to be known and fixed. To further accelerate Scalable GNN inference in this inductive setting, we propose an online propagation framework and two novel node-adaptive propagation methods that can customize the optimal propagation depth for each node based on its topological information and thereby avoid redundant feature propagation. The trade-off between accuracy and latency can be flexibly managed through simple hyper-parameters to accommodate various latency constraints. Moreover, to compensate for the inference accuracy loss caused by the potential early termination of propagation, we further propose Inception Distillation to exploit the multi-scale receptive field information within graphs. The rigorous and comprehensive experimental study on public datasets with varying scales and characteristics demonstrates that the proposed inference acceleration framework outperforms existing state-of-the-art graph inference acceleration methods in terms of accuracy and efficiency. In particular, the superiority of our approach is notable on datasets with larger scales, yielding a 75x inference speedup on the largest Ogbn-products dataset.

Submitted 9 December, 2023; v1 submitted 17 October, 2023; originally announced October 2023.

Comments: 2024 IEEE 40th International Conference on Data Engineering (ICDE). arXiv admin note: substantial text overlap with arXiv:2211.00495

arXiv:2310.05529 [pdf, other]

Distribution System Flexibility Characterization: A Network-Informed Data-Driven Approach

Authors: Qi Li, Jianzhe Liu, Bai Cui, Wenzhan Song, Jin Ye

Abstract: A distribution system can flexibly adjust its substation-level power output by aggregating its local distributed energy resources (DERs). Due to DER and network constraints, characterizing the exact feasible power output region is computationally intensive. Hence, existing results usually rely on impractical assumptions or suffer from conservativeness issues. Sampling-based data-driven methods can potentially address these limitations. Still, existing works usually exhibit computational inefficiency issues as they use a random sampling approach, which carries little information from network physics and provides few insights into the iterative search process. This letter proposes a novel network-informed data-driven method to close this gap. A computationally efficient data sampling approach is developed to obtain high-quality training data, leveraging network information and legacy learning experience. Then, a classifier is trained to estimate the feasible power output region with high accuracy. Numerical studies based on a real-world Southern California Edison network validate the performance of the proposed work.

Submitted 9 October, 2023; originally announced October 2023.

arXiv:2310.01047 [pdf, other]

Charge equilibration of Laser-accelerated Carbon Ions in Foam Target

Authors: Bubo Ma, Jieru Ren, Lirong Liu, Wenqing Wei, Benzheng Chen, Shizheng Zhang, Hao Xu, Zhongmin Hu, Fangfang Li, Xing Wang, Shuai Yin, Jianhua Feng, Xianming Zhou, Yifang Gao, Yuan Li, Xiaohua Shi, Jianxing Li, Xueguang Ren, Zhongfeng Xu, Zhigang Deng, Wei Qi, Shaoyi Wang, Quanping Fan, Bo Cui, Weiwu Wang, et al. (17 additional authors not shown)

Abstract: The charge equilibration of laser-accelerated carbon ion beams in a 2 mg/cm^3 foam target was investigated experimentally. The ions were generated through the target normal sheath acceleration mechanism in a laser-foil interaction scheme. This allows the equilibrium charge state to be obtained over a wide energy range near the Bragg peak within a single shot. By using foam, the charge equilibration measurement in the density regime between gas and solid state was reached experimentally for the first time. It was found that the theoretical predictions with tabulated cross section data for gas targets greatly underestimated the charge states. The experimental data are in close agreement with both the semi-empirical formula and rate equation predictions based on ion-solid interactions. The important role of target density effects, which increase the ionization probability and decrease the electron capture probability through frequent multi-collisions in foam, is demonstrated. The double electron processes are shown to have little influence on the average charge states. The findings are essential for high energy density physics research where foams are widely used, and have impacts on a broad range of applications in medical, biological and material fields. The method also provides a new approach to investigate the interaction mechanism of swift heavy ions in matter by taking advantage of the laser-accelerated short-pulse wide-energy range ions.

Submitted 2 October, 2023; originally announced October 2023.

arXiv:2309.15675 [pdf, other]

SJTU-TMQA: A quality assessment database for static mesh with texture map

Authors: Bingyang Cui, Qi Yang, Kaifa Yang, Yiling Xu, Xiaozhong Xu, Shan Liu

Abstract: In recent years, static meshes with texture maps have become one of the most prevalent digital representations of 3D shapes in various applications, such as animation, gaming, medical imaging, and cultural heritage applications. However, little research has been done on the quality assessment of textured meshes, which hinders the development of quality-oriented applications, such as mesh compression and enhancement. In this paper, we create a large-scale textured mesh quality assessment database, namely SJTU-TMQA, which includes 21 reference meshes and 945 distorted samples. The meshes are rendered into processed video sequences, and subjective experiments are then conducted to obtain mean opinion scores (MOS). The diversity of content and accuracy of MOS have been shown to validate its heterogeneity and reliability. The impact of various types of distortion on human perception is demonstrated. 13 state-of-the-art objective metrics are evaluated on SJTU-TMQA. The results report the highest correlation of around 0.6, indicating the need for more effective objective metrics. The SJTU-TMQA is available at https://ccccby.github.io

Submitted 27 September, 2023; originally announced September 2023.

arXiv:2309.13335 [pdf, other]

Model-enhanced Vector Index

Authors: Hailin Zhang, Yujing Wang, Qi Chen, Ruiheng Chang, Ting Zhang, Ziming Miao, Yingyan Hou, Yang Ding, Xupeng Miao, Haonan Wang, Bochen Pang, Yuefeng Zhan, Hao Sun, Weiwei Deng, Qi Zhang, Fan Yang, Xing Xie, Mao Yang, Bin Cui

Abstract: Embedding-based retrieval methods construct vector indices to search for document representations that are most similar to the query representations. They are widely used in document retrieval due to low latency and decent recall performance. Recent research indicates that deep retrieval solutions offer better model quality, but are hindered by unacceptable serving latency and the inability to support document updates. In this paper, we aim to enhance the vector index with end-to-end deep generative models, leveraging the differentiable advantages of deep retrieval models while maintaining desirable serving efficiency. We propose Model-enhanced Vector Index (MEVI), a differentiable model-enhanced index empowered by a twin-tower representation model. MEVI leverages a Residual Quantization (RQ) codebook to bridge the sequence-to-sequence deep retrieval and embedding-based models. To substantially reduce the inference time, instead of decoding the unique document ids in long sequential steps, we first generate some semantic virtual cluster ids of candidate documents in a small number of steps, and then leverage the well-adapted embedding vectors to further perform a fine-grained search for the relevant documents in the candidate virtual clusters. We empirically show that our model achieves better performance on the commonly used academic benchmarks MSMARCO Passage and Natural Questions, with comparable serving latency to dense retrieval solutions.

Submitted 9 November, 2023; v1 submitted 23 September, 2023; originally announced September 2023.

arXiv:2309.13169 [pdf, other]

Cloudy Forecast: How Predictable is Communication Latency in the Cloud?

Authors: Owen Hilyard, Bocheng Cui, Marielle Webster, Abishek Bangalore Muralikrishna, Aleksey Charapko

Abstract: Many systems and services rely on timing assumptions for performance and availability to perform critical aspects of their operation, such as various timeouts for failure detectors or optimizations to concurrency control mechanisms. Many such assumptions rely on the ability of different components to communicate on time -- a delay in communication may trigger the failure detector or cause the system to enter a less-optimized execution mode. Unfortunately, these timing assumptions are often set with little regard to actual communication guarantees of the underlying infrastructure -- in particular, the variability of communication delays between processes in different nodes/servers. The higher communication variability holds especially true for systems deployed in the public cloud since the cloud is a utility shared by many users and organizations, making it prone to higher performance variance due to noisy neighbor syndrome. In this work, we present Cloud Latency Tester (CLT), a simple tool that can help measure the variability of communication delays between nodes to help engineers set proper values for their timing assumptions. We also provide our observational analysis of running CLT in three major cloud providers and share the lessons we learned.

Submitted 22 September, 2023; originally announced September 2023.

arXiv:2309.12239 [pdf, other]

ContTune: Continuous Tuning by Conservative Bayesian Optimization for Distributed Stream Data Processing Systems

Authors: Jinqing Lian, Xinyi Zhang, Yingxia Shao, Zenglin Pu, Qingfeng Xiang, Yawen Li, Bin Cui

Abstract: The past decade has seen rapid growth of distributed stream data processing systems. Under these systems, a stream application is realized as a Directed Acyclic Graph (DAG) of operators, where the level of parallelism of each operator has a substantial impact on its overall performance. However, finding optimal levels of parallelism remains challenging. Most existing methods are heavily coupled with the topological graph of operators and are unable to efficiently tune under-provisioned jobs. They either insufficiently use previous tuning experience by treating successive tunings independently, or explore the configuration space aggressively, violating the Service Level Agreements (SLA). To address the above problems, we propose ContTune, a continuous tuning system for stream applications. It is equipped with a novel Big-small algorithm, in which the Big phase decouples the tuning from the topological graph by decomposing the job tuning problem into sub-problems that can be solved concurrently. We propose a conservative Bayesian Optimization (CBO) technique in the Small phase to speed up the tuning process by utilizing the previous observations. It leverages the state-of-the-art (SOTA) tuning method as conservative exploration to avoid SLA violations. Experimental results show that ContTune reduces the number of reconfigurations by up to 60.75% under synthetic workloads and by up to 57.5% under real workloads, compared to the SOTA method DS2.

Submitted 21 September, 2023; originally announced September 2023.

arXiv:2309.01901 [pdf, other]

doi 10.14778/3611540.3611548

Towards General and Efficient Online Tuning for Spark

Authors: Yang Li, Huaijun Jiang, Yu Shen, Yide Fang, Xiaofeng Yang, Danqing Huang, Xinyi Zhang, Wentao Zhang, Ce Zhang, Peng Chen, Bin Cui

Abstract: The distributed data analytics system Spark is a common choice for processing massive volumes of heterogeneous data, but it is challenging to tune its parameters to achieve high performance. Recent studies try to employ auto-tuning techniques to solve this problem but suffer from three issues: limited functionality, high overhead, and inefficient search. In this paper, we present a general and efficient Spark tuning framework that can deal with the three issues simultaneously. First, we introduce a generalized tuning formulation, which can support multiple tuning goals and constraints conveniently, and a Bayesian optimization (BO) based solution to solve this generalized optimization problem. Second, to avoid high overhead from additional offline evaluations in existing methods, we propose to tune parameters along with the actual periodic executions of each job (i.e., online evaluations). To ensure safety during online job executions, we design a safe configuration acquisition method that models the safe region. Finally, three innovative techniques are leveraged to further accelerate the search process: adaptive sub-space generation, approximate gradient descent, and a meta-learning method. We have implemented this framework as an independent cloud service, and applied it to the data platform in Tencent. The empirical results on both public benchmarks and large-scale production tasks demonstrate its superiority in terms of practicality, generality, and efficiency. Notably, this service saves an average of 57.00% memory cost and 34.93% CPU cost on 25K in-production tasks within 20 iterations.

Submitted 4 September, 2023; originally announced September 2023.

Journal ref: Proceedings of the VLDB Endowment 2023

arXiv:2308.16436 [pdf, other]

QCD factorization for the $B\to γ\ellν_{\ell}$ decay beyond leading power

Authors: Bo-Yan Cui, Yue-Long Shen, Chao Wang, Yan-Bing Wei

Abstract: The radiative leptonic $B\to γ\ellν_{\ell}$ decay serves as an ideal platform to determine the $B$-meson inverse moment which is a fundamental nonperturbative parameter for the $B$ meson. In this paper, we explore precise QCD contributions to this decay with an energetic photon. We reproduce the next-to-next-to-leading-logarithmic resummation formula for the decay amplitude at leading power in $Λ_{\rm QCD}/m_b$. Employing operator identities, we calculate subleading-power contributions from the expansion of the hard-collinear propagator of the internal up quark and the heavy-quark expansion of the bottom quark. We update the contributions from the hadronic structure of the photon to the $B\to γ\ellν_{\ell}$ process with the dispersion technique. Together with other yet known power corrections, phenomenological applications including the partial branching fraction and ratio of the branching fractions of the radiative $B$ decay are investigated.

Submitted 30 August, 2023; originally announced August 2023.

Comments: 29 pages, 6 figures

arXiv:2308.10878 [pdf, other]

Proton-Boron Fusion Yield Increased by Orders of Magnitude with Foam Targets

Authors: Wen-Qing Wei, Shi-Zheng Zhang, Zhi-Gang Deng, Wei Qi, Hao Xu, Li-Rong Liu, Jia-Lin Zhang, Fang-Fang Li, Xing Xu, Zhong-Min Hu, Ben-Zheng Chen, Bu-Bo Ma, Jian-Xing Li, Xue-Guang Ren, Zhong-Feng Xu, Dieter H. H. Hoffmann, Quan-Ping Fan, Wei-Wu Wang, Shao-Yi Wang, Jian Teng, Bo Cui, Feng Lu, Lei Yang, Yu-Qiu Gu, Zong-Qing Zhao, et al. (13 additional authors not shown)

Abstract: A novel intense beam-driven scheme for high yield of the tri-alpha reaction 11B(p,α)2α was investigated. We used a foam target made of cellulose triacetate (TAC, C_9H_{16}O_8) doped with boron. It was then heated volumetrically by soft X-ray radiation from a laser-heated hohlraum and turned into a homogeneous, long-lived plasma. We employed a picosecond laser pulse to generate a high-intensity energetic proton beam via the well-known Target Normal Sheath Acceleration (TNSA) mechanism. We observed up to 10^{10}/sr α particles per laser shot. This is presently the highest yield value normalized to the laser energy on target. The measured fusion yield per proton exceeds the classical expectation of beam-target reactions by up to four orders of magnitude under high proton intensities. This enhancement is attributed to the strong electric fields and nonequilibrium thermonuclear fusion reactions as a result of the new method. Our approach shows opportunities to pursue ignition of aneutronic fusion.

Submitted 21 August, 2023; originally announced August 2023.

arXiv:2308.09568 [pdf, other]

PUMGPT: A Large Vision-Language Model for Product Understanding

Authors: Wei Xue, Zongyi Guo, Baoliang Cui, Zheng Xing, Xiaoyi Zeng, Xiufei Wang, Shuhui Wu, Weiming Lu

Abstract: E-commerce platforms benefit from accurate product understanding to enhance user experience and operational efficiency. Traditional methods often focus on isolated tasks such as attribute extraction or categorization, posing adaptability issues to evolving tasks and leading to usability challenges with noisy data from the internet. Current Large Vision Language Models (LVLMs) lack domain-specific fine-tuning, thus falling short in precision and instruction following. To address these issues, we introduce PumGPT, the first e-commerce specialized LVLM designed for multi-modal product understanding tasks. We collected and curated a dataset of over one million products from AliExpress, filtering out non-inferable attributes using a universal hallucination detection framework, resulting in 663k high-quality data samples. PumGPT focuses on five essential tasks aimed at enhancing workflows for e-commerce platforms and retailers. We also introduce PumBench, a benchmark to evaluate product understanding across LVLMs. Our experiments show that PumGPT outperforms five other open-source LVLMs and GPT-4V in product understanding tasks. We also conduct extensive analytical experiments to delve deeply into the superiority of PumGPT, demonstrating the necessity for a specialized model in the e-commerce domain.

Submitted 16 June, 2024; v1 submitted 18 August, 2023; originally announced August 2023.

arXiv:2308.08823 [pdf, other]

Mitigating Semantic Confusion from Hostile Neighborhood for Graph Active Learning

Authors: Tianmeng Yang, Min Zhou, Yujing Wang, Zhengjie Lin, Lujia Pan, Bin Cui, Yunhai Tong

Abstract: Graph Active Learning (GAL), which aims to find the most informative nodes in graphs for annotation to maximize the performance of Graph Neural Networks (GNNs), has attracted many research efforts but still poses non-trivial challenges. One major challenge is that existing GAL strategies may introduce semantic confusion to the selected training set, particularly when graphs are noisy. Specifically, most existing methods assume all aggregating features to be helpful, ignoring the semantically negative effect between inter-class edges under the message-passing mechanism. In this work, we present Semantic-aware Active learning framework for Graphs (SAG) to mitigate the semantic confusion problem. Pairwise similarities and dissimilarities of nodes with semantic features are introduced to jointly evaluate the node influence. A new prototype-based criterion and query policy are also designed to maintain diversity and class balance of the selected nodes, respectively. Extensive experiments on the public benchmark graphs and a real-world financial dataset demonstrate that SAG significantly improves node classification performance and consistently outperforms previous methods. Moreover, comprehensive analysis and an ablation study also verify the effectiveness of the proposed framework.

Submitted 17 August, 2023; originally announced August 2023.

Comments: Accepted by CIKM 2023

arXiv:2308.02117 [pdf, other]

VQGraph: Rethinking Graph Representation Space for Bridging GNNs and MLPs

Authors: Ling Yang, Ye Tian, Minkai Xu, Zhongyi Liu, Shenda Hong, Wei Qu, Wentao Zhang, Bin Cui, Muhan Zhang, Jure Leskovec

Abstract: GNN-to-MLP distillation aims to utilize knowledge distillation (KD) to learn a computationally efficient multi-layer perceptron (student MLP) on graph data by mimicking the output representations of a teacher GNN. Existing methods mainly make the MLP mimic the GNN predictions over a few class labels. However, the class space may not be expressive enough to cover numerous diverse local graph structures, thus limiting the performance of knowledge transfer from GNN to MLP. To address this issue, we propose to learn a new powerful graph representation space by directly labeling nodes' diverse local structures for GNN-to-MLP distillation. Specifically, we propose a variant of VQ-VAE to learn a structure-aware tokenizer on graph data that can encode each node's local substructure as a discrete code. The discrete codes constitute a codebook as a new graph representation space that is able to identify different local graph structures of nodes with the corresponding code indices. Then, based on the learned codebook, we propose a new distillation target, namely soft code assignments, to directly transfer the structural knowledge of each node from GNN to MLP. The resulting framework VQGraph achieves new state-of-the-art performance on GNN-to-MLP distillation in both transductive and inductive settings across seven graph datasets. We show that VQGraph, with better performance, infers 828x faster than GNNs, and also achieves accuracy improvements over GNNs and stand-alone MLPs of 3.90% and 28.05% on average, respectively. Code: https://github.com/YangLing0818/VQGraph.

Submitted 6 March, 2024; v1 submitted 3 August, 2023; originally announced August 2023.

Comments: ICLR 2024. Code: https://github.com/YangLing0818/VQGraph
