-
Dynamic Dimension Wrapping (DDW) Algorithm: A Novel Approach for Efficient Cross-Dimensional Search in Dynamic Multidimensional Spaces
Authors:
Dongnan Jin,
Yali Liu,
Qiuzhi Song,
Xunju Ma,
Yue Liu,
Dehao Wu
Abstract:
In the real world, as the complexity of optimization problems continues to increase, there is an urgent need to research more efficient optimization methods. Current optimization algorithms excel in solving problems with a fixed number of dimensions. However, their efficiency in searching dynamic multi-dimensional spaces is unsatisfactory. In response to the challenge of cross-dimensional search in multi-dimensional spaces with varying numbers of dimensions, this study proposes a new optimization algorithm, the Dynamic Dimension Wrapping (DDW) algorithm. Firstly, by utilizing the Dynamic Time Warping (DTW) algorithm and Euclidean distance, a mapping relationship between different time series across dimensions is established, thus creating a fitness function suitable for dimensionally dynamic multi-dimensional space. Additionally, DDW introduces a novel, more efficient cross-dimensional search mechanism for dynamic multidimensional spaces. Finally, through comparative tests with 31 optimization algorithms in dynamic multidimensional space search, the results demonstrate that DDW exhibits outstanding search efficiency and provides search results closest to the actual optimal solution.
Submitted 16 July, 2024;
originally announced July 2024.
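For readers unfamiliar with the DTW building block named in the abstract above, the following is a minimal sketch of textbook Dynamic Time Warping with an absolute-difference (one-dimensional Euclidean) local cost. It is not the DDW algorithm or its fitness function, which the abstract does not specify; the function name `dtw_distance` and the choice of scalar sequences are assumptions made purely for illustration. It shows how two sequences of different lengths can still be compared.

```python
import math


def dtw_distance(a, b):
    """Textbook dynamic time warping between two 1-D numeric sequences.

    Illustrative only: this is standard DTW, not the DDW method itself.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local cost: Euclidean distance between two scalars.
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

Because the warping path may repeat elements, a sequence and a stretched copy of it align at zero cost, which is the property that makes DTW usable across inputs of differing length.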
-
Fitting an Elephant with Four non-Zero Parameters
Authors:
Dian Jin,
Junze Yuan
Abstract:
In 1953, Enrico Fermi criticized Dyson's model by quoting Johnny von Neumann: "With four parameters I can fit an elephant, and with five I can make him wiggle his trunk." So far, there have been several attempts to fit an elephant using four parameters, but as the problem has not been well-defined, the current methods do not completely satisfy the requirements. This paper defines the problem and presents an attempt.
Submitted 1 July, 2024;
originally announced July 2024.
-
Voluminous Fur Stroking Experience through Interactive Visuo-Haptic Model in Virtual Reality
Authors:
Juro Hosoi,
Du Jin,
Yuki Ban,
Shin'ichi Warisawa
Abstract:
The tactile sensation of stroking soft fur, known for its comfort and emotional benefits, has numerous applications in virtual reality, animal-assisted therapy, and household products. Previous studies have primarily utilized actual fur to present a voluminous fur experience that poses challenges concerning versatility and flexibility. In this study, we develop a system that integrates a head-mounted display with an ultrasound haptic display to provide visual and haptic feedback. Measurements taken using an artificial skin sheet reveal directional differences in tactile and visual responses to voluminous fur. Based on observations and measurements, we propose interactive models that dynamically adjust to hand movements, simulating fur-stroking sensations. Our experiments demonstrate that the proposed model using visual and haptic modalities significantly enhances the realism of a fur-stroking experience. Our findings suggest that the interactive visuo-haptic model offers a promising fur-stroking experience in virtual reality, potentially enhancing the user experience in therapeutic, entertainment, and retail applications.
Submitted 28 June, 2024;
originally announced June 2024.
-
Quantum Electronics on Quantum Liquids and Solids
Authors:
Wei Guo,
Denis Konstantinov,
Dafei Jin
Abstract:
Nonpolar atoms or molecules with light particle mass and weak particle-particle interaction can form quantum liquids and solids (QLS) at low temperatures. Excess electrons can be naturally bound to the surface of a QLS in a vacuum and exhibit unique quantum electronic behaviors in two and lower dimensions. In this article, we review the historical study and recent progress in this area. The main topics covered in this review include the collective and individual electron transport on liquid helium, solid neon, and solid hydrogen, the theoretical proposal and experimental effort toward single electron qubits on superfluid helium, the recent experimental realization of single electron charge qubits on solid neon and the related theoretical calculation. In the end, we review and envision extended exploration of quantum electronics on heterogeneous QLS.
Submitted 22 June, 2024;
originally announced June 2024.
-
A GPU-accelerated Large-scale Simulator for Transportation System Optimization Benchmarking
Authors:
Jun Zhang,
Wenxuan Ao,
Junbo Yan,
Depeng Jin,
Yong Li
Abstract:
With the development of artificial intelligence techniques, transportation system optimization is evolving from traditional methods relying on expert experience to simulation and learning-based decision optimization methods. Learning-based optimization methods require extensive interaction with highly realistic microscopic traffic simulators for optimization. However, existing microscopic traffic simulators are computationally inefficient in large-scale scenarios and therefore significantly reduce the efficiency of the data sampling process of optimization algorithms. In addition, the optimization scenarios supported by existing simulators are limited, mainly focusing on the traffic signal control. To address these challenges and limitations, we propose the first open-source GPU-accelerated large-scale microscopic simulator for transportation system simulation. The simulator is able to iterate at 84.09 Hz, which achieves 88.92 times computational acceleration in the large-scale scenario with more than a million vehicles compared to the best baseline. Based on the simulator, we implement a set of microscopic and macroscopic controllable objects and metrics to support most typical transportation system optimization scenarios. These controllable objects and metrics are all provided by Python API for ease of use. We choose five important and representative transportation system optimization scenarios and benchmark classical rule-based algorithms, reinforcement learning, and black-box optimization in four cities. The codes are available at https://github.com/tsinghua-fib-lab/moss-benchmark with the MIT License.
Submitted 15 June, 2024;
originally announced June 2024.
-
MOSS: A Large-scale Open Microscopic Traffic Simulation System
Authors:
Jun Zhang,
Wenxuan Ao,
Junbo Yan,
Can Rong,
Depeng Jin,
Wei Wu,
Yong Li
Abstract:
In the research of Intelligent Transportation Systems (ITS), traffic simulation is a key procedure for the evaluation of new methods and optimization of strategies. However, existing traffic simulation systems face two challenges. First, how to balance simulation scale with realism is a dilemma. Second, it is hard to simulate realistic results, which requires realistic travel demand data and simulator. These problems limit computer-aided optimization of traffic management strategies for large-scale road networks and reduce the usability of traffic simulations in areas where real-world travel demand data are lacking. To address these problems, we design and implement MObility Simulation System (MOSS). MOSS adopts GPU acceleration to significantly improve the efficiency and scale of microscopic traffic simulation, which enables realistic and fast simulations for large-scale road networks. It provides realistic travel Origin-Destination (OD) matrices generation through a pre-trained generative neural network model based on publicly available data on a global scale, such as satellite imagery, to help researchers build meaningful travel demand data. It also provides a complete open toolchain to help users with road network construction, demand generation, simulation, and result analysis. The whole toolchain including the simulator can be accessed at https://moss.fiblab.net and the codes are open-source for community collaboration.
Submitted 21 May, 2024;
originally announced May 2024.
-
CLAMBER: A Benchmark of Identifying and Clarifying Ambiguous Information Needs in Large Language Models
Authors:
Tong Zhang,
Peixin Qin,
Yang Deng,
Chen Huang,
Wenqiang Lei,
Junhong Liu,
Dingnan Jin,
Hongru Liang,
Tat-Seng Chua
Abstract:
Large language models (LLMs) are increasingly used to meet user information needs, but their effectiveness in dealing with user queries that contain various types of ambiguity remains unknown, ultimately risking user trust and satisfaction. To this end, we introduce CLAMBER, a benchmark for evaluating LLMs using a well-organized taxonomy. Building upon the taxonomy, we construct ~12K high-quality data to assess the strengths, weaknesses, and potential risks of various off-the-shelf LLMs. Our findings indicate the limited practical utility of current LLMs in identifying and clarifying ambiguous user queries, even when enhanced by chain-of-thought (CoT) and few-shot prompting. These techniques may result in overconfidence in LLMs and yield only marginal enhancements in identifying ambiguity. Furthermore, current LLMs fall short in generating high-quality clarifying questions due to a lack of conflict resolution and inaccurate utilization of inherent knowledge. In this paper, CLAMBER offers guidance and promotes further research on proactive and trustworthy LLMs. Our dataset is available at https://github.com/zt991211/CLAMBER
Submitted 1 June, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
-
STYLE: Improving Domain Transferability of Asking Clarification Questions in Large Language Model Powered Conversational Agents
Authors:
Yue Chen,
Chen Huang,
Yang Deng,
Wenqiang Lei,
Dingnan Jin,
Jia Liu,
Tat-Seng Chua
Abstract:
Equipping a conversational search engine with strategies regarding when to ask clarification questions is becoming increasingly important across various domains. Owing to the context-understanding capability of LLMs and their access to domain-specific sources of knowledge, LLM-based clarification strategies feature rapid transfer to various domains in a post-hoc manner. However, they still struggle to deliver promising performance on unseen domains and thus fail to achieve effective domain transferability. We take the first step to investigate this issue and find that existing methods tend to produce one-size-fits-all strategies across diverse domains, limiting their search effectiveness. In response, we introduce a novel method, called Style, to achieve effective domain transferability. Our experimental results indicate that Style bears strong domain transferability, resulting in an average search performance improvement of ~10% on four unseen domains.
Submitted 1 June, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
-
Density functions for the overdamped generalized Langevin equation and its Euler--Maruyama method: smoothness and convergence
Authors:
Xinjie Dai,
Diancong Jin
Abstract:
This paper focuses on studying the convergence rate of the density function of the Euler--Maruyama (EM) method, when applied to the overdamped generalized Langevin equation with fractional noise which serves as an important model in many fields. Firstly, we give an improved upper bound estimate for the total variation distance between random variables by their Malliavin--Sobolev norms. Secondly, we establish the existence and smoothness of the density function for both the exact solution and the numerical one. Based on the above results, the convergence rate of the density function of the numerical solution is obtained, which relies on the regularity of the noise and kernel. This convergence result provides a powerful support for numerically capturing the statistical information of the exact solution through the EM method.
Submitted 19 May, 2024;
originally announced May 2024.
-
OpenGait: A Comprehensive Benchmark Study for Gait Recognition towards Better Practicality
Authors:
Chao Fan,
Saihui Hou,
Junhao Liang,
Chuanfu Shen,
Jingzhe Ma,
Dongyang Jin,
Yongzhen Huang,
Shiqi Yu
Abstract:
Gait recognition, a rapidly advancing vision technology for person identification from a distance, has made significant strides in indoor settings. However, evidence suggests that existing methods often yield unsatisfactory results when applied to newly released real-world gait datasets. Furthermore, conclusions drawn from indoor gait datasets may not easily generalize to outdoor ones. Therefore, the primary goal of this work is to present a comprehensive benchmark study aimed at improving practicality rather than solely focusing on enhancing performance. To this end, we first develop OpenGait, a flexible and efficient gait recognition platform. Using OpenGait as a foundation, we conduct in-depth ablation experiments to revisit recent developments in gait recognition. Surprisingly, we detect some imperfect parts of certain prior methods thereby resulting in several critical yet undiscovered insights. Inspired by these findings, we develop three structurally simple yet empirically powerful and practically robust baseline models, i.e., DeepGaitV2, SkeletonGait, and SkeletonGait++, respectively representing the appearance-based, model-based, and multi-modal methodology for gait pattern description. Beyond achieving SoTA performances, more importantly, our careful exploration sheds new light on the modeling experience of deep gait models, the representational capacity of typical gait modalities, and so on. We hope this work can inspire further research and application of gait recognition towards better practicality. The code is available at https://github.com/ShiqiYu/OpenGait.
Submitted 15 May, 2024;
originally announced May 2024.
-
Theoretical Analysis for Expectation-Maximization-Based Multi-Model 3D Registration
Authors:
David Jin,
Harry Zhang,
Kai Chang
Abstract:
We perform detailed theoretical analysis of an expectation-maximization-based algorithm recently proposed in for solving a variation of the 3D registration problem, named multi-model 3D registration. Despite having shown superior empirical results, did not theoretically justify the conditions under which the EM approach converges to the ground truth. In this project, we aim to close this gap by establishing such conditions. In particular, the analysis revolves around the usage of probabilistic tail bounds that are developed and applied in various instances throughout the course. The problem studied in this project stands as another example, different from those seen in the course, in which tail-bounds help advance our algorithmic understanding in a probabilistic way. We provide self-contained background materials on 3D Registration
Submitted 24 May, 2024; v1 submitted 14 May, 2024;
originally announced May 2024.
-
MARE: Multi-Agents Collaboration Framework for Requirements Engineering
Authors:
Dongming Jin,
Zhi Jin,
Xiaohong Chen,
Chunhui Wang
Abstract:
Requirements Engineering (RE) is a critical phase in the software development process that generates requirements specifications from stakeholders' needs. Recently, deep learning techniques have been successful in several RE tasks. However, obtaining high-quality requirements specifications requires collaboration across multiple tasks and roles. In this paper, we propose an innovative framework called MARE, which leverages collaboration among large language models (LLMs) throughout the entire RE process. MARE divides the RE process into four tasks: elicitation, modeling, verification, and specification. Each task is conducted by engaging one or two specific agents and each agent can conduct several actions. MARE has five agents and nine actions. To facilitate collaboration between agents, MARE has designed a workspace for agents to upload their generated intermediate requirements artifacts and obtain the information they need. We conduct experiments on five public cases, one dataset, and four new cases created by this work. We compared MARE with three baselines using three widely used metrics for the generated requirements models. Experimental results show that MARE can generate more correct requirements models and outperform the state-of-the-art approaches by 15.4%. For the generated requirements specifications, we conduct a human evaluation in three aspects and provide insights about the quality
Submitted 6 May, 2024;
originally announced May 2024.
-
Semantic Line Combination Detector
Authors:
Jinwon Ko,
Dongkwon Jin,
Chang-Su Kim
Abstract:
A novel algorithm, called semantic line combination detector (SLCD), to find an optimal combination of semantic lines is proposed in this paper. It processes all lines in each line combination at once to assess the overall harmony of the lines. First, we generate various line combinations from reliable lines. Second, we estimate the score of each line combination and determine the best one. Experimental results demonstrate that the proposed SLCD outperforms existing semantic line detectors on various datasets. Moreover, it is shown that SLCD can be applied effectively to three vision tasks of vanishing point detection, symmetry axis detection, and composition-based image retrieval. Our codes are available at https://github.com/Jinwon-Ko/SLCD.
Submitted 1 May, 2024; v1 submitted 28 April, 2024;
originally announced April 2024.
-
Diagnosing Emergent Isotropy in Anisotropic Holographic Systems using Quantum Information Measures
Authors:
Chong-Ye Chen,
Mu-Jing Li,
Zhe Yang,
Da-Ming Jin,
Peng Liu
Abstract:
This study presents a comprehensive investigation of anisotropy in a holographic p-wave superconductor model, revealing novel insights into the behavior of quantum information measures in strongly coupled systems. Through rigorous semi-analytical methods, we uncover the existence of an isotropic point emerging at a critical temperature $T_{II}$, marking a significant transition in the system's anisotropic properties. We offer a systematic analysis of the mechanisms driving anisotropy and isotropy transitions, finding that this phenomenon is unique to the p-wave model and absent in other anisotropic systems like anisotropic axion models with metal-insulator transitions. We propose that the explicit component dependence of the vector field manifesting anisotropy is the key driver of the emergent isotropy. Our analysis of holographic entanglement entropy (HEE), entanglement wedge cross-section (EWCS), and butterfly velocity demonstrates their distinct sensitivities to bulk anisotropy, with EWCS and butterfly velocity emerging as superior probes for detecting the isotropic point.
Submitted 9 July, 2024; v1 submitted 16 April, 2024;
originally announced April 2024.
-
The Effects of Splayed Lipid Molecules on Lubrication by Lipid Bilayers
Authors:
Di Jin,
Jacob Klein
Abstract:
The outstanding lubrication of articular cartilage in the major synovial joints such as hips and knees, essential for the joint well-being, has been attributed to boundary layers of lipids at the outer cartilage surfaces, which have very low friction mediated by the hydration lubrication mechanism at their highly hydrated exposed headgroups. However, the role of spontaneously present lipid splays, lipids with an acyl tail in each of the opposing bilayers, in modulating the frictional force between lipid bilayers has not, to date, been considered. In this study, we perform all-atom molecular dynamics simulations to quantitatively assess the significance of splayed molecules within the framework of lubricating lipid bilayers. We demonstrate that, although transient, splayed molecules significantly increase the inter-membrane friction until their retraction back into the lamellar phase, with this effect more steadily occurring at lower sliding velocities that are comparable to the physiological velocities of sliding articular cartilage.
Submitted 9 April, 2024;
originally announced April 2024.
-
Pressure-dependent adhesion between solid-supported PC-lipid bilayers and vesicles under electric fields
Authors:
Yu Zhang,
Di Jin,
Jacob Klein
Abstract:
Fusion of lipid bilayers in membranes is important in processes from vesicle-cell interactions (as in drug delivery) to exosome-cell signaling, while transient transmembrane electric fields are known to occur spontaneously. Two contacting phosphatidylcholine (PC) lipid membranes are known to fuse into one under external electric fields, suggesting that the interaction between them is modified by the field as they approach, prior to the fusion event. Here we measure directly the adhesion energy between dimyristoylphosphatidylcholine (DMPC) and between distearoylphosphatidylcholine (DSPC) surface layers attached to solid substrates both without and with a transmembrane electric field. We find a marked pressure-dependent adhesion behavior in the electric field, which we attribute to fusion intermediates that are formed, shedding new light on membrane electro-fusion.
Submitted 7 April, 2024;
originally announced April 2024.
-
Effective Lymph Nodes Detection in CT Scans Using Location Debiased Query Selection and Contrastive Query Representation in Transformer
Authors:
Qinji Yu,
Yirui Wang,
Ke Yan,
Haoshen Li,
Dazhou Guo,
Li Zhang,
Le Lu,
Na Shen,
Qifeng Wang,
Xiaowei Ding,
Xianghua Ye,
Dakai Jin
Abstract:
Lymph node (LN) assessment is a critical, indispensable yet very challenging task in the routine clinical workflow of radiology and oncology. Accurate LN analysis is essential for cancer diagnosis, staging, and treatment planning. Finding scatteredly distributed, low-contrast clinically relevant LNs in 3D CT is difficult even for experienced physicians under high inter-observer variations. Previous automatic LN detection works typically yield limited recall and high false positives (FPs) due to adjacent anatomies with similar image intensities, shapes, or textures (vessels, muscles, esophagus, etc). In this work, we propose a new LN DEtection TRansformer, named LN-DETR, to achieve more accurate performance. By enhancing the 2D backbone with a multi-scale 2.5D feature fusion to incorporate 3D context explicitly, more importantly, we make two main contributions to improve the representation quality of LN queries. 1) Considering that LN boundaries are often unclear, an IoU prediction head and a location debiased query selection are proposed to select LN queries of higher localization accuracy as the decoder query's initialization. 2) To reduce FPs, query contrastive learning is employed to explicitly reinforce LN queries towards their best-matched ground-truth queries over unmatched query predictions. Trained and tested on 3D CT scans of 1067 patients (with 10,000+ labeled LNs) via combining seven LN datasets from different body parts (neck, chest, and abdomen) and pathologies/cancers, our method significantly improves the performance of previous leading methods by > 4-5% average recall at the same FP rates in both internal and external testing. We further evaluate on the universal lesion detection task using NIH DeepLesion benchmark, and our method achieves the top performance of 88.46% averaged recall across 0.5 to 4 FPs per image, compared with other leading reported results.
Submitted 4 April, 2024;
originally announced April 2024.
-
Large Motion Model for Unified Multi-Modal Motion Generation
Authors:
Mingyuan Zhang,
Daisheng Jin,
Chenyang Gu,
Fangzhou Hong,
Zhongang Cai,
Jingfang Huang,
Chongzhi Zhang,
Xinying Guo,
Lei Yang,
Ying He,
Ziwei Liu
Abstract:
Human motion generation, a cornerstone technique in animation and video production, has widespread applications in various tasks like text-to-motion and music-to-dance. Previous works focus on developing specialist models tailored for each task without scalability. In this work, we present Large Motion Model (LMM), a motion-centric, multi-modal framework that unifies mainstream motion generation tasks into a generalist model. A unified motion model is appealing since it can leverage a wide range of motion data to achieve broad generalization beyond a single task. However, it is also challenging due to the heterogeneous nature of substantially different motion data and tasks. LMM tackles these challenges from three principled aspects: 1) Data: We consolidate datasets with different modalities, formats and tasks into a comprehensive yet unified motion generation dataset, MotionVerse, comprising 10 tasks, 16 datasets, a total of 320k sequences, and 100 million frames. 2) Architecture: We design an articulated attention mechanism ArtAttention that incorporates body part-aware modeling into Diffusion Transformer backbone. 3) Pre-Training: We propose a novel pre-training strategy for LMM, which employs variable frame rates and masking forms, to better exploit knowledge from diverse training data. Extensive experiments demonstrate that our generalist LMM achieves competitive performance across various standard motion generation tasks over state-of-the-art specialist models. Notably, LMM exhibits strong generalization capabilities and emerging properties across many unseen tasks. Additionally, our ablation studies reveal valuable insights about training and scaling up large motion models for future research.
Submitted 1 April, 2024;
originally announced April 2024.
-
Towards a Comprehensive, Efficient and Promptable Anatomic Structure Segmentation Model using 3D Whole-body CT Scans
Authors:
Heng Guo,
Jianfeng Zhang,
Jiaxing Huang,
Tony C. W. Mok,
Dazhou Guo,
Ke Yan,
Le Lu,
Dakai Jin,
Minfeng Xu
Abstract:
Segment anything model (SAM) demonstrates strong generalization ability on natural image segmentation. However, its direct adaptation to medical image segmentation tasks shows significant performance drops with inferior accuracy and unstable results. It may also require an excessive number of prompt points to obtain a reasonable accuracy. For segmenting 3D radiological CT or MRI scans, a 2D SAM model has to separately handle hundreds of 2D slices. Although quite a few studies explore adapting SAM to medical image volumes, the efficiency of 2D adaptation methods is unsatisfactory and 3D adaptation methods are only capable of segmenting specific organs/tumors. In this work, we propose a comprehensive and scalable 3D SAM model for whole-body CT segmentation, named CT-SAM3D. Instead of adapting SAM, we propose a 3D promptable segmentation model using a (nearly) fully labeled CT dataset. To train CT-SAM3D effectively, ensuring the model's accurate responses to higher-dimensional spatial prompts is crucial, and 3D patch-wise training is required due to GPU memory constraints. For this purpose, we propose two key technical developments: 1) a progressively and spatially aligned prompt encoding method to effectively encode click prompts in local 3D space; and 2) a cross-patch prompt learning scheme to capture more 3D spatial context, which is beneficial for reducing the editing workloads when interactively prompting on large organs. CT-SAM3D is trained and validated using a curated dataset of 1204 CT scans containing 107 whole-body anatomies, reporting significantly better quantitative performance than all previous SAM-derived models by a large margin with far fewer click prompts. Our model can handle segmenting unseen organs as well. Code, data, and our 3D interactive segmentation tool with quasi-real-time responses will be made publicly available.
Submitted 22 March, 2024;
originally announced March 2024.
-
Multi-role Consensus through LLMs Discussions for Vulnerability Detection
Authors:
Zhenyu Mao,
Jialong Li,
Dongming Jin,
Munan Li,
Kenji Tei
Abstract:
Recent advancements in large language models (LLMs) have highlighted the potential for vulnerability detection, a crucial component of software quality assurance. Despite this progress, most studies have been limited to the perspective of a single role, usually testers, lacking diverse viewpoints from different roles in a typical software development life-cycle, including both developers and testers. To this end, this paper introduces a multi-role approach to employ LLMs to act as different roles simulating a real-life code review process and engaging in discussions toward a consensus on the existence and classification of vulnerabilities in the code. Preliminary evaluation of this approach indicates a 13.48% increase in the precision rate, an 18.25% increase in the recall rate, and a 16.13% increase in the F1 score.
Submitted 18 May, 2024; v1 submitted 21 March, 2024;
originally announced March 2024.
-
Data is all you need: Finetuning LLMs for Chip Design via an Automated design-data augmentation framework
Authors:
Kaiyan Chang,
Kun Wang,
Nan Yang,
Ying Wang,
Dantong Jin,
Wenlong Zhu,
Zhirong Chen,
Cangyuan Li,
Hao Yan,
Yunhao Zhou,
Zhuoliang Zhao,
Yuan Cheng,
Yudong Pan,
Yiqi Liu,
Mengdi Wang,
Shengwen Liang,
Yinhe Han,
Huawei Li,
Xiaowei Li
Abstract:
Recent advances in large language models have demonstrated their potential for automated generation of hardware description language (HDL) code from high-level prompts. Researchers have utilized fine-tuning to enhance the ability of these large language models (LLMs) in the field of chip design. However, the lack of Verilog data hinders further improvement in the quality of Verilog generation by LLMs. Additionally, the absence of a Verilog and Electronic Design Automation (EDA) script data augmentation framework significantly increases the time required to prepare the training dataset for LLM trainers. This paper proposes an automated design-data augmentation framework, which generates high-volume and high-quality natural language aligned with Verilog and EDA scripts. For Verilog generation, it translates Verilog files to an abstract syntax tree and then maps nodes to natural language with a predefined template. For Verilog repair, it uses predefined rules to generate the wrong Verilog file and then pairs EDA tool feedback with the right and wrong Verilog files. For EDA script generation, it uses an existing LLM (GPT-3.5) to obtain the description of the script. To evaluate the effectiveness of our data augmentation method, we finetune Llama2-13B and Llama2-7B models using the dataset generated by our augmentation framework. The results demonstrate a significant improvement in the Verilog generation tasks with LLMs. Moreover, the accuracy of Verilog generation surpasses that of the current state-of-the-art open-source Verilog generation model, increasing from 58.8% to 70.6% on the same benchmark. Our 13B model (ChipGPT-FT) has a pass rate improvement compared with GPT-3.5 in Verilog generation and outperforms it in EDA script (i.e., SiliconCompiler) generation with only 200 EDA script samples.
Submitted 10 July, 2024; v1 submitted 17 March, 2024;
originally announced March 2024.
-
Rumor Mitigation in Social Media Platforms with Deep Reinforcement Learning
Authors:
Hongyuan Su,
Yu Zheng,
Jingtao Ding,
Depeng Jin,
Yong Li
Abstract:
Social media platforms have become one of the main channels where people disseminate and acquire information, whose reliability is severely threatened by rumors spreading through the network. Existing approaches to combating rumors, such as suspending users or broadcasting real information, are either costly or disruptive to users. In this paper, we introduce a novel rumor mitigation paradigm, where interventions are applied to only a minimal set of links in the social network to decelerate the propagation of rumors, countering misinformation with low business cost and user awareness. A knowledge-informed agent embodying rumor propagation mechanisms is developed, which intervenes in the social network with a graph neural network for capturing information flow in the social media platforms and a policy network for selecting links. Experiments on real social media platforms demonstrate that the proposed approach can effectively alleviate the influence of rumors, substantially reducing the affected populations by over 25%. Codes for this paper are released at https://github.com/tsinghua-fib-lab/DRL-Rumor-Mitigation.
Submitted 14 March, 2024;
originally announced March 2024.
-
MetroGNN: Metro Network Expansion with Reinforcement Learning
Authors:
Hongyuan Su,
Yu Zheng,
Jingtao Ding,
Depeng Jin,
Yong Li
Abstract:
Selecting urban regions for metro network expansion to meet maximal transportation demands is crucial for urban development, yet computationally challenging to solve. The expansion process relies not only on complicated features like urban demographics and origin-destination (OD) flow but is also constrained by the existing metro network and urban geography. In this paper, we introduce a reinforcement learning framework to address a Markov decision process within an urban heterogeneous multi-graph. Our approach employs an attentive policy network that intelligently selects nodes based on information captured by a graph neural network. Experiments on real-world urban data demonstrate that our proposed methodology substantially improves the satisfied transportation demands by over 30% when compared with state-of-the-art methods. Codes are published at https://github.com/tsinghua-fib-lab/MetroGNN.
Submitted 14 March, 2024;
originally announced March 2024.
-
GMKF: Generalized Moment Kalman Filter for Polynomial Systems with Arbitrary Noise
Authors:
Sangli Teng,
Harry Zhang,
David Jin,
Ashkan Jasour,
Maani Ghaffari,
Luca Carlone
Abstract:
This paper develops a new filtering approach for state estimation in polynomial systems corrupted by arbitrary noise, which commonly arise in robotics. We first consider a batch setup where we perform state estimation using all data collected from the initial to the current time. We formulate the batch state estimation problem as a Polynomial Optimization Problem (POP) and relax the assumption of Gaussian noise by specifying a finite number of moments of the noise. We solve the resulting POP using a moment relaxation and prove that under suitable conditions on the rank of the relaxation, (i) we can extract a provably optimal estimate from the moment relaxation, and (ii) we can obtain a belief representation from the dual (sum-of-squares) relaxation. We then turn our attention to the filtering setup and apply similar insights to develop a GMKF for recursive state estimation in polynomial systems with arbitrary noise. The GMKF formulates the prediction and update steps as POPs and solves them using moment relaxations, carrying over a possibly non-Gaussian belief. In the linear-Gaussian case, GMKF reduces to the standard Kalman Filter. We demonstrate that GMKF performs well under highly non-Gaussian noise and outperforms common alternatives, including the Extended and Unscented Kalman Filter, and their variants on matrix Lie group.
Submitted 8 March, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
-
InjectTST: A Transformer Method of Injecting Global Information into Independent Channels for Long Time Series Forecasting
Authors:
Ce Chi,
Xing Wang,
Kexin Yang,
Zhiyan Song,
Di Jin,
Lin Zhu,
Chao Deng,
Junlan Feng
Abstract:
Transformer has become one of the most popular architectures for multivariate time series (MTS) forecasting. Recent Transformer-based MTS models generally prefer channel-independent structures with the observation that channel independence can alleviate noise and distribution drift issues, leading to more robustness. Nevertheless, it is essential to note that channel dependency remains an inherent characteristic of MTS, carrying valuable information. Designing a model that incorporates merits of both channel-independent and channel-mixing structures is a key to further improvement of MTS forecasting, which poses a challenging conundrum. To address the problem, an injection method for global information into channel-independent Transformer, InjectTST, is proposed in this paper. Instead of designing a channel-mixing model directly, we retain the channel-independent backbone and gradually inject global information into individual channels in a selective way. A channel identifier, a global mixing module and a self-contextual attention module are devised in InjectTST. The channel identifier can help Transformer distinguish channels for better representation. The global mixing module produces cross-channel global information. Through the self-contextual attention module, the independent channels can selectively concentrate on useful global information without robustness degradation, and channel mixing is achieved implicitly. Experiments indicate that InjectTST can achieve stable improvement compared with state-of-the-art models.
Submitted 5 March, 2024;
originally announced March 2024.
-
Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration
Authors:
Tony C. W. Mok,
Zi Li,
Yunhao Bai,
Jianpeng Zhang,
Wei Liu,
Yan-Jie Zhou,
Ke Yan,
Dakai Jin,
Yu Shi,
Xiaoli Yin,
Le Lu,
Ling Zhang
Abstract:
Establishing dense anatomical correspondence across distinct imaging modalities is a foundational yet challenging procedure for numerous medical image analysis studies and image-guided radiotherapy. Existing multi-modality image registration algorithms rely on statistical-based similarity measures or local structural image representations. However, the former is sensitive to locally varying noise, while the latter is not discriminative enough to cope with complex anatomical structures in multimodal scans, causing ambiguity in determining the anatomical correspondence across scans with different modalities. In this paper, we propose a modality-agnostic structural representation learning method, which leverages Deep Neighbourhood Self-similarity (DNS) and anatomy-aware contrastive learning to learn discriminative and contrast-invariant deep structural image representations (DSIR) without the need for anatomical delineations or pre-aligned training images. We evaluate our method on multiphase CT, abdomen MR-CT, and brain MR T1w-T2w registration. Comprehensive results demonstrate that our method is superior to the conventional local structural representation and statistical-based similarity measures in terms of discriminability and accuracy.
Submitted 31 March, 2024; v1 submitted 29 February, 2024;
originally announced February 2024.
-
Large Language Model for Participatory Urban Planning
Authors:
Zhilun Zhou,
Yuming Lin,
Depeng Jin,
Yong Li
Abstract:
Participatory urban planning is the mainstream of modern urban planning that involves the active engagement of residents. However, the traditional participatory paradigm requires experienced planning experts and is often time-consuming and costly. Fortunately, the emerging Large Language Models (LLMs) have shown considerable ability to simulate human-like agents, which can be used to emulate the participatory process easily. In this work, we introduce an LLM-based multi-agent collaboration framework for participatory urban planning, which can generate land-use plans for urban regions considering the diverse needs of residents. Specifically, we construct LLM agents to simulate a planner and thousands of residents with diverse profiles and backgrounds. We first ask the planner to carry out an initial land-use plan. To deal with residents' differing facility needs, we initiate a discussion among the residents in each community about the plan, where residents provide feedback based on their profiles. Furthermore, to improve the efficiency of discussion, we adopt a fishbowl discussion mechanism, where part of the residents discuss and the rest of them act as listeners in each round. Finally, we let the planner modify the plan based on residents' feedback. We deploy our method on two real-world regions in Beijing. Experiments show that our method achieves state-of-the-art performance in resident satisfaction and inclusion metrics, and also outperforms human experts in terms of service accessibility and ecology metrics.
Submitted 26 February, 2024;
originally announced February 2024.
-
Spatio-Temporal Few-Shot Learning via Diffusive Neural Network Generation
Authors:
Yuan Yuan,
Chenyang Shao,
Jingtao Ding,
Depeng Jin,
Yong Li
Abstract:
Spatio-temporal modeling is foundational for smart city applications, yet it is often hindered by data scarcity in many cities and regions. To bridge this gap, we propose a novel generative pre-training framework, GPD, for spatio-temporal few-shot learning with urban knowledge transfer. Unlike conventional approaches that heavily rely on common feature extraction or intricate few-shot learning designs, our solution takes a novel approach by performing generative pre-training on a collection of neural network parameters optimized with data from source cities. We recast spatio-temporal few-shot learning as pre-training a generative diffusion model, which generates tailored neural networks guided by prompts, allowing for adaptability to diverse data distributions and city-specific characteristics. GPD employs a Transformer-based denoising diffusion model, which is model-agnostic to integrate with powerful spatio-temporal neural networks. By addressing challenges arising from data gaps and the complexity of generalizing knowledge across cities, our framework consistently outperforms state-of-the-art baselines on multiple real-world datasets for tasks such as traffic speed prediction and crowd flow prediction. The implementation of our approach is available: https://github.com/tsinghua-fib-lab/GPD.
Submitted 25 March, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
UniST: A Prompt-Empowered Universal Model for Urban Spatio-Temporal Prediction
Authors:
Yuan Yuan,
Jingtao Ding,
Jie Feng,
Depeng Jin,
Yong Li
Abstract:
Urban spatio-temporal prediction is crucial for informed decision-making, such as traffic management, resource optimization, and emergency response. Despite remarkable breakthroughs in pretrained natural language models that enable one model to handle diverse tasks, a universal solution for spatio-temporal prediction remains challenging. Existing prediction approaches are typically tailored for specific spatio-temporal scenarios, requiring task-specific model designs and extensive domain-specific training data. In this study, we introduce UniST, a universal model designed for general urban spatio-temporal prediction across a wide range of scenarios. Inspired by large language models, UniST achieves success through: (i) utilizing diverse spatio-temporal data from different scenarios, (ii) effective pre-training to capture complex spatio-temporal dynamics, and (iii) knowledge-guided prompts to enhance generalization capabilities. These designs together unlock the potential of building a universal model for various scenarios. Extensive experiments on more than 20 spatio-temporal scenarios demonstrate UniST's efficacy in advancing state-of-the-art performance, especially in few-shot and zero-shot prediction. The datasets and code implementation are released on https://github.com/tsinghua-fib-lab/UniST.
Submitted 30 June, 2024; v1 submitted 19 February, 2024;
originally announced February 2024.
-
Vision-Flan: Scaling Human-Labeled Tasks in Visual Instruction Tuning
Authors:
Zhiyang Xu,
Chao Feng,
Rulin Shao,
Trevor Ashby,
Ying Shen,
Di Jin,
Yu Cheng,
Qifan Wang,
Lifu Huang
Abstract:
Despite vision-language models' (VLMs) remarkable capabilities as versatile visual assistants, two substantial challenges persist within the existing VLM frameworks: (1) lacking task diversity in pretraining and visual instruction tuning, and (2) annotation error and bias in GPT-4 synthesized instruction tuning data. Both challenges lead to issues such as poor generalizability, hallucination, and catastrophic forgetting. To address these challenges, we construct Vision-Flan, the most diverse publicly available visual instruction tuning dataset to date, comprising 187 diverse tasks and 1,664,261 instances sourced from academic datasets, and each task is accompanied by an expert-written instruction. In addition, we propose a two-stage instruction tuning framework, in which VLMs are firstly finetuned on Vision-Flan and further tuned on GPT-4 synthesized data. We find this two-stage tuning framework significantly outperforms the traditional single-stage visual instruction tuning framework and achieves the state-of-the-art performance across a wide range of multi-modal evaluation benchmarks. Finally, we conduct in-depth analyses to understand visual instruction tuning and our findings reveal that: (1) GPT-4 synthesized data does not substantially enhance VLMs' capabilities but rather modulates the model's responses to human-preferred formats; (2) A minimal quantity (e.g., 1,000) of GPT-4 synthesized data can effectively align VLM responses with human-preference; (3) Visual instruction tuning mainly helps large-language models (LLMs) to understand visual features.
Submitted 18 February, 2024;
originally announced February 2024.
-
Multi-Model 3D Registration: Finding Multiple Moving Objects in Cluttered Point Clouds
Authors:
David Jin,
Sushrut Karmalkar,
Harry Zhang,
Luca Carlone
Abstract:
We investigate a variation of the 3D registration problem, named multi-model 3D registration. In the multi-model registration problem, we are given two point clouds picturing a set of objects at different poses (and possibly including points belonging to the background) and we want to simultaneously reconstruct how all objects moved between the two point clouds. This setup generalizes standard 3D registration where one wants to reconstruct a single pose, e.g., the motion of the sensor picturing a static scene. Moreover, it provides a mathematically grounded formulation for relevant robotics applications, e.g., where a depth sensor onboard a robot perceives a dynamic scene and has the goal of estimating its own motion (from the static portion of the scene) while simultaneously recovering the motion of all dynamic objects. We assume a correspondence-based setup where we have putative matches between the two point clouds and consider the practical case where these correspondences are plagued with outliers. We then propose a simple approach based on Expectation-Maximization (EM) and establish theoretical conditions under which the EM approach converges to the ground truth. We evaluate the approach in simulated and real datasets ranging from table-top scenes to self-driving scenarios and demonstrate its effectiveness when combined with state-of-the-art scene flow methods to establish dense correspondences.
Submitted 16 February, 2024;
originally announced February 2024.
-
Slow-Wave Hybrid Magnonics
Authors:
Jing Xu,
Changchun Zhong,
Shihao Zhuang,
Chen Qian,
Yu Jiang,
Amin Pishehvar,
Xu Han,
Dafei Jin,
Josep M. Jornet,
Bo Zhen,
Jiamian Hu,
Liang Jiang,
Xufeng Zhang
Abstract:
Cavity magnonics is an emerging research area focusing on the coupling between magnons and photons. Despite its great potential for coherent information processing, it has been long restricted by the narrow interaction bandwidth. In this work, we theoretically propose and experimentally demonstrate a novel approach to achieve broadband photon-magnon coupling by adopting slow waves on engineered microwave waveguides. To the best of our knowledge, this is the first time that slow wave is combined with hybrid magnonics. Its unique properties promise great potentials for both fundamental research and practical applications, for instance, by deepening our understanding of the light-matter interaction in the slow wave regime and providing high-efficiency spin wave transducers. The device concept can be extended to other systems such as optomagnonics and magnomechanics, opening up new directions for hybrid magnonics.
Submitted 13 February, 2024;
originally announced February 2024.
-
Diffusion Model-based Probabilistic Downscaling for 180-year East Asian Climate Reconstruction
Authors:
Fenghua Ling,
Zeyu Lu,
Jing-Jia Luo,
Lei Bai,
Swadhin K. Behera,
Dachao Jin,
Baoxiang Pan,
Huidong Jiang,
Toshio Yamagata
Abstract:
As our planet is entering into the "global boiling" era, understanding regional climate change becomes imperative. Effective downscaling methods that provide localized insights are crucial for this target. Traditional approaches, including computationally-demanding regional dynamical models or statistical downscaling frameworks, are often susceptible to the influence of downscaling uncertainty. Here, we address these limitations by introducing a diffusion probabilistic downscaling model (DPDM) into the meteorological field. This model can efficiently transform data from 1° to 0.1° resolution. Compared with deterministic downscaling schemes, it not only has more accurate local details, but also can generate a large number of ensemble members based on probability distribution sampling to evaluate the uncertainty of downscaling. Additionally, we apply the model to generate a 180-year dataset of monthly surface variables in East Asia, offering a more detailed perspective for understanding local scale climate change over the past centuries.
Submitted 5 April, 2024; v1 submitted 1 February, 2024;
originally announced February 2024.
-
GOODAT: Towards Test-time Graph Out-of-Distribution Detection
Authors:
Luzhi Wang,
Dongxiao He,
He Zhang,
Yixin Liu,
Wenjie Wang,
Shirui Pan,
Di Jin,
Tat-Seng Chua
Abstract:
Graph neural networks (GNNs) have found widespread application in modeling graph data across diverse domains. While GNNs excel in scenarios where the testing data shares the distribution of their training counterparts (in distribution, ID), they often exhibit incorrect predictions when confronted with samples from an unfamiliar distribution (out-of-distribution, OOD). To identify and reject OOD samples with GNNs, recent studies have explored graph OOD detection, often focusing on training a specific model or modifying the data on top of a well-trained GNN. Despite their effectiveness, these methods come with heavy training resources and costs, as they need to optimize the GNN-based models on training data. Moreover, their reliance on modifying the original GNNs and accessing training data further restricts their universality. To this end, this paper introduces a method to detect Graph Out-of-Distribution At Test-time (namely GOODAT), a data-centric, unsupervised, and plug-and-play solution that operates independently of training data and modifications of GNN architecture. With a lightweight graph masker, GOODAT can learn informative subgraphs from test samples, enabling the capture of distinct graph patterns between OOD and ID samples. To optimize the graph masker, we meticulously design three unsupervised objective functions based on the graph information bottleneck principle, motivating the masker to capture compact yet informative subgraphs for OOD detection. Comprehensive evaluations confirm that our GOODAT method outperforms state-of-the-art benchmarks across a variety of real-world datasets. The code is available at Github: https://github.com/Ee1s/GOODAT
Submitted 10 January, 2024;
originally announced January 2024.
-
Robust Geometry and Reflectance Disentanglement for 3D Face Reconstruction from Sparse-view Images
Authors:
Daisheng Jin,
Jiangbei Hu,
Baixin Xu,
Yuxin Dai,
Chen Qian,
Ying He
Abstract:
This paper presents a novel two-stage approach for reconstructing human faces from sparse-view images, a task made challenging by the unique geometry and complex skin reflectance of each individual. Our method focuses on decomposing key facial attributes, including geometry, diffuse reflectance, and specular reflectance, from ambient light. Initially, we create a general facial template from a diverse collection of individual faces, capturing essential geometric and reflectance characteristics. Guided by this template, we refine each specific face model in the second stage, which further considers the interaction between geometry and reflectance, as well as the subsurface scattering effects on facial skin. Our method enables the reconstruction of high-quality facial representations from as few as three images, offering improved geometric accuracy and reflectance detail. Through comprehensive evaluations and comparisons, our method demonstrates superiority over existing techniques. Our method effectively disentangles geometry and reflectance components, leading to enhanced quality in synthesizing new views and opening up possibilities for applications such as relighting and reflectance editing. We will make the code publicly available.
Submitted 10 December, 2023;
originally announced December 2023.
-
SAME++: A Self-supervised Anatomical eMbeddings Enhanced medical image registration framework using stable sampling and regularized transformation
Authors:
Lin Tian,
Zi Li,
Fengze Liu,
Xiaoyu Bai,
Jia Ge,
Le Lu,
Marc Niethammer,
Xianghua Ye,
Ke Yan,
Daikai Jin
Abstract:
Image registration is a fundamental medical image analysis task. Ideally, registration should focus on aligning semantically corresponding voxels, i.e., the same anatomical locations. However, existing methods often optimize similarity measures computed directly on intensities or on hand-crafted features, which lack anatomical semantic information. These similarity measures may lead to sub-optimal solutions where large deformations, complex anatomical differences, or cross-modality imagery exist. In this work, we introduce a fast and accurate method for unsupervised 3D medical image registration building on top of a Self-supervised Anatomical eMbedding (SAM) algorithm, which is capable of computing dense anatomical correspondences between two images at the voxel level. We name our approach SAM-Enhanced registration (SAME++), which decomposes image registration into four steps: affine transformation, coarse deformation, deep non-parametric transformation, and instance optimization. Using SAM embeddings, we enhance these steps by finding more coherent correspondence and providing features with better semantic guidance. We extensively evaluated SAME++ using more than 50 labeled organs on three challenging inter-subject registration tasks of different body parts. As a complete registration framework, SAME++ markedly outperforms leading methods by 4.2% - 8.2% in terms of Dice score while being orders of magnitude faster than numerical optimization-based methods. Code is available at https://github.com/alibaba-damo-academy/same.
Submitted 25 February, 2024; v1 submitted 25 November, 2023;
originally announced November 2023.
-
Data-Efficient Alignment of Large Language Models with Human Feedback Through Natural Language
Authors:
Di Jin,
Shikib Mehri,
Devamanyu Hazarika,
Aishwarya Padmakumar,
Sungjin Lee,
Yang Liu,
Mahdi Namazifar
Abstract:
Learning from human feedback is a prominent technique to align the output of large language models (LLMs) with human expectations. Reinforcement learning from human feedback (RLHF) leverages human preference signals that are in the form of ranking of response pairs to perform this alignment. However, human preference on LLM outputs can come in much richer forms including natural language, which may provide detailed feedback on strengths and weaknesses of a given response. In this work we investigate data efficiency of modeling human feedback that is in natural language. Specifically, we fine-tune an open-source LLM, e.g., Falcon-40B-Instruct, on a relatively small amount (1000 records or even less) of human feedback in natural language in the form of critiques and revisions of responses. We show that this model is able to improve the quality of responses from even some of the strongest LLMs such as ChatGPT, BARD, and Vicuna, through critique and revision of those responses. For instance, through one iteration of revision of ChatGPT responses, the revised responses have 56.6% win rate over the original ones, and this win rate can be further improved to 65.9% after applying the revision for five iterations.
Submitted 24 November, 2023;
originally announced November 2023.
-
SkeletonGait: Gait Recognition Using Skeleton Maps
Authors:
Chao Fan,
Jingzhe Ma,
Dongyang Jin,
Chuanfu Shen,
Shiqi Yu
Abstract:
The choice of the representations is essential for deep gait recognition methods. The binary silhouettes and skeletal coordinates are two dominant representations in recent literature, achieving remarkable advances in many scenarios. However, inherent challenges remain, in which silhouettes are not always guaranteed in unconstrained scenes, and structural cues have not been fully utilized from skeletons. In this paper, we introduce a novel skeletal gait representation named skeleton map, together with SkeletonGait, a skeleton-based method to exploit structural information from human skeleton maps. Specifically, the skeleton map represents the coordinates of human joints as a heatmap with Gaussian approximation, exhibiting a silhouette-like image devoid of exact body structure. Beyond achieving state-of-the-art performances over five popular gait datasets, more importantly, SkeletonGait uncovers novel insights about how important structural features are in describing gait and when they play a role. Furthermore, we propose a multi-branch architecture, named SkeletonGait++, to make use of complementary features from both skeletons and silhouettes. Experiments indicate that SkeletonGait++ outperforms existing state-of-the-art methods by a significant margin in various scenarios. For instance, it achieves an impressive rank-1 accuracy of over 85% on the challenging GREW dataset. All the source code is available at https://github.com/ShiqiYu/OpenGait.
Submitted 18 December, 2023; v1 submitted 22 November, 2023;
originally announced November 2023.
-
How AI-driven Digital Twins Can Empower Mobile Networks
Authors:
Tong Li,
Fenyu Jiang,
Qiaohong Yu,
Wenzhen Huang,
Tao Jiang,
Depeng Jin
Abstract:
The growing complexity of next-generation networks exacerbates the modeling and algorithmic flaws of conventional network optimization methodology. In this paper, we propose a mobile network digital twin (MNDT) architecture for 6G networks. To address the modeling and algorithmic shortcomings, the MNDT uses a simulation-optimization structure. The feedback from the network simulation engine, which serves as validation for the optimizer's decision outcomes, is used explicitly to train artificial intelligence (AI) empowered optimizers iteratively. In practice, we develop a network digital twin prototype system leveraging data-driven technology to accurately model the behaviors of mobile network elements (e.g., mobile users and base stations), wireless environments, and network performance. An AI-powered network optimizer has been developed based on the deployed MNDT prototype system for providing reliable and optimized network configurations. The results of the experiments demonstrate that the proposed MNDT infrastructure can provide practical network optimization solutions while adapting to the more complex environment.
Submitted 20 November, 2023;
originally announced November 2023.
-
Inverse Learning with Extremely Sparse Feedback for Recommendation
Authors:
Guanyu Lin,
Chen Gao,
Yu Zheng,
Yinfeng Li,
Jianxin Chang,
Yanan Niu,
Yang Song,
Kun Gai,
Zhiheng Li,
Depeng Jin,
Yong Li
Abstract:
Modern personalized recommendation services often rely on user feedback, either explicit or implicit, to improve the quality of services. Explicit feedback refers to behaviors like ratings, while implicit feedback refers to behaviors like user clicks. However, in full-screen video viewing experiences such as TikTok and Reels, the click action is absent, resulting in unclear feedback from users and hence introducing noise into model training. Existing approaches to de-noising recommendation mainly focus on positive instances while ignoring the noise in a large amount of sampled negative feedback. In this paper, we propose a meta-learning method to annotate the unlabeled data from loss and gradient perspectives, which considers the noise in both positive and negative instances. Specifically, we first propose an Inverse Dual Loss (IDL) to boost the true label learning and prevent the false label learning. Then we further propose an Inverse Gradient (IG) method to explore the correct updating gradient and adjust the updating based on meta-learning. Finally, we conduct extensive experiments on both benchmark and industrial datasets where our proposed method can significantly improve AUC by 9.25% against state-of-the-art methods. Further analysis verifies the proposed inverse learning framework is model-agnostic and can improve a variety of recommendation backbones. The source code, along with the best hyper-parameter settings, is available at this link: https://github.com/Guanyu-Lin/InverseLearning.
Submitted 20 November, 2023; v1 submitted 14 November, 2023;
originally announced November 2023.
-
Mixed Attention Network for Cross-domain Sequential Recommendation
Authors:
Guanyu Lin,
Chen Gao,
Yu Zheng,
Jianxin Chang,
Yanan Niu,
Yang Song,
Kun Gai,
Zhiheng Li,
Depeng Jin,
Yong Li,
Meng Wang
Abstract:
In modern recommender systems, sequential recommendation leverages chronological user behaviors to make effective next-item suggestions, but it suffers from data sparsity issues, especially for new users. One promising line of work is cross-domain recommendation, which trains models with data across multiple domains to improve the performance in data-scarce domains. Recently proposed cross-domain sequential recommendation models such as PiNet and DASL share a common drawback: they rely heavily on overlapped users in different domains, which limits their usage in practical recommender systems. In this paper, we propose a Mixed Attention Network (MAN) with local and global attention modules to extract the domain-specific and cross-domain information. Firstly, we propose a local/global encoding layer to capture the domain-specific/cross-domain sequential pattern. Then we propose a mixed attention layer with item similarity attention, sequence-fusion attention, and group-prototype attention to capture the local/global item similarity, fuse the local/global item sequence, and extract the user groups across different domains, respectively. Finally, we propose a local/global prediction layer to further evolve and combine the domain-specific and cross-domain interests. Experimental results on two real-world datasets (each with two domains) demonstrate the superiority of our proposed model. Further study also illustrates that our proposed method and components are model-agnostic and effective, respectively. The code and data are available at https://github.com/Guanyu-Lin/MAN.
Submitted 14 November, 2023;
originally announced November 2023.
-
Single-electron qubits based on quantum ring states on solid neon surface
Authors:
Toshiaki Kanai,
Dafei Jin,
Wei Guo
Abstract:
Single electrons trapped on solid neon surfaces (eNe) have recently emerged as a promising platform for charge qubits. Experimental results have revealed their exceptionally long coherence times, yet the actual quantum states of these trapped electrons, presumably on imperfectly flat neon surfaces, remain elusive. In this paper, we examine the electron's interactions with neon surface topography, such as bumps and valleys. By evaluating the surface charges induced by the electron, we demonstrate its strong perpendicular binding to the neon surface. The Schrödinger equation for the electron's lateral motion on the curved 2D surface is then solved for extensive topographical variations. Our results reveal that surface bumps can naturally bind an electron, forming unique quantum ring states that align with experimental observations. We also show that the electron's excitation energy can be tuned using a modest magnetic field to facilitate qubit operation. This study offers a leap in our understanding of eNe qubit properties and provides strategic insights on minimizing charge noise and scaling the system to propel forward quantum computing architectures.
Submitted 30 May, 2024; v1 submitted 4 November, 2023;
originally announced November 2023.
-
Optimal vintage factor analysis with deflation varimax
Authors:
Xin Bing,
Dian Jin,
Yuqian Zhang
Abstract:
Vintage factor analysis is one important type of factor analysis that aims to first find a low-dimensional representation of the original data, and then to seek a rotation such that the rotated low-dimensional representation is scientifically meaningful. Perhaps the most widely used vintage factor analysis is the Principal Component Analysis (PCA) followed by the varimax rotation. Despite its popularity, little theoretical guarantee can be provided, mainly because varimax rotation requires solving a non-convex optimization problem over the set of orthogonal matrices.
In this paper, we propose a deflation varimax procedure that solves each row of an orthogonal matrix sequentially. In addition to its net computational gain and flexibility, we are able to fully establish theoretical guarantees for the proposed procedure in a broad context.
Adopting this new varimax approach as the second step after PCA, we further analyze this two step procedure under a general class of factor models. Our results show that it estimates the factor loading matrix in the optimal rate when the signal-to-noise-ratio (SNR) is moderate or large. In the low SNR regime, we offer possible improvement over using PCA and the deflation procedure when the additive noise under the factor model is structured. The modified procedure is shown to be optimal in all SNR regimes. Our theory is valid for finite sample and allows the number of the latent factors to grow with the sample size as well as the ambient dimension to grow with, or even exceed, the sample size.
Extensive simulation and real data analysis further corroborate our theoretical findings.
Submitted 16 October, 2023;
originally announced October 2023.
-
Stance Detection with Collaborative Role-Infused LLM-Based Agents
Authors:
Xiaochong Lan,
Chen Gao,
Depeng Jin,
Yong Li
Abstract:
Stance detection automatically detects the stance in a text towards a target, and is vital for content analysis in web and social media research. Despite their promising capabilities, LLMs encounter challenges when directly applied to stance detection. First, stance detection demands multi-aspect knowledge, from deciphering event-related terminologies to understanding the expression styles in social media platforms. Second, stance detection requires advanced reasoning to infer authors' implicit viewpoints, as stances are often subtly embedded rather than overtly stated in the text. To address these challenges, we design a three-stage framework COLA (short for Collaborative rOle-infused LLM-based Agents) in which LLMs are designated distinct roles, creating a collaborative system where each role contributes uniquely. Initially, in the multidimensional text analysis stage, we configure the LLMs to act as a linguistic expert, a domain specialist, and a social media veteran to get a multifaceted analysis of texts, thus overcoming the first challenge. Next, in the reasoning-enhanced debating stage, for each potential stance, we designate a specific LLM-based agent to advocate for it, guiding the LLM to detect logical connections between text features and stance, tackling the second challenge. Finally, in the stance conclusion stage, a final decision maker agent consolidates prior insights to determine the stance. Our approach avoids extra annotated data and model training and is highly usable. We achieve state-of-the-art performance across multiple datasets. Ablation studies validate the effectiveness of each design role in handling stance detection. Further experiments have demonstrated the explainability and the versatility of our approach. Our approach excels in usability, accuracy, effectiveness, explainability and versatility, highlighting its value.
Submitted 16 April, 2024; v1 submitted 16 October, 2023;
originally announced October 2023.
-
How enlightened self-interest guided global vaccine sharing benefits all: a modelling study
Authors:
Zhenyu Han,
Qianyue Hao,
Qiwei He,
Katherine Budeski,
Depeng Jin,
Fengli Xu,
Kun Tang
Abstract:
Background: Despite the consensus that vaccines play an important role in combating the global spread of infectious diseases, vaccine inequity is still rampant with deep-seated mentality of self-priority. This study aims to evaluate the existence and possible outcomes of a more equitable global vaccine distribution and explore a concrete incentive mechanism that promotes vaccine equity. Methods: We design a metapopulation epidemiological model that simultaneously considers global vaccine distribution and human mobility, which is then calibrated by the number of infections and real-world vaccination records during COVID-19 pandemic from March 2020 to July 2021. We explore the possibility of the enlightened self-interest incentive mechanism, i.e., improving one's own epidemic outcomes by sharing vaccines with other countries, by evaluating the number of infections and deaths under various vaccine sharing strategies using the proposed model. To understand how these strategies affect the national interests, we distinguish the imported and local cases for further cost-benefit analyses that rationalize the enlightened self-interest incentive mechanism behind vaccine sharing. ...
Submitted 13 October, 2023;
originally announced October 2023.
-
Collaborative Distributed Machine Learning
Authors:
David Jin,
Niclas Kannengießer,
Sascha Rank,
Ali Sunyaev
Abstract:
Various collaborative distributed machine learning (CDML) systems, including federated learning systems and swarm learning systems, with different key traits were developed to leverage resources for development and use of machine learning (ML) models in a confidentiality-preserving way. To meet use case requirements, suitable CDML systems need to be selected. However, comparison between CDML systems regarding their suitability for use cases is often difficult. This work presents a CDML system conceptualization and CDML archetypes to support comparison of CDML systems and introduce scientific and practical audiences to the principal functioning and key traits of CDML systems.
Submitted 21 March, 2024; v1 submitted 28 September, 2023;
originally announced September 2023.
-
Towards Generative Modeling of Urban Flow through Knowledge-enhanced Denoising Diffusion
Authors:
Zhilun Zhou,
Jingtao Ding,
Yu Liu,
Depeng Jin,
Yong Li
Abstract:
Although generative AI has been successful in many areas, its ability to model geospatial data is still underexplored. Urban flow, a typical kind of geospatial data, is critical for a wide range of urban applications. Existing studies mostly focus on predictive modeling of urban flow that predicts the future flow based on historical flow data, which may be unavailable in data-sparse areas or newly planned regions. Some other studies aim to predict OD flow among regions but they fail to model dynamic changes of urban flow over time. In this work, we study a new problem of urban flow generation that generates dynamic urban flow for regions without historical flow data. To capture the effect of multiple factors on urban flow, such as region features and urban environment, we employ diffusion model to generate urban flow for regions under different conditions. We first construct an urban knowledge graph (UKG) to model the urban environment and relationships between regions, based on which we design a knowledge-enhanced spatio-temporal diffusion model (KSTDiff) to generate urban flow for each region. Specifically, to accurately generate urban flow for regions with different flow volumes, we design a novel diffusion process guided by a volume estimator, which is learnable and customized for each region. Moreover, we propose a knowledge-enhanced denoising network to capture the spatio-temporal dependencies of urban flow as well as the impact of urban environment in the denoising process. Extensive experiments on four real-world datasets validate the superiority of our model over state-of-the-art baselines in urban flow generation. Further in-depth studies demonstrate the utility of generated urban flow data and the ability of our model for long-term flow generation and urban flow prediction. Our code is released at: https://github.com/tsinghua-fib-lab/KSTDiff-Urban-flow-generation.
Submitted 19 September, 2023;
originally announced September 2023.
-
Alleviating Video-Length Effect for Micro-video Recommendation
Authors:
Yuhan Quan,
Jingtao Ding,
Chen Gao,
Nian Li,
Lingling Yi,
Depeng Jin,
Yong Li
Abstract:
Micro-video platforms such as TikTok are extremely popular nowadays. One important feature is that users no longer select videos of interest from a set; instead, they either watch the recommended video or skip to the next one. As a result, the time length of users' watching behavior becomes the most important signal for identifying preferences. However, our empirical data analysis has shown a video-length effect: long videos more easily receive a higher average view time, so adopting such view-time labels for measuring user preferences can easily induce a biased model that favors longer videos. In this paper, we propose a Video Length Debiasing Recommendation (VLDRec) method to alleviate such an effect for micro-video recommendation. VLDRec designs the data labeling approach and the sample generation module that better capture user preferences in a view-time oriented manner. It further leverages the multi-task learning technique to jointly optimize the above samples with original biased ones. Extensive experiments show that VLDRec can improve the users' view time by 1.81% and 11.32% on two real-world datasets, given a recommendation list of a fixed overall video length, compared with the best baseline method. Moreover, VLDRec is also more effective in matching users' interests in terms of the video content.
Submitted 31 August, 2023; v1 submitted 27 August, 2023;
originally announced August 2023.
-
Learning and Optimization of Implicit Negative Feedback for Industrial Short-video Recommender System
Authors:
Yunzhu Pan,
Nian Li,
Chen Gao,
Jianxin Chang,
Yanan Niu,
Yang Song,
Depeng Jin,
Yong Li
Abstract:
Short-video recommendation is one of the most important recommendation applications in today's industrial information systems. Compared with other recommendation tasks, the enormous amount of feedback is the most typical characteristic. Specifically, in short-video recommendation, the easiest-to-collect user feedback is the skipping behavior, which leads to two critical challenges for the recommendation model. First, the skipping behavior reflects implicit user preferences, and thus, it is challenging for interest extraction. Second, this kind of special feedback involves multiple objectives, such as total watching time and skipping rate, which is also very challenging. In this paper, we present our industrial solution in Kuaishou, which serves billion-level users every day. Specifically, we deploy a feedback-aware encoding module that extracts user preferences, taking the impact of context into consideration. We further design a multi-objective prediction module which well distinguishes the relation and differences among different model objectives in the short-video recommendation. We conduct extensive online A/B tests, along with detailed and careful analysis, which verify the effectiveness of our solution.
Submitted 5 March, 2024; v1 submitted 25 August, 2023;
originally announced August 2023.
-
SkipcrossNets: Adaptive Skip-cross Fusion for Road Detection
Authors:
Xinyu Zhang,
Yan Gong,
Zhiwei Li,
Xin Gao,
Dafeng Jin,
Jun Li,
Huaping Liu
Abstract:
Multi-modal fusion is increasingly being used for autonomous driving tasks, as images from different modalities provide unique information for feature extraction. However, the existing two-stream networks are only fused at a specific network layer, which requires a lot of manual attempts to set up. As the CNN goes deeper, the two modal features become more and more advanced and abstract, and the fusion occurs at the feature level with a large gap, which can easily hurt the performance. In this study, we propose a novel fusion architecture called skip-cross networks (SkipcrossNets), which adaptively combines LiDAR point clouds and camera images without being bound to a certain fusion epoch. Specifically, skip-cross connects each layer to each layer in a feed-forward manner, and for each layer, the feature maps of all previous layers are used as input and its own feature maps are used as input to all subsequent layers for the other modality, enhancing feature propagation and multi-modal feature fusion. This strategy facilitates selection of the most similar feature layers from two data pipelines, providing a complementary effect for sparse point cloud features during fusion processes. The network is also divided into several blocks to reduce the complexity of feature fusion and the number of model parameters. The advantages of skip-cross fusion were demonstrated through application to the KITTI and A2D2 datasets, achieving a MaxF score of 96.85% on KITTI and an F1 score of 84.84% on A2D2. The model parameters required only 2.33 MB of memory at a speed of 68.24 FPS, which could be viable for mobile terminals and embedded devices.
Submitted 24 August, 2023;
originally announced August 2023.