-
UrbanWorld: An Urban World Model for 3D City Generation
Authors:
Yu Shang,
Jiansheng Chen,
Hangyu Fan,
Jingtao Ding,
Jie Feng,
Yong Li
Abstract:
Cities, as the most fundamental environment of human life, encompass diverse physical elements such as buildings, roads and vegetation with complex interconnections. Crafting realistic, interactive 3D urban environments plays a crucial role in constructing AI agents capable of perceiving, decision-making, and acting like humans in real-world environments. However, creating high-fidelity 3D urban environments usually entails extensive manual labor from designers, involving intricate detailing and accurate representation of complex urban features. Accomplishing this automatically therefore remains a longstanding challenge. To address this problem, we propose UrbanWorld, the first generative urban world model that can automatically create a customized, realistic and interactive 3D urban world with flexible control conditions. UrbanWorld incorporates four key stages in its automatic crafting pipeline: 3D layout generation from openly accessible OSM data, urban scene planning and design with a powerful urban multimodal large language model (Urban MLLM), controllable urban asset rendering with advanced 3D diffusion techniques, and finally MLLM-assisted scene refinement. The crafted high-fidelity 3D urban environments enable realistic feedback and interactions for general AI and machine perceptual systems in simulations. We are working on contributing UrbanWorld as an open-source and versatile platform for evaluating and improving AI abilities in perception, decision-making, and interaction in realistic urban environments.
Submitted 16 July, 2024;
originally announced July 2024.
-
Robust Score-Based Quickest Change Detection
Authors:
Sean Moushegian,
Suya Wu,
Enmao Diao,
Jie Ding,
Taposh Banerjee,
Vahid Tarokh
Abstract:
Methods in the field of quickest change detection rapidly detect in real-time a change in the data-generating distribution of an online data stream. Existing methods have been able to detect this change point when the densities of the pre- and post-change distributions are known. Recent work has extended these results to the case where the pre- and post-change distributions are known only by their score functions. This work considers the case where the pre- and post-change score functions are known only to correspond to distributions in two disjoint sets. This work employs a pair of "least-favorable" distributions to robustify the existing score-based quickest change detection algorithm, the properties of which are studied. This paper calculates the least-favorable distributions for specific model classes and provides methods of estimating the least-favorable distributions for common constructions. Simulation results are provided demonstrating the performance of our robust change detection algorithm.
Submitted 14 July, 2024;
originally announced July 2024.
-
MedBench: A Comprehensive, Standardized, and Reliable Benchmarking System for Evaluating Chinese Medical Large Language Models
Authors:
Mianxin Liu,
Jinru Ding,
Jie Xu,
Weiguo Hu,
Xiaoyang Li,
Lifeng Zhu,
Zhian Bai,
Xiaoming Shi,
Benyou Wang,
Haitao Song,
Pengfei Liu,
Xiaofan Zhang,
Shanshan Wang,
Kang Li,
Haofen Wang,
Tong Ruan,
Xuanjing Huang,
Xin Sun,
Shaoting Zhang
Abstract:
Ensuring the general efficacy of medical large language models (LLMs), and their benefit to human beings, before real-world deployment is crucial. However, a widely accepted and accessible evaluation process for medical LLMs, especially in the Chinese context, remains to be established. In this work, we introduce "MedBench", a comprehensive, standardized, and reliable benchmarking system for Chinese medical LLMs. First, MedBench assembles what is currently the largest evaluation dataset (300,901 questions), covering 43 clinical specialties, and performs multi-facet evaluation of medical LLMs. Second, MedBench provides a standardized and fully automatic cloud-based evaluation infrastructure, with physical separation of questions and ground truth. Third, MedBench implements dynamic evaluation mechanisms to prevent shortcut learning and answer memorization. Applying MedBench to popular general and medical LLMs, we observe unbiased, reproducible evaluation results largely aligning with medical professionals' perspectives. This study establishes a significant foundation for preparing the practical applications of Chinese medical LLMs. MedBench is publicly accessible at https://medbench.opencompass.org.cn.
Submitted 23 June, 2024;
originally announced July 2024.
-
New Paradigm for Secure Full-Duplex Transmission: Movable Antenna-Aided Multi-User Systems
Authors:
Jingze Ding,
Zijian Zhou,
Bingli Jiao
Abstract:
In this paper, we investigate physical layer security (PLS) for full-duplex (FD) multi-user systems. To simultaneously protect uplink (UL) and downlink (DL) transmissions and ensure efficient use of time-frequency resources, we consider a base station (BS) that operates in FD mode and can emit artificial noise (AN). Conventional fixed-position antennas (FPAs) at the BS struggle to fully exploit spatial degrees of freedom (DoFs). Therefore, we propose a new paradigm for secure FD multi-user systems, where multiple transmit and receive movable antennas (MAs) are deployed at the BS to serve UL and DL users and effectively counter cooperative interception by multiple eavesdroppers (Eves). Specifically, the MA positions, the transmit, receive, and AN beamformers at the BS, and the UL powers are jointly optimized to maximize the sum of secrecy rates (SSR). To solve the challenging non-convex optimization problem with highly coupled variables, we propose an alternating optimization (AO) algorithm. This algorithm decomposes the original problem into three sub-problems, which are iteratively solved by the proposed multi-velocity particle swarm optimization (MVPSO) and successive convex approximation (SCA). Simulation results demonstrate that the proposed scheme for MA-aided secure FD multi-user systems can significantly enhance security performance compared to conventional FPA systems.
Submitted 14 July, 2024;
originally announced July 2024.
-
Entropy Increasing Numerical Methods for Prediction of Non-isothermal Electrokinetics in Supercapacitors
Authors:
Jie Ding,
Xiang Ji,
Shenggao Zhou
Abstract:
Accurate characterization of entropy plays a pivotal role in capturing reversible and irreversible heating in supercapacitors during charging/discharging cycles. However, numerical methods that can faithfully capture entropy variation in supercapacitors are still lacking. This work proposes a novel second-order accurate finite-volume scheme for a Poisson--Nernst--Planck--Fourier model developed in our previous work for the description of non-isothermal electrokinetics in supercapacitors. The temporal second-order accuracy with original entropy increase is achieved by a modified Crank-Nicolson discretization for the logarithm of both temperature and ionic concentrations. Numerical analysis rigorously proves that the proposed schemes are able to preserve ionic mass conservation and entropy increase for a closed, thermally insulated supercapacitor. Numerical positivity of temperature and ionic concentrations is guaranteed by using logarithmic transformations. Extensive numerical simulations show that the proposed schemes have the expected accuracy and robust performance in preserving the desired properties. Temperature oscillation in the charging/discharging processes is successfully predicted, unraveling a quadratic scaling law of temperature rising slope against voltage scanning rate. It is also demonstrated that the variation of ionic entropy contribution, which is the underlying mechanism responsible for reversible heating, is faithfully captured. Our work provides a promising tool for predicting non-isothermal electrokinetics of supercapacitors.
Submitted 13 July, 2024;
originally announced July 2024.
-
WizardMerge -- Save Us From Merging Without Any Clues
Authors:
Qingyu Zhang,
Junzhe Li,
Jiayi Lin,
Jie Ding,
Lanteng Lin,
Chenxiong Qian
Abstract:
Modern software development necessitates efficient version-oriented collaboration among developers. While Git is the most popular version control system, it generates unsatisfactory version merging results due to its text-based workflow, leading to potentially unexpected results in the merged version of the project. Although numerous merging tools have been proposed for improving merge results, developers still struggle to resolve conflicts and fix incorrectly modified code without clues. We present WizardMerge, an auxiliary tool that leverages merging results from Git to retrieve code block dependencies at the text and LLVM-IR levels and provide suggestions for developers to resolve errors introduced by textual merging. In our evaluation, we tested WizardMerge on 227 conflicts within five large-scale projects. The results demonstrate that WizardMerge reduces the time cost of conflict merging by 23.85%. Beyond addressing conflicts, WizardMerge provides merging suggestions for over 70% of the code blocks potentially affected by the conflicts. Notably, WizardMerge is able to identify conflict-unrelated code blocks that require manual intervention yet are harmfully applied by Git during merging.
Submitted 3 July, 2024;
originally announced July 2024.
-
A Comparative Study of Quality Evaluation Methods for Text Summarization
Authors:
Huyen Nguyen,
Haihua Chen,
Lavanya Pobbathi,
Junhua Ding
Abstract:
Evaluating text summarization has been a challenging task in natural language processing (NLP). Automatic metrics that rely heavily on reference summaries are not suitable in many situations, while human evaluation is time-consuming and labor-intensive. To bridge this gap, this paper proposes a novel method based on large language models (LLMs) for evaluating text summarization. We also conduct a comparative study of eight automatic metrics, human evaluation, and our proposed LLM-based method. Seven different types of state-of-the-art (SOTA) summarization models were evaluated. We perform extensive experiments and analysis on datasets of patent documents. Our results show that LLM evaluation aligns closely with human evaluation, while widely used automatic metrics such as ROUGE-2, BERTScore, and SummaC do not and also lack consistency. Based on the empirical comparison, we propose an LLM-powered framework for automatically evaluating and improving text summarization, which is beneficial and could attract wide attention among the community.
Submitted 30 June, 2024;
originally announced July 2024.
-
InfiniBench: A Comprehensive Benchmark for Large Multimodal Models in Very Long Video Understanding
Authors:
Kirolos Ataallah,
Chenhui Gou,
Eslam Abdelrahman,
Khushbu Pahwa,
Jian Ding,
Mohamed Elhoseiny
Abstract:
Understanding long videos, ranging from tens of minutes to several hours, presents unique challenges in video comprehension. Despite the increasing importance of long-form video content, existing benchmarks primarily focus on shorter clips. To address this gap, we introduce InfiniBench, a comprehensive benchmark for very long video understanding, which presents: 1) the longest video duration, averaging 76.34 minutes; 2) the largest number of question-answer pairs, 108.2K; 3) diversity in questions that examine nine different skills and include both multiple-choice and open-ended questions; 4) a human-centric design, as the video sources come from movies and daily TV shows, with specific human-level question designs such as Movie Spoiler Questions that require critical thinking and comprehensive understanding. Using InfiniBench, we comprehensively evaluate existing Large MultiModality Models (LMMs) on each skill, including the commercial model Gemini 1.5 Flash and open-source models. The evaluation shows that our benchmark poses significant challenges. Our results show that the best AI models, such as Gemini, struggle to perform well, with 42.72% average accuracy and an average score of 2.71 out of 5. We hope this benchmark will stimulate the LMM community towards long video and human-level understanding. Our benchmark can be accessed at https://vision-cair.github.io/InfiniBench/
Submitted 28 June, 2024;
originally announced June 2024.
-
A Survey on Data Quality Dimensions and Tools for Machine Learning
Authors:
Yuhan Zhou,
Fengjiao Tu,
Kewei Sha,
Junhua Ding,
Haihua Chen
Abstract:
Machine learning (ML) technologies have become substantial in practically all aspects of our society, and data quality (DQ) is critical for the performance, fairness, robustness, safety, and scalability of ML models. With the large and complex data in data-centric AI, traditional methods like exploratory data analysis (EDA) and cross-validation (CV) face challenges, highlighting the importance of mastering DQ tools. In this survey, we review 17 DQ evaluation and improvement tools in the last 5 years. By introducing the DQ dimensions, metrics, and main functions embedded in these tools, we compare their strengths and limitations and propose a roadmap for developing open-source DQ tools for ML. Based on the discussions on the challenges and emerging trends, we further highlight the potential applications of large language models (LLMs) and generative AI in DQ evaluation and improvement for ML. We believe this comprehensive survey can enhance understanding of DQ in ML and could drive progress in data-centric AI. A complete list of the literature investigated in this survey is available on GitHub at: https://github.com/haihua0913/awesome-dq4ml.
Submitted 27 June, 2024;
originally announced June 2024.
-
EHR-Based Mobile and Web Platform for Chronic Disease Risk Prediction Using Large Language Multimodal Models
Authors:
Chun-Chieh Liao,
Wei-Ting Kuo,
I-Hsuan Hu,
Yen-Chen Shih,
Jun-En Ding,
Feng Liu,
Fang-Ming Hung
Abstract:
Traditional diagnosis of chronic diseases involves in-person consultations with physicians to identify the disease. However, there is a lack of research focused on predicting and developing application systems using clinical notes and blood test values. We collected five years of Electronic Health Records (EHRs) from Taiwan's hospital database between 2017 and 2021 as an AI database. Furthermore, we developed an EHR-based chronic disease prediction platform utilizing Large Language Multimodal Models (LLMMs), successfully integrating with frontend web and mobile applications for prediction. This prediction platform can also connect to the hospital's backend database, providing physicians with real-time risk assessment diagnostics. The demonstration link can be found at https://www.youtube.com/watch?v=oqmL9DEDFgA.
Submitted 26 June, 2024;
originally announced June 2024.
-
Electrical switching of chirality in rhombohedral graphene Chern insulators
Authors:
Jing Ding,
Hanxiao Xiang,
Jiannan Hua,
Wenqiang Zhou,
Naitian Liu,
Le Zhang,
Na Xin,
Kenji Watanabe,
Takashi Taniguchi,
Wei Zhu,
Shuigang Xu
Abstract:
A Chern insulator hosts topologically protected chiral edge currents with quantized conductance characterized by its Chern number. Switching the chirality of the Chern insulator, namely, the direction of the edge current, is highly challenging due to topologically forbidden backscattering but is of considerable importance for the design of topological devices. Nevertheless, this can be achieved by reversing the sign of the Chern number through a topological phase transition. Here, we report electrically switchable chirality in rhombohedral heptalayer graphene-based Chern insulators. The surface flat band and giant Berry curvature in rhombohedral multilayer graphene provide a highly tunable platform for engineering the topological states. By introducing moire superlattices in rhombohedral heptalayer graphene, we observed a cascade of topological phase transitions at quarter electron filling of a moire band. The Chern number can be continuously tuned from 0, -1, 1 to 2 by electric fields, manifesting as a large anomalous Hall effect and following Streda's formula. Sign reversal and the anomalous Hall effect also occurred at non-integer fillings, suggesting the possibility of electrically tunable topological phase transitions within the regime of fractional Chern insulators. Our work establishes rhombohedral heptalayer graphene moire superlattices as a versatile platform for topological engineering. The realization of switchable chirality enhances the potential application of chiral edge currents in topological circuit interconnects.
Submitted 20 June, 2024;
originally announced June 2024.
-
VRSBench: A Versatile Vision-Language Benchmark Dataset for Remote Sensing Image Understanding
Authors:
Xiang Li,
Jian Ding,
Mohamed Elhoseiny
Abstract:
We introduce a new benchmark designed to advance the development of general-purpose, large-scale vision-language models for remote sensing images. Although several vision-language datasets in remote sensing have been proposed to pursue this goal, existing datasets are typically tailored to single tasks, lack detailed object information, or suffer from inadequate quality control. Exploring these improvement opportunities, we present a Versatile vision-language Benchmark for Remote Sensing image understanding, termed VRSBench. This benchmark comprises 29,614 images, with 29,614 human-verified detailed captions, 52,472 object references, and 123,221 question-answer pairs. It facilitates the training and evaluation of vision-language models across a broad spectrum of remote sensing image understanding tasks. We further evaluated state-of-the-art models on this benchmark for three vision-language tasks: image captioning, visual grounding, and visual question answering. Our work aims to significantly contribute to the development of advanced vision-language models in the field of remote sensing. The data and code can be accessed at https://github.com/lx709/VRSBench.
Submitted 18 June, 2024;
originally announced June 2024.
-
Experimenting with D-Wave Quantum Annealers on Prime Factorization problems
Authors:
Jingwen Ding,
Giuseppe Spallitta,
Roberto Sebastiani
Abstract:
This paper builds on a paper we published very recently, in which we proposed a novel approach to prime factorization (PF) by quantum annealing, where 8,219,999=32,749x251 was the highest prime product we were able to factorize -- which, to the best of our knowledge, is the largest number ever factorized by means of a quantum device. The series of annealing experiments which led us to these results, however, did not follow a straight-line path; rather, they involved a convoluted trial-and-error process, full of failed or partially failed attempts and backtracks, which only in the end drove us to find the successful annealing strategies. In this paper, we delve into the reasoning behind our experimental decisions and provide an account of some of the attempts we made before conceiving the final strategies that allowed us to achieve the results. This also involves a number of ideas, techniques, and strategies we investigated which, although they turned out to be inferior to those we adopted in the end, may provide insights to a more specialized audience of D-Wave users and practitioners. In particular, we show the following insights: (i) different initialization techniques affect performance, among which flux biases are effective when targeting locally-structured embeddings; (ii) chain strengths have a lower impact in locally-structured embeddings compared to problems relying on global embeddings; (iii) there is a trade-off between broken chains and excited CFAs, suggesting an incremental annealing offset remedy approach based on the modules instead of single qubits. Thus, by sharing the details of our experiences, we aim to provide insights into the evolving landscape of quantum annealing, and help people access and effectively use D-Wave quantum annealers.
Submitted 11 June, 2024;
originally announced June 2024.
-
iMotion-LLM: Motion Prediction Instruction Tuning
Authors:
Abdulwahab Felemban,
Eslam Mohamed Bakr,
Xiaoqian Shen,
Jian Ding,
Abduallah Mohamed,
Mohamed Elhoseiny
Abstract:
We introduce iMotion-LLM: a Multimodal Large Language Model (LLM) with trajectory prediction, tailored to guide interactive multi-agent scenarios. Different from conventional motion prediction approaches, iMotion-LLM capitalizes on textual instructions as key inputs for generating contextually relevant trajectories. By enriching the real-world driving scenarios in the Waymo Open Dataset with textual motion instructions, we created InstructWaymo. Leveraging this dataset, iMotion-LLM integrates a pretrained LLM, fine-tuned with LoRA, to translate scene features into the LLM input space. iMotion-LLM offers significant advantages over conventional motion prediction models. First, it can generate trajectories that align with the provided instruction when the instructed direction is feasible. Second, when given an infeasible direction, it can reject the instruction, thereby enhancing safety. These findings act as milestones in empowering autonomous navigation systems to interpret and predict the dynamics of multi-agent environments, laying the groundwork for future advancements in this field.
Submitted 11 June, 2024; v1 submitted 10 June, 2024;
originally announced June 2024.
-
A second-order accurate, original energy dissipative numerical scheme for chemotaxis and its convergence analysis
Authors:
Jie Ding,
Cheng Wang,
Shenggao Zhou
Abstract:
This paper proposes a second-order accurate numerical scheme for the Patlak-Keller-Segel system with various mobilities for the description of chemotaxis. Formulated in a variational structure, the entropy part is discretized by a novel modified Crank-Nicolson approach so that the solution to the proposed nonlinear scheme corresponds to a minimizer of a convex functional. A careful theoretical analysis establishes the unique solvability and positivity-preserving property of the scheme. More importantly, this second-order numerical scheme is able to preserve the dissipative property of the original energy functional, instead of a modified one. To the best of our knowledge, the proposed scheme is the first second-order accurate one in the literature that achieves both numerical positivity and original energy dissipation. In addition, an optimal rate convergence estimate is provided for the proposed scheme, in which rough and refined error estimate techniques have to be included to accomplish such an analysis. Ample numerical results are presented to demonstrate the robust performance of the proposed scheme in preserving positivity and original energy dissipation in blowup simulations.
Submitted 6 June, 2024;
originally announced June 2024.
-
Frequency Enhanced Pre-training for Cross-city Few-shot Traffic Forecasting
Authors:
Zhanyu Liu,
Jianrong Ding,
Guanjie Zheng
Abstract:
The field of Intelligent Transportation Systems (ITS) relies on accurate traffic forecasting to enable various downstream applications. However, developing cities often face challenges in collecting sufficient training traffic data due to limited resources and outdated infrastructure. Recognizing this obstacle, the concept of cross-city few-shot forecasting has emerged as a viable approach. While previous cross-city few-shot forecasting methods ignore the frequency similarity between cities, we have made an observation that the traffic data is more similar in the frequency domain between cities. Based on this fact, we propose a Frequency Enhanced Pre-training Framework for Cross-city Few-shot Forecasting (FEPCross). FEPCross has a pre-training stage and a fine-tuning stage. In the pre-training stage, we propose a novel Cross-Domain Spatial-Temporal Encoder that incorporates the information of the time and frequency domain and trains it with self-supervised tasks encompassing reconstruction and contrastive objectives. In the fine-tuning stage, we design modules to enrich training samples and maintain a momentum-updated graph structure, thereby mitigating the risk of overfitting to the few-shot training data. Empirical evaluations performed on real-world traffic datasets validate the exceptional efficacy of FEPCross, outperforming existing approaches of diverse categories and demonstrating characteristics that foster the progress of cross-city few-shot forecasting.
Submitted 5 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
One-arm Probabilities for Metric Graph Gaussian Free Fields below and at the Critical Dimension
Authors:
Zhenhao Cai,
Jian Ding
Abstract:
For the critical level-set of the Gaussian free field on the metric graph of $\mathbb Z^d$, we consider the one-arm probability $θ_d(N)$, i.e., the probability that the boundary of a box of side length $2N$ is connected to the center. We prove that $θ_d(N)$ is $O(N^{-\frac{d}{2}+1})$ for $3\le d\le 5$, and is $N^{-2+o(1)}$ for $d=6$. Our upper bounds match the lower bounds in a previous work by Ding and Wirth up to a constant factor for $3\le d\le 5$, and match the exponent therein for $d=6$. Combined with our previous result that $θ_d(N) \asymp N^{-2}$ for $d>6$, this seems to present the first percolation model whose one-arm probabilities are essentially completely understood in all dimensions. In particular, these results fully confirm Werner's conjectures (2021) on the one-arm exponents:
\begin{equation*}
\text{(1) for}\ 3\le d<d_c=6,\ θ_d(N)=N^{-\frac{d}{2}+1+o(1)};\ \text{(2) for}\ d>d_c,\ θ_d(N)=N^{-2+o(1)}.
\end{equation*}
Prior to our work, Drewitz, Prévost and Rodriguez obtained upper bounds for $d\in \{3, 4\}$, which are very sharp although they lose some diverging factors. In the same work, they conjectured that $θ_{d_c}(N) = N^{-2+o(1)}$, which is now established. In addition, in a recent concurrent work, Drewitz, Prévost and Rodriguez independently obtained the up-to-constant upper bound for $d=3$.
Submitted 12 July, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
CondTSF: One-line Plugin of Dataset Condensation for Time Series Forecasting
Authors:
Jianrong Ding,
Zhanyu Liu,
Guanjie Zheng,
Haiming Jin,
Linghe Kong
Abstract:
Dataset condensation is a newborn technique that generates a small dataset that can be used in training deep neural networks to lower training costs. The objective of dataset condensation is to ensure that the model trained with the synthetic dataset can perform comparably to the model trained with full datasets. However, existing methods predominantly concentrate on classification tasks, posing challenges in their adaptation to time series forecasting (TS-forecasting). This challenge arises from disparities in the evaluation of synthetic data. In classification, the synthetic data is considered well-distilled if the model trained with the full dataset and the model trained with the synthetic dataset yield identical labels for the same input, regardless of variations in output logits distribution. Conversely, in TS-forecasting, the effectiveness of synthetic data distillation is determined by the distance between predictions of the two models. The synthetic data is deemed well-distilled only when all data points within the predictions are similar. Consequently, TS-forecasting has a more rigorous evaluation methodology compared to classification. To mitigate this gap, we theoretically analyze the optimization objective of dataset condensation for TS-forecasting and propose a new one-line plugin of dataset condensation designated as Dataset Condensation for Time Series Forecasting (CondTSF) based on our analysis. Plugging CondTSF into previous dataset condensation methods facilitates a reduction in the distance between the predictions of the model trained with the full dataset and the model trained with the synthetic dataset, thereby enhancing performance. We conduct extensive experiments on eight commonly used time series datasets. CondTSF consistently improves the performance of all previous dataset condensation methods across all datasets, particularly at low condensing ratios.
Submitted 11 June, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
Robust Stable Spiking Neural Networks
Authors:
Jianhao Ding,
Zhiyu Pan,
Yujia Liu,
Zhaofei Yu,
Tiejun Huang
Abstract:
Spiking neural networks (SNNs) are gaining popularity in deep learning due to their low energy budget on neuromorphic hardware. However, they still lack sufficient robustness for safety-critical applications such as autonomous driving. Many studies have been conducted to defend SNNs from the threat of adversarial attacks. This paper aims to uncover the robustness of SNNs through the lens of the stability of nonlinear systems. We are inspired by the fact that searching for parameters altering the leaky integrate-and-fire dynamics can enhance their robustness. Thus, we dive into the dynamics of membrane potential perturbation and simplify the formulation of the dynamics. We show that membrane potential perturbation dynamics can reliably convey the intensity of perturbation. Our theoretical analyses imply that the simplified perturbation dynamics satisfy input-output stability. Thus, we propose a training framework with modified SNN neurons that reduces the mean square of membrane potential perturbation, aiming to enhance the robustness of SNNs. Finally, we experimentally verify the effectiveness of the framework in the setting of Gaussian noise training and adversarial training on the image classification task.
Submitted 31 May, 2024;
originally announced May 2024.
-
Enhancing Adversarial Robustness in SNNs with Sparse Gradients
Authors:
Yujia Liu,
Tong Bu,
Jianhao Ding,
Zecheng Hao,
Tiejun Huang,
Zhaofei Yu
Abstract:
Spiking Neural Networks (SNNs) have attracted great attention for their energy-efficient operations and biologically inspired structures, offering potential advantages over Artificial Neural Networks (ANNs) in terms of energy efficiency and interpretability. Nonetheless, similar to ANNs, the robustness of SNNs remains a challenge, especially when facing adversarial attacks. Existing techniques, whether adapted from ANNs or specifically designed for SNNs, exhibit limitations in training SNNs or defending against strong attacks. In this paper, we propose a novel approach to enhance the robustness of SNNs through gradient sparsity regularization. We observe that SNNs exhibit greater resilience to random perturbations compared to adversarial perturbations, even at larger scales. Motivated by this, we aim to narrow the gap between SNNs under adversarial and random perturbations, thereby improving their overall robustness. To achieve this, we theoretically prove that this performance gap is upper bounded by the gradient sparsity of the probability associated with the true label concerning the input image, laying the groundwork for a practical strategy to train robust SNNs by regularizing the gradient sparsity. We validate the effectiveness of our approach through extensive experiments on both image-based and event-based datasets. The results demonstrate notable improvements in the robustness of SNNs. Our work highlights the importance of gradient sparsity in SNNs and its role in enhancing robustness.
Submitted 30 May, 2024;
originally announced May 2024.
-
DevEval: A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories
Authors:
Jia Li,
Ge Li,
Yunfei Zhao,
Yongmin Li,
Huanyu Liu,
Hao Zhu,
Lecheng Wang,
Kaibo Liu,
Zheng Fang,
Lanshen Wang,
Jiazheng Ding,
Xuanming Zhang,
Yuqi Zhu,
Yihong Dong,
Zhi Jin,
Binhua Li,
Fei Huang,
Yongbin Li
Abstract:
How to evaluate the coding abilities of Large Language Models (LLMs) remains an open question. We find that existing benchmarks are poorly aligned with real-world code repositories and are insufficient to evaluate the coding abilities of LLMs.
To address the knowledge gap, we propose a new benchmark named DevEval, which has three advances. (1) DevEval aligns with real-world repositories in multiple dimensions, e.g., code distributions and dependency distributions. (2) DevEval is annotated by 13 developers and contains comprehensive annotations (e.g., requirements, original repositories, reference code, and reference dependencies). (3) DevEval comprises 1,874 testing samples from 117 repositories, covering 10 popular domains (e.g., Internet, Database). Based on DevEval, we propose repository-level code generation and evaluate 8 popular LLMs on DevEval (e.g., gpt-4, gpt-3.5, StarCoder 2, DeepSeek Coder, CodeLLaMa). Our experiments reveal these LLMs' coding abilities in real-world code repositories. For example, in our experiments, the highest Pass@1 of gpt-4-turbo is only 53.04%. We also analyze LLMs' failed cases and summarize their shortcomings. We hope DevEval can facilitate the development of LLMs in real code repositories. DevEval, prompts, and LLMs' predictions have been released.
Submitted 30 May, 2024;
originally announced May 2024.
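The DevEval abstract above reports a Pass@1 score but does not say how it is computed. As a hedged illustration only, the sketch below implements the widely used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), for n sampled completions of which c pass the tests. The function name `pass_at_k` and the sample counts are hypothetical and are not taken from DevEval.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: completions sampled, c: completions that pass all tests,
    k: evaluation budget. Assumption: this is the common estimator,
    not necessarily the exact procedure used by DevEval.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k completions fail,
    # so the estimate is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical counts: 10 samples, 3 of which pass.
    print(pass_at_k(n=10, c=3, k=1))
    print(pass_at_k(n=10, c=3, k=5))
```

For k = 1 the estimator reduces to c / n, the fraction of sampled completions that pass. A benchmark-level score is then the mean of the per-problem values.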
-
AI Risk Management Should Incorporate Both Safety and Security
Authors:
Xiangyu Qi,
Yangsibo Huang,
Yi Zeng,
Edoardo Debenedetti,
Jonas Geiping,
Luxi He,
Kaixuan Huang,
Udari Madhushani,
Vikash Sehwag,
Weijia Shi,
Boyi Wei,
Tinghao Xie,
Danqi Chen,
Pin-Yu Chen,
Jeffrey Ding,
Ruoxi Jia,
Jiaqi Ma,
Arvind Narayanan,
Weijie J Su,
Mengdi Wang,
Chaowei Xiao,
Bo Li,
Dawn Song,
Peter Henderson,
Prateek Mittal
Abstract:
The exposure of security vulnerabilities in safety-aligned language models, e.g., susceptibility to adversarial attacks, has shed light on the intricate interplay between AI safety and AI security. Although the two disciplines now come together under the overarching goal of AI risk management, they have historically evolved separately, giving rise to differing perspectives. Therefore, in this paper, we advocate that stakeholders in AI risk management should be aware of the nuances, synergies, and interplay between safety and security, and unambiguously take into account the perspectives of both disciplines in order to devise the most effective and holistic risk mitigation approaches. Unfortunately, this vision is often obfuscated, as the definitions of the basic concepts of "safety" and "security" themselves are often inconsistent and lack consensus across communities. With AI risk management being increasingly cross-disciplinary, this issue is particularly salient. In light of this conceptual challenge, we introduce a unified reference framework to clarify the differences and interplay between AI safety and AI security, aiming to facilitate a shared understanding and effective collaboration across communities.
Submitted 29 May, 2024;
originally announced May 2024.
-
Kestrel: Point Grounding Multimodal LLM for Part-Aware 3D Vision-Language Understanding
Authors:
Junjie Fei,
Mahmoud Ahmed,
Jian Ding,
Eslam Mohamed Bakr,
Mohamed Elhoseiny
Abstract:
While 3D MLLMs have achieved significant progress, they are restricted to object and scene understanding and struggle to understand 3D spatial structures at the part level. In this paper, we introduce Kestrel, a novel approach that empowers 3D MLLMs with part-aware understanding, enabling better interpretation and segmentation grounding of 3D objects at the part level. Despite its significance, the current landscape lacks tasks and datasets that endow and assess this capability. Therefore, we propose two novel tasks: (1) Part-Aware Point Grounding, in which the model is tasked with directly predicting a part-level segmentation mask based on user instructions, and (2) Part-Aware Point Grounded Captioning, in which the model provides a detailed caption that includes part-level descriptions and their corresponding masks. To support learning and evaluation for these tasks, we introduce the 3DCoMPaT Grounded Instructions Dataset (3DCoMPaT-GRIN). 3DCoMPaT-GRIN Vanilla, comprising 789k part-aware point cloud-instruction-segmentation mask triplets, is used to evaluate MLLMs' ability of part-aware segmentation grounding. 3DCoMPaT-GRIN Grounded Caption, containing 107k part-aware point cloud-instruction-grounded caption triplets, assesses both MLLMs' part-aware language comprehension and segmentation grounding capabilities. Our introduced tasks, dataset, and Kestrel represent a preliminary effort to bridge the gap between human cognition and 3D MLLMs, i.e., the ability to perceive and engage with the environment at both global and part levels. Extensive experiments on 3DCoMPaT-GRIN show that Kestrel can generate user-specified segmentation masks, a capability not present in any existing 3D MLLM. Kestrel thus establishes a benchmark for evaluating the part-aware language comprehension and segmentation grounding of 3D objects. Project page at https://feielysia.github.io/Kestrel.github.io/
Submitted 29 May, 2024;
originally announced May 2024.
-
Unified Low-rank Compression Framework for Click-through Rate Prediction
Authors:
Hao Yu,
Minghao Fu,
Jiandong Ding,
Yusheng Zhou,
Jianxin Wu
Abstract:
Deep Click-Through Rate (CTR) prediction models play an important role in modern industrial recommendation scenarios. However, high memory overhead and computational costs limit their deployment in resource-constrained environments. Low-rank approximation is an effective method for computer vision and natural language processing models, but its application in compressing CTR prediction models has been less explored. Due to the limited memory and computing resources, compression of CTR prediction models often confronts three fundamental challenges: (1) How to reduce the model sizes to adapt to edge devices? (2) How to speed up CTR prediction model inference? (3) How to retain the capabilities of original models after compression? Previous low-rank compression research mostly uses tensor decomposition, which can achieve a high parameter compression ratio, but brings in AUC degradation and additional computing overhead. To address these challenges, we propose a unified low-rank decomposition framework for compressing CTR prediction models. We find that even with the most classic matrix decomposition method, SVD, our framework can achieve better performance than the original model. To further improve the effectiveness of our framework, we locally compress the output features instead of compressing the model weights. Our unified low-rank compression framework can be applied to embedding tables and MLP layers in various CTR prediction models. Extensive experiments on two academic datasets and one real industrial benchmark demonstrate that, with 3-5x model size reduction, our compressed models can achieve both faster inference and higher AUC than the uncompressed original models. Our code is at https://github.com/yuhao318/Atomic_Feature_Mimicking.
Submitted 11 June, 2024; v1 submitted 28 May, 2024;
originally announced May 2024.
-
Private Edge Density Estimation for Random Graphs: Optimal, Efficient and Robust
Authors:
Hongjie Chen,
Jingqiu Ding,
Yiding Hua,
David Steurer
Abstract:
We give the first polynomial-time, differentially node-private, and robust algorithm for estimating the edge density of Erdős-Rényi random graphs and their generalization, inhomogeneous random graphs. We further prove information-theoretical lower bounds, showing that the error rate of our algorithm is optimal up to logarithmic factors. Previous algorithms incur either exponential running time or suboptimal error rates.
Two key ingredients of our algorithm are (1) a new sum-of-squares algorithm for robust edge density estimation, and (2) the reduction from privacy to robustness based on sum-of-squares exponential mechanisms due to Hopkins et al. (STOC 2023).
Submitted 3 June, 2024; v1 submitted 26 May, 2024;
originally announced May 2024.
-
Adaptive Wireless Image Semantic Transmission and Over-The-Air Testing
Authors:
Jiarun Ding,
Peiwen Jiang,
Chao-Kai Wen,
Shi Jin
Abstract:
Semantic communication has undergone considerable evolution due to the recent rapid development of artificial intelligence (AI), significantly enhancing both communication robustness and efficiency. Despite these advancements, most current semantic communication methods for image transmission pay little attention to the differing importance of objects and backgrounds in images. To address this issue, we propose a novel scheme named ASCViT-JSCC, which utilizes vision transformers (ViTs) integrated with an orthogonal frequency division multiplexing (OFDM) system. This scheme adaptively allocates bandwidth for objects and backgrounds in images according to the importance order of different parts determined by object detection of you only look once version 5 (YOLOv5) and feature points detection of scale invariant feature transform (SIFT). Furthermore, the proposed scheme adheres to digital modulation standards by incorporating quantization modules. We validate this approach through an over-the-air (OTA) testbed named intelligent communication prototype validation platform (ICP) based on a software-defined radio (SDR) and NVIDIA embedded kits. Our findings from both simulations and practical measurements show that ASCViT-JSCC significantly preserves objects in images and enhances reconstruction quality compared to existing methods.
Submitted 22 May, 2024;
originally announced May 2024.
-
The AI Community Building the Future? A Quantitative Analysis of Development Activity on Hugging Face Hub
Authors:
Cailean Osborne,
Jennifer Ding,
Hannah Rose Kirk
Abstract:
Open model developers have emerged as key actors in the political economy of artificial intelligence (AI), but we still have a limited understanding of collaborative practices in the open AI ecosystem. This paper responds to this gap with a three-part quantitative analysis of development activity on the Hugging Face (HF) Hub, a popular platform for building, sharing, and demonstrating models. First, various types of activity across 348,181 model, 65,761 dataset, and 156,642 space repositories exhibit right-skewed distributions. Activity is extremely imbalanced between repositories; for example, over 70% of models have 0 downloads, while 1% account for 99% of downloads. Furthermore, licenses matter: there are statistically significant differences in collaboration patterns in model repositories with permissive, restrictive, and no licenses. Second, we analyse a snapshot of the social network structure of collaboration in model repositories, finding that the community has a core-periphery structure, with a core of prolific developers and a majority of isolate developers (89%). Upon removing the isolate developers from the network, collaboration is characterised by high reciprocity regardless of developers' network positions. Third, we examine model adoption through the lens of model usage in spaces, finding that a minority of models, developed by a handful of companies, are widely used on the HF Hub. Overall, activity on the HF Hub is characterised by Pareto distributions, congruent with OSS development patterns on platforms like GitHub. We conclude with recommendations for researchers, companies, and policymakers to advance our understanding of open AI development.
Submitted 5 June, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
-
Engineering band structures of two-dimensional materials with remote moire ferroelectricity
Authors:
Jing Ding,
Hanxiao Xiang,
Wenqiang Zhou,
Naitian Liu,
Xinjie Fang,
Kangyu Wang,
Linfeng Wu,
Kenji Watanabe,
Takashi Taniguchi,
Shuigang Xu
Abstract:
The stacking order and twist angle provide abundant opportunities for engineering band structures of two-dimensional materials, including the formation of moire bands, flat bands, and topologically nontrivial bands. The inversion symmetry breaking in rhombohedral-stacked transition metal dichalcogenides (TMDCs) endows them with an interfacial ferroelectricity associated with an out-of-plane electric polarization. By utilizing twist angle as a knob to construct rhombohedral-stacked TMDCs, antiferroelectric domain networks with alternating out-of-plane polarization can be generated. Here, we demonstrate that such spatially periodic ferroelectric polarizations in parallel-stacked twisted WSe2 can imprint their moire potential onto a remote bilayer graphene. This remote moire potential gives rise to pronounced satellite resistance peaks beside the charge-neutrality point in graphene, which are tunable by the twist angle of WSe2. Our observations of ferroelectric hysteresis at finite displacement fields suggest the moire is delivered by a long-range electrostatic potential. The superlattices constructed by moire ferroelectricity represent a highly flexible approach, as they involve the separation of the moire construction layer from the electronic transport layer. This remote moire is identified as a weak potential and can coexist with conventional moire. Our results offer a comprehensive strategy for engineering band structures and properties of two-dimensional materials by utilizing moire ferroelectricity.
Submitted 21 May, 2024;
originally announced May 2024.
-
Large band-splitting in $g$-wave type altermagnet CrSb
Authors:
Jianyang Ding,
Zhicheng Jiang,
Xiuhua Chen,
Zicheng Tao,
Zhengtai Liu,
Jishan Liu,
Tongrui Li,
Jiayu Liu,
Yichen Yang,
Runfeng Zhang,
Liwei Deng,
Wenchuan Jing,
Yu Huang,
Yuming Shi,
Shan Qiao,
Yilin Wang,
Yanfeng Guo,
Donglai Feng,
Dawei Shen
Abstract:
Altermagnetism (AM), a newly discovered magnetic state, ingeniously integrates the properties of ferromagnetism and antiferromagnetism, representing a significant breakthrough in the field of magnetic materials. Despite experimental verification of some typical AM materials, such as MnTe and MnTe$_2$, the pursuit of AM materials that feature larger spin splitting and higher transition temperature is still essential. Here, our research focuses on CrSb, which possesses a Néel temperature of up to 700 K and giant spin splitting near the Fermi level ($E_F$). Utilizing high-resolution angle-resolved photoemission spectroscopy and density functional theory calculations, we meticulously map the three-dimensional electronic structure of CrSb. Our photoemission spectroscopic results on both (0001) and (10$\overline{1}$0) cleavages of CrSb together reveal unprecedented details on AM-induced band splitting, and subsequently pin down its unique bulk $g$-wave symmetry through quantitative analysis of the angular and photon-energy dependence of spin splitting. Moreover, the observed spin splitting reaches a magnitude of 0.93 eV near $E_F$, the most substantial among all confirmed AM materials. This study not only validates the nature of CrSb as a prototype $g$-wave-like AM material but also underscores its pivotal role in pioneering applications in spintronics.
Submitted 21 May, 2024;
originally announced May 2024.
-
Imp: Highly Capable Large Multimodal Models for Mobile Devices
Authors:
Zhenwei Shao,
Zhou Yu,
Jun Yu,
Xuecheng Ouyang,
Lihao Zheng,
Zhenbiao Gai,
Mingyang Wang,
Jiajun Ding
Abstract:
By harnessing the capabilities of large language models (LLMs), recent large multimodal models (LMMs) have shown remarkable versatility in open-world multimodal understanding. Nevertheless, they are usually parameter-heavy and computation-intensive, thus hindering their applicability in resource-constrained scenarios. To this end, several lightweight LMMs have been proposed successively to maximize the capabilities under constrained scale (e.g., 3B). Despite the encouraging results achieved by these methods, most of them only focus on one or two aspects of the design space, and the key design choices that influence model capability have not yet been thoroughly investigated. In this paper, we conduct a systematic study for lightweight LMMs from the aspects of model architecture, training strategy, and training data. Based on our findings, we obtain Imp -- a family of highly capable LMMs at the 2B-4B scales. Notably, our Imp-3B model steadily outperforms all the existing lightweight LMMs of similar size, and even surpasses the state-of-the-art LMMs at the 13B scale. With low-bit quantization and resolution reduction techniques, our Imp model can be deployed on a Qualcomm Snapdragon 8Gen3 mobile chip with a high inference speed of about 13 tokens/s.
Submitted 29 May, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
-
NTTSuite: Number Theoretic Transform Benchmarks for Accelerating Encrypted Computation
Authors:
Juran Ding,
Yuanzhe Liu,
Lingbin Sun,
Brandon Reagen
Abstract:
Privacy concerns have thrust privacy-preserving computation into the spotlight. Homomorphic encryption (HE) is a cryptographic system that enables computation to occur directly on encrypted data, providing users with strong privacy (and security) guarantees while using the same services they enjoy today unprotected. While promising, HE has seen little adoption due to extremely high computational overheads, rendering it impractical. In this paper we develop a benchmark suite, named NTTSuite, to enable researchers to better address these overheads by studying the primary source of HE's slowdown: the number theoretic transform (NTT). NTTSuite comprises seven unique NTT algorithms with support for CPUs (C++), GPUs (CUDA), and custom hardware (Catapult HLS). In addition, we propose optimizations to improve the performance of NTT running on FPGAs. We find our implementation outperforms the state-of-the-art by 30%.
Submitted 18 May, 2024;
originally announced May 2024.
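For readers unfamiliar with the number theoretic transform named in the NTTSuite abstract above, the following is a minimal Python sketch of one textbook variant: an iterative radix-2 Cooley-Tukey NTT. It is not taken from NTTSuite and does not reproduce any of its seven algorithms. The modulus 998244353 with primitive root 3 and the helper names `ntt` and `poly_multiply` are illustrative choices.

```python
MOD = 998244353  # prime equal to 119 * 2**23 + 1 (illustrative choice)
ROOT = 3         # a primitive root modulo MOD


def ntt(values, invert=False):
    """Return the NTT (or inverse NTT) of values; length must be a power of two."""
    a = list(values)
    n = len(a)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    # Reorder the input by bit-reversed index.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # Butterfly passes over blocks of doubling length.
    length = 2
    while length <= n:
        w_len = pow(ROOT, (MOD - 1) // length, MOD)
        if invert:
            w_len = pow(w_len, MOD - 2, MOD)  # modular inverse via Fermat
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u = a[k]
                v = a[k + half] * w % MOD
                a[k] = (u + v) % MOD
                a[k + half] = (u - v) % MOD
                w = w * w_len % MOD
        length <<= 1
    if invert:
        n_inv = pow(n, MOD - 2, MOD)
        a = [x * n_inv % MOD for x in a]
    return a


def poly_multiply(p, q):
    """Multiply two non-empty coefficient lists modulo MOD using the NTT."""
    out_len = len(p) + len(q) - 1
    size = 1
    while size < out_len:
        size <<= 1
    fp = ntt(list(p) + [0] * (size - len(p)))
    fq = ntt(list(q) + [0] * (size - len(q)))
    product = ntt([x * y % MOD for x, y in zip(fp, fq)], invert=True)
    return product[:out_len]


if __name__ == "__main__":
    # (1 + 2x + 3x^2) * (4 + 5x)
    print(poly_multiply([1, 2, 3], [4, 5]))
```

The pointwise product in the transformed domain is what makes the transform attractive for the polynomial arithmetic that HE schemes rely on: multiplication costs O(n log n) operations instead of O(n^2).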
-
Inner-approximate Reachability Computation via Zonotopic Boundary Analysis
Authors:
Dejin Ren,
Zhen Liang,
Chenyu Wu,
Jianqiang Ding,
Taoran Wu,
Bai Xue
Abstract:
Inner-approximate reachability analysis involves calculating subsets of reachable sets, known as inner-approximations. This analysis is crucial in the fields of dynamic systems analysis and control theory as it provides a reliable estimation of the set of states that a system can reach from given initial states at a specific time instant. In this paper, we study the inner-approximate reachability analysis problem based on the set-boundary reachability method for systems modelled by ordinary differential equations, in which the computed inner-approximations are represented with zonotopes. The set-boundary reachability method computes an inner-approximation by excluding states reached from the initial set's boundary. The effectiveness of this method is highly dependent on the efficient extraction of the exact boundary of the initial set. To address this, we propose methods leveraging boundary and tiling matrices that can efficiently extract and refine the exact boundary of the initial set represented by zonotopes. Additionally, we enhance the exclusion strategy by contracting the outer-approximations in a flexible way, which allows for the computation of less conservative inner-approximations. To evaluate the proposed method, we compare it with state-of-the-art methods against a series of benchmarks. The numerical results demonstrate that our method is not only efficient but also accurate in computing inner-approximations.
Submitted 21 May, 2024; v1 submitted 17 May, 2024;
originally announced May 2024.
-
When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models
Authors:
Xianzheng Ma,
Yash Bhalgat,
Brandon Smart,
Shuai Chen,
Xinghui Li,
Jian Ding,
Jindong Gu,
Dave Zhenyu Chen,
Songyou Peng,
Jia-Wang Bian,
Philip H Torr,
Marc Pollefeys,
Matthias Nießner,
Ian D Reid,
Angel X. Chang,
Iro Laina,
Victor Adrian Prisacariu
Abstract:
As large language models (LLMs) evolve, their integration with 3D spatial data (3D-LLMs) has seen rapid progress, offering unprecedented capabilities for understanding and interacting with physical spaces. This survey provides a comprehensive overview of the methodologies enabling LLMs to process, understand, and generate 3D data. Highlighting the unique advantages of LLMs, such as in-context learning, step-by-step reasoning, open-vocabulary capabilities, and extensive world knowledge, we underscore their potential to significantly advance spatial comprehension and interaction within embodied Artificial Intelligence (AI) systems. Our investigation spans various 3D data representations, from point clouds to Neural Radiance Fields (NeRFs). It examines their integration with LLMs for tasks such as 3D scene understanding, captioning, question-answering, and dialogue, as well as LLM-based agents for spatial reasoning, planning, and navigation. The paper also includes a brief review of other methods that integrate 3D and language. The meta-analysis presented in this paper reveals significant progress yet underscores the necessity for novel approaches to harness the full potential of 3D-LLMs. Hence, with this paper, we aim to chart a course for future research that explores and expands the capabilities of 3D-LLMs in understanding and interacting with the complex 3D world. To support this survey, we have established a project page where papers related to our topic are organized and listed: https://github.com/ActiveVisionLab/Awesome-LLM-3D.
Submitted 16 May, 2024;
originally announced May 2024.
-
Distilling Implicit Multimodal Knowledge into LLMs for Zero-Resource Dialogue Generation
Authors:
Bo Zhang,
Hui Ma,
Jian Ding,
Jian Wang,
Bo Xu,
Hongfei Lin
Abstract:
Integrating multimodal knowledge into large language models (LLMs) represents a significant advancement in dialogue generation capabilities. However, the effective incorporation of such knowledge in zero-resource scenarios remains a substantial challenge due to the scarcity of diverse, high-quality dialogue datasets. To address this, we propose the Visual Implicit Knowledge Distillation Framework (VIKDF), an innovative approach aimed at enhancing LLMs for enriched dialogue generation in zero-resource contexts by leveraging implicit multimodal knowledge. VIKDF comprises two main stages: knowledge distillation, using an Implicit Query Transformer to extract and encode visual implicit knowledge from image-text pairs into knowledge vectors; and knowledge integration, employing a novel Bidirectional Variational Information Fusion technique to seamlessly integrate these distilled vectors into LLMs. This enables the LLMs to generate dialogues that are not only coherent and engaging but also exhibit a deep understanding of the context through implicit multimodal cues, effectively overcoming the limitations of zero-resource scenarios. Our extensive experimentation across two dialogue datasets shows that VIKDF outperforms existing state-of-the-art models in generating high-quality dialogues. The code will be publicly available following acceptance.
Submitted 16 May, 2024;
originally announced May 2024.
-
Additive-Effect Assisted Learning
Authors:
Jiawei Zhang,
Yuhong Yang,
Jie Ding
Abstract:
It is quite popular nowadays for researchers and data analysts holding different datasets to seek assistance from each other to enhance their modeling performance. We consider a scenario where different learners hold datasets with potentially distinct variables, and their observations can be aligned by a nonprivate identifier. Their collaboration faces the following difficulties: First, learners may need to keep data values or even variable names undisclosed due to, e.g., commercial interest or privacy regulations; second, there are restrictions on the number of transmission rounds between them due to e.g., communication costs. To address these challenges, we develop a two-stage assisted learning architecture for an agent, Alice, to seek assistance from another agent, Bob. In the first stage, we propose a privacy-aware hypothesis testing-based screening method for Alice to decide on the usefulness of the data from Bob, in a way that only requires Bob to transmit sketchy data. Once Alice recognizes Bob's usefulness, Alice and Bob move to the second stage, where they jointly apply a synergistic iterative model training procedure. With limited transmissions of summary statistics, we show that Alice can achieve the oracle performance as if the training were from centralized data, both theoretically and numerically.
Submitted 13 May, 2024;
originally announced May 2024.
-
Predictive Modeling of Flexible EHD Pumps using Kolmogorov-Arnold Networks
Authors:
Yanhong Peng,
Miao He,
Fangchao Hu,
Zebing Mao,
Xia Huang,
Jun Ding
Abstract:
We present a novel approach to predicting the pressure and flow rate of flexible electrohydrodynamic (EHD) pumps using the Kolmogorov-Arnold Network (KAN). Inspired by the Kolmogorov-Arnold representation theorem, KAN replaces fixed activation functions with learnable spline-based activation functions, enabling it to approximate complex nonlinear functions more effectively than traditional models like Multi-Layer Perceptron (MLP) and Random Forest (RF). We evaluated KAN on a dataset of flexible EHD pump parameters and compared its performance against RF and MLP models. KAN achieved superior predictive accuracy, with Mean Squared Errors of 12.186 and 0.001 for pressure and flow rate predictions, respectively. The symbolic formulas extracted from KAN provided insights into the nonlinear relationships between input parameters and pump performance. These findings demonstrate that KAN offers exceptional accuracy and interpretability, making it a promising alternative for predictive modeling in electrohydrodynamic pumping.
Submitted 13 May, 2024;
originally announced May 2024.
-
High-Order Synchrosqueezed Chirplet Transforms for Multicomponent Signal Analysis
Authors:
Yi-Ju Yen,
De-Yan Lu,
Sing-Yuan Yeh,
Jian-Jiun Ding,
Chun-Yen Shen
Abstract:
This study focuses on the analysis of signals containing multiple components with crossover instantaneous frequencies (IF). This problem was initially solved with the chirplet transform (CT). The result can be sharpened by adding a synchrosqueezing step, which gives the synchrosqueezed chirplet transform (SCT). However, we found that the SCT fails for signals with high chirp modulation because it estimates the IF incorrectly. In this paper, we improve the post-transformation of the CT. The main goal is to correct the estimation introduced in the SCT and to carry out the high-order synchrosqueezed chirplet transform. The proposed method reduces the estimation error for a wider variety of chirp-modulated multi-component signals. A theoretical analysis of the new reassignment ingredient is provided. Numerical experiments on some synthetic signals are presented to verify the effectiveness of the proposed high-order SCT.
Submitted 11 May, 2024;
originally announced May 2024.
-
Polynomial lower bound on the effective resistance for the one-dimensional critical long-range percolation
Authors:
Jian Ding,
Zherui Fan,
Lu-Jing Huang
Abstract:
In this work, we study the critical long-range percolation on $\mathbb{Z}$, where an edge connects $i$ and $j$ independently with probability $1-\exp\{-β|i-j|^{-2}\}$ for some fixed $β>0$. Viewing this as a random electric network where each edge has a unit conductance, we show that with high probability the effective resistances from the origin 0 to $[-N, N]^c$ and from the interval $[-N,N]$ to $[-2N,2N]^c$ (conditioned on no edge joining $[-N,N]$ and $[-2N,2N]^c$) both have a polynomial lower bound in $N$. Our bound holds for all $β>0$ and thus rules out a potential phase transition (around $β= 1$) which seemed to be a reasonable possibility.
Submitted 6 May, 2024;
originally announced May 2024.
-
Gain suppression study on LGADs at the CENPA tandem accelerator
Authors:
S. Braun,
Q. Buat,
J. Ding,
P. Kammel,
S. M. Mazza,
F. McKinney-Martinez,
A. Molnar,
C. Lansdell,
J. Ott,
A. Seiden,
B. Schumm,
Y. Zhao
Abstract:
Low-Gain Avalanche Detectors (LGADs) are a type of thin silicon detector with a highly doped gain layer that provides moderate internal signal amplification. One recent challenge in the use of LGADs, studied by several research groups, is the gain suppression mechanism for large localized charge deposits. Using the CENPA Tandem accelerator at the University of Washington, the response of the LGADs to MeV-range energy deposits from a proton beam was studied. Two LGAD prototypes and a PIN diode were characterized, and the gain of the devices was determined as a function of bias voltage, incidence beam angle and proton energy. This study was conducted in the scope of the PIONEER experiment, an experiment proposed at the Paul Scherrer Institute to perform high-precision measurements of rare pion decays. At the center of the experiment, a high-granularity active target (ATAR) will stop the pion and characterize its decay. A range of deposited charge from that of a Minimum Ionizing Particle (MIP, a few tens of keV) for positrons to several MeV for the stopping pions/muons is expected in PIONEER; the detection and separation of close-by hits in such a wide dynamic range will be a main challenge of the experiment. To achieve this goal, the gain suppression mechanism has to be understood fully.
Submitted 3 May, 2024;
originally announced May 2024.
-
RepEval: Effective Text Evaluation with LLM Representation
Authors:
Shuqian Sheng,
Yi Xu,
Tianhang Zhang,
Zanwei Shen,
Luoyi Fu,
Jiaxin Ding,
Lei Zhou,
Xinbing Wang,
Chenghu Zhou
Abstract:
Automatic evaluation metrics for generated texts play an important role in the NLG field, especially with the rapid growth of LLMs. However, existing metrics are often limited to specific scenarios, making it challenging to meet the evaluation requirements of expanding LLM applications. Therefore, there is a demand for new, flexible, and effective metrics. In this study, we introduce RepEval, the first metric leveraging the projection of LLM representations for evaluation. RepEval requires minimal sample pairs for training, and through simple prompt modifications, it can easily transition to various tasks. Results on ten datasets from three tasks demonstrate the high effectiveness of our method, which exhibits stronger correlations with human judgments compared to previous metrics, even outperforming GPT-4. Our work underscores the richness of information regarding text quality embedded within LLM representations, offering insights for the development of new metrics.
Submitted 30 April, 2024;
originally announced April 2024.
-
Converting High-Performance and Low-Latency SNNs through Explicit Modelling of Residual Error in ANNs
Authors:
Zhipeng Huang,
Jianhao Ding,
Zhiyu Pan,
Haoran Li,
Ying Fang,
Zhaofei Yu,
Jian K. Liu
Abstract:
Spiking neural networks (SNNs) have garnered interest due to their energy efficiency and superior effectiveness on neuromorphic chips compared with traditional artificial neural networks (ANNs). One of the mainstream approaches to implementing deep SNNs is the ANN-SNN conversion, which integrates the efficient training strategy of ANNs with the energy-saving potential and fast inference capability of SNNs. However, under extreme low-latency conditions, the existing conversion theory suggests that the problem of misrepresentation of residual membrane potentials in SNNs, i.e., the inability of IF neurons with a reset-by-subtraction mechanism to respond to residual membrane potentials beyond the range from resting potential to threshold, leads to a performance gap in the converted SNNs compared to the original ANNs. This severely limits the possibility of practical application of SNNs on delay-sensitive edge devices. Existing conversion methods addressing this problem usually involve modifying the state of the conversion spiking neurons. However, these methods do not consider their adaptability and compatibility with neuromorphic chips. We propose a new approach based on explicit modeling of residual errors as additive noise. The noise is incorporated into the activation function of the source ANN, which effectively reduces the residual error. Our experiments on the CIFAR10/100 dataset verify that our approach exceeds the prevailing ANN-SNN conversion methods and directly trained SNNs concerning accuracy and the required time steps. Overall, our method provides new ideas for improving SNN performance under ultra-low-latency conditions and is expected to promote practical neuromorphic hardware applications for further development.
Submitted 26 April, 2024;
originally announced April 2024.
-
Toeplitz Operators on Weighted Bergman Spaces over Tubular Domains
Authors:
Lvchang Li,
Jiaqing Ding,
Haichou Li
Abstract:
In this paper, we mainly study the necessary and sufficient conditions for the boundedness and compactness of Toeplitz operators on weighted Bergman spaces over tubular domains by using Carleson measures on tubular domains. We also give some related results about Carleson measures.
Submitted 25 April, 2024;
originally announced April 2024.
-
Dynamic Nanodomains Dictate Macroscopic Properties in Lead Halide Perovskites
Authors:
Milos Dubajic,
James R. Neilson,
Johan Klarbring,
Xia Liang,
Stephanie A. Boer,
Kirrily C. Rule,
Josie E. Auckett,
Leilei Gu,
Xuguang Jia,
Andreas Pusch,
Ganbaatar Tumen-Ulzii,
Qiyuan Wu,
Thomas A. Selby,
Yang Lu,
Julia C. Trowbridge,
Eve M. Mozur,
Arianna Minelli,
Nikolaj Roth,
Kieran W. P. Orr,
Arman Mahboubi Soufiani,
Simon Kahmann,
Irina Kabakova,
Jianning Ding,
Tom Wu,
Gavin J. Conibeer
, et al. (4 additional authors not shown)
Abstract:
Empirical A-site cation substitution has advanced the stability and efficiency of hybrid organic-inorganic lead halide perovskite solar cells and the functionality of X-ray detectors. Yet, the fundamental mechanisms underpinning their unique performance remain elusive. This multi-modal study unveils the link between nanoscale structural dynamics and macroscopic optoelectronic properties in these materials by utilising X-ray diffuse scattering, inelastic neutron spectroscopy and optical microscopy complemented by state-of-the-art machine learning-assisted molecular dynamics simulations. Our approach uncovers the presence of dynamic, lower-symmetry local nanodomains embedded within the higher-symmetry average phase in various perovskite compositions. The properties of these nanodomains are tunable via the A-site cation selection: methylammonium induces a high density of anisotropic, planar nanodomains of out-of-phase octahedral tilts, while formamidinium favours sparsely distributed isotropic, spherical nanodomains with in-phase tilting, even when crystallography reveals cubic symmetry on average. The observed variations in the properties of dynamic nanodomains are in agreement with our simulations and are directly linked to the differing macroscopic optoelectronic and ferroelastic behaviours of these compositions. By demonstrating the influence of the A-site cation on local nanodomains and, consequently, on macroscopic properties, we propose leveraging this relationship to engineer the optoelectronic response of these materials, propelling further advancements in perovskite-based photovoltaics, optoelectronics, and X-ray imaging.
Submitted 1 May, 2024; v1 submitted 22 April, 2024;
originally announced April 2024.
-
ColA: Collaborative Adaptation with Gradient Learning
Authors:
Enmao Diao,
Qi Le,
Suya Wu,
Xinran Wang,
Ali Anwar,
Jie Ding,
Vahid Tarokh
Abstract:
A primary function of back-propagation is to compute both the gradient of hidden representations and parameters for optimization with gradient descent. Training large models requires high computational costs due to their vast parameter sizes. While Parameter-Efficient Fine-Tuning (PEFT) methods aim to train smaller auxiliary models to save computational space, they still present computational overheads, especially in Fine-Tuning as a Service (FTaaS) for numerous users. We introduce Collaborative Adaptation (ColA) with Gradient Learning (GL), a parameter-free, model-agnostic fine-tuning approach that decouples the computation of the gradient of hidden representations and parameters. In comparison to PEFT methods, ColA facilitates more cost-effective FTaaS by offloading the computation of the gradient to low-cost devices. We also provide a theoretical analysis of ColA and experimentally demonstrate that ColA can perform on par or better than existing PEFT methods on various benchmarks.
Submitted 21 April, 2024;
originally announced April 2024.
-
AllTheDocks road safety dataset: A cyclist's perspective and experience
Authors:
Chia-Yen Chiang,
Ruikang Zhong,
Jennifer Ding,
Joseph Wood,
Stephen Bee,
Mona Jaber
Abstract:
Active travel is an essential component in intelligent transportation systems. Cycling, as a form of active travel, shares the road space with motorised traffic, which often affects cyclists' safety and comfort and therefore people's propensity to take up cycling instead of driving. This paper presents a unique dataset, collected by cyclists across London, that includes video footage, accelerometer, GPS, and gyroscope data. The dataset is then labelled by an independent group of London cyclists to rank the safety level of each frame and to identify objects in the cyclist's field of vision that might affect their experience. Furthermore, in this dataset, the quality of the road is measured by the international roughness index of the surface, which indicates the comfort of cycling on the road. The dataset will be made available for open access in the hope of motivating more research in this area to underpin the requirements for cyclists' safety and comfort and encourage more people to replace vehicle travel with cycling.
Submitted 16 April, 2024;
originally announced April 2024.
-
SRGS: Super-Resolution 3D Gaussian Splatting
Authors:
Xiang Feng,
Yongbo He,
Yubo Wang,
Yan Yang,
Wen Li,
Yifei Chen,
Zhenzhong Kuang,
Jiajun Ding,
Jianping Fan,
Yu Jun
Abstract:
Recently, 3D Gaussian Splatting (3DGS) has gained popularity as a novel explicit 3D representation. This approach relies on the representation power of Gaussian primitives to provide a high-quality rendering. However, primitives optimized at low resolution inevitably exhibit sparsity and texture deficiency, posing a challenge for achieving high-resolution novel view synthesis (HRNVS). To address this problem, we propose Super-Resolution 3D Gaussian Splatting (SRGS) to perform the optimization in a high-resolution (HR) space. The sub-pixel constraint is introduced for the increased viewpoints in HR space, exploiting the sub-pixel cross-view information of the multiple low-resolution (LR) views. The gradient accumulated from more viewpoints will facilitate the densification of primitives. Furthermore, a pre-trained 2D super-resolution model is integrated with the sub-pixel constraint, enabling these dense primitives to learn faithful texture features. In general, our method focuses on densification and texture learning to effectively enhance the representation ability of primitives. Experimentally, our method achieves high rendering quality on HRNVS only with LR inputs, outperforming state-of-the-art methods on challenging datasets such as Mip-NeRF 360 and Tanks & Temples. Related codes will be released upon acceptance.
Submitted 18 June, 2024; v1 submitted 16 April, 2024;
originally announced April 2024.
-
Deep Learning for Cosmological Parameter Inference from Dark Matter Halo Density Field
Authors:
Zhiwei Min,
Xu Xiao,
Jiacheng Ding,
Liang Xiao,
Jie Jiang,
Donglin Wu,
Qiufan Lin,
Yin Li,
Yang Wang,
Shuai Liu,
Zhixin Chen,
Xiangru Li,
Jinqu Zhang,
Le Zhang,
Xiao-Dong Li
Abstract:
We propose a lightweight deep convolutional neural network (lCNN) to estimate cosmological parameters from simulated three-dimensional DM halo distributions and associated statistics. The training dataset comprises 2000 realizations of a cubic box with a side length of 1000 $h^{-1}{\rm Mpc}$, interpolated over a cubic grid of $300^3$ voxels, with each simulation produced using $512^3$ DM particles and $512^3$ neutrinos. Under the flat $Λ$CDM model, simulations vary six standard cosmological parameters, $Ω_m$, $Ω_b$, $h$, $n_s$, $σ_8$, and $w$, along with the neutrino mass sum, $M_ν$. We find that: 1) within the framework of lCNN, extracting large-scale structure information is more efficient from the halo density field compared to relying on the statistical quantities including the power spectrum, the two-point correlation function, and the coefficients from wavelet scattering transform; 2) combining the halo density field with its Fourier transformed counterpart enhances predictions, while augmenting the training dataset with measured statistics further improves performance; 3) the neural network model achieves high accuracy in inferring $Ω_m$, $h$, $n_s$, and $σ_8$, while being inefficient in predicting $Ω_b$, $M_ν$, and $w$; 4) compared to the simple random forest network trained with three statistical quantities, lCNN yields unbiased estimations with reduced statistical errors: approximately 33.3\% for $Ω_m$, 20.0\% for $h$, 8.3\% for $n_s$, and 40.0\% for $σ_8$. Our study emphasizes this lCNN-based novel approach in extracting large-scale structure information and estimating cosmological parameters.
Submitted 15 April, 2024;
originally announced April 2024.
-
Characterizing the Influence of Topology on Graph Learning Tasks
Authors:
Kailong Wu,
Yule Xie,
Jiaxin Ding,
Yuxiang Ren,
Luoyi Fu,
Xinbing Wang,
Chenghu Zhou
Abstract:
Graph neural networks (GNN) have achieved remarkable success in a wide range of tasks by encoding features combined with topology to create effective representations. However, the fundamental problem of understanding and analyzing how graph topology influences the performance of learning models on downstream tasks has not yet been well understood. In this paper, we propose a metric, TopoInf, which characterizes the influence of graph topology by measuring the level of compatibility between the topological information of graph data and downstream task objectives. We provide analysis based on the decoupled GNNs on the contextual stochastic block model to demonstrate the effectiveness of the metric. Through extensive experiments, we demonstrate that TopoInf is an effective metric for measuring topological influence on corresponding tasks and can be further leveraged to enhance graph learning.
Submitted 11 April, 2024;
originally announced April 2024.
-
Deep Reinforcement Learning Based Toolpath Generation for Thermal Uniformity in Laser Powder Bed Fusion Process
Authors:
Mian Qin,
Junhao Ding,
Shuo Qu,
Xu Song,
Charlie C. L. Wang,
Wei-Hsin Liao
Abstract:
Laser powder bed fusion (LPBF) is a widely used metal additive manufacturing technology. However, the accumulation of internal residual stress during printing can cause significant distortion and potential failure. Although various scan patterns have been studied to reduce possible accumulated stress, such as zigzag scanning vectors with changing directions or a chessboard-based scan pattern with divided small islands, most conventional scan patterns cannot significantly reduce residual stress. Previously proposed adaptive toolpath generation (ATG) algorithms, which aim to minimize the thermal gradients, may result in extreme heat accumulation in the temperature field in some cases. To address these issues, we developed a deep reinforcement learning (DRL)-based toolpath generation framework, with the goal of achieving uniformly distributed heat and avoiding regions of extreme thermal accumulation during the LPBF process. We first developed an overall pipeline for the DRL-based toolpath generation framework, which includes uniform sampling, agent moving and environment observation, action selection, moving constraints, reward calculation, and the training process. To accelerate the training process, we simplified the data-intensive numerical model by considering the turning angles on the toolpath. We designed the action space with three options: the minimum temperature value, the smoothest path, and the second smoothest path. The reward function was designed to minimize energy density to ensure the temperature field remains relatively stable. To verify the effectiveness of the proposed DRL-based toolpath generation framework, we performed numerical simulations of polygon shape printing domains. In addition, four groups of thin plate samples with different scan patterns were compared using the LPBF process.
Submitted 16 February, 2024;
originally announced April 2024.
-
Invisible and Semi-invisible Decays of Bottom Baryons
Authors:
Yong Zheng,
Jian-Nan Ding,
Dong-Hao Li,
Lei-Yi Li,
Cai-Dian Lü,
Fu-Sheng Yu
Abstract:
The similar densities of dark matter and baryons in the universe imply that they might arise from the same ultraviolet model. B-Mesogenesis, which assumes dark matter is charged under the baryon number, attempts to simultaneously explain the origin of baryon asymmetry and dark matter in the universe. In particular, B-Mesogenesis might induce bottom-baryon decays into invisible or semi-invisible final states, which provide a distinctive signal for probing this scenario. In this work, we systematically study the invisible decays of bottom baryons into dark matter particles, and semi-invisible decays of bottom baryons into a meson or a photon together with a dark matter particle. In particular, the fully invisible decay can explore the stable particles in B-Mesogenesis. Some QCD-based frameworks are used to calculate the hadronic matrix elements under the B-Mesogenesis model. We estimate the constraints on the Wilson coefficients or the product of some new physics couplings with the Wilson coefficients by the semi-invisible and invisible decays of bottom baryons at future colliders.
Submitted 5 April, 2024;
originally announced April 2024.