subscribe to arXiv mailings

Inference Performance Optimization for Large Language Models on CPUs

Authors: Pujiang He, Shan Zhou, Wenhuan Huang, Changqing Li, Duyi Wang, Bin Guo, Chen Meng, Sheng Gui, Weifei Yu, Yi Xie

Abstract: Large language models (LLMs) have shown exceptional performance and vast potential across diverse tasks. However, the deployment of LLMs with high performance in low-resource environments has garnered significant attention in the industry. When GPU hardware resources are limited, we can explore alternative options on CPUs. To mitigate the financial burden and alleviate constraints imposed by hardw… ▽ More Large language models (LLMs) have shown exceptional performance and vast potential across diverse tasks. However, the deployment of LLMs with high performance in low-resource environments has garnered significant attention in the industry. When GPU hardware resources are limited, we can explore alternative options on CPUs. To mitigate the financial burden and alleviate constraints imposed by hardware resources, optimizing inference performance is necessary. In this paper, we introduce an easily deployable inference performance optimization solution aimed at accelerating LLMs on CPUs. In this solution, we implement an effective way to reduce the KV cache size while ensuring precision. We propose a distributed inference optimization approach and implement it based on oneAPI Collective Communications Library. Furthermore, we propose optimization approaches for LLMs on CPU, and conduct tailored optimizations for the most commonly used models. The code is open-sourced at https://github.com/intel/xFasterTransformer. △ Less

Submitted 9 July, 2024; originally announced July 2024.

Comments: 5 pages, 6 figure, ICML 2024 on Foundation Models in the Wild

arXiv:2407.02398 [pdf, other]

Consistency Flow Matching: Defining Straight Flows with Velocity Consistency

Authors: Ling Yang, Zixiang Zhang, Zhilong Zhang, Xingchao Liu, Minkai Xu, Wentao Zhang, Chenlin Meng, Stefano Ermon, Bin Cui

Abstract: Flow matching (FM) is a general framework for defining probability paths via Ordinary Differential Equations (ODEs) to transform between noise and data samples. Recent approaches attempt to straighten these flow trajectories to generate high-quality samples with fewer function evaluations, typically through iterative rectification methods or optimal transport solutions. In this paper, we introduce… ▽ More Flow matching (FM) is a general framework for defining probability paths via Ordinary Differential Equations (ODEs) to transform between noise and data samples. Recent approaches attempt to straighten these flow trajectories to generate high-quality samples with fewer function evaluations, typically through iterative rectification methods or optimal transport solutions. In this paper, we introduce Consistency Flow Matching (Consistency-FM), a novel FM method that explicitly enforces self-consistency in the velocity field. Consistency-FM directly defines straight flows starting from different times to the same endpoint, imposing constraints on their velocity values. Additionally, we propose a multi-segment training approach for Consistency-FM to enhance expressiveness, achieving a better trade-off between sampling quality and speed. Preliminary experiments demonstrate that our Consistency-FM significantly improves training efficiency by converging 4.4x faster than consistency models and 1.7x faster than rectified flow models while achieving better generation quality. Our code is available at: https://github.com/YangLing0818/consistency_flow_matching △ Less

Submitted 2 July, 2024; originally announced July 2024.

Comments: Code: https://github.com/YangLing0818/consistency_flow_matching

arXiv:2407.01521 [pdf, other]

Improving Diffusion Inverse Problem Solving with Decoupled Noise Annealing

Authors: Bingliang Zhang, Wenda Chu, Julius Berner, Chenlin Meng, Anima Anandkumar, Yang Song

Abstract: Diffusion models have recently achieved success in solving Bayesian inverse problems with learned data priors. Current methods build on top of the diffusion sampling process, where each denoising step makes small modifications to samples from the previous step. However, this process struggles to correct errors from earlier sampling steps, leading to worse performance in complicated nonlinear inver… ▽ More Diffusion models have recently achieved success in solving Bayesian inverse problems with learned data priors. Current methods build on top of the diffusion sampling process, where each denoising step makes small modifications to samples from the previous step. However, this process struggles to correct errors from earlier sampling steps, leading to worse performance in complicated nonlinear inverse problems, such as phase retrieval. To address this challenge, we propose a new method called Decoupled Annealing Posterior Sampling (DAPS) that relies on a novel noise annealing process. Specifically, we decouple consecutive steps in a diffusion sampling trajectory, allowing them to vary considerably from one another while ensuring their time-marginals anneal to the true posterior as we reduce noise levels. This approach enables the exploration of a larger solution space, improving the success rate for accurate reconstructions. We demonstrate that DAPS significantly improves sample quality and stability across multiple image restoration tasks, particularly in complicated nonlinear inverse problems. For example, we achieve a PSNR of 30.72dB on the FFHQ 256 dataset for phase retrieval, which is an improvement of 9.12dB compared to existing methods. △ Less

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2407.00029 [pdf, other]

Distributed Inference Performance Optimization for LLMs on CPUs

Authors: Pujiang He, Shan Zhou, Changqing Li, Wenhuan Huang, Weifei Yu, Duyi Wang, Chen Meng, Sheng Gui

Abstract: Large language models (LLMs) hold tremendous potential for addressing numerous real-world challenges, yet they typically demand significant computational resources and memory. Deploying LLMs onto a resource-limited hardware device with restricted memory capacity presents considerable challenges. Distributed computing emerges as a prevalent strategy to mitigate single-node memory constraints and ex… ▽ More Large language models (LLMs) hold tremendous potential for addressing numerous real-world challenges, yet they typically demand significant computational resources and memory. Deploying LLMs onto a resource-limited hardware device with restricted memory capacity presents considerable challenges. Distributed computing emerges as a prevalent strategy to mitigate single-node memory constraints and expedite LLM inference performance. To reduce the hardware limitation burden, we proposed an efficient distributed inference optimization solution for LLMs on CPUs. We conduct experiments with the proposed solution on 5th Gen Intel Xeon Scalable Processors, and the result shows the time per output token for the LLM with 72B parameter is 140 ms/token, much faster than the average human reading speed about 200ms per token. △ Less

Submitted 16 May, 2024; originally announced July 2024.

Comments: 4 pages, 3 figures, Practical ML for Low Resource Settings Workshop @ ICLR 2024

arXiv:2406.14288 [pdf, other]

Revisiting Modularity Maximization for Graph Clustering: A Contrastive Learning Perspective

Authors: Yunfei Liu, Jintang Li, Yuehe Chen, Ruofan Wu, Ericbk Wang, Jing Zhou, Sheng Tian, Shuheng Shen, Xing Fu, Changhua Meng, Weiqiang Wang, Liang Chen

Abstract: Graph clustering, a fundamental and challenging task in graph mining, aims to classify nodes in a graph into several disjoint clusters. In recent years, graph contrastive learning (GCL) has emerged as a dominant line of research in graph clustering and advances the new state-of-the-art. However, GCL-based methods heavily rely on graph augmentations and contrastive schemes, which may potentially in… ▽ More Graph clustering, a fundamental and challenging task in graph mining, aims to classify nodes in a graph into several disjoint clusters. In recent years, graph contrastive learning (GCL) has emerged as a dominant line of research in graph clustering and advances the new state-of-the-art. However, GCL-based methods heavily rely on graph augmentations and contrastive schemes, which may potentially introduce challenges such as semantic drift and scalability issues. Another promising line of research involves the adoption of modularity maximization, a popular and effective measure for community detection, as the guiding principle for clustering tasks. Despite the recent progress, the underlying mechanism of modularity maximization is still not well understood. In this work, we dig into the hidden success of modularity maximization for graph clustering. Our analysis reveals the strong connections between modularity maximization and graph contrastive learning, where positive and negative examples are naturally defined by modularity. In light of our results, we propose a community-aware graph clustering framework, coined MAGI, which leverages modularity maximization as a contrastive pretext task to effectively uncover the underlying information of communities in graphs, while avoiding the problem of semantic drift. Extensive experiments on multiple graph datasets verify the effectiveness of MAGI in terms of scalability and clustering performance compared to state-of-the-art graph clustering methods. Notably, MAGI easily scales a sufficiently large graph with 100M nodes while outperforming strong baselines. △ Less

Submitted 20 June, 2024; originally announced June 2024.

Comments: KDD 2024 research track. Code available at https://github.com/EdisonLeeeee/MAGI

arXiv:2406.14250 [pdf, other]

E-ANT: A Large-Scale Dataset for Efficient Automatic GUI NavigaTion

Authors: Ke Wang, Tianyu Xia, Zhangxuan Gu, Yi Zhao, Shuheng Shen, Changhua Meng, Weiqiang Wang, Ke Xu

Abstract: Online GUI navigation on mobile devices has driven a lot of attention recent years since it contributes to many real-world applications. With the rapid development of large language models (LLM), multimodal large language models (MLLM) have tremendous potential on this task. However, existing MLLMs need high quality data to improve its abilities of making the correct navigation decisions according… ▽ More Online GUI navigation on mobile devices has driven a lot of attention recent years since it contributes to many real-world applications. With the rapid development of large language models (LLM), multimodal large language models (MLLM) have tremendous potential on this task. However, existing MLLMs need high quality data to improve its abilities of making the correct navigation decisions according to the human user inputs. In this paper, we developed a novel and highly valuable dataset, named \textbf{E-ANT}, as the first Chinese GUI navigation dataset that contains real human behaviour and high quality screenshots with annotations, containing nearly 40,000 real human traces over 5000+ different tinyAPPs. Furthermore, we evaluate various powerful MLLMs on E-ANT and show their experiments results with sufficient ablations. We believe that our proposed dataset will be beneficial for both the evaluation and development of GUI navigation and LLM/MLLM decision-making capabilities. △ Less

Submitted 1 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

Comments: 9 pages, 5 figures, Under review

arXiv:2406.12323 [pdf, other]

Hybrid Beamforming Design for Near-Field ISAC with Modular XL-MIMO

Authors: Chunwei Meng, Dingyou Ma, Zhaolin Wang, Yuanwei Liu, Zhiqing Wei, Zhiyong Feng

Abstract: A novel modular extremely large-scale multiple-input-multiple-output (XL-MIMO) integrated sensing and communication (ISAC) framework is proposed in this paper. We consider a downlink ISAC scenario and exploit the modular array architecture to enhance the communication spectral efficiency and sensing resolution while reducing the channel modeling complexity by employing the hybrid spherical and pla… ▽ More A novel modular extremely large-scale multiple-input-multiple-output (XL-MIMO) integrated sensing and communication (ISAC) framework is proposed in this paper. We consider a downlink ISAC scenario and exploit the modular array architecture to enhance the communication spectral efficiency and sensing resolution while reducing the channel modeling complexity by employing the hybrid spherical and planar wavefront model. Considering the hybrid digital-analog structure inherent to modular arrays, we formulate a joint analog-digital beamforming design problem based on the communication spectral efficiency and sensing signal-to-clutter-plus-noise ratio (SCNR). By exploring the structural similarity of the communication and sensing channels, it is proved that the optimal transmit covariance matrix lies in the subspace spanned by the subarray response vectors, yielding a closed-form solution for the optimal analog beamformer. Consequently, the joint design problem is transformed into a low-dimensional rank-constrained digital beamformer optimization. We first propose a manifold optimization method that directly optimizes the digital beamformer on the rank-constrained Stiefel manifold. Additionally, we develop an semidefinite relaxation (SDR)-based approach that relaxes the rank constraint and employ the randomization technique to obtain a near-optimal solution. Simulation results demonstrate the effectiveness of the proposed modular XL-MIMO ISAC framework and algorithms. △ Less

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2405.09022 [pdf, other]

doi 10.1109/JIOT.2024.3413687

Multi-Objective Optimization-based Transmit Beamforming for Multi-Target and Multi-User MIMO-ISAC Systems

Authors: Chunwei Meng, Zhiqing Wei, Dingyou Ma, Wanli Ni, Liyan Su, Zhiyong Feng

Abstract: Integrated sensing and communication (ISAC) is an enabling technology for the sixth-generation mobile communications, which equips the wireless communication networks with sensing capabilities. In this paper, we investigate transmit beamforming design for multiple-input and multiple-output (MIMO)-ISAC systems in scenarios with multiple radar targets and communication users. A general form of multi… ▽ More Integrated sensing and communication (ISAC) is an enabling technology for the sixth-generation mobile communications, which equips the wireless communication networks with sensing capabilities. In this paper, we investigate transmit beamforming design for multiple-input and multiple-output (MIMO)-ISAC systems in scenarios with multiple radar targets and communication users. A general form of multi-target sensing mutual information (MI) is derived, along with its upper bound, which can be interpreted as the sum of individual single-target sensing MI. Additionally, this upper bound can be achieved by suppressing the cross-correlation among reflected signals from different targets, which aligns with the principles of adaptive MIMO radar. Then, we propose a multi-objective optimization framework based on the signal-to-interference-plus-noise ratio of each user and the tight upper bound of sensing MI, introducing the Pareto boundary to characterize the achievable communication-sensing performance boundary of the proposed ISAC system. To achieve the Pareto boundary, the max-min system utility function method is employed, while considering the fairness between communication users and radar targets. Subsequently, the bisection search method is employed to find a specific Pareto optimal solution by solving a series of convex feasible problems. Finally, simulation results validate that the proposed method achieves a better tradeoff between multi-user communication and multi-target sensing performance. Additionally, utilizing the tight upper bound of sensing MI as a performance metric can enhance the multi-target resolution capability and angle estimation accuracy. △ Less

Submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.05600 [pdf, other]

Can We Use Large Language Models to Fill Relevance Judgment Holes?

Authors: Zahra Abbasiantaeb, Chuan Meng, Leif Azzopardi, Mohammad Aliannejadi

Abstract: Incomplete relevance judgments limit the re-usability of test collections. When new systems are compared against previous systems used to build the pool of judged documents, they often do so at a disadvantage due to the ``holes'' in test collection (i.e., pockets of un-assessed documents returned by the new system). In this paper, we take initial steps towards extending existing test collections b… ▽ More Incomplete relevance judgments limit the re-usability of test collections. When new systems are compared against previous systems used to build the pool of judged documents, they often do so at a disadvantage due to the ``holes'' in test collection (i.e., pockets of un-assessed documents returned by the new system). In this paper, we take initial steps towards extending existing test collections by employing Large Language Models (LLM) to fill the holes by leveraging and grounding the method using existing human judgments. We explore this problem in the context of Conversational Search using TREC iKAT, where information needs are highly dynamic and the responses (and, the results retrieved) are much more varied (leaving bigger holes). While previous work has shown that automatic judgments from LLMs result in highly correlated rankings, we find substantially lower correlates when human plus automatic judgments are used (regardless of LLM, one/two/few shot, or fine-tuned). We further find that, depending on the LLM employed, new runs will be highly favored (or penalized), and this effect is magnified proportionally to the size of the holes. Instead, one should generate the LLM annotations on the whole document pool to achieve more consistent rankings with human-generated labels. Future work is required to prompt engineering and fine-tuning LLMs to reflect and represent the human annotations, in order to ground and align the models, such that they are more fit for purpose. △ Less

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.02194 [pdf, other]

Coherent XUV super continuum emission from atomic bound states

Authors: Jing Zhao, Xiaowei Wang, Li Wang, Jiacan Wang, Yalei Zhu, Fan Xiao, Wenkai Tao, Zhigang Zheng, Haizhong Wu, Xu Sun, Yue Lang, Congsen Meng, Dongwen Zhang, Zhihui Lv, Jinlei Liu, Zengxiu Zhao

Abstract: Coherent supercontinuum radiation in the extreme-ultraviolet (XUV) range is indispensable for synthesizing attosecond light pulses and for exploring transient atomic structures. Here, we report the striking observations of coherent XUV supercontinuum (XSC) extended from below to far above the ionization threshold, which exhibits completely different temporal and spatial properties comparing to the… ▽ More Coherent supercontinuum radiation in the extreme-ultraviolet (XUV) range is indispensable for synthesizing attosecond light pulses and for exploring transient atomic structures. Here, we report the striking observations of coherent XUV supercontinuum (XSC) extended from below to far above the ionization threshold, which exhibits completely different temporal and spatial properties comparing to the conventional rescattering induced high harmonic generation (HHG). We demonstrate that the strong-field created coherence among bound orbitals strongly distort the atomic transition energies during the pulse, leading to coherent emission spanning tens of electron-volts, in contrast to the line emission via free-induction decay occurring after the pulse. The supposed non-radiating bound dark states contribute as well by emitting dressed energy through dark-to-bright emission mechanism. All the processes modulated at sub-cycle time scale jointly form this new-type coherent XSC. This work achieves the strong-field attosecond control of the exotic atomic radiation dynamics and provides the means of simultaneous generation of separated attosecond sources, i.e., XSC and HHG, with potential advancing attosecond interferometry. △ Less

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2404.18185 [pdf, other]

doi 10.1145/3626772.3657864

Ranked List Truncation for Large Language Model-based Re-Ranking

Authors: Chuan Meng, Negar Arabzadeh, Arian Askari, Mohammad Aliannejadi, Maarten de Rijke

Abstract: We study ranked list truncation (RLT) from a novel "retrieve-then-re-rank" perspective, where we optimize re-ranking by truncating the retrieved list (i.e., trim re-ranking candidates). RLT is crucial for re-ranking as it can improve re-ranking efficiency by sending variable-length candidate lists to a re-ranker on a per-query basis. It also has the potential to improve re-ranking effectiveness. D… ▽ More We study ranked list truncation (RLT) from a novel "retrieve-then-re-rank" perspective, where we optimize re-ranking by truncating the retrieved list (i.e., trim re-ranking candidates). RLT is crucial for re-ranking as it can improve re-ranking efficiency by sending variable-length candidate lists to a re-ranker on a per-query basis. It also has the potential to improve re-ranking effectiveness. Despite its importance, there is limited research into applying RLT methods to this new perspective. To address this research gap, we reproduce existing RLT methods in the context of re-ranking, especially newly emerged large language model (LLM)-based re-ranking. In particular, we examine to what extent established findings on RLT for retrieval are generalizable to the "retrieve-then-re-rank" setup from three perspectives: (i) assessing RLT methods in the context of LLM-based re-ranking with lexical first-stage retrieval, (ii) investigating the impact of different types of first-stage retrievers on RLT methods, and (iii) investigating the impact of different types of re-rankers on RLT methods. We perform experiments on the TREC 2019 and 2020 deep learning tracks, investigating 8 RLT methods for pipelines involving 3 retrievers and 2 re-rankers. We reach new insights into RLT methods in the context of re-ranking. △ Less

Submitted 28 April, 2024; originally announced April 2024.

Comments: Accepted for publication as a long paper at SIGIR 2024

ACM Class: H.3.3

arXiv:2404.11380 [pdf]

Non-hermitian magnonic knobbing between electromagnetically induced reflection and transparancy

Authors: Youcai Han, Changhao Meng, Zejin Rao, Jie Qian, Yiming Lv, Liping Zhu, CanMing Hu, Zhenghua An

Abstract: Manipulation of wave propagation through open resonant systems has attracted tremendous interest. When accessible to the open system, the system under study is prone to tempering to out of equilibrium, and a lack of reciprocity is the rule rather than the exception. Open systems correspond to non-hermitian Hamiltonians with very unique properties such as resulting exceptional points and ideal isol… ▽ More Manipulation of wave propagation through open resonant systems has attracted tremendous interest. When accessible to the open system, the system under study is prone to tempering to out of equilibrium, and a lack of reciprocity is the rule rather than the exception. Open systems correspond to non-hermitian Hamiltonians with very unique properties such as resulting exceptional points and ideal isolation. Here, we have found a highly sensitive modulation for the intersection of resonant patch antennas with respect to cavity magnonic coupling by means of an open coupling system of three resonant modes. Two types of crossings are implemented in this study: the first type of crossing remotely controls the sharp switching of the transmission line 's transmittance, while regulating the repulsive behavior of its zero-reflection states. The second type of crossing corresponds to the modulation of non-reciprocal phase transitions, which enables a more desirable isolation effect. Three different coupling models are realized by a non-Hermitian scattering Hamiltonian, revealing distinct spatial overlaps between modes. This elucidates that dissipative coupling of at least two modes to the environment is crucial for non-reciprocal transport. Our work not only reveals the versatility of cavity magnonic systems but also provides a way to design functional devices for general wave optics using patch antenna crossings. △ Less

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2404.07472 [pdf, other]

doi 10.1109/LWC.2024.3406577

Cramer-Rao Bounds for Near-Field Sensing: A Generic Modular Architecture

Authors: Chunwei Meng, Dingyou Ma, Xu Chen, Zhiyong Feng, Yuanwei Liu

Abstract: A generic modular array architecture is proposed, featuring uniform/non-uniform subarray layouts that allows for flexible deployment. The bistatic near-field sensing system is considered, where the target is located in the near-field of the whole modular array and the far-field of each subarray. Then, the closed-form expressions of Cramer-Rao bounds (CRBs) for range and angle estimations are deriv… ▽ More A generic modular array architecture is proposed, featuring uniform/non-uniform subarray layouts that allows for flexible deployment. The bistatic near-field sensing system is considered, where the target is located in the near-field of the whole modular array and the far-field of each subarray. Then, the closed-form expressions of Cramer-Rao bounds (CRBs) for range and angle estimations are derived based on the hybrid spherical and planar wave model (HSPM). Simulation results validate the accuracy of the derived closed-form CRBs and demonstrate that: i) The HSPM with varying angles of arrival (AoAs) between subarrays can reduce the CRB for range estimation compared to the traditional HSPM with shared AoA; and ii) The proposed generic modular architecture with subarrays positioned closer to the edges can significantly reduce the CRBs compared to the traditional modular architecture with uniform subarray layout, when the array aperture is fixed. △ Less

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.01012 [pdf, other]

Query Performance Prediction using Relevance Judgments Generated by Large Language Models

Authors: Chuan Meng, Negar Arabzadeh, Arian Askari, Mohammad Aliannejadi, Maarten de Rijke

Abstract: Query performance prediction (QPP) aims to estimate the retrieval quality of a search system for a query without human relevance judgments. Previous QPP methods typically return a single scalar value and do not require the predicted values to approximate a specific information retrieval (IR) evaluation measure, leading to certain drawbacks: (i) a single scalar is insufficient to accurately represe… ▽ More Query performance prediction (QPP) aims to estimate the retrieval quality of a search system for a query without human relevance judgments. Previous QPP methods typically return a single scalar value and do not require the predicted values to approximate a specific information retrieval (IR) evaluation measure, leading to certain drawbacks: (i) a single scalar is insufficient to accurately represent different IR evaluation measures, especially when metrics do not highly correlate, and (ii) a single scalar limits the interpretability of QPP methods because solely using a scalar is insufficient to explain QPP results. To address these issues, we propose a QPP framework using automatically generated relevance judgments (QPP-GenRE), which decomposes QPP into independent subtasks of predicting the relevance of each item in a ranked list to a given query. This allows us to predict any IR evaluation measure using the generated relevance judgments as pseudo-labels. This also allows us to interpret predicted IR evaluation measures, and identify, track and rectify errors in generated relevance judgments to improve QPP quality. We predict an item's relevance by using open-source large language models (LLMs) to ensure scientific reproducibility. We face two main challenges: (i) excessive computational costs of judging an entire corpus for predicting a metric considering recall, and (ii) limited performance in prompting open-source LLMs in a zero-/few-shot manner. To solve the challenges, we devise an approximation strategy to predict an IR measure considering recall and propose to fine-tune open-source LLMs using human-labeled relevance judgments. Experiments on the TREC 2019-2022 deep learning tracks show that QPP-GenRE achieves state-of-the-art QPP quality for both lexical and neural rankers. △ Less

Submitted 17 June, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

ACM Class: H.3.3

arXiv:2404.00732 [pdf, other]

An Abundance of Katherines: The Game Theory of Baby Naming

Authors: Katy Blumer, Kate Donahue, Katie Fritz, Kate Ivanovich, Katherine Lee, Katie Luo, Cathy Meng, Katie Van Koevering

Abstract: In this paper, we study the highly competitive arena of baby naming. Through making several Extremely Reasonable Assumptions (namely, that parents are myopic, perfectly knowledgeable agents who pick a name based solely on its uniquness), we create a model which is not only tractable and clean, but also perfectly captures the real world. We then extend our investigation with numerical experiments,… ▽ More In this paper, we study the highly competitive arena of baby naming. Through making several Extremely Reasonable Assumptions (namely, that parents are myopic, perfectly knowledgeable agents who pick a name based solely on its uniquness), we create a model which is not only tractable and clean, but also perfectly captures the real world. We then extend our investigation with numerical experiments, as well as analysis of large language model tools. We conclude by discussing avenues for future research. △ Less

Submitted 1 April, 2024; v1 submitted 31 March, 2024; originally announced April 2024.

Comments: Accepted at SIGBOVIK 2024

arXiv:2403.15183 [pdf, other]

CRPlace: Camera-Radar Fusion with BEV Representation for Place Recognition

Authors: Shaowei Fu, Yifan Duan, Yao Li, Chengzhen Meng, Yingjie Wang, Jianmin Ji, Yanyong Zhang

Abstract: The integration of complementary characteristics from camera and radar data has emerged as an effective approach in 3D object detection. However, such fusion-based methods remain unexplored for place recognition, an equally important task for autonomous systems. Given that place recognition relies on the similarity between a query scene and the corresponding candidate scene, the stationary backgro… ▽ More The integration of complementary characteristics from camera and radar data has emerged as an effective approach in 3D object detection. However, such fusion-based methods remain unexplored for place recognition, an equally important task for autonomous systems. Given that place recognition relies on the similarity between a query scene and the corresponding candidate scene, the stationary background of a scene is expected to play a crucial role in the task. As such, current well-designed camera-radar fusion methods for 3D object detection can hardly take effect in place recognition because they mainly focus on dynamic foreground objects. In this paper, a background-attentive camera-radar fusion-based method, named CRPlace, is proposed to generate background-attentive global descriptors from multi-view images and radar point clouds for accurate place recognition. To extract stationary background features effectively, we design an adaptive module that generates the background-attentive mask by utilizing the camera BEV feature and radar dynamic points. With the guidance of a background mask, we devise a bidirectional cross-attention-based spatial fusion strategy to facilitate comprehensive spatial interaction between the background information of the camera BEV feature and the radar BEV feature. As the first camera-radar fusion-based place recognition network, CRPlace has been evaluated thoroughly on the nuScenes dataset. The results show that our algorithm outperforms a variety of baseline methods across a comprehensive set of metrics (recall@1 reaches 91.2%). △ Less

Submitted 22 March, 2024; originally announced March 2024.

arXiv:2403.14221 [pdf, other]

Improving the Robustness of Large Language Models via Consistency Alignment

Authors: Yukun Zhao, Lingyong Yan, Weiwei Sun, Guoliang Xing, Shuaiqiang Wang, Chong Meng, Zhicong Cheng, Zhaochun Ren, Dawei Yin

Abstract: Large language models (LLMs) have shown tremendous success in following user instructions and generating helpful responses. Nevertheless, their robustness is still far from optimal, as they may generate significantly inconsistent responses due to minor changes in the verbalized instructions. Recent literature has explored this inconsistency issue, highlighting the importance of continued improveme… ▽ More Large language models (LLMs) have shown tremendous success in following user instructions and generating helpful responses. Nevertheless, their robustness is still far from optimal, as they may generate significantly inconsistent responses due to minor changes in the verbalized instructions. Recent literature has explored this inconsistency issue, highlighting the importance of continued improvement in the robustness of response generation. However, systematic analysis and solutions are still lacking. In this paper, we quantitatively define the inconsistency problem and propose a two-stage training framework consisting of instruction-augmented supervised fine-tuning and consistency alignment training. The first stage helps a model generalize on following instructions via similar instruction augmentations. In the second stage, we improve the diversity and help the model understand which responses are more aligned with human expectations by differentiating subtle differences in similar responses. The training process is accomplished by self-rewards inferred from the trained model at the first stage without referring to external human preference resources. We conduct extensive experiments on recent publicly available LLMs on instruction-following tasks and demonstrate the effectiveness of our training framework. △ Less

Submitted 22 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

Comments: Accepted by LREC-COLING 2024

arXiv:2403.04703 [pdf, other]

mmPlace: Robust Place Recognition with Intermediate Frequency Signal of Low-cost Single-chip Millimeter Wave Radar

Authors: Chengzhen Meng, Yifan Duan, Chenming He, Dequan Wang, Xiaoran Fan, Yanyong Zhang

Abstract: Place recognition is crucial for tasks like loop-closure detection and re-localization. Single-chip millimeter wave radar (single-chip radar in short) emerges as a low-cost sensor option for place recognition, with the advantage of insensitivity to degraded visual environments. However, it encounters two challenges. Firstly, sparse point cloud from single-chip radar leads to poor performance when… ▽ More Place recognition is crucial for tasks like loop-closure detection and re-localization. Single-chip millimeter wave radar (single-chip radar in short) emerges as a low-cost sensor option for place recognition, with the advantage of insensitivity to degraded visual environments. However, it encounters two challenges. Firstly, sparse point cloud from single-chip radar leads to poor performance when using current place recognition methods, which assume much denser data. Secondly, its performance significantly declines in scenarios involving rotational and lateral variations, due to limited overlap in its field of view (FOV). We propose mmPlace, a robust place recognition system to address these challenges. Specifically, mmPlace transforms intermediate frequency (IF) signal into range azimuth heatmap and employs a spatial encoder to extract features. Additionally, to improve the performance in scenarios involving rotational and lateral variations, mmPlace employs a rotating platform and concatenates heatmaps in a rotation cycle, effectively expanding the system's FOV. We evaluate mmPlace's performance on the milliSonic dataset, which is collected on the University of Science and Technology of China (USTC) campus, the city roads surrounding the campus, and an underground parking garage. The results demonstrate that mmPlace outperforms point cloud-based methods and achieves 87.37% recall@1 in scenarios involving rotational and lateral variations. △ Less

Submitted 7 March, 2024; originally announced March 2024.

Comments: 8 pages, 8 figures

arXiv:2403.02614 [pdf, other]

doi 10.1364/OE.509601

Generation of True Quantum Random Numbers with On-Demand Probability Distributions via Single-Photon Quantum Walks

Authors: Chaoying Meng, Miao Cai, Yufang Yang, Haodong Wu, Zhixiang Li, Yaping Ruan, Yong Zhang, Han Zhang, Keyu Xia, Franco Nori

Abstract: Random numbers are at the heart of diverse fields, ranging from simulations of stochastic processes to classical and quantum cryptography. The requirement for true randomness in these applications has motivated various proposals for generating random numbers based on the inherent randomness of quantum systems. The generation of true random numbers with arbitrarily defined probability distributions… ▽ More Random numbers are at the heart of diverse fields, ranging from simulations of stochastic processes to classical and quantum cryptography. The requirement for true randomness in these applications has motivated various proposals for generating random numbers based on the inherent randomness of quantum systems. The generation of true random numbers with arbitrarily defined probability distributions is highly desirable for applications, but it is very challenging. Here we show that single-photon quantum walks can generate multi-bit random numbers with on-demand probability distributions, when the required ``coin'' parameters are found with the gradient descent (GD) algorithm. Our theoretical and experimental results exhibit high fidelity for various selected distributions. This GD-enhanced single-photon system provides a convenient way for building flexible and reliable quantum random number generators. Multi-bit random numbers are a necessary resource for high-dimensional quantum key distribution. △ Less

Submitted 4 March, 2024; originally announced March 2024.

arXiv:2403.00829 [pdf, other]

TroubleLLM: Align to Red Team Expert

Authors: Zhuoer Xu, Jianping Zhang, Shiwen Cui, Changhua Meng, Weiqiang Wang

Abstract: Large Language Models (LLMs) become the start-of-the-art solutions for a variety of natural language tasks and are integrated into real-world applications. However, LLMs can be potentially harmful in manifesting undesirable safety issues like social biases and toxic content. It is imperative to assess its safety issues before deployment. However, the quality and diversity of test prompts generated… ▽ More Large Language Models (LLMs) become the start-of-the-art solutions for a variety of natural language tasks and are integrated into real-world applications. However, LLMs can be potentially harmful in manifesting undesirable safety issues like social biases and toxic content. It is imperative to assess its safety issues before deployment. However, the quality and diversity of test prompts generated by existing methods are still far from satisfactory. Not only are these methods labor-intensive and require large budget costs, but the controllability of test prompt generation is lacking for the specific testing domain of LLM applications. With the idea of LLM for LLM testing, we propose the first LLM, called TroubleLLM, to generate controllable test prompts on LLM safety issues. Extensive experiments and human evaluation illustrate the superiority of TroubleLLM on generation quality and generation controllability. △ Less

Submitted 27 February, 2024; originally announced March 2024.

arXiv:2402.17460 [pdf, other]

Enhancement of mechanical squeezing via feedback control

Authors: Chao Meng, Warwick P. Bowen

Abstract: We explore the generation of nonclassical mechanical states by combining continuous position measurement and feedback control. We find that feedback-induced spring softening can greatly enhance position squeezing. Conversely, even with a pure position measurement, we find that spring hardening can enable momentum squeezing. Beyond enhanced squeezing, we show that feedback also mitigates degradatio… ▽ More We explore the generation of nonclassical mechanical states by combining continuous position measurement and feedback control. We find that feedback-induced spring softening can greatly enhance position squeezing. Conversely, even with a pure position measurement, we find that spring hardening can enable momentum squeezing. Beyond enhanced squeezing, we show that feedback also mitigates degradation introduced by background mechanical modes. Together, this significantly lowers the barrier to measurement-based preparation of nonclassical mechanical states at room temperature. △ Less

Submitted 27 February, 2024; originally announced February 2024.

arXiv:2402.12962 [pdf, other]

Multi-Level ML Based Burst-Aware Autoscaling for SLO Assurance and Cost Efficiency

Authors: Chunyang Meng, Haogang Tong, Tianyang Wu, Maolin Pan, Yang Yu

Abstract: Autoscaling is a technology to automatically scale the resources provided to their applications without human intervention to guarantee runtime Quality of Service (QoS) while saving costs. However, user-facing cloud applications serve dynamic workloads that often exhibit variable and contain bursts, posing challenges to autoscaling for maintaining QoS within Service-Level Objectives (SLOs). Conser… ▽ More Autoscaling is a technology to automatically scale the resources provided to their applications without human intervention to guarantee runtime Quality of Service (QoS) while saving costs. However, user-facing cloud applications serve dynamic workloads that often exhibit variable and contain bursts, posing challenges to autoscaling for maintaining QoS within Service-Level Objectives (SLOs). Conservative strategies risk over-provisioning, while aggressive ones may cause SLO violations, making it more challenging to design effective autoscaling. This paper introduces BAScaler, a Burst-Aware Autoscaling framework for containerized cloud services or applications under complex workloads, combining multi-level machine learning (ML) techniques to mitigate SLO violations while saving costs. BAScaler incorporates a novel prediction-based burst detection mechanism that distinguishes between predictable periodic workload spikes and actual bursts. When bursts are detected, BAScaler appropriately overestimates them and allocates resources accordingly to address the rapid growth in resource demand. On the other hand, BAScaler employs reinforcement learning to rectify potential inaccuracies in resource estimation, enabling more precise resource allocation during non-bursts. Experiments across ten real-world workloads demonstrate BAScaler's effectiveness, achieving a 57% average reduction in SLO violations and cutting resource costs by 10% compared to other prominent methods. △ Less

Submitted 20 February, 2024; originally announced February 2024.

arXiv:2402.11633 [pdf, other]

Self-seeding and Multi-intent Self-instructing LLMs for Generating Intent-aware Information-Seeking dialogs

Authors: Arian Askari, Roxana Petcu, Chuan Meng, Mohammad Aliannejadi, Amin Abolghasemi, Evangelos Kanoulas, Suzan Verberne

Abstract: Identifying user intents in information-seeking dialogs is crucial for a system to meet user's information needs. Intent prediction (IP) is challenging and demands sufficient dialogs with human-labeled intents for training. However, manually annotating intents is resource-intensive. While large language models (LLMs) have been shown to be effective in generating synthetic data, there is no study o… ▽ More Identifying user intents in information-seeking dialogs is crucial for a system to meet user's information needs. Intent prediction (IP) is challenging and demands sufficient dialogs with human-labeled intents for training. However, manually annotating intents is resource-intensive. While large language models (LLMs) have been shown to be effective in generating synthetic data, there is no study on using LLMs to generate intent-aware information-seeking dialogs. In this paper, we focus on leveraging LLMs for zero-shot generation of large-scale, open-domain, and intent-aware information-seeking dialogs. We propose SOLID, which has novel self-seeding and multi-intent self-instructing schemes. The former improves the generation quality by using the LLM's own knowledge scope to initiate dialog generation; the latter prompts the LLM to generate utterances sequentially, and mitigates the need for manual prompt design by asking the LLM to autonomously adapt its prompt instruction when generating complex multi-intent utterances. Furthermore, we propose SOLID-RL, which is further trained to generate a dialog in one step on the data generated by SOLID. We propose a length-based quality estimation mechanism to assign varying weights to SOLID-generated dialogs based on their quality during the training process of SOLID-RL. We use SOLID and SOLID-RL to generate more than 300k intent-aware dialogs, surpassing the size of existing datasets. Experiments show that IP methods trained on dialogs generated by SOLID and SOLID-RL achieve better IP quality than ones trained on human-generated dialogs. △ Less

Submitted 18 February, 2024; originally announced February 2024.

arXiv:2401.13934 [pdf, other]

MambaMorph: a Mamba-based Framework for Medical MR-CT Deformable Registration

Authors: Tao Guo, Yinuo Wang, Shihao Shu, Diansheng Chen, Zhouping Tang, Cai Meng, Xiangzhi Bai

Abstract: Capturing voxel-wise spatial correspondence across distinct modalities is crucial for medical image analysis. However, current registration approaches are not practical enough in terms of registration accuracy and clinical applicability. In this paper, we introduce MambaMorph, a novel multi-modality deformable registration framework. Specifically, MambaMorph utilizes a Mamba-based registration mod… ▽ More Capturing voxel-wise spatial correspondence across distinct modalities is crucial for medical image analysis. However, current registration approaches are not practical enough in terms of registration accuracy and clinical applicability. In this paper, we introduce MambaMorph, a novel multi-modality deformable registration framework. Specifically, MambaMorph utilizes a Mamba-based registration module and a fine-grained, yet simple, feature extractor for efficient long-range correspondence modeling and high-dimensional feature learning, respectively. Additionally, we develop a well-annotated brain MR-CT registration dataset, SR-Reg, to address the scarcity of data in multi-modality registration. To validate MambaMorph's multi-modality registration capabilities, we conduct quantitative experiments on both our SR-Reg dataset and a public T1-T2 dataset. The experimental results on both datasets demonstrate that MambaMorph significantly outperforms the current state-of-the-art learning-based registration methods in terms of registration accuracy. Further study underscores the efficiency of the Mamba-based registration module and the lightweight feature extractor, which achieve notable registration quality while maintaining reasonable computational costs and speeds. We believe that MambaMorph holds significant potential for practical applications in medical image registration. The code for MambaMorph is available at: https://github.com/Guo-Stone/MambaMorph. △ Less

Submitted 12 March, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

arXiv:2401.11708 [pdf, other]

Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs

Authors: Ling Yang, Zhaochen Yu, Chenlin Meng, Minkai Xu, Stefano Ermon, Bin Cui

Abstract: Diffusion models have exhibit exceptional performance in text-to-image generation and editing. However, existing methods often face challenges when handling complex text prompts that involve multiple objects with multiple attributes and relationships. In this paper, we propose a brand new training-free text-to-image generation/editing framework, namely Recaption, Plan and Generate (RPG), harnessin… ▽ More Diffusion models have exhibit exceptional performance in text-to-image generation and editing. However, existing methods often face challenges when handling complex text prompts that involve multiple objects with multiple attributes and relationships. In this paper, we propose a brand new training-free text-to-image generation/editing framework, namely Recaption, Plan and Generate (RPG), harnessing the powerful chain-of-thought reasoning ability of multimodal LLMs to enhance the compositionality of text-to-image diffusion models. Our approach employs the MLLM as a global planner to decompose the process of generating complex images into multiple simpler generation tasks within subregions. We propose complementary regional diffusion to enable region-wise compositional generation. Furthermore, we integrate text-guided image generation and editing within the proposed RPG in a closed-loop fashion, thereby enhancing generalization ability. Extensive experiments demonstrate our RPG outperforms state-of-the-art text-to-image diffusion models, including DALL-E 3 and SDXL, particularly in multi-category object composition and text-image semantic alignment. Notably, our RPG framework exhibits wide compatibility with various MLLM architectures (e.g., MiniGPT-4) and diffusion backbones (e.g., ControlNet). Our code is available at: https://github.com/YangLing0818/RPG-DiffusionMaster △ Less

Submitted 5 May, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

Comments: ICML 2024. Project: https://github.com/YangLing0818/RPG-DiffusionMaster

arXiv:2312.14224 [pdf, other]

doi 10.1007/JHEP06(2024)216

$Z_c$ and $Z_{cs}$ systems with operator mixing at NLO in QCD sum rules

Authors: Ren-Hua Wua, Chen-Yu Wang, Ce Meng, Yan-Qing Ma, Kuang-Ta Chao

Abstract: We study the mass spectra of hidden-charm tetraquark systems with quantum numbers $(I^G)J^P=(1^+)1^+$ using QCD sum rules. The analysis incorporates the complete next-to-leading order (NLO) contribution to the perturbative QCD part of the operator product expansions, with particular attention to operator mixing effects due to renormalization group evolution. For the $\bar{d}c\bar{c}u$ system, the… ▽ More We study the mass spectra of hidden-charm tetraquark systems with quantum numbers $(I^G)J^P=(1^+)1^+$ using QCD sum rules. The analysis incorporates the complete next-to-leading order (NLO) contribution to the perturbative QCD part of the operator product expansions, with particular attention to operator mixing effects due to renormalization group evolution. For the $\bar{d}c\bar{c}u$ system, the masses of two mixed operators, $J_{1,5}^{\text{Mixed}}$ and $J_{2,6}^{\text{Mixed}}$, are determined to be $3.89^{+0.18}_{-0.12}$ GeV and $4.03^{+0.06}_{-0.07}$ GeV, respectively, closely matching those of $Z_c$(3900) and $Z_c(4020)$. Similarly, for the $\bar{s}c\bar{c}u$ states, the masses of $J_{1,5}^{\text{Mixed}}$ and $J_{2,6}^{\text{Mixed}}$ are found to be $4.02^{+0.17}_{-0.09}$ GeV and $4.21^{+0.08}_{-0.07}$ GeV, respectively, in close proximity to $Z_{cs}$(3983)/$Z_{cs}$(4000) and $Z_{cs}$(4220), consistent with the expectation that they are the partners of $Z_c$(3900) and $Z_c$(4020). Our results highlight the crucial role of operator mixing, an inevitable effect in a complete NLO calculation, in achieving a robust phenomenological description for the tetraquark system. △ Less

Submitted 10 July, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: 49 pages, 32 fugures. version accepted for publication in JHEP. arXiv admin note: substantial text overlap with arXiv:2201.11714

Journal ref: JHEP 06 (2024) 216

arXiv:2312.03606 [pdf, other]

DiffusionSat: A Generative Foundation Model for Satellite Imagery

Authors: Samar Khanna, Patrick Liu, Linqi Zhou, Chenlin Meng, Robin Rombach, Marshall Burke, David Lobell, Stefano Ermon

Abstract: Diffusion models have achieved state-of-the-art results on many modalities including images, speech, and video. However, existing models are not tailored to support remote sensing data, which is widely used in important applications including environmental monitoring and crop-yield prediction. Satellite images are significantly different from natural images -- they can be multi-spectral, irregular… ▽ More Diffusion models have achieved state-of-the-art results on many modalities including images, speech, and video. However, existing models are not tailored to support remote sensing data, which is widely used in important applications including environmental monitoring and crop-yield prediction. Satellite images are significantly different from natural images -- they can be multi-spectral, irregularly sampled across time -- and existing diffusion models trained on images from the Web do not support them. Furthermore, remote sensing data is inherently spatio-temporal, requiring conditional generation tasks not supported by traditional methods based on captions or images. In this paper, we present DiffusionSat, to date the largest generative foundation model trained on a collection of publicly available large, high-resolution remote sensing datasets. As text-based captions are sparsely available for satellite images, we incorporate the associated metadata such as geolocation as conditioning information. Our method produces realistic samples and can be used to solve multiple generative tasks including temporal generation, superresolution given multi-spectral inputs and in-painting. Our method outperforms previous state-of-the-art methods for satellite image generation and is the first large-scale generative foundation model for satellite imagery. The project website can be found here: https://samar-khanna.github.io/DiffusionSat/ △ Less

Submitted 25 May, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

Comments: Published at ICLR 2024

arXiv:2311.17082 [pdf, other]

DreamPropeller: Supercharge Text-to-3D Generation with Parallel Sampling

Authors: Linqi Zhou, Andy Shih, Chenlin Meng, Stefano Ermon

Abstract: Recent methods such as Score Distillation Sampling (SDS) and Variational Score Distillation (VSD) using 2D diffusion models for text-to-3D generation have demonstrated impressive generation quality. However, the long generation time of such algorithms significantly degrades the user experience. To tackle this problem, we propose DreamPropeller, a drop-in acceleration algorithm that can be wrapped… ▽ More Recent methods such as Score Distillation Sampling (SDS) and Variational Score Distillation (VSD) using 2D diffusion models for text-to-3D generation have demonstrated impressive generation quality. However, the long generation time of such algorithms significantly degrades the user experience. To tackle this problem, we propose DreamPropeller, a drop-in acceleration algorithm that can be wrapped around any existing text-to-3D generation pipeline based on score distillation. Our framework generalizes Picard iterations, a classical algorithm for parallel sampling an ODE path, and can account for non-ODE paths such as momentum-based gradient updates and changes in dimensions during the optimization process as in many cases of 3D generation. We show that our algorithm trades parallel compute for wallclock time and empirically achieves up to 4.7x speedup with a negligible drop in generation quality for all tested frameworks. △ Less

Submitted 20 May, 2024; v1 submitted 27 November, 2023; originally announced November 2023.

Comments: Github repo: https://github.com/alexzhou907/DreamPropeller; Project page: https://alexzhou907.github.io/dreampropeller_page/

arXiv:2311.16605 [pdf, other]

LasTGL: An Industrial Framework for Large-Scale Temporal Graph Learning

Authors: Jintang Li, Jiawang Dan, Ruofan Wu, Jing Zhou, Sheng Tian, Yunfei Liu, Baokun Wang, Changhua Meng, Weiqiang Wang, Yuchang Zhu, Liang Chen, Zibin Zheng

Abstract: Over the past few years, graph neural networks (GNNs) have become powerful and practical tools for learning on (static) graph-structure data. However, many real-world applications, such as social networks and e-commerce, involve temporal graphs where nodes and edges are dynamically evolving. Temporal graph neural networks (TGNNs) have progressively emerged as an extension of GNNs to address time-e… ▽ More Over the past few years, graph neural networks (GNNs) have become powerful and practical tools for learning on (static) graph-structure data. However, many real-world applications, such as social networks and e-commerce, involve temporal graphs where nodes and edges are dynamically evolving. Temporal graph neural networks (TGNNs) have progressively emerged as an extension of GNNs to address time-evolving graphs and have gradually become a trending research topic in both academics and industry. Advancing research and application in such an emerging field necessitates the development of new tools to compose TGNN models and unify their different schemes for dealing with temporal graphs. In this work, we introduce LasTGL, an industrial framework that integrates unified and extensible implementations of common temporal graph learning algorithms for various advanced tasks. The purpose of LasTGL is to provide the essential building blocks for solving temporal graph learning tasks, focusing on the guiding principles of user-friendliness and quick prototyping on which PyTorch is based. In particular, LasTGL provides comprehensive temporal graph datasets, TGNN models and utilities along with well-documented tutorials, making it suitable for both absolute beginners and expert deep learning practitioners alike. △ Less

Submitted 30 November, 2023; v1 submitted 28 November, 2023; originally announced November 2023.

Comments: Preprint; Work in progress

arXiv:2311.15241 [pdf, other]

CalibFormer: A Transformer-based Automatic LiDAR-Camera Calibration Network

Authors: Yuxuan Xiao, Yao Li, Chengzhen Meng, Xingchen Li, Jianmin Ji, Yanyong Zhang

Abstract: The fusion of LiDARs and cameras has been increasingly adopted in autonomous driving for perception tasks. The performance of such fusion-based algorithms largely depends on the accuracy of sensor calibration, which is challenging due to the difficulty of identifying common features across different data modalities. Previously, many calibration methods involved specific targets and/or manual inter… ▽ More The fusion of LiDARs and cameras has been increasingly adopted in autonomous driving for perception tasks. The performance of such fusion-based algorithms largely depends on the accuracy of sensor calibration, which is challenging due to the difficulty of identifying common features across different data modalities. Previously, many calibration methods involved specific targets and/or manual intervention, which has proven to be cumbersome and costly. Learning-based online calibration methods have been proposed, but their performance is barely satisfactory in most cases. These methods usually suffer from issues such as sparse feature maps, unreliable cross-modality association, inaccurate calibration parameter regression, etc. In this paper, to address these issues, we propose CalibFormer, an end-to-end network for automatic LiDAR-camera calibration. We aggregate multiple layers of camera and LiDAR image features to achieve high-resolution representations. A multi-head correlation module is utilized to identify correlations between features more accurately. Lastly, we employ transformer architectures to estimate accurate calibration parameters from the correlation information. Our method achieved a mean translation error of $0.8751 \mathrm{cm}$ and a mean rotation error of $0.0562 ^{\circ}$ on the KITTI dataset, surpassing existing state-of-the-art methods and demonstrating strong robustness, accuracy, and generalization capabilities. △ Less

Submitted 17 March, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

arXiv:2311.08190 [pdf, other]

SAMIHS: Adaptation of Segment Anything Model for Intracranial Hemorrhage Segmentation

Authors: Yinuo Wang, Kai Chen, Weimin Yuan, Cai Meng, XiangZhi Bai

Abstract: Segment Anything Model (SAM), a vision foundation model trained on large-scale annotations, has recently continued raising awareness within medical image segmentation. Despite the impressive capabilities of SAM on natural scenes, it struggles with performance decline when confronted with medical images, especially those involving blurry boundaries and highly irregular regions of low contrast. In t… ▽ More Segment Anything Model (SAM), a vision foundation model trained on large-scale annotations, has recently continued raising awareness within medical image segmentation. Despite the impressive capabilities of SAM on natural scenes, it struggles with performance decline when confronted with medical images, especially those involving blurry boundaries and highly irregular regions of low contrast. In this paper, a SAM-based parameter-efficient fine-tuning method, called SAMIHS, is proposed for intracranial hemorrhage segmentation, which is a crucial and challenging step in stroke diagnosis and surgical planning. Distinguished from previous SAM and SAM-based methods, SAMIHS incorporates parameter-refactoring adapters into SAM's image encoder and considers the efficient and flexible utilization of adapters' parameters. Additionally, we employ a combo loss that combines binary cross-entropy loss and boundary-sensitive loss to enhance SAMIHS's ability to recognize the boundary regions. Our experimental results on two public datasets demonstrate the effectiveness of our proposed method. Code is available at https://github.com/mileswyn/SAMIHS . △ Less

Submitted 14 November, 2023; originally announced November 2023.

Comments: 5 pages, 3 figures, 2 tables

arXiv:2311.04287 [pdf, other]

Holistic Evaluation of Text-To-Image Models

Authors: Tony Lee, Michihiro Yasunaga, Chenlin Meng, Yifan Mai, Joon Sung Park, Agrim Gupta, Yunzhi Zhang, Deepak Narayanan, Hannah Benita Teufel, Marco Bellagente, Minguk Kang, Taesung Park, Jure Leskovec, Jun-Yan Zhu, Li Fei-Fei, Jiajun Wu, Stefano Ermon, Percy Liang

Abstract: The stunning qualitative improvement of recent text-to-image models has led to their widespread attention and adoption. However, we lack a comprehensive quantitative understanding of their capabilities and risks. To fill this gap, we introduce a new benchmark, Holistic Evaluation of Text-to-Image Models (HEIM). Whereas previous evaluations focus mostly on text-image alignment and image quality, we… ▽ More The stunning qualitative improvement of recent text-to-image models has led to their widespread attention and adoption. However, we lack a comprehensive quantitative understanding of their capabilities and risks. To fill this gap, we introduce a new benchmark, Holistic Evaluation of Text-to-Image Models (HEIM). Whereas previous evaluations focus mostly on text-image alignment and image quality, we identify 12 aspects, including text-image alignment, image quality, aesthetics, originality, reasoning, knowledge, bias, toxicity, fairness, robustness, multilinguality, and efficiency. We curate 62 scenarios encompassing these aspects and evaluate 26 state-of-the-art text-to-image models on this benchmark. Our results reveal that no single model excels in all aspects, with different models demonstrating different strengths. We release the generated images and human evaluation results for full transparency at https://crfm.stanford.edu/heim/v1.1.0 and the code at https://github.com/stanford-crfm/helm, which is integrated with the HELM codebase. △ Less

Submitted 7 November, 2023; originally announced November 2023.

Comments: NeurIPS 2023. First three authors contributed equally

arXiv:2311.00886 [pdf, other]

COSTAR: Improved Temporal Counterfactual Estimation with Self-Supervised Learning

Authors: Chuizheng Meng, Yihe Dong, Sercan Ö. Arık, Yan Liu, Tomas Pfister

Abstract: Estimation of temporal counterfactual outcomes from observed history is crucial for decision-making in many domains such as healthcare and e-commerce, particularly when randomized controlled trials (RCTs) suffer from high cost or impracticality. For real-world datasets, modeling time-dependent confounders is challenging due to complex dynamics, long-range dependencies and both past treatments and… ▽ More Estimation of temporal counterfactual outcomes from observed history is crucial for decision-making in many domains such as healthcare and e-commerce, particularly when randomized controlled trials (RCTs) suffer from high cost or impracticality. For real-world datasets, modeling time-dependent confounders is challenging due to complex dynamics, long-range dependencies and both past treatments and covariates affecting the future outcomes. In this paper, we introduce Counterfactual Self-Supervised Transformer (COSTAR), a novel approach that integrates self-supervised learning for improved historical representations. We propose a component-wise contrastive loss tailored for temporal treatment outcome observations and explain its effectiveness from the view of unsupervised domain adaptation. COSTAR yields superior performance in estimation accuracy and generalization to out-of-distribution data compared to existing models, as validated by empirical results on both synthetic and real-world datasets. △ Less

Submitted 12 February, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

arXiv:2310.17918 [pdf, other]

Knowing What LLMs DO NOT Know: A Simple Yet Effective Self-Detection Method

Authors: Yukun Zhao, Lingyong Yan, Weiwei Sun, Guoliang Xing, Chong Meng, Shuaiqiang Wang, Zhicong Cheng, Zhaochun Ren, Dawei Yin

Abstract: Large Language Models (LLMs) have shown great potential in Natural Language Processing (NLP) tasks. However, recent literature reveals that LLMs generate nonfactual responses intermittently, which impedes the LLMs' reliability for further utilization. In this paper, we propose a novel self-detection method to detect which questions that a LLM does not know that are prone to generate nonfactual res… ▽ More Large Language Models (LLMs) have shown great potential in Natural Language Processing (NLP) tasks. However, recent literature reveals that LLMs generate nonfactual responses intermittently, which impedes the LLMs' reliability for further utilization. In this paper, we propose a novel self-detection method to detect which questions that a LLM does not know that are prone to generate nonfactual results. Specifically, we first diversify the textual expressions for a given question and collect the corresponding answers. Then we examine the divergencies between the generated answers to identify the questions that the model may generate falsehoods. All of the above steps can be accomplished by prompting the LLMs themselves without referring to any other external resources. We conduct comprehensive experiments and demonstrate the effectiveness of our method on recently released LLMs, e.g., Vicuna, ChatGPT, and GPT-4. △ Less

Submitted 21 March, 2024; v1 submitted 27 October, 2023; originally announced October 2023.

Comments: Accepted by NAACL 2024

arXiv:2310.16834 [pdf, other]

Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution

Authors: Aaron Lou, Chenlin Meng, Stefano Ermon

Abstract: Despite their groundbreaking performance for many generative modeling tasks, diffusion models have fallen short on discrete data domains such as natural language. Crucially, standard diffusion models rely on the well-established theory of score matching, but efforts to generalize this to discrete structures have not yielded the same empirical gains. In this work, we bridge this gap by proposing sc… ▽ More Despite their groundbreaking performance for many generative modeling tasks, diffusion models have fallen short on discrete data domains such as natural language. Crucially, standard diffusion models rely on the well-established theory of score matching, but efforts to generalize this to discrete structures have not yielded the same empirical gains. In this work, we bridge this gap by proposing score entropy, a novel loss that naturally extends score matching to discrete spaces, integrates seamlessly to build discrete diffusion models, and significantly boosts performance. Experimentally, we test our Score Entropy Discrete Diffusion models (SEDD) on standard language modeling tasks. For comparable model sizes, SEDD beats existing language diffusion paradigms (reducing perplexity by $25$-$75$\%) and is competitive with autoregressive models, in particular outperforming GPT-2. Furthermore, compared to autoregressive mdoels, SEDD generates faithful text without requiring distribution annealing techniques like temperature scaling (around $6$-$8\times$ better generative perplexity than un-annealed GPT-2), can trade compute and quality (similar quality with $32\times$ fewer network evaluations), and enables controllable infilling (matching nucleus sampling quality while enabling other strategies besides left to right prompting). △ Less

Submitted 6 June, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

Comments: ICML 2024 Oral. Code at https://github.com/louaaron/Score-Entropy-Discrete-Diffusion

arXiv:2310.16319 [pdf, other]

DiQAD: A Benchmark Dataset for End-to-End Open-domain Dialogue Assessment

Authors: Yukun Zhao, Lingyong Yan, Weiwei Sun, Chong Meng, Shuaiqiang Wang, Zhicong Cheng, Zhaochun Ren, Dawei Yin

Abstract: Dialogue assessment plays a critical role in the development of open-domain dialogue systems. Existing work are uncapable of providing an end-to-end and human-epistemic assessment dataset, while they only provide sub-metrics like coherence or the dialogues are conversed between annotators far from real user settings. In this paper, we release a large-scale dialogue quality assessment dataset (DiQA… ▽ More Dialogue assessment plays a critical role in the development of open-domain dialogue systems. Existing work are uncapable of providing an end-to-end and human-epistemic assessment dataset, while they only provide sub-metrics like coherence or the dialogues are conversed between annotators far from real user settings. In this paper, we release a large-scale dialogue quality assessment dataset (DiQAD), for automatically assessing open-domain dialogue quality. Specifically, we (1) establish the assessment criteria based on the dimensions conforming to human judgements on dialogue qualities, and (2) annotate large-scale dialogues that conversed between real users based on these annotation criteria, which contains around 100,000 dialogues. We conduct several experiments and report the performances of the baselines as the benchmark on DiQAD. The dataset is openly accessible at https://github.com/yukunZhao/Dataset_Dialogue_quality_evaluation. △ Less

Submitted 24 October, 2023; originally announced October 2023.

Comments: Accepted to Findings of EMNLP 2023

arXiv:2310.11664 [pdf, other]

Hetero$^2$Net: Heterophily-aware Representation Learning on Heterogenerous Graphs

Authors: Jintang Li, Zheng Wei, Jiawang Dan, Jing Zhou, Yuchang Zhu, Ruofan Wu, Baokun Wang, Zhang Zhen, Changhua Meng, Hong Jin, Zibin Zheng, Liang Chen

Abstract: Real-world graphs are typically complex, exhibiting heterogeneity in the global structure, as well as strong heterophily within local neighborhoods. While a growing body of literature has revealed the limitations of common graph neural networks (GNNs) in handling homogeneous graphs with heterophily, little work has been conducted on investigating the heterophily properties in the context of hetero… ▽ More Real-world graphs are typically complex, exhibiting heterogeneity in the global structure, as well as strong heterophily within local neighborhoods. While a growing body of literature has revealed the limitations of common graph neural networks (GNNs) in handling homogeneous graphs with heterophily, little work has been conducted on investigating the heterophily properties in the context of heterogeneous graphs. To bridge this research gap, we identify the heterophily in heterogeneous graphs using metapaths and propose two practical metrics to quantitatively describe the levels of heterophily. Through in-depth investigations on several real-world heterogeneous graphs exhibiting varying levels of heterophily, we have observed that heterogeneous graph neural networks (HGNNs), which inherit many mechanisms from GNNs designed for homogeneous graphs, fail to generalize to heterogeneous graphs with heterophily or low level of homophily. To address the challenge, we present Hetero$^2$Net, a heterophily-aware HGNN that incorporates both masked metapath prediction and masked label prediction tasks to effectively and flexibly handle both homophilic and heterophilic heterogeneous graphs. We evaluate the performance of Hetero$^2$Net on five real-world heterogeneous graph benchmarks with varying levels of heterophily. The results demonstrate that Hetero$^2$Net outperforms strong baselines in the semi-supervised node classification task, providing valuable insights into effectively handling more complex heterogeneous graphs. △ Less

Submitted 17 October, 2023; originally announced October 2023.

Comments: Preprint

arXiv:2310.11281 [pdf, other]

Self-supervision meets kernel graph neural models: From architecture to augmentations

Authors: Jiawang Dan, Ruofan Wu, Yunpeng Liu, Baokun Wang, Changhua Meng, Tengfei Liu, Tianyi Zhang, Ningtao Wang, Xing Fu, Qi Li, Weiqiang Wang

Abstract: Graph representation learning has now become the de facto standard when handling graph-structured data, with the framework of message-passing graph neural networks (MPNN) being the most prevailing algorithmic tool. Despite its popularity, the family of MPNNs suffers from several drawbacks such as transparency and expressivity. Recently, the idea of designing neural models on graphs using the theor… ▽ More Graph representation learning has now become the de facto standard when handling graph-structured data, with the framework of message-passing graph neural networks (MPNN) being the most prevailing algorithmic tool. Despite its popularity, the family of MPNNs suffers from several drawbacks such as transparency and expressivity. Recently, the idea of designing neural models on graphs using the theory of graph kernels has emerged as a more transparent as well as sometimes more expressive alternative to MPNNs known as kernel graph neural networks (KGNNs). Developments on KGNNs are currently a nascent field of research, leaving several challenges from algorithmic design and adaptation to other learning paradigms such as self-supervised learning. In this paper, we improve the design and learning of KGNNs. Firstly, we extend the algorithmic formulation of KGNNs by allowing a more flexible graph-level similarity definition that encompasses former proposals like random walk graph kernel, as well as providing a smoother optimization objective that alleviates the need of introducing combinatorial learning procedures. Secondly, we enhance KGNNs through the lens of self-supervision via developing a novel structure-preserving graph data augmentation method called latent graph augmentation (LGA). Finally, we perform extensive empirical evaluations to demonstrate the efficacy of our proposed mechanisms. Experimental results over benchmark datasets suggest that our proposed model achieves competitive performance that is comparable to or sometimes outperforming state-of-the-art graph representation learning frameworks with or without self-supervision on graph classification tasks. Comparisons against other previously established graph data augmentation methods verify that the proposed LGA augmentation scheme captures better semantics of graph-level invariance. △ Less

Submitted 17 October, 2023; originally announced October 2023.

arXiv:2310.10270 [pdf, other]

$h$-function, Hilbert-Kunz density function and Frobenius-Poincaré function

Authors: Cheng Meng, Alapan Mukhopadhyay

Abstract: Given ideals $I,J$ of a noetherian local ring $(R, \mathfrak m)$ such that $I+J$ is $\mathfrak m$-primary and a finitely generated $R$-module $M$, we associate an invariant of $(M,R,I,J)$ called the $h$-function. Our results on $h$-functions allow extensions of the theories of Frobenius-Poincaré functions and Hilbert-Kunz density functions from the known graded case to the local case, answering a… ▽ More Given ideals $I,J$ of a noetherian local ring $(R, \mathfrak m)$ such that $I+J$ is $\mathfrak m$-primary and a finitely generated $R$-module $M$, we associate an invariant of $(M,R,I,J)$ called the $h$-function. Our results on $h$-functions allow extensions of the theories of Frobenius-Poincaré functions and Hilbert-Kunz density functions from the known graded case to the local case, answering a question of V.Trivedi. When $J$ is $\mathfrak m$-primary, we describe the support of the corresponding density function in terms of other invariants of $(R, I,J)$. We show that the support captures the $F$-threshold: $c^J(I)$, under mild assumptions, extending results of V. Trivedi and Watanabe. The $h$-function encodes Hilbert-Samuel, Hilbert-Kunz multiplicity and $F$-threshold of the ideal pair involved. Using this feature of $h$-functions, we provide an equivalent formulation of a conjecture of Huneke, Mustaţă, Takagi, Watanabe; recover a result of Smirnov and Betancourt; prove that a result of Hanes comparing multiplicities, is equivalent to an a priori weaker containment condition on ideals. We also point out that a conjecture of Smirnov-Betancourt as stated is false and suggest a correction which we relate to the conjecture of Huneke et al. We develop the theory of $h$-functions in a more general setting which yields a density function for $F$-signature. A key to many results on $h$-functions is a `convexity technique' that we introduce, which in particular proves differentiability of Hilbert-Kunz density functions almost everywhere on $(0,\infty)$, thus contributing to another question of Trivedi. △ Less

Submitted 25 April, 2024; v1 submitted 16 October, 2023; originally announced October 2023.

Comments: v2: substantial changes: applications added, results improved

arXiv:2310.00413 [pdf, other]

SSIF: Learning Continuous Image Representation for Spatial-Spectral Super-Resolution

Authors: Gengchen Mai, Ni Lao, Weiwei Sun, Yuchi Ma, Jiaming Song, Chenlin Meng, Hongxu Ma, Jinmeng Rao, Ziyuan Li, Stefano Ermon

Abstract: Existing digital sensors capture images at fixed spatial and spectral resolutions (e.g., RGB, multispectral, and hyperspectral images), and each combination requires bespoke machine learning models. Neural Implicit Functions partially overcome the spatial resolution challenge by representing an image in a resolution-independent way. However, they still operate at fixed, pre-defined spectral resolu… ▽ More Existing digital sensors capture images at fixed spatial and spectral resolutions (e.g., RGB, multispectral, and hyperspectral images), and each combination requires bespoke machine learning models. Neural Implicit Functions partially overcome the spatial resolution challenge by representing an image in a resolution-independent way. However, they still operate at fixed, pre-defined spectral resolutions. To address this challenge, we propose Spatial-Spectral Implicit Function (SSIF), a neural implicit model that represents an image as a function of both continuous pixel coordinates in the spatial domain and continuous wavelengths in the spectral domain. We empirically demonstrate the effectiveness of SSIF on two challenging spatio-spectral super-resolution benchmarks. We observe that SSIF consistently outperforms state-of-the-art baselines even when the baselines are allowed to train separate models at each spectral resolution. We show that SSIF generalizes well to both unseen spatial resolutions and spectral resolutions. Moreover, SSIF can generate high-resolution images that improve the performance of downstream tasks (e.g., land use classification) by 1.7%-7%. △ Less

Submitted 30 September, 2023; originally announced October 2023.

MSC Class: 68T07; 68T45 ACM Class: I.4.10; I.2.10; I.4.6

arXiv:2309.00859 [pdf, other]

DeepScaler: Holistic Autoscaling for Microservices Based on Spatiotemporal GNN with Adaptive Graph Learning

Authors: Chunyang Meng, Shijie Song, Haogang Tong, Maolin Pan, Yang Yu

Abstract: Autoscaling functions provide the foundation for achieving elasticity in the modern cloud computing paradigm. It enables dynamic provisioning or de-provisioning resources for cloud software services and applications without human intervention to adapt to workload fluctuations. However, autoscaling microservice is challenging due to various factors. In particular, complex, time-varying service depe… ▽ More Autoscaling functions provide the foundation for achieving elasticity in the modern cloud computing paradigm. It enables dynamic provisioning or de-provisioning resources for cloud software services and applications without human intervention to adapt to workload fluctuations. However, autoscaling microservice is challenging due to various factors. In particular, complex, time-varying service dependencies are difficult to quantify accurately and can lead to cascading effects when allocating resources. This paper presents DeepScaler, a deep learning-based holistic autoscaling approach for microservices that focus on coping with service dependencies to optimize service-level agreements (SLA) assurance and cost efficiency. DeepScaler employs (i) an expectation-maximization-based learning method to adaptively generate affinity matrices revealing service dependencies and (ii) an attention-based graph convolutional network to extract spatio-temporal features of microservices by aggregating neighbors' information of graph-structural data. Thus DeepScaler can capture more potential service dependencies and accurately estimate the resource requirements of all services under dynamic workloads. It allows DeepScaler to reconfigure the resources of the interacting services simultaneously in one resource provisioning operation, avoiding the cascading effect caused by service dependencies. Experimental results demonstrate that our method implements a more effective autoscaling mechanism for microservice that not only allocates resources accurately but also adapts to dependencies changes, significantly reducing SLA violations by an average of 41% at lower costs. △ Less

Submitted 2 September, 2023; originally announced September 2023.

Comments: To be published in the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE 2023)

arXiv:2309.00169 [pdf, other]

RepCodec: A Speech Representation Codec for Speech Tokenization

Authors: Zhichao Huang, Chutong Meng, Tom Ko

Abstract: With recent rapid growth of large language models (LLMs), discrete speech tokenization has played an important role for injecting speech into LLMs. However, this discretization gives rise to a loss of information, consequently impairing overall performance. To improve the performance of these discrete speech tokens, we present RepCodec, a novel speech representation codec for semantic speech token… ▽ More With recent rapid growth of large language models (LLMs), discrete speech tokenization has played an important role for injecting speech into LLMs. However, this discretization gives rise to a loss of information, consequently impairing overall performance. To improve the performance of these discrete speech tokens, we present RepCodec, a novel speech representation codec for semantic speech tokenization. In contrast to audio codecs which reconstruct the raw audio, RepCodec learns a vector quantization codebook through reconstructing speech representations from speech encoders like HuBERT or data2vec. Together, the speech encoder, the codec encoder and the vector quantization codebook form a pipeline for converting speech waveforms into semantic tokens. The extensive experiments illustrate that RepCodec, by virtue of its enhanced information retention capacity, significantly outperforms the widely used k-means clustering approach in both speech understanding and generation. Furthermore, this superiority extends across various speech encoders and languages, affirming the robustness of RepCodec. We believe our method can facilitate large language modeling research on speech processing. △ Less

Submitted 6 June, 2024; v1 submitted 31 August, 2023; originally announced September 2023.

arXiv:2308.12444 [pdf, other]

Leverage classifier: Another look at support vector machine

Authors: Yixin Han, Jun Yu, Nan Zhang, Cheng Meng, Ping Ma, Wenxuan Zhong, Changliang Zou

Abstract: Support vector machine (SVM) is a popular classifier known for accuracy, flexibility, and robustness. However, its intensive computation has hindered its application to large-scale datasets. In this paper, we propose a new optimal leverage classifier based on linear SVM under a nonseparable setting. Our classifier aims to select an informative subset of the training sample to reduce data size, ena… ▽ More Support vector machine (SVM) is a popular classifier known for accuracy, flexibility, and robustness. However, its intensive computation has hindered its application to large-scale datasets. In this paper, we propose a new optimal leverage classifier based on linear SVM under a nonseparable setting. Our classifier aims to select an informative subset of the training sample to reduce data size, enabling efficient computation while maintaining high accuracy. We take a novel view of SVM under the general subsampling framework and rigorously investigate the statistical properties. We propose a two-step subsampling procedure consisting of a pilot estimation of the optimal subsampling probabilities and a subsampling step to construct the classifier. We develop a new Bahadur representation of the SVM coefficients and derive unconditional asymptotic distribution and optimal subsampling probabilities without giving the full sample. Numerical results demonstrate that our classifiers outperform the existing methods in terms of estimation, computation, and prediction. △ Less

Submitted 23 August, 2023; originally announced August 2023.

Comments: 50 pages, 9 figures, 3 tables

arXiv:2308.12061 [pdf, other]

HarvestNet: A Dataset for Detecting Smallholder Farming Activity Using Harvest Piles and Remote Sensing

Authors: Jonathan Xu, Amna Elmustafa, Liya Weldegebriel, Emnet Negash, Richard Lee, Chenlin Meng, Stefano Ermon, David Lobell

Abstract: Small farms contribute to a large share of the productive land in developing countries. In regions such as sub-Saharan Africa, where 80\% of farms are small (under 2 ha in size), the task of mapping smallholder cropland is an important part of tracking sustainability measures such as crop productivity. However, the visually diverse and nuanced appearance of small farms has limited the effectivenes… ▽ More Small farms contribute to a large share of the productive land in developing countries. In regions such as sub-Saharan Africa, where 80\% of farms are small (under 2 ha in size), the task of mapping smallholder cropland is an important part of tracking sustainability measures such as crop productivity. However, the visually diverse and nuanced appearance of small farms has limited the effectiveness of traditional approaches to cropland mapping. Here we introduce a new approach based on the detection of harvest piles characteristic of many smallholder systems throughout the world. We present HarvestNet, a dataset for mapping the presence of farms in the Ethiopian regions of Tigray and Amhara during 2020-2023, collected using expert knowledge and satellite images, totaling 7k hand-labeled images and 2k ground-collected labels. We also benchmark a set of baselines, including SOTA models in remote sensing, with our best models having around 80\% classification performance on hand labelled data and 90\% and 98\% accuracy on ground truth data for Tigray and Amhara, respectively. We also perform a visual comparison with a widely used pre-existing coverage map and show that our model detects an extra 56,621 hectares of cropland in Tigray. We conclude that remote sensing of harvest piles can contribute to more timely and accurate cropland assessments in food insecure regions. The dataset can be accessed through https://figshare.com/s/45a7b45556b90a9a11d2, while the code for the dataset and benchmarks is publicly available at https://github.com/jonxuxu/harvest-piles △ Less

Submitted 5 March, 2024; v1 submitted 23 August, 2023; originally announced August 2023.

Comments: submitted to AAAI24

arXiv:2308.11198 [pdf, other]

Novel-view Synthesis and Pose Estimation for Hand-Object Interaction from Sparse Views

Authors: Wentian Qu, Zhaopeng Cui, Yinda Zhang, Chenyu Meng, Cuixia Ma, Xiaoming Deng, Hongan Wang

Abstract: Hand-object interaction understanding and the barely addressed novel view synthesis are highly desired in the immersive communication, whereas it is challenging due to the high deformation of hand and heavy occlusions between hand and object. In this paper, we propose a neural rendering and pose estimation system for hand-object interaction from sparse views, which can also enable 3D hand-object i… ▽ More Hand-object interaction understanding and the barely addressed novel view synthesis are highly desired in the immersive communication, whereas it is challenging due to the high deformation of hand and heavy occlusions between hand and object. In this paper, we propose a neural rendering and pose estimation system for hand-object interaction from sparse views, which can also enable 3D hand-object interaction editing. We share the inspiration from recent scene understanding work that shows a scene specific model built beforehand can significantly improve and unblock vision tasks especially when inputs are sparse, and extend it to the dynamic hand-object interaction scenario and propose to solve the problem in two stages. We first learn the shape and appearance prior knowledge of hands and objects separately with the neural representation at the offline stage. During the online stage, we design a rendering-based joint model fitting framework to understand the dynamic hand-object interaction with the pre-built hand and object models as well as interaction priors, which thereby overcomes penetration and separation issues between hand and object and also enables novel view synthesis. In order to get stable contact during the hand-object interaction process in a sequence, we propose a stable contact loss to make the contact region to be consistent. Experiments demonstrate that our method outperforms the state-of-the-art methods. Code and dataset are available in project webpage https://iscas3dv.github.io/HO-NeRF. △ Less

Submitted 22 August, 2023; originally announced August 2023.

arXiv:2308.09966 [pdf, other]

Time-aligned Exposure-enhanced Model for Click-Through Rate Prediction

Authors: Hengyu Zhang, Chang Meng, Wei Guo, Huifeng Guo, Jieming Zhu, Guangpeng Zhao, Ruiming Tang, Xiu Li

Abstract: Click-Through Rate (CTR) prediction, crucial in applications like recommender systems and online advertising, involves ranking items based on the likelihood of user clicks. User behavior sequence modeling has marked progress in CTR prediction, which extracts users' latent interests from their historical behavior sequences to facilitate accurate CTR prediction. Recent research explores using implic… ▽ More Click-Through Rate (CTR) prediction, crucial in applications like recommender systems and online advertising, involves ranking items based on the likelihood of user clicks. User behavior sequence modeling has marked progress in CTR prediction, which extracts users' latent interests from their historical behavior sequences to facilitate accurate CTR prediction. Recent research explores using implicit feedback sequences, like unclicked records, to extract diverse user interests. However, these methods encounter key challenges: 1) temporal misalignment due to disparate sequence time ranges and 2) the lack of fine-grained interaction among feedback sequences. To address these challenges, we propose a novel framework called TEM4CTR, which ensures temporal alignment among sequences while leveraging auxiliary feedback information to enhance click behavior at the item level through a representation projection mechanism. Moreover, this projection-based information transfer module can effectively alleviate the negative impact of irrelevant or even potentially detrimental components of the auxiliary feedback information on the learning process of click behavior. Comprehensive experiments on public and industrial datasets confirm the superiority and effectiveness of TEM4CTR, showcasing the significance of temporal alignment in multi-feedback modeling. △ Less

Submitted 19 August, 2023; originally announced August 2023.

Comments: 11 pages, 5 figures

arXiv:2308.07625 [pdf, other]

Backpropagation Path Search On Adversarial Transferability

Authors: Zhuoer Xu, Zhangxuan Gu, Jianping Zhang, Shiwen Cui, Changhua Meng, Weiqiang Wang

Abstract: Deep neural networks are vulnerable to adversarial examples, dictating the imperativeness to test the model's robustness before deployment. Transfer-based attackers craft adversarial examples against surrogate models and transfer them to victim models deployed in the black-box situation. To enhance the adversarial transferability, structure-based attackers adjust the backpropagation path to avoid… ▽ More Deep neural networks are vulnerable to adversarial examples, dictating the imperativeness to test the model's robustness before deployment. Transfer-based attackers craft adversarial examples against surrogate models and transfer them to victim models deployed in the black-box situation. To enhance the adversarial transferability, structure-based attackers adjust the backpropagation path to avoid the attack from overfitting the surrogate model. However, existing structure-based attackers fail to explore the convolution module in CNNs and modify the backpropagation graph heuristically, leading to limited effectiveness. In this paper, we propose backPropagation pAth Search (PAS), solving the aforementioned two problems. We first propose SkipConv to adjust the backpropagation path of convolution by structural reparameterization. To overcome the drawback of heuristically designed backpropagation paths, we further construct a DAG-based search space, utilize one-step approximation for path evaluation and employ Bayesian Optimization to search for the optimal path. We conduct comprehensive experiments in a wide range of transfer settings, showing that PAS improves the attack success rate by a huge margin for both normally trained and defense models. △ Less

Submitted 15 August, 2023; originally announced August 2023.

Comments: Accepted by ICCV2023

arXiv:2308.04807 [pdf, other]

doi 10.1145/3583780.3615004

Parallel Knowledge Enhancement based Framework for Multi-behavior Recommendation

Authors: Chang Meng, Chenhao Zhai, Yu Yang, Hengyu Zhang, Xiu Li

Abstract: Multi-behavior recommendation algorithms aim to leverage the multiplex interactions between users and items to learn users' latent preferences. Recent multi-behavior recommendation frameworks contain two steps: fusion and prediction. In the fusion step, advanced neural networks are used to model the hierarchical correlations between user behaviors. In the prediction step, multiple signals are util… ▽ More Multi-behavior recommendation algorithms aim to leverage the multiplex interactions between users and items to learn users' latent preferences. Recent multi-behavior recommendation frameworks contain two steps: fusion and prediction. In the fusion step, advanced neural networks are used to model the hierarchical correlations between user behaviors. In the prediction step, multiple signals are utilized to jointly optimize the model with a multi-task learning (MTL) paradigm. However, recent approaches have not addressed the issue caused by imbalanced data distribution in the fusion step, resulting in the learned relationships being dominated by high-frequency behaviors. In the prediction step, the existing methods use a gate mechanism to directly aggregate expert information generated by coupling input, leading to negative information transfer. To tackle these issues, we propose a Parallel Knowledge Enhancement Framework (PKEF) for multi-behavior recommendation. Specifically, we enhance the hierarchical information propagation in the fusion step using parallel knowledge (PKF). Meanwhile, in the prediction step, we decouple the representations to generate expert information and introduce a projection mechanism during aggregation to eliminate gradient conflicts and alleviate negative transfer (PME). We conduct comprehensive experiments on three real-world datasets to validate the effectiveness of our model. The results further demonstrate the rationality and effectiveness of the designed PKF and PME modules. The source code and datasets are available at https://github.com/MC-CV/PKEF. △ Less

Submitted 9 August, 2023; originally announced August 2023.

Comments: Accepted by CIKM 2023

arXiv:2308.03777 [pdf]

Lab-in-a-Tube: A portable imaging spectrophotometer for cost-effective, high-throughput, and label-free analysis of centrifugation processes

Authors: Yuanyuan Wei, Dehua Hu, Bijie Bai, Chenqi Meng, Tsz Kin Chan, Xing Zhao, Yuye Wang, Yi-Ping Ho, Wu Yuan, Ho-Pui Ho

Abstract: Centrifuges serve as essential instruments in modern experimental sciences, facilitating a wide range of routine sample processing tasks that necessitate material sedimentation. However, the study for real time observation of the dynamical process during centrifugation has remained elusive. In this study, we developed an innovative Lab_in_a_Tube imaging spectrophotometer that incorporates capabili… ▽ More Centrifuges serve as essential instruments in modern experimental sciences, facilitating a wide range of routine sample processing tasks that necessitate material sedimentation. However, the study for real time observation of the dynamical process during centrifugation has remained elusive. In this study, we developed an innovative Lab_in_a_Tube imaging spectrophotometer that incorporates capabilities of real time image analysis and programmable interruption. This portable LIAT device costs less than 30 US dollars. Based on our knowledge, it is the first Wi Fi camera built_in in common lab centrifuges with active closed_loop control. We tested our LIAT imaging spectrophotometer with solute solvent interaction investigation obtained from lab centrifuges with quantitative data plotting in a real time manner. Single re circulating flow was real time observed, forming the ring shaped pattern during centrifugation. To the best of our knowledge, this is the very first observation of similar phenomena. We developed theoretical simulations for the single particle in a rotating reference frame, which correlated well with experimental results. We also demonstrated the first demonstration to visualize the blood sedimentation process in clinical lab centrifuges. This remarkable cost effectiveness opens up exciting opportunities for centrifugation microbiology research and paves the way for the creation of a network of computational imaging spectrometers at an affordable price for large scale and continuous monitoring of centrifugal processes in general. △ Less

Submitted 1 August, 2023; originally announced August 2023.

Comments: 21 Pages, 6 Figures

arXiv:2307.12069 [pdf, other]

Some remarks on compositeness of $T^+_{cc}$

Authors: Chang Chen, Ce Meng, Zhiguang Xiao, Han-Qing Zheng

Abstract: Recently LHCb experimental group find an exotic state $T^+_{cc}$ from the process $p\bar{p} \to D^0D^0π^+ + X$. A key question is if it is just a molecule or may have confined tetraquark ingredient. To investigate this, different methods are taken, including two channel ($D^{*+}D^0$ and $D^{*0}D^+$) K-matrix unitarization and single channel Flatté-like parametrization method analysed by pole count… ▽ More Recently LHCb experimental group find an exotic state $T^+_{cc}$ from the process $p\bar{p} \to D^0D^0π^+ + X$. A key question is if it is just a molecule or may have confined tetraquark ingredient. To investigate this, different methods are taken, including two channel ($D^{*+}D^0$ and $D^{*0}D^+$) K-matrix unitarization and single channel Flatté-like parametrization method analysed by pole counting rule and spectral density function sum rule. It demonstrates that $T^+_{cc}$ is a molecular state, though the possibility that there may exist elementary ingredient can not be excluded, by rough analysis on its production rate. △ Less

Submitted 29 February, 2024; v1 submitted 22 July, 2023; originally announced July 2023.

Showing 1–50 of 232 results for author: Meng, C