
arXiv:2407.08957 [pdf, other]

High throughput screening, crystal structure prediction, and carrier mobility calculations of organic molecular semiconductors as hole transport layer materials in perovskite solar cells

Authors: Md Omar Faruque, Suchona Akter, Dil K. Limbu, Kathleen Kilway, Zhonghua Peng, Mohammad R. Momeni

Abstract: Using a representative translational dimer model, high-throughput calculations are implemented for fast screening of a total of 74 diacenaphtho-extended heterocycle (DAH) derivatives as hole transport layer (HTL) materials in perovskite solar cells (PVSCs). Different electronic properties, including band structures, band gaps, and band edges compared to methylammonium and formamidinium lead iodide perovskites, along with reorganization energies, electronic couplings, and hole mobilities, are calculated in order to decipher the effects of different parameters, including polarity, steric effects, π-conjugation, and the presence of explicit hydrogen-bond interactions, on the computed carrier mobilities of the studied materials. Full crystal structure predictions and hole mobility calculations of the top candidates yielded mobilities exceeding 10 cm²/(V·s) in some cases, further validating the employed translational dimer model as a robust approach for inverse design and fast high-throughput screening of new HTL organic semiconductors with superior properties. The studied models and simulations performed in this work are instructive in designing next-generation HTL materials for higher-performance PVSCs.

Submitted 11 July, 2024; originally announced July 2024.
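The abstract of arXiv:2407.08957 does not state which rate expression links the dimer-level quantities (reorganization energy, electronic coupling) to hole mobility. A common choice in organic-semiconductor screening, assumed here, is the semiclassical Marcus hopping rate combined with the Einstein relation, using the widely used approximate diffusion formula $D = \frac{1}{2n}\sum_i r_i^2 k_i P_i$. The sketch below applies it to a hypothetical translational stack with two equivalent neighbours; the coupling, reorganization energy, and hopping distance are illustrative values, not results from the paper.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J s
KB = 1.380649e-23       # Boltzmann constant, J/K
EV = 1.602176634e-19    # 1 eV in J; numerically also the elementary charge in C

def marcus_rate(coupling_ev, reorg_ev, temp_k=300.0):
    """Semiclassical Marcus hopping rate (1/s) for a zero-driving-force (self-exchange) hop."""
    v, lam, kt = coupling_ev * EV, reorg_ev * EV, KB * temp_k
    return (v ** 2 / HBAR) * math.sqrt(math.pi / (lam * kt)) * math.exp(-lam / (4.0 * kt))

def einstein_mobility(rates, distances_m, temp_k=300.0, dim=3):
    """Mobility in cm^2/(V s) from D = (1/2n) sum r_i^2 k_i P_i and mu = e D / (k_B T)."""
    total = sum(rates)
    # P_i = k_i / sum(k): probability of hopping along pathway i
    diffusion = sum(r ** 2 * k * (k / total) for k, r in zip(rates, distances_m)) / (2 * dim)
    return EV * diffusion / (KB * temp_k) * 1e4   # m^2/(V s) -> cm^2/(V s)

# Hypothetical translational dimer: coupling 50 meV, reorganization energy 0.2 eV, 3.8 A stacking.
k = marcus_rate(0.05, 0.2)
mu = einstein_mobility([k, k], [3.8e-10, 3.8e-10])
print(f"k = {k:.3e} 1/s, mu = {mu:.3f} cm^2/(V s)")
```

Because the rate scales with the square of the coupling and decreases with the reorganization energy, screening for large couplings and small reorganization energies is the usual route to high computed mobilities in this kind of model.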

arXiv:2407.05726 [pdf, other]

Gait Patterns as Biomarkers: A Video-Based Approach for Classifying Scoliosis

Authors: Zirui Zhou, Junhao Liang, Zizhao Peng, Chao Fan, Fengwei An, Shiqi Yu

Abstract: Scoliosis poses significant diagnostic challenges, particularly in adolescents, for whom early detection is crucial for effective treatment. Traditional diagnostic and follow-up methods, which rely on physical examinations and radiography, face limitations due to the need for clinical expertise and the risk of radiation exposure, thus restricting their use for widespread early screening. In response, we introduce a novel, video-based, non-invasive method for scoliosis classification using gait analysis, which circumvents these limitations. This study presents Scoliosis1K, the first large-scale dataset tailored for video-based scoliosis classification, encompassing over one thousand adolescents. Leveraging this dataset, we developed ScoNet, an initial model that encountered challenges in dealing with the complexities of real-world data. This led to the creation of ScoNet-MT, an enhanced model incorporating multi-task learning, which exhibits promising diagnostic accuracy for practical application. Our findings demonstrate that gait can be a non-invasive biomarker for scoliosis, revolutionizing screening practices with deep learning and setting a precedent for non-invasive diagnostic methodologies. The dataset and code are publicly available at https://zhouzi180.github.io/Scoliosis1K/.

Submitted 9 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

Comments: Accepted to MICCAI 2024

arXiv:2407.00286 [pdf, other]

Digital Twin-Assisted Data-Driven Optimization for Reliable Edge Caching in Wireless Networks

Authors: Zifan Zhang, Yuchen Liu, Zhiyuan Peng, Mingzhe Chen, Dongkuan Xu, Shuguang Cui

Abstract: Optimizing edge caching is crucial for the advancement of next-generation (nextG) wireless networks, ensuring high-speed and low-latency services for mobile users. Existing data-driven optimization approaches often lack awareness of the distribution of random data variables and focus solely on optimizing cache hit rates, neglecting potential reliability concerns, such as base station overload and unbalanced cache issues. This oversight can result in system crashes and degraded user experience. To bridge this gap, we introduce a novel digital twin-assisted optimization framework, called D-REC, which integrates reinforcement learning (RL) with diverse intervention modules to ensure reliable caching in nextG wireless networks. We first develop a joint vertical and horizontal twinning approach to efficiently create network digital twins, which are then employed by D-REC as RL optimizers and safeguards, providing ample datasets for training and predictive evaluation of our cache replacement policy. By incorporating reliability modules into a constrained Markov decision process, D-REC can adaptively adjust actions, rewards, and states to comply with advantageous constraints, minimizing the risk of network failures. Theoretical analysis demonstrates comparable convergence rates between D-REC and vanilla data-driven methods without compromising caching performance. Extensive experiments validate that D-REC outperforms conventional approaches in cache hit rate and load balancing while effectively enforcing predetermined reliability intervention modules.

Submitted 28 June, 2024; originally announced July 2024.

Comments: Accepted by IEEE Journal on Selected Areas in Communications (JSAC)

arXiv:2406.19195 [pdf, other]

Estimating Long-term Heterogeneous Dose-response Curve: Generalization Bound Leveraging Optimal Transport Weights

Authors: Zeqin Yang, Weilin Chen, Ruichu Cai, Yuguang Yan, Zhifeng Hao, Zhipeng Yu, Zhichao Zou, Zhen Peng, Jiecheng Guo

Abstract: Long-term causal effect estimation is a significant but challenging problem in many applications. Existing methods rely on idealized assumptions to estimate long-term average effects (e.g., no unobserved confounders or a binary treatment), while in numerous real-world applications these assumptions may be violated, and average effects cannot provide individual-level suggestions. In this paper, we address the more general problem of estimating the long-term heterogeneous dose-response curve (HDRC) while accounting for unobserved confounders. Specifically, to remove unobserved confounding in observational data, we introduce an optimal transport weighting framework to align the observational data to the experimental data with theoretical guarantees. Furthermore, to accurately predict the heterogeneous effects of continuous treatment, we establish a generalization bound on counterfactual prediction error by leveraging the reweighted distribution induced by optimal transport. Finally, we develop an HDRC estimator built upon the above theoretical foundations. Extensive experimental studies conducted on multiple synthetic and semi-synthetic datasets demonstrate the effectiveness of our proposed method.

Submitted 27 June, 2024; originally announced June 2024.
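The abstract of arXiv:2406.19195 does not spell out how its optimal transport weights are computed. For orientation only, the sketch below shows the standard entropic OT solver (Sinkhorn iterations) producing a transport plan between made-up observational and experimental covariate samples. The regularization strength, squared-Euclidean cost, and uniform marginals are all assumptions, and the paper's own weighting framework is not reproduced here.

```python
import numpy as np

def sinkhorn(a, b, cost, eps=0.1, n_iter=1000):
    """Entropic OT plan between histograms a (n,) and b (m,) for a cost matrix (n, m)."""
    kernel = np.exp(-cost / eps)
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (kernel @ v)        # rescale rows to match marginal a
        v = b / (kernel.T @ u)      # rescale columns to match marginal b
    return u[:, None] * kernel * v[None, :]

rng = np.random.default_rng(0)
x_obs = rng.normal(0.5, 1.0, size=(40, 2))   # hypothetical observational covariates
x_exp = rng.normal(0.0, 1.0, size=(30, 2))   # hypothetical experimental covariates
cost = ((x_obs[:, None, :] - x_exp[None, :, :]) ** 2).sum(axis=-1)
cost = cost / cost.max()                     # normalise so exp(-cost/eps) stays well-conditioned
a = np.full(40, 1 / 40)
b = np.full(30, 1 / 30)
plan = sinkhorn(a, b, cost)
print("entropic transport cost:", float((plan * cost).sum()))
```

Smaller `eps` brings the plan closer to unregularized OT but slows convergence and risks underflow in the kernel, which is why the cost is rescaled first.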

arXiv:2406.16907 [pdf, other]

RayProNet: A Neural Point Field Framework for Radio Propagation Modeling in 3D Environments

Authors: Ge Cao, Zhen Peng

Abstract: The radio wave propagation channel is central to the performance of wireless communication systems. In this paper, we introduce a novel machine-learning-empowered methodology for wireless channel modeling. The key ingredients include a point-cloud-based neural network and a spherical harmonics encoder with light probes. Our approach offers several significant advantages, including the flexibility to adjust antenna radiation patterns and transmitter/receiver locations, the capability to predict radio power maps, and scalability to large-scale wireless scenes. As a result, it lays the groundwork for an end-to-end pipeline for network planning and deployment optimization. The proposed work is validated in various outdoor and indoor radio environments.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.16866 [pdf, other]

Revisiting Referring Expression Comprehension Evaluation in the Era of Large Multimodal Models

Authors: Jierun Chen, Fangyun Wei, Jinjing Zhao, Sizhe Song, Bohuai Wu, Zhuoxuan Peng, S. -H. Gary Chan, Hongyang Zhang

Abstract: Referring expression comprehension (REC) involves localizing a target instance based on a textual description. Recent advancements in REC have been driven by large multimodal models (LMMs) like CogVLM, which achieved 92.44% accuracy on RefCOCO. However, this study questions whether existing benchmarks such as RefCOCO, RefCOCO+, and RefCOCOg capture LMMs' comprehensive capabilities. We begin with a manual examination of these benchmarks, revealing high labeling error rates: 14% in RefCOCO, 24% in RefCOCO+, and 5% in RefCOCOg, which undermines the reliability of evaluations. We address this by excluding problematic instances and reevaluating several LMMs capable of handling the REC task, showing significant accuracy improvements and thus highlighting the impact of benchmark noise. In response, we introduce Ref-L4, a comprehensive REC benchmark specifically designed to evaluate modern REC models. Ref-L4 is distinguished by four key features: 1) a substantial sample size with 45,341 annotations; 2) a diverse range of object categories with 365 distinct types and varying instance scales from 30 to 3,767; 3) lengthy referring expressions averaging 24.2 words; and 4) an extensive vocabulary comprising 22,813 unique words. We evaluate a total of 24 large models on Ref-L4 and provide valuable insights. The cleaned versions of RefCOCO, RefCOCO+, and RefCOCOg, as well as our Ref-L4 benchmark and evaluation code, are available at https://github.com/JierunChen/Ref-L4.

Submitted 24 June, 2024; originally announced June 2024.
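REC accuracy figures such as the 92.44% on RefCOCO quoted in arXiv:2406.16866 are conventionally computed as the fraction of expressions whose predicted box overlaps the ground-truth box with IoU ≥ 0.5. The sketch below assumes that convention (the Ref-L4 evaluation code may also report other thresholds) and shows how excluding annotations flagged as mislabeled changes the score; the boxes and the flagged index are made up.

```python
from typing import AbstractSet, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1 and y2 > y1

def iou(p: Box, g: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(p[0], g[0]), max(p[1], g[1])
    ix2, iy2 = min(p[2], g[2]), min(p[3], g[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (p[2] - p[0]) * (p[3] - p[1]) + (g[2] - g[0]) * (g[3] - g[1]) - inter
    return inter / union if union > 0 else 0.0

def rec_accuracy(preds: Sequence[Box], gts: Sequence[Box], thresh: float = 0.5,
                 exclude: AbstractSet[int] = frozenset()) -> float:
    """Acc@thresh over all annotations whose index is not in `exclude`."""
    kept = [i for i in range(len(gts)) if i not in exclude]
    hits = sum(iou(preds[i], gts[i]) >= thresh for i in kept)
    return hits / len(kept)

preds = [(0, 0, 10, 10), (5, 5, 15, 15), (0, 0, 4, 4)]
gts = [(1, 1, 10, 10), (5, 5, 14, 15), (20, 20, 30, 30)]  # third label hypothetically wrong
print(rec_accuracy(preds, gts), rec_accuracy(preds, gts, exclude={2}))
```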

arXiv:2406.16557 [pdf, other]

Efficient k-means with Individual Fairness via Exponential Tilting

Authors: Shengkun Zhu, Jinshan Zeng, Yuan Sun, Sheng Wang, Xiaodong Li, Zhiyong Peng

Abstract: In location-based resource allocation scenarios, fairness requires the distances between each individual and the facility to be approximately equal. Individually fair clustering, which aims to treat all points equally, is often employed in these scenarios. This paper proposes a novel algorithm, tilted k-means (TKM), aiming to achieve individual fairness in clustering. We integrate exponential tilting into the sum of squared errors (SSE) to formulate a novel objective function called tilted SSE. We demonstrate that the tilted SSE generalizes the SSE, and we employ coordinate descent and a first-order gradient method for optimization. We propose a novel fairness metric, the variance of the distances within each cluster, which can alleviate the Matthew Effect typically caused by existing fairness metrics. Our theoretical analysis demonstrates that the well-known k-means++ incurs a multiplicative error of O(k log k), and we establish the convergence of TKM under mild conditions. In terms of fairness, we prove that the variance generated by TKM decreases with a scaled hyperparameter. In terms of efficiency, we demonstrate that the time complexity is linear in the dataset size. Our experiments demonstrate that TKM outperforms state-of-the-art methods in effectiveness, fairness, and efficiency.

Submitted 24 June, 2024; originally announced June 2024.
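The exact form of the tilted SSE in arXiv:2406.16557 is not given in the abstract. A minimal sketch, assuming it follows the usual exponential-tilting template applied to per-point squared distances $\ell_i$, i.e. $\tilde{L}(t) = \frac{1}{t}\log\big(\frac{1}{N}\sum_i e^{t\,\ell_i}\big)$, is shown below together with one reading of the proposed fairness metric (per-cluster variance of point-to-center distances, averaged over clusters; the averaging is our assumption). As $t \to 0$ this objective reduces to the mean squared error, i.e. SSE divided by $N$, while larger $t$ puts more weight on the farthest points. The TKM optimizer itself is not reproduced.

```python
import numpy as np

def point_losses(x, centers, labels):
    """Squared Euclidean distance from each point to its assigned center."""
    return ((x - centers[labels]) ** 2).sum(axis=1)

def tilted_sse(losses, t):
    """(1/t) * log(mean(exp(t * loss))), computed with a log-sum-exp shift; t = 0 gives the mean."""
    if t == 0:
        return float(losses.mean())
    z = t * losses
    m = z.max()
    return float((m + np.log(np.mean(np.exp(z - m)))) / t)

def within_cluster_distance_variance(x, centers, labels):
    """Variance of point-to-center distances in each cluster, averaged over clusters (assumed)."""
    dists = np.sqrt(point_losses(x, centers, labels))
    return float(np.mean([dists[labels == c].var() for c in np.unique(labels)]))

losses = np.array([1.0, 2.0, 3.0, 10.0])   # hypothetical per-point squared distances
for t in (0.0, 0.5, 5.0):
    print(t, tilted_sse(losses, t))        # grows from the mean towards the maximum loss
```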

arXiv:2406.14880 [pdf, other]

Pathformer: Recursive Path Query Encoding for Complex Logical Query Answering

Authors: Chongzhi Zhang, Zhiping Peng, Junhao Zheng, Linghao Wang, Ruifeng Shi, Qianli Ma

Abstract: Complex Logical Query Answering (CLQA) over incomplete knowledge graphs is a challenging task. Recently, Query Embedding (QE) methods have been proposed to solve CLQA by performing multi-hop logical reasoning. However, most of them consider only historical query context information while ignoring future information, which leads to their failure to capture the complex dependencies between the elements of a query. In recent years, the transformer architecture has shown a strong ability to model long-range dependencies between words. The transformer's bidirectional attention mechanism can overcome this limitation of QE methods regarding query context. Still, as a sequence model, the transformer has difficulty directly modeling complex logical queries whose computation graphs have branch structures. To this end, we propose a neural one-point embedding method called Pathformer based on the tree-like computation graph, i.e., the query computation tree. Specifically, Pathformer decomposes the query computation tree into path query sequences by branches and then uses the transformer encoder to recursively encode these path query sequences to obtain the final query embedding. This allows Pathformer to fully utilize future context information to explicitly model the complex interactions between various parts of the path query. Experimental results show that Pathformer outperforms existing competitive neural QE methods, and we find that Pathformer has the potential to be applied to non-one-point embedding spaces.

Submitted 21 June, 2024; originally announced June 2024.

Comments: This work has been submitted to the IEEE
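To make "decompose the query computation tree into path query sequences by branches, then encode them recursively" (arXiv:2406.14880) concrete, here is a hypothetical sketch. The `Node` type, the traversal order, and the string-joining `toy_encoder` standing in for the transformer encoder are illustrative assumptions, not Pathformer's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Node:
    label: str                                  # anchor entity, relation projection, or operator
    children: List["Node"] = field(default_factory=list)

def encode(node: Node, encoder: Callable[[List[str]], str]) -> str:
    """Recursively encode a query computation tree rooted at the answer node.

    Follows the single non-branching path from `node` towards the leaves; if that path
    ends at a branching node, each branch is encoded first and its embedding is placed
    at the start of the sequence, so the encoder sees the branch results as context.
    """
    path = []
    cur = node
    while len(cur.children) == 1:
        path.append(cur.label)
        cur = cur.children[0]
    path.append(cur.label)
    path.reverse()                              # order tokens from the leaf side to the answer
    if cur.children:                            # branching node, e.g. an intersection
        path = [encode(child, encoder) for child in cur.children] + path
    return encoder(path)

def toy_encoder(tokens: List[str]) -> str:
    """Hypothetical stand-in for a transformer encoder: just brackets the sequence."""
    return "(" + " ".join(tokens) + ")"

# "ip"-style query: r3(AND(r1(e1), r2(e2)))
query = Node("r3", [Node("AND", [Node("r1", [Node("e1")]), Node("r2", [Node("e2")])])])
print(encode(query, toy_encoder))
```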

arXiv:2406.09557 [pdf, other]

Measure This, Not That: Optimizing the Cost and Model-Based Information Content of Measurements

Authors: Jialu Wang, Zedong Peng, Ryan Hughes, Debangsu Bhattacharyya, David E. Bernal Neira, Alexander W. Dowling

Abstract: Model-based design of experiments (MBDoE) is a powerful framework for selecting and calibrating science-based mathematical models from data. This work extends popular MBDoE workflows by proposing a convex mixed-integer (non)linear programming (MINLP) problem to optimize the selection of measurements. The solver MindtPy is modified to support calculating the D-optimality objective and its gradient via an external package, SciPy, using the grey-box module in Pyomo. The new approach is demonstrated in two case studies: estimating highly correlated kinetics from a batch reactor and estimating transport parameters in a large-scale rotary packed bed for CO₂ capture. Both case studies show how examining the Pareto-optimal trade-offs between information content, measured by A- and D-optimality, and measurement budget offers practical guidance for selecting measurements for scientific experiments.

Submitted 13 June, 2024; originally announced June 2024.

MSC Class: 90C25; 90C11; 90C30; 90C90; 62K05
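The D-optimality objective in arXiv:2406.09557 is the log-determinant of the Fisher information matrix (FIM) assembled from the selected measurements. The sketch below replaces the paper's convex MINLP (solved with a modified MindtPy) with brute-force enumeration over a tiny, made-up sensitivity matrix, which is only practical for a handful of candidates; sweeping the budget traces the cost-versus-information trade-off. A-optimality is reported here as trace(FIM), one convention; others minimize the trace of the inverse FIM instead.

```python
import itertools
import numpy as np

def fim(sens, sigma, chosen):
    """Fisher information matrix sum_i (1/sigma_i^2) j_i j_i^T over the chosen measurements."""
    idx = list(chosen)
    rows = sens[idx]
    weights = 1.0 / sigma[idx] ** 2
    return (rows * weights[:, None]).T @ rows

def best_design(sens, sigma, cost, budget):
    """Exhaustively find the subset maximising log det(FIM) (D-optimality) within the budget."""
    best = (-np.inf, ())
    for size in range(1, len(sens) + 1):
        for chosen in itertools.combinations(range(len(sens)), size):
            if cost[list(chosen)].sum() > budget:
                continue
            sign, logdet = np.linalg.slogdet(fim(sens, sigma, chosen))
            if sign > 0 and logdet > best[0]:   # skip singular (non-identifiable) designs
                best = (logdet, chosen)
    return best

# Hypothetical sensitivities dy_i/dtheta of 4 candidate measurements w.r.t. 2 parameters.
sens = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]])
sigma = np.ones(4)   # measurement standard deviations
cost = np.ones(4)    # cost of each measurement

for budget in (1, 2, 3, 4):
    logdet, chosen = best_design(sens, sigma, cost, budget)
    a_value = np.trace(fim(sens, sigma, chosen)) if chosen else 0.0
    print(budget, chosen, logdet, a_value)
```

With two unknown parameters, a single measurement gives a singular FIM, so the smallest budget yields no identifiable design; each added unit of budget then buys a diminishing increase in log det(FIM).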

arXiv:2406.09386 [pdf, other]

SimGen: Simulator-conditioned Driving Scene Generation

Authors: Yunsong Zhou, Michael Simon, Zhenghao Peng, Sicheng Mo, Hongzi Zhu, Minyi Guo, Bolei Zhou

Abstract: Controllable synthetic data generation can substantially lower the annotation cost of training data in autonomous driving research and development. Prior works use diffusion models to generate driving images conditioned on the 3D object layout. However, those models are trained on small-scale datasets like nuScenes, which lack appearance and layout diversity. Moreover, the trained models can only generate images based on the real-world layout data from the validation set of the same dataset, where overfitting might happen. In this work, we introduce a simulator-conditioned scene generation framework called SimGen that can learn to generate diverse driving scenes by mixing data from the simulator and the real world. It uses a novel cascade diffusion pipeline to address challenging sim-to-real gaps and multi-condition conflicts. To enhance the generative diversity of SimGen, we collect a driving video dataset, DIVA, which contains over 147.5 hours of real-world driving videos from 73 locations worldwide and simulated driving data from the MetaDrive simulator. SimGen achieves superior generation quality and diversity while preserving controllability based on the text prompt and the layout pulled from a simulator. We further demonstrate the improvements brought by SimGen for synthetic data augmentation on BEV detection and segmentation tasks and showcase its capability in safety-critical data generation. Code, data, and models will be made available.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.08756 [pdf, other]

Optimizing Large Model Training through Overlapped Activation Recomputation

Authors: Ping Chen, Wenjie Zhang, Shuibing He, Yingjie Gu, Zhuwei Peng, Kexin Huang, Xuan Zhan, Weijian Chen, Yi Zheng, Zhefeng Wang, Yanlong Yin, Gang Chen

Abstract: Large model training uses recomputation to alleviate memory pressure and pipelining to exploit data, tensor, and device parallelism. Existing recomputation approaches may incur up to 40% overhead when training real-world models, e.g., the GPT model with 22B parameters, because they are executed on demand in the critical training path. In this paper, we design a new recomputation framework, Lynx, to reduce the overhead by overlapping recomputation with the communication occurring in training pipelines. It consists of an optimal scheduling algorithm (OPT) and a heuristic-based scheduling algorithm (HEU). OPT achieves a global optimum but suffers from a long search time. HEU is based on our observation that large DNN models contain identical structures, so the same scheduling policy can be applied to all of them. HEU achieves a local optimum but reduces the search time by 99% compared to OPT. Our comprehensive evaluation using GPT models with 1.3B-20B parameters shows that both OPT and HEU outperform state-of-the-art recomputation approaches (e.g., Megatron-LM and Checkmate) by 1.02-1.53x. HEU achieves performance similar to OPT with an average search time of 0.16 s.

Submitted 27 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: 13 pages

arXiv:2406.07539 [pdf, other]

BAKU: An Efficient Transformer for Multi-Task Policy Learning

Authors: Siddhant Haldar, Zhuoran Peng, Lerrel Pinto

Abstract: Training generalist agents capable of solving diverse tasks is challenging, often requiring large datasets of expert demonstrations. This is particularly problematic in robotics, where each data point requires physical execution of actions in the real world. Thus, there is a pressing need for architectures that can effectively leverage the available training data. In this work, we present BAKU, a simple transformer architecture that enables efficient learning of multi-task robot policies. BAKU builds upon recent advancements in offline imitation learning and meticulously combines observation trunks, action chunking, multi-sensory observations, and action heads to substantially improve upon prior work. Our experiments on 129 simulated tasks across LIBERO, the Meta-World suite, and the DeepMind Control suite exhibit an overall 18% absolute improvement over RT-1 and MT-ACT, with a 36% improvement on the harder LIBERO benchmark. On 30 real-world manipulation tasks, given an average of just 17 demonstrations per task, BAKU achieves a 91% success rate. Videos of the robot are best viewed at https://baku-robot.github.io/.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.06347 [pdf, other]

The 2024 release of the ExoMol database: molecular line lists for exoplanet and other hot atmospheres

Authors: Jonathan Tennyson, Sergei N. Yurchenko, Jingxin Zhang, Charles A. Bowesman, Ryan P. Brady, Jeanna Buldyreva, Katy L. Chubb, Robert R. Gamache, Maire N. Gorman, Elizabeth R. Guest, Christian Hill, Kyriaki Kefala, A. E. Lynas-Gray, Thomas M. Mellor, Laura K. McKemmish, Georgi B. Mitev, Irina I. Mizus, Alec Owens, Zhijian Peng, Armando N. Perri, Marco Pezzella, Oleg L. Polyansky, Qianwei Qu, Mikhail Semenov, Oleksiy Smola, et al. (5 additional authors not shown)

Abstract: The ExoMol database (www.exomol.com) provides molecular data for spectroscopic studies of hot atmospheres. These data are widely used to model atmospheres of exoplanets, cool stars and other astronomical objects, as well as in a variety of terrestrial applications. The 2024 data release reports the current status of the database, which contains recommended line lists for 91 molecules and 224 isotopologues, giving a total of almost 10$^{12}$ individual transitions. New features of the database include extensive "MARVELization" of line lists to allow them to be used for high-resolution studies, extension of several line lists to ultraviolet wavelengths, provision of photodissociation cross sections, and extended provision of broadening parameters. Some of the in-house data specifications have been rewritten in JSON and brought into conformity with other international standards. Data products, including specific heats, a database of lifetimes for plasma studies, and the ExoMolHR web app, which allows exclusively high-resolution data to be extracted, are discussed.

Submitted 10 June, 2024; originally announced June 2024.

Report number: JQSRT in press 2024

arXiv:2406.04035 [pdf, other]

STEMO: Early Spatio-temporal Forecasting with Multi-Objective Reinforcement Learning

Authors: Wei Shao, Yufan Kang, Ziyan Peng, Xiao Xiao, Lei Wang, Yuhui Yang, Flora D Salim

Abstract: Accuracy and timeliness are often conflicting goals in prediction tasks. Premature predictions may yield a higher rate of false alarms, whereas delaying predictions to gather more information can render them too late to be useful. In applications such as wildfires, crimes, and traffic jams, timely forecasting is vital for safeguarding human life and property. Consequently, finding a balance between accuracy and timeliness is crucial. In this paper, we propose an early spatio-temporal forecasting model based on multi-objective reinforcement learning that can either implement an optimal policy given a preference or infer the preference based on a small number of samples. The model addresses two primary challenges: 1) enhancing the accuracy of early forecasting and 2) providing the optimal policy for determining the most suitable prediction time for each area. Our method demonstrates superior performance on three large-scale real-world datasets, surpassing existing methods in early spatio-temporal forecasting tasks.

Submitted 18 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

Comments: Accepted paper in KDD 2024

arXiv:2406.03395 [pdf, other]

Can recent DESI BAO measurements accommodate a negative cosmological constant?

Authors: Hao Wang, Ze-Yu Peng, Yun-Song Piao

Abstract: The anti-de Sitter vacuum, which corresponds to a negative cosmological constant (CC), is theoretically important and well motivated. It is interesting to see whether current data allow the existence of a negative CC or not. In this paper, we perform an MCMC analysis of the $w_0w_a$CDM+CC model using recent DESI BAO measurements combined with Planck CMB and Pantheon Plus datasets. The results reveal that the best-fit energy density of the CC is $Ω_Λ\sim -0.3$ and that the fit to DESI is slightly improved, while $Ω_Λ=0$ is also consistent at $1σ$.

Submitted 5 June, 2024; originally announced June 2024.

Comments: 14 pages, 4 figures
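In a $w_0w_a$CDM+CC model as in arXiv:2406.03395, a Chevallier-Polarski-Linder (CPL) dark-energy component with $w(a)=w_0+w_a(1-a)$ coexists with a cosmological constant that may be negative. The expansion rate that enters BAO distances can be sketched as below, assuming spatial flatness and neglecting radiation; these simplifications and the function name are ours and may differ from the paper's exact setup.

```python
import math

def e_of_z(z, omega_m, omega_lambda, w0, wa):
    """Dimensionless H(z)/H0 for flat w0waCDM plus a separate cosmological constant.

    The CPL component fills the remaining budget, omega_de = 1 - omega_m - omega_lambda,
    so a negative omega_lambda is compensated by a larger omega_de. Radiation is neglected.
    """
    omega_de = 1.0 - omega_m - omega_lambda
    a = 1.0 / (1.0 + z)
    # CPL density evolution: rho_de(a)/rho_de0 = a^{-3(1+w0+wa)} exp(-3 wa (1 - a))
    de_scaling = a ** (-3.0 * (1.0 + w0 + wa)) * math.exp(-3.0 * wa * (1.0 - a))
    e2 = omega_m * (1.0 + z) ** 3 + omega_lambda + omega_de * de_scaling
    if e2 <= 0:
        raise ValueError("H(z)^2 <= 0 for these parameters")
    return math.sqrt(e2)

# Illustrative parameters: negative CC plus dynamical dark energy.
for z in (0.0, 0.5, 1.0, 2.0):
    print(z, e_of_z(z, omega_m=0.3, omega_lambda=-0.3, w0=-0.9, wa=-0.5))
```

For $w_0=-1$ and $w_a=0$ only the sum $Ω_Λ+Ω_{de}$ enters $H(z)$, so a negative CC can only be distinguished from zero when the additional dark-energy component is dynamical.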

arXiv:2405.20654 [pdf, other]

Passage-specific Prompt Tuning for Passage Reranking in Question Answering with Large Language Models

Authors: Xuyang Wu, Zhiyuan Peng, Krishna Sravanthi Rajanala Sai, Hsin-Tai Wu, Yi Fang

Abstract: Effective passage retrieval and reranking methods have been widely utilized to identify suitable candidates in open-domain question answering tasks. Recent studies have resorted to LLMs for reranking the retrieved passages by the log-likelihood of the question conditioned on each passage. Although these methods have demonstrated promising results, their performance is notably sensitive to the human-written prompt (or hard prompt), and fine-tuning LLMs can be computationally intensive and time-consuming. Furthermore, this approach makes limited use of question-passage relevance pairs and passage-specific knowledge to enhance the ranking capabilities of LLMs. In this paper, we propose passage-specific prompt tuning for reranking in open-domain question answering (PSPT): a parameter-efficient method that fine-tunes learnable passage-specific soft prompts, incorporating passage-specific knowledge from a limited set of question-passage relevance pairs. The method ranks retrieved passages based on the log-likelihood of the model generating the question conditioned on each passage and the learned soft prompt. We conducted extensive experiments utilizing the Llama-2-chat-7B model across three publicly available open-domain question answering datasets, and the results demonstrate the effectiveness of the proposed approach.

Submitted 20 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

Comments: Accepted at Gen-IR@SIGIR24

arXiv:2405.20589 [pdf, other]

Selective Knowledge Sharing for Personalized Federated Learning Under Capacity Heterogeneity

Authors: Zheng Wang, Zheng Wang, Zhaopeng Peng, Zihui Wang, Cheng Wang

Abstract: Federated Learning (FL) stands to gain significant advantages from collaboratively training capacity-heterogeneous models, enabling the utilization of private data and computing power from low-capacity devices. However, the focus on personalizing capacity-heterogeneous models based on client-specific data has been limited, resulting in suboptimal local model utility, particularly for low-capacity clients. The heterogeneity in both data and device capacity poses two key challenges for model personalization: 1) accurately retaining necessary knowledge embedded within reduced submodels for each client, and 2) effectively sharing knowledge through aggregating size-varying parameters. To this end, we introduce Pa3dFL, a novel framework designed to enhance local model performance by decoupling and selectively sharing knowledge among capacity-heterogeneous models. First, we decompose each layer of the model into general and personal parameters. Then, we maintain uniform sizes for the general parameters across clients and aggregate them through direct averaging. Subsequently, we employ a hyper-network to generate size-varying personal parameters for clients using learnable embeddings. Finally, we facilitate the implicit aggregation of personal parameters by aggregating client embeddings through a self-attention module. We conducted extensive experiments on three datasets to evaluate the effectiveness of Pa3dFL. Our findings indicate that Pa3dFL consistently outperforms baseline methods across various heterogeneity settings.
Moreover, Pa3dFL demonstrates competitive communication and computation efficiency compared to baseline approaches, highlighting its practicality and adaptability in adverse system conditions.
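The first two Pa3dFL steps (splitting each layer into general and personal parameters, then directly averaging the uniformly sized general parts) can be illustrated with a minimal sketch. It assumes each client's layer is a flat list of floats whose first `n_general` entries form the general part; that layout and the helper names are hypothetical, and the hyper-network and self-attention aggregation of personal parameters are not modeled.

```python
from typing import List, Tuple


def split_layer(params: List[float], n_general: int) -> Tuple[List[float], List[float]]:
    """Split one layer's flat parameters into (general, personal) parts.

    Hypothetical layout: the first n_general entries are the general
    parameters, which have the same size on every client.
    """
    return params[:n_general], params[n_general:]


def average_general(clients: List[List[float]], n_general: int) -> List[float]:
    """Directly average the uniformly sized general parameters across clients."""
    generals = [split_layer(p, n_general)[0] for p in clients]
    k = len(generals)
    return [sum(vals) / k for vals in zip(*generals)]


def apply_update(params: List[float], new_general: List[float]) -> List[float]:
    """Overwrite a client's general part with the aggregate; the personal part stays local."""
    return list(new_general) + params[len(new_general):]


# Three capacity-heterogeneous clients: same general size (2), different personal sizes.
clients = [[1.0, 2.0, 9.0], [3.0, 4.0, 8.0, 7.0], [5.0, 6.0]]
g = average_general(clients, n_general=2)
print(g)                            # [3.0, 4.0]
print(apply_update(clients[1], g))  # [3.0, 4.0, 8.0, 7.0]
```

Keeping the general part the same size on every client is what makes plain element-wise averaging possible; the size-varying personal parts are exactly what the abstract hands off to the hyper-network instead.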

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.18840 [pdf, other]

Parameter-efficient Fine-tuning in Hyperspherical Space for Open-vocabulary Semantic Segmentation

Authors: Zelin Peng, Zhengqin Xu, Zhilin Zeng, Yaoming Wang, Lingxi Xie, Qi Tian, Wei Shen

Abstract: Open-vocabulary semantic segmentation seeks to label each pixel in an image with arbitrary text descriptions. Vision-language foundation models, especially CLIP, have recently emerged as powerful tools for acquiring open-vocabulary capabilities. However, fine-tuning CLIP to equip it with pixel-level prediction ability often suffers from three issues: 1) high computational cost, 2) misalignment between the two inherent modalities of CLIP, and 3) degraded generalization ability on unseen categories. To address these issues, we propose H-CLIP, a symmetrical parameter-efficient fine-tuning (PEFT) strategy conducted in hyperspherical space for both CLIP modalities. Specifically, the PEFT strategy is achieved by a series of efficient block-diagonal learnable transformation matrices and a dual cross-relation communication module among all learnable matrices. Since the PEFT strategy is applied symmetrically to the two CLIP modalities, the misalignment between them is mitigated. Furthermore, to prevent the destruction of the generalization ability offered by the CLIP text encoder, we apply an additional constraint to PEFT on the CLIP text encoder according to the hyperspherical energy principle, i.e., minimizing hyperspherical energy during fine-tuning preserves the intrinsic structure of the original parameter space. Extensive evaluations across various benchmarks show that H-CLIP achieves new SOTA open-vocabulary semantic segmentation results while updating only approximately 4% of the total parameters of CLIP.
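To make "hyperspherical energy" concrete, the sketch below computes it for a set of weight vectors. It assumes the common definition from the minimum-hyperspherical-energy literature (sum over unordered pairs of inverse Euclidean distances between unit-normalized vectors, power s = 1); the exact form and normalization used in H-CLIP may differ.

```python
import math
from itertools import combinations
from typing import List, Sequence


def normalize(v: Sequence[float]) -> List[float]:
    """Project a vector onto the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def hyperspherical_energy(weights: List[Sequence[float]], eps: float = 1e-12) -> float:
    """Sum of inverse pairwise distances between unit-normalized weight vectors.

    Lower energy means the projected vectors are spread more evenly on the sphere.
    eps guards against division by zero for coincident directions.
    """
    units = [normalize(w) for w in weights]
    energy = 0.0
    for a, b in combinations(units, 2):
        energy += 1.0 / max(math.dist(a, b), eps)
    return energy


# Two antipodal directions are distance 2 apart on the unit circle.
print(hyperspherical_energy([[1.0, 0.0], [-3.0, 0.0]]))  # 0.5
```

Because only directions enter the formula, rescaling a weight vector leaves the energy unchanged; changes in the relative angles between neurons are what the measure tracks.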

Submitted 29 May, 2024; originally announced May 2024.

arXiv:2405.18775 [pdf, other]

Synchronization Scheme based on Pilot Sharing in Cell-Free Massive MIMO Systems

Authors: Qihao Peng, Hong Ren, Zhendong Peng, Cunhua Pan, Maged Elkashlan, Dongming Wang, Jiangzhou Wang, Xiaohu You

Abstract: This paper analyzes the impact of a pilot-sharing scheme on synchronization performance in a scenario where several slave access points (APs) with uncertain carrier frequency offsets (CFOs) and timing offsets (TOs) share a common pilot sequence. First, the Cramer-Rao bound (CRB) with pilot contamination is derived for pilot-pairing estimation. Furthermore, a maximum likelihood algorithm is presented to estimate the CFO and TO among the pairing APs. Then, to minimize the sum of CRBs, we devise a synchronization strategy based on a pilot-sharing scheme by jointly optimizing the cluster classification, synchronization overhead, and pilot-sharing scheme, while simultaneously considering the overhead and each AP's synchronization requirements. To solve this NP-hard problem, we simplify it into two sub-problems, namely the cluster classification problem and the pilot-sharing problem. To strike a balance between synchronization performance and overhead, we first classify the clusters by using the K-means algorithm and propose a criterion to find a good set of master APs. Then, the pilot-sharing scheme is obtained by using swap-matching operations. Simulation results validate the accuracy of our derivations and demonstrate the effectiveness of the proposed scheme over the benchmark schemes.
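The cluster-classification step relies on K-means. Below is a minimal sketch of plain Lloyd's-algorithm K-means over 2-D AP coordinates. Using AP positions as the clustering features, initializing with the first k points, and picking the AP nearest each centroid as a candidate master are all assumptions for illustration; the abstract does not spell out the paper's features or its master-AP criterion.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, iters: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm; initial centroids are the first k points (deterministic)."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each AP joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean position of its members.
        new_centroids: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:  # converged: assignments no longer move centroids
            break
        centroids = new_centroids
    return centroids, labels


def nearest_to_centroid(points: List[Point], centroids: List[Point], labels: List[int]) -> List[int]:
    """Hypothetical master-AP pick: the member AP closest to each cluster centroid."""
    masters = []
    for c, cen in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            masters.append(min(members, key=lambda i: math.dist(points[i], cen)))
    return masters


aps = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(aps, k=2)
print(labels)                                       # [0, 1, 0, 0, 1, 1]
print(nearest_to_centroid(aps, centroids, labels))  # [0, 1]
```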

Submitted 30 May, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

Comments: Submitted to IEEE Journal for pos

arXiv:2405.18291 [pdf, other]

FedSAC: Dynamic Submodel Allocation for Collaborative Fairness in Federated Learning

Authors: Zihui Wang, Zheng Wang, Lingjuan Lyu, Zhaopeng Peng, Zhicheng Yang, Chenglu Wen, Rongshan Yu, Cheng Wang, Xiaoliang Fan

Abstract: Collaborative fairness stands as an essential element in federated learning to encourage client participation by equitably distributing rewards based on individual contributions. Existing methods primarily focus on adjusting gradient allocations among clients to achieve collaborative fairness. However, they frequently overlook crucial factors such as maintaining consistency across local models and catering to the diverse requirements of high-contributing clients. This oversight inevitably decreases both fairness and model accuracy in practice. To address these issues, we propose FedSAC, a novel Federated learning framework with dynamic Submodel Allocation for Collaborative fairness, backed by a theoretical convergence guarantee. First, we present the concept of "bounded collaborative fairness (BCF)", which ensures fairness by tailoring rewards to individual clients based on their contributions. Second, to implement the BCF, we design a submodel allocation module with a theoretical guarantee of fairness. This module incentivizes high-contributing clients with high-performance submodels containing a diverse range of crucial neurons, thereby preserving consistency across local models. Third, we further develop a dynamic aggregation module to adaptively aggregate submodels, ensuring the equitable treatment of low-frequency neurons and consequently enhancing overall model accuracy. Extensive experiments conducted on three public benchmarks demonstrate that FedSAC outperforms all baseline methods in both fairness and model accuracy.
We see this work as a significant step towards incentivizing broader client participation in federated learning. The source code is available at https://github.com/wangzihuixmu/FedSAC.

Submitted 28 May, 2024; originally announced May 2024.

Comments: Accepted by KDD'24

arXiv:2405.17891 [pdf, other]

A Refined 3D Gaussian Representation for High-Quality Dynamic Scene Reconstruction

Authors: Bin Zhang, Bi Zeng, Zexin Peng

Abstract: In recent years, Neural Radiance Fields (NeRF) has revolutionized three-dimensional (3D) reconstruction with its implicit representation. Building upon NeRF, 3D Gaussian Splatting (3D-GS) has departed from the implicit representation of neural networks and instead directly represents scenes as point clouds with Gaussian-shaped distributions. This shift has notably elevated the rendering quality and speed of radiance fields, but it has inevitably led to a significant increase in memory usage. Additionally, effectively rendering dynamic scenes in 3D-GS has emerged as a pressing challenge. To address these concerns, this paper proposes a refined 3D Gaussian representation for high-quality dynamic scene reconstruction. First, we use a deformable multi-layer perceptron (MLP) network to capture the dynamic offset of Gaussian points, and we express the color features of points through hash encoding and a tiny MLP to reduce storage requirements. Subsequently, we introduce a learnable denoising mask coupled with a denoising loss to eliminate noise points from the scene, thereby further compressing the 3D Gaussian model. Finally, motion noise of points is mitigated through static constraints and motion consistency constraints. Experimental results demonstrate that our method surpasses existing approaches in rendering quality and speed while significantly reducing the memory usage associated with 3D-GS, making it highly suitable for various tasks such as novel view synthesis and dynamic mapping.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.17792 [pdf, other]

JUNO Sensitivity to Invisible Decay Modes of Neutrons

Authors: JUNO Collaboration, Angel Abusleme, Thomas Adam, Kai Adamowicz, Shakeel Ahmad, Rizwan Ahmed, Sebastiano Aiello, Fengpeng An, Qi An, Giuseppe Andronico, Nikolay Anfimov, Vito Antonelli, Tatiana Antoshkina, João Pedro Athayde Marcondes de André, Didier Auguste, Weidong Bai, Nikita Balashov, Wander Baldini, Andrea Barresi, Davide Basilico, Eric Baussan, Marco Bellato, Marco Beretta, Antonio Bergnoli, Daniel Bick , et al. (635 additional authors not shown)

Abstract: We explore the decay of bound neutrons into invisible particles (e.g., $n \rightarrow 3\nu$ or $nn \rightarrow 2\nu$) in the JUNO liquid scintillator detector. The invisible decay includes two decay modes: $n \rightarrow {\rm inv}$ and $nn \rightarrow {\rm inv}$. The invisible decays of $s$-shell neutrons in $^{12}{\rm C}$ will leave a highly excited residual nucleus. Subsequently, some de-excitation modes of the excited residual nuclei can produce a time- and space-correlated triple coincidence signal in the JUNO detector. Based on a full Monte Carlo simulation informed with the latest available data, we estimate all backgrounds, including inverse beta decay events of the reactor antineutrino $\bar{\nu}_e$, natural radioactivity, cosmogenic isotopes, and neutral current interactions of atmospheric neutrinos. Pulse shape discrimination and multivariate analysis techniques are employed to further suppress backgrounds. With two years of exposure, JUNO is expected to give an order of magnitude improvement compared to the current best limits. After 10 years of data taking, the JUNO expected sensitivities at a 90% confidence level are $\tau/B(n \rightarrow {\rm inv}) > 5.0 \times 10^{31}\,{\rm yr}$ and $\tau/B(nn \rightarrow {\rm inv}) > 1.4 \times 10^{32}\,{\rm yr}$.
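For readers unfamiliar with how such sensitivities are quoted: in a generic counting experiment, a partial-lifetime limit usually takes the form below, where $N$ is the number of candidate decaying units (neutrons, or neutron pairs for the $nn$ mode) in the fiducial volume, $\varepsilon$ the signal selection efficiency, $T$ the live time, and $S_{90}$ the 90% C.L. upper limit on the number of signal events. This is the textbook form, stated here as an assumption; the abstract does not describe the exact statistical treatment used by JUNO.

$$\frac{\tau}{B} > \frac{N \, \varepsilon \, T}{S_{90}}$$

The limit grows linearly with exposure $N T$ as long as background stays negligible, which is why background suppression (pulse shape discrimination, multivariate analysis) directly controls how small $S_{90}$ can be.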

Submitted 27 May, 2024; originally announced May 2024.

Comments: 28 pages, 7 figures, 4 tables

arXiv:2405.14794 [pdf, other]

doi 10.1145/3643834.3661581

RetAssist: Facilitating Vocabulary Learners with Generative Images in Story Retelling Practices

Authors: Qiaoyi Chen, Siyu Liu, Kaihui Huang, Xingbo Wang, Xiaojuan Ma, Junkai Zhu, Zhenhui Peng

Abstract: Reading and repeatedly retelling a short story is a common and effective approach to learning the meanings and usages of target words. However, learners often struggle with comprehending, recalling, and retelling the story contexts of these target words. Inspired by the Cognitive Theory of Multimedia Learning, we propose a computational workflow to generate relevant images paired with stories. Based on the workflow, we work with learners and teachers to iteratively design an interactive vocabulary learning system named RetAssist. It can generate sentence-level images of a story to facilitate the understanding and recall of the target words in story retelling practice. Our within-subjects study (N=24) shows that, compared to a baseline system without generative images, RetAssist significantly improves learners' fluency in expressing themselves with the target words. Participants also feel that RetAssist eases their learning workload and is more useful. We discuss insights into leveraging text-to-image generative models to support learning tasks.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.14588 [pdf, other]

A Study of the Spectral properties of Gamma-Ray Bursts with the Precursors and Main bursts

Authors: Hui-Ying Deng, Zhao-Yang Peng, Jia-Ming Chen, Yue Yin, Ting Li

Abstract: There is no consensus yet on whether the precursor and the main burst of gamma-ray bursts (GRBs) have the same origin, and their jet composition is still unclear. To further investigate this issue, we systematically search for Fermi GRBs with both a precursor and a main burst and select 21 of them for spectral analysis. We first perform Bayesian time-resolved spectral analysis and find that almost all the precursors and the main bursts (94.4$\%$) exhibit thermal components, and the vast majority of them (72.2$\%$) have a low-energy spectral index ($α$) that exceeds the limit of synchrotron radiation. We then analyse the evolution and correlation of the spectral parameters and find that the $α$ of approximately half (50$\%$) of the precursors and the main bursts evolve in a similar pattern, while their peak energies ($E_{p}$) behave similarly in 55.6$\%$ of cases, and their evolution is mainly characterized by flux tracking; for the $α-F$ (the flux) relation, more than half of the precursors and the main bursts (61.1$\%$) exhibit roughly similar patterns; the $E_{p}-F$ relation in both the precursor and main burst (100$\%$) exhibits a positive correlation of at least moderate strength. Next, we constrain the outflow properties of the precursors and the main bursts and find that most of them exhibit typical properties of photospheric radiation. Finally, we compare the time-integrated spectra of the precursors and the main bursts and find that nearly all of them are located in similar regions of the Amati relation and follow the Yonetoku relation.
Therefore, we conclude that main bursts are continuations of precursors and that they may share a common physical origin.

Submitted 23 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

Comments: 36 pages, 13 figures. Accepted for publication in ApJ

arXiv:2405.09053 [pdf, ps, other]

Deep Learning-Based CSI Feedback for XL-MIMO Systems in the Near-Field Domain

Authors: Zhangjie Peng, Ruijing Liu, Zhaotian Li, Cunhua Pan, Jiangzhou Wang

Abstract: In this paper, we consider an extremely large-scale massive multiple-input-multiple-output (XL-MIMO) system. As the scale of antenna arrays increases, the range of near-field communications also expands. In this case, the signals in the near-field channel no longer exhibit planar wave characteristics but spherical wave characteristics, which makes the channel state information (CSI) highly complex. Additionally, the increasing scale of the antenna arrays significantly increases the size of the CSI matrix. Therefore, CSI feedback in the near-field channel becomes highly challenging. To solve this issue, we propose a deep-learning (DL)-based ExtendNLNet that compresses the CSI and further reduces the overhead of CSI feedback. In addition, we introduce the Non-Local block to capture CSI features over a larger area. Simulation results show that the proposed ExtendNLNet can significantly improve the CSI recovery quality compared to other DL-based methods.

Submitted 22 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

arXiv:2405.03300 [pdf, other]

Active RIS-Aided Massive MIMO With Imperfect CSI and Phase Noise

Authors: Zhangjie Peng, Jianchen Zhu, Cunhua Pan, Zaichen Zhang, Daniel Benevides da Costa, Maged Elkashlan, George K. Karagiannidis

Abstract: Active reconfigurable intelligent surface (RIS) has attracted significant attention as a recently proposed RIS architecture. Owing to its capability to amplify the incident signals, active RIS can mitigate the multiplicative fading effect inherent in the passive RIS-aided system. In this paper, we consider an active RIS-aided uplink multi-user massive multiple-input multiple-output (MIMO) system in the presence of phase noise at the active RIS. Specifically, we employ a two-timescale scheme, where the beamforming at the base station (BS) is adjusted based on the instantaneous aggregated channel state information (CSI) and the statistical CSI serves as the basis for designing the phase shifts at the active RIS, so that the feedback overhead and computational complexity can be significantly reduced. The aggregated channel composed of the cascaded and direct channels is estimated by utilizing the linear minimum mean square error (LMMSE) technique. Based on the estimated channel, we derive the analytical closed-form expression of a lower bound of the achievable rate. The power scaling laws in the active RIS-aided system are investigated based on the theoretical expressions. When the transmit power of each user is scaled down by the number of BS antennas M or reflecting elements N, we find that the thermal noise will cause the lower bound of the achievable rate to approach zero as M or N increases to infinity. Moreover, an optimization approach based on genetic algorithms (GA) is introduced to tackle the phase shift optimization problem.
Numerical results reveal that the active RIS can greatly enhance the performance of the considered system under various settings.

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2405.02973 [pdf, other]

FairRelay: Fair and Cost-Efficient Peer-to-Peer Content Delivery through Payment Channel Networks

Authors: Jingyu Liu, Yingjie Xue, Zifan Peng, Chao Lin, Xinyi Huang

Abstract: Peer-to-Peer (P2P) content delivery, known for scalability and resilience, offers a decentralized alternative to traditional centralized Content Delivery Networks (CDNs). A significant challenge in P2P content delivery remains: the fair compensation of relayers for their bandwidth contributions. Existing solutions employ blockchains for payment settlements; however, they are not practical due to high on-chain costs and over-simplified network assumptions. In this paper, we introduce FairRelay, a fair and cost-efficient protocol that ensures all participants get a fair payoff in complex content delivery network settings. We introduce a novel primitive, Enforceable Accumulative Hashed TimeLock Contract (Enforceable A-HTLC), designed to guarantee payment atomicity, ensuring all participants receive their payments upon successful content delivery. The fairness of FairRelay is proved using the Universal Composability (UC) framework. Our evaluation demonstrates that, in optimistic scenarios, FairRelay incurs zero on-chain costs. In pessimistic scenarios, the on-chain dispute costs for relayers and customers are constant, irrespective of the network complexity. Specifically, empirical results indicate that the on-chain dispute costs for relayers and customers are 24,902 gas (equivalent to 0.01 USD on Optimism L2) and 290,797 gas (0.07 USD), respectively. In a 10-hop relay path, FairRelay introduces less than 1.5% additional overhead compared to pure data transmission, showcasing the efficiency of FairRelay.
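As background for the A-HTLC primitive, here is a toy model of an ordinary hashed timelock contract, not the paper's Enforceable A-HTLC, whose accumulative and enforcement logic the abstract does not describe. The payee can claim the locked amount by revealing a SHA-256 preimage before a deadline; after the deadline the payer can reclaim it. The class name, fields, and integer "ticks" for time are hypothetical simplifications.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class ToyHTLC:
    """Toy hashed timelock contract; time is modeled as abstract integer ticks."""
    hashlock: bytes   # sha256(preimage), agreed off-chain
    timeout: int      # from this tick on, the payer may refund
    amount: int
    settled: bool = False

    def claim(self, preimage: bytes, now: int) -> bool:
        """Payee claims by revealing a matching preimage before the timeout."""
        if self.settled or now >= self.timeout:
            return False
        if hashlib.sha256(preimage).digest() != self.hashlock:
            return False
        self.settled = True
        return True

    def refund(self, now: int) -> bool:
        """Payer reclaims the locked amount once the timeout has passed."""
        if self.settled or now < self.timeout:
            return False
        self.settled = True
        return True


secret = b"content-key"
htlc = ToyHTLC(hashlib.sha256(secret).digest(), timeout=100, amount=5)
print(htlc.claim(b"wrong", now=10))  # False
print(htlc.claim(secret, now=10))    # True
print(htlc.refund(now=200))          # False (already settled)
```

In multi-hop payment-channel routing, reusing one hashlock along the path (with decreasing timeouts toward the payee) is what lets revealing a single preimage settle every hop together, which is the kind of atomicity the abstract attributes to its primitive.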

Submitted 5 May, 2024; originally announced May 2024.

Comments: 27 pages, 21 figures

arXiv:2405.02187 [pdf, other]

doi 10.1145/3658233

X-SLAM: Scalable Dense SLAM for Task-aware Optimization using CSFD

Authors: Zhexi Peng, Yin Yang, Tianjia Shao, Chenfanfu Jiang, Kun Zhou

Abstract: We present X-SLAM, a real-time dense differentiable SLAM system that leverages the complex-step finite difference (CSFD) method for efficient calculation of numerical derivatives, bypassing the need for a large-scale computational graph. The key to our approach is treating the SLAM process as a differentiable function, enabling the calculation of the derivatives of important SLAM parameters through Taylor series expansion within the complex domain. Our system allows for the real-time calculation of not just the gradient, but also higher-order differentiation. This facilitates the use of high-order optimizers to achieve better accuracy and faster convergence. Building on X-SLAM, we implemented end-to-end optimization frameworks for two important tasks: camera relocalization in wide outdoor scenes and active robotic scanning in complex indoor environments. Comprehensive evaluations on public benchmarks and intricate real scenes underscore the improvements in the accuracy of camera relocalization and the efficiency of robotic navigation achieved through our task-aware optimization. The code and data are available at https://gapszju.github.io/X-SLAM.
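The complex-step finite difference idea can be shown on a scalar function. For a real-analytic $f$, the Taylor expansion $f(x + ih) = f(x) + ih f'(x) - h^2 f''(x)/2 + \dots$ gives $f'(x) \approx \mathrm{Im}\,f(x + ih)/h$ with no subtractive cancellation, so $h$ can be made tiny. The sketch below uses a toy function, not X-SLAM's SLAM pipeline, and checks the result against the analytic derivative.

```python
import cmath
import math


def f(z: complex) -> complex:
    # Toy analytic function; the cmath versions accept complex arguments.
    return cmath.exp(z) * cmath.sin(z)


def csfd_derivative(func, x: float, h: float = 1e-20) -> float:
    """First derivative via the complex step: Im(f(x + ih)) / h."""
    return func(complex(x, h)).imag / h


x = 0.7
exact = math.exp(x) * (math.sin(x) + math.cos(x))  # d/dx [e^x sin x]
approx = csfd_derivative(f, x)
print(abs(approx - exact) < 1e-12)  # True
```

With an ordinary forward difference, a step of 1e-20 would give zero because `x + h == x` in double precision; the complex step avoids this by carrying the perturbation in the imaginary part. This sketch covers only first derivatives; higher-order derivatives need extensions such as multicomplex steps.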

Submitted 3 May, 2024; originally announced May 2024.

Comments: To be published in ACM SIGGRAPH 2024

arXiv:2404.19706 [pdf, other]

doi 10.1145/3658233

RTG-SLAM: Real-time 3D Reconstruction at Scale using Gaussian Splatting

Authors: Zhexi Peng, Tianjia Shao, Yong Liu, Jingke Zhou, Yin Yang, Jingdong Wang, Kun Zhou

Abstract: We present Real-time Gaussian SLAM (RTG-SLAM), a real-time 3D reconstruction system with an RGBD camera for large-scale environments using Gaussian splatting. The system features a compact Gaussian representation and a highly efficient on-the-fly Gaussian optimization scheme. We force each Gaussian to be either opaque or nearly transparent, with the opaque ones fitting the surface and dominant colors, and the transparent ones fitting residual colors. By rendering depth in a different way from color, we let a single opaque Gaussian fit a local surface region well without the need for multiple overlapping Gaussians, hence largely reducing the memory and computation cost. For on-the-fly Gaussian optimization, we explicitly add Gaussians for three types of pixels per frame: newly observed pixels, pixels with large color errors, and pixels with large depth errors. We also categorize all Gaussians into stable and unstable ones, where stable Gaussians are expected to fit previously observed RGBD images well and all others are unstable. We only optimize the unstable Gaussians and only render the pixels occupied by unstable Gaussians. In this way, both the number of Gaussians to be optimized and the number of pixels to be rendered are largely reduced, and the optimization can be done in real time. We show real-time reconstructions of a variety of large scenes.
Compared with the state-of-the-art NeRF-based RGBD SLAM, our system achieves comparably high-quality reconstruction but with around twice the speed and half the memory cost, and shows superior performance in the realism of novel view synthesis and camera tracking accuracy.
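The per-frame rule for adding Gaussians (newly observed pixels, pixels with large color errors, and pixels with large depth errors) can be sketched as a selection over per-pixel statistics. Everything concrete here is hypothetical: the thresholds, treating low accumulated opacity as "newly observed", and the function name. The abstract gives none of these details.

```python
from typing import List, Sequence


def pixels_needing_gaussians(
    rendered_alpha: Sequence[float],  # accumulated opacity rendered at each pixel
    color_err: Sequence[float],       # per-pixel |rendered - observed| color error
    depth_err: Sequence[float],       # per-pixel |rendered - observed| depth error
    alpha_min: float = 0.5,           # hypothetical threshold
    color_thr: float = 0.1,           # hypothetical threshold
    depth_thr: float = 0.05,          # hypothetical threshold (metres)
) -> List[int]:
    """Return indices of pixels where new Gaussians would be added this frame."""
    selected = []
    for i, (a, ce, de) in enumerate(zip(rendered_alpha, color_err, depth_err)):
        newly_observed = a < alpha_min  # not yet covered by existing Gaussians
        if newly_observed or ce > color_thr or de > depth_thr:
            selected.append(i)
    return selected


print(pixels_needing_gaussians([0.0, 0.9, 0.9, 0.9],
                               [0.0, 0.3, 0.0, 0.0],
                               [0.0, 0.0, 0.2, 0.01]))  # [0, 1, 2]
```

Restricting additions to these three pixel types keeps the Gaussian count growing only where the current model is missing or wrong, which fits the abstract's goal of limiting memory and per-frame optimization cost.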

Submitted 8 May, 2024; v1 submitted 30 April, 2024; originally announced April 2024.

Comments: To be published in ACM SIGGRAPH 2024

arXiv:2404.18540 [pdf, other]

Tunable coupling of a quantum phononic resonator to a transmon qubit with flip-chip architecture

Authors: Xinhui Ruan, Li Li, Guihan Liang, Silu Zhao, Jia-heng Wang, Yizhou Bu, Bingjie Chen, Xiaohui Song, Xiang Li, He Zhang, Jinzhe Wang, Qianchuan Zhao, Kai Xu, Heng Fan, Yu-xi Liu, Jing Zhang, Zhihui Peng, Zhongcheng Xiang, Dongning Zheng

Abstract: A hybrid system with tunable coupling between phonons and qubits shows great potential for advancing quantum information processing. In this work, we demonstrate strong and tunable coupling between a surface acoustic wave (SAW) resonator and a transmon qubit based on a galvanic-contact flip-chip technique. The coupling strength, extracted from different vacuum Rabi oscillation frequencies, varies from $2\pi \times 7.0$ MHz to $-2\pi \times 20.6$ MHz. The phonon-induced ac Stark shift of the qubit at different coupling strengths is also shown. Our approach offers a good experimental platform for exploring quantum acoustics and hybrid systems.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.18213 [pdf, other]

S$^2$Mamba: A Spatial-spectral State Space Model for Hyperspectral Image Classification

Authors: Guanchun Wang, Xiangrong Zhang, Zelin Peng, Tianyang Zhang, Xiuping Jia, Licheng Jiao

Abstract: Land cover analysis using hyperspectral images (HSI) remains an open problem due to their low spatial resolution and complex spectral information. Recent studies are primarily dedicated to designing Transformer-based architectures for modeling spatial-spectral long-range dependencies, which is computationally expensive with quadratic complexity. The selective structured state space model (Mamba), which is efficient for modeling long-range dependencies with linear complexity, has recently shown promising progress. However, its potential in hyperspectral image processing, which requires handling numerous spectral bands, has not yet been explored. In this paper, we propose S$^2$Mamba, a spatial-spectral state space model for hyperspectral image classification, to excavate spatial-spectral contextual features, resulting in more efficient and accurate land cover analysis. In S$^2$Mamba, two selective structured state space models operating along different dimensions are designed for feature extraction, one spatial and the other spectral, along with a spatial-spectral mixture gate for optimal fusion. More specifically, S$^2$Mamba first captures spatial contextual relations by interacting each pixel with its adjacent pixels through a Patch Cross Scanning module and then explores semantic information from continuous spectral bands through a Bi-directional Spectral Scanning module.
Considering the distinct expertise of the two attributes in homogeneous and complicated texture scenes, we realize the Spatial-spectral Mixture Gate by a group of learnable matrices, allowing for the adaptive incorporation of representations learned across different dimensions. Extensive experiments conducted on HSI classification benchmarks demonstrate the superiority and prospect of S$^2$Mamba. The code will be available at: https://github.com/PURE-melo/S2Mamba.

Submitted 28 April, 2024; originally announced April 2024.

Comments: 13 pages, 9 figures

arXiv:2404.17913 [pdf, other]

A comparative analysis of two peculiar long Gamma-ray bursts: GRB 230307A and GRB 211211A

Authors: Zhao-Yang Peng, Jia-Ming Chen, Jirong Mao

Abstract: GRB 211211A is a peculiar long gamma-ray burst (GRB) with very high brightness and short-burst properties. Its full light curve consists of three emission episodes, i.e., a precursor, a main burst, and an extended emission. We find that the recently detected long-duration GRB 230307A also includes the same three emission episodes. Furthermore, the two bursts have similar redshifts, 0.076 and 0.065, respectively. We perform a detailed temporal and spectral analysis of the two GRBs to compare their temporal and spectral properties. Our analysis shows that the two bursts share great similarities for both the whole emission and the three corresponding emission phases, as follows: (1) they have near-zero spectral lag, (2) they have a very short minimum variability timescale (MVT), (3) they lie in the same regions of the MVT-$T_{90}$, Amati relation, and hardness-$T_{90}$ planes, (4) the three phases have quasi-thermal spectra, (5) both the peak energy and the low-energy index track the flux, (6) the time-resolved spectra are much wider than those of the blackbody predicted by theoretical models, (7) there are strong correlations between thermal flux and total flux, and the correlation coefficients as well as the slopes for the corresponding stages are very consistent, (8) the photospheric emission properties are very consistent. Other investigations and observations suggest that the two GRBs indeed belong to the class of short bursts with a compact star merger origin.
Therefore, we think that GRB 230307A and GRB 211211A are rare and similar GRBs and that photospheric radiation can explain their radiation mechanisms.

Submitted 27 April, 2024; originally announced April 2024.

Comments: Accepted for publication in The Astrophysical Journal. 22 pages, 20 figures, 4 tables
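Item (1) above, the near-zero spectral lag, is usually measured by cross-correlating light curves from two energy bands. The sketch below finds the bin shift that maximizes a discrete cross-correlation between a hard-band and a soft-band curve. The synthetic Gaussian pulses, the bin units, and the sign convention (positive means the soft band arrives later) are assumptions for illustration; real analyses typically also fit the peak of the correlation function and convert bins to seconds.

```python
import math

def gaussian_pulse(n_bins, center, width):
    """Synthetic light curve: one Gaussian pulse (counts per time bin)."""
    return [math.exp(-0.5 * ((i - center) / width) ** 2) for i in range(n_bins)]

def spectral_lag(hard, soft, max_lag):
    """Bin shift k maximizing sum_i hard[i] * soft[i + k].

    Positive k means the soft-band curve lags (arrives after) the hard band.
    Multiply k by the bin width to get the lag in seconds.
    """
    n = len(hard)
    best_k, best_ccf = 0, -math.inf
    for k in range(-max_lag, max_lag + 1):
        ccf = sum(hard[i] * soft[i + k] for i in range(n) if 0 <= i + k < n)
        if ccf > best_ccf:
            best_k, best_ccf = k, ccf
    return best_k

hard = gaussian_pulse(200, center=100, width=5)
soft_delayed = gaussian_pulse(200, center=104, width=5)
print(spectral_lag(hard, soft_delayed, max_lag=20))  # 4
print(spectral_lag(hard, hard, max_lag=20))          # 0, i.e. zero lag
```

A burst with "near-zero spectral lag" would give a best shift of zero (or within a bin of it) for its two bands, as in the second call.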

arXiv:2404.17579 [pdf, other]

Quantum Optimization for the Maximum Cut Problem on a Superconducting Quantum Computer

Authors: Maxime Dupont, Bhuvanesh Sundar, Bram Evert, David E. Bernal Neira, Zedong Peng, Stephen Jeffrey, Mark J. Hodson

Abstract: Achieving high-quality solutions faster than classical solvers on computationally hard problems is a challenge for quantum optimization to deliver utility. Using a superconducting quantum computer, we experimentally investigate the performance of a hybrid quantum-classical algorithm inspired by semidefinite programming approaches for solving the maximum cut problem on 3-regular graphs up to several thousand variables. We leverage the structure of the input problems to address sizes beyond what current quantum machines can naively handle. We attain an average performance of 99% over a random ensemble of thousands of problem instances. We benchmark the quantum solver against similarly high-performing classical heuristics, including the Gurobi optimizer, simulated annealing, and the Burer-Monteiro algorithm. A runtime analysis shows that the quantum solver on large-scale problems is competitive against Gurobi but short of others. We explore multiple leads to close the gap and discuss prospects for a practical quantum speedup.

Submitted 26 April, 2024; originally announced April 2024.

Comments: 8 pages, 3 figures (+ 32 pages, 23 figures)
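To make the objective concrete, the sketch below computes the cut value of a partition and the exact maximum cut of a tiny 3-regular graph by brute force, then forms a cut-to-optimum ratio. It assumes "performance" means such a ratio to the optimal (or best-known) cut; the paper may define it differently, and the brute force stands in only for small graphs, not for the paper's hybrid algorithm.

```python
from itertools import product

def cut_value(edges, assignment):
    """Number of edges whose endpoints lie on opposite sides of the cut."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])

def brute_force_max_cut(n_nodes, edges):
    """Exact max cut by enumerating all 2^n side assignments (tiny graphs only)."""
    return max(cut_value(edges, bits) for bits in product((0, 1), repeat=n_nodes))

# K4 is the smallest 3-regular graph: every vertex has degree 3.
k4_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
optimum = brute_force_max_cut(4, k4_edges)        # 4 (a 2-2 split)
candidate = (0, 1, 1, 1)                           # one vertex vs. the rest
ratio = cut_value(k4_edges, candidate) / optimum   # 3 / 4
print(optimum, ratio)  # 4 0.75
```

Enumeration costs 2^n evaluations, which is why instances with thousands of variables need heuristics or, as in the paper, problem-structure decompositions.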

arXiv:2404.17528 [pdf, other]

Geometry-aware Reconstruction and Fusion-refined Rendering for Generalizable Neural Radiance Fields

Authors: Tianqi Liu, Xinyi Ye, Min Shi, Zihao Huang, Zhiyu Pan, Zhan Peng, Zhiguo Cao

Abstract: Generalizable NeRF aims to synthesize novel views for unseen scenes. Common practices involve constructing variance-based cost volumes for geometry reconstruction and encoding 3D descriptors for decoding novel views. However, existing methods show limited generalization ability in challenging conditions due to inaccurate geometry and sub-optimal descriptors and decoding strategies. We address these issues point by point. First, we find that the variance-based cost volume exhibits failure patterns, because the features of pixels corresponding to the same point can be inconsistent across different views due to occlusions or reflections. We introduce an Adaptive Cost Aggregation (ACA) approach to amplify the contribution of consistent pixel pairs and suppress inconsistent ones. Unlike previous methods that solely fuse 2D features into descriptors, our approach introduces a Spatial-View Aggregator (SVA) to incorporate 3D context into descriptors through spatial and inter-view interaction. When decoding the descriptors, we observe that the two existing decoding strategies excel in different areas and are complementary. A Consistency-Aware Fusion (CAF) strategy is proposed to leverage the advantages of both. We incorporate ACA, SVA, and CAF into a coarse-to-fine framework, termed Geometry-aware Reconstruction and Fusion-refined Rendering (GeFu). GeFu attains state-of-the-art performance across multiple datasets. Code is available at https://github.com/TQTQliu/GeFu .

Submitted 26 April, 2024; originally announced April 2024.

Comments: Accepted by CVPR 2024. Project page: https://gefucvpr24.github.io
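The failure pattern described above can be seen in a few lines. The sketch computes the per-hypothesis cost used by variance-based cost volumes (variance of warped features across views, averaged over channels) and shows how one occluded view inflates it. The optional per-view weights are a hypothetical stand-in for an adaptive scheme, not the paper's ACA formulation.

```python
def variance_cost(view_features, weights=None):
    """Matching cost at one 3D point / depth hypothesis.

    view_features: equal-length feature vectors, one per source view.
    weights: optional per-view weights (hypothetical); None gives the plain
    variance of variance-based cost volumes. Returns the weighted variance
    across views, averaged over feature channels.
    """
    if weights is None:
        weights = [1.0] * len(view_features)
    total_w = sum(weights)
    n_channels = len(view_features[0])
    cost = 0.0
    for c in range(n_channels):
        mean = sum(w * f[c] for w, f in zip(weights, view_features)) / total_w
        cost += sum(w * (f[c] - mean) ** 2 for w, f in zip(weights, view_features)) / total_w
    return cost / n_channels

consistent = [[1.0, 2.0], [1.0, 2.0], [1.0, 2.0]]
occluded = [[1.0, 2.0], [1.0, 2.0], [5.0, -2.0]]   # third view sees an occluder
print(variance_cost(consistent))                    # 0.0
print(variance_cost(occluded))                      # about 3.56 (= 32/9)
print(variance_cost(occluded, [1.0, 1.0, 0.0]))     # 0.0 once the outlier is down-weighted
```

A correct depth hypothesis should have low cost, but a single inconsistent view makes it look as bad as a wrong one; down-weighting inconsistent views is the intuition behind amplifying consistent pixel pairs.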

arXiv:2404.16407 [pdf, other]

U2++ MoE: Scaling 4.7x parameters with minimal impact on RTF

Authors: Xingchen Song, Di Wu, Binbin Zhang, Dinghao Zhou, Zhendong Peng, Bo Dang, Fuping Pan, Chao Yang

Abstract: Scale has opened new frontiers in natural language processing, but at a high cost. In response, by learning to activate only a subset of parameters in training and inference, Mixture-of-Experts (MoE) models have been proposed as an energy-efficient path to even larger and more capable language models, and this shift towards a new generation of foundation models is gaining momentum, particularly within the field of Automatic Speech Recognition (ASR). Recent works that incorporate MoE into ASR models have complex designs, such as routing frames via a supplementary embedding network, improving the multilingual ability of the experts, and using dedicated auxiliary losses for either expert load balancing or specific language handling. We find that such delicate designs are not necessary: an embarrassingly simple substitution of MoE layers for all Feed-Forward Network (FFN) layers is competent for the ASR task. More specifically, we benchmark our proposed model on a large-scale inner-source dataset (160k hours), and the results show that we can scale our baseline Conformer (Dense-225M) to its MoE counterpart (MoE-1B) and achieve Dense-1B-level Word Error Rate (WER) while maintaining a Dense-225M-level Real Time Factor (RTF). Furthermore, by applying the Unified 2-pass framework with bidirectional attention decoders (U2++), we achieve both streaming and non-streaming decoding modes in a single MoE-based model, which we call U2++ MoE. We hope that our study can facilitate research on scaling speech foundation models without sacrificing deployment efficiency.

Submitted 25 April, 2024; originally announced April 2024.

ACM Class: I.2.7
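The reason an MoE model can grow its parameter count while keeping RTF close to the dense baseline is that only a few experts run per frame. The sketch below is a toy MoE layer that could replace one FFN: a linear router, softmax gating, top-k selection, and renormalized gates. The top-k softmax router and the toy experts are assumptions for illustration; the abstract does not specify the routing details.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def moe_layer(x, router, experts, k=1):
    """Toy MoE replacement for an FFN, using top-k gating.

    x: input vector for one frame; router: one weight row per expert
    (logit_e = router[e] . x); experts: callables mapping a vector to a vector.
    Only the k selected experts are evaluated, so per-frame compute grows with
    k while the parameter count grows with len(experts).
    Returns (output vector, indices of the experts that ran).
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    probs = softmax(logits)
    top = sorted(range(len(experts)), key=lambda e: probs[e], reverse=True)[:k]
    norm = sum(probs[e] for e in top)
    out = [0.0] * len(x)
    for e in top:
        gate = probs[e] / norm
        out = [o + gate * yi for o, yi in zip(out, experts[e](x))]
    return out, top

# Three hypothetical experts and a router that strongly prefers expert 0 here.
experts = [lambda v: [2 * vi for vi in v], lambda v: [-vi for vi in v], lambda v: [0.0 for _ in v]]
router = [[10.0, 0.0], [0.0, 0.0], [-10.0, 0.0]]
out, used = moe_layer([1.0, 0.5], router, experts, k=1)
print(out, used)  # [2.0, 1.0] [0]
```

With k fixed, adding experts increases capacity without increasing the number of expert evaluations per frame, which matches the trade-off the abstract reports between model size and RTF.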

arXiv:2404.13875 [pdf, ps, other]

Active RIS-Aided Massive MIMO Uplink Systems with Low-Resolution ADCs

Authors: Zhangjie Peng, Zecheng Lu, Xue Liu, Cunhua Pan, Jiangzhou Wang

Abstract: This letter considers an active reconfigurable intelligent surface (RIS)-aided multi-user uplink massive multiple-input multiple-output (MIMO) system with low-resolution analog-to-digital converters (ADCs). The letter derives a closed-form approximate expression for the sum achievable rate (AR), where maximum ratio combining (MRC) processing and low-resolution ADCs are applied at the base station. The system performance is analyzed, and a genetic algorithm (GA)-based method is proposed to optimize the RIS's phase shifts for enhancing the system performance. Numerical results verify the accuracy of the derivations and demonstrate that the active RIS has an evident performance gain over the passive RIS.

Submitted 22 April, 2024; originally announced April 2024.
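As a rough illustration of GA-based phase-shift optimization, the sketch below evolves RIS phase vectors to maximize a much simpler objective than the paper's: the single-user cascaded channel gain |sum_n h_n e^{j theta_n} g_n|^2. The channels, population size, and GA operators (truncation selection, uniform crossover, random-reset mutation, elitism) are all assumptions; the paper's fitness is its derived sum-AR expression with MRC and low-resolution ADCs, which is not reproduced here.

```python
import cmath
import math
import random

def channel_gain(phases, h, g):
    """Single-user stand-in objective: |sum_n h_n * exp(j*theta_n) * g_n|^2."""
    return abs(sum(hn * cmath.exp(1j * th) * gn for hn, th, gn in zip(h, phases, g))) ** 2

def ga_optimize(h, g, pop_size=30, generations=60, p_mut=0.1, seed=0):
    """Toy genetic algorithm over continuous RIS phase shifts in [0, 2*pi).

    Elitism carries the best individual over unchanged, so the best fitness
    never decreases. Returns (best_phases, best_fitness, initial_best_fitness).
    """
    rng = random.Random(seed)
    n = len(h)

    def fitness(p):
        return channel_gain(p, h, g)

    pop = [[rng.uniform(0, 2 * math.pi) for _ in range(n)] for _ in range(pop_size)]
    initial_best = max(fitness(p) for p in pop)
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]          # truncation selection
        next_pop = [ranked[0]]                     # elitism
        while len(next_pop) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]  # uniform crossover
            child = [rng.uniform(0, 2 * math.pi) if rng.random() < p_mut else c for c in child]
            next_pop.append(child)
        pop = next_pop
    best = max(pop, key=fitness)
    return best, fitness(best), initial_best

# Hypothetical Rayleigh-like channels for an 8-element RIS.
rng = random.Random(1)
h = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(8)]
g = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(8)]
best_phases, best_val, initial_val = ga_optimize(h, g)
# Upper bound, reached when every term is phase-aligned: (sum_n |h_n g_n|)^2.
upper = sum(abs(hn * gn) for hn, gn in zip(h, g)) ** 2
print(f"initial {initial_val:.3f} -> GA {best_val:.3f} (bound {upper:.3f})")
```

For this toy objective the optimum is known in closed form (theta_n = -arg(h_n g_n)), so a GA is only illustrative here; a population search becomes attractive when the objective, like a multi-user sum rate with quantization effects, has no such closed-form maximizer.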

arXiv:2404.12887 [pdf, other]

3D Multi-frame Fusion for Video Stabilization

Authors: Zhan Peng, Xinyi Ye, Weiyue Zhao, Tianqi Liu, Huiqiang Sun, Baopu Li, Zhiguo Cao

Abstract: In this paper, we present RStab, a novel framework for video stabilization that integrates 3D multi-frame fusion through volume rendering. Departing from conventional methods, we introduce a 3D multi-frame perspective to generate stabilized images, addressing the challenge of full-frame generation while preserving structure. The core of our RStab framework lies in Stabilized Rendering (SR), a volume rendering module that fuses multi-frame information in 3D space and extends beyond image fusion by incorporating feature fusion. Specifically, SR warps features and colors from multiple frames by projection and fuses them into descriptors to render the stabilized image. However, the precision of the warped information depends on the projection accuracy, which is significantly affected by dynamic regions. In response, we introduce the Adaptive Ray Range (ARR) module to integrate depth priors, adaptively defining the sampling range for the projection process. Additionally, we propose Color Correction (CC), which assists geometric constraints with optical flow for accurate color aggregation. Thanks to these three modules, RStab outperforms previous stabilizers in field of view (FOV), image quality, and video stability across various datasets.

Submitted 19 April, 2024; originally announced April 2024.

Comments: Accepted by CVPR 2024
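Since SR is described as a volume rendering module, the standard discrete volume-rendering quadrature is the natural reference point. The sketch below composites samples along one ray and adds a hypothetical helper that places samples in a window around a depth prior, loosely mirroring the idea of ARR restricting the sampling range. Neither function is the paper's SR or ARR; the window rule in particular is an assumption.

```python
import math

def render_ray(sigmas, colors, deltas):
    """Discrete volume rendering along one ray (NeRF-style quadrature).

    alpha_i = 1 - exp(-sigma_i * delta_i), T_i = prod_{j<i} (1 - alpha_j),
    color = sum_i T_i * alpha_i * c_i. Returns (rgb, per-sample weights).
    """
    transmittance = 1.0
    rgb = [0.0, 0.0, 0.0]
    weights = []
    for sigma, c, delta in zip(sigmas, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        w = transmittance * alpha
        weights.append(w)
        rgb = [acc + w * ci for acc, ci in zip(rgb, c)]
        transmittance *= 1.0 - alpha
    return rgb, weights

def samples_around_depth(depth_prior, half_range, n):
    """Hypothetical depth-prior-centred sampling: n (>= 2) evenly spaced depths
    in [depth_prior - half_range, depth_prior + half_range]."""
    step = 2 * half_range / (n - 1)
    return [depth_prior - half_range + i * step for i in range(n)]

# An empty sample followed by an opaque red sample: the ray renders red.
rgb, weights = render_ray([0.0, 1e3], [[0, 0, 1], [1, 0, 0]], [0.1, 0.1])
print(rgb, weights)
print(samples_around_depth(2.0, 0.5, 5))  # [1.5, 1.75, 2.0, 2.25, 2.5]
```

Concentrating samples near a trusted depth makes the weights peak at the surface, which is the kind of projection accuracy the abstract says dynamic regions otherwise degrade.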

arXiv:2404.11536 [pdf, other]

FedPFT: Federated Proxy Fine-Tuning of Foundation Models

Authors: Zhaopeng Peng, Xiaoliang Fan, Yufan Chen, Zheng Wang, Shirui Pan, Chenglu Wen, Ruisheng Zhang, Cheng Wang

Abstract: Adapting Foundation Models (FMs) for downstream tasks through Federated Learning (FL) has emerged as a promising strategy for protecting data privacy and valuable FMs. Existing methods fine-tune an FM by allocating a sub-FM to clients in FL, which, however, leads to suboptimal performance due to insufficient tuning and the inevitable accumulation of gradient errors. In this paper, we propose Federated Proxy Fine-Tuning (FedPFT), a novel method that enhances the adaptation of FMs to downstream tasks through FL with two key modules. First, the sub-FM construction module employs a layer-wise compression approach, facilitating comprehensive FM fine-tuning across all layers by emphasizing crucial neurons. Second, the sub-FM alignment module conducts a two-step distillation (layer-level and neuron-level) before and during FL fine-tuning, respectively, to reduce gradient error by accurately aligning the sub-FM with the FM under theoretical guarantees. Experimental results on seven commonly used datasets (four text and three vision) demonstrate the superiority of FedPFT.

Submitted 28 April, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

Comments: Accepted by IJCAI'24
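A layer-wise compression keeps a reduced version of every layer instead of dropping whole layers, so each layer of the full FM still has a counterpart to fine-tune. The sketch below keeps the highest-scoring neurons in each layer. The importance score (L2 norm of a neuron's weight row) is an assumed proxy for "crucial neurons"; the paper's actual criterion may differ.

```python
def compress_layer(weight_rows, keep_ratio):
    """Keep the most important neurons (weight rows) of one layer.

    Importance is the L2 norm of each row, an assumed proxy score; at least
    one neuron is always kept. Returns (kept indices in original order,
    compressed rows). A real implementation would also slice the next
    layer's input columns to match.
    """
    norms = [sum(w * w for w in row) ** 0.5 for row in weight_rows]
    n_keep = max(1, round(keep_ratio * len(weight_rows)))
    ranked = sorted(range(len(weight_rows)), key=lambda i: norms[i], reverse=True)
    kept = sorted(ranked[:n_keep])
    return kept, [weight_rows[i] for i in kept]

def compress_model(layers, keep_ratio):
    """Compress every layer, so the sub-model keeps one counterpart per layer."""
    return [compress_layer(layer, keep_ratio) for layer in layers]

layer = [[0.1, 0.0], [3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]   # row norms 0.1, 5, 1, 2
kept, rows = compress_layer(layer, keep_ratio=0.5)
print(kept)  # [1, 3]
```

Keeping every layer, rather than a subset of layers, is what allows fine-tuning signals to reach all depths of the model through the proxy.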

arXiv:2404.10264 [pdf, other]

Calibration of the Cryogenic Measurement System of a Resonant Haloscope Cavity

Authors: Dong He, Jie Fan, Xin Gao, Yu Gao, Nick Houston, Zhongqing Ji, Yirong Jin, Chuang Li, Jinmian Li, Tianjun Li, Shi-hang Liu, Jia-Shu Niu, Zhihui Peng, Liang Sun, Zheng Sun, Jia Wang, Puxian Wei, Lina Wu, Zhongchen Xiang, Qiaoli Yang, Chi Zhang, Wenxing Zhang, Xin Zhang, Dongning Zheng, Ruifeng Zheng, et al. (1 additional author not shown)

Abstract: Microwave resonant cavities have been used to search for possible interactions between light bosonic dark matter and the Standard Model photon. In this paper, we demonstrate the calibration of the cryogenic readout system of a 7.138 GHz copper cavity with a loaded quality factor $Q_l=10^4$, operated at 22 mK in a dilution refrigerator. Our readout system consists of High Electron Mobility Transistors as cryogenic amplifiers at 4 K, plus room-temperature amplifiers and a spectrum analyzer for signal power detection. We test the system with a superconducting two-level system as a single-photon source in the microwave frequency regime and report an overall system gain of 95.6 dB and an attenuation of -71.4 dB in the cavity's input channel. The effective noise temperature of the measurement system is 7.5 K.

Submitted 15 April, 2024; originally announced April 2024.

Comments: 7 pages, 5 figures, version to appear in CPC
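The reported numbers combine through simple decibel arithmetic. As a worked check, the sketch converts the 7.5 K effective noise temperature into a thermal noise power k_B T B (in dBm for a 1 Hz bandwidth) and refers it through the 95.6 dB system gain. Treating the gain as applying directly to this input-referred noise, and the -30 dBm source tone, are simplifying assumptions for illustration.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def db_to_ratio(db):
    """Convert a power ratio in dB to a linear ratio."""
    return 10 ** (db / 10)

def thermal_noise_dbm(temperature_k, bandwidth_hz):
    """Johnson noise power k_B * T * B expressed in dBm (1 mW reference)."""
    return 10 * math.log10(K_B * temperature_k * bandwidth_hz / 1e-3)

gain_db = 95.6       # overall system gain from the abstract
t_noise_k = 7.5      # effective noise temperature from the abstract
noise_in = thermal_noise_dbm(t_noise_k, 1.0)   # about -189.85 dBm in 1 Hz
noise_out = noise_in + gain_db                 # about -94.25 dBm/Hz at the analyzer
tone_at_cavity = -30.0 - 71.4                  # a hypothetical -30 dBm tone arrives at about -101.4 dBm
print(round(noise_in, 2), round(noise_out, 2), f"{db_to_ratio(gain_db):.3g}", tone_at_cavity)
```

Working in dB turns the chain of amplifiers and attenuators into additions, which is why the calibration is summarized as one gain figure and one attenuation figure.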

arXiv:2404.09487 [pdf, other]

The Shaping of Flying Qubits based on Quantum Optimal Control Theory

Authors: Xue Dong, Xi Cao, Wen-Long Li, Guofeng Zhang, Zhihui Peng, Re-Bing Wu

Abstract: The control of flying qubits carried by itinerant photons is ubiquitous in quantum communication networks. In addition to their logical states, the shape of flying qubits must also be tailored to match the remote receiver. In this paper, we introduce quantum optimal control theory into the design of flying-qubit shaping protocols. A gradient-based algorithm is proposed for the generation of arbitrary-shape flying qubits with general non-ideal emitters. Simulations show that, as a joint control with the traditionally used tunable coupler, coherent driving fields can be applied to the shaping when the coupling strength is fixed or limited. The optimized control protocols can effectively suppress unwanted level leakage and multi-photon radiation. The method provides a systematic approach to high-fidelity control of flying qubits using realistic quantum devices.

Submitted 15 April, 2024; originally announced April 2024.

Comments: 9 pages, 6 figures
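For context, the traditional tunable-coupler approach has a well-known analytic solution for an ideal emitter: if the excitation amplitude obeys dc/dt = -gamma(t) c / 2 and the emitted field is xi(t) = sqrt(gamma(t)) c(t), then gamma(t) = |xi(t)|^2 / (1 - integral of |xi|^2 up to t) reproduces a target shape. The sketch below checks this by forward simulation for a Gaussian target carrying 90% of the excitation. It is this textbook baseline only, not the paper's gradient-based algorithm for non-ideal emitters; the pulse shape, energy, and time grid are assumptions.

```python
import math

def target_amplitude(t, energy=0.9, sigma=1.0):
    """Real Gaussian photon amplitude xi(t) whose |xi|^2 integrates to `energy`."""
    density = math.exp(-t * t / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
    return math.sqrt(energy * density)

def emitted_energy(t, energy=0.9, sigma=1.0):
    """Integral of |xi|^2 from -inf to t (scaled Gaussian CDF)."""
    return energy * 0.5 * (1 + math.erf(t / (sigma * math.sqrt(2))))

def coupling_for_shape(t):
    """Ideal-emitter inverse: gamma(t) = |xi(t)|^2 / (1 - energy emitted so far)."""
    return target_amplitude(t) ** 2 / (1 - emitted_energy(t))

def simulate(t0=-5.0, t1=5.0, dt=1e-3):
    """Integrate dc/dt = -gamma c / 2 from c(t0) = 1 and compare the output
    field sqrt(gamma) * c with the target. Returns the worst deviation."""
    c, t, worst = 1.0, t0, 0.0
    while t < t1:
        worst = max(worst, abs(math.sqrt(coupling_for_shape(t)) * c - target_amplitude(t)))
        # exact exponential decay over one step, using gamma at the step midpoint
        c *= math.exp(-0.5 * coupling_for_shape(t + dt / 2) * dt)
        t += dt
    return worst

print(simulate())  # a tiny deviation: the analytic coupling reproduces the shape
```

With the full excitation (energy 1) the required coupling grows without bound in the pulse tail, which is where a fixed or limited coupler falls short and additional controls such as the coherent driving discussed in the abstract become relevant.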

arXiv:2404.08171 [pdf, ps, other]

The Rank-1 Completion Problem for Cubic Tensors

Authors: Jinling Zhou, Jiawang Nie, Zheng Peng, Guangming Zhou

Abstract: This paper studies the rank-$1$ tensor completion problem for cubic-order tensors. First, we show that this problem is equivalent to a special rank-$1$ matrix recovery problem. We propose both nuclear norm relaxation and moment relaxation methods for solving the resulting rank-$1$ matrix recovery problem. The nuclear norm relaxation sometimes yields a rank-$1$ tensor completion and sometimes does not. When it fails, we apply the moment hierarchy of semidefinite programming relaxations to solve the rank-$1$ matrix recovery problem. The moment hierarchy always yields a rank-$1$ tensor completion or detects its nonexistence. In particular, when the tensor is strongly rank-$1$ completable, we show that the problem is equivalent to a rank-$1$ matrix completion problem that can be solved by an iterative formula. Therefore, much larger problems can be solved efficiently for strongly rank-$1$ completable tensors. Numerical experiments demonstrate the efficiency of the proposed methods.

Submitted 11 April, 2024; originally announced April 2024.

Comments: 23 pages
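To illustrate why rank-1 matrix completion can be solved by iteration rather than optimization, the sketch uses the classical 2x2-minor rule: in a rank-1 matrix every 2x2 minor vanishes, so M[i][j] = M[i][l] * M[k][j] / M[k][l] whenever the other three entries are known and M[k][l] is nonzero. Repeating this until nothing changes fills every entry that the observation pattern determines. This is a generic propagation scheme, not necessarily the paper's iterative formula.

```python
from fractions import Fraction

def _infer(M, i, j):
    """Try the 2x2-minor rule M[i][j] = M[i][l] * M[k][j] / M[k][l]."""
    for k in range(len(M)):
        for l in range(len(M[0])):
            if (M[i][l] is not None and M[k][j] is not None
                    and M[k][l] is not None and M[k][l] != 0):
                return M[i][l] * M[k][j] / M[k][l]
    return None

def complete_rank1(M):
    """Fill None entries of a partially observed rank-1 matrix, iterating until
    no further entry can be inferred. Raises if the pattern leaves gaps."""
    M = [row[:] for row in M]
    changed = True
    while changed:
        changed = False
        for i, row in enumerate(M):
            for j, value in enumerate(row):
                if value is None:
                    inferred = _infer(M, i, j)
                    if inferred is not None:
                        M[i][j] = inferred
                        changed = True
    if any(v is None for row in M for v in row):
        raise ValueError("observed pattern does not determine every entry")
    return M

# Ground truth a b^T with exact rational arithmetic; three entries hidden.
a, b = [1, 2, 3], [2, 5, 7]
truth = [[Fraction(x * y) for y in b] for x in a]
hidden = {(1, 1), (1, 2), (2, 1)}
observed = [[None if (i, j) in hidden else truth[i][j] for j in range(3)] for i in range(3)]
print(complete_rank1(observed) == truth)  # True
```

Each pass costs only arithmetic on known entries, which is consistent with the abstract's point that strongly completable instances scale to much larger sizes than SDP-based relaxations.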

arXiv:2404.07827 [pdf, other]

iPREFER: An Intelligent Parameter Extractor based on Features for BSIM-CMG Models

Authors: Zhiliang Peng, Yicheng Wang, Zhengwu Yuan, Xingsheng Wang

Abstract: This paper introduces a parameter extraction method for BSIM-CMG compact models that integrates curve feature extraction and machine learning techniques. The method offers a promising way to bridge the gap between TCAD and compact models, contributing to the Design Technology Co-Optimization (DTCO) process. The key innovation is an automated IV and CV curve feature extractor, which streamlines the analysis of device IV and CV curves and improves the consistency and efficiency of data processing. Validation on 5-nm nanosheet devices shows low fitting errors of 0.42% for CV curves and 1.28% for IV curves. Furthermore, its adaptability to parameter variations, including those in Equivalent Oxide Thickness and Gate Length, supports its use in the TCAD-to-compact-model transition. This universal BSIM-CMG model parameter extractor promises to improve the DTCO process, offering efficient process optimization and accurate simulations for semiconductor device performance prediction.

Submitted 11 April, 2024; originally announced April 2024.

Comments: 6 pages
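The abstract does not list which curve features are extracted, so the sketch below uses one common textbook feature as a hypothetical example: threshold voltage by linear extrapolation at maximum transconductance, Vth = Vg - Id / gm at the max-gm point. It also shows one possible definition of a percentage fitting error (RMS of point-wise relative errors); the paper's error metric may be defined differently.

```python
def max_gm_threshold(vg, ids):
    """Threshold voltage by linear extrapolation at maximum transconductance.

    gm comes from central differences; the tangent at the max-gm point is
    extrapolated to Id = 0, giving Vth = Vg - Id / gm.
    """
    best_i, best_gm = None, float("-inf")
    for i in range(1, len(vg) - 1):
        gm = (ids[i + 1] - ids[i - 1]) / (vg[i + 1] - vg[i - 1])
        if gm > best_gm:
            best_i, best_gm = i, gm
    return vg[best_i] - ids[best_i] / best_gm

def relative_rms_error(measured, fitted):
    """RMS of point-wise relative errors in percent (one possible fit metric)."""
    terms = [((f - m) / m) ** 2 for m, f in zip(measured, fitted) if m != 0]
    return 100 * (sum(terms) / len(terms)) ** 0.5

# Hypothetical idealized transfer curve: zero below 0.3 V, linear above.
vg = [round(0.05 * i, 2) for i in range(21)]
ids = [max(0.0, 2e-4 * (v - 0.3)) for v in vg]
print(round(max_gm_threshold(vg, ids), 6))                   # 0.3
print(round(relative_rms_error([1, 2, 4], [1.01, 2, 4]), 3))  # 0.577
```

Features of this kind compress a full IV or CV sweep into a handful of numbers, which is what makes them convenient inputs for a learned mapping to compact-model parameters.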

arXiv:2404.07644 [pdf, other]

2DLIW-SLAM: 2D LiDAR-Inertial-Wheel Odometry with Real-Time Loop Closure

Authors: Bin Zhang, Zexin Peng, Bi Zeng, Junjie Lu

Abstract: Due to budgetary constraints, indoor navigation typically employs 2D LiDAR rather than 3D LiDAR. However, using 2D LiDAR in Simultaneous Localization And Mapping (SLAM) frequently runs into motion degeneracy, particularly in geometrically similar environments. To address this problem, this paper proposes a robust, accurate, multi-sensor-fused 2D LiDAR SLAM system specifically designed for indoor mobile robots. First, points and lines are extracted from the raw LiDAR data. Leveraging the characteristics of indoor environments, line-line constraints are established to complement the other sensor data, improving the overall robustness and precision of the system. A tightly coupled front end integrates data from the 2D LiDAR, IMU, and wheel odometry, enabling real-time state estimation. On this basis, a novel loop closure detection algorithm based on global feature point matching is proposed; it effectively mitigates errors accumulated in the front end and yields a globally consistent map. The experimental results indicate that our system fully meets real-time requirements. Compared to Cartographer, our system not only exhibits lower trajectory errors but also demonstrates stronger robustness, particularly in degenerate scenarios.

Submitted 23 April, 2024; v1 submitted 11 April, 2024; originally announced April 2024.

Comments: This paper has been accepted by Measurement Science and Technology: https://iopscience.iop.org/article/10.1088/1361-6501/ad3ea3/meta
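Line extraction from a 2D scan usually ends with fitting a line to a cluster of points. The sketch fits a total-least-squares line in normal form (x cos theta + y sin theta = rho) and computes a simple residual between two line observations. The residual form (wrapped angle difference and distance difference) is an assumed illustration of a line-line constraint; the paper's exact constraint may differ.

```python
import math

def fit_line(points):
    """Total-least-squares line through 2D points.

    Returns (theta, rho) for the normal form x*cos(theta) + y*sin(theta) = rho,
    with rho >= 0 and theta wrapped to [-pi, pi].
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points)
    syy = sum((p[1] - my) ** 2 for p in points)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points)
    phi = 0.5 * math.atan2(2 * sxy, sxx - syy)   # direction of largest spread
    theta = phi + math.pi / 2                     # the normal is perpendicular to it
    rho = mx * math.cos(theta) + my * math.sin(theta)
    if rho < 0:                                   # canonical form with rho >= 0
        theta, rho = theta + math.pi, -rho
    return math.remainder(theta, 2 * math.pi), rho

def line_residual(line_a, line_b):
    """Assumed residual between two observations of the same wall."""
    (ta, ra), (tb, rb) = line_a, line_b
    return math.remainder(ta - tb, 2 * math.pi), ra - rb

wall = fit_line([(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)])   # points on y = x + 1
print(wall)                       # about (3*pi/4, sqrt(2)/2), i.e. (2.356, 0.707)
print(line_residual(wall, wall))  # (0.0, 0.0)
```

Lines carry orientation information that individual points lack, which is why they help in corridors and other geometrically similar indoor scenes.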

arXiv:2404.06862 [pdf]

Electron acceleration and X-ray generation from near-critical-density carbon nanotube foams driven by moderately relativistic lasers

Authors: Zhuo Pan, Jianbo Liu, Pengjie Wang, Zhusong Mei, Zhengxuan Cao, Defeng Kong, Shirui Xu, Zhipeng Liu, Yulan Liang, Ziyang Peng, Tianqi Xu, Tan Song, Xun Chen, Qingfan Wu, Yujia Zhang, Qihang Han, Haoran Chen, Jiarui Zhao, Ying Gao, Shiyou Chen, Yanying Zhao, Xueqing Yan, Yinren Shou, Wenjun Ma

Abstract: Direct laser acceleration of electrons in near-critical-density (NCD) carbon nanotube foams (CNFs) has advantages for the high-efficiency generation of relativistic electrons and broadband X-rays. Here, we report the first simultaneous measurement of the spectra of laser-driven electrons and X-rays from CNFs at moderately relativistic intensities of around $5\times10^{19}$ W/cm$^2$. The density and thickness of the CNFs were scanned in the experiments, indicating an optimized electron temperature of 5.5 MeV and an X-ray critical energy of 5 keV. Two-dimensional (2D) particle-in-cell (PIC) simulations confirm that the electrons, with a temperature significantly higher than the ponderomotive scale, are directly accelerated by the laser along the NCD plasma channel, while the bright X-rays are emitted by these electrons through betatron radiation or Thomson backscattering inside the channel. The simultaneously generated electrons and X-rays, automatically synchronized with the femtosecond laser driver, are suitable for applications such as bi-modal radiography.

Submitted 10 April, 2024; originally announced April 2024.

Comments: arXiv admin note: text overlap with arXiv:2010.05702
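A worked estimate shows how far 5.5 MeV is above the ponderomotive scale at this intensity. The sketch uses the standard linear-polarization relation a0 = 0.855 * lambda[um] * sqrt(I / 1e18 W/cm^2) and the ponderomotive scaling T = m_e c^2 (sqrt(1 + a0^2 / 2) - 1). The 0.8 um laser wavelength is an assumption (the abstract does not state it).

```python
import math

M_E_C2_MEV = 0.51099895  # electron rest energy in MeV

def a0_linear(intensity_w_cm2, wavelength_um):
    """Normalized vector potential for linear polarization."""
    return 0.855 * wavelength_um * math.sqrt(intensity_w_cm2 / 1e18)

def ponderomotive_temperature_mev(a0):
    """Ponderomotive scaling: T = m_e c^2 * (sqrt(1 + a0^2 / 2) - 1)."""
    return M_E_C2_MEV * (math.sqrt(1 + a0 ** 2 / 2) - 1)

a0 = a0_linear(5e19, 0.8)                   # about 4.84 (0.8 um wavelength assumed)
t_pond = ponderomotive_temperature_mev(a0)  # about 1.31 MeV
print(f"a0 = {a0:.2f}, T_pond = {t_pond:.2f} MeV, 5.5 MeV / T_pond = {5.5 / t_pond:.1f}")
```

Under this assumption the measured 5.5 MeV temperature is roughly four times the ponderomotive estimate, consistent with the abstract's attribution to direct laser acceleration in the plasma channel.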

arXiv:2404.04522 [pdf, other]

Q-PEFT: Query-dependent Parameter Efficient Fine-tuning for Text Reranking with Large Language Models

Authors: Zhiyuan Peng, Xuyang Wu, Qifan Wang, Sravanthi Rajanala, Yi Fang

Abstract: Parameter Efficient Fine-Tuning (PEFT) methods have been extensively used in Large Language Models (LLMs) to improve downstream tasks without the cost of fine-tuning the whole LLM. Recent studies have shown how to effectively use PEFT for fine-tuning LLMs in ranking tasks with convincing performance, but there are some limitations, including the learned prompt being fixed for different documents, overfitting to specific tasks, and low adaptation ability. In this paper, we introduce a query-dependent parameter efficient fine-tuning (Q-PEFT) approach for text reranking that leaks information about the true queries to the LLMs, making the generation of the true queries from input documents much easier. Specifically, we use the query to extract the top-$k$ tokens from concatenated documents, serving as contextual clues. We further augment Q-PEFT by substituting the retrieval mechanism with a multi-head attention layer to achieve end-to-end training and cover all the tokens in the documents, guiding the LLMs to generate more document-specific synthetic queries and thereby further improving reranking performance. Extensive experiments on four public datasets demonstrate the effectiveness of our proposed approach.

Submitted 11 April, 2024; v1 submitted 6 April, 2024; originally announced April 2024.
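The retrieval step ("use the query to extract the top-k tokens from concatenated documents") can be sketched as scoring each document token against the query and keeping the k best. Here the embeddings are hypothetical precomputed toy vectors, cosine similarity is an assumed scoring function, and keeping the selected tokens in document order is an illustrative choice.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def top_k_tokens(query_vec, doc_tokens, token_vecs, k):
    """Return the k document tokens most similar to the query, in document order."""
    scores = [cosine(query_vec, v) for v in token_vecs]
    best = sorted(range(len(doc_tokens)), key=lambda i: scores[i], reverse=True)[:k]
    return [doc_tokens[i] for i in sorted(best)]

# Hypothetical 2-D embeddings for a query and the tokens of one document.
query = [1.0, 0.0]
tokens = ["the", "solar", "cell", "is", "efficient"]
vecs = [[0.0, 1.0], [1.0, 0.1], [0.9, 0.2], [0.0, 1.0], [0.5, 0.5]]
print(top_k_tokens(query, tokens, vecs, k=2))  # ['solar', 'cell']
```

A hard top-k selection like this is not differentiable with respect to which tokens are chosen, which is one motivation for replacing it with an attention layer that weights all tokens and trains end to end.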

arXiv:2404.02475 [pdf, other]

PromptRPA: Generating Robotic Process Automation on Smartphones from Textual Prompts

Authors: Tian Huang, Chun Yu, Weinan Shi, Zijian Peng, David Yang, Weiqi Sun, Yuanchun Shi

Abstract: Robotic Process Automation (RPA) offers a valuable solution for efficiently automating tasks on graphical user interfaces (GUIs) by emulating human interactions, without modifying existing code. However, its broader adoption is constrained by the need for expertise in both scripting languages and workflow design. To address this challenge, we present PromptRPA, a system designed to comprehend various task-related textual prompts (e.g., goals, procedures), thereby generating and performing corresponding RPA tasks. PromptRPA incorporates a suite of intelligent agents that mimic human cognitive functions, specializing in interpreting user intent, managing external information for RPA generation, and executing operations on smartphones. The agents can learn from user feedback and continuously improve their performance based on the accumulated knowledge. Experimental results indicated a performance jump from a 22.28% success rate in the baseline to 95.21% with PromptRPA, requiring an average of 1.66 user interventions for each new task. PromptRPA presents promising applications in fields such as tutorial creation, smart assistance, and customer service.

Submitted 3 April, 2024; originally announced April 2024.

Comments: 34 pages

arXiv:2404.00908 [pdf, other]

Dark photon constraints from a 7.139 GHz cavity haloscope experiment

Authors: Dong He, Jie Fan, Xin Gao, Yu Gao, Nick Houston, Zhongqing Ji, Yirong Jin, Chuang Li, Jinmian Li, Tianjun Li, Shi-hang Liu, Jia-Shu Niu, Zhihui Peng, Liang Sun, Zheng Sun, Jia Wang, Puxian Wei, Lina Wu, Zhongchen Xiang, Qiaoli Yang, Chi Zhang, Wenxing Zhang, Xin Zhang, Dongning Zheng, Ruifeng Zheng, et al. (1 additional author not shown)

Abstract: The dark photon is a promising candidate for dark matter, which comprises most of the matter in our observable Universe. Via kinetic mixing with the Standard Model, it can also be resonantly converted to photons in an electromagnetic cavity, offering novel experimental possibilities for the discovery and study of dark matter. We report the results of a pathfinder dark photon dark matter cavity search experiment performed at Hunan Normal University and the Institute of Physics, Chinese Academy of Sciences, representing the first stage of the APEX (Axion and dark Photon EXperiment) program. Finding no statistically significant excess, we place an upper limit on the kinetic mixing parameter of $|\chi|<3.7\times 10^{-13}$ around $m_A\simeq 29.5$ $\mu$eV at 90% confidence level. This result exceeds other constraints on dark photon dark matter in this frequency range by roughly an order of magnitude.

Submitted 1 April, 2024; originally announced April 2024.

Comments: 5 pages, 4 figures
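The quoted mass and the cavity frequency in the title are linked by m c^2 = h f, since a non-relativistic dark photon converts into a photon whose frequency matches its rest energy. The short conversion below checks that 7.139 GHz corresponds to about 29.5 micro-eV.

```python
H_EV_S = 4.135667696e-15  # Planck constant in eV*s

def mass_from_frequency_uev(freq_hz):
    """Dark photon rest energy m c^2 = h f, in micro-eV."""
    return H_EV_S * freq_hz * 1e6

def frequency_from_mass_ghz(mass_uev):
    """Inverse conversion: f = m c^2 / h, in GHz."""
    return mass_uev * 1e-6 / H_EV_S / 1e9

print(round(mass_from_frequency_uev(7.139e9), 2))  # 29.52
print(round(frequency_from_mass_ghz(29.5), 3))     # 7.133
```

So the $m_A\simeq 29.5$ $\mu$eV in the abstract and the 7.139 GHz in the title agree to the precision quoted.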

arXiv:2404.00106 [pdf]

Precise Control of Process Parameters for >23% Efficiency Perovskite Solar Cells in Ambient Air Using an Automated Device Acceleration Platform

Authors: Jiyun Zhang, Anastasia Barabash, Tian Du, Jianchang Wu, Vincent M. Le Corre, Yicheng Zhao, Shudi Qiu, Kaicheng Zhang, Frederik Schmitt, Zijian Peng, Jingjing Tian, Chaohui Li, Chao Liu, Thomas Heumueller, Larry Lüer, Jens A. Hauch, Christoph J. Brabec

Abstract: Achieving high-performance perovskite photovoltaics, especially in ambient air, relies heavily on optimizing process parameters. However, traditional manual methods often struggle to control the key variables effectively. This inherent challenge calls for a paradigm shift toward automated platforms capable of precise and reproducible experiments. Herein, we use a fully automated device acceleration platform (DAP) to optimize the process parameters for preparing full perovskite devices using a two-step method in ambient air. Eight process parameters that could significantly influence device performance are systematically optimized. Specifically, we examine the impact of the dispense speed of the organic ammonium halide, a parameter that is difficult to control manually, on both perovskite film and device performance. Through targeted design of experiments, we reveal that the dispense speed significantly affects device performance, primarily by adjusting the residual PbI2 content in the films. We find that moderate dispense speeds, e.g., 50 μL/s, lead to top-performing devices. Conversely, speeds that are too fast or too slow result in devices with poorer performance and lower reproducibility. The optimized parameter set enables us to establish a Standard Operating Procedure (SOP) for additive-free perovskite processing under ambient conditions, which yields devices with efficiencies surpassing 23%, satisfactory reproducibility, and state-of-the-art photo-thermal stability.
This research underscores the importance of understanding the causality of process parameters in enhancing perovskite photovoltaic performance. Furthermore, our study highlights the pivotal role of automated platforms in discovering innovative workflows and accelerating the development of high-performing perovskite photovoltaic technologies.

Submitted 29 March, 2024; originally announced April 2024.

arXiv:2403.16812 [pdf, other]

Towards Human-AI Deliberation: Design and Evaluation of LLM-Empowered Deliberative AI for AI-Assisted Decision-Making

Authors: Shuai Ma, Qiaoyi Chen, Xinru Wang, Chengbo Zheng, Zhenhui Peng, Ming Yin, Xiaojuan Ma

Abstract: In AI-assisted decision-making, humans often passively review the AI's suggestion and decide whether to accept or reject it as a whole. In this paradigm, humans have been found to rarely engage in analytical thinking and to have difficulty communicating the nuances of conflicting opinions to the AI when disagreements occur. To tackle this challenge, we propose Human-AI Deliberation, a novel framework to promote human reflection and discussion on conflicting human-AI opinions in decision-making. Based on theories of human deliberation, this framework engages humans and AI in dimension-level opinion elicitation, deliberative discussion, and decision updates. To empower AI with deliberative capabilities, we designed Deliberative AI, which leverages large language models (LLMs) as a bridge between humans and domain-specific models to enable flexible conversational interactions and faithful information provision. An exploratory evaluation on a graduate admissions task shows that Deliberative AI outperforms conventional explainable AI (XAI) assistants in improving humans' appropriate reliance and task performance. Based on a mixed-methods analysis of participant behavior, perception, user experience, and open-ended feedback, we draw implications for future AI-assisted decision tool design.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.16265 [pdf, other]

Connecting the Dots: Inferring Patent Phrase Similarity with Retrieved Phrase Graphs

Authors: Zhuoyi Peng, Yi Yang

Abstract: We study the patent phrase similarity inference task, which measures the semantic similarity between two patent phrases. As patent documents employ legal and highly technical language, existing semantic textual similarity methods that use localized contextual information do not perform satisfactorily in inferring patent phrase similarity. To address this, we introduce a graph-augmented approach to amplify the global contextual information of the patent phrases. For each patent phrase, we construct a phrase graph that links to its focal patents and a list of patents that are either cited by or cite these focal patents. The augmented phrase embedding is then derived from combining its localized contextual embedding with its global embedding within the phrase graph. We further propose a self-supervised learning objective that capitalizes on the retrieved topology to refine both the contextualized embedding and the graph parameters in an end-to-end manner. Experimental results from a unique patent phrase similarity dataset demonstrate that our approach significantly enhances the representation of patent phrases, resulting in marked improvements in similarity inference in a self-supervised fashion. Substantial improvements are also observed in the supervised setting, underscoring the potential benefits of leveraging retrieved phrase graph augmentation.

Submitted 24 March, 2024; originally announced March 2024.

Comments: Findings of NAACL 2024
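The graph-augmented idea described in the patent phrase similarity entry (link a phrase to its focal patents plus the patents they cite or are cited by, then combine the phrase's local embedding with a global embedding from that graph) can be illustrated with a minimal sketch. This is not the authors' code. It assumes precomputed embedding vectors and a hypothetical `citations` mapping from each patent to the set of patents it cites. It also replaces the paper's learned graph parameters and self-supervised objective with plain mean pooling over graph neighbors and a fixed, hypothetical mixing weight `alpha`. All function names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def build_phrase_graph(focal_patents, citations):
    """Collect the patents linked to a phrase: its focal patents, the patents
    they cite, and the patents that cite them.
    citations: hypothetical dict mapping patent id -> set of cited patent ids."""
    nodes = set(focal_patents)
    for p in focal_patents:
        nodes |= citations.get(p, set())  # patents cited by the focal patent
        for q, cited in citations.items():
            if p in cited:
                nodes.add(q)  # patents that cite the focal patent
    return nodes


def augmented_embedding(local_emb, graph_nodes, patent_emb, alpha=0.5):
    """Mix a phrase's local embedding with the mean embedding of its graph nodes.
    Mean pooling and a fixed alpha stand in for the paper's learned graph model."""
    dim = len(local_emb)
    vecs = [patent_emb[n] for n in graph_nodes if n in patent_emb]
    if vecs:
        global_emb = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    else:
        global_emb = [0.0] * dim  # no known neighbors: global part contributes nothing
    return [alpha * l + (1 - alpha) * g for l, g in zip(local_emb, global_emb)]


def phrase_similarity(emb_a, emb_b):
    """Score two phrases by the cosine similarity of their augmented embeddings."""
    return cosine(emb_a, emb_b)


if __name__ == "__main__":
    # Toy data: P1 cites P2, and P3 cites P1.
    citations = {"P1": {"P2"}, "P3": {"P1"}}
    patent_emb = {"P1": [1.0, 0.0], "P2": [0.0, 1.0], "P3": [1.0, 1.0]}
    graph = build_phrase_graph({"P1"}, citations)
    emb = augmented_embedding([1.0, 0.0], graph, patent_emb)
    print(sorted(graph), emb, phrase_similarity(emb, emb))
```

Mean pooling keeps the sketch dependency-free. In the setting the abstract describes, the global embedding would instead come from trainable graph parameters refined end to end together with the contextual encoder.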
