-
UrbanWorld: An Urban World Model for 3D City Generation
Authors:
Yu Shang,
Jiansheng Chen,
Hangyu Fan,
Jingtao Ding,
Jie Feng,
Yong Li
Abstract:
Cities, as the most fundamental environment of human life, encompass diverse physical elements such as buildings, roads and vegetation with complex interconnection. Crafting realistic, interactive 3D urban environments plays a crucial role in constructing AI agents capable of perceiving, decision-making, and acting like humans in real-world environments. However, creating high-fidelity 3D urban environments usually entails extensive manual labor from designers, involving intricate detailing and accurate representation of complex urban features. Therefore, how to accomplish this in an automatic way remains a longstanding challenge. Toward this problem, we propose UrbanWorld, the first generative urban world model that can automatically create a customized, realistic and interactive 3D urban world with flexible control conditions. UrbanWorld incorporates four key stages in the automatic crafting pipeline: 3D layout generation from openly accessible OSM data, urban scene planning and designing with a powerful urban multimodal large language model (Urban MLLM), controllable urban asset rendering with advanced 3D diffusion techniques, and finally the MLLM-assisted scene refinement. The crafted high-fidelity 3D urban environments enable realistic feedback and interactions for general AI and machine perceptual systems in simulations. We are working on contributing UrbanWorld as an open-source and versatile platform for evaluating and improving AI abilities in perception, decision-making, and interaction in realistic urban environments.
Submitted 16 July, 2024;
originally announced July 2024.
-
Vectoring Languages
Authors:
Joseph Chen
Abstract:
Recent breakthroughs in large language models (LLM) have stirred up global attention, and the research has been accelerating non-stop since then. Philosophers and psychologists have also been researching the structure of language for decades, but they are having a hard time finding a theory that directly benefits from the breakthroughs of LLMs. In this article, we propose a novel structure of language that reflects well on the mechanisms behind language models and go on to show that this structure is also better at capturing the diverse nature of language compared to previous methods. An analogy of linear algebra is adapted to strengthen the basis of this perspective. We further argue about the difference between this perspective and the design philosophy for current language models. Lastly, we discuss how this perspective can lead us to research directions that may accelerate the improvements of science fastest.
Submitted 16 July, 2024;
originally announced July 2024.
-
Measurement of the branching fraction of $D^+_s\to \ell^+ν_\ell$ via $e^+e^-\to D^{*+}_{s} D^{*-}_{s}$
Authors:
BESIII Collaboration,
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere
, et al. (634 additional authors not shown)
Abstract:
Based on $10.64~\mathrm{fb}^{-1}$ of $e^+e^-$ collision data taken at center-of-mass energies between 4.237 and 4.699 GeV with the BESIII detector, we study the leptonic $D^+_s$ decays using the $e^+e^-\to D^{*+}_{s} D^{*-}_{s}$ process. The branching fractions of $D_s^+\to\ell^+ν_{\ell}\,(\ell=μ,τ)$ are measured to be $\mathcal{B}(D_s^+\toμ^+ν_μ)=(\bfmuv)\%$ and $\mathcal{B}(D_s^+\toτ^+ν_τ)=(\bftauv)\%$, respectively. The product of the decay constant and Cabibbo-Kobayashi-Maskawa matrix element $|V_{cs}|$ is determined to be $f_{D_s^+}|V_{cs}|=(\mufdsxvcsresult)_{μν}~\mathrm{MeV}$ and $f_{D_s^+}|V_{cs}|=(\taufdsxvcsresult)_{τν}~\mathrm{MeV}$, respectively. Taking the value of $|V_{cs}|$ from a global fit in the Standard Model, we obtain ${f_{D^+_s}}=(\mufdsresult)_{μν}$\,MeV and ${f_{D^+_s}}=(\taufdsresult)_{τν}$\,MeV, respectively. Conversely, taking the value for $f_{D_s^+}$ from the latest lattice quantum chromodynamics calculation, we obtain $|V_{cs}| =(\muvcsresult)_{μν}$ and $|V_{cs}| = (\tauvcsresult)_{τν}$, respectively.
Submitted 16 July, 2024;
originally announced July 2024.
-
Fine-Tuning Medical Language Models for Enhanced Long-Contextual Understanding and Domain Expertise
Authors:
Qimin Yang,
Rongsheng Wang,
Jiexin Chen,
Runqi Su,
Tao Tan
Abstract:
Large Language Models (LLMs) have been widely applied in various professional fields. By fine-tuning the models using domain-specific question and answer datasets, the professional domain knowledge and Q&A abilities of these models have significantly improved; for example, medical professional LLMs fine-tuned on doctor-patient Q&A data exhibit extraordinary disease diagnostic abilities. However, we observed that despite improvements in specific domain knowledge, the performance of medical LLMs in long-context understanding has significantly declined, especially compared to general language models with similar parameters. The purpose of this study is to investigate this reduced long-context understanding in medical LLMs. We designed a series of experiments to conduct open-book professional knowledge exams on all models to evaluate their ability to read long context. By adjusting the proportion and quantity of general data and medical data in the process of fine-tuning, we can determine the best data composition to optimize the professional model and achieve a balance between long-context performance and specific domain knowledge.
Submitted 16 July, 2024;
originally announced July 2024.
-
Improved Belief Propagation Decoding on Surface Codes with High Accuracy and Low Latency
Authors:
Jiahan Chen,
Zhipeng Liang,
Zhengzhong Yi,
Xuan Wang
Abstract:
Quantum error correction is crucial for universal quantum computing, requiring highly accurate and low-latency decoding algorithms. Belief Propagation (BP) is notable for its linear time complexity and general applicability to quantum LDPC codes. However, BP performs poorly on highly degenerate codes without Order Statistic Decoding (OSD) post-processing, which significantly increases time complexity. We focus on improving BP's performance on surface codes. We first propose Momentum-BP and AdaGrad-BP, inspired by machine learning optimization techniques, to reduce the oscillation of message updating and break the symmetric trapping sets. We further propose EWAInit-BP, which adaptively updates initial probabilities and exhibits aggressive exploration capabilities. EWAInit-BP achieves the highest accuracy among BP improvements without OSD post-processing on planar surface code, toric code, and XZZX surface code, providing an improvement of 1 to 3 orders of magnitude compared to traditional BP, and demonstrating the error correction capability even under parallel scheduling. Its theoretical O(1) time complexity and high accuracy make it a promising candidate for high-precision real-time decoders.
Submitted 16 July, 2024;
originally announced July 2024.
-
Multi-Channel Masked Autoencoder and Comprehensive Evaluations for Reconstructing 12-Lead ECG from Arbitrary Single-Lead ECG
Authors:
Jiarong Chen,
Wanqing Wu,
Tong Liu,
Shenda Hong
Abstract:
In the context of cardiovascular diseases (CVD) that exhibit an elevated prevalence and mortality, the electrocardiogram (ECG) is a popular and standard diagnostic tool for doctors, commonly utilizing a 12-lead configuration in clinical practice. However, the 10 electrodes placed on the body surface cause considerable inconvenience and discomfort, while rapidly advancing wearable devices adopt reduced-lead or single-lead ECG to reduce discomfort in long-term monitoring. Since the single-lead ECG is a subset of the 12-lead ECG, it provides insufficient cardiac health information and plays a substandard role in real-world healthcare applications. Hence, it is necessary to utilize signal generation technologies to reduce their clinical importance gap by reconstructing 12-lead ECG from the real single-lead ECG. Specifically, this study proposes a multi-channel masked autoencoder (MCMA) for this goal. In the experimental results, visual comparison between the generated and real signals demonstrates the effectiveness of the proposed framework. At the same time, this study introduces a comprehensive evaluation benchmark named ECGGenEval, encompassing signal-level, feature-level, and diagnostic-level evaluations, providing a holistic assessment of 12-lead ECG signals and generative models. Quantitatively, the method achieves mean square errors of 0.0178 and 0.0658 and correlation coefficients of 0.7698 and 0.7237 in the signal-level evaluation, and average F1-scores of 0.8319 and 0.7824 with two generated 12-lead ECG in the diagnostic-level evaluation, achieving state-of-the-art performance. The open-source code is publicly available at https://github.com/CHENJIAR3/MCMA.
Submitted 16 July, 2024;
originally announced July 2024.
-
RIMformer: An End-to-End Transformer for FMCW Radar Interference Mitigation
Authors:
Ziang Zhang,
Guangzhi Chen,
Youlong Weng,
Shunchuan Yang,
Zhiyu Jia,
Jingxuan Chen
Abstract:
Frequency-modulated continuous-wave (FMCW) radar plays a pivotal role in the field of remote sensing. The increasing degree of FMCW radar deployment has increased the mutual interference, which weakens the detection capabilities of radars and threatens reliability and safety of systems. In this paper, a novel FMCW radar interference mitigation (RIM) method, termed as RIMformer, is proposed by using an end-to-end Transformer-based structure. In the RIMformer, a dual multi-head self-attention mechanism is proposed to capture the correlations among the distinct distance elements of intermediate frequency (IF) signals. Additionally, an improved convolutional block is integrated to harness the power of convolution for extracting local features. The architecture is designed to process time-domain IF signals in an end-to-end manner, thereby avoiding the need for additional manual data processing steps. The improved decoder structure ensures the parallelization of the network to increase its computational efficiency. Simulation and measurement experiments are carried out to validate the accuracy and effectiveness of the proposed method. The results show that the proposed RIMformer can effectively mitigate interference and restore the target signals.
Submitted 16 July, 2024;
originally announced July 2024.
-
Incremental high average-utility itemset mining: survey and challenges
Authors:
Jing Chen,
Shengyi Yang,
Weiping Ding,
Peng Li,
Aijun Liu,
Hongjun Zhang,
Tian Li
Abstract:
The High Average Utility Itemset Mining (HAUIM) technique, a variation of High Utility Itemset Mining (HUIM), uses the average utility of the itemsets. Historically, most HAUIM algorithms were designed for static databases. However, practical applications like market basket analysis and business decision-making necessitate regular updates of the database with new transactions. As a result, researchers have developed incremental HAUIM (iHAUIM) algorithms to identify HAUIs in a dynamically updated database. Contrary to conventional methods that begin from scratch, the iHAUIM algorithm facilitates incremental changes and outputs, thereby reducing the cost of discovery. This paper provides a comprehensive review of the state-of-the-art iHAUIM algorithms, analyzing their unique characteristics and advantages. First, we explain the concept of iHAUIM, providing formulas and real-world examples for a more in-depth understanding. Subsequently, we categorize and discuss the key technologies used by varying types of iHAUIM algorithms, encompassing Apriori-based, Tree-based, and Utility-list-based techniques. Moreover, we conduct a critical analysis of each mining method's advantages and disadvantages. In conclusion, we explore potential future directions, research opportunities, and various extensions of the iHAUIM algorithm.
Submitted 16 July, 2024;
originally announced July 2024.
-
States Hidden in Hidden States: LLMs Emerge Discrete State Representations Implicitly
Authors:
Junhao Chen,
Shengding Hu,
Zhiyuan Liu,
Maosong Sun
Abstract:
Large Language Models (LLMs) exhibit various emergent abilities. Among these abilities, some might reveal the internal working mechanisms of models. In this paper, we uncover a novel emergent capability in models: the intrinsic ability to perform extended sequences of calculations without relying on chain-of-thought step-by-step solutions. Remarkably, the most advanced models can directly output the results of two-digit number additions with lengths extending up to 15 addends. We hypothesize that the model emerges Implicit Discrete State Representations (IDSRs) within its hidden states and performs symbolic calculations internally. To test this hypothesis, we design a sequence of experiments that look into the hidden states. Specifically, we first confirm that IDSRs exist. Then, we provide interesting observations about the formation of IDSRs from layer, digit, and sequence perspectives. Finally, we confirm that models indeed use IDSRs to produce the final answers. However, we also discover that these state representations are far from lossless in current open-sourced models, leading to inaccuracies in their final performance. Our work presents a novel exploration of LLMs' symbolic calculation abilities and the underlying mechanisms.
Submitted 16 July, 2024;
originally announced July 2024.
-
Large Vision-Language Models as Emotion Recognizers in Context Awareness
Authors:
Yuxuan Lei,
Dingkang Yang,
Zhaoyu Chen,
Jiawei Chen,
Peng Zhai,
Lihua Zhang
Abstract:
Context-aware emotion recognition (CAER) is a complex and significant task that requires perceiving emotions from various contextual cues. Previous approaches primarily focus on designing sophisticated architectures to extract emotional cues from images. However, their knowledge is confined to specific training datasets and may reflect the subjective emotional biases of the annotators. Furthermore, acquiring large amounts of labeled data is often challenging in real-world applications. In this paper, we systematically explore the potential of leveraging Large Vision-Language Models (LVLMs) to empower the CAER task from three paradigms: 1) We fine-tune LVLMs on two CAER datasets, which is the most common way to transfer large models to downstream tasks. 2) We design zero-shot and few-shot patterns to evaluate the performance of LVLMs in scenarios with limited data or even completely unseen. In this case, a training-free framework is proposed to fully exploit the In-Context Learning (ICL) capabilities of LVLMs. Specifically, we develop an image similarity-based ranking algorithm to retrieve examples; subsequently, the instructions, retrieved examples, and the test example are combined to feed LVLMs to obtain the corresponding sentiment judgment. 3) To leverage the rich knowledge base of LVLMs, we incorporate Chain-of-Thought (CoT) into our framework to enhance the model's reasoning ability and provide interpretable results. Extensive experiments and analyses demonstrate that LVLMs achieve competitive performance in the CAER task across different paradigms. Notably, the superior performance in few-shot settings indicates the feasibility of LVLMs for accomplishing specific tasks without extensive training.
Submitted 15 July, 2024;
originally announced July 2024.
-
X-ray and multiwavelength polarization of Mrk 501 from 2022 to 2023
Authors:
Chien-Ting J. Chen,
Ioannis Liodakis,
Riccardo Middei,
Dawoon E. Kim,
Laura Di Gesu,
Alessandro Di Marco,
Steven R. Ehlert,
Manel Errando,
Michela Negro,
Svetlana G. Jorstad,
Alan P. Marscher,
Kinwah Wu,
Iván Agudo,
Juri Poutanen,
Tsunefumi Mizuno,
Pouya M. Kouch,
Elina Lindfors,
George A. Borman,
Tatiana S. Grishina,
Evgenia N. Kopatskaya,
Elena G. Larionova,
Daria A. Morozova,
Sergey S. Savchenko,
Ivan S. Troitsky,
Yulia V. Troitskaya
, et al. (121 additional authors not shown)
Abstract:
We present multiwavelength polarization measurements of the luminous blazar Mrk~501 over a 14-month period. The 2--8 keV X-ray polarization was measured with the Imaging X-ray Polarimetry Explorer (IXPE) with six 100-ks observations spanning from 2022 March to 2023 April. Each IXPE observation was accompanied by simultaneous X-ray data from NuSTAR, Swift/XRT, and/or XMM-Newton. Complementary optical-infrared polarization measurements were also available in the B, V, R, I, and J bands, as were radio polarization measurements from 4.85 GHz to 225.5 GHz. Among the first five IXPE observations, we did not find significant variability in the X-ray polarization degree and angle with IXPE. However, the most recent sixth observation found an elevated polarization degree at $>3σ$ above the average of the other five observations. The optical and radio measurements show no apparent correlations with the X-ray polarization properties. Throughout the six IXPE observations, the X-ray polarization degree remained higher than, or similar to, the R-band optical polarization degree, which remained higher than the radio value. This is consistent with the energy-stratified shock scenario proposed to explain the first two IXPE observations, in which the polarized X-ray, optical, and radio emission arises from different regions.
Submitted 15 July, 2024;
originally announced July 2024.
-
BECAUSE: Bilinear Causal Representation for Generalizable Offline Model-based Reinforcement Learning
Authors:
Haohong Lin,
Wenhao Ding,
Jian Chen,
Laixi Shi,
Jiacheng Zhu,
Bo Li,
Ding Zhao
Abstract:
Offline model-based reinforcement learning (MBRL) enhances data efficiency by utilizing pre-collected datasets to learn models and policies, especially in scenarios where exploration is costly or infeasible. Nevertheless, its performance often suffers from the objective mismatch between model and policy learning, resulting in inferior performance despite accurate model predictions. This paper first identifies that the primary source of this mismatch is the underlying confounders present in offline data for MBRL. Subsequently, we introduce BilinEar CAUSal rEpresentation (BECAUSE), an algorithm to capture causal representation for both states and actions to reduce the influence of the distribution shift, thus mitigating the objective mismatch problem. Comprehensive evaluations on 18 tasks that vary in data quality and environment context demonstrate the superior performance of BECAUSE over existing offline RL algorithms. We show the generalizability and robustness of BECAUSE under fewer samples or larger numbers of confounders. Additionally, we offer theoretical analysis of BECAUSE to prove its error bound and sample efficiency when integrating causal representation into offline MBRL.
Submitted 15 July, 2024;
originally announced July 2024.
-
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
Authors:
Ruisheng Cao,
Fangyu Lei,
Haoyuan Wu,
Jixuan Chen,
Yeqiao Fu,
Hongcheng Gao,
Xinzhuang Xiong,
Hanchong Zhang,
Yuchen Mao,
Wenjing Hu,
Tianbao Xie,
Hongshen Xu,
Danyang Zhang,
Sida Wang,
Ruoxi Sun,
Pengcheng Yin,
Caiming Xiong,
Ansong Ni,
Qian Liu,
Victor Zhong,
Lu Chen,
Kai Yu,
Tao Yu
Abstract:
Data science and engineering workflows often span multiple stages, from warehousing to orchestration, using tools like BigQuery, dbt, and Airbyte. As vision language models (VLMs) advance in multimodal understanding and code generation, VLM-based agents could potentially automate these workflows by generating SQL queries, Python code, and GUI operations. This automation can improve the productivity of experts while democratizing access to large-scale data analysis. In this paper, we introduce Spider2-V, the first multimodal agent benchmark focusing on professional data science and engineering workflows, featuring 494 real-world tasks in authentic computer environments and incorporating 20 enterprise-level professional applications. These tasks, derived from real-world use cases, evaluate the ability of a multimodal agent to perform data-related tasks by writing code and managing the GUI in enterprise data software systems. To balance realistic simulation with evaluation simplicity, we devote significant effort to developing automatic configurations for task setup and carefully crafting evaluation metrics for each task. Furthermore, we supplement multimodal agents with comprehensive documents of these enterprise data software systems. Our empirical evaluation reveals that existing state-of-the-art LLM/VLM-based agents do not reliably automate full data workflows (14.0% success). Even with step-by-step guidance, these agents still underperform in tasks that require fine-grained, knowledge-intensive GUI actions (16.2%) and involve remote cloud-hosted workspaces (10.6%). We hope that Spider2-V paves the way for autonomous multimodal agents to transform the automation of data science and engineering workflow. Our code and data are available at https://spider2-v.github.io.
Submitted 15 July, 2024;
originally announced July 2024.
-
GRUtopia: Dream General Robots in a City at Scale
Authors:
Hanqing Wang,
Jiahe Chen,
Wensi Huang,
Qingwei Ben,
Tai Wang,
Boyu Mi,
Tao Huang,
Siheng Zhao,
Yilun Chen,
Sizhe Yang,
Peizhou Cao,
Wenye Yu,
Zichao Ye,
Jialun Li,
Junfeng Long,
Zirui Wang,
Huiling Wang,
Ying Zhao,
Zhongying Tu,
Yu Qiao,
Dahua Lin,
Jiangmiao Pang
Abstract:
Recent works have been exploring the scaling laws in the field of Embodied AI. Given the prohibitive costs of collecting real-world data, we believe the Simulation-to-Real (Sim2Real) paradigm is a crucial step for scaling the learning of embodied models. This paper introduces project GRUtopia, the first simulated interactive 3D society designed for various robots. It features several advancements: (a) The scene dataset, GRScenes, includes 100k interactive, finely annotated scenes, which can be freely combined into city-scale environments. In contrast to previous works mainly focusing on home, GRScenes covers 89 diverse scene categories, bridging the gap of service-oriented environments where general robots would be initially deployed. (b) GRResidents, a Large Language Model (LLM) driven Non-Player Character (NPC) system that is responsible for social interaction, task generation, and task assignment, thus simulating social scenarios for embodied AI applications. (c) The benchmark, GRBench, supports various robots but focuses on legged robots as primary agents and poses moderately challenging tasks involving Object Loco-Navigation, Social Loco-Navigation, and Loco-Manipulation. We hope that this work can alleviate the scarcity of high-quality data in this field and provide a more comprehensive assessment of Embodied AI research. The project is available at https://github.com/OpenRobotLab/GRUtopia.
Submitted 15 July, 2024;
originally announced July 2024.
-
Multilingual Contrastive Decoding via Language-Agnostic Layers Skipping
Authors:
Wenhao Zhu,
Sizhe Liu,
Shujian Huang,
Shuaijie She,
Chris Wendler,
Jiajun Chen
Abstract:
Decoding by contrasting layers (DoLa) is designed to improve the generation quality of large language models (LLMs) by contrasting the prediction probabilities between an early exit output (amateur logits) and the final output (expert logits). However, we find that this approach does not work well on non-English tasks. Inspired by previous interpretability work on language transition during the model's forward pass, we discover that this issue arises from a language mismatch between early exit output and final output. In this work, we propose an improved contrastive decoding algorithm that is effective for diverse languages beyond English. To obtain more helpful amateur logits, we devise two strategies to skip a set of bottom, language-agnostic layers based on our preliminary analysis. Experimental results on multilingual reasoning benchmarks demonstrate that our proposed method outperforms previous contrastive decoding baselines and substantially improves LLMs' chain-of-thought reasoning accuracy across 11 languages. The project will be available at: https://github.com/NJUNLP/SkipLayerCD.
Submitted 15 July, 2024;
originally announced July 2024.
-
IE-NeRF: Inpainting Enhanced Neural Radiance Fields in the Wild
Authors:
Shuaixian Wang,
Haoran Xu,
Yaokun Li,
Jiwei Chen,
Guang Tan
Abstract:
We present a novel approach for synthesizing realistic novel views using Neural Radiance Fields (NeRF) with uncontrolled photos in the wild. While NeRF has shown impressive results in controlled settings, it struggles with transient objects commonly found in dynamic and time-varying scenes. Our framework, called Inpainting Enhanced NeRF, or IE-NeRF, enhances the conventional NeRF by drawing inspiration from the technique of image inpainting. Specifically, our approach extends the Multi-Layer Perceptrons (MLP) of NeRF, enabling it to simultaneously generate intrinsic properties (static color, density) and extrinsic transient masks. We introduce an inpainting module that leverages the transient masks to effectively exclude occlusions, resulting in improved volume rendering quality. Additionally, we propose a new training strategy with frequency regularization to address the sparsity issue of low-frequency transient components. We evaluate our approach on internet photo collections of landmarks, demonstrating its ability to generate high-quality novel views and achieve state-of-the-art performance.
Submitted 15 July, 2024;
originally announced July 2024.
-
Automated Label Unification for Multi-Dataset Semantic Segmentation with GNNs
Authors:
Rong Ma,
Jie Chen,
Xiangyang Xue,
Jian Pu
Abstract:
Deep supervised models possess significant capability to assimilate extensive training data, thereby presenting an opportunity to enhance model performance through training on multiple datasets. However, conflicts arising from different label spaces among datasets may adversely affect model performance. In this paper, we propose a novel approach to automatically construct a unified label space across multiple datasets using graph neural networks. This enables semantic segmentation models to be trained simultaneously on multiple datasets, resulting in performance improvements. Unlike existing methods, our approach facilitates seamless training without the need for additional manual reannotation or taxonomy reconciliation. This significantly enhances the efficiency and effectiveness of multi-dataset segmentation model training. The results demonstrate that our method significantly outperforms other multi-dataset training methods when trained on seven datasets simultaneously, and achieves state-of-the-art performance on the WildDash 2 benchmark.
Submitted 15 July, 2024;
originally announced July 2024.
-
Local Action-Guided Motion Diffusion Model for Text-to-Motion Generation
Authors:
Peng Jin,
Hao Li,
Zesen Cheng,
Kehan Li,
Runyi Yu,
Chang Liu,
Xiangyang Ji,
Li Yuan,
Jie Chen
Abstract:
Text-to-motion generation requires not only grounding local actions in language but also seamlessly blending these individual actions to synthesize diverse and realistic global motions. However, existing motion generation methods primarily focus on the direct synthesis of global motions while neglecting the importance of generating and controlling local actions. In this paper, we propose the local action-guided motion diffusion model, which facilitates global motion generation by utilizing local actions as fine-grained control signals. Specifically, we provide an automated method for reference local action sampling and leverage graph attention networks to assess the guiding weight of each local action in the overall motion synthesis. During the diffusion process for synthesizing global motion, we calculate the local-action gradient to provide conditional guidance. This local-to-global paradigm reduces the complexity associated with direct global motion generation and promotes motion diversity via sampling diverse actions as conditions. Extensive experiments on two human motion datasets, i.e., HumanML3D and KIT, demonstrate the effectiveness of our method. Furthermore, our method provides flexibility in seamlessly combining various local actions and continuous guiding weight adjustment, accommodating diverse user preferences, which may hold potential significance for the community. The project page is available at https://jpthu17.github.io/GuidedMotion-project/.
Submitted 15 July, 2024;
originally announced July 2024.
-
A Study on Lampreys Population Based on Sex-Ratio-Related Growth-Balance Model
Authors:
Zuhua Ji,
Jiarui Chen,
Zihang Wang
Abstract:
Lampreys are among the oldest species in the world, having survived longer than the dinosaurs, which is related to their ability to change the sex ratio during their lifespan. In this paper, to understand how sex ratio and food quantity affect the population growth rate of lampreys, the researchers draw inspiration from the logistic model and establish a model called EcoSexChange (ESC), which yields a population that initially increases and then stabilizes, a reasonable outcome that may apply to other organisms with significant differences in consumption between sexes. Subsequently, this paper develops the Sex Ratio Adaptation Eco Impact (SRAEI) model based on the ESC model, using the ABM algorithm to simulate how the population of lampreys, whose lives are divided into seven stages, grows and stabilizes. It then introduces a sudden disaster factor in the middle of the simulation and compares the outcome with that of lampreys that cannot adjust their sex ratio. The results provide a useful reference for analyzing the population changes of lampreys in different living environments, and they can readily be applied to other species with large differences between males and females.
Submitted 14 July, 2024;
originally announced July 2024.
-
High Voltage (~2 kV) field-plated Al0.64Ga0.36N-channel HEMTs
Authors:
Md Tahmidul Alam,
Jiahao Chen,
Kenneth Stephenson,
Md Abdullah-Al Mamun,
Abdullah Al Mamun Mazumder,
Shubhra S. Pasayat,
Asif Khan,
Chirag Gupta
Abstract:
High voltage (~2 kV) AlGaN-channel HEMTs were fabricated with 64% aluminum composition in the channel. The average on-resistance was ~75 ohm·mm (~21 milliohm·cm^2) for LGD = 20 microns. Breakdown voltage reached >3 kV (tool limit) before passivation; however, it was reduced to ~2 kV after SiN surface passivation and field plates. The apparent high breakdown voltage prior to passivation can possibly be attributed to the field plate effect of the charged trap states of the surface. The breakdown voltage and RON demonstrated a strong linear correlation in a scatter plot with ~50 measured transistors. In pulsed IV measurements with 100 microsecond pulse width and 40 V of off-state bias (tool limit), the dynamic RON increased by ~5% compared to DC RON and current collapse was <10%.
Submitted 14 July, 2024;
originally announced July 2024.
-
Learning Unlabeled Clients Divergence via Anchor Model Aggregation for Federated Semi-supervised Learning
Authors:
Marawan Elbatel,
Hualiang Wang,
Jixiang Chen,
Hao Wang,
Xiaomeng Li
Abstract:
Federated semi-supervised learning (FedSemi) refers to scenarios where there may be clients with fully labeled data, clients with partially labeled data, and even fully unlabeled clients, all while preserving data privacy. However, challenges arise from client drift due to undefined heterogeneous class distributions and erroneous pseudo-labels. Existing FedSemi methods typically fail to aggregate models from unlabeled clients due to their inherent unreliability, thus overlooking unique information from their heterogeneous data distribution, leading to sub-optimal results. In this paper, we enable unlabeled client aggregation through SemiAnAgg, a novel Semi-supervised Anchor-Based federated Aggregation. SemiAnAgg learns unlabeled client contributions via an anchor model, effectively harnessing their informative value. Our key idea is that by feeding local client data to the same global model and the same consistently initialized anchor model (i.e., random model), we can measure the importance of each unlabeled client accordingly. Extensive experiments demonstrate that SemiAnAgg achieves new state-of-the-art results on four widely used FedSemi benchmarks, leading to substantial performance improvements: a 9% increase in accuracy on CIFAR-100 and a 7.6% improvement in recall on the medical dataset ISIC-18, compared with prior state-of-the-art. Code is available at: https://github.com/xmed-lab/SemiAnAgg.
Submitted 14 July, 2024;
originally announced July 2024.
-
Vector Field Attention for Deformable Image Registration
Authors:
Yihao Liu,
Junyu Chen,
Lianrui Zuo,
Aaron Carass,
Jerry L. Prince
Abstract:
Deformable image registration establishes non-linear spatial correspondences between fixed and moving images. Deep learning-based deformable registration methods have been widely studied in recent years due to their speed advantage over traditional algorithms as well as their better accuracy. Most existing deep learning-based methods require neural networks to encode location information in their feature maps and predict displacement or deformation fields through convolutional or fully connected layers from these high-dimensional feature maps. In this work, we present Vector Field Attention (VFA), a novel framework that enhances the efficiency of the existing network design by enabling direct retrieval of location correspondences. VFA uses neural networks to extract multi-resolution feature maps from the fixed and moving images and then retrieves pixel-level correspondences based on feature similarity. The retrieval is achieved with a novel attention module without the need for learnable parameters. VFA is trained end-to-end in either a supervised or unsupervised manner. We evaluated VFA for intra- and inter-modality registration and for unsupervised and semi-supervised registration using public datasets, and we also evaluated it on the Learn2Reg challenge. Experimental results demonstrate the superior performance of VFA compared to existing methods. The source code of VFA is publicly available at https://github.com/yihao6/vfa/.
Submitted 14 July, 2024;
originally announced July 2024.
-
Fast and Provable Simultaneous Blind Super-Resolution and Demixing for Point Source Signals: Scaled Gradient Descent without Regularization
Authors:
Jinchi Chen
Abstract:
We address the problem of simultaneously recovering a sequence of point source signals from observations limited to the low-frequency end of the spectrum of their summed convolution, where the point spread functions (PSFs) are unknown. By exploiting the low-dimensional structures of the signals and PSFs, we formulate this as a low-rank matrix demixing problem. To solve this, we develop a scaled gradient descent method without balancing regularization. We establish theoretical guarantees under mild conditions, demonstrating that our method, with spectral initialization, converges to the ground truth at a linear rate, independent of the condition number of the underlying data matrices. Numerical experiments indicate that our approach is competitive with existing convex methods in terms of both recovery accuracy and computational efficiency.
Submitted 13 July, 2024;
originally announced July 2024.
-
VLMPC: Vision-Language Model Predictive Control for Robotic Manipulation
Authors:
Wentao Zhao,
Jiaming Chen,
Ziyu Meng,
Donghui Mao,
Ran Song,
Wei Zhang
Abstract:
Although Model Predictive Control (MPC) can effectively predict the future states of a system and thus is widely used in robotic manipulation tasks, it does not have the capability of environmental perception, leading to failure in some complex scenarios. To address this issue, we introduce Vision-Language Model Predictive Control (VLMPC), a robotic manipulation framework which takes advantage of the powerful perception capability of a vision language model (VLM) and integrates it with MPC. Specifically, we propose a conditional action sampling module which takes as input a goal image or a language instruction and leverages the VLM to sample a set of candidate action sequences. Then, a lightweight action-conditioned video prediction model is designed to generate a set of future frames conditioned on the candidate action sequences. VLMPC produces the optimal action sequence with the assistance of the VLM through a hierarchical cost function that formulates both pixel-level and knowledge-level consistency between the current observation and the goal image. We demonstrate that VLMPC outperforms the state-of-the-art methods on public benchmarks. More importantly, our method showcases excellent performance in various real-world tasks of robotic manipulation. Code is available at https://github.com/PPjmchen/VLMPC.
Submitted 13 July, 2024;
originally announced July 2024.
-
Team up GBDTs and DNNs: Advancing Efficient and Effective Tabular Prediction with Tree-hybrid MLPs
Authors:
Jiahuan Yan,
Jintai Chen,
Qianxing Wang,
Danny Z. Chen,
Jian Wu
Abstract:
Tabular datasets play a crucial role in various applications. Thus, developing efficient, effective, and widely compatible prediction algorithms for tabular data is important. Currently, two prominent model types, Gradient Boosted Decision Trees (GBDTs) and Deep Neural Networks (DNNs), have demonstrated performance advantages on distinct tabular prediction tasks. However, selecting an effective model for a specific tabular dataset is challenging, often demanding time-consuming hyperparameter tuning. To address this model selection dilemma, this paper proposes a new framework that amalgamates the advantages of both GBDTs and DNNs, resulting in a DNN algorithm that is as efficient as GBDTs and is competitively effective regardless of dataset preferences for GBDTs or DNNs. Our idea is rooted in an observation that deep learning (DL) offers a larger parameter space that can represent a well-performing GBDT model, yet the current back-propagation optimizer struggles to efficiently discover such optimal functionality. On the other hand, during GBDT development, hard tree pruning, entropy-driven feature gate, and model ensemble have proved to be more adaptable to tabular data. By combining these key components, we present a Tree-hybrid simple MLP (T-MLP). In our framework, a tensorized, rapidly trained GBDT feature gate, a DNN architecture pruning approach, as well as a vanilla back-propagation optimizer collaboratively train a randomly initialized MLP model. Comprehensive experiments show that T-MLP is competitive with extensively tuned DNNs and GBDTs in their dominating tabular benchmarks (88 datasets) respectively, all achieved with compact model storage and significantly reduced training duration.
Submitted 13 July, 2024;
originally announced July 2024.
-
Thunderbolt: Causal Concurrent Consensus and Execution
Authors:
Junchao Chen,
Alberto Sonnino,
Lefteris Kokoris-Kogias,
Mohammad Sadoghi
Abstract:
In the realm of blockchain systems, smart contracts have gained widespread adoption owing to their programmability. Consequently, developing a system capable of facilitating high throughput and scalability is of paramount importance. Directed acyclic graph (DAG) consensus protocols have demonstrated notable enhancements in both throughput and latency; however, serial execution is now becoming a bottleneck. Numerous approaches prove impractical for smart contracts because they assume that read/write sets are known a priori. This paper introduces Thunderbolt, a novel architecture based on DAG-based protocols that aims to furnish scalable and concurrent execution for smart contract transactions. Inspired by Hyperledger, Thunderbolt also expands the Execute-Order-Validate architecture, in which transactions are distributed into distinct replicas, with execution outcomes determined prior to ordering through the DAG-based protocol. Existing protocols adopt serial execution after the ordering to avoid non-determinism. However, Thunderbolt provides parallel pre-execution before the ordering as well as parallel verifications once any source of non-determinism is removed. Each replica validates the transaction results during the construction of the DAG, rather than after the ordering that follows the construction, to improve latency. In an effort to enhance smart contract execution, we implement an execution engine that constructs a dependency graph to dynamically assign transaction orders, thus mitigating abort rates due to execution conflicts. Additionally, we introduce a novel shard reconfiguration to withstand malicious attacks by relocating replicas from the current DAG to a new DAG, and rotating the shards among different replicas. Our comparison of the results on SmallBank with serial execution on Narwhal-Tusk revealed a remarkable 50 times speedup with 64 replicas.
Submitted 12 July, 2024;
originally announced July 2024.
-
Novel structures and collapse of solitons in nonminimally gravitating dark matter halos
Authors:
Jiajun Chen,
Hong-Yi Zhang
Abstract:
Ultralight dark matter simulations predict Bose-Einstein condensations with short-range correlation, known as solitons or boson stars, at the centers of dark matter halos. This paper investigates the formation and collapse of dark matter solitons influenced by nonminimal gravitational effects, characterized by gradient-dependent self-interactions of dark matter and an additional source in Poisson's equation for gravity. Our simulations suggest that the initial evolution of dark matter resembles that without nonminimal gravitational effects. However, regions with negative mass density may develop, and solitons will collapse when their densities reach certain critical values for both positive and negative coupling constants. With strong nonminimal coupling, structure growth could be significantly enhanced.
Submitted 12 July, 2024;
originally announced July 2024.
-
Study of a Novel Capacitive Pressure Sensor Using Spiral Comb Electrodes
Authors:
Wenjie Chen,
Qi Yang,
Qi Liu,
Yiqun Zhang,
Liang He,
Yuanlin Xia,
Zhuqing Wang,
Yubo Huang,
Jianfeng Chen,
Cao Xia
Abstract:
For traditional capacitive pressure sensors, high nonlinearity and poor sensitivity have greatly limited their sensing applications. Hence, an innovative design of capacitors based on spiral comb electrodes is proposed for high-sensitivity pressure detection in this work. Compared to traditional capacitive pressure sensors with straight plate electrodes, the proposed sensor with spiral electrodes substantially increases the overlap area of the electrodes, so the pressure sensitivity can be greatly improved. Moreover, because the capacitance variation of the proposed sensor is dominated by the change in the overlap area of the electrodes rather than the electrode distance, the linearity can also be improved to higher than 0.99. Theoretical analysis and COMSOL-based finite element simulation have been implemented for principle verification and performance optimization. Simulation results show that the proposed design has a mechanical sensitivity of $1.5\times10^{-4}$ m/Pa, a capacitive sensitivity of 1.10 aF/Pa, and a nonlinear error of 3.63% over the pressure range from 0 to 30 kPa. An equivalent experiment has been further carried out for verification. Experimental results also show that both the sensitivity and linearity of capacitive pressure sensors with spiral electrodes are higher than those with straight electrodes. This work not only provides a new avenue for capacitor design, but also can be applied to high-sensitivity pressure detection.
Submitted 11 July, 2024;
originally announced July 2024.
-
Adjusting for Participation Bias in Case-Control Genetic Association Studies for Rare Diseases
Authors:
Le Wang,
Zhengbang Li,
Ben Fitzpatrick,
Clarice Weinberg,
Jinbo Chen
Abstract:
Collection of genotype data in case-control genetic association studies may often be incomplete for reasons related to genes themselves. This non-ignorable missingness structure, if not appropriately accounted for, can result in participation bias in association analyses. To deal with this issue, Chen et al. (2016) proposed to collect additional genetic information from family members of individuals whose genotype data were not available, and developed a maximum likelihood method for bias correction. In this study, we develop an estimating equation approach to analyzing data collected from this design that allows adjustment of covariates. It jointly estimates odds ratio parameters for genetic association and missingness, where a logistic regression model is used to relate missingness to genotype and other covariates. Our method allows correlation between genotype and covariates while using genetic information from family members to provide information on the missing genotype data. In the estimating equation for genetic association parameters, we weight the contribution of each genotyped subject to the empirical likelihood score function by the inverse probability that the genotype data are available. We evaluate large and finite sample performance of our method via simulation studies and apply it to a family-based case-control study of breast cancer.
Submitted 11 July, 2024;
originally announced July 2024.
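The inverse probability weighting mentioned in the abstract above can be illustrated with a minimal sketch. This is a generic weighted mean, not the authors' estimating equation: the logistic availability model, the function names, and the toy data are assumptions made for this example only.

```python
import math

def availability_prob(intercept, coefs, covariates):
    """Hypothetical logistic model for the probability that data are available."""
    eta = intercept + sum(c * x for c, x in zip(coefs, covariates))
    return 1.0 / (1.0 + math.exp(-eta))

def ipw_mean(values, observed, probs):
    """Weighted mean over observed subjects, each weighted by 1 / P(observed)."""
    num = sum(v / p for v, o, p in zip(values, observed, probs) if o)
    den = sum(1.0 / p for o, p in zip(observed, probs) if o)
    return num / den

# Toy data: subjects with value 1 are observed only half the time,
# so the naive mean over observed subjects (1/3) is biased downward.
values = [1, None, 0, 0]             # None marks a subject with missing data
observed = [True, False, True, True]
probs = [0.5, 0.5, 1.0, 1.0]         # assumed known availability probabilities
adjusted = ipw_mean(values, observed, probs)
```

Here the weighted mean recovers 0.5, the mean of the full population (1, 1, 0, 0), because the one observed subject with value 1 stands in for two.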
-
Light Dark Matter Constraints from SuperCDMS HVeV Detectors Operated Underground with an Anticoincidence Event Selection
Authors:
SuperCDMS Collaboration,
M. F. Albakry,
I. Alkhatib,
D. Alonso-González,
D. W. P. Amaral,
J. Anczarski,
T. Aralis,
T. Aramaki,
I. J. Arnquist,
I. Ataee Langroudy,
E. Azadbakht,
C. Bathurst,
R. Bhattacharyya,
A. J. Biffl,
P. L. Brink,
M. Buchanan,
R. Bunker,
B. Cabrera,
R. Calkins,
R. A. Cameron,
C. Cartaro,
D. G. Cerdeño,
Y. -Y. Chang,
M. Chaudhuri,
J. -H. Chen
, et al. (116 additional authors not shown)
Abstract:
This article presents constraints on dark-matter-electron interactions obtained from the first underground data-taking campaign with multiple SuperCDMS HVeV detectors operated in the same housing. An exposure of 7.63 g-days is used to set upper limits on the dark-matter-electron scattering cross section for dark matter masses between 0.5 and 1000 MeV/$c^2$, as well as upper limits on dark photon kinetic mixing and axion-like particle axioelectric coupling for masses between 1.2 and 23.3 eV/$c^2$. Compared to an earlier HVeV search, sensitivity was improved as a result of an increased overburden of 225 meters of water equivalent, an anticoincidence event selection, and better pile-up rejection. In the case of dark-matter-electron scattering via a heavy mediator, an improvement by up to a factor of 25 in cross-section sensitivity was achieved.
Submitted 12 July, 2024; v1 submitted 10 July, 2024;
originally announced July 2024.
-
Maritime Tracking Data Analysis and Integration with AISdb
Authors:
Gabriel Spadon,
Jay Kumar,
Jinkun Chen,
Matthew Smith,
Casey Hilliard,
Sarah Vela,
Romina Gehrmann,
Claudio DiBacco,
Stan Matwin,
Ronald Pelot
Abstract:
Efficiently handling Automatic Identification System (AIS) data is vital for enhancing maritime safety and navigation, yet is hindered by the system's high volume and error-prone datasets. This paper introduces the Automatic Identification System Database (AISdb), a novel tool designed to address the challenges of processing and analyzing AIS data. AISdb is a comprehensive, open-source platform that enables the integration of AIS data with environmental datasets, thus enriching analyses of vessel movements and their environmental impacts. By facilitating AIS data collection, cleaning, and spatio-temporal querying, AISdb significantly advances AIS data research. Utilizing AIS data from various sources, AISdb demonstrates improved handling and analysis of vessel information, contributing to enhancing maritime safety, security, and environmental sustainability efforts.
Submitted 10 July, 2024;
originally announced July 2024.
-
Towards Robust Alignment of Language Models: Distributionally Robustifying Direct Preference Optimization
Authors:
Junkang Wu,
Yuexiang Xie,
Zhengyi Yang,
Jiancan Wu,
Jiawei Chen,
Jinyang Gao,
Bolin Ding,
Xiang Wang,
Xiangnan He
Abstract:
This study addresses the challenge of noise in training datasets for Direct Preference Optimization (DPO), a method for aligning Large Language Models (LLMs) with human preferences. We categorize noise into pointwise noise, which includes low-quality data points, and pairwise noise, which encompasses erroneous data pair associations that affect preference rankings. Utilizing Distributionally Robust Optimization (DRO), we enhance DPO's resilience to these types of noise. Our theoretical insights reveal that DPO inherently embeds DRO principles, conferring robustness to pointwise noise, with the regularization coefficient $β$ playing a critical role in its noise resistance. Extending this framework, we introduce Distributionally Robustifying DPO (Dr. DPO), which integrates pairwise robustness by optimizing against worst-case pairwise scenarios. The novel hyperparameter $β'$ in Dr. DPO allows for fine-tuned control over data pair reliability, providing a strategic balance between exploration and exploitation in noisy training environments. Empirical evaluations demonstrate that Dr. DPO substantially improves the quality of generated text and response accuracy in preference datasets, showcasing enhanced performance in both noisy and noise-free settings. The code is available at https://github.com/junkangwu/Dr_DPO.
Submitted 10 July, 2024;
originally announced July 2024.
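For readers unfamiliar with the role of the regularization coefficient $β$ mentioned in the abstract above, the following is a minimal sketch of the standard per-pair DPO loss. It does not implement Dr. DPO or its $β'$ hyperparameter: the function names and the example log-probabilities are assumptions made for this illustration.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Inputs are sequence log-probabilities under the policy and the frozen
    reference model. The loss is -log sigmoid(beta * margin), where the
    margin compares the policy's log-ratio gain on the chosen response
    against its gain on the rejected one.
    """
    margin = ((policy_chosen_lp - ref_chosen_lp)
              - (policy_rejected_lp - ref_rejected_lp))
    return softplus(-beta * margin)

# Illustrative numbers only: the policy has moved toward the chosen response
# and away from the rejected one relative to the reference model.
loss = dpo_loss(-1.0, -3.0, -2.0, -2.0, beta=0.5)
```

When the policy equals the reference model the margin is zero and the loss is log 2; a positive margin lowers the loss, and a larger $β$ makes the loss more sensitive to that margin.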
-
Cross Domain Object Detection via Multi-Granularity Confidence Alignment based Mean Teacher
Authors:
Jiangming Chen,
Li Liu,
Wanxia Deng,
Zhen Liu,
Yu Liu,
Yingmei Wei,
Yongxiang Liu
Abstract:
Cross domain object detection learns an object detector for an unlabeled target domain by transferring knowledge from an annotated source domain. Promising results have been achieved via Mean Teacher; however, pseudo labeling, which is the bottleneck of mutual learning, remains to be further explored. In this study, we find that confidence misalignment of the predictions, including category-level overconfidence, instance-level task confidence inconsistency, and image-level confidence misfocusing, injects noisy pseudo labels into the training process and leads to suboptimal performance on the target domain. To tackle this issue, we present a novel general framework termed Multi-Granularity Confidence Alignment Mean Teacher (MGCAMT) for cross domain object detection, which alleviates confidence misalignment across category-, instance-, and image-levels simultaneously to obtain high quality pseudo supervision for better teacher-student learning. Specifically, to align confidence with accuracy at category level, we propose Classification Confidence Alignment (CCA) to model category uncertainty based on Evidential Deep Learning (EDL) and filter out the category incorrect labels via an uncertainty-aware selection strategy. Furthermore, to mitigate the instance-level misalignment between classification and localization, we design Task Confidence Alignment (TCA) to enhance the interaction between the two task branches and allow each classification feature to adaptively locate the optimal feature for the regression. Finally, we develop imagery Focusing Confidence Alignment (FCA) adopting another way of pseudo label learning, i.e., we use the original outputs from the Mean Teacher network for supervised learning without label assignment to concentrate on holistic information in the target image. These three procedures benefit from each other from a cooperative learning perspective.
Submitted 10 July, 2024;
originally announced July 2024.
-
Leveraging Self-Supervised Learning for MIMO-OFDM Channel Representation and Generation
Authors:
Zongxi Liu,
Jiacheng Chen,
Yunting Xu,
Ting Ma,
Jingbo Liu,
Haibo Zhou,
Dusit Niyato
Abstract:
In communications theory, the capacity of multiple input multiple output-orthogonal frequency division multiplexing (MIMO-OFDM) systems is fundamentally determined by wireless channels, which exhibit both diversity and correlation in spatial, frequency and temporal domains. It is further envisioned to exploit the inherent nature of channels, namely representation, to achieve geolocation-based MIMO transmission for 6G, exemplified by the fully-decoupled radio access network (FD-RAN). Accordingly, this paper first employs self-supervised learning to obtain channel representation from unlabeled channel, then proposes a channel generation assisted approach for determining MIMO precoding matrix solely based on geolocation. Specifically, we exploit the small-scale temporal domain variations of channels at a fixed geolocation, and design an ingenious pretext task tailored for contrastive learning. Then, a Transformer-based encoder is trained to output channel representations. We further develop a conditional diffusion generator to generate channel representations from geolocation. Finally, a Transformer-encoder-based decoder is utilized to reconstruct channels from generated representations, where the optimal channel is selected for calculating the precoding matrix for both single and dual BS transmission. We conduct experiments on a public ray-tracing channel dataset, and the extensive simulation results demonstrate the effectiveness of our channel representation method, and also showcase the performance improvement in geolocation-based MIMO transmission.
Submitted 23 May, 2024;
originally announced July 2024.
-
Study of the decay and production properties of $D_{s1}(2536)$ and $D_{s2}^*(2573)$
Authors:
M. Ablikim,
M. N. Achasov,
P. Adlarson,
O. Afedulidis,
X. C. Ai,
R. Aliberti,
A. Amoroso,
Q. An,
Y. Bai,
O. Bakina,
I. Balossino,
Y. Ban,
H. -R. Bao,
V. Batozskaya,
K. Begzsuren,
N. Berger,
M. Berlowski,
M. Bertani,
D. Bettoni,
F. Bianchi,
E. Bianco,
A. Bortone,
I. Boyko,
R. A. Briere,
A. Brueggemann
, et al. (645 additional authors not shown)
Abstract:
The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946~GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be $(35.9\pm 4.8\pm 3.5)\%$ and $(37.4\pm 3.1\pm 4.6)\%$, respectively. The measurements are in tension with predictions based on the assumption that the $D_{s1}(2536)$ and $D_{s2}^*(2573)$ are dominated by a bare $c\bar{s}$ component. The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ cross sections are measured, and a resonant structure at around 4.6~GeV with a width of 50~MeV is observed for the first time with a statistical significance of $15\sigma$ in the $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ process. It could be the $Y(4626)$ found by the Belle collaboration in the $D_s^+D_{s1}(2536)^{-}$ final state, since they have similar masses and widths. There is also evidence for a structure at around 4.75~GeV in both processes.
Submitted 10 July, 2024;
originally announced July 2024.
-
Controllable Navigation Instruction Generation with Chain of Thought Prompting
Authors:
Xianghao Kong,
Jinyu Chen,
Wenguan Wang,
Hang Su,
Xiaolin Hu,
Yi Yang,
Si Liu
Abstract:
Instruction generation is a vital and multidisciplinary research area with broad applications. Existing instruction generation models are limited to generating instructions in a single style from a particular dataset, and the style and content of generated instructions cannot be controlled. Moreover, most existing instruction generation methods also disregard the spatial modeling of the navigation environment. Leveraging the capabilities of Large Language Models (LLMs), we propose C-Instructor, which utilizes the chain-of-thought-style prompt for style-controllable and content-controllable instruction generation. First, we propose a Chain of Thought with Landmarks (CoTL) mechanism, which guides the LLM to identify key landmarks and then generate complete instructions. CoTL makes generated instructions easier to follow and offers greater controllability over the manipulation of landmark objects. Furthermore, we present a Spatial Topology Modeling Task to facilitate the understanding of the spatial structure of the environment. Finally, we introduce a Style-Mixed Training policy, harnessing the prior knowledge of LLMs to enable style control for instruction generation based on different prompts within a single model instance. Extensive experiments demonstrate that instructions generated by C-Instructor outperform those generated by previous methods in text metrics, navigation guidance evaluation, and user studies.
Submitted 16 July, 2024; v1 submitted 10 July, 2024;
originally announced July 2024.
-
Dynamics of asymmetrically deformed skyrmion driven by internal forces and strain force in a flower-shaped magnetic nanostructure
Authors:
Zhen-Yu Tan,
Ji-Pei Chen,
Yu-Ke Shi,
Yuan Chen,
Ming-Hui Qin,
Xing-Sen Gao,
Jun-Ming Liu
Abstract:
Magnetic skyrmions emerge as promising quasi-particles for encoding information in next-generation spintronic devices. Their innate flexibility in shape is essential for applications, although they have often been idealized as rigid particles. In this work, we investigated the voltage-controlled, uniform-strain-mediated dynamics of deformed skyrmions in heterostructures with a flower-shaped magnetic nanostructure, using micromagnetic simulations. The simulated results revealed the possible states of an isolated skyrmion nucleated in the nanostructure, which can be mutually switched by applying suitable in-plane strain pulses. In addition, it was found that the skyrmion motions are driven by the emerging internal forces and strain force, which originate from the asymmetric deformation of skyrmion structures. Furthermore, an analytical model of deformed skyrmions was proposed to interpret the dependences of the internal forces and strain force on the asymmetric deformation of the skyrmion, with some formulae derived for these forces in a semi-analytical approach. Further calculations based on these formulae verified the forces appearing in the skyrmion motion, with the resulting forces showing consistency with the simulated data. This suggests that our semi-analytical model successfully captures the main physics responsible for the motion of a deformed skyrmion in the nanostructure. Our work extends the understanding of the mechanics emerging in deformed skyrmions, and provides an effective approach for deterministic manipulation of deformed skyrmion motion via strain forces and internal forces, which may be instructive for the design of skyrmion-based spintronic devices.
Submitted 10 July, 2024;
originally announced July 2024.
-
Exploring Camera Encoder Designs for Autonomous Driving Perception
Authors:
Barath Lakshmanan,
Joshua Chen,
Shiyi Lan,
Maying Shen,
Zhiding Yu,
Jose M. Alvarez
Abstract:
The cornerstone of autonomous vehicles (AV) is a solid perception system, where camera encoders play a crucial role. Existing works usually leverage pre-trained Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs) designed for general vision tasks, such as image classification, segmentation, and 2D detection. Although those well-known architectures have achieved state-of-the-art accuracy in AV-related tasks, e.g., 3D Object Detection, there remains significant potential for improvement in network design due to the nuanced complexities of industrial-level AV datasets. Moreover, existing public AV benchmarks usually contain insufficient data, which might lead to inaccurate evaluation of those architectures. To reveal the AV-specific model insights, we start from a standard general-purpose encoder, ConvNeXt, and progressively transform the design. We adjust different design parameters including width and depth of the model, stage compute ratio, attention mechanisms, and input resolution, supported by systematic analysis of each modification. This customization yields an architecture optimized for the AV camera encoder, achieving 8.79% mAP improvement over the baseline. We believe our effort could become a sweet cookbook of image encoders for AV and pave the way to the next-level drive system.
Submitted 9 July, 2024;
originally announced July 2024.
-
Toward Motion Robustness: A masked attention regularization framework in remote photoplethysmography
Authors:
Pengfei Zhao,
Qigong Sun,
Xiaolin Tian,
Yige Yang,
Shuo Tao,
Jie Cheng,
Jiantong Chen
Abstract:
There has been growing interest in facial video-based remote photoplethysmography (rPPG) measurement recently, with a focus on assessing various vital signs such as heart rate and heart rate variability. Despite previous efforts on static datasets, existing approaches have been hindered by inaccurate region of interest (ROI) localization and motion issues, and have shown limited generalization in real-world scenarios. To address these challenges, we propose a novel masked attention regularization (MAR-rPPG) framework that mitigates the impact of ROI localization and complex motion artifacts. Specifically, our approach first integrates a masked attention regularization mechanism into the rPPG field to capture the visual semantic consistency of facial clips, while it also employs a masking technique to prevent the model from overfitting on inaccurate ROIs and subsequently degrading its performance. Furthermore, we propose an enhanced rPPG expert aggregation (EREA) network as the backbone to obtain rPPG signals and attention maps simultaneously. Our EREA network is capable of discriminating divergent attentions from different facial areas and retaining the consistency of spatiotemporal attention maps. For motion robustness, a simple open-source detector, MediaPipe, is sufficient for data preprocessing, owing to our framework's superior capability of rPPG signal extraction and attention regularization. Exhaustive experiments on three benchmark datasets (UBFC-rPPG, PURE, and MMPD) substantiate the superiority of our proposed method, outperforming recent state-of-the-art works by a considerable margin.
Submitted 9 July, 2024;
originally announced July 2024.
-
Vector meson's spin alignments in high energy reactions
Authors:
Jin-Hui Chen,
Zuo-Tang Liang,
Yu-Gang Ma,
Xin-Li Sheng,
Qun Wang
Abstract:
The global spin alignment of vector mesons has been observed by the STAR collaboration at the Relativistic Heavy Ion Collider (RHIC) at Brookhaven National Laboratory (BNL). It provides a unique opportunity to probe the correlation between the polarized quark and antiquark in the strongly coupled quark-gluon plasma (sQGP) produced in relativistic heavy ion collisions, opening a new window to explore the properties of sQGP. In addition, spin alignments of vector mesons have also been observed in other high-energy particle collisions. The results seem to be strongly dependent on the hadronization mechanism, so comprehensive studies are needed. In this article, we present a brief review of theoretical and experimental advances in the study of vector meson's spin alignments in a variety of high-energy particle collisions, with emphasis on hadronization mechanisms.
Submitted 8 July, 2024;
originally announced July 2024.
-
Vorticity blowup in 2D compressible Euler equations
Authors:
Jiajie Chen,
Giorgio Cialdea,
Steve Shkoller,
Vlad Vicol
Abstract:
We prove finite-time vorticity blowup for smooth solutions of the 2D compressible Euler equations with smooth, localized, and non-vacuous initial data. The vorticity blowup occurs at the time of the first singularity, and is accompanied by an axisymmetric implosion in which the swirl velocity enjoys full stability, as opposed to finite co-dimension stability.
Submitted 8 July, 2024;
originally announced July 2024.
-
An "Okay" Method for Observing Solar Eclipses
Authors:
Peilong Wang,
Jingyuan Chen
Abstract:
Solar eclipses, as rare astronomical events, often evoke a profound sense of wonder and awe within the human spirit. However, for ordinary people, the extremely short preparation time, a few hours of notice from friends or social media, and the lack of observation equipment often hinder safe and effective eclipse viewing. Some individuals directly observe the sun with their naked eyes, risking vision damage. To enable ordinary people to safely observe eclipses with very little preparation and reduce the risk of vision damage, we present a simple and safe method that almost anyone can use under very basic conditions, known as the "Okay" observation method.
Submitted 8 April, 2024;
originally announced July 2024.
-
Towards SAR Automatic Target Recognition MultiCategory SAR Image Classification Based on Light Weight Vision Transformer
Authors:
Guibin Zhao,
Pengfei Li,
Zhibo Zhang,
Fusen Guo,
Xueting Huang,
Wei Xu,
Jinyin Wang,
Jianlong Chen
Abstract:
Synthetic Aperture Radar has been extensively used in numerous fields and can gather a wealth of information about the area of interest. This large-scene, data-intensive technology places a high value on automatic target recognition, which can free the users and boost efficiency. Recent advances in artificial intelligence have made it possible to create a deep learning based SAR ATR that can automatically identify target features from massive input data. In the last 6 years, intensive research has been conducted in this area; however, most papers in the current SAR ATR field used variants of recurrent neural networks and convolutional neural networks to deepen the regime's understanding of the SAR images. To equip SAR ATR with updated deep learning technology, this paper tries to apply a lightweight vision transformer based model to classify SAR images. The entire structure was verified on an open-access SAR data set, and recognition results show that the final classification outcomes are robust and more accurate than the referenced traditional network structures, even without using any convolutional layers.
Submitted 9 July, 2024; v1 submitted 18 May, 2024;
originally announced July 2024.
-
Faraday laser pumped cesium beam clock
Authors:
Hangbo Shi,
Xiaomin Qin,
Haijun Chen,
Yufei Yan,
Ziqi Lu,
Zhiyang Wang,
Zijie Liu,
Xiaolei Guan,
Qiang Wei,
Tiantian Shi,
Jingbiao Chen
Abstract:
We realize a high-performance compact optically pumped cesium beam clock using a Faraday laser simultaneously as the pumping and detection lasers. The Faraday laser, which is frequency stabilized by the modulation transfer spectroscopy (MTS) technique, has narrow linewidth and superior frequency stability. Measured by the optical heterodyne method between two identical systems, the linewidth of the Faraday laser is 2.5 kHz after MTS locking, and the fractional frequency stability of the Faraday laser is optimized to $1.8\times{10}^{-12}/\sqrt{\tau}$. Based on this high-performance Faraday laser, the cesium beam clock realizes a signal-to-noise ratio (SNR) in 1 Hz bandwidth of $39600$ when the cesium oven temperature is 130°C. Frequency-compared with a hydrogen maser, the fractional frequency stability of the Faraday laser pumped cesium beam clock can reach $1.3\times{10}^{-12}/\sqrt{\tau}$ and drops to $1.4\times{10}^{-14}$ at 10000 s when the cesium oven temperature is 110°C. This Faraday laser pumped cesium beam clock demonstrates excellent performance and great potential in the fields of timekeeping, navigation, and communication. Meanwhile, the Faraday laser, as a high-performance optical frequency standard, can also contribute to the development of other applications in quantum metrology, precision measurement and atomic physics.
Submitted 11 July, 2024; v1 submitted 8 July, 2024;
originally announced July 2024.
-
Towards Understanding the Bugs in Solidity Compiler
Authors:
Haoyang Ma,
Wuqi Zhang,
Qingchao Shen,
Yongqiang Tian,
Junjie Chen,
Shing-Chi Cheung
Abstract:
The Solidity compiler plays a key role in enabling the development of smart contract applications on Ethereum by governing the syntax of a domain-specific language called Solidity and performing compilation and optimization of Solidity code. The correctness of the Solidity compiler is critical in fostering transparency, efficiency, and trust in industries reliant on smart contracts. However, like other software systems, the Solidity compiler is prone to bugs, which may produce incorrect bytecode on blockchain platforms, resulting in severe security concerns. As a domain-specific compiler for smart contracts, the Solidity compiler differs from other compilers in many respects, posing unique challenges to detecting its bugs. To understand the bugs in the Solidity compiler and benefit future research, in this paper, we present the first systematic study on 533 Solidity compiler bugs. We carefully examined their characteristics (including symptoms, root causes, and distribution), and their triggering test cases. Our study leads to seven bug-revealing takeaways for the Solidity compiler. Moreover, to study the limitations of Solidity compiler fuzzers and bring our findings into practical scenarios, we evaluate three Solidity compiler fuzzers on our constructed benchmark. The results show that these fuzzers are inefficient in detecting Solidity compiler bugs. The inefficiency arises from their failure to consider the interesting bug-inducing features, bug-related compilation flags, and test oracles.
Submitted 8 July, 2024;
originally announced July 2024.
-
Affordances-Oriented Planning using Foundation Models for Continuous Vision-Language Navigation
Authors:
Jiaqi Chen,
Bingqian Lin,
Xinmin Liu,
Xiaodan Liang,
Kwan-Yee K. Wong
Abstract:
LLM-based agents have demonstrated impressive zero-shot performance in the vision-language navigation (VLN) task. However, these zero-shot methods focus only on solving high-level task planning by selecting nodes in predefined navigation graphs for movements, overlooking low-level control in realistic navigation scenarios. To bridge this gap, we propose AO-Planner, a novel affordances-oriented planning framework for the continuous VLN task. Our AO-Planner integrates various foundation models to achieve affordances-oriented motion planning and action decision-making, both performed in a zero-shot manner. Specifically, we employ a visual affordances prompting (VAP) approach, where visible ground is segmented using SAM to provide navigational affordances, based on which the LLM selects potential next waypoints and generates low-level path planning towards the selected waypoints. We further introduce a high-level agent, PathAgent, to identify the most probable pixel-based path and convert it into 3D coordinates to fulfill low-level motion. Experimental results on the challenging R2R-CE benchmark demonstrate that AO-Planner achieves state-of-the-art zero-shot performance (5.5% improvement in SPL). Our method establishes an effective connection between the LLM and the 3D world to circumvent the difficulty of directly predicting world coordinates, presenting novel prospects for employing foundation models in low-level motion control.
Submitted 8 July, 2024;
originally announced July 2024.
-
Cervical Auscultation Machine Learning for Dysphagia Assessment
Authors:
An An Chia,
Stacy Lum,
Michelle Boo,
Rex Tan,
Balamurali B T,
Jer-Ming Chen
Abstract:
This study evaluates the use of machine learning, specifically the Random Forest Classifier, to differentiate normal and pathological swallowing sounds. Employing a commercially available wearable stethoscope, we recorded swallows from both healthy adults and patients with dysphagia. The analysis revealed statistically significant differences in acoustic features, such as spectral crest and zero-crossing rate, between normal and pathological swallows, while no discriminating differences were demonstrated between different fluid and diet consistencies. The system demonstrated fair sensitivity (mean ± SD: 74% ± 8%) and specificity (89% ± 6%) for dysphagic swallows. The model attained an overall accuracy of 83% ± 3% and an F1 score of 78% ± 5%. These results demonstrate that machine learning can be a valuable tool in non-invasive dysphagia assessment, although challenges such as sampling rate limitations and variability in sensitivity and specificity in discriminating between normal and pathological sounds are noted. The study underscores the need for further research to optimize these techniques for clinical use.
Submitted 8 July, 2024;
originally announced July 2024.
-
Extraction of fissile isotope antineutrino spectra using feedforward neural network
Authors:
Jian Chen,
Jun Wang,
Wei Wang,
Yuehuan Wei
Abstract:
Precise measurement of antineutrino spectra produced by isotope fission in reactors is of great significance for studying neutrino oscillations, refining nuclear databases, and addressing the reactor antineutrino anomaly. This work reports a method utilizing a feedforward neural network (FNN) model to decompose the reconstructed measured prompt energy spectrum observed by a short-baseline reactor neutrino experiment and extract the antineutrino spectra produced by the fission of major isotopes such as $^{235}$U, $^{238}$U, $^{239}$Pu, and $^{241}$Pu in a nuclear reactor. We present two training strategies for this model and compare them with the traditional $\chi^2$ minimization method, analyzing the same set of pseudo-data for a total exposure of $(2.9\times 5\times 1800)~\rm{GW_{th}\cdot tons\cdot days}$. The results show that the FNN model not only converges faster and better during the fitting process but also achieves relative errors in the extracted spectra within 1% in the 2-8 MeV range, outperforming the $\chi^2$ minimization method. The feasibility and superiority of this method have been validated in this study.
Submitted 8 July, 2024;
originally announced July 2024.
-
LLM-Based Open-Domain Integrated Task and Knowledge Assistants with Programmable Policies
Authors:
Harshit Joshi,
Shicheng Liu,
James Chen,
Robert Weigle,
Monica S. Lam
Abstract:
Programming LLM-based knowledge and task assistants that faithfully conform to developer-provided policies is challenging. These agents must retrieve and provide consistent, accurate, and relevant information to address users' queries and needs. Yet such agents generate unfounded responses ("hallucinate"). Traditional dialogue trees can only handle a limited number of conversation flows, making them inherently brittle. To this end, we present KITA - a programmable framework for creating task-oriented conversational agents that are designed to handle complex user interactions. Unlike LLMs, KITA provides reliable grounded responses, with controllable agent policies through its expressive specification, KITA Worksheet. In contrast to dialogue trees, it is resilient to diverse user queries, helpful with knowledge sources, and offers ease of programming policies through its declarative paradigm. Through a real-user study involving 62 participants, we show that KITA beats the GPT-4 with function calling baseline by 26.1, 22.5, and 52.4 points on execution accuracy, dialogue act accuracy, and goal completion rate, respectively. We also release 22 real-user conversations with KITA, manually corrected to ensure accuracy.
Submitted 8 July, 2024;
originally announced July 2024.
-
Particle-In-Cell simulations of filamentation process in magnetized plasma of capacitively-coupled radio-frequency discharge
Authors:
Huidong Huang,
Jian Chen,
Zhibin Wang
Abstract:
In the uniform radio-frequency capacitively-coupled plasma (RF-CCP) between a large electrode pair, adding an axial magnetic field induces diverse longitudinal filaments. This phenomenon, termed 'filamentation', challenges conventional understanding and remains poorly understood to date. To reveal its pattern dynamics, we conduct 2D Particle-In-Cell simulations to comprehensively examine the whole process of filamentation, identifying two distinct stages. Initially, standing waves grow with a modulational instability, forming regular filaments. Subsequently, when the initial wavenumber matching relation breaks, the plasma shifts towards a dynamic regime governed by competition between Lorentz and thermal pressure forces, characterized by the filaments' chaotic evolution. These novel clues pave the way to theoretically understanding the filamentation instability, and provide essential references for effectively manipulating magnetized plasmas.
Submitted 8 July, 2024;
originally announced July 2024.