-
New physical processes for extracting GPDs with a better sensitivity to partonic structure
Authors:
Jian-Wei Qiu,
Zhite Yu
Abstract:
We introduce a new type of exclusive processes for a better study of generalized parton distributions (GPDs), which we refer to as single-diffractive hard exclusive processes (SDHEPs). We advocate a two-stage framework for picturing SDHEPs based on the separation of scales, which gives a clear description both kinematically and dynamically. We examine the sensitivity of the SDHEP to the parton momentum fraction $x$-dependence of GPDs, and demonstrate it quantitatively with two specific processes that can be readily measured at J-PARC or AMBER using a pion beam and at JLab using a photon beam, respectively. Both processes are capable of providing enhanced sensitivity to the $x$-dependence, overcoming the problem of shadow GPDs, and disentangling different types of GPDs with various spin asymmetries.
Submitted 15 July, 2024;
originally announced July 2024.
-
SpreadFGL: Edge-Client Collaborative Federated Graph Learning with Adaptive Neighbor Generation
Authors:
Luying Zhong,
Yueyang Pi,
Zheyi Chen,
Zhengxin Yu,
Wang Miao,
Xing Chen,
Geyong Min
Abstract:
Federated Graph Learning (FGL) has garnered widespread attention by enabling collaborative training on multiple clients for semi-supervised classification tasks. However, most existing FGL studies do not well consider the missing inter-client topology information in real-world scenarios, causing insufficient feature aggregation of multi-hop neighbor clients during model training. Moreover, the classic FGL commonly adopts the FedAvg but neglects the high training costs when the number of clients expands, resulting in the overload of a single edge server. To address these important challenges, we propose a novel FGL framework, named SpreadFGL, to promote the information flow in edge-client collaboration and extract more generalized potential relationships between clients. In SpreadFGL, an adaptive graph imputation generator incorporated with a versatile assessor is first designed to exploit the potential links between subgraphs, without sharing raw data. Next, a new negative sampling mechanism is developed to make SpreadFGL concentrate on more refined information in downstream tasks. To facilitate load balancing at the edge layer, SpreadFGL follows a distributed training manner that enables fast model convergence. Using real-world testbed and benchmark graph datasets, extensive experiments demonstrate the effectiveness of the proposed SpreadFGL. The results show that SpreadFGL achieves higher accuracy and faster convergence against state-of-the-art algorithms.
Submitted 14 July, 2024;
originally announced July 2024.
-
Sampling from the Random Linear Model via Stochastic Localization Up to the AMP Threshold
Authors:
Han Cui,
Zhiyuan Yu,
Jingbo Liu
Abstract:
The Approximate Message Passing (AMP) algorithm has garnered significant attention in recent years for solving linear inverse problems, particularly in the field of Bayesian inference for high-dimensional models. In this paper, we consider sampling from the posterior in the linear inverse problem, with an i.i.d. random design matrix. We develop a sampling algorithm by integrating the AMP algorithm and stochastic localization. We give a proof for the convergence in smoothed KL divergence between the distribution of the samples generated by our algorithm and the target distribution, whenever the noise variance $Δ$ is below $Δ_{\rm AMP}$, which is the computation threshold for mean estimation introduced in (Barbier et al., 2020).
Submitted 15 July, 2024;
originally announced July 2024.
-
SpikeGS: 3D Gaussian Splatting from Spike Streams with High-Speed Camera Motion
Authors:
Jiyuan Zhang,
Kang Chen,
Shiyan Chen,
Yajing Zheng,
Tiejun Huang,
Zhaofei Yu
Abstract:
Novel View Synthesis plays a crucial role by generating new 2D renderings from multi-view images of 3D scenes. However, capturing high-speed scenes with conventional cameras often leads to motion blur, hindering the effectiveness of 3D reconstruction. To address this challenge, high-frame-rate dense 3D reconstruction emerges as a vital technique, enabling detailed and accurate modeling of real-world objects or scenes in various fields, including Virtual Reality or embodied AI. Spike cameras, a novel type of neuromorphic sensor, continuously record scenes with an ultra-high temporal resolution, showing potential for accurate 3D reconstruction. Despite their promise, existing approaches, such as applying Neural Radiance Fields (NeRF) to spike cameras, encounter challenges due to the time-consuming rendering process. To address this issue, we make the first attempt to introduce the 3D Gaussian Splatting (3DGS) into spike cameras in high-speed capture, providing 3DGS as dense and continuous clues of views, then constructing SpikeGS. Specifically, to train SpikeGS, we establish computational equations between the rendering process of 3DGS and the processes of instantaneous imaging and exposing-like imaging of the continuous spike stream. Besides, we build a very lightweight but effective mapping process from spikes to instant images to support training. Furthermore, we introduced a new spike-based 3D rendering dataset for validation. Extensive experiments have demonstrated our method possesses the high quality of novel view rendering, proving the tremendous potential of spike cameras in modeling 3D scenes.
Submitted 13 July, 2024;
originally announced July 2024.
-
Crossed real nodal-line phonons in gold monobromide
Authors:
Yilin Han,
Yichen Liu,
Chaoxi Cui,
Cheng-Cheng Liu,
Zhi-Ming Yu
Abstract:
Spacetime inversion symmetry can generate intriguing types of spinless excitations in crystalline materials. Here, we propose a topological phase protected by spacetime inversion symmetry - the crossed real nodal line (RNL) in the phonon spectrum of gold monobromide (AuBr). In AuBr, there exist four straight nodal lines, which are linked by a crossed nodal line formed by two lower bands. Remarkably, each adjacent two of the four straight nodal lines is a pair, forming a crossed RNL with nontrivial real Chern number. Such configuration and pairing mode of RNL have never been reported. The crossed RNL exhibits unique surface and hinge states distinguished from that of the conventional RNLs. The symmetry protection and the transformation under the symmetry-preserving strain of the crossed RNL are also investigated. Our results open the door to a new class of topological states, and predict its realization in experimentally synthesized material.
Submitted 12 July, 2024;
originally announced July 2024.
-
Establishing Rigorous and Cost-effective Clinical Trials for Artificial Intelligence Models
Authors:
Wanling Gao,
Yunyou Huang,
Dandan Cui,
Zhuoming Yu,
Wenjing Liu,
Xiaoshuang Liang,
Jiahui Zhao,
Jiyue Xie,
Hao Li,
Li Ma,
Ning Ye,
Yumiao Kang,
Dingfeng Luo,
Peng Pan,
Wei Huang,
Zhongmou Liu,
Jizhong Hu,
Gangyuan Zhao,
Chongrong Jiang,
Fan Huang,
Tianyi Wei,
Suqin Tang,
Bingjie Xia,
Zhifei Zhang,
Jianfeng Zhan
Abstract:
A profound gap persists between artificial intelligence (AI) and clinical practice in medicine, primarily due to the lack of rigorous and cost-effective evaluation methodologies. State-of-the-art and state-of-the-practice AI model evaluations are limited to laboratory studies on medical datasets or direct clinical trials with no or solely patient-centered controls. Moreover, the crucial role of clinicians in collaborating with AI, pivotal for determining its impact on clinical practice, is often overlooked. For the first time, we emphasize the critical necessity for rigorous and cost-effective evaluation methodologies for AI models in clinical practice, featuring patient/clinician-centered (dual-centered) AI randomized controlled trials (DC-AI RCTs) and virtual clinician-based in-silico trials (VC-MedAI) as an effective proxy for DC-AI RCTs. Leveraging 7500 diagnosis records from two-phase inaugural DC-AI RCTs across 14 medical centers with 125 clinicians, our results demonstrate the necessity of DC-AI RCTs and the effectiveness of VC-MedAI. Notably, VC-MedAI performs comparably to human clinicians, replicating insights and conclusions from prospective DC-AI RCTs. We envision DC-AI RCTs and VC-MedAI as pivotal advancements, presenting innovative and transformative evaluation methodologies for AI models in clinical practice, offering a preclinical-like setting mirroring conventional medicine, and reshaping development paradigms in a cost-effective and fast-iterative manner. Chinese Clinical Trial Registration: ChiCTR2400086816.
Submitted 11 July, 2024;
originally announced July 2024.
-
Model Tells You Where to Merge: Adaptive KV Cache Merging for LLMs on Long-Context Tasks
Authors:
Zheng Wang,
Boxiao Jin,
Zhongzhi Yu,
Minjia Zhang
Abstract:
How to efficiently serve Large Language Models (LLMs) has become a pressing issue because of their huge computational cost in their autoregressive generation process. To mitigate computational costs, LLMs often employ the KV Cache technique to improve the generation speed. While improving the computational efficiency, the storage requirements of the KV cache are substantial, particularly in long-context scenarios, leading to significant memory consumption. Existing KV cache eviction methods often degrade the performance of LLMs in long-context scenarios due to the information loss introduced by eviction. In this paper, we propose a novel KV cache merging approach, called KVMerger, to achieve adaptive KV cache compression for long-context tasks without significant performance degradation under constrained memory budgets. Our approach is inspired by the intriguing observation that key states exhibit high similarity at the token level within a single sequence. To facilitate merging, we develop an effective yet straightforward merging set identification algorithm to identify suitable KV states for merging. Our merging set identification algorithm stimulates the second observation that KV cache sparsity, from similarity perspective, is independent of the dataset and remains persistent at the model level. Subsequently, we propose a Gaussian kernel weighted merging algorithm to selectively merge all states within each merging set. We conduct extensive experiments to demonstrate the effectiveness of KVMerger for long-context tasks under constrained memory budgets, applying it to models including Llama2-7B-chat and Llama2-13B-chat. Using the LongBench and ZeroScroll benchmarks, we compare our method with other KV cache compression techniques, including H2O and CaM, showing that our method achieves superior performance across tasks with both 50% and 35% KV cache budgets.
Submitted 11 July, 2024;
originally announced July 2024.
-
Generalized Face Anti-spoofing via Finer Domain Partition and Disentangling Liveness-irrelevant Factors
Authors:
Jingyi Yang,
Zitong Yu,
Xiuming Ni,
Jia He,
Hui Li
Abstract:
Face anti-spoofing techniques based on domain generalization have recently been studied widely. Adversarial learning and meta-learning techniques have been adopted to learn domain-invariant representations. However, prior approaches often consider the dataset gap as the primary factor behind domain shifts. This perspective is not fine-grained enough to reflect the intrinsic gap among the data accurately. In our work, we redefine domains based on identities rather than datasets, aiming to disentangle liveness and identity attributes. We emphasize ignoring the adverse effect of identity shift, focusing on learning identity-invariant liveness representations through orthogonalizing liveness and identity features. To cope with style shifts, we propose Style Cross module to expand the stylistic diversity and Channel-wise Style Attention module to weaken the sensitivity to style shifts, aiming to learn robust liveness representations. Furthermore, acknowledging the asymmetry between live and spoof samples, we introduce a novel contrastive loss, Asymmetric Augmented Instance Contrast. Extensive experiments on four public datasets demonstrate that our method achieves state-of-the-art performance under cross-dataset and limited source dataset scenarios. Additionally, our method has good scalability when expanding diversity of identities. The codes will be released soon.
Submitted 11 July, 2024;
originally announced July 2024.
-
MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis
Authors:
Wanggui He,
Siming Fu,
Mushui Liu,
Xierui Wang,
Wenyi Xiao,
Fangxun Shu,
Yi Wang,
Lei Zhang,
Zhelun Yu,
Haoyuan Li,
Ziwei Huang,
LeiLei Gan,
Hao Jiang
Abstract:
Auto-regressive models have made significant progress in the realm of language generation, yet they do not perform on par with diffusion models in the domain of image synthesis. In this work, we introduce MARS, a novel framework for T2I generation that incorporates a specially designed Semantic Vision-Language Integration Expert (SemVIE). This innovative component integrates pre-trained LLMs by independently processing linguistic and visual information, freezing the textual component while fine-tuning the visual component. This methodology preserves the NLP capabilities of LLMs while imbuing them with exceptional visual understanding. Building upon the powerful base of the pre-trained Qwen-7B, MARS stands out with its bilingual generative capabilities corresponding to both English and Chinese language prompts and the capacity for joint image and text generation. The flexibility of this framework lends itself to migration towards any-to-any task adaptability. Furthermore, MARS employs a multi-stage training strategy that first establishes robust image-text alignment through complementary bidirectional tasks and subsequently concentrates on refining the T2I generation process, significantly augmenting text-image synchrony and the granularity of image details. Notably, MARS requires only 9% of the GPU days needed by SD1.5, yet it achieves remarkable results across a variety of benchmarks, illustrating the training efficiency and the potential for swift deployment in various applications.
Submitted 11 July, 2024; v1 submitted 10 July, 2024;
originally announced July 2024.
-
Exploring Camera Encoder Designs for Autonomous Driving Perception
Authors:
Barath Lakshmanan,
Joshua Chen,
Shiyi Lan,
Maying Shen,
Zhiding Yu,
Jose M. Alvarez
Abstract:
The cornerstone of autonomous vehicles (AV) is a solid perception system, where camera encoders play a crucial role. Existing works usually leverage pre-trained Convolutional Neural Networks (CNNs) or Vision Transformers (ViTs) designed for general vision tasks, such as image classification, segmentation, and 2D detection. Although those well-known architectures have achieved state-of-the-art accuracy in AV-related tasks, e.g., 3D Object Detection, there remains significant potential for improvement in network design due to the nuanced complexities of industrial-level AV datasets. Moreover, existing public AV benchmarks usually contain insufficient data, which might lead to inaccurate evaluation of those architectures. To reveal the AV-specific model insights, we start from a standard general-purpose encoder, ConvNeXt, and progressively transform the design. We adjust different design parameters including width and depth of the model, stage compute ratio, attention mechanisms, and input resolution, supported by systematic analysis of each modification. This customization yields an architecture optimized for an AV camera encoder, achieving an 8.79% mAP improvement over the baseline. We believe our effort could become a sweet cookbook of image encoders for AV and pave the way to the next-level drive system.
Submitted 9 July, 2024;
originally announced July 2024.
-
LIONs: An Empirically Optimized Approach to Align Language Models
Authors:
Xiao Yu,
Qingyang Wu,
Yu Li,
Zhou Yu
Abstract:
Alignment is a crucial step to enhance the instruction-following and conversational abilities of language models. Despite many recent works proposing new algorithms, datasets, and training pipelines, there is a lack of comprehensive studies measuring the impact of various design choices throughout the whole training process. We first conduct a rigorous analysis over a three-stage training pipeline consisting of supervised fine-tuning, offline preference learning, and online preference learning. We find that techniques such as sequence packing, loss masking in SFT, increasing the preference dataset size in DPO, and online DPO training can significantly improve the performance of language models. We then train from Gemma-2b-base and LLama-3-8b-base, and find that our best models exceed the performance of the official instruct models tuned with closed-source data and algorithms. Our code and models can be found at https://github.com/Columbia-NLP-Lab/LionAlignment.
Submitted 9 July, 2024;
originally announced July 2024.
-
CrowdTransfer: Enabling Crowd Knowledge Transfer in AIoT Community
Authors:
Yan Liu,
Bin Guo,
Nuo Li,
Yasan Ding,
Zhouyangzi Zhang,
Zhiwen Yu
Abstract:
Artificial Intelligence of Things (AIoT) is an emerging frontier based on the deep fusion of Internet of Things (IoT) and Artificial Intelligence (AI) technologies. Although advanced deep learning techniques enhance the efficient data processing and intelligent analysis of complex IoT data, they still suffer from notable challenges when deployed to practical AIoT applications, such as constrained resources and diverse task requirements. Knowledge transfer is an effective method to enhance learning performance by avoiding the exorbitant costs associated with data recollection and model retraining. Notably, although there are already some valuable and impressive surveys on transfer learning, these surveys introduce approaches in a relatively isolated way and lack the recent advances of various knowledge transfer techniques for the AIoT field. This survey endeavors to introduce a new concept of knowledge transfer, referred to as Crowd Knowledge Transfer (CrowdTransfer), which aims to transfer prior knowledge learned from a crowd of agents to reduce the training cost as well as improve the performance of the model in real-world complicated scenarios. Particularly, we present four transfer modes from the perspective of crowd intelligence, including derivation, sharing, evolution and fusion modes. Building upon conventional transfer learning methods, we further delve into advanced crowd knowledge transfer models from three perspectives for various AIoT applications. Furthermore, we explore some applications of AIoT areas, such as human activity recognition, urban computing, multi-robot systems, and smart factories. Finally, we discuss the open issues and outline future research directions of knowledge transfer in the AIoT community.
Submitted 8 July, 2024;
originally announced July 2024.
-
White Paper on Polarized Target Studies with Real Photons in Hall D
Authors:
F. Afzal,
M. M. Dalton,
A. Deur,
P. Hurck,
C. D. Keith,
V. Mathieu,
S. Sirca,
Z. Yu
Abstract:
This white paper summarizes the Workshop on Polarized Target Studies with Real Photons in Hall D at Jefferson Lab, that took place on 21 February 2024. The Workshop included about 45 participants both online and in person at Florida State University in Tallahassee. Contributions describe the experimental infrastructure available in Hall D and potential physics applications. The rate and detection capabilities of Hall D are outlined, as well as the properties of a circularly polarized photon beam and a polarized target. Possible physics measurements include light and strange quark baryon spectroscopy, the GDH sum rule, proton structure accessed through measurement of Generalized Parton Distributions and modification of nucleon structure within the nuclear medium.
Submitted 8 July, 2024;
originally announced July 2024.
-
Hierarchical Decoupling Capacitor Optimization for Power Distribution Network of 2.5D ICs with Co-Analysis of Frequency and Time Domains Based on Deep Reinforcement Learning
Authors:
Yuanyuan Duan,
Haiyang Feng,
Zhiping Yu,
Hanming Wu,
Leilai Shao,
Xiaolei Zhu
Abstract:
With the growing need for higher memory bandwidth and computation density, 2.5D design, which involves integrating multiple chiplets onto an interposer, emerges as a promising solution. However, this integration introduces significant challenges due to increasing data rates and a large number of I/Os, necessitating advanced optimization of the power distribution networks (PDNs) both on-chip and on-interposer to mitigate the small signal noise and simultaneous switching noise (SSN). Traditional PDN optimization strategies in 2.5D systems primarily focus on reducing impedance by integrating decoupling capacitors (decaps) to lessen small signal noises. Unfortunately, relying solely on frequency-domain analysis has been proven inadequate for addressing coupled SSN, as indicated by our experimental results. In this work, we introduce a novel two-phase optimization flow using deep reinforcement learning to tackle both the on-chip small signal noise and SSN. Initially, we optimize the impedance in the frequency domain to maintain the small signal noise within acceptable limits while avoiding over-design. Subsequently, in the time domain, we refine the PDN to minimize the voltage violation integral (VVI), a more accurate measure of SSN severity. To the best of our knowledge, this is the first dual-domain optimization strategy that simultaneously addresses both the small signal noise and SSN propagation through strategic decap placement in on-chip and on-interposer PDNs, offering a significant step forward in the design of robust PDNs for 2.5D integrated systems.
Submitted 2 July, 2024;
originally announced July 2024.
-
Rethinking the fundamental performance limits of integrated sensing and communication systems
Authors:
Zhouyuan Yu,
Xiaoling Hu,
Chenxi Liu,
Mugen Peng
Abstract:
Integrated sensing and communication (ISAC) has been recognized as a key enabler and feature of future wireless networks. In the existing works analyzing the performances of ISAC, discrete-time systems were commonly assumed, which, however, overlooked the impacts of temporal, spectral, and spatial properties. To address this issue, we establish a unified information model for the band-limited continuous-time ISAC systems. In the established information model, we employ a novel sensing performance metric, called the sensing mutual information (SMI). Through analysis, we show how the SMI can be utilized as a bridge between the mutual information domain and the mean squared error (MSE) domain. In addition, we illustrate the communication mutual information (CMI)-SMI and CMI-MSE regions to identify the performance bounds of ISAC systems in practical settings and reveal the trade-off between communication and sensing performances. Moreover, via analysis and numerical results, we provide two valuable insights into the design of novel ISAC-enabled systems: i) communication prefers the waveforms of random amplitude, sensing prefers the waveforms of constant amplitude, both communication and sensing favor the waveforms of low correlations with random phases; ii) There exists a linear positive proportional relationship between the allocated time-frequency resource and the achieved communication rate/sensing MSE.
Submitted 4 July, 2024;
originally announced July 2024.
-
MobileExperts: A Dynamic Tool-Enabled Agent Team in Mobile Devices
Authors:
Jiayi Zhang,
Chuang Zhao,
Yihan Zhao,
Zhaoyang Yu,
Ming He,
Jianping Fan
Abstract:
The attainment of autonomous operations in mobile computing devices has consistently been a goal of human pursuit. With the development of Large Language Models (LLMs) and Visual Language Models (VLMs), this aspiration is progressively turning into reality. While contemporary research has explored automation of simple tasks on mobile devices via VLMs, there remains significant room for improvement in handling complex tasks and reducing high reasoning costs. In this paper, we introduce MobileExperts, which for the first time introduces tool formulation and multi-agent collaboration to address the aforementioned challenges. More specifically, MobileExperts dynamically assembles teams based on the alignment of agent portraits with the human requirements. Following this, each agent embarks on an independent exploration phase, formulating its tools to evolve into an expert. Lastly, we develop a dual-layer planning mechanism to establish coordinated collaboration among experts. To validate our effectiveness, we design a new benchmark of hierarchical intelligence levels, offering insights into an algorithm's capability to address tasks across a spectrum of complexity. Experimental results demonstrate that MobileExperts performs better on all intelligence levels and achieves a ~22% reduction in reasoning costs, thus verifying the superiority of our design.
Submitted 4 July, 2024;
originally announced July 2024.
-
Detection and Multi-Parameter Estimation for NLOS Targets: An IRS-assisted Framework
Authors:
Zhouyuan Yu,
Xiaoling Hu,
Chenxi Liu,
Qin Tao,
Mugen Peng
Abstract:
Intelligent reflecting surface (IRS) has the potential to enhance sensing performance, due to its capability of reshaping the echo signals. Different from the existing literature, which has commonly focused on IRS beamforming optimization, in this paper, we pay special attention to designing effective signal processing approaches to extract sensing information from IRS-reshaped echo signals. To this end, we investigate an IRS-assisted non-line-of-sight (NLOS) target detection and multi-parameter estimation problem in orthogonal frequency division multiplexing (OFDM) systems. To address this problem, we first propose a novel detection and direction estimation framework, including a low-overhead hierarchical codebook that allows the IRS to generate three-dimensional beams with adjustable beam direction and width, a delay spectrum peak-based beam training scheme for detection and direction estimation, and a beam refinement scheme for further enhancing the accuracy of the direction estimation. Then, we propose a target range and velocity estimation scheme by extracting the delay-Doppler information from the IRS-reshaped echo signals. Numerical results demonstrate that the proposed schemes can achieve a 99.7% target detection rate, a $10^{-3}$-rad level direction estimation accuracy, and a $10^{-6}$-m/$10^{-5}$-m/s level range/velocity estimation accuracy.
Submitted 4 July, 2024;
originally announced July 2024.
-
A new subclass of gamma-ray burst originating from compact binary merger
Authors:
Chen-Wei Wang,
Wen-Jun Tan,
Shao-Lin Xiong,
Shu-Xu Yi,
Rahim Moradi,
Bing Li,
Zhen Zhang,
Yu Wang,
Yan-Zhi Meng,
Jia-Cong Liu,
Yue Wang,
Sheng-Lun Xie,
Wang-Chen Xue,
Zheng-Hang Yu,
Peng Zhang,
Wen-Long Zhang,
Yan-Qiu Zhang,
Chao Zheng
Abstract:
Type I gamma-ray bursts (GRBs) are believed to originate from compact binary mergers, usually with a duration of less than 2 seconds for the main emission. However, recent observations of GRB 211211A and GRB 230307A indicate that some merger-origin GRBs could last much longer. Since they show strikingly similar properties (indicating a common mechanism) which are different from the classic "long"-short burst (e.g. GRB 060614), forming an interesting subclass of type I GRBs, we suggest naming them type IL GRBs. By identifying the first peak of GRB 230307A as a quasi-thermal precursor, we find that the prompt emission of type IL GRBs is composed of three episodes: (1) a precursor followed by a short quiescent (or weak emission) period, (2) a long-duration main emission, and (3) an extended emission. With this burst pattern, a good candidate, GRB 170228A, was found in the Fermi/GBM archive data, and subsequent temporal and spectral analyses indeed show that GRB 170228A falls in the same cluster as GRB 211211A and GRB 230307A in many diagnostic figures. Thus this burst pattern could be a good reference for rapidly identifying type IL GRBs and conducting low-latency follow-up observations. We estimated the occurrence rate and discussed the physical origins and implications for the three emission episodes of type IL GRBs. Our analysis suggests that the pre-merger precursor model, especially the super flare model, is more favored for type IL GRBs.
Submitted 2 July, 2024;
originally announced July 2024.
-
Chemical Shift Encoding based Double Bonds Quantification in Triglycerides using Deep Image Prior
Authors:
Chaoxing Huang,
Ziqiang Yu,
Zijian Gao,
Qiuyi Shen,
Queenie Chan,
Vincent Wai-Sun Wong,
Winnie Chiu-Wing Chu,
Weitian Chen
Abstract:
This study evaluated a deep learning-based method using Deep Image Prior (DIP) to quantify triglyceride double bonds from chemical-shift encoded multi-echo gradient echo images without network training. We employed a cost function based on signal constraints to iteratively update the neural network on a single dataset. The method was validated using phantom experiments and in vivo scans. Results showed close alignment between measured and reference double bond values, with phantom experiments yielding a Pearson correlation coefficient of 0.96 (p = .0005). In vivo results demonstrated good agreement in subcutaneous fat. We conclude that Deep Image Prior shows feasibility for quantifying double bonds and fatty acid content from chemical-shift encoded multi-echo MRI.
Submitted 3 July, 2024; v1 submitted 1 July, 2024;
originally announced July 2024.
-
MG-Verilog: Multi-grained Dataset Towards Enhanced LLM-assisted Verilog Generation
Authors:
Yongan Zhang,
Zhongzhi Yu,
Yonggan Fu,
Cheng Wan,
Yingyan Celine Lin
Abstract:
Large Language Models (LLMs) have recently shown promise in streamlining hardware design processes by encapsulating vast amounts of domain-specific data. In addition, they allow users to interact with the design processes through natural language instructions, thus making hardware design more accessible to developers. However, effectively leveraging LLMs in hardware design necessitates providing domain-specific data during inference (e.g., through in-context learning), fine-tuning, or pre-training. Unfortunately, existing publicly available hardware datasets are often limited in size, complexity, or detail, which hinders the effectiveness of LLMs in hardware design tasks. To address this issue, we first propose a set of criteria for creating high-quality hardware datasets that can effectively enhance LLM-assisted hardware design. Based on these criteria, we propose a Multi-Grained-Verilog (MG-Verilog) dataset, which encompasses descriptions at various levels of detail and corresponding code samples. To benefit the broader hardware design community, we have developed an open-source infrastructure that facilitates easy access, integration, and extension of the dataset to meet specific project needs. Furthermore, to fully exploit the potential of the MG-Verilog dataset, which varies in complexity and detail, we introduce a balanced fine-tuning scheme. This scheme serves as a unique use case to leverage the diverse levels of detail provided by the dataset. Extensive experiments demonstrate that the proposed dataset and fine-tuning scheme consistently improve the performance of LLMs in hardware design tasks.
Submitted 3 July, 2024; v1 submitted 1 July, 2024;
originally announced July 2024.
-
Reconfigurable Intelligent Computational Surfaces for MEC-Assisted Autonomous Driving Networks: Design Optimization and Analysis
Authors:
Xueyao Zhang,
Bo Yang,
Zhiwen Yu,
Xuelin Cao,
George C. Alexandropoulos,
Yan Zhang,
Merouane Debbah,
Chau Yuen
Abstract:
This paper investigates autonomous driving safety improvement via task offloading from cellular vehicles (CVs) to a multi-access edge computing (MEC) server using vehicle-to-infrastructure (V2I) links. Considering that the latter links can be reused by vehicle-to-vehicle (V2V) communications to improve spectrum utilization, the receiver of the V2I link may suffer from severe interference that can cause outages during the task offloading. To tackle this issue, we propose the deployment of a reconfigurable intelligent computational surface (RICS) whose computationally capable metamaterials are leveraged to jointly enable V2I reflective links as well as to implement interference cancellation at the V2V links. We devise a joint optimization formulation for the task offloading ratio between the CVs and the MEC server, the spectrum sharing strategy between V2V and V2I communications, as well as the RICS reflection and refraction matrices to maximize an autonomous driving safety task. Due to the non-convexity of the problem and the coupling among its free variables, we transform it into a more tractable equivalent form, which is then decomposed into three sub-problems solved via an alternate approximation method. Our simulation results showcase that the proposed RICS-assisted offloading framework significantly improves the safety of the considered autonomous driving network, yielding a nearly 34% improvement in the safety coefficient of the CVs. In addition, it is demonstrated that the V2V data rate can be improved by around 60%, indicating that the RICS-induced adjustment of the signals can effectively mitigate interference at the V2V link.
Submitted 30 June, 2024;
originally announced July 2024.
-
AdaBridge: Dynamic Data and Computation Reuse for Efficient Multi-task DNN Co-evolution in Edge Systems
Authors:
Lehao Wang,
Zhiwen Yu,
Sicong Liu,
Chenshu Wu,
Xiangrui Xu,
Bin Guo
Abstract:
Running multi-task DNNs on mobile devices is an emerging trend for various applications like autonomous driving and mobile NLP. Mobile DNNs are often compressed to fit the limited resources and thus suffer from degraded accuracy and generalizability due to data drift. DNN evolution, e.g., continuous learning and domain adaptation, has been demonstrated to be effective in overcoming these issues, mostly for single-task DNNs, leaving multi-task DNN evolution an important yet open challenge. To fill this gap, we propose AdaBridge, which exploits computational redundancies in multi-task DNNs as a unique opportunity for dynamic data and computation reuse, thereby improving training efficacy and resource efficiency among asynchronous multi-task co-evolution in edge systems. Experimental evaluation shows that AdaBridge achieves an 11% average accuracy gain over individual evolution baselines.
Submitted 2 May, 2024;
originally announced July 2024.
-
GM-DF: Generalized Multi-Scenario Deepfake Detection
Authors:
Yingxin Lai,
Zitong Yu,
Jing Yang,
Bin Li,
Xiangui Kang,
Linlin Shen
Abstract:
Existing face forgery detection usually follows the paradigm of training models in a single domain, which leads to limited generalization capacity when unseen scenarios and unknown attacks occur. In this paper, we elaborately investigate the generalization capacity of deepfake detection models when jointly trained on multiple face forgery detection datasets. We first find a rapid degradation of detection accuracy when models are directly trained on combined datasets due to the discrepancy across collection scenarios and generation methods. To address the above issue, a Generalized Multi-Scenario Deepfake Detection framework (GM-DF) is proposed to serve multiple real-world scenarios by a unified model. First, we propose a hybrid expert modeling approach for domain-specific real/forgery feature extraction. Besides, as for the commonality representation, we use CLIP to extract the common features for better aligning visual and textual features across domains. Meanwhile, we introduce a masked image reconstruction mechanism to force models to capture rich forged details. Finally, we supervise the models via a domain-aware meta-learning strategy to further enhance their generalization capacities. Specifically, we design a novel domain alignment loss to strongly align the distributions of the meta-test domains and meta-train domains. Thus, the updated models are able to represent both specific and common real/forgery features across multiple datasets. In consideration of the lack of study of multi-dataset training, we establish a new benchmark leveraging multi-source data to fairly evaluate the models' generalization capacity on unseen scenarios. Both qualitative and quantitative experiments on five datasets conducted on traditional protocols as well as the proposed benchmark demonstrate the effectiveness of our approach.
Submitted 28 June, 2024;
originally announced June 2024.
-
DECOR: Improving Coherence in L2 English Writing with a Novel Benchmark for Incoherence Detection, Reasoning, and Rewriting
Authors:
Xuanming Zhang,
Anthony Diaz,
Zixun Chen,
Qingyang Wu,
Kun Qian,
Erik Voss,
Zhou Yu
Abstract:
Coherence in writing, an aspect that second-language (L2) English learners often struggle with, is crucial in assessing L2 English writing. Existing automated writing evaluation systems primarily use basic surface linguistic features to detect coherence in writing. However, little effort has been made to correct the detected incoherence, which could significantly benefit L2 language learners seeking to improve their writing. To bridge this gap, we introduce DECOR, a novel benchmark that includes expert annotations for detecting incoherence in L2 English writing, identifying the underlying reasons, and rewriting the incoherent sentences. To our knowledge, DECOR is the first coherence assessment dataset specifically designed for improving L2 English writing, featuring pairs of original incoherent sentences alongside their expert-rewritten counterparts. Additionally, we fine-tuned models to automatically detect and rewrite incoherence in student essays. We find that incorporating specific reasons for incoherence during fine-tuning consistently improves the quality of the rewrites, achieving a result that is favored in both automatic and human evaluations.
Submitted 28 June, 2024;
originally announced June 2024.
-
Estimating Long-term Heterogeneous Dose-response Curve: Generalization Bound Leveraging Optimal Transport Weights
Authors:
Zeqin Yang,
Weilin Chen,
Ruichu Cai,
Yuguang Yan,
Zhifeng Hao,
Zhipeng Yu,
Zhichao Zou,
Zhen Peng,
Jiecheng Guo
Abstract:
Long-term causal effect estimation is a significant but challenging problem in many applications. Existing methods rely on ideal assumptions to estimate long-term average effects, e.g., no unobserved confounders or a binary treatment, while in numerous real-world applications, these assumptions could be violated and average effects are unable to provide individual-level suggestions. In this paper, we address a more general problem of estimating the long-term heterogeneous dose-response curve (HDRC) while accounting for unobserved confounders. Specifically, to remove unobserved confounding in observational data, we introduce an optimal transport weighting framework to align the observational data to the experimental data with theoretical guarantees. Furthermore, to accurately predict the heterogeneous effects of continuous treatment, we establish a generalization bound on counterfactual prediction error by leveraging the reweighted distribution induced by optimal transport. Finally, we develop an HDRC estimator building upon the above theoretical foundations. Extensive experimental studies conducted on multiple synthetic and semi-synthetic datasets demonstrate the effectiveness of our proposed method.
Submitted 27 June, 2024;
originally announced June 2024.
-
Application of Multimodal Fusion Deep Learning Model in Disease Recognition
Authors:
Xiaoyi Liu,
Hongjie Qiu,
Muqing Li,
Zhou Yu,
Yutian Yang,
Yafeng Yan
Abstract:
This paper introduces an innovative multi-modal fusion deep learning approach to overcome the drawbacks of traditional single-modal recognition techniques. These drawbacks include incomplete information and limited diagnostic accuracy. During the feature extraction stage, cutting-edge deep learning models including convolutional neural networks (CNN), recurrent neural networks (RNN), and transformers are applied to distill advanced features from image-based, temporal, and structured data sources. The fusion strategy component seeks to determine the optimal fusion mode tailored to the specific disease recognition task. In the experimental section, a comparison is made between the performance of the proposed multi-mode fusion model and existing single-mode recognition methods. The findings demonstrate significant advantages of the multimodal fusion model across multiple evaluation metrics.
Submitted 22 May, 2024;
originally announced June 2024.
-
Multilingual Knowledge Graph Completion from Pretrained Language Models with Knowledge Constraints
Authors:
Ran Song,
Shizhu He,
Shengxiang Gao,
Li Cai,
Kang Liu,
Zhengtao Yu,
Jun Zhao
Abstract:
Multilingual Knowledge Graph Completion (mKGC) aims at solving queries like (h, r, ?) in different languages by reasoning a tail entity t, thus improving multilingual knowledge graphs. Previous studies leverage multilingual pretrained language models (PLMs) and the generative paradigm to achieve mKGC. Although multilingual pretrained language models contain extensive knowledge of different languages, their pretraining tasks cannot be directly aligned with the mKGC tasks. Moreover, the majority of KGs and PLMs currently available exhibit a pronounced English-centric bias. This makes it difficult for mKGC to achieve good results, particularly in the context of low-resource languages. To overcome these problems, this paper introduces global and local knowledge constraints for mKGC. The former is used to constrain the reasoning of answer entities, while the latter is used to enhance the representation of query contexts. The proposed method makes the pretrained model better adapt to the mKGC task. Experimental results on public datasets demonstrate that our method outperforms the previous SOTA on Hits@1 and Hits@10 by an average of 12.32% and 16.03%, respectively, which indicates that our proposed method significantly improves mKGC.
Submitted 26 June, 2024;
originally announced June 2024.
-
Filtering Reconfigurable Intelligent Computational Surface for RF Spectrum Purification
Authors:
Kaining Wang,
Bo Yang,
Zhiwen Yu,
Xuelin Cao,
Mérouane Debbah,
Chau Yuen
Abstract:
The increasing demand for communication is degrading the electromagnetic (EM) transmission environment due to severe EM interference, significantly reducing the efficiency of the radio frequency (RF) spectrum. Metasurfaces, a promising technology for controlling desired EM waves, have recently received significant attention from both academia and industry. However, the potential impact of out-of-band signals has been largely overlooked, leading to RF spectrum pollution and degradation of wireless transmissions. To address this issue, we propose a novel surface structure called the Filtering Reconfigurable Intelligent Computational Surface (FRICS). We introduce two types of FRICS structures: one that dynamically reflects resonance band signals through a tunable spatial filter while absorbing out-of-band signals using metamaterials and the other one that dynamically amplifies in-band signals using computational metamaterials while reflecting out-of-band signals. To evaluate the performance of FRICS, we implement it in device-to-device (D2D) communication and vehicular-to-everything (V2X) scenarios. The experiments demonstrate the superiority of FRICS in signal-to-interference-noise ratio (SINR) and energy efficiency (EE). Finally, we discuss the critical challenges faced and promising techniques for implementing FRICS in future wireless systems.
Submitted 26 June, 2024;
originally announced June 2024.
-
DICE: End-to-end Deformation Capture of Hand-Face Interactions from a Single Image
Authors:
Qingxuan Wu,
Zhiyang Dou,
Sirui Xu,
Soshi Shimada,
Chen Wang,
Zhengming Yu,
Yuan Liu,
Cheng Lin,
Zeyu Cao,
Taku Komura,
Vladislav Golyanik,
Christian Theobalt,
Wenping Wang,
Lingjie Liu
Abstract:
Reconstructing 3D hand-face interactions with deformations from a single image is a challenging yet crucial task with broad applications in AR, VR, and gaming. The challenges stem from self-occlusions during single-view hand-face interactions, diverse spatial relationships between hands and face, complex deformations, and the ambiguity of the single-view setting. The first and only method for hand-face interaction recovery, Decaf, introduces a global fitting optimization guided by contact and deformation estimation networks trained on studio-collected data with 3D annotations. However, Decaf suffers from a time-consuming optimization process and limited generalization capability due to its reliance on 3D annotations of hand-face interaction data. To address these issues, we present DICE, the first end-to-end method for Deformation-aware hand-face Interaction reCovEry from a single image. DICE estimates the poses of hands and faces, contacts, and deformations simultaneously using a Transformer-based architecture. It features disentangling the regression of local deformation fields and global mesh vertex locations into two network branches, enhancing deformation and contact estimation for precise and robust hand-face mesh recovery. To improve generalizability, we propose a weakly-supervised training approach that augments the training set using in-the-wild images without 3D ground-truth annotations, employing the depths of 2D keypoints estimated by off-the-shelf models and adversarial priors of poses for supervision. Our experiments demonstrate that DICE achieves state-of-the-art performance on a standard benchmark and in-the-wild data in terms of accuracy and physical plausibility. Additionally, our method operates at an interactive rate (20 fps) on an Nvidia 4090 GPU, whereas Decaf requires more than 15 seconds for a single image. Our code will be publicly available upon publication.
Submitted 25 June, 2024;
originally announced June 2024.
-
EDEN: Empathetic Dialogues for English learning
Authors:
Li Siyan,
Teresa Shao,
Zhou Yu,
Julia Hirschberg
Abstract:
Dialogue systems have been used as conversation partners in English learning, but few have studied whether these systems improve learning outcomes. Student passion and perseverance, or grit, has been associated with language learning success. Recent work establishes that as students perceive their English teachers to be more supportive, their grit improves. Hypothesizing that the same pattern applies to English-teaching chatbots, we create EDEN, a robust open-domain chatbot for spoken conversation practice that provides empathetic feedback. To construct EDEN, we first train a specialized spoken utterance grammar correction model and a high-quality social chit-chat conversation model. We then conduct a preliminary user study with a variety of strategies for empathetic feedback. Our experiment suggests that using adaptive empathetic feedback leads to higher perceived affective support, which, in turn, predicts increased student grit.
Submitted 25 June, 2024;
originally announced June 2024.
-
VarBench: Robust Language Model Benchmarking Through Dynamic Variable Perturbation
Authors:
Kun Qian,
Shunji Wan,
Claudia Tang,
Youzhi Wang,
Xuanming Zhang,
Maximillian Chen,
Zhou Yu
Abstract:
As large language models achieve impressive scores on traditional benchmarks, an increasing number of researchers are becoming concerned about benchmark data leakage during pre-training, commonly known as the data contamination problem. To ensure fair evaluation, recent benchmarks release only the training and validation sets, keeping the test set labels closed-source. They require anyone wishing to evaluate their language model to submit the model's predictions for centralized processing and then publish the model's result on their leaderboard. However, this submission process is inefficient and prevents effective error analysis. To address this issue, we propose to variabilize benchmarks and evaluate language models dynamically. Specifically, we extract variables from each test case and define a value range for each variable. For each evaluation, we sample new values from these value ranges to create unique test cases, thus ensuring a fresh evaluation each time. We applied this variable perturbation method to four datasets: GSM8K, ARC, CommonsenseQA, and TruthfulQA, which cover mathematical generation and multiple-choice tasks. Our experimental results demonstrate that this approach provides a more accurate assessment of the true capabilities of language models, effectively mitigating the contamination problem.
Submitted 26 June, 2024; v1 submitted 25 June, 2024;
originally announced June 2024.
-
NARRepair: Non-Autoregressive Code Generation Model for Automatic Program Repair
Authors:
Zhenyu Yang,
Zhen Yang,
Zhongxing Yu
Abstract:
With the advancement of deep learning techniques, the performance of Automatic Program Repair (APR) techniques has reached a new level. Previous deep learning-based APR techniques essentially modified program sentences in the Autoregressive (AR) manner, which predicts future values based on past values. Due to the manner of word-by-word generation, the AR-based APR technique has a huge time delay. This negative consequence hinders the widespread adoption of APR techniques in real-life software development.
To address the issue, we aim to apply the Non-Autoregressive (NAR) method to the APR task, which can output target code in a parallel manner to avoid huge inference delays. To effectively adapt the NAR manner for the APR task, in this paper we propose NARRepair, the first customized NAR code generation model for the APR task. NARRepair features three major novelties: 1) using repair actions to alleviate the over-correction issue, 2) extracting dependency information from the AST to alleviate the issue of lacking inter-word dependency information, and 3) employing two-stage decoding to alleviate the issue of lacking contextual information. We evaluated NARRepair on three widely used datasets in the APR community, and the results show that our technique can significantly improve the inference speed while maintaining high repair accuracy.
Submitted 24 June, 2024;
originally announced June 2024.
-
Wound Tissue Segmentation in Diabetic Foot Ulcer Images Using Deep Learning: A Pilot Study
Authors:
Mrinal Kanti Dhar,
Chuanbo Wang,
Yash Patel,
Taiyu Zhang,
Jeffrey Niezgoda,
Sandeep Gopalakrishnan,
Keke Chen,
Zeyun Yu
Abstract:
Identifying individual tissues, so-called tissue segmentation, in diabetic foot ulcer (DFU) images is a challenging task and little work has been published, largely due to the limited availability of a clinical image dataset. To address this gap, we have created a DFUTissue dataset for the research community to evaluate wound tissue segmentation algorithms. The dataset contains 110 images with tissues labeled by wound experts and 600 unlabeled images. Additionally, we conducted a pilot study on segmenting wound characteristics including fibrin, granulation, and callus using deep learning. Due to the limited amount of annotated data, our framework consists of both supervised learning (SL) and semi-supervised learning (SSL) phases. In the SL phase, we propose a hybrid model featuring a Mix Transformer (MiT-b3) in the encoder and a CNN in the decoder, enhanced by the integration of a parallel spatial and channel squeeze-and-excitation (P-scSE) module known for its efficacy in improving boundary accuracy. The SSL phase employs a pseudo-labeling-based approach, iteratively identifying and incorporating valuable unlabeled images to enhance overall segmentation performance. Comparative evaluations with state-of-the-art methods are conducted for both SL and SSL phases. The SL achieves a Dice Similarity Coefficient (DSC) of 84.89%, which has been improved to 87.64% in the SSL phase. Furthermore, the results are benchmarked against two widely used SSL approaches: Generative Adversarial Networks and Cross-Consistency Training. Additionally, our hybrid model outperforms the state-of-the-art methods with a 92.99% DSC in performing binary segmentation of DFU wound areas when tested on the Chronic Wound dataset. Codes and data are available at https://github.com/uwm-bigdata/DFUTissueSegNet.
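The Dice Similarity Coefficient (DSC) quoted in the abstract above is the standard overlap measure 2|P ∩ T| / (|P| + |T|) between a predicted mask P and a ground-truth mask T. The following is a minimal sketch of that formula, not code from the authors' repository; the function name, the flat 0/1 list representation, and the choice to score two empty masks as 1.0 are assumptions made for illustration.

```python
def dice_coefficient(pred, target):
    """Dice Similarity Coefficient between two binary masks.

    Both masks are flat sequences of 0/1 values of equal length
    (an illustrative representation, not the dataset's format).
    """
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels labelled foreground in both masks.
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    # Total foreground pixels across the two masks.
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy masks: 2 overlapping foreground pixels, 3 foreground pixels in each mask.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
print(dice_coefficient(pred, target))
```

A reported DSC of 84.89% corresponds to a return value of about 0.8489 averaged over the evaluation images; how the averaging is done is not stated in the abstract.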
Submitted 23 June, 2024;
originally announced June 2024.
-
Federated Adversarial Learning for Robust Autonomous Landing Runway Detection
Authors:
Yi Li,
Plamen Angelov,
Zhengxin Yu,
Alvaro Lopez Pellicer,
Neeraj Suri
Abstract:
As the development of deep learning techniques in autonomous landing systems continues to grow, one of the major challenges is trust and security in the face of possible adversarial attacks. In this paper, we propose a federated adversarial learning-based framework to detect landing runways using paired data comprising clean local data and its adversarial version. First, the local model is pre-trained on a large-scale lane detection dataset. Then, instead of exploiting large instance-adaptive models, we resort to a parameter-efficient fine-tuning method known as scale and shift deep features (SSF), applied to the pre-trained model. Second, in each SSF layer, the distributions of clean local data and its adversarial version are disentangled for accurate statistics estimation. To the best of our knowledge, this is the first federated learning work that addresses the adversarial sample problem in landing runway detection. Our experimental evaluations on both synthetic and real images of the Landing Approach Runway Detection (LARD) dataset consistently demonstrate that the proposed federated adversarial learning performs well and is robust to adversarial attacks.
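Scale and shift deep features (SSF), as generally described in the literature, fine-tunes a frozen backbone by learning only a per-channel scale and shift applied to intermediate features, y = gamma * x + beta. The sketch below illustrates that affine step, together with one possible reading of "disentangled" statistics as separate parameter sets for clean and adversarial inputs. It is not the authors' implementation: the names `scale_shift`, `ssf_layer`, and `PARAMS`, the branch labels, and all numeric values are hypothetical.

```python
def scale_shift(features, gamma, beta):
    """Apply the per-channel affine transform y = gamma * x + beta.

    features: list of samples, each a list with one value per channel.
    gamma, beta: per-channel scale and shift (the only trainable parts in SSF).
    """
    if len(gamma) != len(beta):
        raise ValueError("gamma and beta must have the same length")
    out = []
    for sample in features:
        if len(sample) != len(gamma):
            raise ValueError("sample has the wrong number of channels")
        out.append([g * x + b for x, g, b in zip(sample, gamma, beta)])
    return out


# Hypothetical: one (gamma, beta) pair per input type, so clean and
# adversarial features are modulated separately. Values are made up.
PARAMS = {
    "clean": {"gamma": [1.0, 0.5], "beta": [0.0, 1.0]},
    "adversarial": {"gamma": [0.8, 0.5], "beta": [0.1, 1.0]},
}


def ssf_layer(features, branch):
    """Route features through the parameter set of the given branch."""
    p = PARAMS[branch]
    return scale_shift(features, p["gamma"], p["beta"])


print(ssf_layer([[2.0, 4.0]], "clean"))
```

In an actual training loop the scale and shift vectors would be learned by gradient descent while the pre-trained weights stay frozen; that part is omitted here.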
Submitted 22 June, 2024;
originally announced June 2024.
-
Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration
Authors:
Zhongzhi Yu,
Zheng Wang,
Yonggan Fu,
Huihong Shi,
Khalid Shaikh,
Yingyan Celine Lin
Abstract:
Attention is a fundamental component behind the remarkable achievements of large language models (LLMs). However, our current understanding of the attention mechanism, especially regarding how attention distributions are established, remains limited. Inspired by recent studies that explore the presence of attention sink in the initial token, which receives disproportionately large attention scores despite their lack of semantic importance, this work delves deeper into this phenomenon. We aim to provide a more profound understanding of the existence of attention sinks within LLMs and to uncover ways to enhance the achievable accuracy of LLMs by directly optimizing the attention distributions, without the need for weight finetuning. Specifically, this work begins with comprehensive visualizations of the attention distributions in LLMs during inference across various inputs and tasks. Based on these visualizations, to the best of our knowledge, we are the first to discover that (1) attention sinks occur not only at the start of sequences but also within later tokens of the input, and (2) not all attention sinks have a positive impact on the achievable accuracy of LLMs. Building upon our findings, we propose a training-free Attention Calibration Technique (ACT) that automatically optimizes the attention distributions on the fly during inference in an input-adaptive manner. Extensive experiments validate that ACT consistently enhances the accuracy of various LLMs across different applications. Specifically, ACT achieves an average improvement of up to 7.30% in accuracy across different datasets when applied to Llama-30B. Our code is available at https://github.com/GATECH-EIC/ACT.
Submitted 22 June, 2024;
originally announced June 2024.
-
EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive Layer Tuning and Voting
Authors:
Zhongzhi Yu,
Zheng Wang,
Yuhan Li,
Haoran You,
Ruijie Gao,
Xiaoya Zhou,
Sreenidhi Reedy Bommu,
Yang Katie Zhao,
Yingyan Celine Lin
Abstract:
Efficient adaptation of large language models (LLMs) on edge devices is essential for applications requiring continuous and privacy-preserving adaptation and inference. However, existing tuning techniques fall short because of their high computation and memory overheads. To this end, we introduce a computation- and memory-efficient LLM tuning framework, called Edge-LLM, to facilitate affordable and effective LLM adaptation on edge devices. Specifically, Edge-LLM features three core components: (1) a layer-wise unified compression (LUC) technique to reduce the computation overhead by generating layer-wise pruning sparsity and quantization bit-width policies, (2) an adaptive layer tuning and voting scheme to reduce the memory overhead by reducing the backpropagation depth, and (3) a complementary hardware scheduling strategy to handle the irregular computation patterns introduced by LUC and adaptive layer tuning, thereby achieving efficient computation and data movement. Extensive experiments demonstrate that Edge-LLM achieves a 2.92x speedup and a 4x reduction in memory overhead compared to vanilla tuning methods with comparable task accuracy. Our code is available at https://github.com/GATECH-EIC/Edge-LLM
Submitted 22 June, 2024;
originally announced June 2024.
-
The FRB-searching pipeline of the Tianlai Cylinder Pathfinder Array
Authors:
Zijie Yu,
Furen Deng,
Shijie Sun,
Chenhui Niu,
Jixia Li,
Fengquan Wu,
Wei-Yang Wang,
Yougang Wang,
Shifan Zuo,
Lin Shu,
Jie Hao,
Xiaohui Liu,
Reza Ansari,
Ue-Li Pen,
Albert Stebbins,
Peter Timbie,
Xuelei Chen
Abstract:
This paper presents the design, calibration, and survey strategy of the Fast Radio Burst (FRB) digital backend and its real-time data processing pipeline employed in the Tianlai Cylinder Pathfinder array. The array, consisting of three parallel cylindrical reflectors and equipped with 96 dual-polarization feeds, is a radio interferometer array designed for conducting drift scans of the northern celestial hemisphere. The FRB digital backend enables the formation of 96 digital beams, effectively covering an area of approximately 40 square degrees with 3 dB beams. Our pipeline can search for FRBs automatically, detecting them in quasi-real time and classifying FRB candidates automatically. The current FRB-searching pipeline has an overall recall rate of 88%. During the commissioning phase, we successfully detected signals emitted by four well-known pulsars: PSR B0329+54, B2021+51, B0823+26, and B2020+28. We report the first discovery of an FRB by our array, designated as FRB 20220414A. We also use numerical simulations to investigate the optimal arrangement of the digitally formed beams for achieving the maximum detection rate.
Submitted 22 June, 2024;
originally announced June 2024.
-
TinyStyler: Efficient Few-Shot Text Style Transfer with Authorship Embeddings
Authors:
Zachary Horvitz,
Ajay Patel,
Kanishk Singh,
Chris Callison-Burch,
Kathleen McKeown,
Zhou Yu
Abstract:
The goal of text style transfer is to transform the style of texts while preserving their original meaning, often with only a few examples of the target style. Existing style transfer methods generally rely on the few-shot capabilities of large language models or on complex controllable text generation approaches that are inefficient and underperform on fluency metrics. We introduce TinyStyler, a lightweight but effective approach, which leverages a small language model (800M params) and pre-trained authorship embeddings to perform efficient, few-shot text style transfer. We evaluate on the challenging task of authorship style transfer and find TinyStyler outperforms strong approaches such as GPT-4. We also evaluate TinyStyler's ability to perform text attribute style transfer (formal $\leftrightarrow$ informal) with automatic and human evaluations and find that the approach outperforms recent controllable text generation methods. Our model has been made publicly available at https://huggingface.co/tinystyler/tinystyler .
Submitted 21 June, 2024;
originally announced June 2024.
-
Leptogenesis assisted by scalar decays
Authors:
Jun-Yu Tong,
Zhao-Huan Yu,
Hong-Hao Zhang
Abstract:
We present a pragmatic approach to lowering the mass scale of right-handed neutrinos in leptogenesis by introducing a scalar that decays to right-handed neutrinos. The key point of our proposal is that the out-of-equilibrium decays of the scalar provide an additional source of right-handed neutrinos and hence of the lepton asymmetry. This mechanism works well at low temperatures, when the washout of the generated lepton asymmetry is suppressed. Thus, the lepton asymmetry can be effectively produced regardless of whether the washout effect is strong. Through a comprehensive analysis, we demonstrate that such scalar-assisted leptogenesis can typically decrease the viable right-handed neutrino mass scale by two to four orders of magnitude.
Submitted 27 June, 2024; v1 submitted 19 June, 2024;
originally announced June 2024.
-
AI-Empowered Multiple Access for 6G: A Survey of Spectrum Sensing, Protocol Designs, and Optimizations
Authors:
Xuelin Cao,
Bo Yang,
Kaining Wang,
Xinghua Li,
Zhiwen Yu,
Chau Yuen,
Yan Zhang,
Zhu Han
Abstract:
With the rapidly increasing number of bandwidth-intensive terminals capable of intelligent computing and communication, such as smart devices equipped with shallow neural network models, the complexity of multiple access for these intelligent terminals is increasing due to the dynamic network environment and ubiquitous connectivity in 6G systems. Traditional multiple access (MA) design and optimization methods are gradually losing ground to artificial intelligence (AI) techniques that have proven their superiority in handling complexity. AI-empowered MA and its optimization strategies aimed at achieving high Quality-of-Service (QoS) are attracting more attention, especially in the area of latency-sensitive applications in 6G systems. In this work, we aim to: 1) present the development and comparative evaluation of AI-enabled MA; 2) provide a timely survey focusing on spectrum sensing, protocol design, and optimization for AI-empowered MA; and 3) explore the potential use cases of AI-empowered MA in the typical application scenarios within 6G systems. Specifically, we first present a unified framework of AI-empowered MA for 6G systems by incorporating various promising machine learning techniques in spectrum sensing, resource allocation, MA protocol design, and optimization. We then introduce AI-empowered MA spectrum sensing related to spectrum sharing and spectrum interference management. Next, we discuss the AI-empowered MA protocol designs and implementation methods by reviewing and comparing the state-of-the-art, and we further explore the optimization algorithms related to dynamic resource management, parameter adjustment, and access scheme switching. Finally, we discuss the current challenges, point out open issues, and outline potential future research directions in this field.
Submitted 19 June, 2024;
originally announced June 2024.
-
Interpretable modulated differentiable STFT and physics-informed balanced spectrum metric for freight train wheelset bearing cross-machine transfer fault diagnosis under speed fluctuations
Authors:
Chao He,
Hongmei Shi,
Ruixin Li,
Jianbo Li,
ZuJun Yu
Abstract:
The service condition of wheelset bearings, which are key components, has a direct impact on the safe operation of heavy-haul railway freight trains. However, train speed fluctuations and the scarcity of fault samples are the two main problems that limit the accuracy of bearing fault diagnosis. Therefore, a cross-machine transfer diagnosis network (pyDSN), coupled with an interpretable modulated differentiable short-time Fourier transform (STFT) and a physics-informed balanced spectrum quality metric, is proposed to learn domain-invariant and discriminative features under time-varying speeds. First, because fixed windows are insufficient for extracting the frequency components of time-varying speed signals, a modulated differentiable STFT (MDSTFT), which is interpretable with STFT-informed theoretical support, is proposed to extract a robust time-frequency spectrum (TFS). During training, multiple windows with different lengths change dynamically. In addition to the classification metric and the domain discrepancy metric, we introduce a third kind of metric, referred to as the physics-informed metric, to enhance the transferable TFS. A physics-informed balanced spectrum quality (BSQ) regularization loss is devised to guide the optimization direction for the MDSTFT and the model. With it, the model not only acquires a high-quality TFS but also becomes a physics-restricted domain adaptation network that learns real-world physics knowledge, ultimately diminishing the domain discrepancy across different datasets. The experiment is conducted in the scenario of transferring from laboratory datasets to a freight train dataset, and indicates that the hybrid-driven pyDSN outperforms existing methods and has practical value.
Submitted 16 June, 2024;
originally announced June 2024.
-
Planar Hall Plateau in Magnetic Weyl Semimetals
Authors:
Lei Li,
Chaoxi Cui,
Run-Wu Zhang,
Zhi-Ming Yu,
Yugui Yao
Abstract:
Despite the rapid progress in the study of planar Hall effect (PHE) in recent years, all the previous works only showed that the PHE is connected to local geometric quantities, such as Berry curvature. Here, for the first time, we point out that the PHE in magnetic Weyl semimetals is directly related to a global quantity, namely, the Chern number of the Weyl point. This leads to a remarkable consequence that the PHE observation predicted here is robust against many system details, including the Fermi energy. The main difference between non-magnetic and magnetic Weyl points is that the latter breaks time-reversal symmetry T, thus generally possessing an energy tilt. Via semiclassical Boltzmann theory, we investigate the PHE in generic magnetic Weyl models with energy tilt and arbitrary Chern number. We find that by aligning the magnetic and electric fields in the same direction, the trace of the PHE conductivity contributed from Berry curvature and orbital moment is proportional to the Chern number and the energy tilt of the Weyl points, resulting in previously undiscovered quantized PHE plateau by varying Fermi energy. We further confirm the existence of PHE plateaus in a more realistic lattice model without T symmetry. By proposing a new quantized physical quantity, our work not only provides a new tool for extracting the topological character of the Weyl points but also suggests that the interplay between topology and magnetism can give rise to intriguing physics.
Submitted 17 June, 2024;
originally announced June 2024.
-
Quantized Andreev conductance in semiconductor nanowires
Authors:
Yichun Gao,
Wenyu Song,
Yuhao Wang,
Zuhan Geng,
Zhan Cao,
Zehao Yu,
Shuai Yang,
Jiaye Xu,
Fangting Chen,
Zonglin Li,
Ruidong Li,
Lining Yang,
Zhaoyu Wang,
Shan Zhang,
Xiao Feng,
Tiantian Wang,
Yunyi Zang,
Lin Li,
Dong E. Liu,
Runan Shang,
Qi-Kun Xue,
Ke He,
Hao Zhang
Abstract:
Clean one-dimensional electron systems can exhibit quantized conductance. The plateau conductance doubles if the transport is dominated by Andreev reflection. Here, we report quantized conductance observed in both Andreev and normal-state transport in PbTe-Pb and PbTe-In hybrid nanowires. The Andreev plateau is observed at $4e^2/h$, twice the normal plateau value of $2e^2/h$. In comparison, Andreev conductance in the best-optimized III-V nanowires is non-quantized due to mode-mixing-induced dips (a disorder effect), despite the quantization of normal-state transport. The negligible mode mixing in PbTe hybrids indicates an unprecedented low-disorder transport regime for nanowire devices, which is beneficial for Majorana research.
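For orientation, the plateau values $2e^2/h$ and $4e^2/h$ quoted above can be converted to SI units with a few lines of arithmetic. The sketch below uses the exact SI defining values of the elementary charge and the Planck constant; the function name and the `andreev` flag are illustrative choices and do not come from the paper.

```python
E_CHARGE = 1.602176634e-19  # elementary charge in coulombs (exact SI value)
PLANCK_H = 6.62607015e-34   # Planck constant in joule seconds (exact SI value)


def plateau_conductance(andreev=False):
    """Return the first conductance plateau in siemens.

    Normal-state transport: 2 e^2 / h.
    Andreev-dominated transport: twice that value, 4 e^2 / h.
    """
    g0 = 2.0 * E_CHARGE ** 2 / PLANCK_H
    return 2.0 * g0 if andreev else g0


normal = plateau_conductance()
andreev = plateau_conductance(andreev=True)
print(f"normal plateau : {normal:.4e} S")
print(f"Andreev plateau: {andreev:.4e} S")
```

The normal plateau comes out near 7.75e-5 S, which corresponds to a resistance of roughly 12.9 kΩ; the Andreev plateau is exactly double.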
Submitted 17 June, 2024;
originally announced June 2024.
-
The structure of periodic point free distal homeomorphisms on the annulus
Authors:
Enhui Shi,
Hui Xu,
Ziqi YU
Abstract:
Let $A$ be an annulus in the plane $\mathbb R^2$ and $g:A\rightarrow A$ be a boundary components preserving homeomorphism which is distal and has no periodic points. Then there is a continuous decomposition of $A$ into $g$-invariant circles such that all the restrictions of $g$ on them share a common irrational rotation number and all these circles are linearly ordered by the inclusion relation on the sets of bounded components of their complements in $\mathbb R^2$.
Submitted 15 June, 2024;
originally announced June 2024.
-
Panoptic-FlashOcc: An Efficient Baseline to Marry Semantic Occupancy with Panoptic via Instance Center
Authors:
Zichen Yu,
Changyong Shu,
Qianpu Sun,
Junjie Linghu,
Xiaobao Wei,
Jiangyong Yu,
Zongdai Liu,
Dawei Yang,
Hui Li,
Yan Chen
Abstract:
Panoptic occupancy poses a novel challenge by aiming to integrate instance occupancy and semantic occupancy within a unified framework. However, there is still a lack of efficient solutions for panoptic occupancy. In this paper, we propose Panoptic-FlashOcc, a straightforward yet robust 2D feature framework that enables real-time panoptic occupancy. Building upon the lightweight design of FlashOcc, our approach simultaneously learns semantic occupancy and class-aware instance clustering in a single network. These outputs are jointly incorporated through panoptic occupancy procession to yield panoptic occupancy. This approach effectively addresses the drawbacks of high memory and computation requirements associated with three-dimensional voxel-level representations. With its straightforward and efficient design that facilitates easy deployment, Panoptic-FlashOcc demonstrates remarkable achievements in panoptic occupancy prediction. On the Occ3D-nuScenes benchmark, it achieves 38.5 RayIoU and 29.1 mIoU for semantic occupancy while operating at 43.9 FPS. Furthermore, it attains 16.0 RayPQ for panoptic occupancy at an inference speed of 30.2 FPS. These results surpass the performance of existing methodologies in terms of both speed and accuracy. The source code and trained models can be found in the following GitHub repository: https://github.com/Yzichen/FlashOCC.
Submitted 15 June, 2024;
originally announced June 2024.
-
Interstellar Nitrogen Isotope Ratios: Measurements on tracers of C$^{14}$N and C$^{15}$N
Authors:
J. L. Chen,
J. S. Zhang,
C. Henkel,
Y. T. Yan,
H. Z. Yu,
Y. X. Wang,
Y. P. Zou,
J. Y. Zhao,
X. Y. Wang
Abstract:
The nitrogen isotope ratio 14N/15N is a powerful tool to trace Galactic stellar nucleosynthesis and constrain Galactic chemical evolution. Previous observations have found lower 14N/15N ratios in the Galactic center and higher values in the Galactic disk. This is consistent with the inside-out formation scenario of our Milky Way. However, previous studies mostly utilized double isotope ratios also including 12C/13C, which introduces additional uncertainties. Here we therefore present observations of C14N and its rare isotopologue, C15N, toward a sample of star-forming regions, measured by the IRAM 30 m and/or the ARO 12 m telescope at a wavelength of $\lambda \sim 3$ mm. For those 35 sources detected in both isotopologues, physical parameters are determined. Furthermore, we have obtained nitrogen isotope ratios using the strongest hyperfine components of CN and C15N. For those sources showing small deviations from local thermodynamic equilibrium and/or self-absorption, the weakest hyperfine component, likely free of the latter effect, was used to obtain reliable 14N/15N values. Our measured 14N/15N isotope ratios from C14N and C15N measurements are compatible with those from our earlier measurements of NH3 and 15NH3 (Paper I), i.e., ratios increasing out to a Galactocentric distance of ~9 kpc. The unweighted second-order polynomial fit yields $\frac{{\rm C^{14}N}}{{\rm C^{15}N}} = (-4.85 \pm 1.89)\;{\rm kpc^{-2}} \times R_{\rm GC}^{2} + (82.11 \pm 31.93) \;{\rm kpc^{-1}} \times R_{\rm GC} - (28.12 \pm 126.62)$. Toward the outer Galaxy, the isotope ratio tends to decrease, supporting an earlier finding based on H13CN/HC15N. Galactic chemical evolution models are consistent with our measurements of the 14N/15N isotope ratio, i.e., a rising trend from the Galactic center region to approximately 9 kpc, followed by a decreasing trend with increasing $R_{\rm GC}$ toward the outer Galaxy.
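As a quick consistency check on the quoted second-order fit, the polynomial can be evaluated and its turning point located. The sketch below uses only the central values of the coefficients and ignores the stated uncertainties, which are large; the function and constant names are illustrative.

```python
# Central values of the quoted fit: ratio = A * R^2 + B * R + C, with R in kpc.
A = -4.85   # kpc^-2
B = 82.11   # kpc^-1
C = -28.12  # dimensionless


def cn_isotope_ratio(r_gc_kpc):
    """C14N/C15N ratio predicted by the fit at Galactocentric distance r_gc_kpc."""
    return A * r_gc_kpc ** 2 + B * r_gc_kpc + C


def peak_radius_kpc():
    """Radius where the fitted parabola peaks (derivative 2*A*R + B = 0)."""
    return -B / (2.0 * A)


r_peak = peak_radius_kpc()
print(f"fit peaks at R_GC = {r_peak:.2f} kpc")
print(f"ratio at the peak = {cn_isotope_ratio(r_peak):.1f}")
```

With these central values the parabola turns over near 8.5 kpc, in line with the abstract's statement that the ratio rises out to roughly 9 kpc and then declines.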
Submitted 13 June, 2024;
originally announced June 2024.
-
Q-Mamba: On First Exploration of Vision Mamba for Image Quality Assessment
Authors:
Fengbin Guan,
Xin Li,
Zihao Yu,
Yiting Lu,
Zhibo Chen
Abstract:
In this work, we take the first exploration of the recently popular foundation model, i.e., State Space Model/Mamba, in image quality assessment, aiming at observing and excavating the perception potential in vision Mamba. A series of works on Mamba has shown its significant potential in various fields, e.g., segmentation and classification. However, the perception capability of Mamba has been under-explored. Consequently, we propose Q-Mamba by revisiting and adapting the Mamba model for three crucial IQA tasks, i.e., task-specific, universal, and transferable IQA, which reveals that the Mamba model has obvious advantages compared with existing foundational models, e.g., Swin Transformer, ViT, and CNNs, in terms of perception and computational cost for IQA. To increase the transferability of Q-Mamba, we propose the StylePrompt tuning paradigm, where the basic lightweight mean and variance prompts are injected to assist the task-adaptive transfer learning of pre-trained Q-Mamba for different downstream IQA tasks. Compared with existing prompt tuning strategies, our proposed StylePrompt enables better perception transfer capability with less computational cost. Extensive experiments on multiple synthetic, authentic IQA datasets, and cross IQA datasets have demonstrated the effectiveness of our proposed Q-Mamba.
Submitted 13 June, 2024;
originally announced June 2024.
-
Environment-Aware Codebook Design for RIS-Assisted MU-MISO Communications: Implementation and Performance Analysis
Authors:
Zhiheng Yu,
Jiancheng An,
Ertugrul Basar,
Lu Gan,
Chau Yuen
Abstract:
Reconfigurable intelligent surface (RIS) provides a new electromagnetic response control solution, which can proactively reshape the characteristics of wireless channel environments. In RIS-assisted communication systems, the acquisition of channel state information (CSI) and the optimization of reflecting coefficients constitute major design challenges. To address these issues, codebook-based solutions have been developed recently, which, however, are mostly environment-agnostic. In this paper, a novel environment-aware codebook protocol is proposed, which can significantly reduce both pilot overhead and computational complexity, while maintaining expected communication performance. Specifically, first of all, a channel training framework is introduced to divide the training phase into several blocks. In each block, we directly estimate the composite end-to-end channel and focus only on the transmit beamforming. Second, we propose an environment-aware codebook generation scheme, which first generates a group of channels based on statistical CSI, and then obtains their corresponding RIS configuration by utilizing the alternating optimization (AO) method offline. In each online training block, the RIS is configured based on the corresponding codeword in the environment-aware codebook, and the optimal codeword resulting in the highest sum rate is adopted for assisting in the downlink data transmission. Third, we analyze the theoretical performance of the environment-aware codebook-based protocol taking into account the channel estimation errors. Finally, numerical simulations are provided to verify our theoretical analysis and the performance of the proposed scheme. In particular, the simulation results demonstrate that our protocol is more competitive than conventional environment-agnostic codebooks.
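The online step described above, in which each codeword is tried in turn and the one giving the highest sum rate is kept, can be sketched as follows. This is a toy illustration and not the paper's protocol: it assumes the per-user SINRs for each codeword are already available (here as made-up numbers) and uses the usual sum-rate expression, the sum over users of log2(1 + SINR). The function names are hypothetical.

```python
import math


def sum_rate(sinrs):
    """Sum rate in bit/s/Hz for a list of per-user linear SINR values."""
    return sum(math.log2(1.0 + s) for s in sinrs)


def select_codeword(sinr_per_codeword):
    """Return (index, sum_rate) of the codeword with the highest sum rate.

    sinr_per_codeword[i] holds the per-user SINRs observed while the RIS
    is configured with codeword i (one online training block each).
    """
    if not sinr_per_codeword:
        raise ValueError("the codebook must contain at least one codeword")
    rates = [sum_rate(s) for s in sinr_per_codeword]
    best = max(range(len(rates)), key=rates.__getitem__)
    return best, rates[best]


# Made-up SINRs for a 3-codeword codebook serving 2 users.
measured = [[1.0, 3.0], [7.0, 0.0], [3.0, 3.0]]
print(select_codeword(measured))  # prints (2, 4.0)
```

In the scheme the abstract describes, the codebook itself is generated offline from statistical CSI via alternating optimization; that stage is not modelled here.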
Submitted 13 June, 2024;
originally announced June 2024.
-
AI.vs.Clinician: Unveiling Intricate Interactions Between AI and Clinicians through an Open-Access Database
Authors:
Wanling Gao,
Yuan Liu,
Zhuoming Yu,
Dandan Cui,
Wenjing Liu,
Xiaoshuang Liang,
Jiahui Zhao,
Jiyue Xie,
Hao Li,
Li Ma,
Ning Ye,
Yumiao Kang,
Dingfeng Luo,
Peng Pan,
Wei Huang,
Zhongmou Liu,
Jizhong Hu,
Fan Huang,
Gangyuan Zhao,
Chongrong Jiang,
Tianyi Wei,
Zhifei Zhang,
Yunyou Huang,
Jianfeng Zhan
Abstract:
Artificial Intelligence (AI) plays a crucial role in the medical field and has the potential to revolutionize healthcare practices. However, the success of AI models and their impacts hinge on the synergy between AI and medical specialists, with clinicians assuming a dominant role. Unfortunately, the intricate dynamics and interactions between AI and clinicians remain unexplored, which hinders AI from being translated into medical practice. To address this gap, we have curated a groundbreaking database called AI.vs.Clinician. This database is the first of its kind for studying the interactions between AI and clinicians. It derives from 7,500 collaborative diagnosis records on a life-threatening medical emergency -- sepsis -- from 14 medical centers across China. For patient cohorts carefully selected from the MIMIC databases, the AI-related information comprises the model property, feature input, diagnosis decision, and inferred probabilities of sepsis onset at present and within the next three hours. The clinician-related information includes the viewed examination data and sequence, viewing time, preliminary and final diagnosis decisions with or without AI assistance, and recommended treatment.
Submitted 15 June, 2024; v1 submitted 11 June, 2024;
originally announced June 2024.
-
RS-Agent: Automating Remote Sensing Tasks through Intelligent Agents
Authors:
Wenjia Xu,
Zijian Yu,
Yixu Wang,
Jiuniu Wang,
Mugen Peng
Abstract:
An increasing number of models have achieved great performance in remote sensing tasks with the recent development of Large Language Models (LLMs) and Visual Language Models (VLMs). However, these models are constrained to basic vision and language instruction-tuning tasks and face challenges in complex remote sensing applications. Additionally, these models lack specialized expertise in professional domains. To address these limitations, we propose an LLM-driven remote sensing intelligent agent named RS-Agent. First, RS-Agent is powered by an LLM that acts as its "Central Controller," enabling it to understand and respond to various problems intelligently. Second, RS-Agent integrates many high-performance remote sensing image processing tools, facilitating multi-tool and multi-turn conversations. Third, RS-Agent can answer professional questions by leveraging robust knowledge documents. We conducted experiments using several datasets, e.g., RSSDIVCS, RSVQA, and DOTAv1. The experimental results demonstrate that RS-Agent delivers outstanding performance in many tasks, including scene classification, visual question answering, and object counting.
Submitted 11 June, 2024;
originally announced June 2024.