-
Expanding the Scope: Inductive Knowledge Graph Reasoning with Multi-Starting Progressive Propagation
Authors:
Zhoutian Shao,
Yuanning Cui,
Wei Hu
Abstract:
Knowledge graphs (KGs) are widely acknowledged as incomplete, and new entities are constantly emerging in the real world. Inductive KG reasoning aims to predict missing facts for these new entities. Among existing models, graph neural networks (GNNs) based ones have shown promising performance for this task. However, they are still challenged by inefficient message propagation due to the distance and scalability issues. In this paper, we propose a new inductive KG reasoning model, MStar, by leveraging conditional message passing neural networks (C-MPNNs). Our key insight is to select multiple query-specific starting entities to expand the scope of progressive propagation. To propagate query-related messages to a farther area within limited steps, we subsequently design a highway layer to propagate information toward these selected starting entities. Moreover, we introduce a training strategy called LinkVerify to mitigate the impact of noisy training samples. Experimental results validate that MStar achieves superior performance compared with state-of-the-art models, especially for distant entities.
Submitted 15 July, 2024;
originally announced July 2024.
-
Supernova Pointing Capabilities of DUNE
Authors:
DUNE Collaboration,
A. Abed Abud,
B. Abi,
R. Acciarri,
M. A. Acero,
M. R. Adames,
G. Adamov,
M. Adamowski,
D. Adams,
M. Adinolfi,
C. Adriano,
A. Aduszkiewicz,
J. Aguilar,
B. Aimard,
F. Akbar,
K. Allison,
S. Alonso Monsalve,
M. Alrashed,
A. Alton,
R. Alvarez,
T. Alves,
H. Amar,
P. Amedo,
J. Anderson,
D. A. Andrade
, et al. (1340 additional authors not shown)
Abstract:
The determination of the direction of a stellar core collapse via its neutrino emission is crucial for the identification of the progenitor for a multimessenger follow-up. A highly effective method of reconstructing supernova directions within the Deep Underground Neutrino Experiment (DUNE) is introduced. The supernova neutrino pointing resolution is studied by simulating and reconstructing electron-neutrino charged-current absorption on $^{40}$Ar and elastic scattering of neutrinos on electrons. Procedures to reconstruct individual interactions, including a newly developed technique called ``brems flipping'', as well as the burst direction from an ensemble of interactions are described. Performance of the burst direction reconstruction is evaluated for supernovae happening at a distance of 10 kpc for a specific supernova burst flux model. The pointing resolution is found to be 3.4 degrees at 68% coverage for a perfect interaction-channel classification and a fiducial mass of 40 kton, and 6.6 degrees for a 10 kton fiducial mass respectively. Assuming a 4% rate of charged-current interactions being misidentified as elastic scattering, DUNE's burst pointing resolution is found to be 4.3 degrees (8.7 degrees) at 68% coverage.
Submitted 14 July, 2024;
originally announced July 2024.
-
Achieving Peta-Ohm Resistance for Semi-Insulating 4H-SiC Devices by Atomic Layer Deposition
Authors:
Yuying Xi,
Helios Y. Li,
Guohui Li,
Qingmei Su,
Kaili Mao,
Bingshe Xu,
Yuying Hao,
Nicholas X. Fang,
Yanxia Cui
Abstract:
Growing demands for precise current measurements, such as atto-ampere-level measurement of cross-cellular biological current transduction, have spotlighted a pressing need for low-noise resistors with ultra-high resistance immune to voltage fluctuations. Traditional semi-insulating materials, however, struggle to provide consistent resistance across varying voltages. To bridge this gap, we introduce a design that integrates semi-insulating 4H-SiC with atomic-level metal oxide interlayers and electrodes. The strategic adjustment of surface states via atomic-scale metal oxide layers optimizes the work functions on 4H-SiC surfaces, validated through density functional theory simulations. This design transcends conventional limitations, establishing ideal Ohmic behavior and maintaining Peta-Ohm-level resistance, unaffected by voltage variations. These on-chip devices with fine-tuned resistance are compatible with integrated circuit manufacturing processes, making them ideally suited for applications in precision electronics.
Submitted 14 July, 2024;
originally announced July 2024.
-
Self-Evolving GPT: A Lifelong Autonomous Experiential Learner
Authors:
Jinglong Gao,
Xiao Ding,
Yiming Cui,
Jianbai Zhao,
Hepeng Wang,
Ting Liu,
Bing Qin
Abstract:
To improve the performance of large language models (LLMs), researchers have explored providing LLMs with textual task-solving experience via prompts. However, they rely on manual efforts to acquire and apply such experience for each task, which is not feasible for the growing demand for LLMs and the variety of user questions. To address this issue, we design a lifelong autonomous experiential learning framework based on LLMs to explore whether LLMs can imitate human ability for learning and utilizing experience. It autonomously learns and accumulates experience through experience transfer and induction, categorizing the types of input questions to select which accumulated experience to employ for them. Experimental results on six widely used NLP datasets show that our framework performs reliably in each intermediate step and effectively improves the performance of GPT-3.5 and GPT-4. This validates the feasibility of using LLMs to mimic human experiential learning and application capabilities. Additionally, we provide a detailed analysis of the behavior of our framework at each step.
Submitted 11 July, 2024;
originally announced July 2024.
-
Deep Learning-based CSI Feedback in Wi-Fi Systems
Authors:
Fan Qi,
Jiajia Guo,
Yiming Cui,
Xiangyi Li,
Chao-Kai Wen,
Shi Jin
Abstract:
In Wi-Fi systems, channel state information (CSI) plays a crucial role in enabling access points to execute beamforming operations. However, the feedback overhead associated with CSI significantly hampers the throughput improvements. Recent advancements in deep learning (DL) have transformed the approach to CSI feedback in cellular systems. Drawing inspiration from the successes witnessed in the realm of mobile communications, this paper introduces a DL-based CSI feedback framework, named EFNet, tailored for Wi-Fi systems. The proposed framework leverages an autoencoder to achieve precise feedback with minimal overhead. The process involves the station utilizing the encoder to compress and quantize a series of matrices into codeword bit streams, which are then fed back to the access point. Subsequently, the decoder installed at the AP reconstructs beamforming matrices from these bit streams. We implement the EFNet system using standard Wi-Fi equipment operating in the 2.4 GHz band. Experimental findings in an office environment reveal a remarkable 80.77% reduction in feedback overhead compared to the 802.11ac standard, alongside a significant boost in net throughput of up to 30.72%.
Submitted 8 July, 2024;
originally announced July 2024.
-
Regularity of powers of edge ideals of edge-weighted integrally closed cycles
Authors:
Guangjun Zhu,
Yijun Cui,
Jiaxin Li,
Yi Yang
Abstract:
This paper gives exact formulas for the regularity of powers of edge ideals of an edge-weighted integrally closed cycle.
Submitted 6 July, 2024;
originally announced July 2024.
-
A Contrastive Learning Based Convolutional Neural Network for ERP Brain-Computer Interfaces
Authors:
Yuntian Cui,
Xinke Shen,
Dan Zhang,
Chen Yang
Abstract:
ERP-based EEG detection is gaining increasing attention in the field of brain-computer interfaces. However, due to the complexity of ERP signal components, their low signal-to-noise ratio, and significant inter-subject variability, cross-subject ERP signal detection has been challenging. The continuous advancement in deep learning has greatly contributed to addressing this issue. This brief proposes a contrastive learning training framework and an Inception module to extract multi-scale temporal and spatial features, representing the subject-invariant components of ERP signals. Specifically, a base encoder integrated with a linear Inception module and a nonlinear projector is used to project the raw data into latent space. By maximizing signal similarity under different targets, the inter-subject EEG signal differences in latent space are minimized. The extracted spatiotemporal features are then used for ERP target detection. The proposed algorithm achieved the best AUC performance in single-trial binary classification tasks on the P300 dataset and showed significant optimization in speller decoding tasks compared to existing algorithms.
Submitted 2 July, 2024;
originally announced July 2024.
-
MVGT: A Multi-view Graph Transformer Based on Spatial Relations for EEG Emotion Recognition
Authors:
Yanjie Cui,
Xiaohong Liu,
Jing Liang,
Yamin Fu
Abstract:
Electroencephalography (EEG), a medical imaging technique that captures scalp electrical activity of brain structures via electrodes, has been widely used in affective computing. The spatial domain of EEG is rich in affective information. However, few of the existing studies have simultaneously analyzed EEG signals from multiple perspectives of geometric and anatomical structures in spatial domain. In this paper, we propose a multi-view Graph Transformer (MVGT) based on spatial relations, which integrates information from the temporal, frequency and spatial domains, including geometric and anatomical structures, so as to enhance the expressive power of the model comprehensively. We incorporate the spatial information of EEG channels into the model as encoding, thereby improving its ability to perceive the spatial structure of the channels. Meanwhile, experimental results based on publicly available datasets demonstrate that our proposed model outperforms state-of-the-art methods in recent years. In addition, the results also show that the MVGT could extract information from multiple domains and capture inter-channel relationships in EEG emotion recognition tasks effectively.
Submitted 8 July, 2024; v1 submitted 3 July, 2024;
originally announced July 2024.
-
VFIMamba: Video Frame Interpolation with State Space Models
Authors:
Guozhen Zhang,
Chunxu Liu,
Yutao Cui,
Xiaotong Zhao,
Kai Ma,
Limin Wang
Abstract:
Inter-frame modeling is pivotal in generating intermediate frames for video frame interpolation (VFI). Current approaches predominantly rely on convolution or attention-based models, which often either lack sufficient receptive fields or entail significant computational overheads. Recently, Selective State Space Models (S6) have emerged, tailored specifically for long sequence modeling, offering both linear complexity and data-dependent modeling capabilities. In this paper, we propose VFIMamba, a novel frame interpolation method for efficient and dynamic inter-frame modeling by harnessing the S6 model. Our approach introduces the Mixed-SSM Block (MSB), which initially rearranges tokens from adjacent frames in an interleaved fashion and subsequently applies multi-directional S6 modeling. This design facilitates the efficient transmission of information across frames while upholding linear complexity. Furthermore, we introduce a novel curriculum learning strategy that progressively cultivates proficiency in modeling inter-frame dynamics across varying motion magnitudes, fully unleashing the potential of the S6 model. Experimental findings showcase that our method attains state-of-the-art performance across diverse benchmarks, particularly excelling in high-resolution scenarios. In particular, on the X-TEST dataset, VFIMamba demonstrates a noteworthy improvement of 0.80 dB for 4K frames and 0.96 dB for 2K frames.
Submitted 2 July, 2024;
originally announced July 2024.
-
FlowTrack: Point-level Flow Network for 3D Single Object Tracking
Authors:
Shuo Li,
Yubo Cui,
Zhiheng Li,
Zheng Fang
Abstract:
3D single object tracking (SOT) is a crucial task in fields of mobile robotics and autonomous driving. Traditional motion-based approaches achieve target tracking by estimating the relative movement of target between two consecutive frames. However, they usually overlook local motion information of the target and fail to exploit historical frame information effectively. To overcome the above limitations, we propose a point-level flow method with multi-frame information for 3D SOT task, called FlowTrack. Specifically, by estimating the flow for each point in the target, our method could capture the local motion details of target, thereby improving the tracking performance. At the same time, to handle scenes with sparse points, we present a learnable target feature as the bridge to efficiently integrate target information from past frames. Moreover, we design a novel Instance Flow Head to transform dense point-level flow into instance-level motion, effectively aggregating local motion information to obtain global target motion. Finally, our method achieves competitive performance with improvements of 5.9% on the KITTI dataset and 2.9% on NuScenes. The code will be made publicly available soon.
Submitted 2 July, 2024;
originally announced July 2024.
-
Single-Ion Spectroscopy of h-BN Point Defect Fluorescence in Liquid Environments
Authors:
Yecun Wu,
Kun Xu,
Hori Pada Sarker,
Takashi Taniguchi,
Kenji Watanabe,
Frank Abild-Pedersen,
Arun Majumdar,
Yi Cui,
Yan-Kai Tzeng,
Steven Chu
Abstract:
Understanding individual ions in solutions is essential for advancing our knowledge of complex chemical systems. However, tracking and detecting ions at the single-ion level in liquid environments remains a challenge. We introduce a strategy for visualization and differentiation of different ions in liquid environment via point defects in hexagonal boron nitride (h-BN) as the ion sensor. Ions interacting with the optically active point defects in h-BN alter emission properties, allowing us to capture these changes and visualize single ions. Using Li+ in organic electrolytes as a model, we observed a spectral shift of over 10 nm upon Li+ addition, and an over 50 nm red shift with applied electric fields due to reactions between Li+ and h-BN point defects. Frequency domain analysis further revealed the rapid dynamics of ion migration and the slow electrochemical reactions. We further spectroscopically differentiated various ions (H+, Li+, Na+, K+, Zn2+, Al3+) in aqueous solution. Each ion, with its distinct electron cloud configuration, interacts distinctively with the electron clouds of h-BN defects, resulting in specific and identifiable spectroscopic signatures. This ion sensing platform enables the direct visualization and differentiation of individual ions in a liquid environment, offering insights into chemical reactions at the single-ion level. This capability presents potential applications in various fields involving ions in liquids, including but not limited to biology, battery technology, and environmental science.
Submitted 2 July, 2024;
originally announced July 2024.
-
Exploring causal effects of hormone- and radio-treatments in an observational study of breast cancer using copula-based semi-competing risks models
Authors:
Tonghui Yu,
Mengjiao Peng,
Yifan Cui,
Elynn Chen,
Chixiang Chen
Abstract:
Breast cancer patients may experience relapse or death after surgery during the follow-up period, leading to dependent censoring of relapse. This phenomenon, known as semi-competing risk, imposes challenges in analyzing treatment effects on breast cancer and necessitates advanced statistical tools for unbiased analysis. Despite progress in estimation and inference within semi-competing risks regression, its application to causal inference is still in its early stages. This article aims to propose a frequentist and semi-parametric framework based on copula models that can facilitate valid causal inference, net quantity estimation and interpretation, and sensitivity analysis for unmeasured factors under right-censored semi-competing risks data. We also propose novel procedures to enhance parameter estimation and its applicability in real practice. After that, we apply the proposed framework to a breast cancer study and detect the time-varying causal effects of hormone- and radio-treatments on patients' relapse-free survival and overall survival. Moreover, extensive numerical evaluations demonstrate the method's feasibility, highlighting minimal estimation bias and reliable statistical inference.
Submitted 1 July, 2024;
originally announced July 2024.
-
Individual brain parcellation: Review of methods, validations and applications
Authors:
Chengyi Li,
Shan Yu,
Yue Cui
Abstract:
Individual brains vary greatly in morphology, connectivity and organization. The applicability of group-level parcellations is limited by the rapid development of precision medicine today because they do not take into account the variation of parcels at the individual level. Accurate mapping of brain functional regions at the individual level is pivotal for a comprehensive understanding of the variations in brain function and behaviors, early and precise identification of brain abnormalities, as well as personalized treatments for neuropsychiatric disorders. With the development of neuroimaging and machine learning techniques, studies on individual brain parcellation are booming. In this paper, we offer an overview of recent advances in the methodologies of individual brain parcellation, including optimization- and learning-based methods. Comprehensive evaluation metrics to validate individual brain mapping have been introduced. We also review the studies of how individual brain mapping promotes neuroscience research and clinical medicine. Finally, we summarize the major challenges and important future directions of individualized brain parcellation. Collectively, we intend to offer a thorough overview of individual brain parcellation methods, validations, and applications, along with highlighting the current challenges that call for an urgent demand for integrated platforms that integrate datasets, methods, and validations.
Submitted 1 July, 2024;
originally announced July 2024.
-
Measurement of dynamic nonlocal deformation using nanodiamond sensors
Authors:
Yue Cui,
Weng-Hang Leong,
Guoli Zhu,
Ren-Bao Liu,
Quan Li
Abstract:
Nonlocal deformation sensing achieved by integrating atomic force microscopy indentation with nanodiamond-based orientation tracking features high precision and high spatial resolution, providing a useful technique for studying the mechanical properties of soft biological systems. However, this technique is currently limited to lifeless systems because it cannot differentiate the indentation-induced deformation from that associated with live activities or other external perturbations. Here we develop a dynamic nonlocal deformation sensing method using oscillatory nanoindentation and spectroscopic analysis to overcome this limitation. The method realizes both temporally and spatially resolved mechanical analysis, with tens of microsecond time-lag precision, nanometer vertical deformation precision, and sub-hundred nanometer lateral spatial resolution, leading to the disclosure of surface/interface effects in the mechanical response of viscoelastic materials and live cells. Neglecting surface tension would underestimate the liquid-like characteristics of the materials. This work demonstrates nanodiamond sensors as a useful tool for spatial-temporal mechanical analysis of soft, complex bio-relevant materials.
Submitted 5 June, 2024;
originally announced June 2024.
-
Deep Mamba Multi-modal Learning
Authors:
Jian Zhu,
Xin Zou,
Yu Cui,
Zhangmin Huang,
Chenshu Hu,
Bo Lyu
Abstract:
Inspired by the excellent performance of Mamba networks, we propose a novel Deep Mamba Multi-modal Learning (DMML). It can be used to achieve the fusion of multi-modal features. We apply DMML to the field of multimedia retrieval and propose an innovative Deep Mamba Multi-modal Hashing (DMMH) method. It combines the advantages of algorithm accuracy and inference speed. We validated the effectiveness of DMMH on three public datasets and achieved state-of-the-art results.
Submitted 9 April, 2024;
originally announced June 2024.
-
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation
Authors:
Yuang Peng,
Yuxin Cui,
Haomiao Tang,
Zekun Qi,
Runpei Dong,
Jing Bai,
Chunrui Han,
Zheng Ge,
Xiangyu Zhang,
Shu-Tao Xia
Abstract:
Personalized image generation holds great promise in assisting humans in everyday work and life due to its impressive function in creatively generating personalized content. However, current evaluations either are automated but misalign with humans or require human evaluations that are time-consuming and expensive. In this work, we present DreamBench++, a human-aligned benchmark automated by advanced multimodal GPT models. Specifically, we systematically design the prompts to let GPT be both human-aligned and self-aligned, empowered with task reinforcement. Further, we construct a comprehensive dataset comprising diverse images and prompts. By benchmarking 7 modern generative models, we demonstrate that DreamBench++ results in significantly more human-aligned evaluation, helping boost the community with innovative findings.
Submitted 24 June, 2024;
originally announced June 2024.
-
Six-CD: Benchmarking Concept Removals for Benign Text-to-image Diffusion Models
Authors:
Jie Ren,
Kangrui Chen,
Yingqian Cui,
Shenglai Zeng,
Hui Liu,
Yue Xing,
Jiliang Tang,
Lingjuan Lyu
Abstract:
Text-to-image (T2I) diffusion models have shown exceptional capabilities in generating images that closely correspond to textual prompts. However, the advancement of T2I diffusion models presents significant risks, as the models could be exploited for malicious purposes, such as generating images with violence or nudity, or creating unauthorized portraits of public figures in inappropriate contexts. To mitigate these risks, concept removal methods have been proposed. These methods aim to modify diffusion models to prevent the generation of malicious and unwanted concepts. Despite these efforts, existing research faces several challenges: (1) a lack of consistent comparisons on a comprehensive dataset, (2) ineffective prompts in harmful and nudity concepts, (3) overlooked evaluation of the ability to generate the benign part within prompts containing malicious concepts. To address these gaps, we propose to benchmark the concept removal methods by introducing a new dataset, Six-CD, along with a novel evaluation metric. In this benchmark, we conduct a thorough evaluation of concept removals, with the experimental observations and discussions offering valuable insights in the field.
Submitted 20 June, 2024;
originally announced June 2024.
-
EnTruth: Enhancing the Traceability of Unauthorized Dataset Usage in Text-to-image Diffusion Models with Minimal and Robust Alterations
Authors:
Jie Ren,
Yingqian Cui,
Chen Chen,
Vikash Sehwag,
Yue Xing,
Jiliang Tang,
Lingjuan Lyu
Abstract:
Generative models, especially text-to-image diffusion models, have significantly advanced in their ability to generate images, benefiting from enhanced architectures, increased computational power, and large-scale datasets. While the datasets play an important role, their protection has remained an unsolved issue. Current protection strategies, such as watermarks and membership inference, either require a high poison rate, which is detrimental to image quality, or suffer from low accuracy and robustness. In this work, we introduce a novel approach, EnTruth, which Enhances Traceability of unauthorized dataset usage utilizing template memorization. By strategically incorporating the template memorization, EnTruth can trigger the specific behavior in unauthorized models as the evidence of infringement. Our method is the first to investigate the positive application of memorization and use it for copyright protection, which turns a curse into a blessing and offers a pioneering perspective for unauthorized usage detection in generative models. Comprehensive experiments are provided to demonstrate its effectiveness in terms of data-alteration rate, accuracy, robustness and generation quality.
Submitted 19 June, 2024;
originally announced June 2024.
-
Rastall gravity: accretion disk image in radiation fields context and visual transformations compared to Reissner-Nordstrom black holes
Authors:
Yu-Xiang Huang,
Sen Guo,
Yu Liang,
Yu-Hao Cui,
Qing-Quan Jiang,
Kai Lin
Abstract:
Our study investigates the astronomical implications of Rastall gravity, particularly its behavior amidst a radiation field compared to Reissner-Nordstrom (RN) black holes. Our research delineates a crucial correlation between the dynamics of the accretion disk and the parameters $Q$ and $N_{\rm r}$, which aptly reflect the influence of spacetime metrics on the disk's appearance. Elevated electric charge $Q$ prompts contraction in the disk's orbit due to enhanced gravitational effects, while higher $N_{\rm r}$ values lead to outward expansion, influenced by the radiation field's attributes. Interestingly, the charged black holes surrounded by radiation fields display distinct visual disparities from RN black holes. Brightness decreases and expansion occurs within the accretion disk's innermost stable circular orbit with rising $N_{\rm r}$ values. Our study also reveals the process by which the accretion disk transitions from a conventional disk-like structure to a hat-like form at different observation angles, with the redshift effect gradually intensifying. Moreover, the results of the Rastall gravity radiation field we consider are consistent with the constraints of the host galaxy's gravitational lensing on the Rastall gravity parameters, enhancing the consistency between theoretical predictions and actual observations.
Submitted 18 June, 2024;
originally announced June 2024.
-
Unveiling Encoder-Free Vision-Language Models
Authors:
Haiwen Diao,
Yufeng Cui,
Xiaotong Li,
Yueze Wang,
Huchuan Lu,
Xinlong Wang
Abstract:
Existing vision-language models (VLMs) mostly rely on vision encoders to extract visual features followed by large language models (LLMs) for visual-language tasks. However, the vision encoders set a strong inductive bias in abstracting visual representation, e.g., resolution, aspect ratio, and semantic priors, which could impede the flexibility and efficiency of the VLMs. Training pure VLMs that accept the seamless vision and language inputs, i.e., without vision encoders, remains challenging and rarely explored. Empirical observations reveal that direct training without encoders results in slow convergence and large performance gaps. In this work, we bridge the gap between encoder-based and encoder-free models, and present a simple yet effective training recipe towards pure VLMs. Specifically, we unveil the key aspects of training encoder-free VLMs efficiently via thorough experiments: (1) Bridging vision-language representation inside one unified decoder; (2) Enhancing visual recognition capability via extra supervision. With these strategies, we launch EVE, an encoder-free vision-language model that can be trained and forwarded efficiently. Notably, solely utilizing 35M publicly accessible data, EVE can impressively rival the encoder-based VLMs of similar capacities across multiple vision-language benchmarks. It significantly outperforms the counterpart Fuyu-8B with mysterious training procedures and undisclosed training data. We believe that EVE provides a transparent and efficient route for developing a pure decoder-only architecture across modalities. Our code and models are publicly available at: https://github.com/baaivision/EVE.
Submitted 17 June, 2024;
originally announced June 2024.
-
Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
In this work we search for signals generated by ultra-heavy dark matter in data from the Large High Altitude Air Shower Observatory (LHAASO). We look for possible gamma rays from dark matter annihilation or decay in 16 dwarf spheroidal galaxies in the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter because they have low fluxes of astrophysical $γ$-ray background yet contain large amounts of dark matter. By analyzing more than 700 days of observational data at LHAASO, no significant dark matter signal from 1 TeV to 1 EeV is detected. Accordingly we derive the most stringent constraints on the ultra-heavy dark matter annihilation cross-section up to EeV. The constraints on the lifetime of dark matter in decay mode are also derived.
Submitted 12 June, 2024;
originally announced June 2024.
-
ProMotion: Prototypes As Motion Learners
Authors:
Yawen Lu,
Dongfang Liu,
Qifan Wang,
Cheng Han,
Yiming Cui,
Zhiwen Cao,
Xueling Zhang,
Yingjie Victor Chen,
Heng Fan
Abstract:
In this work, we introduce ProMotion, a unified prototypical framework engineered to model fundamental motion tasks. ProMotion offers a range of compelling attributes that set it apart from current task-specific paradigms. We adopt a prototypical perspective, establishing a unified paradigm that harmonizes disparate motion learning approaches. This novel paradigm streamlines the architectural design, enabling the simultaneous assimilation of diverse motion information. We capitalize on a dual mechanism involving the feature denoiser and the prototypical learner to decipher the intricacies of motion. This approach effectively circumvents the pitfalls of ambiguity in pixel-wise feature matching, significantly bolstering the robustness of motion representation. We demonstrate a profound degree of transferability across distinct motion patterns. This inherent versatility reverberates robustly across a comprehensive spectrum of both 2D and 3D downstream tasks. Empirical results demonstrate that ProMotion outperforms various well-known specialized architectures, achieving 0.54 and 0.054 Abs Rel error on the Sintel and KITTI depth datasets, 1.04 and 2.01 average endpoint error on the clean and final pass of Sintel flow benchmark, and 4.30 F1-all error on the KITTI flow benchmark. For its efficacy, we hope our work can catalyze a paradigm shift in universal models in computer vision.
Submitted 7 June, 2024;
originally announced June 2024.
-
Near-field Beamforming for Extremely Large-scale MIMO Based on Unsupervised Deep Learning
Authors:
Jiali Nie,
Yuanhao Cui,
Zhaohui Yang,
Weijie Yuan,
Xiaojun Jing
Abstract:
Extremely Large-scale Array (ELAA) is considered a frontier technology for future communication systems, pivotal in improving wireless systems' rate and spectral efficiency. However, as ELAA employs a multitude of antennas operating at higher frequencies, users are typically situated in the near-field region where the spherical wavefront propagates. This inevitably leads to a significant increase in the overhead of beam training, requiring complex two-dimensional beam searching in both the angle domain and the distance domain. To address this problem, we propose a near-field beamforming method based on unsupervised deep learning. Our convolutional neural network efficiently extracts complex channel state information features by strategically selecting padding and kernel size. We optimize the beamformers to maximize achievable rates in a multi-user network without relying on predefined custom codebooks. Upon deployment, the model requires solely the input of pre-estimated channel state information to derive the optimal beamforming vector. Simulation results show that our proposed scheme can obtain stable beamforming gain compared with the baseline scheme. Furthermore, owing to the inherent traits of deep learning methodologies, this approach substantially diminishes the beam training costs in near-field regions.
Submitted 5 June, 2024;
originally announced June 2024.
-
Simulation of DAMPE silicon microstrip detectors in the $\rm Allpix^{2}$ framework
Authors:
Yu-Xin Cui,
Xiang Li,
Shen Wang,
Chuan Yue,
Qiang Wan,
Shi-Jun Lei,
Guan-Wen Yuan,
Yi-Ming Hu,
Jia-Ju Wei,
Jian-Hua Guo
Abstract:
Silicon strip detectors have been widely utilized in space experiments for gamma-ray and cosmic-ray detections thanks to their high spatial resolution and stable performance. For a silicon micro-strip detector, the Monte Carlo simulation is recognized as a practical and cost-effective approach to verify the detector performance. In this study, a technique for the simulation of the silicon micro-strip detector with the $\rm Allpix^{2}$ framework is developed. By incorporating the electric field into the particle transport simulation based on Geant4, this framework could precisely emulate the carrier drift in the silicon micro-strip detector. The simulation results are validated using the beam test data as well as the flight data of the DAMPE experiment, which suggests that the $\rm Allpix^{2}$ framework is a powerful tool to obtain the performance of the silicon micro-strip detector.
Submitted 3 June, 2024;
originally announced June 2024.
-
An Efficient Trajectory Generation for Bi-copter Flight in Tight Space
Authors:
Xin Dong,
Yangjie Cui,
Jingwu Xiang,
Daochun Li,
Zhan Tu
Abstract:
Unlike square (or similarly shaped) quadrotors, elongated bi-copters have a natural advantage in crossing tight spaces. To date, extensive work has focused on the design, modeling, and control of bi-copters. In addition, a proper motion planner that exploits the bi-copter's shape characteristics is essential to traverse tight spaces efficiently and safely, yet it has rarely been studied. Current motion planning methods significantly compromise the ability to traverse narrow spaces if the map is inflated based on the long dimension of the bi-copter. In this paper, we propose an efficient motion planning method that enables the safe navigation of bi-copters through narrow spaces. We first adapt a dynamic, feasible path-finding algorithm with whole-body collision checks to generate a collision-free path. Subsequently, we jointly optimize the position and rotation of the bi-copter to produce a trajectory that is safe, dynamically feasible, and smooth. Extensive simulations and real-world experiments have been conducted to verify the reliability and robustness of the proposed method.
Submitted 2 June, 2024;
originally announced June 2024.
-
SSGA-Net: Stepwise Spatial Global-local Aggregation Networks for Autonomous Driving
Authors:
Yiming Cui,
Cheng Han,
Dongfang Liu
Abstract:
Visual-based perception is the key module for autonomous driving. Among visual perception tasks, video object detection is a primary yet challenging one because of feature degradation caused by fast motion or multiple poses. Current models usually aggregate features from the neighboring frames to enhance the object representations so that the task heads generate more accurate predictions. Although they achieve better performance, these methods rely on information from future frames and suffer from high computational complexity. Meanwhile, the aggregation process is not reconfigurable at inference time. These issues make most of the existing models infeasible for online applications. To solve these problems, we introduce a stepwise spatial global-local aggregation network. Our proposed models mainly contain three parts: (1) a multi-stage stepwise network that gradually refines the predictions and object representations from the previous stage; (2) spatial global-local aggregation that fuses the local information from the neighboring frames and global semantics from the current frame to eliminate the feature degradation; (3) a dynamic aggregation strategy that stops the aggregation process early based on the refinement results to remove redundancy and improve efficiency. Extensive experiments on the ImageNet VID benchmark validate the effectiveness and efficiency of our proposed models.
Submitted 29 May, 2024;
originally announced May 2024.
-
Reliable Object Tracking by Multimodal Hybrid Feature Extraction and Transformer-Based Fusion
Authors:
Hongze Sun,
Rui Liu,
Wuque Cai,
Jun Wang,
Yue Wang,
Huajin Tang,
Yan Cui,
Dezhong Yao,
Daqing Guo
Abstract:
Visual object tracking, which is primarily based on visible light image sequences, encounters numerous challenges in complicated scenarios, such as low light conditions, high dynamic ranges, and background clutter. To address these challenges, incorporating the advantages of multiple visual modalities is a promising solution for achieving reliable object tracking. However, the existing approaches usually integrate multimodal inputs through adaptive local feature interactions, which cannot leverage the full potential of visual cues, thus resulting in insufficient feature modeling. In this study, we propose a novel multimodal hybrid tracker (MMHT) that utilizes frame-event-based data for reliable single object tracking. The MMHT model employs a hybrid backbone consisting of an artificial neural network (ANN) and a spiking neural network (SNN) to extract dominant features from different visual modalities and then uses a unified encoder to align the features across different domains. Moreover, we propose an enhanced transformer-based module to fuse multimodal features using attention mechanisms. With these methods, the MMHT model can effectively construct a multiscale and multidimensional visual feature space and achieve discriminative feature modeling. Extensive experiments demonstrate that the MMHT model exhibits competitive performance in comparison with that of other state-of-the-art methods. Overall, our results highlight the effectiveness of the MMHT model in terms of addressing the challenges faced in visual object tracking tasks.
Submitted 28 May, 2024;
originally announced May 2024.
-
DSU-Net: Dynamic Snake U-Net for 2-D Seismic First Break Picking
Authors:
Hongtao Wang,
Rongyu Feng,
Liangyi Wu,
Mutian Liu,
Yinuo Cui,
Chunxia Zhang,
Zhenbo Guo
Abstract:
In seismic exploration, identifying the first break (FB) is a critical component in establishing subsurface velocity models. Various automatic picking techniques based on deep neural networks have been developed to expedite this procedure. The most popular class uses semantic segmentation networks to pick on a shot gather, which is called 2-dimensional (2-D) picking. Generally, 2-D segmentation-based picking methods take an image of a shot gather as input and output a binary segmentation map, in which the maximum of each column is the location of the FB. However, currently designed segmentation networks struggle to ensure the horizontal continuity of the segmentation. Additionally, FB jumps also exist in some areas, and it is not easy for current networks to detect such jumps. Therefore, it is important to pick as much as possible and ensure horizontal continuity. To alleviate this problem, we propose a novel semantic segmentation network for the 2-D seismic FB picking task, where we introduce the dynamic snake convolution into U-Net and call the new segmentation network dynamic-snake U-Net (DSU-Net). Specifically, we extend the original dynamic-snake convolution (DSConv) from computer vision (CV) and propose a novel DSConv module, which can extract the horizontally continuous feature in the shallow feature of the shot gather. Many experiments have shown that DSU-Net demonstrates higher accuracy and robustness than the other 2-D segmentation-based models, achieving state-of-the-art (SOTA) performance in 2-D seismic field surveys. In particular, it can effectively detect FB jumps and better ensure the horizontal continuity of the FB. In addition, the ablation experiment and the anti-noise experiment verify the optimal structure of the DSConv module and the robustness of the picking, respectively.
Submitted 27 May, 2024;
originally announced May 2024.
-
DiffCalib: Reformulating Monocular Camera Calibration as Diffusion-Based Dense Incident Map Generation
Authors:
Xiankang He,
Guangkai Xu,
Bo Zhang,
Hao Chen,
Ying Cui,
Dongyan Guo
Abstract:
Monocular camera calibration is a key precondition for numerous 3D vision applications. Despite considerable advancements, existing methods often hinge on specific assumptions and struggle to generalize across varied real-world scenarios, and the performance is limited by insufficient training data. Recently, diffusion models trained on expansive datasets have been confirmed to maintain the capability to generate diverse, high-quality images. This success suggests a strong potential of the models to effectively understand varied visual information. In this work, we leverage the comprehensive visual knowledge embedded in pre-trained diffusion models to enable more robust and accurate monocular camera intrinsic estimation. Specifically, we reformulate the problem of estimating the four degrees of freedom (4-DoF) of camera intrinsic parameters as a dense incident map generation task. The map details the angle of incidence for each pixel in the RGB image, and its format aligns well with the paradigm of diffusion models. The camera intrinsic then can be derived from the incident map with a simple non-learning RANSAC algorithm during inference. Moreover, to further enhance the performance, we jointly estimate a depth map to provide extra geometric information for the incident map estimation. Extensive experiments on multiple testing datasets demonstrate that our model achieves state-of-the-art performance, gaining up to a 40% reduction in prediction errors. Besides, the experiments also show that the precise camera intrinsic and depth maps estimated by our pipeline can greatly benefit practical applications such as 3D reconstruction from a single in-the-wild image.
Submitted 24 May, 2024;
originally announced May 2024.
-
RAEE: A Training-Free Retrieval-Augmented Early Exiting Framework for Efficient Inference
Authors:
Lianming Huang,
Shangyu Wu,
Yufei Cui,
Ying Xiong,
Xue Liu,
Tei-Wei Kuo,
Nan Guan,
Chun Jason Xue
Abstract:
Deploying large language model inference remains challenging due to their high computational overhead. Early exiting accelerates model inference by adaptively reducing the number of inference layers. Existing methods require training internal classifiers to determine whether to exit at each intermediate layer. However, such classifier-based early exiting frameworks require significant effort to design and train the classifiers. To address these limitations, this paper proposes RAEE, a training-free Retrieval-Augmented Early Exiting framework for efficient inference. First, this paper demonstrates that the early exiting problem can be modeled as a distribution prediction problem, where the distribution is approximated using similar data's existing information. Next, the paper details the process of collecting existing information to build the retrieval database. Finally, based on the pre-built retrieval database, RAEE leverages the retrieved similar data's exiting information to guide the backbone model to exit at the layer, which is predicted by the approximated distribution. Experimental results demonstrate that the proposed RAEE can significantly accelerate inference. RAEE also achieves state-of-the-art zero-shot performance on 8 classification tasks.
Submitted 24 May, 2024;
originally announced May 2024.
-
A general solution to the simultaneous stabilization problem by analytic interpolation
Authors:
Yufang Cui,
Anders Lindquist
Abstract:
In this paper, we tackle the significant challenge of simultaneous stabilization in control systems engineering, where the aim is to employ a single controller to ensure stability across multiple systems. We delve into both scalar and multivariable scenarios. For the scalar case, we present the necessary and sufficient conditions for a single controller to stabilize multiple plants and reformulate these conditions as interpolation constraints, which extend Ghosh's results by allowing derivative constraints. Furthermore, we implement a methodology based on a Riccati-type matrix equation, called the Covariance Extension Equation. This approach enables us to parameterize all potential solutions using a monic Schur polynomial. We then extend our result to the multivariable scenario and derive the necessary and sufficient conditions for a group of $m\times m$ plants to be simultaneously stabilizable, a problem that can also be solved by our analytic interpolation method. Finally, we construct four numerical examples, showcasing the application of our method across various scenarios encountered in control systems engineering and highlighting its ability to stabilize diverse systems efficiently and reliably.
Submitted 22 May, 2024;
originally announced May 2024.
-
Data quality control system and long-term performance monitor of the LHAASO-KM2A
Authors:
Zhen Cao,
F. Aharonian,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
W. Bian,
A. V. Bukevich,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
H. X. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. Chen
, et al. (263 additional authors not shown)
Abstract:
The KM2A is the largest sub-array of the Large High Altitude Air Shower Observatory (LHAASO). It consists of 5216 electromagnetic particle detectors (EDs) and 1188 muon detectors (MDs). The data recorded by the EDs and MDs are used to reconstruct primary information of cosmic ray and gamma-ray showers. This information is used for physical analysis in gamma-ray astronomy and cosmic ray physics. To ensure the reliability of the LHAASO-KM2A data, a three-level quality control system has been established. It is used to monitor the status of detector units, stability of reconstructed parameters and the performance of the array based on observations of the Crab Nebula and Moon shadow. This paper will introduce the control system and its application on the LHAASO-KM2A data collected from August 2021 to July 2023. During this period, the pointing and angular resolution of the array were stable. From the observations of the Moon shadow and Crab Nebula, the results achieved using the two methods are consistent with each other. According to the observation of the Crab Nebula at energies from 25 TeV to 100 TeV, the time averaged pointing errors are estimated to be $-0.003^{\circ} \pm 0.005^{\circ}$ and $0.001^{\circ} \pm 0.006^{\circ}$ in the R.A. and Dec directions, respectively.
Submitted 13 June, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
-
Ab initio intermolecular interactions mediate thermochemically real-fluid effects that affect system reactivity
Authors:
Mingrui Wang,
Ruoyue Tang,
Xinrui Ren,
Yanqing Cui,
Song Cheng
Abstract:
The properties of supercritical fluids are dictated by intermolecular interactions that involve two or more molecules. Such intermolecular interactions were described via intermolecular potentials in historical supercritical combustion modeling studies, but have been treated empirically and with no consideration of radical interactions or multi-body interactions involving more than two molecules.…
▽ More
The properties of supercritical fluids are dictated by intermolecular interactions that involve two or more molecules. Such intermolecular interactions were described via intermolecular potentials in historical supercritical combustion modeling studies, but have been treated empirically and with no consideration of radical interactions or multi-body interactions involving more than two molecules. This approach has been adopted long ago, assuming sufficient characterization of real-fluid effects during supercritical combustion. Here, with data from ab initio multi-body intermolecular potentials, non-empirical Virial Equation of State (EoS), and real-fluid thermochemical and kinetic simulations, we reveal that empirical intermolecular potentials can lead to significant errors in representing supercritical fluids under common combustion situations, which can be impressively described by ab initio intermolecular potentials. These interactions are also found to greatly influence autoignition delay times, a common measure of global reactivity, with significant contributions from radical interactions and multi-body interactions. It is therefore of necessity to incorporate ab initio intermolecular interactions in studying supercritical combustion and various dynamic systems involving supercritical fluids, which has now been enabled through the new framework developed in the present study.
△ Less
Submitted 19 May, 2024;
originally announced May 2024.
-
NetMamba: Efficient Network Traffic Classification via Pre-training Unidirectional Mamba
Authors:
Tongze Wang,
Xiaohui Xie,
Wenduo Wang,
Chuyi Wang,
Youjian Zhao,
Yong Cui
Abstract:
Network traffic classification is a crucial research area aiming to enhance service quality, streamline network management, and bolster cybersecurity. To address the growing complexity of transmission encryption techniques, various machine learning and deep learning methods have been proposed. However, existing approaches face two main challenges. Firstly, they struggle with model inefficiency due…
▽ More
Network traffic classification is a crucial research area aiming to enhance service quality, streamline network management, and bolster cybersecurity. To address the growing complexity of transmission encryption techniques, various machine learning and deep learning methods have been proposed. However, existing approaches face two main challenges. Firstly, they struggle with model inefficiency due to the quadratic complexity of the widely used Transformer architecture. Secondly, they suffer from inadequate traffic representation because of discarding important byte information while retaining unwanted biases. To address these challenges, we propose NetMamba, an efficient linear-time state space model equipped with a comprehensive traffic representation scheme. We adopt a specially selected and improved unidirectional Mamba architecture for the networking field, instead of the Transformer, to address efficiency issues. In addition, we design a traffic representation scheme to extract valid information from massive traffic data while removing biased information. Evaluation experiments on six public datasets encompassing three main classification tasks showcase NetMamba's superior classification performance compared to state-of-the-art baselines. It achieves an accuracy rate of nearly 99% (some over 99%) in all tasks. Additionally, NetMamba demonstrates excellent efficiency, improving inference speed by up to 60 times while maintaining comparably low memory usage. Furthermore, NetMamba exhibits superior few-shot learning abilities, achieving better classification performance with fewer labeled data. To the best of our knowledge, NetMamba is the first model to tailor the Mamba architecture for networking.
Submitted 25 May, 2024; v1 submitted 19 May, 2024;
originally announced May 2024.
-
Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities
Authors:
Hao Zhou,
Chengming Hu,
Ye Yuan,
Yufei Cui,
Yili Jin,
Can Chen,
Haolun Wu,
Dun Yuan,
Li Jiang,
Di Wu,
Xue Liu,
Charlie Zhang,
Xianbin Wang,
Jiangchuan Liu
Abstract:
Large language models (LLMs) have received considerable attention recently due to their outstanding comprehension and reasoning capabilities, leading to great progress in many fields. The advancement of LLM techniques also offers promising opportunities to automate many tasks in the telecommunication (telecom) field. After pre-training and fine-tuning, LLMs can perform diverse downstream tasks based on human instructions, paving the way to artificial general intelligence (AGI)-enabled 6G. Given the great potential of LLM technologies, this work aims to provide a comprehensive overview of LLM-enabled telecom networks. In particular, we first present LLM fundamentals, including model architecture, pre-training, fine-tuning, inference and utilization, model evaluation, and telecom deployment. Then, we introduce LLM-enabled key techniques and telecom applications in terms of generation, classification, optimization, and prediction problems. Specifically, the LLM-enabled generation applications include telecom domain knowledge, code, and network configuration generation. After that, the LLM-based classification applications involve network security, text, image, and traffic classification problems. Moreover, multiple LLM-enabled optimization techniques are introduced, such as automated reward function design for reinforcement learning and verbal reinforcement learning. Furthermore, for LLM-aided prediction problems, we discuss time-series prediction models and multi-modality prediction problems for telecom. Finally, we highlight the challenges and identify the future directions of LLM-enabled telecom networks.
Submitted 17 May, 2024;
originally announced May 2024.
-
Region of Interest Detection in Melanocytic Skin Tumor Whole Slide Images -- Nevus & Melanoma
Authors:
Yi Cui,
Yao Li,
Jayson R. Miedema,
Sharon N. Edmiston,
Sherif Farag,
J. S. Marron,
Nancy E. Thomas
Abstract:
Automated region of interest detection in histopathological image analysis is a challenging and important topic with tremendous potential impact on clinical practice. The deep-learning methods used in computational pathology may help us to reduce costs and increase the speed and accuracy of cancer diagnosis. We started with the UNC Melanocytic Tumor Dataset cohort that contains 160 hematoxylin and eosin whole-slide images of primary melanomas (86) and nevi (74). We randomly assigned 80% (134) as a training set and built an in-house deep-learning method to allow for classification, at the slide level, of nevi and melanomas. The proposed method performed well on the other 20% (26) test dataset; the accuracy of the slide classification task was 92.3% and our model also performed well in terms of predicting the region of interest annotated by the pathologists, showing excellent performance of our model on melanocytic skin tumors. Even though we tested the experiments on the skin tumor dataset, our work could also be extended to other medical image detection problems to benefit the clinical evaluation and diagnosis of different tumors.
Submitted 16 May, 2024;
originally announced May 2024.
-
The Pitfalls and Promise of Conformal Inference Under Adversarial Attacks
Authors:
Ziquan Liu,
Yufei Cui,
Yan Yan,
Yi Xu,
Xiangyang Ji,
Xue Liu,
Antoni B. Chan
Abstract:
In safety-critical applications such as medical imaging and autonomous driving, where decisions have profound implications for patient health and road safety, it is imperative to maintain both high adversarial robustness to protect against potential adversarial attacks and reliable uncertainty quantification in decision-making. With extensive research focused on enhancing adversarial robustness through various forms of adversarial training (AT), a notable knowledge gap remains concerning the uncertainty inherent in adversarially trained models. To address this gap, this study investigates the uncertainty of deep learning models by examining the performance of conformal prediction (CP) in the context of standard adversarial attacks within the adversarial defense community. It is first unveiled that existing CP methods do not produce informative prediction sets under the commonly used $l_{\infty}$-norm bounded attack if the model is not adversarially trained, which underpins the importance of adversarial training for CP. Our paper next demonstrates that the prediction set size (PSS) of CP using adversarially trained models with AT variants is often worse than using standard AT, inspiring us to research into CP-efficient AT for improved PSS. We propose to optimize a Beta-weighting loss with an entropy minimization regularizer during AT to improve CP-efficiency, where the Beta-weighting loss is shown to be an upper bound of PSS at the population level by our theoretical analysis. Moreover, our empirical study on four image classification datasets across three popular AT baselines validates the effectiveness of the proposed Uncertainty-Reducing AT (AT-UR).
Submitted 14 May, 2024;
originally announced May 2024.
-
IHC Matters: Incorporating IHC analysis to H&E Whole Slide Image Analysis for Improved Cancer Grading via Two-stage Multimodal Bilinear Pooling Fusion
Authors:
Jun Wang,
Yu Mao,
Yufei Cui,
Nan Guan,
Chun Jason Xue
Abstract:
Immunohistochemistry (IHC) plays a crucial role in pathology as it detects the over-expression of protein in tissue samples. However, there are still few machine learning studies on IHC's impact on accurate cancer grading. We found that IHC and H\&E each have distinct advantages and disadvantages while being complementary in certain respects. Building on this observation, we developed a two-stage multi-modal bilinear model with a feature pooling module. This model aims to maximize the potential of both IHC and H\&E feature representations, resulting in improved performance compared to their individual use. Our experiments demonstrate that incorporating IHC data into machine learning models, alongside H\&E stained images, leads to superior predictive results for cancer grading. The proposed framework achieves an ACC as high as 0.953 on the public dataset BCI.
Submitted 13 May, 2024;
originally announced May 2024.
-
Fast Computation of Superquantile-Constrained Optimization Through Implicit Scenario Reduction
Authors:
Jake Roth,
Ying Cui
Abstract:
Superquantiles have recently gained significant interest as a risk-aware metric for addressing fairness and distribution shifts in statistical learning and decision making problems. This paper introduces a fast, scalable and robust second-order computational framework to solve large-scale optimization problems with superquantile-based constraints. Unlike empirical risk minimization, superquantile-based optimization requires ranking random functions evaluated across all scenarios to compute the tail conditional expectation. While this tail-based feature might seem computationally unfriendly, it provides an advantageous setting for a semismooth-Newton-based augmented Lagrangian method. The superquantile operator effectively reduces the dimensions of the Newton systems since the tail expectation involves considerably fewer scenarios. Notably, the extra cost of obtaining relevant second-order information and performing matrix inversions is often comparable to, and sometimes even less than, the effort required for gradient computation. Our developed solver is particularly effective when the number of scenarios substantially exceeds the number of decision variables. In synthetic problems with linear and convex diagonal quadratic objectives, numerical experiments demonstrate that our method outperforms existing approaches by a large margin: It achieves speeds more than 750 times faster for linear and quadratic objectives than the alternating direction method of multipliers as implemented by OSQP for computing low-accuracy solutions. Additionally, it is up to 25 times faster for linear objectives and 70 times faster for quadratic objectives than the commercial solver Gurobi, and 20 times faster for linear objectives and 30 times faster for quadratic objectives than the Portfolio Safeguard optimization suite for high-accuracy solution computations.
Submitted 20 May, 2024; v1 submitted 13 May, 2024;
originally announced May 2024.
-
Discovery of Very-high-energy Gamma-ray Emissions from the Low Luminosity AGN NGC 4278 by LHAASO
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
The first source catalog of the Large High Altitude Air Shower Observatory reported the detection of a very-high-energy gamma-ray source, 1LHAASO J1219+2915. In this paper, a further detailed study of the spectral and temporal behavior of this point-like source has been carried out. The best-fit position of the TeV source ($\rm{RA}=185.05^{\circ}\pm0.04^{\circ}$, $\rm{Dec}=29.25^{\circ}\pm0.03^{\circ}$) is compatible with NGC 4278 within $\sim0.03$ degree. Variation analysis shows an indication of variability at the level of a few months in the TeV band, which is consistent with low-frequency observations. Based on these observations, we report the detection of TeV $γ$-ray emissions from this low-luminosity AGN NGC 4278. The observation by LHAASO-WCDA during the active period has a significance level of 8.8\,$σ$ with best-fit photon spectral index $\varGamma=2.56\pm0.14$ and a flux $f_{1-10\,\rm{TeV}}=(7.0\pm1.1_{\rm{sta}}\pm0.35_{\rm{syst}})\times10^{-13}\,\rm{photons\,cm^{-2}\,s^{-1}}$, or approximately $5\%$ of the Crab Nebula. The discovery of VHE emission from NGC 4278 indicates that the compact, weak radio jet can efficiently accelerate particles and emit TeV photons.
Submitted 13 May, 2024;
originally announced May 2024.
-
DoLLM: How Large Language Models Understanding Network Flow Data to Detect Carpet Bombing DDoS
Authors:
Qingyang Li,
Yihang Zhang,
Zhidong Jia,
Yannan Hu,
Lei Zhang,
Jianrong Zhang,
Yongming Xu,
Yong Cui,
Zongming Guo,
Xinggong Zhang
Abstract:
It is an interesting question whether and how Large Language Models (LLMs) can understand non-language network data and help us detect unknown malicious flows. This paper takes Carpet Bombing as a case study and shows how to exploit LLMs' powerful capability in the networking area. Carpet Bombing is a new DDoS attack that has dramatically increased in recent years, significantly threatening network infrastructures. It targets multiple victim IPs within subnets, causing congestion on access links and disrupting network services for a vast number of users. Characterized by low rates and multiple vectors, these attacks challenge traditional DDoS defenses. We propose DoLLM, a DDoS detection model that utilizes open-source LLMs as its backbone. By reorganizing non-contextual network flows into Flow-Sequences and projecting them into the LLMs' semantic space as token embeddings, DoLLM leverages LLMs' contextual understanding to extract flow representations in the overall network context. The representations are used to improve the DDoS detection performance. We evaluate DoLLM on the public dataset CIC-DDoS2019 and a real NetFlow trace from a top-3 countrywide ISP. The tests show that DoLLM possesses strong detection capabilities. Its F1 score increased by up to 33.3% in zero-shot scenarios and by at least 20.6% in real ISP traces.
Submitted 13 May, 2024;
originally announced May 2024.
-
GraphRelate3D: Context-Dependent 3D Object Detection with Inter-Object Relationship Graphs
Authors:
Mingyu Liu,
Ekim Yurtsever,
Marc Brede,
Jun Meng,
Walter Zimmer,
Xingcheng Zhou,
Bare Luka Zagar,
Yuning Cui,
Alois Knoll
Abstract:
Accurate and effective 3D object detection is critical for ensuring the driving safety of autonomous vehicles. Recently, state-of-the-art two-stage 3D object detectors have exhibited promising performance. However, these methods refine proposals individually, ignoring the rich contextual information in the object relationships between the neighbor proposals. In this study, we introduce an object relation module, consisting of a graph generator and a graph neural network (GNN), to learn the spatial information from certain patterns to improve 3D object detection. Specifically, we create an inter-object relationship graph based on proposals in a frame via the graph generator to connect each proposal with its neighbor proposals. Afterward, the GNN module extracts edge features from the generated graph and iteratively refines proposal features with the captured edge features. Ultimately, we leverage the refined features as input to the detection head to obtain detection results. Our approach improves upon the baseline PV-RCNN on the KITTI validation set for the car class across easy, moderate, and hard difficulty levels by 0.82%, 0.74%, and 0.58%, respectively. Additionally, our method outperforms the baseline by more than 1% under the moderate and hard levels BEV AP on the test server.
Submitted 10 May, 2024;
originally announced May 2024.
-
FIGRET: Fine-Grained Robustness-Enhanced Traffic Engineering
Authors:
Ximeng Liu,
Shizhen Zhao,
Yong Cui
Abstract:
Traffic Engineering (TE) is critical for improving network performance and reliability. A key challenge in TE is the management of sudden traffic bursts. Existing TE schemes often struggle to accurately determine the extent of focus required for these surges, thereby facing difficulties in achieving a balance between performance under normal and peak traffic conditions. To address this issue, we introduce FIGRET, a Fine-Grained Robustness-Enhanced TE Scheme. FIGRET offers a novel approach to TE by providing varying levels of robustness enhancements, customized according to the distinct traffic characteristics of various source-destination pairs. By leveraging a sophisticated loss function and advanced deep learning techniques, FIGRET is capable of generating high-quality TE solutions efficiently. Our evaluations of real-world production networks, including Wide Area Networks and data centers, demonstrate that FIGRET significantly outperforms existing TE schemes. Compared to the TE scheme currently deployed in the Jupiter network of Google, FIGRET achieves a 9\%-34\% reduction in average Maximum Link Utilization and improves solution speed by $35\times$-$1800 \times$. Against DOTE, a state-of-the-art deep learning-based TE method, FIGRET substantially lowers the occurrence of significant congestion events triggered by traffic bursts by 41\%-53.9\% in topologies characterized by high traffic dynamics.
Submitted 8 May, 2024;
originally announced May 2024.
-
Gravitational waves from cosmic strings in LISA: reconstruction pipeline and physics interpretation
Authors:
Jose J. Blanco-Pillado,
Yanou Cui,
Sachiko Kuroyanagi,
Marek Lewicki,
Germano Nardini,
Mauro Pieroni,
Ivan Yu. Rybak,
Lara Sousa,
Jeremy M. Wachter
Abstract:
We initiate the LISA template databank for stochastic gravitational wave backgrounds sourced by cosmic strings. We include two templates, an analytical template, which enables more flexible searches, and a numerical template derived directly from large Nambu-Goto simulations of string networks. Using searches based on these templates, we forecast the parameter space within the reach of the experiment and the precision with which their parameters will be reconstructed, provided a signal is observed. The reconstruction permits probing the Hubble expansion and new relativistic DoF in the early universe. We quantify the impact that astrophysical foregrounds can have on these searches. Finally, we discuss the impact that these observations would have on our understanding of the fundamental models behind the string networks. Overall, we prove that LISA has great potential for probing cosmic string models and may reach tensions as low as $Gμ=10^{-16} - 10^{-17} $, which translates into energy scales of the order $10^{11}~\text{GeV}$.
Submitted 6 May, 2024;
originally announced May 2024.
-
A Survey on Contribution Evaluation in Vertical Federated Learning
Authors:
Yue Cui,
Chung-ju Huang,
Yuzhu Zhang,
Leye Wang,
Lixin Fan,
Xiaofang Zhou,
Qiang Yang
Abstract:
Vertical Federated Learning (VFL) has emerged as a critical approach in machine learning to address privacy concerns associated with centralized data storage and processing. VFL facilitates collaboration among multiple entities with distinct feature sets on the same user population, enabling the joint training of predictive models without direct data sharing. A key aspect of VFL is the fair and accurate evaluation of each entity's contribution to the learning process. This is crucial for maintaining trust among participating entities, ensuring equitable resource sharing, and fostering a sustainable collaboration framework. This paper provides a thorough review of contribution evaluation in VFL. We categorize the vast array of contribution evaluation techniques along the VFL lifecycle, granularity of evaluation, privacy considerations, and core computational methods. We also explore various tasks in VFL that involve contribution evaluation and analyze their required evaluation properties and relation to the VFL lifecycle phases. Finally, we present a vision for the future challenges of contribution evaluation in VFL. By providing a structured analysis of the current landscape and potential advancements, this paper aims to guide researchers and practitioners in the design and implementation of more effective, efficient, and privacy-centric VFL solutions. Relevant literature and open-source resources have been compiled and are being continuously updated at the GitHub repository: \url{https://github.com/cuiyuebing/VFL_CE}.
Submitted 3 May, 2024;
originally announced May 2024.
-
Distillation Matters: Empowering Sequential Recommenders to Match the Performance of Large Language Model
Authors:
Yu Cui,
Feng Liu,
Pengbo Wang,
Bohao Wang,
Heng Tang,
Yi Wan,
Jun Wang,
Jiawei Chen
Abstract:
Owing to their powerful semantic reasoning capabilities, Large Language Models (LLMs) have been effectively utilized as recommenders, achieving impressive performance. However, the high inference latency of LLMs significantly restricts their practical deployment. To address this issue, this work investigates knowledge distillation from cumbersome LLM-based recommendation models to lightweight conventional sequential models. It encounters three challenges: 1) the teacher's knowledge may not always be reliable; 2) the capacity gap between the teacher and student makes it difficult for the student to assimilate the teacher's knowledge; 3) divergence in semantic space makes it challenging to distill knowledge from embeddings. To tackle these challenges, this work proposes a novel distillation strategy, DLLM2Rec, specifically tailored for knowledge distillation from LLM-based recommendation models to conventional sequential models. DLLM2Rec comprises: 1) Importance-aware ranking distillation, which filters reliable and student-friendly knowledge by weighting instances according to teacher confidence and student-teacher consistency; 2) Collaborative embedding distillation, which integrates knowledge from teacher embeddings with collaborative signals mined from the data. Extensive experiments demonstrate the effectiveness of the proposed DLLM2Rec, boosting three typical sequential models with an average improvement of 47.97%, even enabling them to surpass LLM-based recommenders in some cases.
Submitted 3 May, 2024; v1 submitted 1 May, 2024;
originally announced May 2024.
-
Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation
Authors:
Yunhao Ge,
Xiaohui Zeng,
Jacob Samuel Huffman,
Tsung-Yi Lin,
Ming-Yu Liu,
Yin Cui
Abstract:
Existing automatic captioning methods for visual content face challenges such as lack of detail, content hallucination, and poor instruction following. In this work, we propose VisualFactChecker (VFC), a flexible training-free pipeline that generates high-fidelity and detailed captions for both 2D images and 3D objects. VFC consists of three steps: 1) proposal, where image-to-text captioning models propose multiple initial captions; 2) verification, where a large language model (LLM) utilizes tools such as object detection and VQA models to fact-check proposed captions; 3) captioning, where an LLM generates the final caption by summarizing caption proposals and the fact check verification results. In this step, VFC can flexibly generate captions in various styles following complex instructions. We conduct comprehensive captioning evaluations using four metrics: 1) CLIP-Score for image-text similarity; 2) CLIP-Image-Score for measuring the image-image similarity between the original and the reconstructed image generated by a text-to-image model using the caption; 3) human study on Amazon Mechanical Turk; 4) GPT-4V for fine-grained evaluation. Evaluation results show that VFC outperforms state-of-the-art open-sourced captioning methods for 2D images on the COCO dataset and 3D assets on the Objaverse dataset. Our study demonstrates that by combining open-source models into a pipeline, we can attain captioning capability comparable to proprietary models such as GPT-4V, despite being over 10x smaller in model size.
Submitted 30 April, 2024;
originally announced April 2024.
-
Semiparametric fiducial inference
Authors:
Yifan Cui,
Jan Hannig,
Paul Edlefsen
Abstract:
R. A. Fisher introduced the concept of fiducial as a potential replacement for the Bayesian posterior distribution in the 1930s. During the past century, fiducial approaches have been explored in various parametric and nonparametric settings. However, to the best of our knowledge, no fiducial inference has been developed in the realm of semiparametric statistics. In this paper, we propose a novel fiducial approach for semiparametric models. To streamline our presentation, we use the Cox proportional hazards model, which is the most popular model for the analysis of survival data, as a running example. Other models and extensions are also discussed. In our experiments, we find our method to perform well especially in situations when the maximum likelihood estimator fails.
Submitted 29 April, 2024;
originally announced April 2024.
-
Large Language Models for Networking: Workflow, Advances and Challenges
Authors:
Chang Liu,
Xiaohui Xie,
Xinggong Zhang,
Yong Cui
Abstract:
The networking field is characterized by its high complexity and rapid iteration, requiring extensive expertise to accomplish network tasks, ranging from network design and configuration to diagnosis and security. The inherent complexity of these tasks, coupled with the ever-changing landscape of networking technologies and protocols, poses significant hurdles for traditional machine learning-based methods. These methods often struggle to generalize and automate complex tasks in networking, as they require extensive labeled data, domain-specific feature engineering, and frequent retraining to adapt to new scenarios. However, the recent emergence of large language models (LLMs) has sparked a new wave of possibilities in addressing these challenges. LLMs have demonstrated remarkable capabilities in natural language understanding, generation, and reasoning. These models, trained on extensive data, can benefit the networking domain. Some efforts have already explored the application of LLMs in the networking domain and revealed promising results. By reviewing recent advances, we present an abstract workflow to describe the fundamental process involved in applying LLMs to networking. We introduce the highlights of existing works by category and explain in detail how they operate at different stages of the workflow. Furthermore, we delve into the challenges encountered, discuss potential solutions, and outline future research prospects. We hope that this survey will provide insight for researchers and practitioners, promoting the development of this interdisciplinary research field.
Submitted 29 April, 2024; v1 submitted 19 April, 2024;
originally announced April 2024.
-
Evaluating Character Understanding of Large Language Models via Character Profiling from Fictional Works
Authors:
Xinfeng Yuan,
Siyu Yuan,
Yuhan Cui,
Tianhe Lin,
Xintao Wang,
Rui Xu,
Jiangjie Chen,
Deqing Yang
Abstract:
Large language models (LLMs) have demonstrated impressive performance and spurred numerous AI applications, in which role-playing agents (RPAs) are particularly popular, especially for fictional characters. The prerequisite for these RPAs lies in the capability of LLMs to understand characters from fictional works. Previous efforts have evaluated this capability via basic classification tasks or characteristic imitation, failing to capture the nuanced character understanding with LLMs. In this paper, we propose evaluating LLMs' character understanding capability via the character profiling task, i.e., summarizing character profiles from corresponding materials, a widely adopted yet understudied practice for RPA development. Specifically, we construct the CroSS dataset from literature experts and assess the generated profiles by comparing ground truth references and their applicability in downstream tasks. Our experiments, which cover various summarization methods and LLMs, have yielded promising results. These results strongly validate the character understanding capability of LLMs. Resources are available at https://github.com/Joanna0123/character_profiling.
Submitted 2 July, 2024; v1 submitted 19 April, 2024;
originally announced April 2024.