MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis

Authors: Wanggui He, Siming Fu, Mushui Liu, Xierui Wang, Wenyi Xiao, Fangxun Shu, Yi Wang, Lei Zhang, Zhelun Yu, Haoyuan Li, Ziwei Huang, LeiLei Gan, Hao Jiang

Abstract: Auto-regressive models have made significant progress in the realm of language generation, yet they do not perform on par with diffusion models in the domain of image synthesis. In this work, we introduce MARS, a novel framework for T2I generation that incorporates a specially designed Semantic Vision-Language Integration Expert (SemVIE). This innovative component integrates pre-trained LLMs by independently processing linguistic and visual information, freezing the textual component while fine-tuning the visual component. This methodology preserves the NLP capabilities of LLMs while imbuing them with exceptional visual understanding. Building upon the powerful base of the pre-trained Qwen-7B, MARS stands out with its bilingual generative capabilities corresponding to both English and Chinese language prompts and the capacity for joint image and text generation. The flexibility of this framework lends itself to migration towards any-to-any task adaptability. Furthermore, MARS employs a multi-stage training strategy that first establishes robust image-text alignment through complementary bidirectional tasks and subsequently concentrates on refining the T2I generation process, significantly augmenting text-image synchrony and the granularity of image details. Notably, MARS requires only 9% of the GPU days needed by SD1.5, yet it achieves remarkable results across a variety of benchmarks, illustrating the training efficiency and the potential for swift deployment in various applications.

Submitted 11 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

Comments: 14 pages, 9 figures

arXiv:2407.07060 [pdf, other]

Imaging-based Quantum Optomechanics

Authors: Christian M. Pluchar, Wenhua He, Jack Manley, Nicolas Deshler, Saikat Guha, Dalziel J. Wilson

Abstract: In active imaging protocols, information about a landscape is encoded into the spatial mode of a scattered photon. A common assumption is that the landscape is rigid; however, in principle it can be altered by radiation pressure, a concept that has found fruitful application in the field of quantum optomechanics. Here we explore active imaging of a mechanical resonator with an eye to generalizing the concept of radiation pressure backaction to spatially multimode light. As a thought experiment, we consider imaging the flexural modes of a membrane by sorting the spatial modes of a laser reflected from its surface. We show that backaction in this setting arises from spatial photon shot noise, an effect that cannot be observed in single-mode optomechanics. We also derive the imprecision-backaction product for coherent illumination in the limit of purely spatial backaction, revealing it to be equivalent to the standard quantum limit for purely dispersive, single-mode optomechanical coupling. Finally, we show that optomechanical correlations due to spatial backaction can give rise to two-mode entangled light. In conjunction with high-$Q$ nanomechanics, our findings point to new opportunities at the interface of quantum imaging and optomechanics, including sensors and networks enhanced by spatial mode entanglement.

Submitted 9 July, 2024; originally announced July 2024.

Comments: 10 pages, 5 figures

arXiv:2407.02301 [pdf, other]

CFinBench: A Comprehensive Chinese Financial Benchmark for Large Language Models

Authors: Ying Nie, Binwei Yan, Tianyu Guo, Hao Liu, Haoyu Wang, Wei He, Binfan Zheng, Weihao Wang, Qiang Li, Weijian Sun, Yunhe Wang, Dacheng Tao

Abstract: Large language models (LLMs) have achieved remarkable performance on various NLP tasks, yet their potential in more challenging and domain-specific tasks, such as finance, has not been fully explored. In this paper, we present CFinBench, a meticulously crafted and, to date, the most comprehensive evaluation benchmark for assessing the financial knowledge of LLMs in a Chinese context. In practice, to better align with the career trajectory of Chinese financial practitioners, we build a systematic evaluation from 4 first-level categories: (1) Financial Subject: whether LLMs can memorize the necessary basic knowledge of financial subjects, such as economics, statistics and auditing. (2) Financial Qualification: whether LLMs can obtain the needed financial qualification certifications, such as certified public accountant, securities qualification and banking qualification. (3) Financial Practice: whether LLMs can fulfill practical financial jobs, such as tax consultant, junior accountant and securities analyst. (4) Financial Law: whether LLMs can meet the requirements of financial laws and regulations, such as tax law, insurance law and economic law. CFinBench comprises 99,100 questions spanning 43 second-level categories with 3 question types: single-choice, multiple-choice and judgment. We conduct extensive experiments on CFinBench with 50 representative LLMs of various model sizes. The results show that GPT4 and some Chinese-oriented models lead the benchmark, with the highest average accuracy being 60.16%, highlighting the challenge presented by CFinBench. The dataset and evaluation code are available at https://cfinbench.github.io/.

Submitted 2 July, 2024; originally announced July 2024.

arXiv:2407.00079 [pdf, other]

Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving

Authors: Ruoyu Qin, Zheming Li, Weiran He, Mingxing Zhang, Yongwei Wu, Weimin Zheng, Xinran Xu

Abstract: Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI. It features a KVCache-centric disaggregated architecture that separates the prefill and decoding clusters. It also leverages the underutilized CPU, DRAM, and SSD resources of the GPU cluster to implement a disaggregated cache of KVCache. The core of Mooncake is its KVCache-centric scheduler, which balances maximizing overall effective throughput while meeting latency-related Service Level Objectives (SLOs). Unlike traditional studies that assume all requests will be processed, Mooncake faces challenges due to highly overloaded scenarios. To mitigate these, we developed a prediction-based early rejection policy. Experiments show that Mooncake excels in long-context scenarios. Compared to the baseline method, Mooncake can achieve up to a 525% increase in throughput in certain simulated scenarios while adhering to SLOs. Under real workloads, Mooncake's innovative architecture enables Kimi to handle 75% more requests.

Submitted 9 July, 2024; v1 submitted 23 June, 2024; originally announced July 2024.

Comments: 23 pages, 13 figures

arXiv:2406.18599 [pdf, other]

Fudan Multi-purpose Active TArget Time Projection Chamber (fMeta-TPC) for Photonnuclear Reaction Experiments

Authors: Huang-Kai Wu, Xi-Yang Wang, Yu-Miao Wang, You-Jing Wang, De-Qing Fang, Wan-Bing He, Wei-Hu Ma, Xi-Guang Cao, Chang-Bo Fu, Xian-Gai Deng, Yu-Gang Ma

Abstract: Active Target Time Projection Chambers (AT-TPCs) are state-of-the-art tools in the field of low-energy nuclear physics, particularly suitable for experiments using low-intensity radioactive ion beams or gamma rays. The Fudan Multi-purpose Active Target Time Projection Chamber (fMeta-TPC) with 2048 channels has been developed to study $α$-clustering nuclei. In this work, the focus is on the study of the photonuclear reaction with the Laser Compton Scattering (LCS) gamma source, especially for the decay of the highly excited $α$-cluster state. The design of fMeta-TPC is described, and its offline performance is comprehensively evaluated using an ultraviolet (UV) laser and a $^{241}$Am $α$ source. The results show that the detector has an intrinsic angular resolution within 0.30$^{\circ}$ and an energy resolution of 6.85% for 3.0 MeV $α$ particles. The gain uniformity of the detector is about 10% (RMS/Mean), tested with a $^{55}$Fe X-ray source.

Submitted 14 June, 2024; originally announced June 2024.

Comments: 10 pages, 12 figures

arXiv:2406.18049 [pdf]

Improving Entity Recognition Using Ensembles of Deep Learning and Fine-tuned Large Language Models: A Case Study on Adverse Event Extraction from Multiple Sources

Authors: Yiming Li, Deepthi Viswaroopan, William He, Jianfu Li, Xu Zuo, Hua Xu, Cui Tao

Abstract: Adverse event (AE) extraction following COVID-19 vaccines from text data is crucial for monitoring and analyzing the safety profiles of immunizations. Traditional deep learning models are adept at learning intricate feature representations and dependencies in sequential data, but often require extensive labeled data. In contrast, large language models (LLMs) excel in understanding contextual information, but exhibit unstable performance on named entity recognition tasks, possibly due to their broad but unspecific training. This study aims to evaluate the effectiveness of LLMs and traditional deep learning models in AE extraction, and to assess the impact of ensembling these models on performance. In this study, we utilized reports and posts from the VAERS (n=621), Twitter (n=9,133), and Reddit (n=131) as our corpora. Our goal was to extract three types of entities: "vaccine", "shot", and "ae". We explored and fine-tuned (except GPT-4) multiple LLMs, including GPT-2, GPT-3.5, GPT-4, and Llama-2, as well as traditional deep learning models like RNN and BioBERT. To enhance performance, we created ensembles of the three models with the best performance. For evaluation, we used strict and relaxed F1 scores to evaluate the performance for each entity type, and micro-average F1 was used to assess the overall performance. The ensemble model achieved the highest performance in "vaccine", "shot", and "ae" with strict F1-scores of 0.878, 0.930, and 0.925, respectively, along with a micro-average score of 0.903. In conclusion, this study demonstrates the effectiveness and robustness of ensembling fine-tuned traditional deep learning models and LLMs, for extracting AE-related information. This study contributes to the advancement of biomedical natural language processing, providing valuable insights into improving AE extraction from text data for pharmacovigilance and public health surveillance.

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.17838 [pdf, other]

InFiConD: Interactive No-code Fine-tuning with Concept-based Knowledge Distillation

Authors: Jinbin Huang, Wenbin He, Liang Gou, Liu Ren, Chris Bryan

Abstract: The emergence of large-scale pre-trained models has heightened their application in various downstream tasks, yet deployment is a challenge in environments with limited computational resources. Knowledge distillation has emerged as a solution in such scenarios, whereby knowledge from large teacher models is transferred into smaller student models, but this is a non-trivial process that traditionally requires technical expertise in AI/ML. To address these challenges, this paper presents InFiConD, a novel framework that leverages visual concepts to implement the knowledge distillation process and enable subsequent no-code fine-tuning of student models. We develop a novel knowledge distillation pipeline based on extracting text-aligned visual concepts from a concept corpus using multimodal models, and construct highly interpretable linear student models based on visual concepts that mimic a teacher model in a response-based manner. InFiConD's interface allows users to interactively fine-tune the student model by manipulating concept influences directly in the user interface. We validate InFiConD via a robust usage scenario and user study. Our findings indicate that InFiConD's human-in-the-loop and visualization-driven approach enables users to effectively create and analyze student models, understand how knowledge is transferred, and efficiently perform fine-tuning operations. We discuss how this work highlights the potential of interactive and visual methods in making knowledge distillation and subsequent no-code fine-tuning more accessible and adaptable to a wider range of users with domain-specific demands.

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.16966 [pdf, other]

Mitigating Noisy Supervision Using Synthetic Samples with Soft Labels

Authors: Yangdi Lu, Wenbo He

Abstract: Noisy labels are ubiquitous in real-world datasets, especially in the large-scale ones derived from crowdsourcing and web searching. It is challenging to train deep neural networks with noisy datasets since the networks are prone to overfitting the noisy labels during training, resulting in poor generalization performance. During an early learning phase, deep neural networks have been observed to fit the clean samples before memorizing the mislabeled samples. In this paper, we dig deeper into the representation distributions in the early learning phase and find that, regardless of their noisy labels, learned representations of images from the same category still congregate together. Inspired by this, we propose a framework that trains the model with new synthetic samples to mitigate the impact of noisy labels. Specifically, we propose a mixing strategy to create the synthetic samples by aggregating original samples with their top-K nearest neighbours, wherein the weights are calculated using a mixture model learned from the per-sample loss distribution. To enhance the performance in the presence of extreme label noise, we estimate the soft targets by gradually correcting the noisy labels. Furthermore, we demonstrate that the estimated soft targets yield a more accurate approximation to ground truth labels and the proposed method produces a superior quality of learned representations with more separated and clearly bounded clusters. Extensive experiments on two benchmarks (CIFAR-10 and CIFAR-100) and two large-scale real-world datasets (Clothing1M and Webvision) demonstrate that our approach outperforms the state-of-the-art methods and show the robustness of the learned representations.

Submitted 22 June, 2024; originally announced June 2024.

Comments: Noisy labels, Machine learning, Similarity Search

arXiv:2406.15982 [pdf, other]

Learning with Noisy Ground Truth: From 2D Classification to 3D Reconstruction

Authors: Yangdi Lu, Wenbo He

Abstract: Deep neural networks have been highly successful in data-intensive computer vision applications, but such success relies heavily on massive and clean data. In real-world scenarios, clean data is sometimes difficult to obtain. For example, in image classification and segmentation tasks, precise annotation of millions of samples is generally very expensive and time-consuming. In the 3D static scene reconstruction task, most NeRF-related methods rely on the foundational assumption of a static scene (e.g. consistent lighting conditions and persistent object positions), which is often violated in real-world scenarios. To address these problems, learning with noisy ground truth (LNGT) has emerged as an effective learning method and shows great potential. In this short survey, we propose a formal definition to unify the analysis of LNGT in the context of different machine learning tasks (classification and regression). Based on this definition, we propose a novel taxonomy to classify the existing work according to the error decomposition with the fundamental definition of machine learning. Further, we provide in-depth analysis of the memorization effect and insightful discussion of potential future research opportunities from 2D classification to 3D reconstruction, in the hope of providing guidance to follow-up research.

Submitted 22 June, 2024; originally announced June 2024.

Comments: Computer vision, Noisy Labels, 3D reconstruction, 3D Gaussian Splats, (Work still in progress)

arXiv:2406.15973 [pdf, ps, other]

Performance of the plastic scintillator modules for the top veto tracker of the Taishan Antineutrino Observatory

Authors: Guang Luo, Xiaohao Yin, Fengpeng An, Zhimin Wang, Y. K. Hor, Peizhi Lu, Ruhui Li, Yichen Li, Wei He, Wei Wang, Xiang Xiao

Abstract: For tracking and tagging the cosmic-ray muon (CR-muon), the Taishan Antineutrino Observatory (TAO) experiment is equipped with a top veto tracker (TVT) system composed of 160 modules, each consisting of a plastic scintillator (PS) strip as target material, embedded wavelength shifting fiber (WLS-fiber) as photon collection and transmission medium, and silicon photomultipliers (SiPMs) at both ends as read-out. This article introduces the unique design of the module and reports the excellent performance of all modules, providing guidance and an important reference for the process design of scintillation detectors with WLS-fibers. In general, when a CR-muon hits the center of the plastic scintillator and no optical grease is used, the most probable value of the signal amplitude at one end of the PS strip is greater than 40.8 p.e. and 51.5 p.e. for all the 2 m-length modules and 1.5 m-length modules, respectively. The CR-muon tagging efficiency of the PS module is measured to be more than 99.3%, which meets the requirement of TAO.

Submitted 22 June, 2024; originally announced June 2024.

arXiv:2406.15175 [pdf, other]

Enhancing Idiomatic Representation in Multiple Languages via an Adaptive Contrastive Triplet Loss

Authors: Wei He, Marco Idiart, Carolina Scarton, Aline Villavicencio

Abstract: Accurately modeling idiomatic or non-compositional language has been a longstanding challenge in Natural Language Processing (NLP). This is partly because these expressions do not derive their meanings solely from their constituent words, but also due to the scarcity of relevant data resources, and their impact on the performance of downstream tasks such as machine translation and simplification. In this paper we propose an approach to model idiomaticity effectively using a triplet loss that incorporates the asymmetric contribution of component words to an idiomatic meaning for training language models by using adaptive contrastive learning and resampling miners to build an idiomatic-aware learning objective. Our proposed method is evaluated on a SemEval challenge and outperforms previous alternatives significantly in many metrics.

Submitted 21 June, 2024; originally announced June 2024.

arXiv:2406.10652 [pdf, other]

MDeRainNet: An Efficient Neural Network for Rain Streak Removal from Macro-pixel Images

Authors: Tao Yan, Weijiang He, Chenglong Wang, Xiangjie Zhu, Yinghui Wang, Rynson W. H. Lau

Abstract: Since rainy weather always degrades image quality and poses significant challenges to most computer vision-based intelligent systems, image de-raining has been a hot research topic. Fortunately, in a rainy light field (LF) image, background obscured by rain streaks in one sub-view may be visible in the other sub-views, and implicit depth information and recorded 4D structural information may benefit rain streak detection and removal. However, existing LF image rain removal methods either do not fully exploit the global correlations of 4D LF data or only utilize partial sub-views, resulting in sub-optimal rain removal performance and unequal quality across the de-rained sub-views. In this paper, we propose an efficient network, called MDeRainNet, for rain streak removal from LF images. The proposed network adopts a multi-scale encoder-decoder architecture, which directly works on Macro-pixel images (MPIs) to improve the rain removal performance. To fully model the global correlation between the spatial and the angular information, we propose an Extended Spatial-Angular Interaction (ESAI) module to merge them, in which a simple and effective Transformer-based Spatial-Angular Interaction Attention (SAIA) block is also proposed for modeling long-range geometric correlations and making full use of the angular information. Furthermore, to improve the generalization performance of our network on real-world rainy scenes, we propose a novel semi-supervised learning framework for our MDeRainNet, which utilizes multi-level KL loss to bridge the domain gap between features of synthetic and real-world rain streaks and introduces colored-residue image guided contrastive regularization to reconstruct rain-free images. Extensive experiments conducted on synthetic and real-world LFIs demonstrate that our method outperforms the state-of-the-art methods both quantitatively and qualitatively.

Submitted 15 June, 2024; originally announced June 2024.

Comments: 13 pages, 13 figures, 4 tables

arXiv:2406.10026 [pdf]

Retiming dynamics of harmonically modelocked laser solitons in a self-driven optomechanical lattice

Authors: Xiaocong Wang, Benhai Wang, Wenbin He, Xintong Zhang, Qi Huang, Zhiyuan Huang, Xin Jiang, Philip St. J. Russell, Meng Pang

Abstract: Harmonic mode-locking, realized actively or passively, is an effective technique for increasing the repetition rate of lasers, with important applications in optical sampling, laser micro-machining and frequency metrology. It is critically important to understand how a harmonically mode-locked pulse train responds to external perturbations and noise, so as to make sure that it is stable and resistant to noise. Here, in a series of carefully designed experiments, we elucidate the retiming dynamics of laser pulses generated in a soliton fiber laser harmonically mode-locked at ~2 GHz to the acoustic resonance in a photonic crystal fiber (PCF) core. We characterize the self-driven optomechanical lattice along the PCF using a homodyne set-up, and reveal that each soliton undergoes damped oscillatory retiming within its trapping potential after an abrupt perturbation. In addition we show, through statistical analysis of the intra-cavity pulse spacing, how the trapping potentials are effective for suppressing timing jitter. The experimental results are well described using a dynamic model including dissipation, which provides valuable insight into the stability and noise performance of optomechanically mode-locked laser systems, and may also be useful for studying complex inter-soliton interactions.

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.09844 [pdf, other]

Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy

Authors: Linhan Ma, Xinfa Zhu, Yuanjun Lv, Zhichao Wang, Ziqian Wang, Wendi He, Hongbin Zhou, Lei Xie

Abstract: Zero-shot voice conversion (VC) aims to transform source speech into arbitrary unseen target voice while keeping the linguistic content unchanged. Recent VC methods have made significant progress, but semantic losses in the decoupling process as well as training-inference mismatch still hinder conversion performance. In this paper, we propose Vec-Tok-VC+, a novel prompt-based zero-shot VC model improved from Vec-Tok Codec, achieving voice conversion given only a 3s target speaker prompt. We design a residual-enhanced K-Means decoupler to enhance the semantic content extraction with a two-layer clustering process. Besides, we employ teacher-guided refinement to simulate the conversion process to eliminate the training-inference mismatch, forming a dual-mode training strategy. Furthermore, we design a multi-codebook progressive loss function to constrain the layer-wise output of the model from coarse to fine to improve speaker similarity and content accuracy. Objective and subjective evaluations demonstrate that Vec-Tok-VC+ outperforms the strong baselines in naturalness, intelligibility, and speaker similarity.

Submitted 14 June, 2024; originally announced June 2024.

Comments: Accepted by INTERSPEECH2024

arXiv:2406.08499 [pdf, ps, other]

More Efficient $k$-wise Independent Permutations from Random Reversible Circuits via log-Sobolev Inequalities

Authors: Lucas Gretta, William He, Angelos Pelecanos

Abstract: We prove that the permutation computed by a reversible circuit with $\tilde{O}(nk\cdot \log(1/\varepsilon))$ random $3$-bit gates is $\varepsilon$-approximately $k$-wise independent. Our bound improves on currently known bounds in the regime when the approximation error $\varepsilon$ is not too small. We obtain our results by analyzing the log-Sobolev constants of appropriate Markov chains rather than their spectral gaps.

Submitted 8 May, 2024; originally announced June 2024.

Comments: 19 pages

arXiv:2406.08372 [pdf, other]

APSeg: Auto-Prompt Network for Cross-Domain Few-Shot Semantic Segmentation

Authors: Weizhao He, Yang Zhang, Wei Zhuo, Linlin Shen, Jiaqi Yang, Songhe Deng, Liang Sun

Abstract: Few-shot semantic segmentation (FSS) endeavors to segment unseen classes with only a few labeled samples. Current FSS methods are commonly built on the assumption that their training and application scenarios share similar domains, and their performance degrades significantly when applied to a distinct domain. To this end, we propose to leverage the cutting-edge foundation model, the Segment Anything Model (SAM), for generalization enhancement. SAM, however, performs unsatisfactorily on domains that are distinct from its training data, which primarily comprise natural scene images, and it does not support automatic segmentation of specific semantics due to its interactive prompting mechanism. In our work, we introduce APSeg, a novel auto-prompt network for cross-domain few-shot semantic segmentation (CD-FSS), which is designed to be auto-prompted for guiding cross-domain segmentation. Specifically, we propose a Dual Prototype Anchor Transformation (DPAT) module that fuses pseudo query prototypes extracted based on cycle-consistency with support prototypes, allowing features to be transformed into a more stable domain-agnostic space. Additionally, a Meta Prompt Generator (MPG) module is introduced to automatically generate prompt embeddings, eliminating the need for manual visual prompts. We build an efficient model which can be applied directly to target domains without fine-tuning. Extensive experiments on four cross-domain datasets show that our model outperforms the state-of-the-art CD-FSS method by 5.24% and 3.10% in average accuracy on 1-shot and 5-shot settings, respectively.

Submitted 12 June, 2024; v1 submitted 12 June, 2024; originally announced June 2024.

Comments: 15 pages, 9 figures

arXiv:2406.07209 [pdf, other]

MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance

Authors: X. Wang, Siming Fu, Qihan Huang, Wanggui He, Hao Jiang

Abstract: Recent advancements in text-to-image generation models have dramatically enhanced the generation of photorealistic images from textual prompts, leading to an increased interest in personalized text-to-image applications, particularly in multi-subject scenarios. However, these advances are hindered by two main challenges: firstly, the need to accurately maintain the details of each referenced subject in accordance with the textual descriptions; and secondly, the difficulty in achieving a cohesive representation of multiple subjects in a single image without introducing inconsistencies. To address these concerns, our research introduces the MS-Diffusion framework for layout-guided zero-shot image personalization with multi-subjects. This innovative approach integrates grounding tokens with the feature resampler to maintain detail fidelity among subjects. With the layout guidance, MS-Diffusion further improves the cross-attention to adapt to the multi-subject inputs, ensuring that each subject condition acts on specific areas. The proposed multi-subject cross-attention orchestrates harmonious inter-subject compositions while preserving the control of texts. Comprehensive quantitative and qualitative experiments affirm that this method surpasses existing models in both image and text fidelity, promoting the development of personalized text-to-image generation.

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.07064 [pdf, other]

Modeling fibrous tissue in vascular fluid-structure interaction: a morphology-based pipeline and biomechanical significance

Authors: Yujie Sun, Jiayi Huang, Qingshuang Lu, Xinhai Yue, Xuanming Huang, Wei He, Yun Shi, Ju Liu

Abstract: We propose a suite of technologies for analyzing the interaction between anisotropic arterial walls and blood flow for subject-specific geometries. Utilizing an established lumen modeling strategy, we present a comprehensive pipeline for generating the thick-walled artery models. Through a specialized mesh generation procedure, we obtain the meshes for the arterial lumen and wall with mesh continuity across the interface ensured. Exploiting the centerline information, a series of procedures is introduced for generating local basis vectors within the arterial wall. The procedures are tailored to handle thick-walled and, in particular, aneurysmatic tissues in which the basis vectors may exhibit transmural variations. Additionally, we propose methods to accurately identify the centerline in multi-branched vessels and bifurcating regions. The developed fiber generation method is evaluated against the strategy using linear elastic analysis, demonstrating that the proposed approach yields satisfactory fiber definitions in the considered benchmark. Finally, we examine the impact of anisotropic arterial wall models on the vascular fluid-structure interaction analysis through numerical examples. For comparison purposes, the neo-Hookean model is considered. The first case involves an idealized curved geometry, while the second case studies an image-based abdominal aorta model. The numerical results reveal that the deformation and stress distribution are critically related to the constitutive model of the wall, while the hemodynamic factors are less sensitive to the wall model. This work paves the way for more accurate image-based vascular modeling and enhances the prediction of arterial behavior under physiologically realistic conditions.

Submitted 20 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.05271 [pdf, other]

USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation

Authors: Xiaoqi Wang, Wenbin He, Xiwei Xuan, Clint Sebastian, Jorge Piazentin Ono, Xin Li, Sima Behpour, Thang Doan, Liang Gou, Han Wei Shen, Liu Ren

Abstract: The open-vocabulary image segmentation task involves partitioning images into semantically meaningful segments and classifying them with flexible text-defined categories. The recent vision-based foundation models such as the Segment Anything Model (SAM) have shown superior performance in generating class-agnostic image segments. The main challenge in open-vocabulary image segmentation now lies in accurately classifying these segments into text-defined categories. In this paper, we introduce the Universal Segment Embedding (USE) framework to address this challenge. This framework is comprised of two key components: 1) a data pipeline designed to efficiently curate a large amount of segment-text pairs at various granularities, and 2) a universal segment embedding model that enables precise segment classification into a vast range of text-defined categories. The USE model can not only help open-vocabulary image segmentation but also facilitate other downstream tasks (e.g., querying and ranking). Through comprehensive experimental studies on semantic segmentation and part segmentation benchmarks, we demonstrate that the USE framework outperforms state-of-the-art open-vocabulary segmentation methods.

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.04151 [pdf, other]

AgentGym: Evolving Large Language Model-based Agents across Diverse Environments

Authors: Zhiheng Xi, Yiwen Ding, Wenxiang Chen, Boyang Hong, Honglin Guo, Junzhe Wang, Dingwen Yang, Chenyang Liao, Xin Guo, Wei He, Songyang Gao, Lu Chen, Rui Zheng, Yicheng Zou, Tao Gui, Qi Zhang, Xipeng Qiu, Xuanjing Huang, Zuxuan Wu, Yu-Gang Jiang

Abstract: Building generalist agents that can handle diverse tasks and evolve themselves across different environments is a long-term goal in the AI community. Large language models (LLMs) are considered a promising foundation to build such agents due to their generalized capabilities. Current approaches either have LLM-based agents imitate expert-provided trajectories step-by-step, requiring human supervision, which is hard to scale and limits environmental exploration; or they let agents explore and learn in isolated environments, resulting in specialist agents with limited generalization. In this paper, we take the first step towards building generally-capable LLM-based agents with self-evolution ability. We identify a trinity of ingredients: 1) diverse environments for agent exploration and learning, 2) a trajectory set to equip agents with basic capabilities and prior knowledge, and 3) an effective and scalable evolution method. We propose AgentGym, a new framework featuring a variety of environments and tasks for broad, real-time, uni-format, and concurrent agent exploration. AgentGym also includes a database with expanded instructions, a benchmark suite, and high-quality trajectories across environments. Next, we propose a novel method, AgentEvol, to investigate the potential of agent self-evolution beyond previously seen data across tasks and environments. Experimental results show that the evolved agents can achieve results comparable to SOTA models. We release the AgentGym suite, including the platform, dataset, benchmark, checkpoints, and algorithm implementations. The AgentGym suite is available at https://github.com/WooooDyy/AgentGym.

Submitted 6 June, 2024; originally announced June 2024.

Comments: Project site: https://agentgym.github.io

arXiv:2406.01984 [pdf, other]

Unified one-parameter scaling function for Anderson localization transitions in non-reciprocal non-Hermitian systems

Authors: C. Wang, Wenxue He, X. R. Wang, Hechen Ren

Abstract: By using dimensionless conductances as scaling variables, the conventional one-parameter scaling theory of localization fails for non-reciprocal non-Hermitian systems such as the Hatano-Nelson model. Here, we propose a one-parameter scaling function using the participation ratio as the scaling variable. Employing a highly accurate numerical procedure based on exact diagonalization, we demonstrate that this one-parameter scaling function can describe Anderson localization transitions of non-reciprocal non-Hermitian systems in one and two dimensions of symmetry classes AI and A. The critical exponents of correlation lengths depend on symmetries and dimensionality only, a typical feature of universality. Moreover, we derive a complex-gap equation based on the self-consistent Born approximation that can determine the disorder at which the point gap closes. The obtained disorders match perfectly the critical disorders of Anderson localization transitions from the one-parameter scaling function. Finally, we show that the one-parameter scaling function is also valid for Anderson localization transitions in reciprocal non-Hermitian systems such as two-dimensional class AII$^\dagger$ and can, thus, serve as a unified scaling function for disordered non-Hermitian systems.

Submitted 4 June, 2024; originally announced June 2024.

Comments: 6 pages, 2 figures

arXiv:2406.01060 [pdf, other]

Mechanical dynamics around higher-order exceptional point in magno-optomechanics

Authors: Wen-Di He, Xiao-Hong Fan, Ming-Yue Liu, Guo-Qiang Zhang, Hai-Chao Li, Wei Xiong

Abstract: We theoretically study diverse exceptional points (EPs) in an experimentally feasible magno-optomechanical system consisting of an optomechanical subsystem coupled to a magnomechanical subsystem via direct physical contact. By adiabatically eliminating both the cavity and the Kittel mode, dissipative and parity-time symmetric exceptional points can be observed. When only the cavity mode is eliminated, a second- (third-) order pseudo-Hermitian EP emerges for nondegenerate (degenerate) mechanical modes. The distinct dynamical behavior of the two mechanical modes around these EPs is further studied. Our proposal provides a promising way to engineer diverse EPs and quantify non-Hermitian phase transitions with exceptional dynamical behavior in magno-optomechanics.

Submitted 3 June, 2024; originally announced June 2024.

Comments: 6 pages, 5 figures

arXiv:2405.20046 [pdf, other]

Cross-Training with Multi-View Knowledge Fusion for Heterogenous Federated Learning

Authors: Zhuang Qi, Lei Meng, Weihao He, Ruohan Zhang, Yu Wang, Xin Qi, Xiangxu Meng

Abstract: Federated learning benefits from cross-training strategies, which enable models to train on data from distinct sources to improve their generalization capability. However, the data heterogeneity between sources may lead models to gradually forget previously acquired knowledge when undergoing cross-training to adapt to new tasks or data sources. We argue that integrating personalized and global knowledge to gather information from multiple perspectives could potentially improve performance. To achieve this goal, this paper presents a novel approach that enhances federated learning through a cross-training scheme incorporating multi-view information. Specifically, the proposed method, termed FedCT, includes three main modules. The consistency-aware knowledge broadcasting module aims to optimize model assignment strategies, which enhances collaborative advantages between clients and achieves an efficient federated learning process. The multi-view knowledge-guided representation learning module leverages fused prototypical knowledge from both global and local views to enhance the preservation of local knowledge before and after model exchange, as well as to ensure consistency between local and global knowledge. The mixup-based feature augmentation module aggregates rich information to further increase the diversity of feature spaces, which enables the model to better discriminate complex samples. Extensive experiments were conducted on four datasets in terms of performance comparison, ablation study, in-depth analysis and case study. The results demonstrated that FedCT alleviates knowledge forgetting from both local and global views, which enables it to outperform state-of-the-art methods.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.19245 [pdf, ps, other]

Efficient Optimal Control of Open Quantum Systems

Authors: Wenhao He, Tongyang Li, Xiantao Li, Zecheng Li, Chunhao Wang, Ke Wang

Abstract: The optimal control problem for open quantum systems can be formulated as a time-dependent Lindbladian that is parameterized by a number of time-dependent control variables. Given an observable and an initial state, the goal is to tune the control variables so that the expected value of some observable with respect to the final state is maximized. In this paper, we present algorithms for solving this optimal control problem efficiently, i.e., having a poly-logarithmic dependency on the system dimension, which is exponentially faster than best-known classical algorithms. Our algorithms are hybrid, consisting of both quantum and classical components. The quantum procedure simulates time-dependent Lindblad evolution that drives the initial state to the final state, and it also provides access to the gradients of the objective function via quantum gradient estimation. The classical procedure uses the gradient information to update the control variables. At the technical level, we provide the first (to the best of our knowledge) simulation algorithm for time-dependent Lindbladians with an $\ell_1$-norm dependence. As an alternative, we also present a simulation algorithm in the interaction picture to improve the algorithm for the cases where the time-independent component of a Lindbladian dominates the time-dependent part. On the classical side, we heavily adapt the state-of-the-art classical optimization analysis to interface with the quantum part of our algorithms. Both the quantum simulation techniques and the classical optimization analyses might be of independent interest.

Submitted 29 May, 2024; originally announced May 2024.

Comments: 52 pages. To appear in the proceedings of TQC 2024

arXiv:2405.17915 [pdf, other]

Long Context is Not Long at All: A Prospector of Long-Dependency Data for Large Language Models

Authors: Longze Chen, Ziqiang Liu, Wanwei He, Yunshui Li, Run Luo, Min Yang

Abstract: Long-context modeling capabilities are important for large language models (LLMs) in various applications. However, directly training LLMs with long context windows is insufficient to enhance this capability since some training samples do not exhibit strong semantic dependencies across long contexts. In this study, we propose a data mining framework, ProLong, that can assign each training sample a long dependency score, which can be used to rank and filter samples that are more advantageous for enhancing long-context modeling abilities in LLM training. Specifically, we first use delta perplexity scores to measure the Dependency Strength between text segments in a given document. Then we refine this metric based on the Dependency Distance of these segments to incorporate spatial relationships across long contexts. Final results are calibrated with a Dependency Specificity metric to prevent trivial dependencies introduced by repetitive patterns. Moreover, a random sampling approach is proposed to optimize the computational efficiency of ProLong. Comprehensive experiments on multiple benchmarks indicate that ProLong effectively identifies documents that carry long dependencies and LLMs trained on these documents exhibit significantly enhanced long-context modeling capabilities.

Submitted 28 May, 2024; originally announced May 2024.

Comments: 13 pages, 5 figures, ACL 2024

arXiv:2405.17792 [pdf, other]

JUNO Sensitivity to Invisible Decay Modes of Neutrons

Authors: JUNO Collaboration, Angel Abusleme, Thomas Adam, Kai Adamowicz, Shakeel Ahmad, Rizwan Ahmed, Sebastiano Aiello, Fengpeng An, Qi An, Giuseppe Andronico, Nikolay Anfimov, Vito Antonelli, Tatiana Antoshkina, João Pedro Athayde Marcondes de André, Didier Auguste, Weidong Bai, Nikita Balashov, Wander Baldini, Andrea Barresi, Davide Basilico, Eric Baussan, Marco Bellato, Marco Beretta, Antonio Bergnoli, Daniel Bick, et al. (635 additional authors not shown)

Abstract: We explore the decay of bound neutrons into invisible particles (e.g., $n \rightarrow 3\nu$ or $nn \rightarrow 2\nu$) in the JUNO liquid scintillator detector. The invisible decay includes two decay modes: $ n \rightarrow { inv} $ and $ nn \rightarrow { inv} $. The invisible decays of $s$-shell neutrons in $^{12}{\rm C}$ will leave a highly excited residual nucleus. Subsequently, some de-excitation modes of the excited residual nuclei can produce a time- and space-correlated triple coincidence signal in the JUNO detector. Based on a full Monte Carlo simulation informed with the latest available data, we estimate all backgrounds, including inverse beta decay events of the reactor antineutrino $\bar{\nu}_e$, natural radioactivity, cosmogenic isotopes and neutral current interactions of atmospheric neutrinos. Pulse shape discrimination and multivariate analysis techniques are employed to further suppress backgrounds. With two years of exposure, JUNO is expected to give an order of magnitude improvement compared to the current best limits. After 10 years of data taking, the JUNO expected sensitivities at a 90% confidence level are $\tau/B( n \rightarrow { inv} ) > 5.0 \times 10^{31} \, {\rm yr}$ and $\tau/B( nn \rightarrow { inv} ) > 1.4 \times 10^{32} \, {\rm yr}$.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 28 pages, 7 figures, 4 tables

arXiv:2405.17790 [pdf, other]

Instruct-ReID++: Towards Universal Purpose Instruction-Guided Person Re-identification

Authors: Weizhen He, Yiheng Deng, Yunfeng Yan, Feng Zhu, Yizhou Wang, Lei Bai, Qingsong Xie, Donglian Qi, Wanli Ouyang, Shixiang Tang

Abstract: Human intelligence can retrieve any person according to both visual and language descriptions. However, the current computer vision community studies specific person re-identification (ReID) tasks in different scenarios separately, which limits the applications in the real world. This paper strives to resolve this problem by proposing a novel instruct-ReID task that requires the model to retrieve images according to the given image or language instructions. Instruct-ReID is the first exploration of a general ReID setting, where 6 existing ReID tasks can be viewed as special cases by assigning different instructions. To facilitate research in this new instruct-ReID task, we propose a large-scale OmniReID++ benchmark equipped with diverse data and comprehensive evaluation methods, e.g., task-specific and task-free evaluation settings. In the task-specific evaluation setting, gallery sets are categorized according to specific ReID tasks. We propose a novel baseline model, IRM, with an adaptive triplet loss to handle various retrieval tasks within a unified framework. For the task-free evaluation setting, where target person images are retrieved from task-agnostic gallery sets, we further propose a new method called IRM++ with novel memory bank-assisted learning. Extensive evaluations of IRM and IRM++ on the OmniReID++ benchmark demonstrate the superiority of our proposed methods, achieving state-of-the-art performance on 10 test sets. The datasets, the model, and the code will be available at https://github.com/hwz-zju/Instruct-ReID

Submitted 27 May, 2024; originally announced May 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2306.07520

arXiv:2405.17470 [pdf, other]

Athena: Efficient Block-Wise Post-Training Quantization for Large Language Models Using Second-Order Matrix Derivative Information

Authors: Yanshu Wang, Wenyang He, Tong Yang

Abstract: Large Language Models (LLMs) have significantly advanced natural language processing tasks such as machine translation, text generation, and sentiment analysis. However, their large size, often consisting of billions of parameters, poses challenges for storage, computation, and deployment, particularly in resource-constrained environments like mobile devices and edge computing platforms. Effective compression and quantization techniques are crucial for addressing these issues, reducing memory footprint and computational requirements without significantly compromising performance. Traditional methods that uniformly map parameters to compressed spaces fail to account for the uneven distribution of parameters, leading to substantial accuracy loss. In this work, we propose Athena, a novel algorithm for efficient block-wise post-training quantization of LLMs. Athena leverages Second-Order Matrix Derivative Information to guide the quantization process using the curvature information of the loss landscape. By grouping parameters by columns or rows and iteratively optimizing the quantization process, Athena updates the model parameters and Hessian matrix to achieve significant compression while maintaining high accuracy. This makes Athena a practical solution for deploying LLMs in various settings.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.17459 [pdf]

Integrating Medical Imaging and Clinical Reports Using Multimodal Deep Learning for Advanced Disease Analysis

Authors: Ziyan Yao, Fei Lin, Sheng Chai, Weijie He, Lu Dai, Xinghui Fei

Abstract: In this paper, an innovative multi-modal deep learning model is proposed to deeply integrate heterogeneous information from medical images and clinical reports. First, for medical images, convolutional neural networks are used to extract high-dimensional features and capture key visual information such as focal details, texture and spatial distribution. Second, for clinical report text, a bidirectional long short-term memory network combined with an attention mechanism is used for deep semantic understanding, and key statements related to the disease are accurately captured. The two features interact and integrate effectively through the designed multi-modal fusion layer to realize the joint representation learning of image and text. In the empirical study, we selected a large medical image database covering a variety of diseases, combined with corresponding clinical reports for model training and validation. The proposed multimodal deep learning model demonstrated substantial superiority in the realms of disease classification, lesion localization, and clinical description generation, as evidenced by the experimental results.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.15232 [pdf, other]

DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception

Authors: Run Luo, Yunshui Li, Longze Chen, Wanwei He, Ting-En Lin, Ziqiang Liu, Lei Zhang, Zikai Song, Xiaobo Xia, Tongliang Liu, Min Yang, Binyuan Hui

Abstract: The development of large language models (LLMs) has significantly advanced the emergence of large multimodal models (LMMs). While LMMs have achieved tremendous success by promoting the synergy between multimodal comprehension and creation, they often face challenges when confronted with out-of-distribution data. This is primarily due to their reliance on image encoders trained to encode images into task-relevant features, which may lead them to disregard irrelevant details. Delving into the modeling capabilities of diffusion models for images naturally prompts the question: Can diffusion models serve as the eyes of large language models for image perception? In this paper, we propose DEEM, a simple and effective approach that utilizes the generative feedback of diffusion models to align the semantic distributions of the image encoder. This addresses the drawbacks of previous methods that solely relied on image encoders like ViT, thereby enhancing the model's resilience against out-of-distribution samples and reducing visual hallucinations. Importantly, this is achieved without requiring additional training modules and with fewer training parameters. We extensively evaluated DEEM on both our newly constructed RobustVQA benchmark and another well-known benchmark, POPE, for object hallucination. Compared to the state-of-the-art interleaved content generation models, DEEM exhibits enhanced robustness and a superior capacity to alleviate model hallucinations while utilizing fewer trainable parameters, less pre-training data (10%), and a smaller base model size.

Submitted 3 July, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

Comments: 25 pages. arXiv admin note: text overlap with arXiv:2401.10208 by other authors

arXiv:2405.14636 [pdf, other]

PerLLM: Personalized Inference Scheduling with Edge-Cloud Collaboration for Diverse LLM Services

Authors: Zheming Yang, Yuanhao Yang, Chang Zhao, Qi Guo, Wenkai He, Wen Ji

Abstract: With the rapid growth in the number of large language model (LLM) users, it is difficult for bandwidth-constrained cloud servers to simultaneously process massive LLM services in real-time. Recently, edge-cloud infrastructures have been used to improve the processing efficiency of large-scale LLM services. However, the diversity of task requirements and the dynamics of resources pose great challenges to inference scheduling, leading to the wastage of many resources. In this paper, we present PerLLM, a personalized inference scheduling framework with edge-cloud collaboration designed for diverse LLM services. For the complexity of multiple constraints and the decision-making process of edge-cloud collaboration, we integrate the upper confidence bound algorithm based on the constraint satisfaction mechanism in PerLLM. For diverse LLM services, PerLLM can optimize service scheduling and resource allocation solutions within the edge-cloud infrastructure to meet processing time requirements while minimizing energy costs. Experimental results from different model deployments show that PerLLM can effectively meet the processing time requirements of personalized services. Compared to other methods, PerLLM achieves 2.2x, 2.1x, and 1.6x throughput and reduces the energy cost by more than 50%.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.12229 [pdf, other]

Multi-task learning for molecular electronic structure approaching coupled-cluster accuracy

Authors: Hao Tang, Brian Xiao, Wenhao He, Pero Subasic, Avetik R. Harutyunyan, Yao Wang, Fang Liu, Haowei Xu, Ju Li

Abstract: Machine learning (ML) plays an important role in quantum chemistry, providing fast-to-evaluate predictive models for various properties of molecules. However, most existing ML models for molecular electronic properties use density functional theory (DFT) databases as ground truth in training, and their prediction accuracy cannot surpass that of DFT. In this work, we developed a unified ML method for electronic structures of organic molecules using the gold-standard CCSD(T) calculations as training data. Tested on hydrocarbon molecules, our model outperforms DFT with the widely-used hybrid and double hybrid functionals in computational costs and prediction accuracy of various quantum chemical properties. As case studies, we apply the model to aromatic compounds and semiconducting polymers on both ground state and excited state properties, demonstrating its accuracy and generalization capability to complex systems that are hard to calculate using CCSD(T)-level methods.

Submitted 24 June, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.09103 [pdf, ps, other]

Mean Reflected Backward Stochastic Differential Equations Driven by G-Brownian Motion with Double Constraints

Authors: Wei He, Hanwu Li

Abstract: In this paper, we study the backward stochastic differential equations driven by G-Brownian motion with double mean reflections, which means that the constraints are made on the law of the solution. Making full use of the backward Skorokhod problem with two nonlinear reflecting boundaries and the fixed-point theory, the existence and uniqueness of solutions are established. We also consider the case where the coefficients satisfy a non-Lipschitz condition using the Picard iteration argument only for the Y component. Moreover, some basic properties including a new version of comparison theorem and connection with a deterministic optimization problem are also obtained.

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2405.06858 [pdf, other]

Topological Superconductivity in Monolayer T$_{\textrm{d}}$-MoTe$_2$

Authors: Xin-Zhi Li, Zhen-Bo Qi, Quansheng Wu, Wen-Yu He

Abstract: Topological superconductivity has attracted significant attention due to its potential applications in quantum computation, but its experimental realization remains challenging. Recently, monolayer T$_{\textrm{d}}$-MoTe$_2$ was observed to exhibit gate tunable superconductivity, and its in-plane upper critical field exceeds the Pauli limit. Here, we show that an in-plane magnetic field beyond the Pauli limit can drive the superconducting monolayer T$_{\textrm{d}}$-MoTe$_2$ into a topological superconductor. The topological superconductivity arises from the interplay between the in-plane Zeeman coupling and the unique \emph{Ising plus in-plane SOC} in the monolayer T$_{\textrm{d}}$-MoTe$_2$. The \emph{Ising plus in-plane SOC} plays the essential role to enable the effective $p_x+ip_y$ pairing. Importantly, as the essential \emph{Ising plus in-plane SOC} in the monolayer T$_{\textrm{d}}$-MoTe$_2$ is generated by an in-plane polar field, our proposal demonstrates that applying an in-plane magnetic field to a gate tunable 2D superconductor with an in-plane polar axis is a feasible way to realize topological superconductivity.

Submitted 10 May, 2024; originally announced May 2024.

Comments: 9 pages, 4 figures, plus Supplementary Material. Comments are welcome

arXiv:2405.05133 [pdf, other]

Identifying every building's function in large-scale urban areas with multi-modality remote-sensing data

Authors: Zhuohong Li, Wei He, Jiepan Li, Hongyan Zhang

Abstract: Buildings, as fundamental man-made structures in urban environments, serve as crucial indicators for understanding various city function zones. Rapid urbanization has raised an urgent need for efficiently surveying building footprints and functions. In this study, we proposed a semi-supervised framework to identify every building's function in large-scale urban areas with multi-modality remote-sensing data. In detail, optical images, building height, and nighttime-light data are collected to describe the morphological attributes of buildings. Then, the area of interest (AOI) and building masks from the volunteered geographic information (VGI) data are collected to form sparsely labeled samples. Furthermore, the multi-modality data and weak labels are utilized to train a segmentation model with a semi-supervised strategy. Finally, results are evaluated by 20,000 validation points and statistical survey reports from the government. The evaluations reveal that the produced function maps achieve an OA of 82% and Kappa of 71% among 1,616,796 buildings in Shanghai, China. This study has the potential to support large-scale urban management and sustainable urban development. All collected data and produced maps are open access at https://github.com/LiZhuoHong/BuildingMap.

Submitted 8 May, 2024; originally announced May 2024.

Comments: 5 pages, 7 figures, accepted by IGARSS 2024

arXiv:2404.15016 [pdf, ps, other]

Convergence of the hypersymplectic flow on $T^4$ with $T^3$-symmetry

Authors: Joel Fine, Weiyong He, Chengjian Yao

Abstract: A hypersymplectic structure on a 4-manifold is a triple $ω_1, ω_2, ω_3$ of 2-forms for which every non-trivial linear combination $a^1ω_1 + a^2 ω_2 + a^3 ω_3$ is a symplectic form. Donaldson has conjectured that when the underlying manifold is compact, any such structure is isotopic in its cohomology class to a hyperkähler triple. We prove this conjecture for a hypersymplectic structure on $T^4$ which is invariant under the standard $T^3$ action. The proof uses the hypersymplectic flow, a geometric flow which attempts to deform a given hypersymplectic structure to a hyperkähler triple. We prove that on $T^4$, when starting from a $T^3$-invariant hypersymplectic structure, the flow exists for all time and converges modulo diffeomorphisms to the unique cohomologous hyperkähler structure.

Submitted 23 April, 2024; originally announced April 2024.

Comments: 25 pages

MSC Class: 58J35; 53C26; 53D05

arXiv:2404.14648 [pdf, other]

Pseudorandom Permutations from Random Reversible Circuits

Authors: William He, Ryan O'Donnell

Abstract: We study pseudorandomness properties of permutations on $\{0,1\}^n$ computed by random circuits made from reversible $3$-bit gates (permutations on $\{0,1\}^3$). Our main result is that a random circuit of depth $n \cdot \tilde{O}(k^2)$, with each layer consisting of $\approx n/3$ random gates in a fixed nearest-neighbor architecture, yields almost $k$-wise independent permutations. The main technical component is showing that the Markov chain on $k$-tuples of $n$-bit strings induced by a single random $3$-bit nearest-neighbor gate has spectral gap at least $1/n \cdot \tilde{O}(k)$. This improves on the original work of Gowers [Gowers96], who showed a gap of $1/\mathrm{poly}(n,k)$ for one random gate (with non-neighboring inputs); and, on subsequent work [HMMR05,BH08] improving the gap to $Ω(1/n^2k)$ in the same setting. From the perspective of cryptography, our result can be seen as a particularly simple/practical block cipher construction that gives provable statistical security against attackers with access to $k$ input-output pairs within few rounds. We also show that the Luby--Rackoff construction of pseudorandom permutations from pseudorandom functions can be implemented with reversible circuits. From this, we make progress on the complexity of the Minimum Reversible Circuit Size Problem (MRCSP), showing that block ciphers of fixed polynomial size are computationally secure against arbitrary polynomial-time adversaries, assuming the existence of one-way functions (OWFs).

Submitted 3 July, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

Comments: v2: added references and comparison to subsequent work, removed claim in previous Section 7.3 with error in proof

arXiv:2404.14233 [pdf, other]

Detecting and Mitigating Hallucination in Large Vision Language Models via Fine-Grained AI Feedback

Authors: Wenyi Xiao, Ziwei Huang, Leilei Gan, Wanggui He, Haoyuan Li, Zhelun Yu, Hao Jiang, Fei Wu, Linchao Zhu

Abstract: The rapidly developing Large Vision Language Models (LVLMs) have shown notable capabilities on a range of multi-modal tasks, but still face the hallucination phenomena where the generated texts do not align with the given contexts, significantly restricting the usages of LVLMs. Most previous work detects and mitigates hallucination at the coarse-grained level or requires expensive annotation (e.g., labeling by proprietary models or human experts). To address these issues, we propose detecting and mitigating hallucinations in LVLMs via fine-grained AI feedback. The basic idea is that we generate a small-size sentence-level hallucination annotation dataset by proprietary models, whereby we train a hallucination detection model which can perform sentence-level hallucination detection, covering primary hallucination types (i.e., object, attribute, and relationship). Then, we propose a detect-then-rewrite pipeline to automatically construct preference dataset for training hallucination mitigating model. Furthermore, we propose differentiating the severity of hallucinations, and introducing a Hallucination Severity-Aware Direct Preference Optimization (HSA-DPO) for mitigating hallucination in LVLMs by incorporating the severity of hallucinations into preference learning. Extensive experiments demonstrate the effectiveness of our method.

Submitted 22 April, 2024; originally announced April 2024.

arXiv:2404.10827 [pdf, other]

doi 10.1038/s41467-024-47852-x

Magnetically propagating Hund's exciton in van der Waals antiferromagnet NiPS3

Authors: W. He, Y. Shen, K. Wohlfeld, J. Sears, J. Li, J. Pelliciari, M. Walicki, S. Johnston, E. Baldini, V. Bisogni, M. Mitrano, M. P. M. Dean

Abstract: Magnetic van der Waals (vdW) materials have opened new frontiers for realizing novel many-body phenomena. Recently NiPS3 has received intense interest since it hosts an excitonic quasiparticle whose properties appear to be intimately linked to the magnetic state of the lattice. Despite extensive studies, the electronic character, mobility, and magnetic interactions of the exciton remain unresolved. Here we address these issues by measuring NiPS3 with ultra-high energy resolution resonant inelastic x-ray scattering (RIXS). We find that Hund's exchange interactions are primarily responsible for the energy of formation of the exciton. Measuring the dispersion of the Hund's exciton reveals that it propagates in a way that is analogous to a double-magnon. We trace this unique behavior to fundamental similarities between the NiPS3 exciton hopping and spin exchange processes, underlining the unique magnetic characteristics of this novel quasiparticle.

Submitted 16 April, 2024; originally announced April 2024.

Comments: 11 pages; accepted in Nature Communications

Journal ref: Nature Communications 15, 3496 (2024)

arXiv:2404.07496 [pdf, other]

Multifractal Dimension Spectrum Analysis for Nuclear Density Distribution

Authors: Weihu Ma, Yu-Gang Ma, Wanbing He, Bo Zhou

Abstract: We present an integral density method for calculating the multifractal dimension spectrum for the nucleon distribution in atomic nuclei. This method is then applied to analyze the non-uniformity of the density distribution in several typical types of nuclear matter distributions, including the Woods-Saxon distribution, the halo structure and the tetrahedral $α$ clustering. The subsequent discussion provides a comprehensive and detailed exploration of the results obtained. The multifractal dimension spectrum shows remarkable sensitivity to the density distribution, establishing it as an effective tool for studying the distribution of nucleons in nuclear multibody systems.

Submitted 11 April, 2024; originally announced April 2024.

Comments: 9 pages, 6 figures

arXiv:2404.06852 [pdf, other]

Research Artifacts in Software Engineering Publications: Status and Trends

Authors: Mugeng Liu, Xiaolong Huang, Wei He, Yibing Xie, Jie M. Zhang, Xiang Jing, Zhenpeng Chen, Yun Ma

Abstract: The Software Engineering (SE) community has been embracing the open science policy and encouraging researchers to disclose artifacts in their publications. However, the status and trends of artifact practice and quality remain unclear, lacking insights on further improvement. In this paper, we present an empirical study to characterize the research artifacts in SE publications. Specifically, we manually collect 1,487 artifacts from all 2,196 papers published in top-tier SE conferences (ASE, FSE, ICSE, and ISSTA) from 2017 to 2022. We investigate the common practices (e.g., URL location and format, storage websites), maintenance activities (e.g., last update time and URL validity), popularity (e.g., the number of stars on GitHub and characteristics), and quality (e.g., documentation and code smell) of these artifacts. Based on our analysis, we reveal a rise in publications providing artifacts. The usage of Zenodo for sharing artifacts has significantly increased. However, artifacts stored in GitHub tend to receive few stars, indicating a limited influence on real-world SE applications. We summarize the results and provide suggestions to different stakeholders in conjunction with current guidelines.

Submitted 10 April, 2024; originally announced April 2024.

Comments: Accepted by Journal of Systems and Software (JSS 2024). Please include JSS in any citations

arXiv:2404.05850 [pdf, other]

Witnessing Quantum Entanglement Using Resonant Inelastic X-ray Scattering

Authors: Tianhao Ren, Yao Shen, Sophia F. R. TenHuisen, Jennifer Sears, Wei He, Mary H. Upton, Diego Casa, Petra Becker, Matteo Mitrano, Mark P. M. Dean, Robert M. Konik

Abstract: Although entanglement is both a central ingredient in our understanding of quantum many-body systems and an essential resource for quantum technologies, we only have a limited ability to quantify entanglement in real quantum materials. Thus far, entanglement metrology in quantum materials has been limited to measurements involving Hermitian operators, such as the detection of spin entanglement using inelastic neutron scattering. Here, we devise a method to extract the quantum Fisher information (QFI) from non-Hermitian operators and formulate an entanglement witness for resonant inelastic x-ray scattering (RIXS). Our approach is then applied to the model iridate dimer system Ba$_3$CeIr$_2$O$_9$ and used to directly test for entanglement of the electronic orbitals between neighboring Ir sites. We find that entanglement is challenging to detect under standard conditions, but that it could be achieved by analyzing the outgoing x-ray polarization or via specific choices of momentum and energy. Our protocol provides a new handle for entanglement detection, which offers routes to related types of entanglement witness (such as orbitally-resolved measurements) and to the generalization to out-of-equilibrium settings accessed in ultrafast settings.

Submitted 8 April, 2024; originally announced April 2024.

Comments: 10 pages, 8 figures

arXiv:2404.04697 [pdf, ps, other]

Q-learning in Dynamic Treatment Regimes with Misclassified Binary Outcome

Authors: Dan Liu, Wenqing He

Abstract: The study of precision medicine involves dynamic treatment regimes (DTRs), which are sequences of treatment decision rules recommended by taking patient-level information as input. The primary goal of the DTR study is to identify an optimal DTR, a sequence of treatment decision rules that leads to the best expected clinical outcome. Statistical methods have been developed in recent years to estimate an optimal DTR, including Q-learning, a regression-based method in the DTR literature. Although there are many studies concerning Q-learning, little attention has been given in the presence of noisy data, such as misclassified outcomes. In this paper, we investigate the effect of outcome misclassification on Q-learning and propose a correction method to accommodate the misclassification effect. Simulation studies are conducted to demonstrate the satisfactory performance of the proposed method. We illustrate the proposed method in two examples from the National Health and Nutrition Examination Survey Data I Epidemiologic Follow-up Study and the smoking cessation program.

Submitted 6 April, 2024; originally announced April 2024.

arXiv:2404.04696 [pdf, ps, other]

Dynamic Treatment Regimes with Replicated Observations Available for Error-prone Covariates: a Q-learning Approach

Authors: Dan Liu, Wenqing He

Abstract: Dynamic treatment regimes (DTRs) have received an increasing interest in recent years. DTRs are sequences of treatment decision rules tailored to patient-level information. The main goal of the DTR study is to identify an optimal DTR, a sequence of treatment decision rules that yields the best expected clinical outcome. Q-learning has been considered as one of the most popular regression-based methods to estimate the optimal DTR. However, it is rarely studied in an error-prone setting, where the patient information is contaminated with measurement error. In this paper, we study the effect of covariate measurement error on Q-learning and propose a correction method to correct the measurement error in Q-learning. Simulation studies are conducted to assess the performance of the proposed method in Q-learning. We illustrate the use of the proposed method in an application to the sequenced treatment alternatives to relieve depression data.

Submitted 6 April, 2024; originally announced April 2024.

arXiv:2404.01192 [pdf, other]

iMD4GC: Incomplete Multimodal Data Integration to Advance Precise Treatment Response Prediction and Survival Analysis for Gastric Cancer

Authors: Fengtao Zhou, Yingxue Xu, Yanfen Cui, Shenyan Zhang, Yun Zhu, Weiyang He, Jiguang Wang, Xin Wang, Ronald Chan, Louis Ho Shing Lau, Chu Han, Dafu Zhang, Zhenhui Li, Hao Chen

Abstract: Gastric cancer (GC) is a prevalent malignancy worldwide, ranking as the fifth most common cancer with over 1 million new cases and 700 thousand deaths in 2020. Locally advanced gastric cancer (LAGC) accounts for approximately two-thirds of GC diagnoses, and neoadjuvant chemotherapy (NACT) has emerged as the standard treatment for LAGC. However, the effectiveness of NACT varies significantly among patients, with a considerable subset displaying treatment resistance. Ineffective NACT not only leads to adverse effects but also misses the optimal therapeutic window, resulting in lower survival rate. However, existing multimodal learning methods assume the availability of all modalities for each patient, which does not align with the reality of clinical practice. The limited availability of modalities for each patient would cause information loss, adversely affecting predictive accuracy. In this study, we propose an incomplete multimodal data integration framework for GC (iMD4GC) to address the challenges posed by incomplete multimodal data, enabling precise response prediction and survival analysis. Specifically, iMD4GC incorporates unimodal attention layers for each modality to capture intra-modal information. Subsequently, the cross-modal interaction layers explore potential inter-modal interactions and capture complementary information across modalities, thereby enabling information compensation for missing modalities.
To evaluate iMD4GC, we collected three multimodal datasets for GC study: GastricRes (698 cases) for response prediction, GastricSur (801 cases) for survival analysis, and TCGA-STAD (400 cases) for survival analysis. The scale of our datasets is significantly larger than previous studies. The iMD4GC achieved impressive performance with an 80.2% AUC on GastricRes, 71.4% C-index on GastricSur, and 66.1% C-index on TCGA-STAD, significantly surpassing other compared methods.

Submitted 1 April, 2024; originally announced April 2024.

Comments: 27 pages, 9 figures, 3 tables (under review)

arXiv:2404.00884 [pdf, other]

Self-Demos: Eliciting Out-of-Demonstration Generalizability in Large Language Models

Authors: Wei He, Shichun Liu, Jun Zhao, Yiwen Ding, Yi Lu, Zhiheng Xi, Tao Gui, Qi Zhang, Xuanjing Huang

Abstract: Large language models (LLMs) have shown promising abilities of in-context learning (ICL), adapting swiftly to new tasks with only few-shot demonstrations. However, current few-shot methods heavily depend on high-quality, query-specific demos, which are often lacking. When faced with out-of-demonstration (OOD) queries, methods that rely on hand-crafted demos or external retrievers might fail. To bridge the gap between limited demos and OOD queries, we propose Self-Demos, a novel prompting method that elicits the inherent generalizability in LLMs by query-aware demo generation. The generated demos strategically interpolate between existing demos and the given query, transforming the query from OOD to ID. To evaluate the effectiveness of our approach, we manually constructed OOD-Toolset, a dataset in the tool-using scenario with over 300 real-world APIs and 1000 instances, each consisting of three tool-use cases as demos and an OOD query. Thorough experiments on our dataset and two public math benchmarks have shown that our method can outperform state-of-the-art baselines in the OOD setting. Moreover, we conduct a range of analyses to validate Self-Demos's generalization and provide more insights.

Submitted 31 March, 2024; originally announced April 2024.

Comments: Accepted to NAACL 2024 Findings

arXiv:2404.00845 [pdf]

Harnessing Interlayer Magnetic Coupling for Efficient, Field-Free Current-Induced Magnetization Switching in a Magnetic Insulator

Authors: Leran Wang, Alejandro O. Leon, Wenqing He, Zhongyu Liang, Xiaohan Li, Xiaoxiao Fang, Wenyun Yang, Licong Peng, Jinbo Yang, Caihua Wan, Gerrit E. W. Bauer, Zhaochu Luo

Abstract: Owing to the unique features of low Gilbert damping, long spin-diffusion lengths and zero Ohmic losses, magnetic insulators are promising candidate materials for next-generation spintronic applications. However, due to the localized magnetic moments and the complex metal-oxide interface between magnetic insulators and heavy metals, spin-functional Dzyaloshinskii-Moriya interactions or spin Hall and Edelstein effects are weak, which diminishes the performance of these typical building blocks for spintronic devices. Here, we exploit the exchange coupling between metallic and insulating magnets for efficient electrical manipulation of heavy metal/magnetic insulator heterostructures. By inserting a thin Co layer, we enhance the spin-orbit torque efficiency by more than 20 times, which significantly reduces the switching current density. Moreover, we demonstrate field-free current-induced magnetization switching caused by a symmetry-breaking non-collinear magnetic texture. Our work launches magnetic insulators as an alternative platform for low-power spintronic devices.

Submitted 31 March, 2024; originally announced April 2024.

arXiv:2403.18373 [pdf, other]

BAM: Box Abstraction Monitors for Real-time OoD Detection in Object Detection

Authors: Changshun Wu, Weicheng He, Chih-Hong Cheng, Xiaowei Huang, Saddek Bensalem

Abstract: Out-of-distribution (OoD) detection techniques for deep neural networks (DNNs) become crucial thanks to their filtering of abnormal inputs, especially when DNNs are used in safety-critical applications and interact with an open and dynamic environment. Nevertheless, integrating OoD detection into state-of-the-art (SOTA) object detection DNNs poses significant challenges, partly due to the complexity introduced by the SOTA OoD construction methods, which require the modification of DNN architecture and the introduction of complex loss functions. This paper proposes a simple, yet surprisingly effective, method that requires neither retraining nor architectural change in object detection DNN, called Box Abstraction-based Monitors (BAM). The novelty of BAM stems from using a finite union of convex box abstractions to capture the learned features of objects for in-distribution (ID) data, and an important observation that features from OoD data are more likely to fall outside of these boxes. The union of convex regions within the feature space allows the formation of non-convex and interpretable decision boundaries, overcoming the limitations of VOS-like detectors without sacrificing real-time performance. Experiments integrating BAM into Faster R-CNN-based object detection DNNs demonstrate a considerably improved performance against SOTA OoD detection techniques.

Submitted 27 March, 2024; originally announced March 2024.

arXiv:2403.14332 [pdf, ps, other]

A Differentially Private Clustering Algorithm for Well-Clustered Graphs

Authors: Weiqiang He, Hendrik Fichtenberger, Pan Peng

Abstract: We study differentially private (DP) algorithms for recovering clusters in well-clustered graphs, which are graphs whose vertex set can be partitioned into a small number of sets, each inducing a subgraph of high inner conductance and small outer conductance. Such graphs have widespread application as a benchmark in the theoretical analysis of spectral clustering. We provide an efficient ($ε$,$δ$)-DP algorithm tailored specifically for such graphs. Our algorithm draws inspiration from the recent work of Chen et al., who developed DP algorithms for recovery of stochastic block models in cases where the graph comprises exactly two nearly-balanced clusters. Our algorithm works for well-clustered graphs with $k$ nearly-balanced clusters, and the misclassification ratio almost matches the one of the best-known non-private algorithms. We conduct experimental evaluations on datasets with known ground truth clusters to substantiate the prowess of our algorithm. We also show that any (pure) $ε$-DP algorithm would result in substantial error.

Submitted 21 March, 2024; originally announced March 2024.

arXiv:2403.13427 [pdf]

Observation of non-volatile anomalous Nernst effect in altermagnet with collinear Néel vector

Authors: Lei Han, Xizhi Fu, Wenqing He, Yuxiang Zhu, Jiankun Dai, Wenfeng Yang, Wenxuan Zhu, Hua Bai, Chong Chen, Caihua Wan, Xiufeng Han, Cheng Song, Junwei Liu, Feng Pan

Abstract: Anomalous Nernst effect (ANE), a widely investigated transverse thermoelectric effect that converts waste heat into electrical energy with remarkable flexibility and integration capability, has recently been extended to antiferromagnets with non-collinear spin texture. ANE in compensated magnets with collinear Néel vector would bring more opportunities to construct magnetic-field-immune and ultrafast transverse thermoelectric converters, but has long remained unachieved. This is because the degenerate band structure of traditional collinear compensated magnets excludes non-zero Berry curvature. Here, we realize non-volatile ANE in altermagnet Mn5Si3 thin film with collinear Néel vector, whose unique alternating spin-splitting band structure plays a vital role in creating non-zero Berry curvature and hotspots of anomalous Nernst conductivity near band intersections. Interestingly, ANE is relatively weak in stoichiometric Mn5Si3, but undergoes a sixfold enhancement through strategically raising the Fermi level by additional Mn doping, indicating a sensitive intrinsic influence of the specific location of the Fermi level on ANE in altermagnets. Moreover, our investigation reveals a unique Néel-vector-dependent temperature-scaling relationship of anomalous Nernst conductivity in Mn5Si3. Our work not only fills a longstanding gap by confirming the presence of non-volatile ANE in collinear compensated magnets, but also sheds light on thermoelectric physics related to the exotic spin-splitting band structure in altermagnets.

Submitted 20 March, 2024; originally announced March 2024.

Comments: 25 pages, 4 figures
