subscribe to arXiv mailings

WildVidFit: Video Virtual Try-On in the Wild via Image-Based Controlled Diffusion Models

Authors: Zijian He, Peixin Chen, Guangrun Wang, Guanbin Li, Philip H. S. Torr, Liang Lin

Abstract: Video virtual try-on aims to generate realistic sequences that maintain garment identity and adapt to a person's pose and body shape in source videos. Traditional image-based methods, relying on warping and blending, struggle with complex human movements and occlusions, limiting their effectiveness in video try-on applications. Moreover, video-based models require extensive, high-quality data and… ▽ More Video virtual try-on aims to generate realistic sequences that maintain garment identity and adapt to a person's pose and body shape in source videos. Traditional image-based methods, relying on warping and blending, struggle with complex human movements and occlusions, limiting their effectiveness in video try-on applications. Moreover, video-based models require extensive, high-quality data and substantial computational resources. To tackle these issues, we reconceptualize video try-on as a process of generating videos conditioned on garment descriptions and human motion. Our solution, WildVidFit, employs image-based controlled diffusion models for a streamlined, one-stage approach. This model, conditioned on specific garments and individuals, is trained on still images rather than videos. It leverages diffusion guidance from pre-trained models including a video masked autoencoder for segment smoothness improvement and a self-supervised model for feature alignment of adjacent frame in the latent space. This integration markedly boosts the model's ability to maintain temporal coherence, enabling more effective video try-on within an image-based framework. Our experiments on the VITON-HD and DressCode datasets, along with tests on the VVT and TikTok datasets, demonstrate WildVidFit's capability to generate fluid and coherent videos. The project page website is at wildvidfit-project.github.io. △ Less

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.09652 [pdf, other]

How Chinese are Chinese Language Models? The Puzzling Lack of Language Policy in China's LLMs

Authors: Andrea W Wen-Yi, Unso Eun Seo Jo, Lu Jia Lin, David Mimno

Abstract: Contemporary language models are increasingly multilingual, but Chinese LLM developers must navigate complex political and business considerations of language diversity. Language policy in China aims at influencing the public discourse and governing a multi-ethnic society, and has gradually transitioned from a pluralist to a more assimilationist approach since 1949. We explore the impact of these… ▽ More Contemporary language models are increasingly multilingual, but Chinese LLM developers must navigate complex political and business considerations of language diversity. Language policy in China aims at influencing the public discourse and governing a multi-ethnic society, and has gradually transitioned from a pluralist to a more assimilationist approach since 1949. We explore the impact of these influences on current language technology. We evaluate six open-source multilingual LLMs pre-trained by Chinese companies on 18 languages, spanning a wide range of Chinese, Asian, and Anglo-European languages. Our experiments show Chinese LLMs performance on diverse languages is indistinguishable from international LLMs. Similarly, the models' technical reports also show lack of consideration for pretraining data language coverage except for English and Mandarin Chinese. Examining Chinese AI policy, model experiments, and technical reports, we find no sign of any consistent policy, either for or against, language diversity in China's LLM development. This leaves a puzzling fact that while China regulates both the languages people use daily as well as language model development, they do not seem to have any policy on the languages in language models. △ Less

Submitted 12 July, 2024; originally announced July 2024.

Comments: Wen-Yi and Jo contributed equally to this work

arXiv:2407.09342 [pdf, other]

MIXED-SENSE: A Mixed Reality Sensor Emulation Framework for Test and Evaluation of UAVs Against False Data Injection Attacks

Authors: Kartik A. Pant, Li-Yu Lin, Jaehyeok Kim, Worawis Sribunma, James M. Goppert, Inseok Hwang

Abstract: We present a high-fidelity Mixed Reality sensor emulation framework for testing and evaluating the resilience of Unmanned Aerial Vehicles (UAVs) against false data injection (FDI) attacks. The proposed approach can be utilized to assess the impact of FDI attacks, benchmark attack detector performance, and validate the effectiveness of mitigation/reconfiguration strategies in single-UAV and UAV swa… ▽ More We present a high-fidelity Mixed Reality sensor emulation framework for testing and evaluating the resilience of Unmanned Aerial Vehicles (UAVs) against false data injection (FDI) attacks. The proposed approach can be utilized to assess the impact of FDI attacks, benchmark attack detector performance, and validate the effectiveness of mitigation/reconfiguration strategies in single-UAV and UAV swarm operations. Our Mixed Reality framework leverages high-fidelity simulations of Gazebo and a Motion Capture system to emulate proprioceptive (e.g., GNSS) and exteroceptive (e.g., camera) sensor measurements in real-time. We propose an empirical approach to faithfully recreate signal characteristics such as latency and noise in these measurements. Finally, we illustrate the efficacy of our proposed framework through a Mixed Reality experiment consisting of an emulated GNSS attack on an actual UAV, which (i) demonstrates the impact of false data injection attacks on GNSS measurements and (ii) validates a mitigation strategy utilizing a distributed camera network developed in our previous work. Our open-source implementation is available at \href{https://github.com/CogniPilot/mixed\_sense}{\texttt{https://github.com/CogniPilot/mixed\_sense}} △ Less

Submitted 12 July, 2024; originally announced July 2024.

Comments: 6 pages, 5 figures, IROS 2024

arXiv:2407.07327 [pdf, other]

Fuse, Reason and Verify: Geometry Problem Solving with Parsed Clauses from Diagram

Authors: Ming-Liang Zhang, Zhong-Zhi Li, Fei Yin, Liang Lin, Cheng-Lin Liu

Abstract: Geometry problem solving (GPS) requires capacities of multi-modal understanding, multi-hop reasoning and theorem knowledge application. In this paper, we propose a neural-symbolic model for plane geometry problem solving (PGPS), named PGPSNet-v2, with three key steps: modal fusion, reasoning process and knowledge verification. In modal fusion, we leverage textual clauses to express fine-grained st… ▽ More Geometry problem solving (GPS) requires capacities of multi-modal understanding, multi-hop reasoning and theorem knowledge application. In this paper, we propose a neural-symbolic model for plane geometry problem solving (PGPS), named PGPSNet-v2, with three key steps: modal fusion, reasoning process and knowledge verification. In modal fusion, we leverage textual clauses to express fine-grained structural and semantic content of geometry diagram, and fuse diagram with textual problem efficiently through structural-semantic pre-training. For reasoning, we design an explicable solution program to describe the geometric reasoning process, and employ a self-limited decoder to generate solution program autoregressively. To reduce solution errors, a multi-level theorem verifier is proposed to eliminate solutions that do not match geometric principles, alleviating the hallucination of the neural model. We also construct a large-scale geometry problem dataset called PGPS9K, containing fine-grained annotations of textual clauses, solution program and involved knowledge tuples. Extensive experiments on datasets Geometry3K and PGPS9K show that our PGPSNet solver outperforms existing symbolic and neural solvers in GPS performance, while maintaining good explainability and reliability, and the solver components (fusion, reasoning, verification) are all justified effective. △ Less

Submitted 9 July, 2024; originally announced July 2024.

Comments: under review by journal

arXiv:2407.06886 [pdf, other]

Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI

Authors: Yang Liu, Weixing Chen, Yongjie Bai, Jingzhou Luo, Xinshuai Song, Kaixuan Jiang, Zhida Li, Ganlong Zhao, Junyi Lin, Guanbin Li, Wen Gao, Liang Lin

Abstract: Embodied Artificial Intelligence (Embodied AI) is crucial for achieving Artificial General Intelligence (AGI) and serves as a foundation for various applications that bridge cyberspace and the physical world. Recently, the emergence of Multi-modal Large Models (MLMs) and World Models (WMs) have attracted significant attention due to their remarkable perception, interaction, and reasoning capabilit… ▽ More Embodied Artificial Intelligence (Embodied AI) is crucial for achieving Artificial General Intelligence (AGI) and serves as a foundation for various applications that bridge cyberspace and the physical world. Recently, the emergence of Multi-modal Large Models (MLMs) and World Models (WMs) have attracted significant attention due to their remarkable perception, interaction, and reasoning capabilities, making them a promising architecture for the brain of embodied agents. However, there is no comprehensive survey for Embodied AI in the era of MLMs. In this survey, we give a comprehensive exploration of the latest advancements in Embodied AI. Our analysis firstly navigates through the forefront of representative works of embodied robots and simulators, to fully understand the research focuses and their limitations. Then, we analyze four main research targets: 1) embodied perception, 2) embodied interaction, 3) embodied agent, and 4) sim-to-real adaptation, covering the state-of-the-art methods, essential paradigms, and comprehensive datasets. Additionally, we explore the complexities of MLMs in virtual and real embodied agents, highlighting their significance in facilitating interactions in dynamic digital and physical environments. Finally, we summarize the challenges and limitations of embodied AI and discuss their potential future directions. We hope this survey will serve as a foundational reference for the research community and inspire continued innovation. The associated project can be found at https://github.com/HCPLab-SYSU/Embodied_AI_Paper_List. △ Less

Submitted 11 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

Comments: The first comprehensive review of Embodied AI in the era of MLMs, 37 pages. We also provide the paper list for Embodied AI: https://github.com/HCPLab-SYSU/Embodied_AI_Paper_List

arXiv:2407.06844 [pdf, other]

Dynamic Correlation Learning and Regularization for Multi-Label Confidence Calibration

Authors: Tianshui Chen, Weihang Wang, Tao Pu, Jinghui Qin, Zhijing Yang, Jie Liu, Liang Lin

Abstract: Modern visual recognition models often display overconfidence due to their reliance on complex deep neural networks and one-hot target supervision, resulting in unreliable confidence scores that necessitate calibration. While current confidence calibration techniques primarily address single-label scenarios, there is a lack of focus on more practical and generalizable multi-label contexts. This pa… ▽ More Modern visual recognition models often display overconfidence due to their reliance on complex deep neural networks and one-hot target supervision, resulting in unreliable confidence scores that necessitate calibration. While current confidence calibration techniques primarily address single-label scenarios, there is a lack of focus on more practical and generalizable multi-label contexts. This paper introduces the Multi-Label Confidence Calibration (MLCC) task, aiming to provide well-calibrated confidence scores in multi-label scenarios. Unlike single-label images, multi-label images contain multiple objects, leading to semantic confusion and further unreliability in confidence scores. Existing single-label calibration methods, based on label smoothing, fail to account for category correlations, which are crucial for addressing semantic confusion, thereby yielding sub-optimal performance. To overcome these limitations, we propose the Dynamic Correlation Learning and Regularization (DCLR) algorithm, which leverages multi-grained semantic correlations to better model semantic confusion for adaptive regularization. DCLR learns dynamic instance-level and prototype-level similarities specific to each category, using these to measure semantic correlations across different categories. With this understanding, we construct adaptive label vectors that assign higher values to categories with strong correlations, thereby facilitating more effective regularization. We establish an evaluation benchmark, re-implementing several advanced confidence calibration algorithms and applying them to leading multi-label recognition (MLR) models for fair comparison. Through extensive experiments, we demonstrate the superior performance of DCLR over existing methods in providing reliable confidence scores in multi-label scenarios. △ Less

Submitted 9 July, 2024; originally announced July 2024.

Comments: submitted to TIP

arXiv:2407.06744 [pdf, other]

Suppression of Local Decay in non-Markovian Waveguide QED

Authors: Yuan liu, Linhan Lin, Hong-Bo Sun

Abstract: Atoms coupled to the same environment interfere with each other to yield super- or sub-radiance. Specifically, atoms in subradiant states are promising candidates for long-lifetime qubits and quantum memory because of the immunity to the common environment. However, subradiant states can still be influenced by local environments, which are incoherent for different atoms and cannot be canceled out… ▽ More Atoms coupled to the same environment interfere with each other to yield super- or sub-radiance. Specifically, atoms in subradiant states are promising candidates for long-lifetime qubits and quantum memory because of the immunity to the common environment. However, subradiant states can still be influenced by local environments, which are incoherent for different atoms and cannot be canceled out through interference. Here we propose to break this limit by preparing a waveguide QED system in the non-Markovian regime, where the ultra-small decay rate arises because of the retarded interaction. We further show that similar effect occurs spontaneously by self-interference and can be stressed by cooperative coupling. △ Less

Submitted 9 July, 2024; originally announced July 2024.

Comments: 6 pages, 4 figures

arXiv:2407.05634 [pdf, other]

Infinite quantum signal processing for arbitrary Szegő functions

Authors: Michel Alexis, Lin Lin, Gevorg Mnatsakanyan, Christoph Thiele, Jiasu Wang

Abstract: We provide a complete solution to the problem of infinite quantum signal processing for the class of Szegő functions, which are functions that satisfy a logarithmic integrability condition and include almost any function that allows for a quantum signal processing representation. We do so by introducing a new algorithm called the Riemann-Hilbert-Weiss algorithm, which can compute any individual ph… ▽ More We provide a complete solution to the problem of infinite quantum signal processing for the class of Szegő functions, which are functions that satisfy a logarithmic integrability condition and include almost any function that allows for a quantum signal processing representation. We do so by introducing a new algorithm called the Riemann-Hilbert-Weiss algorithm, which can compute any individual phase factor independent of all other phase factors. Our algorithm is also the first provably stable numerical algorithm for computing phase factors of any arbitrary Szegő function. The proof of stability involves solving a Riemann-Hilbert factorization problem in nonlinear Fourier analysis using elements of spectral theory. △ Less

Submitted 10 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

Comments: 45 pages, 5 figures. Small updates to main text, and ArXiv title/abstract now renders correctly

MSC Class: 68Q12; 81P68; 34L25; 42C99

arXiv:2407.05235 [pdf, other]

Tracking Reflected Objects: A Benchmark

Authors: Xiaoyu Guo, Pengzhi Zhong, Lizhi Lin, Hao Zhang, Ling Huang, Shuiwang Li

Abstract: Visual tracking has advanced significantly in recent years, mainly due to the availability of large-scale training datasets. These datasets have enabled the development of numerous algorithms that can track objects with high accuracy and robustness.However, the majority of current research has been directed towards tracking generic objects, with less emphasis on more specialized and challenging sc… ▽ More Visual tracking has advanced significantly in recent years, mainly due to the availability of large-scale training datasets. These datasets have enabled the development of numerous algorithms that can track objects with high accuracy and robustness.However, the majority of current research has been directed towards tracking generic objects, with less emphasis on more specialized and challenging scenarios. One such challenging scenario involves tracking reflected objects. Reflections can significantly distort the appearance of objects, creating ambiguous visual cues that complicate the tracking process. This issue is particularly pertinent in applications such as autonomous driving, security, smart homes, and industrial production, where accurately tracking objects reflected in surfaces like mirrors or glass is crucial. To address this gap, we introduce TRO, a benchmark specifically for Tracking Reflected Objects. TRO includes 200 sequences with around 70,000 frames, each carefully annotated with bounding boxes. This dataset aims to encourage the development of new, accurate methods for tracking reflected objects, which present unique challenges not sufficiently covered by existing benchmarks. We evaluated 20 state-of-the-art trackers and found that they struggle with the complexities of reflections. To provide a stronger baseline, we propose a new tracker, HiP-HaTrack, which uses hierarchical features to improve performance, significantly outperforming existing algorithms. We believe our benchmark, evaluation, and HiP-HaTrack will inspire further research and applications in tracking reflected objects. The TRO and code are available at https://github.com/OpenCodeGithub/HIP-HaTrack. △ Less

Submitted 6 July, 2024; originally announced July 2024.

arXiv:2407.04613 [pdf, other]

Thermal and mechanical study of a parametrised cryostat model for optical characterisation of upcoming CMB experiments

Authors: Thomas J. L. J. Gascard, Yi Wang, Jon E. Gudmundsson, Eve M. Vavagiakis, Cody J. Duell, Zachary B. Huber, Lawrence T. Lin, Michael D. Niemack, Rodrigo G. Freundt

Abstract: Current and future experiments observing the cosmic microwave background require a detailed understanding of optical performance at cryogenic temperatures. Pre-deployment analysis of optics can be performed in custom-engineered cryogenic test beds, such as Mod-Cam, a first light camera for the CCAT project. This work presents studies of the mechanical and thermal performance of CryoSim, a model of… ▽ More Current and future experiments observing the cosmic microwave background require a detailed understanding of optical performance at cryogenic temperatures. Pre-deployment analysis of optics can be performed in custom-engineered cryogenic test beds, such as Mod-Cam, a first light camera for the CCAT project. This work presents studies of the mechanical and thermal performance of CryoSim, a model of a generic cylindrical 4-K cryostat cooled with a commercial pulse tube cryocooler that can be used to characterise optical components and full reimaging optical systems. CryoSim is extensively parametrised, allowing the joint analysis and optimisation of mechanical and thermal performance via finite element methods. Results from this model are validated against measured cooldown data of the Mod-Cam cryostat. Due to the extensive parametrisation of the model, significant modifications of the cryostat geometry may be implemented to be representative of any system the scientific community may desire, and validation of thermal and mechanical performance can be carried out rapidly. △ Less

Submitted 5 July, 2024; originally announced July 2024.

Comments: To be published in the SPIE Astronomical Telescopes + Instrumentation (AS24) proceedings

arXiv:2407.03234 [pdf, other]

Self-Evaluation as a Defense Against Adversarial Attacks on LLMs

Authors: Hannah Brown, Leon Lin, Kenji Kawaguchi, Michael Shieh

Abstract: When LLMs are deployed in sensitive, human-facing settings, it is crucial that they do not output unsafe, biased, or privacy-violating outputs. For this reason, models are both trained and instructed to refuse to answer unsafe prompts such as "Tell me how to build a bomb." We find that, despite these safeguards, it is possible to break model defenses simply by appending a space to the end of a mod… ▽ More When LLMs are deployed in sensitive, human-facing settings, it is crucial that they do not output unsafe, biased, or privacy-violating outputs. For this reason, models are both trained and instructed to refuse to answer unsafe prompts such as "Tell me how to build a bomb." We find that, despite these safeguards, it is possible to break model defenses simply by appending a space to the end of a model's input. In a study of eight open-source models, we demonstrate that this acts as a strong enough attack to cause the majority of models to generate harmful outputs with very high success rates. We examine the causes of this behavior, finding that the contexts in which single spaces occur in tokenized training data encourage models to generate lists when prompted, overriding training signals to refuse to answer unsafe requests. Our findings underscore the fragile state of current model alignment and promote the importance of developing more robust alignment methods. Code and data will be made available at https://github.com/Linlt-leon/self-eval. △ Less

Submitted 15 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

Comments: 8 pages, 7 figures

arXiv:2407.03232 [pdf, other]

Single Character Perturbations Break LLM Alignment

Authors: Leon Lin, Hannah Brown, Kenji Kawaguchi, Michael Shieh

Abstract: When LLMs are deployed in sensitive, human-facing settings, it is crucial that they do not output unsafe, biased, or privacy-violating outputs. For this reason, models are both trained and instructed to refuse to answer unsafe prompts such as "Tell me how to build a bomb." We find that, despite these safeguards, it is possible to break model defenses simply by appending a space to the end of a mod… ▽ More When LLMs are deployed in sensitive, human-facing settings, it is crucial that they do not output unsafe, biased, or privacy-violating outputs. For this reason, models are both trained and instructed to refuse to answer unsafe prompts such as "Tell me how to build a bomb." We find that, despite these safeguards, it is possible to break model defenses simply by appending a space to the end of a model's input. In a study of eight open-source models, we demonstrate that this acts as a strong enough attack to cause the majority of models to generate harmful outputs with very high success rates. We examine the causes of this behavior, finding that the contexts in which single spaces occur in tokenized training data encourage models to generate lists when prompted, overriding training signals to refuse to answer unsafe requests. Our findings underscore the fragile state of current model alignment and promote the importance of developing more robust alignment methods. Code and data will be available at https://github.com/hannah-aught/space_attack. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: 8 pages, 6 figures

arXiv:2407.02818 [pdf, other]

WizardMerge -- Save Us From Merging Without Any Clues

Authors: Qingyu Zhang, Junzhe Li, Jiayi Lin, Jie Ding, Lanteng Lin, Chenxiong Qian

Abstract: Modern software development necessitates efficient version-oriented collaboration among developers. While Git is the most popular version control system, it generates unsatisfactory version merging results due to textual-based workflow, leading to potentially unexpected results in the merged version of the project. Although numerous merging tools have been proposed for improving merge results, dev… ▽ More Modern software development necessitates efficient version-oriented collaboration among developers. While Git is the most popular version control system, it generates unsatisfactory version merging results due to textual-based workflow, leading to potentially unexpected results in the merged version of the project. Although numerous merging tools have been proposed for improving merge results, developers remain struggling to resolve the conflicts and fix incorrectly modified code without clues. We present WizardMerge, an auxiliary tool that leverages merging results from Git to retrieve code block dependency on text and LLVM-IR level and provide suggestions for developers to resolve errors introduced by textual merging. Through the evaluation, we subjected WizardMerge to testing on 227 conflicts within five large-scale projects. The outcomes demonstrate that WizardMerge diminishes conflict merging time costs, achieving a 23.85% reduction. Beyond addressing conflicts, WizardMerge provides merging suggestions for over 70% of the code blocks potentially affected by the conflicts. Notably, WizardMerge exhibits the capability to identify conflict-unrelated code blocks that require manual intervention yet are harmfully applied by Git during the merging. △ Less

Submitted 3 July, 2024; originally announced July 2024.

Comments: 22 pages

ACM Class: D.2; D.3

arXiv:2407.01093 [pdf, other]

IBSEN: Director-Actor Agent Collaboration for Controllable and Interactive Drama Script Generation

Authors: Senyu Han, Lu Chen, Li-Min Lin, Zhengshan Xu, Kai Yu

Abstract: Large language models have demonstrated their capabilities in storyline creation and human-like character role-playing. Current language model agents mainly focus on reasonable behaviors from the level of individuals, and their behaviors might be hard to constraint on the level of the whole storyline. In this paper we introduce IBSEN, a director-actor coordinate agent framework that generates dram… ▽ More Large language models have demonstrated their capabilities in storyline creation and human-like character role-playing. Current language model agents mainly focus on reasonable behaviors from the level of individuals, and their behaviors might be hard to constraint on the level of the whole storyline. In this paper we introduce IBSEN, a director-actor coordinate agent framework that generates drama scripts and makes the plot played by agents more controllable. The director agent writes plot outlines that the user desires to see, instructs the actor agents to role-play their characters, and reschedules the plot when human players participate in the scenario to ensure the plot is progressing towards the objective. To evaluate the framework, we create a novel drama plot that involves several actor agents and check the interactions between them under the instruction of the director agent. Evaluation results show that our framework could generate complete, diverse drama scripts from only a rough outline of plot objectives, meanwhile maintaining the characteristics of characters in the drama. Our codes and prompts are available at https://github.com/OpenDFM/ibsen. △ Less

Submitted 1 July, 2024; originally announced July 2024.

Comments: Accepted by ACL 2024 Main

arXiv:2407.01017 [pdf, other]

Coding for Intelligence from the Perspective of Category

Authors: Wenhan Yang, Zixuan Hu, Lilang Lin, Jiaying Liu, Ling-Yu Duan

Abstract: Coding, which targets compressing and reconstructing data, and intelligence, often regarded at an abstract computational level as being centered around model learning and prediction, interweave recently to give birth to a series of significant progress. The recent trends demonstrate the potential homogeneity of these two fields, especially when deep-learning models aid these two categories for bet… ▽ More Coding, which targets compressing and reconstructing data, and intelligence, often regarded at an abstract computational level as being centered around model learning and prediction, interweave recently to give birth to a series of significant progress. The recent trends demonstrate the potential homogeneity of these two fields, especially when deep-learning models aid these two categories for better probability modeling. For better understanding and describing from a unified perspective, inspired by the basic generally recognized principles in cognitive psychology, we formulate a novel problem of Coding for Intelligence from the category theory view. Based on the three axioms: existence of ideal coding, existence of practical coding, and compactness promoting generalization, we derive a general framework to understand existing methodologies, namely that, coding captures the intrinsic relationships of objects as much as possible, while ignoring information irrelevant to downstream tasks. This framework helps identify the challenges and essential elements in solving the specific derived Minimal Description Length (MDL) optimization problem from a broader range, providing opportunities to build a more intelligent system for handling multiple tasks/applications with coding ideas/tools. Centering on those elements, we systematically review recent processes of towards optimizing the MDL problem in more comprehensive ways from data, model, and task perspectives, and reveal their impacts on the potential CfI technical routes. After that, we also present new technique paths to fulfill CfI and provide potential solutions with preliminary experimental evidence. Last, further directions and remaining issues are discussed as well. The discussion shows our theory can reveal many phenomena and insights about large foundation models, which mutually corroborate with recent practices in feature learning. △ Less

Submitted 2 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

arXiv:2406.19233 [pdf, other]

X-ray and gamma-ray study for 2023 nova eruption of V1716 Sco

Authors: H. -H. Wang, H. -D. Yan, J. Takata, L. C. -C. Lin

Abstract: We report the results of X-ray and gamma-ray analyses of the nova V1716 Sco taken by Swift, NICER, NuSTAR and F ermi-LAT. We have detected gamma-ray emission at a significant level exceeding 8 σ in daily bins starting the day after the optical eruption. The gamma-ray emission, characterized by a Test Statistic (TS) value more than four, persisted for approximately 40 days. Notably, harder X-ray em… ▽ More We report the results of X-ray and gamma-ray analyses of the nova V1716 Sco taken by Swift, NICER, NuSTAR and F ermi-LAT. We have detected gamma-ray emission at a significant level exceeding 8 σ in daily bins starting the day after the optical eruption. The gamma-ray emission, characterized by a Test Statistic (TS) value more than four, persisted for approximately 40 days. Notably, harder X-ray emission were observed by NuSTAR as the start of gamma-ray emission, which is the fourth classical nova that gamma-ray emission is concurrent with harder X-ray emission from NuSTAR data. V1716 Sco is one of rare samples that clearly shows a hard X-ray emission (1-10 keV bands) in the Swift-XRT data concurrently with gamma-ray emission of Fermi-LAT data, and its light curve in 1.0-10.0 keV bands had a peak at about 20 days after the optical eruption. The X-ray spectrum was initially fitted by a model of thermal plasma emission, and entered a supersoft phase with additional blackbody (BB) component emerged around about 40 days after the optical eruption. NICER data taken in supersoft source phase revealed a quasi-periodic oscillation with a period of 79.10+-1.98 seconds, and the peak phase of the folded light curve varied with time. Moreover, V1716 Sco is the another example that the emission radius in supersoft source phase is significantly larger than the radius of white dwarf, and a simple BB emission model may not be applicable since the luminosity exceeds significantly Eddington limit. △ Less

Submitted 27 June, 2024; originally announced June 2024.

Comments: 16 pages,13 figures,submitted to AAS Journals. arXiv admin note: text overlap with arXiv:2404.08409

arXiv:2406.18365 [pdf, other]

Themis: Towards Flexible and Interpretable NLG Evaluation

Authors: Xinyu Hu, Li Lin, Mingqi Gao, Xunjian Yin, Xiaojun Wan

Abstract: The evaluation of natural language generation (NLG) tasks is a significant and longstanding research issue. With the recent emergence of powerful large language models (LLMs), some studies have turned to LLM-based automatic evaluation methods, which demonstrate great potential to become a new evaluation paradigm following traditional string-based and model-based metrics. However, despite the impro… ▽ More The evaluation of natural language generation (NLG) tasks is a significant and longstanding research issue. With the recent emergence of powerful large language models (LLMs), some studies have turned to LLM-based automatic evaluation methods, which demonstrate great potential to become a new evaluation paradigm following traditional string-based and model-based metrics. However, despite the improved performance of existing methods, they still possess some deficiencies, such as dependency on references and limited evaluation flexibility. Therefore, in this paper, we meticulously construct a large-scale NLG evaluation corpus NLG-Eval with human and GPT-4 annotations to alleviate the lack of relevant data in this field. Furthermore, we propose Themis, an LLM dedicated to NLG evaluation, which has been trained with our designed multi-perspective consistency and rating-oriented preference alignment methods. Themis can conduct flexible and interpretable evaluations without references, and it exhibits superior evaluation performance on various NLG tasks, simultaneously generalizing well to unseen tasks and surpassing other evaluation models, including GPT-4. △ Less

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.18327 [pdf, other]

Multi-modal Evidential Fusion Network for Trusted PET/CT Tumor Segmentation

Authors: Yuxuan Qi, Li Lin, Jiajun Wang, Jingya Zhang, Bin Zhang

Abstract: Accurate segmentation of tumors in PET/CT images is important in computer-aided diagnosis and treatment of cancer. The key issue of such a segmentation problem lies in the effective integration of complementary information from PET and CT images. However, the quality of PET and CT images varies widely in clinical settings, which leads to uncertainty in the modality information extracted by network… ▽ More Accurate segmentation of tumors in PET/CT images is important in computer-aided diagnosis and treatment of cancer. The key issue of such a segmentation problem lies in the effective integration of complementary information from PET and CT images. However, the quality of PET and CT images varies widely in clinical settings, which leads to uncertainty in the modality information extracted by networks. To take the uncertainty into account in multi-modal information fusion, this paper proposes a novel Multi-modal Evidential Fusion Network (MEFN) comprising a Cross-Modal Feature Learning (CFL) module and a Multi-modal Trusted Fusion (MTF) module. The CFL module reduces the domain gap upon modality conversion and highlights common tumor features, thereby alleviating the needs of the segmentation module to handle modality specificity. The MTF module utilizes mutual attention mechanisms and an uncertainty calibrator to fuse modality features based on modality uncertainty and then fuse the segmentation results under the guidance of Dempster-Shafer Theory. Besides, a new uncertainty perceptual loss is introduced to force the model focusing on uncertain features and hence improve its ability to extract trusted modality information. Extensive comparative experiments are conducted on two publicly available PET/CT datasets to evaluate the performance of our proposed method whose results demonstrate that our MEFN significantly outperforms state-of-the-art methods with improvements of 2.15% and 3.23% in DSC scores on the AutoPET dataset and the Hecktor dataset, respectively. More importantly, our model can provide radiologists with credible uncertainty of the segmentation results for their decision in accepting or rejecting the automatic segmentation results, which is particularly important for clinical applications. Our code will be available at https://github.com/QPaws/MEFN. △ Less

Submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.15931 [pdf, other]

Multistep Criticality Search and Power Shaping in Microreactors with Reinforcement Learning

Authors: Majdi I. Radaideh, Leo Tunkle, Dean Price, Kamal Abdulraheem, Linyu Lin, Moutaz Elias

Abstract: Reducing operation and maintenance costs is a key objective for advanced reactors in general and microreactors in particular. To achieve this reduction, developing robust autonomous control algorithms is essential to ensure safe and autonomous reactor operation. Recently, artificial intelligence and machine learning algorithms, specifically reinforcement learning (RL) algorithms, have seen rapid i… ▽ More Reducing operation and maintenance costs is a key objective for advanced reactors in general and microreactors in particular. To achieve this reduction, developing robust autonomous control algorithms is essential to ensure safe and autonomous reactor operation. Recently, artificial intelligence and machine learning algorithms, specifically reinforcement learning (RL) algorithms, have seen rapid increased application to control problems, such as plasma control in fusion tokamaks and building energy management. In this work, we introduce the use of RL for intelligent control in nuclear microreactors. The RL agent is trained using proximal policy optimization (PPO) and advantage actor-critic (A2C), cutting-edge deep RL techniques, based on a high-fidelity simulation of a microreactor design inspired by the Westinghouse eVinci\textsuperscript{TM} design. We utilized a Serpent model to generate data on drum positions, core criticality, and core power distribution for training a feedforward neural network surrogate model. This surrogate model was then used to guide a PPO and A2C control policies in determining the optimal drum position across various reactor burnup states, ensuring critical core conditions and symmetrical power distribution across all six core portions. The results demonstrate the excellent performance of PPO in identifying optimal drum positions, achieving a hextant power tilt ratio of approximately 1.002 (within the limit of $<$ 1.02) and maintaining criticality within a 10 pcm range. A2C did not provide as competitive of a performance as PPO in terms of performance metrics for all burnup steps considered in the cycle. Additionally, the results highlight the capability of well-trained RL control policies to quickly identify control actions, suggesting a promising approach for enabling real-time autonomous control through digital twins. △ Less

Submitted 22 June, 2024; originally announced June 2024.

Comments: 15 pages, 3 figures, and 2 tables

arXiv:2406.14892 [pdf, other]

CCAT: Detector Noise Limited Performance of the RFSoC-based Readout Electronics for mm/sub-mm/far-IR KIDs

Authors: Adrian K. Sinclair, James Burgoyne, Anthony I. Huber, Colin Murphy, Steve K. Choi, Cody J. Duell, Zachary B. Huber, Yaqiong Li, Scott C. Chapman, Michael D. Niemack, Thomas Nikola, Eve M. Vavagiakis, Samantha Walker, Jordan D. Wheeler, Jason Austermann, Lawrence Lin, Ruixuan Xie, Bugao Zou, Philip D. Mauskopf

Abstract: The Fred Young Submillimeter Telescope (FYST), on Cerro Chajnantor in the Atacama desert of Chile, will conduct wide-field and small deep-field surveys of the sky with more than 100,000 detectors on the Prime-Cam instrument. Kinetic inductance detectors (KIDs) were chosen as the primary sensor technology for their high density focal plane packing. Additionally, they benefit from low cost, ease of… ▽ More The Fred Young Submillimeter Telescope (FYST), on Cerro Chajnantor in the Atacama desert of Chile, will conduct wide-field and small deep-field surveys of the sky with more than 100,000 detectors on the Prime-Cam instrument. Kinetic inductance detectors (KIDs) were chosen as the primary sensor technology for their high density focal plane packing. Additionally, they benefit from low cost, ease of fabrication, and simplified cryogenic readout, which are all beneficial for successful deployment at scale. The cryogenic multiplexing complexity is pulled out of the cryostat and is instead pushed into the digital signal processing of the room temperature electronics. Using the Xilinx Radio Frequency System on a Chip (RFSoC), a highly multiplexed KID readout was developed for the first light Prime-Cam and commissioning Mod-Cam instruments. We report on the performance of the RFSoC-based readout with multiple detector arrays in various cryogenic setups. Specifically we demonstrate detector noise limited performance of the RFSoC-based readout under the expected optical loading conditions. △ Less

Submitted 21 June, 2024; originally announced June 2024.

Comments: draft submitted to SPIE

arXiv:2406.14259 [pdf, other]

doi 10.1109/ICASSP48485.2024.10446117

MEAT: Median-Ensemble Adversarial Training for Improving Robustness and Generalization

Authors: Zhaozhe Hu, Jia-Li Yin, Bin Chen, Luojun Lin, Bo-Hao Chen, Ximeng Liu

Abstract: Self-ensemble adversarial training methods improve model robustness by ensembling models at different training epochs, such as model weight averaging (WA). However, previous research has shown that self-ensemble defense methods in adversarial training (AT) still suffer from robust overfitting, which severely affects the generalization performance. Empirically, in the late phases of training, the A… ▽ More Self-ensemble adversarial training methods improve model robustness by ensembling models at different training epochs, such as model weight averaging (WA). However, previous research has shown that self-ensemble defense methods in adversarial training (AT) still suffer from robust overfitting, which severely affects the generalization performance. Empirically, in the late phases of training, the AT becomes more overfitting to the extent that the individuals for weight averaging also suffer from overfitting and produce anomalous weight values, which causes the self-ensemble model to continue to undergo robust overfitting due to the failure in removing the weight anomalies. To solve this problem, we aim to tackle the influence of outliers in the weight space in this work and propose an easy-to-operate and effective Median-Ensemble Adversarial Training (MEAT) method to solve the robust overfitting phenomenon existing in self-ensemble defense from the source by searching for the median of the historical model weights. Experimental results show that MEAT achieves the best robustness against the powerful AutoAttack and can effectively allievate the robust overfitting. We further demonstrate that most defense methods can improve robust generalization and robustness by combining with MEAT. △ Less

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.13368 [pdf]

Lewis Acidity and Basicity Diagnostics of Molten Salt for its Properties and Structure Online Monitoring

Authors: Changzu Zhu, Jia Song, Xiaorui Xu, Chengyu Wang, Yang Tong, Lve Lin, Shaoqiang Guo, Wentao Zhou, Adrien Couet, Yafei Wang

Abstract: Analogous to the aqueous solution where the pH of the solvent affects its multiple behaviors, the Lewis acidity-basicity of molten salts also greatly influences their thermophysical and thermochemical properties. In the study, we develop ion probes to quantitatively determine the acidity-basicity scale of molten NaCl-xAlCl3 (x = 1.5-2.1) salt using in-situ ultra-violet visible (UV-Vis) spectroscop… ▽ More Analogous to the aqueous solution where the pH of the solvent affects its multiple behaviors, the Lewis acidity-basicity of molten salts also greatly influences their thermophysical and thermochemical properties. In the study, we develop ion probes to quantitatively determine the acidity-basicity scale of molten NaCl-xAlCl3 (x = 1.5-2.1) salt using in-situ ultra-violet visible (UV-Vis) spectroscopy. With the accumulation of acidity-basicity data of NaCl-AlCl3 molten salt for a variety of compositions, the correlation between the acidity-basicity of salt and its measured fundamental properties are derived. To understand the physical and chemical features controlling the acidity-basicity variations, the structures of NaCl-xAlCl3 molten salts with different chemical compositions are investigated in terms of bonded complexes and coordination numbers. The comprehensive understanding of the correlation between composition, acidity-basicity, properties, and structures of molten salt can serve for the full screening and online monitoring of salt melt in extreme environments by simply measuring the salt acidity-basicity as developed in this study. △ Less

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.12501 [pdf, other]

Improving Multi-modal Recommender Systems by Denoising and Aligning Multi-modal Content and User Feedback

Authors: Guipeng Xv, Xinyu Li, Ruobing Xie, Chen Lin, Chong Liu, Feng Xia, Zhanhui Kang, Leyu Lin

Abstract: Multi-modal recommender systems (MRSs) are pivotal in diverse online web platforms and have garnered considerable attention in recent years. However, previous studies overlook the challenges of (1) noisy multi-modal content, (2) noisy user feedback, and (3) aligning multi-modal content with user feedback. In order to tackle these challenges, we propose Denoising and Aligning Multi-modal Recommende… ▽ More Multi-modal recommender systems (MRSs) are pivotal in diverse online web platforms and have garnered considerable attention in recent years. However, previous studies overlook the challenges of (1) noisy multi-modal content, (2) noisy user feedback, and (3) aligning multi-modal content with user feedback. In order to tackle these challenges, we propose Denoising and Aligning Multi-modal Recommender System (DA-MRS). To mitigate multi-modal noise, DA-MRS first constructs item-item graphs determined by consistent content similarity across modalities. To denoise user feedback, DA-MRS associates the probability of observed feedback with multi-modal content and devises a denoised BPR loss. Furthermore, DA-MRS implements Alignment guided by User preference to enhance task-specific item representation and Alignment guided by graded Item relations to provide finer-grained alignment. Extensive experiments verify that DA-MRS is a plug-and-play framework and achieves significant and consistent improvements across various datasets, backbone models, and noisy scenarios. △ Less

Submitted 18 June, 2024; originally announced June 2024.

arXiv:2406.12148 [pdf, other]

Complex variable solution on noncircular and asymmetrical tunnelling embedded by bidirectional conformal mapping incorporating Charge Simulation Method

Authors: Luobin Lin, Fuquan Chen, Changjie Zheng, Shangshun Lin

Abstract: Mechanical issues of noncircular and asymmetrical tunnelling can be estimated using complex variable method with suitable conformal mapping. Exsiting solution schemes of conformal mapping for noncircular tunnel generally need iteration or optimization strategy, and are thereby mathematically complicated. This paper proposes a new bidirectional conformal mapping for deep and shallow tunnels of nonc… ▽ More Mechanical issues of noncircular and asymmetrical tunnelling can be estimated using complex variable method with suitable conformal mapping. Exsiting solution schemes of conformal mapping for noncircular tunnel generally need iteration or optimization strategy, and are thereby mathematically complicated. This paper proposes a new bidirectional conformal mapping for deep and shallow tunnels of noncircular and asymmetrical shapes by incorporating Charge Simulation Method. The solution scheme of this new bidirectional conformal mapping only involves a pair of linear systems, and is therefore logically straight-forward, computationally efficient, and practically easy in coding. New numerical strategies are developed to deal with possible sharp corners of cavity by small arc simulation and densified collocation points. Several numerical examples are presented to illustrate the geometrical usage of the new bidirectional conformal mapping. Furthermore, the new bidirectional conformal mapping is embedded into two complex variable solutions of noncircular and asymmetrical shallow tunnelling in gravitational geomaterial with reasonable far-field displacement. The respective result comparisons with finite element solution and exsiting analytical solution show good agreements, indicating the feasible mechanical usage of the new bidirectional conformal mapping. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: 41 pages, 13 figures

arXiv:2406.10801 [pdf, other]

Saliency-guided and Patch-based Mixup for Long-tailed Skin Cancer Image Classification

Authors: Tianyunxi Wei, Yijin Huang, Li Lin, Pujin Cheng, Sirui Li, Xiaoying Tang

Abstract: Medical image datasets often exhibit long-tailed distributions due to the inherent challenges in medical data collection and annotation. In long-tailed contexts, some common disease categories account for most of the data, while only a few samples are available in the rare disease categories, resulting in poor performance of deep learning methods. To address this issue, previous approaches have em… ▽ More Medical image datasets often exhibit long-tailed distributions due to the inherent challenges in medical data collection and annotation. In long-tailed contexts, some common disease categories account for most of the data, while only a few samples are available in the rare disease categories, resulting in poor performance of deep learning methods. To address this issue, previous approaches have employed class re-sampling or re-weighting techniques, which often encounter challenges such as overfitting to tail classes or difficulties in optimization during training. In this work, we propose a novel approach, namely \textbf{S}aliency-guided and \textbf{P}atch-based \textbf{Mix}up (SPMix) for long-tailed skin cancer image classification. Specifically, given a tail-class image and a head-class image, we generate a new tail-class image by mixing them under the guidance of saliency mapping, which allows for preserving and augmenting the discriminative features of the tail classes without any interference of the head-class features. Extensive experiments are conducted on the ISIC2018 dataset, demonstrating the superiority of SPMix over existing state-of-the-art methods. △ Less

Submitted 16 June, 2024; originally announced June 2024.

Comments: IEEE ISBI2024

arXiv:2406.08466 [pdf, other]

Scaling Laws in Linear Regression: Compute, Parameters, and Data

Authors: Licong Lin, Jingfeng Wu, Sham M. Kakade, Peter L. Bartlett, Jason D. Lee

Abstract: Empirically, large-scale deep learning models often satisfy a neural scaling law: the test error of the trained model improves polynomially as the model size and data size grow. However, conventional wisdom suggests the test error consists of approximation, bias, and variance errors, where the variance error increases with model size. This disagrees with the general form of neural scaling laws, wh… ▽ More Empirically, large-scale deep learning models often satisfy a neural scaling law: the test error of the trained model improves polynomially as the model size and data size grow. However, conventional wisdom suggests the test error consists of approximation, bias, and variance errors, where the variance error increases with model size. This disagrees with the general form of neural scaling laws, which predict that increasing model size monotonically improves performance. We study the theory of scaling laws in an infinite dimensional linear regression setup. Specifically, we consider a model with $M$ parameters as a linear function of sketched covariates. The model is trained by one-pass stochastic gradient descent (SGD) using $N$ data. Assuming the optimal parameter satisfies a Gaussian prior and the data covariance matrix has a power-law spectrum of degree $a>1$, we show that the reducible part of the test error is $Θ(M^{-(a-1)} + N^{-(a-1)/a})$. The variance error, which increases with $M$, is dominated by the other errors due to the implicit regularization of SGD, thus disappearing from the bound. Our theory is consistent with the empirical neural scaling laws and verified by numerical simulation. △ Less

Submitted 12 June, 2024; originally announced June 2024.

arXiv:2406.07574 [pdf, other]

Biharmonic Distance of Graphs and its Higher-Order Variants: Theoretical Properties with Applications to Centrality and Clustering

Authors: Mitchell Black, Lucy Lin, Amir Nayyeri, Weng-Keen Wong

Abstract: Effective resistance is a distance between vertices of a graph that is both theoretically interesting and useful in applications. We study a variant of effective resistance called the biharmonic distance. While the effective resistance measures how well-connected two vertices are, we prove several theoretical results supporting the idea that the biharmonic distance measures how important an edge i… ▽ More Effective resistance is a distance between vertices of a graph that is both theoretically interesting and useful in applications. We study a variant of effective resistance called the biharmonic distance. While the effective resistance measures how well-connected two vertices are, we prove several theoretical results supporting the idea that the biharmonic distance measures how important an edge is to the global topology of the graph. Our theoretical results connect the biharmonic distance to well-known measures of connectivity of a graph like its total resistance and sparsity. Based on these results, we introduce two clustering algorithms using the biharmonic distance. Finally, we introduce a further generalization of the biharmonic distance that we call the $k$-harmonic distance. We empirically study the utility of biharmonic and $k$-harmonic distance for edge centrality and graph clustering. △ Less

Submitted 4 June, 2024; originally announced June 2024.

Comments: Accepted to ICML 2024

arXiv:2406.07357 [pdf, other]

PSMC: Provable and Scalable Algorithms for Motif Conductance Based Graph Clustering

Authors: Longlong Lin, Tao Jia, Zeli Wang, Jin Zhao, Rong-Hua Li

Abstract: Higher-order graph clustering aims to partition the graph using frequently occurring subgraphs. Motif conductance is one of the most promising higher-order graph clustering models due to its strong interpretability. However, existing motif conductance based graph clustering algorithms are mainly limited by a seminal two-stage reweighting computing framework, needing to enumerate all motif instance… ▽ More Higher-order graph clustering aims to partition the graph using frequently occurring subgraphs. Motif conductance is one of the most promising higher-order graph clustering models due to its strong interpretability. However, existing motif conductance based graph clustering algorithms are mainly limited by a seminal two-stage reweighting computing framework, needing to enumerate all motif instances to obtain an edge-weighted graph for partitioning. However, such a framework has two-fold vital defects: (1) It can only provide a quadratic bound for the motif with three vertices, and whether there is provable clustering quality for other motifs is still an open question. (2) The enumeration procedure of motif instances incurs prohibitively high costs against large motifs or large dense graphs due to combinatorial explosions. Besides, expensive spectral clustering or local graph diffusion on the edge-weighted graph also makes existing methods unable to handle massive graphs with millions of nodes. To overcome these dilemmas, we propose a Provable and Scalable Motif Conductance algorithm PSMC, which has a fixed and motif-independent approximation ratio for any motif. Specifically, PSMC first defines a new vertex metric Motif Resident based on the given motif, which can be computed locally. Then, it iteratively deletes the vertex with the smallest motif resident value very efficiently using novel dynamic update technologies. Finally, it outputs the locally optimal result during the above iterative process. To further boost efficiency, we propose several effective bounds to estimate the motif resident value of each vertex, which can greatly reduce computational costs. Empirical results show that our proposed algorithms achieve 3.2-32 times speedup and improve the quality by at least 12 times than the baselines. △ Less

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.06828 [pdf, other]

CCAT: Comparisons of 280 GHz TiN and Al Kinetic Inductance Detector Arrays

Authors: Cody J. Duell, Jason Austermann, James Beall, James R. Burgoyne, Scott C. Chapman, Steve K. Choi, Rodrigo G. Freundt, Jiansong Gao, Christopher Groppi, Anthony I. Huber, Zachary B. Huber, Johannes Hubmayr, Ben Keller, Yaqiong Li, Lawrence T. Lin, Justin Matthewson, Philip Mauskopf, Alicia Middleton, Colin C. Murphy, Michael D. Niemack, Thomas Nikola, Adrian K. Sinclair, Ema Smith, Jeff van Lanen, Anna Vaskuri , et al. (5 additional authors not shown)

Abstract: The CCAT Collaboration's six-meter Fred Young Submillimeter Telescope is scheduled to begin observing in the Chilean Atacama in 2025, targeting a variety of science goals throughout cosmic history. Prime-Cam is a 1.8-meter diameter cryostat that will host up to seven independent instrument modules designed for simultaneous spectroscopic and broadband, polarimetric surveys at millimeter to submilli… ▽ More The CCAT Collaboration's six-meter Fred Young Submillimeter Telescope is scheduled to begin observing in the Chilean Atacama in 2025, targeting a variety of science goals throughout cosmic history. Prime-Cam is a 1.8-meter diameter cryostat that will host up to seven independent instrument modules designed for simultaneous spectroscopic and broadband, polarimetric surveys at millimeter to submillimeter wavelengths. The first of these instrument modules, the 280 GHz module, will include ${\sim}$10,000 kinetic inductance detectors (KIDs) across three arrays. While the first array was fabricated out of tri-layer TiN/Ti/TiN, the other two arrays were fabricated out of a single layer of Al. This combination of materials within the same instrument provides a unique opportunity to directly compare the performance and noise properties of two different detector materials that are seeing increasing use within the field. We present preliminary comparisons here based on lab testing, along with a discussion of the potential impacts on operation when observing and translating raw data to science-grade maps. △ Less

Submitted 10 June, 2024; originally announced June 2024.

Comments: 6 pages, 3 figures, conference proceedings submitted to the Journal of Low Temperature Physics

arXiv:2406.02990 [pdf, other]

Predicting Genetic Mutation from Whole Slide Images via Biomedical-Linguistic Knowledge Enhanced Multi-label Classification

Authors: Gexin Huang, Chenfei Wu, Mingjie Li, Xiaojun Chang, Ling Chen, Ying Sun, Shen Zhao, Xiaodan Liang, Liang Lin

Abstract: Predicting genetic mutations from whole slide images is indispensable for cancer diagnosis. However, existing work training multiple binary classification models faces two challenges: (a) Training multiple binary classifiers is inefficient and would inevitably lead to a class imbalance problem. (b) The biological relationships among genes are overlooked, which limits the prediction performance. To… ▽ More Predicting genetic mutations from whole slide images is indispensable for cancer diagnosis. However, existing work training multiple binary classification models faces two challenges: (a) Training multiple binary classifiers is inefficient and would inevitably lead to a class imbalance problem. (b) The biological relationships among genes are overlooked, which limits the prediction performance. To tackle these challenges, we innovatively design a Biological-knowledge enhanced PathGenomic multi-label Transformer to improve genetic mutation prediction performances. BPGT first establishes a novel gene encoder that constructs gene priors by two carefully designed modules: (a) A gene graph whose node features are the genes' linguistic descriptions and the cancer phenotype, with edges modeled by genes' pathway associations and mutation consistencies. (b) A knowledge association module that fuses linguistic and biomedical knowledge into gene priors by transformer-based graph representation learning, capturing the intrinsic relationships between different genes' mutations. BPGT then designs a label decoder that finally performs genetic mutation prediction by two tailored modules: (a) A modality fusion module that firstly fuses the gene priors with critical regions in WSIs and obtains gene-wise mutation logits. (b) A comparative multi-label loss that emphasizes the inherent comparisons among mutation status to enhance the discrimination capabilities. Sufficient experiments on The Cancer Genome Atlas benchmark demonstrate that BPGT outperforms the state-of-the-art. △ Less

Submitted 5 June, 2024; originally announced June 2024.

Comments: 16 pages, 8 figures, and 3 tables

arXiv:2406.02978 [pdf, other]

Self-Supervised Skeleton Action Representation Learning: A Benchmark and Beyond

Authors: Jiahang Zhang, Lilang Lin, Shuai Yang, Jiaying Liu

Abstract: Self-supervised learning (SSL), which aims to learn meaningful prior representations from unlabeled data, has been proven effective for label-efficient skeleton-based action understanding. Different from the image domain, skeleton data possesses sparser spatial structures and diverse representation forms, with the absence of background clues and the additional temporal dimension. This presents the… ▽ More Self-supervised learning (SSL), which aims to learn meaningful prior representations from unlabeled data, has been proven effective for label-efficient skeleton-based action understanding. Different from the image domain, skeleton data possesses sparser spatial structures and diverse representation forms, with the absence of background clues and the additional temporal dimension. This presents the new challenges for the pretext task design of spatial-temporal motion representation learning. Recently, many endeavors have been made for skeleton-based SSL and remarkable progress has been achieved. However, a systematic and thorough review is still lacking. In this paper, we conduct, for the first time, a comprehensive survey on self-supervised skeleton-based action representation learning, where various literature is organized according to their pre-training pretext task methodologies. Following the taxonomy of context-based, generative learning, and contrastive learning approaches, we make a thorough review and benchmark of existing works and shed light on the future possible directions. Our investigation demonstrates that most SSL works rely on the single paradigm, learning representations of a single level, and are evaluated on the action recognition task solely, which leaves the generalization power of skeleton SSL models under-explored. To this end, a novel and effective SSL method for skeleton is further proposed, which integrates multiple pretext tasks to jointly learn versatile representations of different granularity, substantially boosting the generalization capacity for different downstream tasks. Extensive experiments under three large-scale datasets demonstrate that the proposed method achieves the superior generalization performance on various downstream tasks, including recognition, retrieval, detection, and few-shot learning. △ Less

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2406.02086 [pdf, other]

Multi-level quantum signal processing with applications to ground state preparation using fast-forwarded Hamiltonian evolution

Authors: Yulong Dong, Lin Lin

Abstract: The preparation of the ground state of a Hamiltonian $H$ with a large spectral radius has applications in many areas such as electronic structure theory and quantum field theory. Given an initial state with a constant overlap with the ground state, and assuming that the Hamiltonian $H$ can be efficiently simulated with an ideal fast-forwarding protocol, we first demonstrate that employing a linear… ▽ More The preparation of the ground state of a Hamiltonian $H$ with a large spectral radius has applications in many areas such as electronic structure theory and quantum field theory. Given an initial state with a constant overlap with the ground state, and assuming that the Hamiltonian $H$ can be efficiently simulated with an ideal fast-forwarding protocol, we first demonstrate that employing a linear combination of unitaries (LCU) approach can prepare the ground state at a cost of $\mathcal{O}(\log^2(\|H\| Δ^{-1}))$ queries to controlled Hamiltonian evolution. Here $\|H\|$ is the spectral radius of $H$ and $Δ$ the spectral gap. However, traditional Quantum Signal Processing (QSP)-based methods fail to capitalize on this efficient protocol, and its cost scales as $\mathcal{O}(\|H\| Δ^{-1})$. To bridge this gap, we develop a multi-level QSP-based algorithm that exploits the fast-forwarding feature. This novel algorithm not only matches the efficiency of the LCU approach when an ideal fast-forwarding protocol is available, but also exceeds it with a reduced cost that scales as $\mathcal{O}(\log(\|H\| Δ^{-1}))$. Additionally, our multi-level QSP method requires only $\mathcal{O}(\log(\|H\| Δ^{-1}))$ coefficients for implementing single qubit rotations. This eliminates the need for constructing the PREPARE oracle in LCU, which prepares a state encoding $\mathcal{O}(\|H\| Δ^{-1})$ coefficients regardless of whether the Hamiltonian can be fast-forwarded. △ Less

Submitted 4 June, 2024; originally announced June 2024.

Comments: 25 pages, 6 figures

arXiv:2406.02059 [pdf, other]

Graph Adversarial Diffusion Convolution

Authors: Songtao Liu, Jinghui Chen, Tianfan Fu, Lu Lin, Marinka Zitnik, Dinghao Wu

Abstract: This paper introduces a min-max optimization formulation for the Graph Signal Denoising (GSD) problem. In this formulation, we first maximize the second term of GSD by introducing perturbations to the graph structure based on Laplacian distance and then minimize the overall loss of the GSD. By solving the min-max optimization problem, we derive a new variant of the Graph Diffusion Convolution (GDC… ▽ More This paper introduces a min-max optimization formulation for the Graph Signal Denoising (GSD) problem. In this formulation, we first maximize the second term of GSD by introducing perturbations to the graph structure based on Laplacian distance and then minimize the overall loss of the GSD. By solving the min-max optimization problem, we derive a new variant of the Graph Diffusion Convolution (GDC) architecture, called Graph Adversarial Diffusion Convolution (GADC). GADC differs from GDC by incorporating an additional term that enhances robustness against adversarial attacks on the graph structure and noise in node features. Moreover, GADC improves the performance of GDC on heterophilic graphs. Extensive experiments demonstrate the effectiveness of GADC across various datasets. Code is available at https://github.com/SongtaoLiu0823/GADC. △ Less

Submitted 4 June, 2024; originally announced June 2024.

Comments: Accepted by ICML 2024

arXiv:2406.01858 [pdf, other]

CCAT: FYST Prime-Cam Readout Software: A framework for massively scalable KID arrays

Authors: James R. Burgoyne, Adrian K. Sinclair, Scott C. Chapman, Steve K. Choi, Cody J. Duell, Anthony I. Huber, Zachary B. Huber, Ben Keller, Lawrence Lin, Michael D. Niemack, Douglas Scott, Eve M. Vavagiakis, Samantha Walker, Matt Xie, the CCAT collaboration

Abstract: We outline the development of the readout software for the Prime-Cam and Mod-Cam instruments on the CCAT Fred Young Submillimeter Telescope (FYST), primecam_readout. The instruments feature lumped-element kinetic inductance detector (LEKID) arrays driven by Xilinx ZCU111 RFSoC boards. In the current configuration, each board can drive up to 4000 KIDs, and Prime-Cam is implementing approximately 25… ▽ More We outline the development of the readout software for the Prime-Cam and Mod-Cam instruments on the CCAT Fred Young Submillimeter Telescope (FYST), primecam_readout. The instruments feature lumped-element kinetic inductance detector (LEKID) arrays driven by Xilinx ZCU111 RFSoC boards. In the current configuration, each board can drive up to 4000 KIDs, and Prime-Cam is implementing approximately 25 boards. The software runs on a centralized control computer connected to the boards via dedicated ethernet, and facilitates such tasks as frequency-multiplexed tone comb driving, comb calibration and optimization, and detector timestream establishment. The control computer utilizes dynamically generated control channels for each board, allowing for simultaneous parallel control over all, while uniquely tracking diagnostics for each. This work demonstrates a scalable RFSoC readout architecture where computational demands increase linearly with the number of detectors, enabling control of tens-of-thousands of KIDs with modest hardware, and opening the door to the next generation of KID arrays housing millions of detectors. △ Less

Submitted 3 June, 2024; originally announced June 2024.

Comments: SPIE Astronomical Telescopes + Instrumentation conference proceedings

arXiv:2406.01007 [pdf, other]

Measurement of Electron Antineutrino Oscillation Amplitude and Frequency via Neutron Capture on Hydrogen at Daya Bay

Authors: Daya Bay collaboration, F. P. An, W. D. Bai, A. B. Balantekin, M. Bishai, S. Blyth, G. F. Cao, J. Cao, J. F. Chang, Y. Chang, H. S. Chen, H. Y. Chen, S. M. Chen, Y. Chen, Y. X. Chen, Z. Y. Chen, J. Cheng, J. Cheng, Y. -C. Cheng, Z. K. Cheng, J. J. Cherwinka, M. C. Chu, J. P. Cummings, O. Dalager, F. S. Deng , et al. (177 additional authors not shown)

Abstract: This Letter reports the first measurement of the oscillation amplitude and frequency of reactor antineutrinos at Daya Bay via neutron capture on hydrogen using 1958 days of data. With over 3.6 million signal candidates, an optimized candidate selection, improved treatment of backgrounds and efficiencies, refined energy calibration, and an energy response model for the capture-on-hydrogen sensitive… ▽ More This Letter reports the first measurement of the oscillation amplitude and frequency of reactor antineutrinos at Daya Bay via neutron capture on hydrogen using 1958 days of data. With over 3.6 million signal candidates, an optimized candidate selection, improved treatment of backgrounds and efficiencies, refined energy calibration, and an energy response model for the capture-on-hydrogen sensitive region, the relative $\overlineν_{e}$ rates and energy spectra variation among the near and far detectors gives $\mathrm{sin}^22θ_{13} = 0.0759_{-0.0049}^{+0.0050}$ and $Δm^2_{32} = (2.72^{+0.14}_{-0.15})\times10^{-3}$ eV$^2$ assuming the normal neutrino mass ordering, and $Δm^2_{32} = (-2.83^{+0.15}_{-0.14})\times10^{-3}$ eV$^2$ for the inverted neutrino mass ordering. This estimate of $\sin^2 2θ_{13}$ is consistent with and essentially independent from the one obtained using the capture-on-gadolinium sample at Daya Bay. The combination of these two results yields $\mathrm{sin}^22θ_{13}= 0.0833\pm0.0022$, which represents an 8% relative improvement in precision regarding the Daya Bay full 3158-day capture-on-gadolinium result. △ Less

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.00783 [pdf, other]

AI-Face: A Million-Scale Demographically Annotated AI-Generated Face Dataset and Fairness Benchmark

Authors: Li Lin, Santosh, Xin Wang, Shu Hu

Abstract: AI-generated faces have enriched human life, such as entertainment, education, and art. However, they also pose misuse risks. Therefore, detecting AI-generated faces becomes crucial, yet current detectors show biased performance across different demographic groups. Mitigating biases can be done by designing algorithmic fairness methods, which usually require demographically annotated face datasets… ▽ More AI-generated faces have enriched human life, such as entertainment, education, and art. However, they also pose misuse risks. Therefore, detecting AI-generated faces becomes crucial, yet current detectors show biased performance across different demographic groups. Mitigating biases can be done by designing algorithmic fairness methods, which usually require demographically annotated face datasets for model training. However, no existing dataset comprehensively encompasses both demographic attributes and diverse generative methods, which hinders the development of fair detectors for AI-generated faces. In this work, we introduce the AI-Face dataset, the first million-scale demographically annotated AI-generated face image dataset, including real faces, faces from deepfake videos, and faces generated by Generative Adversarial Networks and Diffusion Models. Based on this dataset, we conduct the first comprehensive fairness benchmark to assess various AI face detectors and provide valuable insights and findings to promote the future fair design of AI face detectors. Our AI-Face dataset and benchmark code are publicly available at https://github.com/Purdue-M2/AI-Face-FairnessBench. △ Less

Submitted 4 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

arXiv:2406.00659 [pdf, other]

High Performance Operation of a Direct-Current and Superconducting Radio-Frequency Combined Photocathode Gun

Authors: H. Jia, T. Li, T. Wang, Y. Zhao, X. Zhang, H. Xu, Z. Liu, J. Liu, L. Lin, H. Xie, L. Feng, F. Wang, F. Zhu, J. Hao, S. Quan, K. Liu, S. Huang

Abstract: Superconducting radio-frequency (SRF) guns are promising candidates to deliver high brightness continuous-wave (CW) electron beams for new generations of coherent linac light sources, ultrafast electron diffractions, MeV pulsed beam applications, etc. To solve the compatibility problem of semiconductor photocathodes, a hybrid gun combining a direct-current gap and an SRF cavity has been developed.… ▽ More Superconducting radio-frequency (SRF) guns are promising candidates to deliver high brightness continuous-wave (CW) electron beams for new generations of coherent linac light sources, ultrafast electron diffractions, MeV pulsed beam applications, etc. To solve the compatibility problem of semiconductor photocathodes, a hybrid gun combining a direct-current gap and an SRF cavity has been developed. The gun, employing K2CsSb photocathodes driven by a green laser, has been brought into stable CW operation with a dark current below 100 pA, delivering electron beams at an energy gain of 2.4 MeV, an electron bunch charge of 100 pC, and a repetition rate of 1 MHz. A normalized beam emittance of 0.54 mm-mrad has been achieved at the bunch charge of 100 pC and peak current of about 6 A. CW operation at 81.25 MHz repetition rate has also been tested with the maximum average beam current reaching 3 mA. △ Less

Submitted 2 June, 2024; originally announced June 2024.

Comments: 6 pages, 5 figures

arXiv:2406.00632 [pdf, other]

Diff-Mosaic: Augmenting Realistic Representations in Infrared Small Target Detection via Diffusion Prior

Authors: Yukai Shi, Yupei Lin, Pengxu Wei, Xiaoyu Xian, Tianshui Chen, Liang Lin

Abstract: Recently, researchers have proposed various deep learning methods to accurately detect infrared targets with the characteristics of indistinct shape and texture. Due to the limited variety of infrared datasets, training deep learning models with good generalization poses a challenge. To augment the infrared dataset, researchers employ data augmentation techniques, which often involve generating ne… ▽ More Recently, researchers have proposed various deep learning methods to accurately detect infrared targets with the characteristics of indistinct shape and texture. Due to the limited variety of infrared datasets, training deep learning models with good generalization poses a challenge. To augment the infrared dataset, researchers employ data augmentation techniques, which often involve generating new images by combining images from different datasets. However, these methods are lacking in two respects. In terms of realism, the images generated by mixup-based methods lack realism and are difficult to effectively simulate complex real-world scenarios. In terms of diversity, compared with real-world scenes, borrowing knowledge from another dataset inherently has a limited diversity. Currently, the diffusion model stands out as an innovative generative approach. Large-scale trained diffusion models have a strong generative prior that enables real-world modeling of images to generate diverse and realistic images. In this paper, we propose Diff-Mosaic, a data augmentation method based on the diffusion model. This model effectively alleviates the challenge of diversity and realism of data augmentation methods via diffusion prior. Specifically, our method consists of two stages. Firstly, we introduce an enhancement network called Pixel-Prior, which generates highly coordinated and realistic Mosaic images by harmonizing pixels. In the second stage, we propose an image enhancement strategy named Diff-Prior. This strategy utilizes diffusion priors to model images in the real-world scene, further enhancing the diversity and realism of the images. Extensive experiments have demonstrated that our approach significantly improves the performance of the detection network. The code is available at https://github.com/YupeiLin2388/Diff-Mosaic △ Less

Submitted 2 June, 2024; originally announced June 2024.

arXiv:2406.00510 [pdf, other]

Learning Background Prompts to Discover Implicit Knowledge for Open Vocabulary Object Detection

Authors: Jiaming Li, Jiacheng Zhang, Jichang Li, Ge Li, Si Liu, Liang Lin, Guanbin Li

Abstract: Open vocabulary object detection (OVD) aims at seeking an optimal object detector capable of recognizing objects from both base and novel categories. Recent advances leverage knowledge distillation to transfer insightful knowledge from pre-trained large-scale vision-language models to the task of object detection, significantly generalizing the powerful capabilities of the detector to identify mor… ▽ More Open vocabulary object detection (OVD) aims at seeking an optimal object detector capable of recognizing objects from both base and novel categories. Recent advances leverage knowledge distillation to transfer insightful knowledge from pre-trained large-scale vision-language models to the task of object detection, significantly generalizing the powerful capabilities of the detector to identify more unknown object categories. However, these methods face significant challenges in background interpretation and model overfitting and thus often result in the loss of crucial background knowledge, giving rise to sub-optimal inference performance of the detector. To mitigate these issues, we present a novel OVD framework termed LBP to propose learning background prompts to harness explored implicit background knowledge, thus enhancing the detection performance w.r.t. base and novel categories. Specifically, we devise three modules: Background Category-specific Prompt, Background Object Discovery, and Inference Probability Rectification, to empower the detector to discover, represent, and leverage implicit object knowledge explored from background proposals. Evaluation on two benchmark datasets, OV-COCO and OV-LVIS, demonstrates the superiority of our proposed method over existing state-of-the-art approaches in handling the OVD tasks. △ Less

Submitted 1 June, 2024; originally announced June 2024.

Comments: CVPR2024

arXiv:2406.00045 [pdf, other]

Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization

Authors: Yuanpu Cao, Tianrong Zhang, Bochuan Cao, Ziyi Yin, Lu Lin, Fenglong Ma, Jinghui Chen

Abstract: Researchers have been studying approaches to steer the behavior of Large Language Models (LLMs) and build personalized LLMs tailored for various applications. While fine-tuning seems to be a direct solution, it requires substantial computational resources and may significantly affect the utility of the original LLM. Recent endeavors have introduced more lightweight strategies, focusing on extracti… ▽ More Researchers have been studying approaches to steer the behavior of Large Language Models (LLMs) and build personalized LLMs tailored for various applications. While fine-tuning seems to be a direct solution, it requires substantial computational resources and may significantly affect the utility of the original LLM. Recent endeavors have introduced more lightweight strategies, focusing on extracting "steering vectors" to guide the model's output toward desired behaviors by adjusting activations within specific layers of the LLM's transformer architecture. However, such steering vectors are directly extracted from the activations of human preference data and thus often lead to suboptimal results and occasional failures, especially in alignment-related scenarios. This work proposes an innovative approach that could produce more effective steering vectors through bi-directional preference optimization. Our method is designed to allow steering vectors to directly influence the generation probability of contrastive human preference data pairs, thereby offering a more precise representation of the target behavior. By carefully adjusting the direction and magnitude of the steering vector, we enabled personalized control over the desired behavior across a spectrum of intensities. Extensive experimentation across various open-ended generation tasks, particularly focusing on steering AI personas, has validated the efficacy of our approach. Moreover, we comprehensively investigate critical alignment-concerning scenarios, such as managing truthfulness, mitigating hallucination, and addressing jailbreaking attacks. Remarkably, our method can still demonstrate outstanding steering effectiveness across these scenarios. Furthermore, we showcase the transferability of our steering vectors across different models/LoRAs and highlight the synergistic benefits of applying multiple vectors simultaneously. △ Less

Submitted 28 May, 2024; originally announced June 2024.

arXiv:2405.20404 [pdf, other]

XPrompt:Explaining Large Language Model's Generation via Joint Prompt Attribution

Authors: Yurui Chang, Bochuan Cao, Yujia Wang, Jinghui Chen, Lu Lin

Abstract: Large Language Models (LLMs) have demonstrated impressive performances in complex text generation tasks. However, the contribution of the input prompt to the generated content still remains obscure to humans, underscoring the necessity of elucidating and explaining the causality between input and output pairs. Existing works for providing prompt-specific explanation often confine model output to b… ▽ More Large Language Models (LLMs) have demonstrated impressive performances in complex text generation tasks. However, the contribution of the input prompt to the generated content still remains obscure to humans, underscoring the necessity of elucidating and explaining the causality between input and output pairs. Existing works for providing prompt-specific explanation often confine model output to be classification or next-word prediction. Few initial attempts aiming to explain the entire language generation often treat input prompt texts independently, ignoring their combinatorial effects on the follow-up generation. In this study, we introduce a counterfactual explanation framework based on joint prompt attribution, XPrompt, which aims to explain how a few prompt texts collaboratively influences the LLM's complete generation. Particularly, we formulate the task of prompt attribution for generation interpretation as a combinatorial optimization problem, and introduce a probabilistic algorithm to search for the casual input combination in the discrete space. We define and utilize multiple metrics to evaluate the produced explanations, demonstrating both faithfulness and efficiency of our framework. △ Less

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.18386 [pdf, other]

Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning

Authors: Yixiao Zhang, Yukara Ikemiya, Woosung Choi, Naoki Murata, Marco A. Martínez-Ramírez, Liwei Lin, Gus Xia, Wei-Hsiang Liao, Yuki Mitsufuji, Simon Dixon

Abstract: Recent advances in text-to-music editing, which employ text queries to modify music (e.g.\ by changing its style or adjusting instrumental components), present unique challenges and opportunities for AI-assisted music creation. Previous approaches in this domain have been constrained by the necessity to train specific editing models from scratch, which is both resource-intensive and inefficient; o… ▽ More Recent advances in text-to-music editing, which employ text queries to modify music (e.g.\ by changing its style or adjusting instrumental components), present unique challenges and opportunities for AI-assisted music creation. Previous approaches in this domain have been constrained by the necessity to train specific editing models from scratch, which is both resource-intensive and inefficient; other research uses large language models to predict edited music, resulting in imprecise audio reconstruction. To Combine the strengths and address these limitations, we introduce Instruct-MusicGen, a novel approach that finetunes a pretrained MusicGen model to efficiently follow editing instructions such as adding, removing, or separating stems. Our approach involves a modification of the original MusicGen architecture by incorporating a text fusion module and an audio fusion module, which allow the model to process instruction texts and audio inputs concurrently and yield the desired edited music. Remarkably, Instruct-MusicGen only introduces 8% new parameters to the original MusicGen model and only trains for 5K steps, yet it achieves superior performance across all tasks compared to existing baselines, and demonstrates performance comparable to the models trained for specific tasks. This advancement not only enhances the efficiency of text-to-music editing but also broadens the applicability of music language models in dynamic music production environments. △ Less

Submitted 29 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

Comments: Code and demo are available at: https://github.com/ldzhangyx/instruct-musicgen

arXiv:2405.17496 [pdf, other]

UU-Mamba: Uncertainty-aware U-Mamba for Cardiac Image Segmentation

Authors: Ting Yu Tsai, Li Lin, Shu Hu, Ming-Ching Chang, Hongtu Zhu, Xin Wang

Abstract: Biomedical image segmentation is critical for accurate identification and analysis of anatomical structures in medical imaging, particularly in cardiac MRI. Manual segmentation is labor-intensive, time-consuming, and prone to errors, highlighting the need for automated methods. However, current machine learning approaches face challenges like overfitting and data demands. To tackle these issues, w… ▽ More Biomedical image segmentation is critical for accurate identification and analysis of anatomical structures in medical imaging, particularly in cardiac MRI. Manual segmentation is labor-intensive, time-consuming, and prone to errors, highlighting the need for automated methods. However, current machine learning approaches face challenges like overfitting and data demands. To tackle these issues, we propose a new UU-Mamba model, integrating the U-Mamba model with the Sharpness-Aware Minimization (SAM) optimizer and an uncertainty-aware loss function. SAM enhances generalization by locating flat minima in the loss landscape, thus reducing overfitting. The uncertainty-aware loss combines region-based, distribution-based, and pixel-based loss designs to improve segmentation accuracy and robustness. Evaluation of our method is performed on the ACDC cardiac dataset, outperforming state-of-the-art models including TransUNet, Swin-Unet, nnUNet, and nnFormer. Our approach achieves Dice Similarity Coefficient (DSC) and Mean Squared Error (MSE) scores, demonstrating its effectiveness in cardiac MRI segmentation. △ Less

Submitted 4 June, 2024; v1 submitted 25 May, 2024; originally announced May 2024.

arXiv:2405.16768 [pdf, other]

Time-dependent complex variable solution on quasi three-dimensional shallow tunnelling in gravititational geomaterial with reasonable far-field displacement

Authors: Luobin Lin, Fuquan Chen, Changjie Zheng

Abstract: Three-dimensional effect of tunnel face and gravitational excavation generally occur in shallow tunnelling, which are nevertheless not adequately considered in present complex variable solutions. In this paper, a new time-dependent complex variable solution on quasi three-dimensional shallow tunnelling in gravitational geomaterial is derived, and the far-field displacement singularity is eliminate… ▽ More Three-dimensional effect of tunnel face and gravitational excavation generally occur in shallow tunnelling, which are nevertheless not adequately considered in present complex variable solutions. In this paper, a new time-dependent complex variable solution on quasi three-dimensional shallow tunnelling in gravitational geomaterial is derived, and the far-field displacement singularity is eliminated by fixed far-field ground surface in the whole excavation time span. With an equivalent coefficient of three-dimensional effect, the quasi three-dimensional shallow tunnelling is transformed into a plane strain problem with time-dependent virtual traction along tunnel periphery. The mixed boundaries of fixed far-field ground surface and nearby free segment form a homogenerous Riemann-Hilbert problem with extra constraints of the virtual traction along tunnel periphery, which is simultaneously solved using an iterative linear system with good numerical stability. The mixed boundary conditions along the ground surface in the whole excavation time span are well satisified in a numerical case, which is further examined by comparing with corresponding finite element solution. The results are in good agreements, and the proposed solution illustrates high efficiency. More discussions are made on excavation rate, viscosity, and solution convergence. A latent paradox is disclosed for objectivity. △ Less

Submitted 26 May, 2024; originally announced May 2024.

Comments: 38p pages, 11 figures

arXiv:2405.15768 [pdf, other]

Canonical Variates in Wasserstein Metric Space

Authors: Jia Li, Lin Lin

Abstract: In this paper, we address the classification of instances each characterized not by a singular point, but by a distribution on a vector space. We employ the Wasserstein metric to measure distances between distributions, which are then used by distance-based classification algorithms such as k-nearest neighbors, k-means, and pseudo-mixture modeling. Central to our investigation is dimension reducti… ▽ More In this paper, we address the classification of instances each characterized not by a singular point, but by a distribution on a vector space. We employ the Wasserstein metric to measure distances between distributions, which are then used by distance-based classification algorithms such as k-nearest neighbors, k-means, and pseudo-mixture modeling. Central to our investigation is dimension reduction within the Wasserstein metric space to enhance classification accuracy. We introduce a novel approach grounded in the principle of maximizing Fisher's ratio, defined as the quotient of between-class variation to within-class variation. The directions in which this ratio is maximized are termed discriminant coordinates or canonical variates axes. In practice, we define both between-class and within-class variations as the average squared distances between pairs of instances, with the pairs either belonging to the same class or to different classes. This ratio optimization is achieved through an iterative algorithm, which alternates between optimal transport and maximization steps within the vector space. We conduct empirical studies to assess the algorithm's convergence and, through experimental validation, demonstrate that our dimension reduction technique substantially enhances classification performance. Moreover, our method outperforms well-established algorithms that operate on vector representations derived from distributional data. It also exhibits robustness against variations in the distributional representations of data clouds. △ Less

Submitted 24 May, 2024; originally announced May 2024.

Comments: double space 37 pages, 6 figures

arXiv:2405.15297 [pdf]

doi 10.1103/PhysRevB.109.184112

High-field magnetoelectric coupling and successive magnetic transitions in Mn-doped polar antiferromagnet Ni3TeO6

Authors: J. H. Zhang, L. Lin, C. Dong, Y. T. Chang, J. F. Wang, C. L. Lu, P. Z. Chen, W. J. Zhai, G. Z. Zhou, L. Huang, Y. S. Tang, S. H. Zheng, M. F. Liu, X. H. Zhou, Z. B. Yan, J. -M. Liu

Abstract: Among the 3d transition metal ions doped polar Ni3TeO6, Mn-doped Ni3TeO6 has stimulated great interest due to its high magnetic ordering temperature and complex magnetic phases, but the mechanism of magnetoelectric (ME) coupling is far from understood. Herein we report our systematic investigation of the chemical control of magnetism, metamagnetic transition, and ME properties of Ni3-xMnxTeO6 sing… ▽ More Among the 3d transition metal ions doped polar Ni3TeO6, Mn-doped Ni3TeO6 has stimulated great interest due to its high magnetic ordering temperature and complex magnetic phases, but the mechanism of magnetoelectric (ME) coupling is far from understood. Herein we report our systematic investigation of the chemical control of magnetism, metamagnetic transition, and ME properties of Ni3-xMnxTeO6 single crystals in high magnetic field (H) up to 52 T. We present a previously unreported weak ferromagnetic behavior appeared in the ab plane below 9.5 K in addition to the incommensurate helical and commensurate collinear antiferromagnetic states. In the low-field region, a spin-flop type metamagnetic transition without any hysteresis occurs at Hc1 for H // c, while another metamagnetic transition accompanied with a change in electric polarization is observed at Hc2 in the high-field region both for H // c and H // ab above 30 K, which can be attributed to the sudden rotation of magnetic moments at Ni2 sites. The ME measurements reveal that a first-order ME effect is observed in the low-T and low-H regions, while a second-order ME coupling term appears above 30 K in the magnetic field range of Hc1 < H < Hc2 for H // c and H < Hc2 for H // ab, both becoming significant with increasing temperature. Eventually, they are dominated by the second-order ME effect near the antiferromagnetic transition temperature. The present work demonstrates that Ni3-xMnxTeO6 is an exotic magnetoelectric material compared with Ni3TeO6 and its derivatives, thereby providing insights to better understand the magnetism and ME coupling in Ni3TeO6 and its derivatives. △ Less

Submitted 29 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

Comments: 30 pages with 8 figures

Journal ref: Phys. Rev. B 109, 184112 (2024)

arXiv:2405.15280 [pdf, other]

DFGNN: Dual-frequency Graph Neural Network for Sign-aware Feedback

Authors: Yiqing Wu, Ruobing Xie, Zhao Zhang, Xu Zhang, Fuzhen Zhuang, Leyu Lin, Zhanhui Kang, Yongjun Xu

Abstract: The graph-based recommendation has achieved great success in recent years. However, most existing graph-based recommendations focus on capturing user preference based on positive edges/feedback, while ignoring negative edges/feedback (e.g., dislike, low rating) that widely exist in real-world recommender systems. How to utilize negative feedback in graph-based recommendations still remains underex… ▽ More The graph-based recommendation has achieved great success in recent years. However, most existing graph-based recommendations focus on capturing user preference based on positive edges/feedback, while ignoring negative edges/feedback (e.g., dislike, low rating) that widely exist in real-world recommender systems. How to utilize negative feedback in graph-based recommendations still remains underexplored. In this study, we first conducted a comprehensive experimental analysis and found that (1) existing graph neural networks are not well-suited for modeling negative feedback, which acts as a high-frequency signal in a user-item graph. (2) The graph-based recommendation suffers from the representation degeneration problem. Based on the two observations, we propose a novel model that models positive and negative feedback from a frequency filter perspective called Dual-frequency Graph Neural Network for Sign-aware Recommendation (DFGNN). Specifically, in DFGNN, the designed dual-frequency graph filter (DGF) captures both low-frequency and high-frequency signals that contain positive and negative feedback. Furthermore, the proposed signed graph regularization is applied to maintain the user/item embedding uniform in the embedding space to alleviate the representation degeneration problem. Additionally, we conduct extensive experiments on real-world datasets and demonstrate the effectiveness of the proposed model. Codes of our model will be released upon acceptance. △ Less

Submitted 24 May, 2024; originally announced May 2024.

Comments: Accepted by KDD 2024 Research Track

arXiv:2405.14767 [pdf, other]

FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models

Authors: Hongyang Yang, Boyu Zhang, Neng Wang, Cheng Guo, Xiaoli Zhang, Likun Lin, Junlin Wang, Tianyu Zhou, Mao Guan, Runjia Zhang, Christina Dan Wang

Abstract: As financial institutions and professionals increasingly incorporate Large Language Models (LLMs) into their workflows, substantial barriers, including proprietary data and specialized knowledge, persist between the finance sector and the AI community. These challenges impede the AI community's ability to enhance financial tasks effectively. Acknowledging financial analysis's critical role, we aim… ▽ More As financial institutions and professionals increasingly incorporate Large Language Models (LLMs) into their workflows, substantial barriers, including proprietary data and specialized knowledge, persist between the finance sector and the AI community. These challenges impede the AI community's ability to enhance financial tasks effectively. Acknowledging financial analysis's critical role, we aim to devise financial-specialized LLM-based toolchains and democratize access to them through open-source initiatives, promoting wider AI adoption in financial decision-making. In this paper, we introduce FinRobot, a novel open-source AI agent platform supporting multiple financially specialized AI agents, each powered by LLM. Specifically, the platform consists of four major layers: 1) the Financial AI Agents layer that formulates Financial Chain-of-Thought (CoT) by breaking sophisticated financial problems down into logical sequences; 2) the Financial LLM Algorithms layer dynamically configures appropriate model application strategies for specific tasks; 3) the LLMOps and DataOps layer produces accurate models by applying training/fine-tuning techniques and using task-relevant data; 4) the Multi-source LLM Foundation Models layer that integrates various LLMs and enables the above layers to access them directly. Finally, FinRobot provides hands-on for both professional-grade analysts and laypersons to utilize powerful AI techniques for advanced financial analysis. We open-source FinRobot at \url{https://github.com/AI4Finance-Foundation/FinRobot}. △ Less

Submitted 27 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

Comments: FinRobot Whitepaper V1.0

arXiv:2405.14023 [pdf, other]

WordGame: Efficient & Effective LLM Jailbreak via Simultaneous Obfuscation in Query and Response

Authors: Tianrong Zhang, Bochuan Cao, Yuanpu Cao, Lu Lin, Prasenjit Mitra, Jinghui Chen

Abstract: The recent breakthrough in large language models (LLMs) such as ChatGPT has revolutionized production processes at an unprecedented pace. Alongside this progress also comes mounting concerns about LLMs' susceptibility to jailbreaking attacks, which leads to the generation of harmful or unsafe content. While safety alignment measures have been implemented in LLMs to mitigate existing jailbreak atte… ▽ More The recent breakthrough in large language models (LLMs) such as ChatGPT has revolutionized production processes at an unprecedented pace. Alongside this progress also comes mounting concerns about LLMs' susceptibility to jailbreaking attacks, which leads to the generation of harmful or unsafe content. While safety alignment measures have been implemented in LLMs to mitigate existing jailbreak attempts and force them to become increasingly complicated, it is still far from perfect. In this paper, we analyze the common pattern of the current safety alignment and show that it is possible to exploit such patterns for jailbreaking attacks by simultaneous obfuscation in queries and responses. Specifically, we propose WordGame attack, which replaces malicious words with word games to break down the adversarial intent of a query and encourage benign content regarding the games to precede the anticipated harmful content in the response, creating a context that is hardly covered by any corpus used for safety alignment. Extensive experiments demonstrate that WordGame attack can break the guardrails of the current leading proprietary and open-source LLMs, including the latest Claude-3, GPT-4, and Llama-3 models. Further ablation studies on such simultaneous obfuscation in query and response provide evidence of the merits of the attack strategy beyond an individual attack. △ Less

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.12521 [pdf, other]

Unleash Graph Neural Networks from Heavy Tuning

Authors: Lequan Lin, Dai Shi, Andi Han, Zhiyong Wang, Junbin Gao

Abstract: Graph Neural Networks (GNNs) are deep-learning architectures designed for graph-type data, where understanding relationships among individual observations is crucial. However, achieving promising GNN performance, especially on unseen data, requires comprehensive hyperparameter tuning and meticulous training. Unfortunately, these processes come with high computational costs and significant human ef… ▽ More Graph Neural Networks (GNNs) are deep-learning architectures designed for graph-type data, where understanding relationships among individual observations is crucial. However, achieving promising GNN performance, especially on unseen data, requires comprehensive hyperparameter tuning and meticulous training. Unfortunately, these processes come with high computational costs and significant human effort. Additionally, conventional searching algorithms such as grid search may result in overfitting on validation data, diminishing generalization accuracy. To tackle these challenges, we propose a graph conditional latent diffusion framework (GNN-Diff) to generate high-performing GNNs directly by learning from checkpoints saved during a light-tuning coarse search. Our method: (1) unleashes GNN training from heavy tuning and complex search space design; (2) produces GNN parameters that outperform those obtained through comprehensive grid search; and (3) establishes higher-quality generation for GNNs compared to diffusion frameworks designed for general neural networks. △ Less

Submitted 21 May, 2024; originally announced May 2024.

Showing 1–50 of 1,718 results for author: Lin, L