Integrating AI in College Education: Positive yet Mixed Experiences with ChatGPT

Authors: Xinrui Song, Jiajin Zhang, Pingkun Yan, Juergen Hahn, Uwe Kruger, Hisham Mohamed, Ge Wang

Abstract: The integration of artificial intelligence (AI) chatbots into higher education marks a shift towards a new generation of pedagogical tools, mirroring the arrival of milestones like the internet. With the launch of ChatGPT-4 Turbo in November 2023, we developed a ChatGPT-based teaching application (https://chat.openai.com/g/g-1imx1py4K-chatge-medical-imaging) and integrated it into our undergraduate medical imaging course in the Spring 2024 semester. This study investigates the use of ChatGPT throughout a semester-long trial, providing insights into students' engagement, perception, and the overall educational effectiveness of the technology. We systematically collected and analyzed data concerning students' interaction with ChatGPT, focusing on their attitudes, concerns, and usage patterns. The findings indicate that ChatGPT offers significant advantages such as improved information access and increased interactivity, but its adoption is accompanied by concerns about the accuracy of the information provided and the necessity for well-defined guidelines to optimize its use.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.03959 [pdf, other]

Skyrmion Hall effect in altermagnets

Authors: Zhejunyu Jin, Zhaozhuo Zeng, Yunshan Cao, Peng Yan

Abstract: It is widely believed that the skyrmion Hall effect is absent in antiferromagnets because of the vanishing topological charge. However, the Aharonov-Casher theory indicates the possibility of topological effects for neutral particles. In this work, we predict the skyrmion Hall effect in emerging altermagnets with zero net magnetization and zero skyrmion charge. We first show that the neutral skyrmion manifests as a magnetic quadrupole in altermagnets. We reveal a hidden gauge field from the magnetic quadrupole, which induces the skyrmion Hall effect when driven by spin transfer torque. Interestingly, we identify a sign change of the Hall angle when one swaps the anisotropic exchange couplings in altermagnets. Furthermore, we demonstrate that both the velocity and Hall angle of altermagnetic skyrmions sensitively depend on the current direction. Our findings reveal the critical role of the magnetic quadrupole in driving the skyrmion Hall effect with vanishing charge, and pave the way to discovering new Hall effects of neutral quasiparticles beyond magnetic skyrmions.

Submitted 4 July, 2024; originally announced July 2024.

Comments: 6 pages and 5 figures

arXiv:2407.03658 [pdf, other]

GPT-4 vs. Human Translators: A Comprehensive Evaluation of Translation Quality Across Languages, Domains, and Expertise Levels

Authors: Jianhao Yan, Pingchuan Yan, Yulong Chen, Judy Li, Xianchao Zhu, Yue Zhang

Abstract: This study comprehensively evaluates the translation quality of Large Language Models (LLMs), specifically GPT-4, against human translators of varying expertise levels across multiple language pairs and domains. Through carefully designed annotation rounds, we find that GPT-4 performs comparably to junior translators in terms of total errors made but lags behind medium and senior translators. We also observe imbalanced performance across different languages and domains, with GPT-4's translation capability gradually weakening from resource-rich to resource-poor directions. In addition, we qualitatively study the translations given by GPT-4 and human translators, and find that GPT-4 suffers from literal translations, whereas human translators sometimes overthink the background information. To our knowledge, this study is the first to evaluate LLMs against human translators and analyze the systematic differences between their outputs, providing valuable insights into the current state of LLM-based translation and its potential limitations.

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.00557 [pdf, other]

Explaining Chest X-ray Pathology Models using Textual Concepts

Authors: Vijay Sadashivaiah, Mannudeep K. Kalra, Pingkun Yan, James A. Hendler

Abstract: Deep learning models have revolutionized medical imaging and diagnostics, yet their opaque nature poses challenges for clinical adoption and trust. Amongst approaches to improve model interpretability, concept-based explanations aim to provide concise and human-understandable explanations of any arbitrary classifier. However, such methods usually require a large amount of manually collected data with concept annotation, which is often scarce in the medical domain. In this paper, we propose Conceptual Counterfactual Explanations for Chest X-ray (CoCoX), which leverage the joint embedding space of existing vision-language models (VLMs) to explain black-box classifier outcomes without the need for annotated datasets. Specifically, we utilize textual concepts derived from chest radiography reports and a pre-trained chest radiography-based VLM to explain three common cardiothoracic pathologies. We demonstrate that the explanations generated by our method are semantically meaningful and faithful to underlying pathologies.

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2407.00541 [pdf]

Answering real-world clinical questions using large language model based systems

Authors: Yen Sia Low, Michael L. Jackson, Rebecca J. Hyde, Robert E. Brown, Neil M. Sanghavi, Julian D. Baldwin, C. William Pike, Jananee Muralidharan, Gavin Hui, Natasha Alexander, Hadeel Hassan, Rahul V. Nene, Morgan Pike, Courtney J. Pokrzywa, Shivam Vedak, Adam Paul Yan, Dong-han Yao, Amy R. Zipursky, Christina Dinh, Philip Ballentine, Dan C. Derieg, Vladimir Polony, Rehan N. Chawdry, Jordan Davies, Brigham B. Hyde, et al. (2 additional authors not shown)

Abstract: Evidence to guide healthcare decisions is often limited by a lack of relevant and trustworthy literature as well as difficulty in contextualizing existing research for a specific patient. Large language models (LLMs) could potentially address both challenges by either summarizing published literature or generating new studies based on real-world data (RWD). We evaluated the ability of five LLM-based systems in answering 50 clinical questions and had nine independent physicians review the responses for relevance, reliability, and actionability. As it stands, general-purpose LLMs (ChatGPT-4, Claude 3 Opus, Gemini Pro 1.5) rarely produced answers that were deemed relevant and evidence-based (2% - 10%). In contrast, retrieval augmented generation (RAG)-based and agentic LLM systems produced relevant and evidence-based answers for 24% (OpenEvidence) to 58% (ChatRWD) of questions. Only the agentic ChatRWD was able to answer novel questions compared to other LLMs (65% vs. 0-9%). These results suggest that while general-purpose LLMs should not be used as-is, a purpose-built system for evidence summarization based on RAG and one for generating novel evidence working synergistically would improve availability of pertinent evidence for patient care.

Submitted 29 June, 2024; originally announced July 2024.

Comments: 28 pages (2 figures, 3 tables) inclusive of 8 pages of supplemental materials (4 supplemental figures and 4 supplemental tables)

arXiv:2407.00514 [pdf, ps, other]

Combining Classical and Probabilistic Independence Reasoning to Verify the Security of Oblivious Algorithms (Extended Version)

Authors: Pengbo Yan, Toby Murray, Olga Ohrimenko, Van-Thuan Pham, Robert Sison

Abstract: We consider the problem of how to verify the security of probabilistic oblivious algorithms formally and systematically. Unfortunately, prior program logics fail to support a number of complexities that feature in the semantics and invariants needed to verify the security of many practical probabilistic oblivious algorithms. We propose an approach based on reasoning over perfectly oblivious approximations, using a program logic that combines both classical Hoare logic reasoning and probabilistic independence reasoning to support all the needed features. We formalise and prove our new logic sound in Isabelle/HOL and apply our approach to formally verify the security of several challenging case studies beyond the reach of prior methods for proving obliviousness.

Submitted 29 June, 2024; originally announced July 2024.

arXiv:2406.19631 [pdf, other]

Personalized Interpretation on Federated Learning: A Virtual Concepts approach

Authors: Peng Yan, Guodong Long, Jing Jiang, Michael Blumenstein

Abstract: Tackling non-IID data is an open challenge in federated learning research. Existing FL methods, including robust FL and personalized FL, are designed to improve model performance without considering how to interpret non-IID data across clients. This paper aims to design a novel FL method that is robust to, and interprets, the non-IID data across clients. Specifically, we interpret each client's dataset as a mixture of conceptual vectors, each of which represents an interpretable concept to end-users. These conceptual vectors could be pre-defined or refined in a human-in-the-loop process, or be learnt via the optimization procedure of the federated learning system. In addition to the interpretability, the clarity of client-specific personalization could also be applied to enhance the robustness of the training process on the FL system. The effectiveness of the proposed method has been validated on benchmark datasets.

Submitted 27 June, 2024; originally announced June 2024.

arXiv:2406.13953 [pdf, other]

Peculiar corner states in magnetic fractals

Authors: Zhixiong Li, Peng Yan

Abstract: Topological excitations in periodic magnetic crystals have received significant recent attention. However, their fate once the lattice periodicity is broken remains an open question. In this work, we theoretically study the topological properties embedded in the collective dynamics of a magnetic texture array arranged into a Sierpiński carpet structure with effective Hausdorff dimensionality $d_{f}=1.893$. By evaluating the quantized real-space quadrupole moment, we obtain the phase diagram supporting peculiar corner states that are absent in conventional square lattices. We identify three different higher-order topological states, i.e., outer corner state, type I and type II inner corner states. We further show that all these corner states are topologically protected and are robust against moderate disorder. Full micromagnetic simulations are performed to verify theoretical predictions with good agreement. Our results pave the way to investigating topological phases of magnetic texture based fractals and bridging the gap between magnetic topology and fractality.

Submitted 19 June, 2024; originally announced June 2024.

Comments: 5 figures

arXiv:2406.09298 [pdf, other]

Magnon spin transport through atomic ferrimagnetic domain walls

Authors: Zhaozhuo Zeng, Peng Yan

Abstract: It is a well-established notion that the spin of a magnon should be flipped when it passes through a $180^{\circ}$ domain wall (DW) in both ferromagnets and antiferromagnets, while the magnon spin transport through ferrimagnetic DW is still elusive. In this work, we report that the magnon preserves its spin after the transmission through an atomically sharp DW in ferrimagnets, due to the intriguing interband magnon scattering at the domain interface. This finding may provide significant insight to resolve the puzzling insensitivity of magnon spin diffusion to the $180^{\circ}$ ferrimagnetic DWs observed by recent experiments. Our results reveal the unique role of ferrimagnetic DWs in manipulating the magnon spin and may facilitate the design of novel magnonic devices based on ferrimagnets.

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.00258 [pdf, other]

Artemis: Towards Referential Understanding in Complex Videos

Authors: Jihao Qiu, Yuan Zhang, Xi Tang, Lingxi Xie, Tianren Ma, Pengyu Yan, David Doermann, Qixiang Ye, Yunjie Tian

Abstract: Videos carry rich visual information including object description, action, interaction, etc., but the existing multimodal large language models (MLLMs) fall short in referential understanding scenarios such as video-based referring. In this paper, we present Artemis, an MLLM that pushes video-based referential understanding to a finer level. Given a video, Artemis receives a natural-language question with a bounding box in any video frame and describes the referred target in the entire video. The key to achieving this goal lies in extracting compact, target-specific video features, where we set a solid baseline by tracking and selecting spatiotemporal features from the video. We train Artemis on the newly established VideoRef45K dataset with 45K video-QA pairs and design a computationally efficient, three-stage training procedure. Results are promising both quantitatively and qualitatively. Additionally, we show that Artemis can be integrated with video grounding and text summarization tools to understand more complex scenarios. Code and data are available at https://github.com/qiujihao19/Artemis.

Submitted 31 May, 2024; originally announced June 2024.

Comments: 19 pages, 14 figures. Code and data are available at https://github.com/qiujihao19/Artemis

arXiv:2405.18533 [pdf, other]

Cardiovascular Disease Detection from Multi-View Chest X-rays with BI-Mamba

Authors: Zefan Yang, Jiajin Zhang, Ge Wang, Mannudeep K. Kalra, Pingkun Yan

Abstract: Accurate prediction of cardiovascular disease (CVD) risk in medical imaging is central to effective patient health management. Previous studies have demonstrated that imaging features in computed tomography (CT) can help predict CVD risk. However, CT entails notable radiation exposure, which may result in adverse health effects for patients. In contrast, chest X-ray emits significantly lower levels of radiation, offering a safer option. This rationale motivates our investigation into the feasibility of using chest X-ray for predicting CVD risk. Convolutional Neural Networks (CNNs) and Transformers are two established network architectures for computer-aided diagnosis. However, they struggle to model very high resolution chest X-rays due to the lack of large context modeling power or quadratic time complexity. Inspired by state space sequence models (SSMs), a new class of network architectures with sequence modeling power competitive with Transformers and linear time complexity, we propose Bidirectional Image Mamba (BI-Mamba) to complement the unidirectional SSMs with opposite directional information. BI-Mamba utilizes parallel forward and backward blocks to encode long-range dependencies of multi-view chest X-rays. We conduct extensive experiments on images from 10,395 subjects in the National Lung Screening Trial (NLST). Results show that BI-Mamba outperforms ResNet-50 and ViT-S with comparable parameter size, and saves a significant amount of GPU memory during training. Besides, BI-Mamba achieves promising performance compared with the previous state of the art in CT, unraveling the potential of chest X-ray for CVD risk prediction.

Submitted 28 May, 2024; originally announced May 2024.

Comments: Early accepted paper for MICCAI 2024

arXiv:2405.15728 [pdf, other]

Disease-informed Adaptation of Vision-Language Models

Authors: Jiajin Zhang, Ge Wang, Mannudeep K. Kalra, Pingkun Yan

Abstract: In medical image analysis, the scarcity of expertise and the high cost of data annotation limit the development of large artificial intelligence models. This paper investigates the potential of transfer learning with pre-trained vision-language models (VLMs) in this domain. Currently, VLMs still struggle to transfer to underrepresented diseases with minimal presence and new diseases entirely absent from the pretraining dataset. We argue that effective adaptation of VLMs hinges on the nuanced representation learning of disease concepts. By capitalizing on the joint visual-linguistic capabilities of VLMs, we introduce disease-informed contextual prompting in a novel disease prototype learning framework. This approach enables VLMs to grasp the concepts of new diseases effectively and efficiently, even with limited data. Extensive experiments across multiple image modalities showcase notable enhancements in performance compared to existing techniques.

Submitted 24 May, 2024; originally announced May 2024.

Comments: Early Accepted by MICCAI 2024

arXiv:2405.14643 [pdf, other]

Circuit realization of topological physics

Authors: Huanhuan Yang, Lingling Song, Yunshan Cao, Peng Yan

Abstract: Recently, topolectrical circuits (TECs) have boomed as a platform for studying the topological states of matter. The resemblance between circuit Laplacians and tight-binding models in condensed matter physics allows for the exploration of exotic topological phases on the circuit platform. In this review, we begin by presenting the basic equations for the circuit elements and units, along with the fundamentals and experimental methods for TECs. Subsequently, we review the main literature in this field, encompassing the circuit realization of (higher-order) topological insulators and semimetals. Due to the abundant electrical elements and flexible connections, many unconventional topological states, like the non-Hermitian, nonlinear, non-Abelian, non-periodic, non-Euclidean, and higher-dimensional topological states that are challenging to observe in conventional condensed matter physics, have been observed in circuits and are summarized in this review. Furthermore, we show the capability of electrical circuits for exploring the physical phenomena in other systems, such as photonic and magnetic ones. Importantly, we highlight that TEC systems are convenient for manufacture and miniaturization because of their compatibility with traditional integrated circuits. Finally, we discuss future directions in this exciting field, and connect the emerging TECs with the development of topology physics, (meta)material designs, and device applications.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2405.13467 [pdf, other]

AdaFedFR: Federated Face Recognition with Adaptive Inter-Class Representation Learning

Authors: Di Qiu, Xinyang Lin, Kaiye Wang, Xiangxiang Chu, Pengfei Yan

Abstract: With the growing attention on data privacy and communication security in face recognition applications, federated learning has been introduced to learn a face recognition model with decentralized datasets in a privacy-preserving manner. However, existing works still face challenges such as unsatisfactory performance and additional communication costs, limiting their applicability in real-world scenarios. In this paper, we propose a simple yet effective federated face recognition framework called AdaFedFR, by devising an adaptive inter-class representation learning algorithm to enhance the generalization of the generic face model and the efficiency of federated training under strict privacy-preservation. In particular, our work delicately utilizes feature representations of public identities as learnable negative knowledge to optimize the local objective within the feature space, which further encourages the local model to learn powerful representations and optimize personalized models for clients. Experimental results demonstrate that our method outperforms previous approaches on several prevalent face recognition benchmarks within less than 3 communication rounds, showing that it is communication-friendly and highly efficient.

Submitted 22 May, 2024; originally announced May 2024.

arXiv:2405.11344 [pdf]

LiPost: Improved Content Understanding With Effective Use of Multi-task Contrastive Learning

Authors: Akanksha Bindal, Sudarshan Ramanujam, Dave Golland, TJ Hazen, Tina Jiang, Fengyu Zhang, Peng Yan

Abstract: In enhancing LinkedIn core content recommendation models, a significant challenge lies in improving their semantic understanding capabilities. This paper addresses the problem by leveraging multi-task learning, a method that has shown promise in various domains. We fine-tune a pre-trained, transformer-based LLM using multi-task contrastive learning with data from a diverse set of semantic labeling tasks. We observe positive transfer, leading to superior performance across all tasks when compared to training independently on each. Our model outperforms the baseline on zero-shot learning and offers improved multilingual support, highlighting its potential for broader application. The specialized content embeddings produced by our model outperform generalized embeddings offered by OpenAI on LinkedIn datasets and tasks. This work provides a robust foundation for vertical teams across LinkedIn to customize and fine-tune the LLM to their specific applications. Our work offers insights and best practices for the field to build on.

Submitted 13 July, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

arXiv:2405.01962 [pdf]

Optical skyrmions from metafibers

Authors: Tiantian He, Yuan Meng, Lele Wang, Hongkun Zhong, Nilo Mata-Cervera, Dan Li, Ping Yan, Qiang Liu, Yijie Shen, Qirong Xiao

Abstract: Optical skyrmions are an emerging class of structured light with sophisticated particle-like topologies with great potential for revolutionizing modern informatics. However, the current generation of optical skyrmions involves complex or bulky systems, hindering the development of practical applications. Here, exploiting the emergent "lab-on-fiber" technology, we demonstrate the design of a metafiber-integrated photonic skyrmion generator. We not only successfully generated high-quality optical skyrmions from metafibers, but also experimentally verified their remarkable properties, such as regulability and topological stability with deep-subwavelength features beyond the diffraction limits. Our flexible and fiber-integrated optical skyrmion platform paves the way for future applications of topologically-enhanced remote super-resolution microscopy and super-robust information transfer.

Submitted 3 May, 2024; originally announced May 2024.

arXiv:2404.17743 [pdf, ps, other]

Fourier Coefficients and Algebraic Cusp Forms on $\mathrm{U}(2,n)$

Authors: Anton Hilado, Finn McGlade, Pan Yan

Abstract: We establish a theory of scalar Fourier coefficients for a class of non-holomorphic, automorphic forms on the quaternionic real Lie group $\mathrm{U}(2,n)$. By studying the theta lifts of holomorphic modular forms from $\mathrm{U}(1,1)$, we apply this theory to obtain examples of non-holomorphic cusp forms on $\mathrm{U}(2,n)$ whose Fourier coefficients are algebraic numbers.

Submitted 26 April, 2024; originally announced April 2024.

Comments: 27 pages, comments welcome

MSC Class: 11F30

arXiv:2404.09235 [pdf, other]

PDRs4All IX. Sulfur elemental abundance in the Orion Bar

Authors: Asunción Fuente, Evelyne Roueff, Franck Le Petit, Jacques Le Bourlot, Emeric Bron, Mark G. Wolfire, James F. Babb, Pei-Gen Yan, Takashi Onaka, John H. Black, Ilane Schroetter, Dries Van De Putte, Ameek Sidhu, Amélie Canin, Boris Trahin, Felipe Alarcón, Ryan Chown, Olga Kannavou, Olivier Berné, Emilie Habart, Els Peeters, Javier R. Goicoechea, Marion Zannese, Raphael Meshaka, Yoko Okada, et al. (9 additional authors not shown)

Abstract: One of the main problems in astrochemistry is determining the amount of sulfur in volatiles and refractories in the interstellar medium. The detection of the main sulfur reservoirs (icy H$_2$S and atomic gas) has been challenging, and estimates are based on the reliability of models to account for the abundances of species containing less than 1% of the total sulfur. The high sensitivity of the James Webb Space Telescope provides an unprecedented opportunity to estimate the sulfur abundance through the observation of the [S I] 25.249 $\mu$m line. We used the [S III] 18.7 $\mu$m, [S IV] 10.5 $\mu$m, and [S I] 25.249 $\mu$m lines to estimate the amount of sulfur in the ionized and molecular gas along the Orion Bar. For the theoretical part, we used an upgraded version of the Meudon photodissociation region (PDR) code to model the observations. New inelastic collision rates of neutral atomic sulfur with ortho- and para- molecular hydrogen were calculated to predict the line intensities. The [S III] 18.7 $\mu$m and [S IV] 10.5 $\mu$m lines are detected over the imaged region with a shallow increase (by a factor of 4) toward the HII region. We estimate a moderate sulfur depletion, by a factor of $\sim$2, in the ionized gas. The corrugated interface between the molecular and atomic phases gives rise to several edge-on dissociation fronts we refer to as DF1, DF2, and DF3. The [S I] 25.249 $\mu$m line is only detected toward DF2 and DF3, the dissociation fronts located farthest from the HII region. The detailed modeling of DF3 using the Meudon PDR code shows that the emission of the [S I] 25.249 $\mu$m line is coming from warm ($>$ 40 K) molecular gas located at A$_{\rm V}$ $\sim$ 1$-$5 mag from the ionization front. Moreover, the intensity of the [S I] 25.249 $\mu$m line is only accounted for if we assume the presence of undepleted sulfur.

Submitted 4 June, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

Comments: 16 pages, 6 figures. Accepted for publication in Astronomy and Astrophysics

arXiv:2404.08450 [pdf, other]

Joint Physical-Digital Facial Attack Detection Via Simulating Spoofing Clues

Authors: Xianhua He, Dashuang Liang, Song Yang, Zhanlong Hao, Hui Ma, Binjie Mao, Xi Li, Yao Wang, Pengfei Yan, Ajian Liu

Abstract: Face recognition systems are frequently subjected to a variety of physical and digital attacks of different types. Previous methods have achieved satisfactory performance in scenarios that address physical attacks and digital attacks, respectively. However, few methods are considered to integrate a model that simultaneously addresses both physical and digital attacks, implying the necessity to develop and maintain multiple models. To jointly detect physical and digital attacks within a single model, we propose an innovative approach that can adapt to any network architecture. Our approach mainly contains two types of data augmentation, which we call Simulated Physical Spoofing Clues augmentation (SPSC) and Simulated Digital Spoofing Clues augmentation (SDSC). SPSC and SDSC augment live samples into simulated attack samples by simulating spoofing clues of physical and digital attacks, respectively, which significantly improve the capability of the model to detect "unseen" attack types. Extensive experiments show that SPSC and SDSC can achieve state-of-the-art generalization in Protocols 2.1 and 2.2 of the UniAttackData dataset, respectively. Our method won first place in "Unified Physical-Digital Face Attack Detection" of the 5th Face Anti-spoofing Challenge@CVPR2024. Our final submission obtains 3.75% APCER, 0.93% BPCER, and 2.34% ACER, respectively. Our code is available at https://github.com/Xianhua-He/cvpr2024-face-anti-spoofing-challenge.

Submitted 12 April, 2024; originally announced April 2024.

Comments: 10 pages with 6 figures, Accepted by CVPRW 2024

arXiv:2404.08361 [pdf, other]

Large-Scale Multi-Domain Recommendation: an Automatic Domain Feature Extraction and Personalized Integration Framework

Authors: Dongbo Xi, Zhen Chen, Yuexian Wang, He Cui, Chong Peng, Fuzhen Zhuang, Peng Yan

Abstract: Feed recommendation is currently the mainstream mode for many real-world applications (e.g., TikTok, Dianping); it is usually necessary to model and predict user interests in multiple scenarios (domains) within and even outside the application. Multi-domain learning is a typical solution in this regard. While considerable efforts have been made in this regard, there are still two long-standing challenges: (1) Accurately depicting the differences among domains using domain features is crucial for enhancing the performance of each domain. However, manually designing domain features and models for numerous domains can be a laborious task. (2) Users typically have limited impressions in only a few domains. Extracting features automatically from other domains and leveraging them to improve the predictive capabilities of each domain has consistently posed a challenging problem. In this paper, we propose an Automatic Domain Feature Extraction and Personalized Integration (DFEI) framework for the large-scale multi-domain recommendation. The framework automatically transforms the behavior of each individual user into an aggregation of all user behaviors within the domain, which serves as the domain features. Unlike offline feature engineering methods, the extracted domain features are higher-order representations and directly related to the target label. Besides, by personalized integration of domain features from other domains for each user and the innovation in the training mode, the DFEI framework can yield more accurate conversion identification. Experimental results on both public and industrial datasets, consisting of over 20 domains, clearly demonstrate that the proposed framework achieves significantly better performance compared with SOTA baselines. Furthermore, we have released the source code of the proposed framework at https://github.com/xidongbo/DFEI.

Submitted 14 April, 2024; v1 submitted 12 April, 2024; originally announced April 2024.

Comments: 8 pages

arXiv:2404.03181 [pdf, other]

MonoCD: Monocular 3D Object Detection with Complementary Depths

Authors: Longfei Yan, Pei Yan, Shengzhou Xiong, Xuanyu Xiang, Yihua Tan

Abstract: Monocular 3D object detection has attracted widespread attention due to its potential to accurately obtain object 3D localization from a single image at a low cost. Depth estimation is an essential but challenging subtask of monocular 3D object detection due to the ill-posedness of 2D to 3D mapping. Many methods explore multiple local depth clues such as object heights and keypoints and then formulate the object depth estimation as an ensemble of multiple depth predictions to mitigate the insufficiency of single-depth information. However, the errors of existing multiple depths tend to have the same sign, which hinders them from neutralizing each other and limits the overall accuracy of combined depth. To alleviate this problem, we propose to increase the complementarity of depths with two novel designs. First, we add a new depth prediction branch named complementary depth that utilizes global and efficient depth clues from the entire image rather than the local clues to reduce the correlation of depth predictions. Second, we propose to fully exploit the geometric relations between multiple depth clues to achieve complementarity in form. Benefiting from these designs, our method achieves higher complementarity. Experiments on the KITTI benchmark demonstrate that our method achieves state-of-the-art performance without introducing extra data. In addition, complementary depth can also be a lightweight and plug-and-play module to boost multiple existing monocular 3D object detectors. Code is available at https://github.com/elvintanhust/MonoCD.

Submitted 3 April, 2024; originally announced April 2024.

Comments: Accepted to CVPR 2024

arXiv:2404.02655 [pdf, other]

Calibrating the Confidence of Large Language Models by Eliciting Fidelity

Authors: Mozhi Zhang, Mianqiu Huang, Rundong Shi, Linsen Guo, Chong Peng, Peng Yan, Yaqian Zhou, Xipeng Qiu

Abstract: Large language models optimized with techniques like RLHF have achieved good alignment in being helpful and harmless. However, post-alignment, these language models often exhibit overconfidence, where the expressed confidence does not accurately calibrate with their correctness rate. In this paper, we decompose the language model confidence into the \textit{Uncertainty} about the question and the \textit{Fidelity} to the answer generated by language models. Then, we propose a plug-and-play method to estimate the confidence of language models. Our method has shown good calibration performance by conducting experiments with 6 RLHF-LMs on four MCQA datasets. Moreover, we propose two novel metrics, IPR and CE, to evaluate the calibration of the model, and we have conducted a detailed discussion on \textit{Truly Well-Calibrated Confidence}. Our method could serve as a strong baseline, and we hope that this work will provide some insights into the model confidence calibration.

Submitted 3 April, 2024; originally announced April 2024.

Comments: 17 pages, 13 figures

arXiv:2404.00561 [pdf, ps, other]

Epsilon dichotomy for twisted linear models

Authors: Hang Xue, Pan Yan

Abstract: Let $E/F$ be a quadratic extension of local nonarchimedean fields of characteristic zero and let $D$ be a quaternion algebra over $F$ containing $E$. In this paper, we study a relation between the existence of twisted linear models on $\mathrm{GL}_n(D)$ and the local root numbers.

Submitted 31 March, 2024; originally announced April 2024.

Comments: 38 pages

MSC Class: 11F70; 22E50

arXiv:2403.19499 [pdf, other]

Client-supervised Federated Learning: Towards One-model-for-all Personalization

Authors: Peng Yan, Guodong Long

Abstract: Personalized Federated Learning (PerFL) is a new machine learning paradigm that delivers personalized models for diverse clients under federated learning settings. Most PerFL methods require extra learning processes on a client to adapt a globally shared model to the client-specific personalized model using its own local data. However, the model adaptation process in PerFL is still an open challenge in the stage of model deployment and test time. This work tackles the challenge by proposing a novel federated learning framework to learn only one robust global model to achieve competitive performance to those personalized models on unseen/test clients in the FL system. Specifically, we design a new Client-Supervised Federated Learning (FedCS) to unravel clients' bias on instances' latent representations so that the global model can learn both client-specific and client-agnostic knowledge. Experimental study shows that the FedCS can learn a robust FL global model for the changing data distributions of unseen/test clients. The FedCS's global model can be directly deployed to the test clients while achieving comparable performance to other personalized FL methods that require model adaptation.

Submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.18154 [pdf, ps, other]

Cohomology classes, periods, and special values of Rankin-Selberg $L$-functions

Authors: Yubo Jin, Pan Yan

Abstract: In this article, we give a cohomological interpretation of (a special case of) the integrals constructed by the second named author and Q. Zhang \cite{YanZhang2023} which represent the product of Rankin-Selberg $L$-functions of $\mathrm{GL}_n\times\mathrm{GL}_m$ and $\mathrm{GL}_n\times\mathrm{GL}_{n-m-1}$ for $m<n$. As an application, we prove an algebraicity result for the special values of certain $L$-functions. This work is a generalization of the algebraicity result of Raghuram for $\mathrm{GL}_n\times\mathrm{GL}_{n-1}$ \cite{Raghuram2010} in the special case $m=n-1$, and the results of Mahnkopf \cite{Mahnkopf1998, Mahnkopf2005} in the special case $m=n-2$.

Submitted 26 March, 2024; originally announced March 2024.

Comments: 21 pages

MSC Class: 11F67; 11F70; 11F75; 22E55

arXiv:2403.00274 [pdf, other]

CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation

Authors: Xi Liu, Ying Guo, Cheng Zhen, Tong Li, Yingying Ao, Pengfei Yan

Abstract: Listening head generation aims to synthesize a non-verbal responsive listener head by modeling the correlation between the speaker and the listener in dynamic conversation. The applications of listener agent generation in virtual interaction have promoted many works achieving the diverse and fine-grained motion generation. However, they can only manipulate motions through simple emotional labels, but cannot freely control the listener's motions. Since listener agents should have human-like attributes (e.g. identity, personality) which can be freely customized by users, this limits their realism. In this paper, we propose a user-friendly framework called CustomListener to realize the free-form text prior guided listener generation. To achieve speaker-listener coordination, we design a Static to Dynamic Portrait module (SDP), which interacts with speaker information to transform static text into dynamic portrait token with completion rhythm and amplitude information. To achieve coherence between segments, we design a Past Guided Generation Module (PGG) to maintain the consistency of customized listener attributes through the motion prior, and utilize a diffusion-based structure conditioned on the portrait token and the motion prior to realize the controllable generation. To train and evaluate our model, we have constructed two text-annotated listening head datasets based on ViCo and RealTalk, which provide text-video paired labels. Extensive experiments have verified the effectiveness of our model.

Submitted 29 March, 2024; v1 submitted 29 February, 2024; originally announced March 2024.

Comments: Accepted by CVPR 2024

arXiv:2403.00209 [pdf, other]

ChartReformer: Natural Language-Driven Chart Image Editing

Authors: Pengyu Yan, Mahesh Bhosale, Jay Lal, Bikhyat Adhikari, David Doermann

Abstract: Chart visualizations are essential for data interpretation and communication; however, most charts are only accessible in image format and lack the corresponding data tables and supplementary information, making it difficult to alter their appearance for different application scenarios. To eliminate the need for original underlying data and information to perform chart editing, we propose ChartReformer, a natural language-driven chart image editing solution that directly edits the charts from the input images with the given instruction prompts. The key in this method is that we allow the model to comprehend the chart and reason over the prompt to generate the corresponding underlying data table and visual attributes for new charts, enabling precise edits. Additionally, to generalize ChartReformer, we define and standardize various types of chart editing, covering style, layout, format, and data-centric edits. The experiments show promising results for the natural language-driven chart image editing.

Submitted 1 May, 2024; v1 submitted 29 February, 2024; originally announced March 2024.

Comments: Published in ICDAR 2024. Code and model are available at https://github.com/pengyu965/ChartReformer

arXiv:2402.15687 [pdf, other]

General Purpose Image Encoder DINOv2 for Medical Image Registration

Authors: Xinrui Song, Xuanang Xu, Pingkun Yan

Abstract: Existing medical image registration algorithms rely on either dataset specific training or local texture-based features to align images. The former cannot be reliably implemented without large modality-specific training datasets, while the latter lacks global semantics thus could be easily trapped at local minima. In this paper, we present a training-free deformable image registration method, DINO-Reg, leveraging a general purpose image encoder DINOv2 for image feature extraction. The DINOv2 encoder was trained using the ImageNet data containing natural images. We used the pretrained DINOv2 without any finetuning. Our method feeds the DINOv2 encoded features into a discrete optimizer to find the optimal deformable registration field. We conducted a series of experiments to understand the behavior and role of such a general purpose image encoder in the application of image registration. Combined with handcrafted features, our method won the first place in the recent OncoReg Challenge. To our knowledge, this is the first application of general vision foundation models in medical image registration.

Submitted 23 February, 2024; originally announced February 2024.

arXiv:2402.00137 [pdf, other]

Multimodal Neurodegenerative Disease Subtyping Explained by ChatGPT

Authors: Diego Machado Reyes, Hanqing Chao, Juergen Hahn, Li Shen, Pingkun Yan

Abstract: Alzheimer's disease (AD) is the most prevalent neurodegenerative disease; yet its currently available treatments are limited to stopping disease progression. Moreover, effectiveness of these treatments is not guaranteed due to the heterogeneity of the disease. Therefore, it is essential to be able to identify the disease subtypes at a very early stage. Current data driven approaches are able to classify the subtypes at later stages of AD or related disorders, but struggle when predicting at the asymptomatic or prodromal stage. Moreover, most existing models either lack explainability behind the classification or only use a single modality for the assessment, limiting the scope of their analysis. Thus, we propose a multimodal framework that uses early-stage indicators such as imaging, genetics and clinical assessments to classify AD patients into subtypes at early stages. Similarly, we build prompts and use large language models, such as ChatGPT, to interpret the findings of our model. In our framework, we propose a tri-modal co-attention mechanism (Tri-COAT) to explicitly learn the cross-modal feature associations. Our proposed model outperforms baseline models and provides insight into key cross-modal feature associations supported by known biological mechanisms.

Submitted 31 January, 2024; originally announced February 2024.

arXiv:2401.08407 [pdf, other]

Cross-Domain Few-Shot Segmentation via Iterative Support-Query Correspondence Mining

Authors: Jiahao Nie, Yun Xing, Gongjie Zhang, Pei Yan, Aoran Xiao, Yap-Peng Tan, Alex C. Kot, Shijian Lu

Abstract: Cross-Domain Few-Shot Segmentation (CD-FSS) poses the challenge of segmenting novel categories from a distinct domain using only limited exemplars. In this paper, we undertake a comprehensive study of CD-FSS and uncover two crucial insights: (i) the necessity of a fine-tuning stage to effectively transfer the learned meta-knowledge across domains, and (ii) the overfitting risk during the naïve fine-tuning due to the scarcity of novel category examples. With these insights, we propose a novel cross-domain fine-tuning strategy that addresses the challenging CD-FSS tasks. We first design Bi-directional Few-shot Prediction (BFP), which establishes support-query correspondence in a bi-directional manner, crafting augmented supervision to reduce the overfitting risk. Then we further extend BFP into Iterative Few-shot Adaptor (IFA), which is a recursive framework to capture the support-query correspondence iteratively, targeting maximal exploitation of supervisory signals from the sparse novel category samples. Extensive empirical evaluations show that our method significantly outperforms state-of-the-art methods (+7.8%), which verifies that IFA tackles the cross-domain challenges and mitigates the overfitting simultaneously. The code is available at: https://github.com/niejiahao1998/IFA.

Submitted 13 March, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

Comments: Accepted by CVPR 2024

arXiv:2312.12484 [pdf, other]

SkyMask: Attack-agnostic Robust Federated Learning with Fine-grained Learnable Masks

Authors: Peishen Yan, Hao Wang, Tao Song, Yang Hua, Ruhui Ma, Ningxin Hu, Mohammad R. Haghighat, Haibing Guan

Abstract: Federated Learning (FL) is becoming a popular paradigm for leveraging distributed data and preserving data privacy. However, due to the distributed characteristic, FL systems are vulnerable to Byzantine attacks, in which compromised clients attack the global model by uploading malicious model updates. With the development of layer-level and parameter-level fine-grained attacks, the attacks' stealthiness and effectiveness have been significantly improved. The existing defense mechanisms solely analyze the model-level statistics of individual model updates uploaded by clients to mitigate Byzantine attacks, which are ineffective against fine-grained attacks due to unawareness or overreaction. To address this problem, we propose SkyMask, a new attack-agnostic robust FL system that is the first to leverage fine-grained learnable masks to identify malicious model updates at the parameter level. Specifically, the FL server freezes and multiplies the model updates uploaded by clients with the parameter-level masks, and trains the masks over a small clean dataset (i.e., root dataset) to learn the subtle difference between benign and malicious model updates in a high-dimension space. Our extensive experiments involve different models on three public datasets under state-of-the-art (SOTA) attacks, where the results show that SkyMask achieves up to 14% higher testing accuracy compared with SOTA defense strategies under the same attacks and successfully defends against attacks with malicious clients of a high fraction up to 80%. Code is available at https://github.com/KoalaYan/SkyMask.

Submitted 18 July, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

Comments: Accepted by ECCV2024

arXiv:2312.12027 [pdf, other]

A continuous cold rubidium atomic beam with enhanced flux and tunable velocity

Authors: Shengzhe Wang, Zhixin Meng, Peiqiang Yan, Yuanxing Liu, Yanying Feng

Abstract: We present a cold atomic beam source based on a two-dimensional (2D)+ magneto-optical trap (MOT), capable of generating a continuous cold beam of 87Rb atoms with a flux up to 4.3*10^9 atoms/s, a mean velocity of 10.96(2.20) m/s, and a transverse temperature of 16.90(1.56) uK. Investigating the influence of high cooling laser intensity, we observe a significant population loss of atoms to hyperfine-level dark states. To account for this, we employ a multiple hyperfine level model to calculate the cooling efficiency associated with the population in dark states, subsequently modifying the scattering force. Simulations of beam flux at different cooling and repumping laser intensities using the modified scattering force are in agreement with experimental results. Optimizing repumping and cooling intensities enhances the flux by 50%. The influence of phase modulation on both the pushing and cooling lasers is experimentally studied, revealing that the mean velocity of cold atoms can be tuned from 9.5 m/s to 14.6 m/s with a phase-modulated pushing laser. The versatility of this continuous beam source, featuring high flux, controlled velocity, and narrow transverse temperature, renders it valuable for applications in atom interferometers and clocks, ultimately enhancing bandwidth, sensitivity, and signal contrast in these devices.

Submitted 19 December, 2023; originally announced December 2023.

arXiv:2312.11927 [pdf, other]

Empowering Dual-Level Graph Self-Supervised Pretraining with Motif Discovery

Authors: Pengwei Yan, Kaisong Song, Zhuoren Jiang, Yangyang Kang, Tianqianjin Lin, Changlong Sun, Xiaozhong Liu

Abstract: While self-supervised graph pretraining techniques have shown promising results in various domains, their application still experiences challenges of limited topology learning, human knowledge dependency, and incompetent multi-level interactions. To address these issues, we propose a novel solution, Dual-level Graph self-supervised Pretraining with Motif discovery (DGPM), which introduces a unique dual-level pretraining structure that orchestrates node-level and subgraph-level pretext tasks. Unlike prior approaches, DGPM autonomously uncovers significant graph motifs through an edge pooling module, aligning learned motif similarities with graph kernel-based similarities. A cross-matching task enables sophisticated node-motif interactions and novel representation learning. Extensive experiments on 15 datasets validate DGPM's effectiveness and generalizability, outperforming state-of-the-art methods in unsupervised representation learning and transfer learning settings. The autonomously discovered motifs demonstrate the potential of DGPM to enhance robustness and interpretability.

Submitted 19 December, 2023; originally announced December 2023.

Comments: 14 pages, 6 figures, accepted by AAAI'24

arXiv:2312.08317 [pdf, other]

Prompt Engineering-assisted Malware Dynamic Analysis Using GPT-4

Authors: Pei Yan, Shunquan Tan, Miaohui Wang, Jiwu Huang

Abstract: Dynamic analysis methods effectively identify shelled, wrapped, or obfuscated malware, thereby preventing them from invading computers. As a significant representation of dynamic malware behavior, the API (Application Programming Interface) sequence, comprised of consecutive API calls, has progressively become the dominant feature of dynamic analysis methods. Though there have been numerous deep learning models for malware detection based on API sequences, the quality of API call representations produced by those models is limited. These models cannot generate representations for unknown API calls, which weakens both the detection performance and the generalization. Further, the concept drift phenomenon of API calls is prominent. To tackle these issues, we introduce a prompt engineering-assisted malware dynamic analysis using GPT-4. In this method, GPT-4 is employed to create explanatory text for each API call within the API sequence. Afterward, the pre-trained language model BERT is used to obtain the representation of the text, from which we derive the representation of the API sequence. Theoretically, this proposed method is capable of generating representations for all API calls, excluding the necessity for dataset training during the generation process. Utilizing the representation, a CNN-based detection model is designed to extract the feature. We adopt five benchmark datasets to validate the performance of the proposed model. The experimental results reveal that the proposed detection algorithm performs better than the state-of-the-art method (TextCNN). Specifically, in cross-database experiments and few-shot learning experiments, the proposed model achieves excellent detection performance and almost a 100% recall rate for malware, verifying its superior generalization performance. The code is available at: github.com/yan-scnu/Prompted_Dynamic_Detection.

Submitted 13 December, 2023; originally announced December 2023.

arXiv:2312.06462 [pdf, other]

Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation

Authors: Qi Yang, Xing Nie, Tong Li, Pengfei Gao, Ying Guo, Cheng Zhen, Pengfei Yan, Shiming Xiang

Abstract: Recently, an audio-visual segmentation (AVS) task has been introduced, aiming to group pixels with sounding objects within a given video. This task necessitates a first-ever audio-driven pixel-level understanding of the scene, posing significant challenges. In this paper, we propose an innovative audio-visual transformer framework, termed COMBO, an acronym for COoperation of Multi-order Bilateral relatiOns. For the first time, our framework explores three types of bilateral entanglements within AVS: pixel entanglement, modality entanglement, and temporal entanglement. Regarding pixel entanglement, we employ a Siam-Encoder Module (SEM) that leverages prior knowledge to generate more precise visual features from the foundational model. For modality entanglement, we design a Bilateral-Fusion Module (BFM), enabling COMBO to align corresponding visual and auditory signals bi-directionally. As for temporal entanglement, we introduce an innovative adaptive inter-frame consistency loss according to the inherent rules of temporal. Comprehensive experiments and ablation studies on AVSBench-object (84.7 mIoU on S4, 59.2 mIoU on MS3) and AVSBench-semantic (42.1 mIoU on AVSS) datasets demonstrate that COMBO surpasses previous state-of-the-art methods. Code and more results will be publicly available at https://yannqi.github.io/AVS-COMBO/.

Submitted 7 April, 2024; v1 submitted 11 December, 2023; originally announced December 2023.

Comments: CVPR 2024 Highlight. 13 pages, 10 figures

arXiv:2311.16557 [pdf, other]

A Continuous Dual-Axis Atomic Interferometric Inertial Sensor

Authors: Pei-Qiang Yan, Wei-Chen Jia, Sheng-Zhe Wang, Yan-Ying Feng

Abstract: We present an interferometric inertial sensor that utilizes two counter-propagating atomic beams with transverse two-dimensional cooling. By employing three parallel and spatially aligned Raman laser beams for Doppler-sensitive Raman transitions, we successfully generate inertia-sensitive Mach-Zehnder interference fringes with an interrogation length of $2L=54\,\mathrm{cm}$. The measured rotation and acceleration sensitivities are $0.25\,(\mu\mathrm{rad/s})/\sqrt{\mathrm{Hz}}$ and $0.12\,\mathrm{m}g/\sqrt{\mathrm{Hz}}$, respectively. The sensor's capability to measure rotation and acceleration simultaneously in dynamic environments is validated through comparative analysis with classical sensors under force oscillation in different directions. Additionally, we conduct experiments on a turntable to calibrate the gyroscope's scaling factor and address nonlinearity.

Submitted 28 November, 2023; originally announced November 2023.

Comments: 8 pages, 4 figures

arXiv:2311.03679 [pdf, other]

Unsupervised convolutional neural network fusion approach for change detection in remote sensing images

Authors: Weidong Yan, Pei Yan, Li Cao

Abstract: With the rapid development of deep learning, a variety of change detection methods based on deep learning have emerged in recent years. However, these methods usually require a large number of training samples to train the network model, which makes them very expensive. In this paper, we introduce a completely unsupervised shallow convolutional neural network (USCNN) fusion approach for change detection. Firstly, the bi-temporal images are transformed into different feature spaces by using convolution kernels of different sizes to extract multi-scale information of the images. Secondly, the output features of bi-temporal images at the same convolution kernels are subtracted to obtain the corresponding difference images, and the difference feature images at the same scale are fused into one feature image by using a 1 × 1 convolution layer. Finally, the output features of different scales are concatenated and a 1 × 1 convolution layer is used to fuse the multi-scale information of the image. The model parameters are obtained by a redesigned sparse function. Our model has three features: the entire training process is conducted in an unsupervised manner, the network architecture is shallow, and the objective function is sparse. Thus, it can be seen as a kind of lightweight network model. Experimental results on four real remote sensing datasets indicate the feasibility and effectiveness of the proposed approach.

Submitted 6 November, 2023; originally announced November 2023.

arXiv:2311.00353 [pdf, other]

LatentWarp: Consistent Diffusion Latents for Zero-Shot Video-to-Video Translation

Authors: Yuxiang Bao, Di Qiu, Guoliang Kang, Baochang Zhang, Bo Jin, Kaiye Wang, Pengfei Yan

Abstract: Leveraging the generative ability of image diffusion models offers great potential for zero-shot video-to-video translation. The key lies in how to maintain temporal consistency across video frames generated by image diffusion models. Previous methods typically adopt cross-frame attention, i.e., sharing the key and value tokens across attentions of different frames, to encourage temporal consistency. However, in those works, the temporal inconsistency issue may not be thoroughly solved, limiting the fidelity of generated videos. The current state-of-the-art cross-frame attention method aims at maintaining fine-grained visual details across frames, but it is still challenged by the temporal coherence problem. In this paper, we find the bottleneck lies in the unconstrained query tokens and propose a new zero-shot video-to-video translation framework, named LatentWarp. Our approach is simple: to constrain the query tokens to be temporally consistent, we further incorporate a warping operation in the latent space. Specifically, based on the optical flow obtained from the original video, we warp the generated latent features of the last frame to align with the current frame during the denoising process. As a result, the corresponding regions across adjacent frames can share closely related query tokens and attention outputs, which can further improve latent-level consistency to enhance the visual temporal coherence of generated videos. Extensive experimental results demonstrate the superiority of LatentWarp in achieving video-to-video translation with temporal coherence.

Submitted 1 November, 2023; originally announced November 2023.

arXiv:2311.00266 [pdf, other]

Constructing the Fulde-Ferrell-Larkin-Ovchinnikov state in antiferromagnetic insulator CrOCl

Authors: Yifan Ding, Jiadian He, Shihao Zhang, Huakun Zuo, Pingfan Gu, Jiliang Cai, Xiaohui Zeng, Pu Yan, Kecheng Cao, Kenji Watanabe, Takashi Taniguchi, Peng Dong, Yiwen Zhang, Yueshen Wu, Xiang Zhou, Jinghui Wang, Yulin Chen, Yu Ye, Jianpeng Liu, Jun Li

Abstract: Time reversal symmetry breaking in superconductors, resulting from external magnetic fields or spontaneous magnetization, often leads to unconventional superconducting properties. In this way, a conventional Fulde-Ferrell-Larkin-Ovchinnikov (FFLO) state, characterized by Cooper pairs with nonzero total momentum, may be realized by the Zeeman effect caused by external magnetic fields. Here, we report the observation of superconductivity in a few-layer antiferromagnetic insulator CrOCl by utilizing the superconducting proximity effect with NbSe2 flakes. The superconductivity demonstrates a considerably weak gap of about 0.12 meV, and the in-plane upper critical field exhibits the behavior of the FFLO state at low temperature. Our first-principles calculations indicate that the proximitized superconductivity may exist in the CrOCl layer with Cr vacancies or line-defects. Moreover, the FFLO state could be induced by the inherent larger spin splitting in the CrOCl layer. Our findings not only demonstrate the fascinating interaction between superconductivity and magnetism, but also provide a possible path to construct the FFLO state by intrinsic time reversal symmetry breaking and the superconducting proximity effect.

Submitted 31 October, 2023; originally announced November 2023.

arXiv:2310.17144 [pdf, ps, other]

Some remarks on strong multiplicity one for paramodular forms

Authors: Xiyuan Wang, Zhining Wei, Pan Yan, Shaoyun Yi

Abstract: We establish several refined strong multiplicity one results for paramodular cusp forms by using the spinor and standard $L$-functions, combining methods from both the automorphic side and the Galois side.

Submitted 26 October, 2023; originally announced October 2023.

Comments: 28 pages

MSC Class: Primary 11F46; 11F60; 11F66; Secondary 11F30; 11F70; 11F80

arXiv:2309.15699 [pdf, other]

STRAW: Structure-Adaptive Weighting Procedure for Large-Scale Spatial Multiple Testing

Authors: Pengfei Wang, Pengyu Yan, Canhui Li

Abstract: The problem of large-scale spatial multiple testing is often encountered in various scientific research fields, where the signals are usually enriched on some regions while sparse on others. To integrate spatial structure information from nearby locations, we propose a novel approach, called STRucture-Adaptive Weighting (STRAW) procedure, for large-scale spatial multiple testing. The STRAW procedure is capable of handling a broad range of spatial settings by leveraging a class of weighted p-values and is fully data-driven. Theoretical results show that the proposed method controls the false discovery rate (FDR) at the pre-specified level under some mild conditions. In practice, the local sparsity level, defined as the probability of the null hypothesis being not true, is commonly unknown. To address this issue, we develop a new method for estimating the local sparsity level by employing the kernel-smooth local false discovery rate (Lfdr) statistic. The superior numerical performance of the STRAW procedure is demonstrated by performing extensive simulation studies and a real data analysis.

Submitted 27 September, 2023; originally announced September 2023.

arXiv:2309.10445 [pdf, other]

Product of Rankin-Selberg convolutions and a new proof of Jacquet's local converse conjecture

Authors: Pan Yan, Qing Zhang

Abstract: In this article, we construct a family of integrals which represent the product of Rankin-Selberg $L$-functions of $\mathrm{GL}_{l}\times \mathrm{GL}_m$ and of $\mathrm{GL}_{l}\times \mathrm{GL}_n $ when $m+n<l$. When $n=0$, these integrals are those defined by Jacquet--Piatetski-Shapiro--Shalika up to a shift. In this sense, these new integrals generalize Jacquet--Piatetski-Shapiro--Shalika's Rankin-Selberg convolution integrals. We study basic properties of these integrals. In particular, we define local gamma factors using this new family of integrals. As an application, we obtain a new proof of Jacquet's local converse conjecture using these new integrals.

Submitted 19 September, 2023; originally announced September 2023.

MSC Class: 11F70; 22E50

arXiv:2309.09475 [pdf, other]

Terahertz magnon frequency comb

Authors: Xianglong Yao, Zhejunyu Jin, Zhenyu Wang, Zhaozhuo Zeng, Peng Yan

Abstract: The magnon frequency comb (MFC), a spin-wave spectrum composed of equidistant coherent peaks, is attracting much attention in magnonics. A terahertz (THz) MFC, combining the advantages of the THz and MFC technologies, is highly desired because it would significantly advance the MFC applications in ultrafast magnonic metrology, sensing, and communications. Here, we show that the THz MFC can be generated by nonlinear interactions between spin waves and skyrmions in antiferromagnets [Z. Jin et al., arXiv:2301.03211 (https://doi.org/10.48550/arXiv.2301.03211)]. It is found that the strength of the three-wave mixing between propagating magnons and breathing skyrmions follows a linear dependence on the driving frequency and the MFC signal can be observed over a broad driving frequency range. Our results extend the working frequency of MFC to the THz regime, which would have potential applications in ultrafast spintronic devices and promote the development of nonlinear magnonics in antiferromagnets.

Submitted 18 September, 2023; originally announced September 2023.

Comments: 6 pages, 6 figures

arXiv:2309.01207 [pdf, other]

Spectral Adversarial MixUp for Few-Shot Unsupervised Domain Adaptation

Authors: Jiajin Zhang, Hanqing Chao, Amit Dhurandhar, Pin-Yu Chen, Ali Tajer, Yangyang Xu, Pingkun Yan

Abstract: Domain shift is a common problem in clinical applications, where the training images (source domain) and the test images (target domain) are under different distributions. Unsupervised Domain Adaptation (UDA) techniques have been proposed to adapt models trained in the source domain to the target domain. However, those methods require a large number of images from the target domain for model training. In this paper, we propose a novel method for Few-Shot Unsupervised Domain Adaptation (FSUDA), where only a limited number of unlabeled target domain samples are available for training. To accomplish this challenging task, first, a spectral sensitivity map is introduced to characterize the generalization weaknesses of models in the frequency domain. We then developed a Sensitivity-guided Spectral Adversarial MixUp (SAMix) method to generate target-style images to effectively suppress the model sensitivity, which leads to improved model generalizability in the target domain. We demonstrated the proposed method and rigorously evaluated its performance on multiple tasks using several public datasets.

Submitted 3 September, 2023; originally announced September 2023.

Comments: Accepted by MICCAI 2023

arXiv:2308.01971 [pdf, other]

SpaDen : Sparse and Dense Keypoint Estimation for Real-World Chart Understanding

Authors: Saleem Ahmed, Pengyu Yan, David Doermann, Srirangaraj Setlur, Venu Govindaraju

Abstract: We introduce a novel bottom-up approach for the extraction of chart data. Our model utilizes images of charts as inputs and learns to detect keypoints (KP), which are used to reconstruct the components within the plot area. Our novelty lies in detecting a fusion of continuous and discrete KP as predicted heatmaps. A combination of sparse and dense per-pixel objectives coupled with a uni-modal self-attention-based feature-fusion layer is applied to learn KP embeddings. Further leveraging deep metric learning for unsupervised clustering allows us to segment the chart plot area into various objects. By further matching the chart components to the legend, we are able to obtain the data series names. A post-processing threshold is applied to the KP embeddings to refine the object reconstructions and improve accuracy. Our extensive experiments include an evaluation of different modules for KP estimation and the combination of deep layer aggregation and corner pooling approaches. The results of our experiments provide extensive evaluation for the task of real-world chart data extraction.

Submitted 3 August, 2023; originally announced August 2023.

Comments: Accepted ORAL at ICDAR 23

arXiv:2307.16748 [pdf, ps, other]

The ring-shaped shadow of rotating naked singularity with a complete photon sphere

Authors: Mingzhi Wang, Guanghai Guo, Pengfei Yan, Songbai Chen, Jiliang Jing

Abstract: We investigate the shadows of the Konoplya-Zhidenko naked singularity. In the spacetime of the Konoplya-Zhidenko naked singularity, not only can an unstable retrograde light ring (LR) exist, but also an unstable prograde LR, leading to the formation of a complete photon sphere (PS). Due to the absence of an event horizon, a dark disc-shaped shadow does not appear; instead, a ring-shaped shadow is observed. The ring-shaped shadow appears as an infinite number of relativistic Einstein rings in the image of the naked singularity. For some parameter values, only the unstable retrograde LR exists, resulting in an incomplete unstable PS and consequently giving rise to the arc-shaped shadow for the Konoplya-Zhidenko naked singularity. The shadow of the Konoplya-Zhidenko naked singularity gradually shifts to the right as the rotation parameter $a$ increases, and gradually becomes smaller as the deformation parameter $|η|$ increases. Moreover, stable LRs and stable photon spherical orbits can also exist in the Konoplya-Zhidenko naked singularity spacetime, but they have no effect on the image of the naked singularity. This study demonstrates that a rotating naked singularity can exhibit not only an arc-shaped shadow but also a ring-shaped shadow.

Submitted 5 June, 2024; v1 submitted 31 July, 2023; originally announced July 2023.

Comments: 14 pages, 11 figures. It is to be published in Chinese Physics C

arXiv:2307.14634 [pdf, other]

Fact-Checking of AI-Generated Reports

Authors: Razi Mahmood, Ge Wang, Mannudeep Kalra, Pingkun Yan

Abstract: With advances in generative artificial intelligence (AI), it is now possible to produce realistic-looking automated reports for preliminary reads of radiology images. This can expedite clinical workflows, improve accuracy and reduce overall costs. However, it is also well-known that such models often hallucinate, leading to false findings in the generated reports. In this paper, we propose a new method of fact-checking of AI-generated reports using their associated images. Specifically, the developed examiner differentiates real and fake sentences in reports by learning the association between an image and sentences describing real or potentially fake findings. To train such an examiner, we first created a new dataset of fake reports by perturbing the findings in the original ground truth radiology reports associated with images. Text encodings of real and fake sentences drawn from these reports are then paired with image encodings to learn the mapping to real/fake labels. The utility of such an examiner is demonstrated for verifying automatically generated reports by detecting and removing fake sentences. Future generative AI approaches can use the resulting tool to validate their reports leading to a more responsible use of AI in expediting clinical workflows.

Submitted 27 July, 2023; originally announced July 2023.

Comments: 10 pages, 3 figures, 3 tables

arXiv:2307.14039 [pdf, other]

Controllable Guide-Space for Generalizable Face Forgery Detection

Authors: Ying Guo, Cheng Zhen, Pengfei Yan

Abstract: Recent studies on face forgery detection have shown satisfactory performance for methods involved in training datasets, but are not ideal enough for unknown domains. This motivates many works to improve the generalization, but forgery-irrelevant information, such as image background and identity, still exists in different domain features and causes unexpected clustering, limiting the generalization. In this paper, we propose a controllable guide-space (GS) method to enhance the discrimination of different forgery domains, so as to increase the forgery relevance of features and thereby improve the generalization. The well-designed guide-space can simultaneously achieve both the proper separation of forgery domains and the large distance between real-forgery domains in an explicit and controllable manner. Moreover, for better discrimination, we use a decoupling module to weaken the interference of forgery-irrelevant correlations between domains. Furthermore, we make adjustments to the decision boundary manifold according to the clustering degree of the same domain features within the neighborhood. Extensive experiments in multiple in-domain and cross-domain settings confirm that our method can achieve state-of-the-art generalization.

Submitted 26 July, 2023; originally announced July 2023.

Comments: Accepted by ICCV 2023

arXiv:2307.13693 [pdf, other]

Evaluating Large Language Models for Radiology Natural Language Processing

Authors: Zhengliang Liu, Tianyang Zhong, Yiwei Li, Yutong Zhang, Yi Pan, Zihao Zhao, Peixin Dong, Chao Cao, Yuxiao Liu, Peng Shu, Yaonai Wei, Zihao Wu, Chong Ma, Jiaqi Wang, Sheng Wang, Mengyue Zhou, Zuowei Jiang, Chunlin Li, Jason Holmes, Shaochen Xu, Lu Zhang, Haixing Dai, Kai Zhang, Lin Zhao, Yuanhao Chen, et al. (20 additional authors not shown)

Abstract: The rise of large language models (LLMs) has marked a pivotal shift in the field of natural language processing (NLP). LLMs have revolutionized a multitude of domains, and they have made a significant impact in the medical field. Large language models are now more abundant than ever, and many of these models exhibit bilingual capabilities, proficient in both English and Chinese. However, a comprehensive evaluation of these models remains to be conducted. This lack of assessment is especially apparent within the context of radiology NLP. This study seeks to bridge this gap by critically evaluating thirty-two LLMs in interpreting radiology reports, a crucial component of radiology NLP. Specifically, the ability to derive impressions from radiologic findings is assessed. The outcomes of this evaluation provide key insights into the performance, strengths, and weaknesses of these LLMs, informing their practical applications within the medical domain.

Submitted 27 July, 2023; v1 submitted 25 July, 2023; originally announced July 2023.

arXiv:2307.10954 [pdf, other]

Soft-tissue Driven Craniomaxillofacial Surgical Planning

Authors: Xi Fang, Daeseung Kim, Xuanang Xu, Tianshu Kuang, Nathan Lampen, Jungwook Lee, Hannah H. Deng, Jaime Gateno, Michael A. K. Liebschner, James J. Xia, Pingkun Yan

Abstract: In CMF surgery, the planning of bony movement to achieve a desired facial outcome is a challenging task. Current bone-driven approaches focus on normalizing the bone with the expectation that the facial appearance will be corrected accordingly. However, due to the complex non-linear relationship between bony structure and facial soft-tissue, such bone-driven methods are insufficient to correct facial deformities. Despite efforts to simulate facial changes resulting from bony movement, surgical planning still relies on iterative revisions and educated guesses. To address these issues, we propose a soft-tissue driven framework that can automatically create and verify surgical plans. Our framework consists of a bony planner network that estimates the bony movements required to achieve the desired facial outcome and a facial simulator network that can simulate the possible facial changes resulting from the estimated bony movement plans. By combining these two models, we can verify and determine the final bony movement required for planning. The proposed framework was evaluated using a clinical dataset, and our experimental results demonstrate that the soft-tissue driven approach greatly improves the accuracy and efficacy of surgical planning when compared to the conventional bone-driven approach.

Submitted 20 July, 2023; originally announced July 2023.

Comments: Early accepted by MICCAI 2023

Showing 1–50 of 227 results for author: Yan, P