-
Integrating AI in College Education: Positive yet Mixed Experiences with ChatGPT
Authors:
Xinrui Song,
Jiajin Zhang,
Pingkun Yan,
Juergen Hahn,
Uwe Kruger,
Hisham Mohamed,
Ge Wang
Abstract:
The integration of artificial intelligence (AI) chatbots into higher education marks a shift towards a new generation of pedagogical tools, mirroring the arrival of milestones like the internet. With the launch of ChatGPT-4 Turbo in November 2023, we developed a ChatGPT-based teaching application (https://chat.openai.com/g/g-1imx1py4K-chatge-medical-imaging) and integrated it into our undergraduate medical imaging course in the Spring 2024 semester. This study investigates the use of ChatGPT throughout a semester-long trial, providing insights into students' engagement, perception, and the overall educational effectiveness of the technology. We systematically collected and analyzed data concerning students' interaction with ChatGPT, focusing on their attitudes, concerns, and usage patterns. The findings indicate that ChatGPT offers significant advantages such as improved information access and increased interactivity, but its adoption is accompanied by concerns about the accuracy of the information provided and the necessity for well-defined guidelines to optimize its use.
Submitted 8 July, 2024;
originally announced July 2024.
-
Skyrmion Hall effect in altermagnets
Authors:
Zhejunyu Jin,
Zhaozhuo Zeng,
Yunshan Cao,
Peng Yan
Abstract:
It is widely believed that the skyrmion Hall effect is absent in antiferromagnets because of the vanishing topological charge. However, the Aharonov-Casher theory indicates the possibility of topological effects for neutral particles. In this work, we predict the skyrmion Hall effect in emerging altermagnets with zero net magnetization and zero skyrmion charge. We first show that the neutral skyrmion manifests as a magnetic quadrupole in altermagnets. We reveal a hidden gauge field from the magnetic quadrupole, which induces the skyrmion Hall effect when driven by spin transfer torque. Interestingly, we identify a sign change of the Hall angle when one swaps the anisotropic exchange couplings in altermagnets. Furthermore, we demonstrate that both the velocity and Hall angle of altermagnetic skyrmions sensitively depend on the current direction. Our findings reveal the critical role of the magnetic quadrupole in driving the skyrmion Hall effect with vanishing charge, and pave the way to discovering new Hall effects of neutral quasiparticles beyond magnetic skyrmions.
Submitted 4 July, 2024;
originally announced July 2024.
-
GPT-4 vs. Human Translators: A Comprehensive Evaluation of Translation Quality Across Languages, Domains, and Expertise Levels
Authors:
Jianhao Yan,
Pingchuan Yan,
Yulong Chen,
Judy Li,
Xianchao Zhu,
Yue Zhang
Abstract:
This study comprehensively evaluates the translation quality of Large Language Models (LLMs), specifically GPT-4, against human translators of varying expertise levels across multiple language pairs and domains. Through carefully designed annotation rounds, we find that GPT-4 performs comparably to junior translators in terms of total errors made but lags behind medium and senior translators. We also observe the imbalanced performance across different languages and domains, with GPT-4's translation capability gradually weakening from resource-rich to resource-poor directions. In addition, we qualitatively study the translation given by GPT-4 and human translators, and find that GPT-4 translator suffers from literal translations, but human translators sometimes overthink the background information. To our knowledge, this study is the first to evaluate LLMs against human translators and analyze the systematic differences between their outputs, providing valuable insights into the current state of LLM-based translation and its potential limitations.
Submitted 4 July, 2024;
originally announced July 2024.
-
Explaining Chest X-ray Pathology Models using Textual Concepts
Authors:
Vijay Sadashivaiah,
Mannudeep K. Kalra,
Pingkun Yan,
James A. Hendler
Abstract:
Deep learning models have revolutionized medical imaging and diagnostics, yet their opaque nature poses challenges for clinical adoption and trust. Amongst approaches to improve model interpretability, concept-based explanations aim to provide concise and human understandable explanations of any arbitrary classifier. However, such methods usually require a large amount of manually collected data with concept annotation, which is often scarce in the medical domain. In this paper, we propose Conceptual Counterfactual Explanations for Chest X-ray (CoCoX) that leverage existing vision-language models (VLM) joint embedding space to explain black-box classifier outcomes without the need for annotated datasets. Specifically, we utilize textual concepts derived from chest radiography reports and a pre-trained chest radiography-based VLM to explain three common cardiothoracic pathologies. We demonstrate that the explanations generated by our method are semantically meaningful and faithful to underlying pathologies.
Submitted 29 June, 2024;
originally announced July 2024.
-
Answering real-world clinical questions using large language model based systems
Authors:
Yen Sia Low,
Michael L. Jackson,
Rebecca J. Hyde,
Robert E. Brown,
Neil M. Sanghavi,
Julian D. Baldwin,
C. William Pike,
Jananee Muralidharan,
Gavin Hui,
Natasha Alexander,
Hadeel Hassan,
Rahul V. Nene,
Morgan Pike,
Courtney J. Pokrzywa,
Shivam Vedak,
Adam Paul Yan,
Dong-han Yao,
Amy R. Zipursky,
Christina Dinh,
Philip Ballentine,
Dan C. Derieg,
Vladimir Polony,
Rehan N. Chawdry,
Jordan Davies,
Brigham B. Hyde
, et al. (2 additional authors not shown)
Abstract:
Evidence to guide healthcare decisions is often limited by a lack of relevant and trustworthy literature as well as difficulty in contextualizing existing research for a specific patient. Large language models (LLMs) could potentially address both challenges by either summarizing published literature or generating new studies based on real-world data (RWD). We evaluated the ability of five LLM-based systems in answering 50 clinical questions and had nine independent physicians review the responses for relevance, reliability, and actionability. As it stands, general-purpose LLMs (ChatGPT-4, Claude 3 Opus, Gemini Pro 1.5) rarely produced answers that were deemed relevant and evidence-based (2% - 10%). In contrast, retrieval augmented generation (RAG)-based and agentic LLM systems produced relevant and evidence-based answers for 24% (OpenEvidence) to 58% (ChatRWD) of questions. Only the agentic ChatRWD was able to answer novel questions compared to other LLMs (65% vs. 0-9%). These results suggest that while general-purpose LLMs should not be used as-is, a purpose-built system for evidence summarization based on RAG and one for generating novel evidence working synergistically would improve availability of pertinent evidence for patient care.
Submitted 29 June, 2024;
originally announced July 2024.
-
Combining Classical and Probabilistic Independence Reasoning to Verify the Security of Oblivious Algorithms (Extended Version)
Authors:
Pengbo Yan,
Toby Murray,
Olga Ohrimenko,
Van-Thuan Pham,
Robert Sison
Abstract:
We consider the problem of how to verify the security of probabilistic oblivious algorithms formally and systematically. Unfortunately, prior program logics fail to support a number of complexities that feature in the semantics and invariant needed to verify the security of many practical probabilistic oblivious algorithms. We propose an approach based on reasoning over perfectly oblivious approximations, using a program logic that combines both classical Hoare logic reasoning and probabilistic independence reasoning to support all the needed features. We formalise and prove our new logic sound in Isabelle/HOL and apply our approach to formally verify the security of several challenging case studies beyond the reach of prior methods for proving obliviousness.
Submitted 29 June, 2024;
originally announced July 2024.
-
Personalized Interpretation on Federated Learning: A Virtual Concepts approach
Authors:
Peng Yan,
Guodong Long,
Jing Jiang,
Michael Blumenstein
Abstract:
Tackling non-IID data is an open challenge in federated learning research. Existing FL methods, including robust FL and personalized FL, are designed to improve model performance without considering how to interpret the non-IID data across clients. This paper aims to design a novel FL method that is robust to, and can interpret, the non-IID data across clients. Specifically, we interpret each client's dataset as a mixture of conceptual vectors, each of which represents an interpretable concept to end-users. These conceptual vectors could be pre-defined or refined in a human-in-the-loop process or be learnt via the optimization procedure of the federated learning system. In addition to the interpretability, the clarity of client-specific personalization could also be applied to enhance the robustness of the training process of the FL system. The effectiveness of the proposed method has been validated on benchmark datasets.
Submitted 27 June, 2024;
originally announced June 2024.
-
Peculiar corner states in magnetic fractals
Authors:
Zhixiong Li,
Peng Yan
Abstract:
Topological excitations in periodic magnetic crystals have received significant recent attention. However, it is an open question on their fate once the lattice periodicity is broken. In this work, we theoretically study the topological properties embedded in the collective dynamics of magnetic texture array arranged into a Sierpiński carpet structure with effective Hausdorff dimensionality $d_{f}=1.893$. By evaluating the quantized real-space quadrupole moment, we obtain the phase diagram supporting peculiar corner states that are absent in conventional square lattices. We identify three different higher-order topological states, i.e., outer corner state, type I and type II inner corner states. We further show that all these corner states are topologically protected and are robust against moderate disorder. Full micromagnetic simulations are performed to verify theoretical predictions with good agreement. Our results pave the way to investigating topological phases of magnetic texture based fractals and bridging the gap between magnetic topology and fractality.
Submitted 19 June, 2024;
originally announced June 2024.
-
Magnon spin transport through atomic ferrimagnetic domain walls
Authors:
Zhaozhuo Zeng,
Peng Yan
Abstract:
It is a well-established notion that the spin of a magnon should be flipped when it passes through a $180^{\circ}$ domain wall (DW) in both ferromagnets and antiferromagnets, while the magnon spin transport through ferrimagnetic DW is still elusive. In this work, we report that the magnon preserves its spin after the transmission through an atomically sharp DW in ferrimagnets, due to the intriguing interband magnon scattering at the domain interface. This finding may provide significant insight to resolve the puzzling insensitivity of magnon spin diffusion to the $180^{\circ}$ ferrimagnetic DWs observed by recent experiments. Our results reveal the unique role of ferrimagnetic DWs in manipulating the magnon spin and may facilitate the design of novel magnonic devices based on ferrimagnets.
Submitted 13 June, 2024;
originally announced June 2024.
-
Artemis: Towards Referential Understanding in Complex Videos
Authors:
Jihao Qiu,
Yuan Zhang,
Xi Tang,
Lingxi Xie,
Tianren Ma,
Pengyu Yan,
David Doermann,
Qixiang Ye,
Yunjie Tian
Abstract:
Videos carry rich visual information including object description, action, interaction, etc., but the existing multimodal large language models (MLLMs) fall short in referential understanding scenarios such as video-based referring. In this paper, we present Artemis, an MLLM that pushes video-based referential understanding to a finer level. Given a video, Artemis receives a natural-language question with a bounding box in any video frame and describes the referred target in the entire video. The key to achieving this goal lies in extracting compact, target-specific video features, where we set a solid baseline by tracking and selecting spatiotemporal features from the video. We train Artemis on the newly established VideoRef45K dataset with 45K video-QA pairs and design a computationally efficient, three-stage training procedure. Results are promising both quantitatively and qualitatively. Additionally, we show that Artemis can be integrated with video grounding and text summarization tools to understand more complex scenarios. Code and data are available at https://github.com/qiujihao19/Artemis.
Submitted 31 May, 2024;
originally announced June 2024.
-
Cardiovascular Disease Detection from Multi-View Chest X-rays with BI-Mamba
Authors:
Zefan Yang,
Jiajin Zhang,
Ge Wang,
Mannudeep K. Kalra,
Pingkun Yan
Abstract:
Accurate prediction of cardiovascular disease (CVD) risk in medical imaging is central to effective patient health management. Previous studies have demonstrated that imaging features in computed tomography (CT) can help predict CVD risk. However, CT entails notable radiation exposure, which may result in adverse health effects for patients. In contrast, chest X-ray emits significantly lower levels of radiation, offering a safer option. This rationale motivates our investigation into the feasibility of using chest X-ray for predicting CVD risk. Convolutional Neural Networks (CNNs) and Transformers are two established network architectures for computer-aided diagnosis. However, they struggle to model very high resolution chest X-rays due to the lack of large context modeling power or quadratic time complexity. Inspired by state space sequence models (SSMs), a new class of network architectures with sequence modeling power competitive with Transformers and linear time complexity, we propose Bidirectional Image Mamba (BI-Mamba) to complement the unidirectional SSMs with opposite directional information. BI-Mamba utilizes parallel forward and backward blocks to encode long-range dependencies of multi-view chest X-rays. We conduct extensive experiments on images from 10,395 subjects in the National Lung Screening Trial (NLST). Results show that BI-Mamba outperforms ResNet-50 and ViT-S with comparable parameter size, and saves a significant amount of GPU memory during training. Besides, BI-Mamba achieves promising performance compared with the previous state of the art in CT, unraveling the potential of chest X-ray for CVD risk prediction.
Submitted 28 May, 2024;
originally announced May 2024.
-
Disease-informed Adaptation of Vision-Language Models
Authors:
Jiajin Zhang,
Ge Wang,
Mannudeep K. Kalra,
Pingkun Yan
Abstract:
In medical image analysis, the scarcity of expertise and the high cost of data annotation limit the development of large artificial intelligence models. This paper investigates the potential of transfer learning with pre-trained vision-language models (VLMs) in this domain. Currently, VLMs still struggle to transfer to underrepresented diseases with minimal presence and new diseases entirely absent from the pretraining dataset. We argue that effective adaptation of VLMs hinges on the nuanced representation learning of disease concepts. By capitalizing on the joint visual-linguistic capabilities of VLMs, we introduce disease-informed contextual prompting in a novel disease prototype learning framework. This approach enables VLMs to grasp the concepts of new diseases effectively and efficiently, even with limited data. Extensive experiments across multiple image modalities showcase notable enhancements in performance compared to existing techniques.
Submitted 24 May, 2024;
originally announced May 2024.
-
Circuit realization of topological physics
Authors:
Huanhuan Yang,
Lingling Song,
Yunshan Cao,
Peng Yan
Abstract:
Recently, topolectrical circuits (TECs) have boomed as a platform for studying the topological states of matter. The resemblance between circuit Laplacians and tight-binding models in condensed matter physics allows for the exploration of exotic topological phases on the circuit platform. In this review, we begin by presenting the basic equations for the circuit elements and units, along with the fundamentals and experimental methods for TECs. Subsequently, we review the main literature in this field, encompassing the circuit realization of (higher-order) topological insulators and semimetals. Due to the abundant electrical elements and flexible connections, many unconventional topological states, such as the non-Hermitian, nonlinear, non-Abelian, non-periodic, non-Euclidean, and higher-dimensional topological states that are challenging to observe in conventional condensed matter physics, have been observed in circuits and are summarized in this review. Furthermore, we show the capability of electrical circuits for exploring the physical phenomena in other systems, such as photonic and magnetic ones. Importantly, we highlight that TEC systems are convenient for manufacture and miniaturization because of their compatibility with traditional integrated circuits. Finally, we discuss the future directions in this exciting field, and connect the emerging TECs with the development of topology physics, (meta)material designs, and device applications.
Submitted 23 May, 2024;
originally announced May 2024.
-
AdaFedFR: Federated Face Recognition with Adaptive Inter-Class Representation Learning
Authors:
Di Qiu,
Xinyang Lin,
Kaiye Wang,
Xiangxiang Chu,
Pengfei Yan
Abstract:
With the growing attention on data privacy and communication security in face recognition applications, federated learning has been introduced to learn a face recognition model with decentralized datasets in a privacy-preserving manner. However, existing works still face challenges such as unsatisfactory performance and additional communication costs, limiting their applicability in real-world scenarios. In this paper, we propose a simple yet effective federated face recognition framework called AdaFedFR, by devising an adaptive inter-class representation learning algorithm to enhance the generalization of the generic face model and the efficiency of federated training under strict privacy preservation. In particular, our work utilizes feature representations of public identities as learnable negative knowledge to optimize the local objective within the feature space, which further encourages the local model to learn powerful representations and optimize personalized models for clients. Experimental results demonstrate that our method outperforms previous approaches on several prevalent face recognition benchmarks within fewer than 3 communication rounds, showing that it is communication-friendly and highly efficient.
Submitted 22 May, 2024;
originally announced May 2024.
-
LiPost: Improved Content Understanding With Effective Use of Multi-task Contrastive Learning
Authors:
Akanksha Bindal,
Sudarshan Ramanujam,
Dave Golland,
TJ Hazen,
Tina Jiang,
Fengyu Zhang,
Peng Yan
Abstract:
In enhancing LinkedIn core content recommendation models, a significant challenge lies in improving their semantic understanding capabilities. This paper addresses the problem by leveraging multi-task learning, a method that has shown promise in various domains. We fine-tune a pre-trained, transformer-based LLM using multi-task contrastive learning with data from a diverse set of semantic labeling tasks. We observe positive transfer, leading to superior performance across all tasks when compared to training independently on each. Our model outperforms the baseline on zero-shot learning and offers improved multilingual support, highlighting its potential for broader application. The specialized content embeddings produced by our model outperform generalized embeddings offered by OpenAI on LinkedIn datasets and tasks. This work provides a robust foundation for vertical teams across LinkedIn to customize and fine-tune the LLM to their specific applications. Our work offers insights and best practices for the field to build on.
Submitted 13 July, 2024; v1 submitted 18 May, 2024;
originally announced May 2024.
-
Optical skyrmions from metafibers
Authors:
Tiantian He,
Yuan Meng,
Lele Wang,
Hongkun Zhong,
Nilo Mata-Cervera,
Dan Li,
Ping Yan,
Qiang Liu,
Yijie Shen,
Qirong Xiao
Abstract:
Optical skyrmions are an emerging class of structured light with sophisticated particle-like topologies with great potential for revolutionizing modern informatics. However, the current generation of optical skyrmions involves complex or bulky systems, hindering their development of practical applications. Here, exploiting the emergent "lab-on-fiber" technology, we demonstrate the design of a metafiber-integrated photonic skyrmion generator. We not only successfully generated high-quality optical skyrmions from metafibers, but also experimentally verified their remarkable properties, such as regulability and topological stability with deep-subwavelength features beyond the diffraction limits. Our flexible and fiber-integrated optical skyrmions platform paves the avenue for future applications of topologically-enhanced remote super-resolution microscopy and super-robust information transfer.
Submitted 3 May, 2024;
originally announced May 2024.
-
Fourier Coefficients and Algebraic Cusp Forms on $\mathrm{U}(2,n)$
Authors:
Anton Hilado,
Finn McGlade,
Pan Yan
Abstract:
We establish a theory of scalar Fourier coefficients for a class of non-holomorphic, automorphic forms on the quaternionic real Lie group $\mathrm{U}(2,n)$. By studying the theta lifts of holomorphic modular forms from $\mathrm{U}(1,1)$, we apply this theory to obtain examples of non-holomorphic cusp forms on $\mathrm{U}(2,n)$ whose Fourier coefficients are algebraic numbers.
Submitted 26 April, 2024;
originally announced April 2024.
-
PDRs4All IX. Sulfur elemental abundance in the Orion Bar
Authors:
Asunción Fuente,
Evelyne Roueff,
Franck Le Petit,
Jacques Le Bourlot,
Emeric Bron,
Mark G. Wolfire,
James F. Babb,
Pei-Gen Yan,
Takashi Onaka,
John H. Black,
Ilane Schroetter,
Dries Van De Putte,
Ameek Sidhu,
Amélie Canin,
Boris Trahin,
Felipe Alarcón,
Ryan Chown,
Olga Kannavou,
Olivier Berné,
Emilie Habart,
Els Peeters,
Javier R. Goicoechea,
Marion Zannese,
Raphael Meshaka,
Yoko Okada
, et al. (9 additional authors not shown)
Abstract:
One of the main problems in astrochemistry is determining the amount of sulfur in volatiles and refractories in the interstellar medium. The detection of the main sulfur reservoirs (icy H$_2$S and atomic gas) has been challenging, and estimates are based on the reliability of models to account for the abundances of species containing less than 1% of the total sulfur. The high sensitivity of the James Webb Space Telescope provides an unprecedented opportunity to estimate the sulfur abundance through the observation of the [S I] 25.249 $μ$m line. We used the [S III] 18.7 $μ$m, [S IV] 10.5 $μ$m, and [S I] 25.249 $μ$m lines to estimate the amount of sulfur in the ionized and molecular gas along the Orion Bar. For the theoretical part, we used an upgraded version of the Meudon photodissociation region (PDR) code to model the observations. New inelastic collision rates of neutral atomic sulfur with ortho- and para- molecular hydrogen were calculated to predict the line intensities. The [S III] 18.7 $μ$m and [S IV] 10.5 $μ$m lines are detected over the imaged region with a shallow increase (by a factor of 4) toward the HII region. We estimate a moderate sulfur depletion, by a factor of $\sim$2, in the ionized gas. The corrugated interface between the molecular and atomic phases gives rise to several edge-on dissociation fronts we refer to as DF1, DF2, and DF3. The [S I] 25.249 $μ$m line is only detected toward DF2 and DF3, the dissociation fronts located farthest from the HII region. The detailed modeling of DF3 using the Meudon PDR code shows that the emission of the [S I] 25.249 $μ$m line is coming from warm ($>$ 40 K) molecular gas located at A$_{\rm V}$ $\sim$ 1$-$5 mag from the ionization front. Moreover, the intensity of the [S I] 25.249 $μ$m line is only accounted for if we assume the presence of undepleted sulfur.
Submitted 4 June, 2024; v1 submitted 14 April, 2024;
originally announced April 2024.
-
Joint Physical-Digital Facial Attack Detection Via Simulating Spoofing Clues
Authors:
Xianhua He,
Dashuang Liang,
Song Yang,
Zhanlong Hao,
Hui Ma,
Binjie Mao,
Xi Li,
Yao Wang,
Pengfei Yan,
Ajian Liu
Abstract:
Face recognition systems are frequently subjected to a variety of physical and digital attacks of different types. Previous methods have achieved satisfactory performance in scenarios that address physical attacks and digital attacks separately. However, few methods integrate both into a single model, which makes it necessary to develop and maintain multiple models. To jointly detect physical and digital attacks within a single model, we propose an innovative approach that can adapt to any network architecture. Our approach mainly contains two types of data augmentation, which we call Simulated Physical Spoofing Clues augmentation (SPSC) and Simulated Digital Spoofing Clues augmentation (SDSC). SPSC and SDSC augment live samples into simulated attack samples by simulating spoofing clues of physical and digital attacks, respectively, which significantly improves the capability of the model to detect "unseen" attack types. Extensive experiments show that SPSC and SDSC can achieve state-of-the-art generalization in Protocols 2.1 and 2.2 of the UniAttackData dataset, respectively. Our method won first place in "Unified Physical-Digital Face Attack Detection" of the 5th Face Anti-spoofing Challenge@CVPR2024. Our final submission obtains 3.75% APCER, 0.93% BPCER, and 2.34% ACER. Our code is available at https://github.com/Xianhua-He/cvpr2024-face-anti-spoofing-challenge.
Submitted 12 April, 2024;
originally announced April 2024.
-
Large-Scale Multi-Domain Recommendation: an Automatic Domain Feature Extraction and Personalized Integration Framework
Authors:
Dongbo Xi,
Zhen Chen,
Yuexian Wang,
He Cui,
Chong Peng,
Fuzhen Zhuang,
Peng Yan
Abstract:
Feed recommendation is currently the mainstream mode for many real-world applications (e.g., TikTok, Dianping), and it is usually necessary to model and predict user interests in multiple scenarios (domains) within and even outside the application. Multi-domain learning is a typical solution in this regard. While considerable efforts have been made, there are still two long-standing challenges: (1) Accurately depicting the differences among domains using domain features is crucial for enhancing the performance of each domain. However, manually designing domain features and models for numerous domains can be a laborious task. (2) Users typically have limited impressions in only a few domains. Extracting features automatically from other domains and leveraging them to improve the predictive capabilities of each domain has consistently posed a challenging problem. In this paper, we propose an Automatic Domain Feature Extraction and Personalized Integration (DFEI) framework for large-scale multi-domain recommendation. The framework automatically transforms the behavior of each individual user into an aggregation of all user behaviors within the domain, which serves as the domain features. Unlike offline feature engineering methods, the extracted domain features are higher-order representations and directly related to the target label. Besides, by personalized integration of domain features from other domains for each user and the innovation in the training mode, the DFEI framework can yield more accurate conversion identification. Experimental results on both public and industrial datasets, consisting of over 20 domains, clearly demonstrate that the proposed framework achieves significantly better performance compared with SOTA baselines. Furthermore, we have released the source code of the proposed framework at https://github.com/xidongbo/DFEI.
Submitted 14 April, 2024; v1 submitted 12 April, 2024;
originally announced April 2024.
-
MonoCD: Monocular 3D Object Detection with Complementary Depths
Authors:
Longfei Yan,
Pei Yan,
Shengzhou Xiong,
Xuanyu Xiang,
Yihua Tan
Abstract:
Monocular 3D object detection has attracted widespread attention due to its potential to accurately obtain object 3D localization from a single image at a low cost. Depth estimation is an essential but challenging subtask of monocular 3D object detection due to the ill-posedness of 2D to 3D mapping. Many methods explore multiple local depth clues such as object heights and keypoints and then formulate the object depth estimation as an ensemble of multiple depth predictions to mitigate the insufficiency of single-depth information. However, the errors of existing multiple depths tend to have the same sign, which hinders them from neutralizing each other and limits the overall accuracy of combined depth. To alleviate this problem, we propose to increase the complementarity of depths with two novel designs. First, we add a new depth prediction branch named complementary depth that utilizes global and efficient depth clues from the entire image rather than the local clues to reduce the correlation of depth predictions. Second, we propose to fully exploit the geometric relations between multiple depth clues to achieve complementarity in form. Benefiting from these designs, our method achieves higher complementarity. Experiments on the KITTI benchmark demonstrate that our method achieves state-of-the-art performance without introducing extra data. In addition, complementary depth can also be a lightweight and plug-and-play module to boost multiple existing monocular 3D object detectors. Code is available at https://github.com/elvintanhust/MonoCD.
Submitted 3 April, 2024;
originally announced April 2024.
-
Calibrating the Confidence of Large Language Models by Eliciting Fidelity
Authors:
Mozhi Zhang,
Mianqiu Huang,
Rundong Shi,
Linsen Guo,
Chong Peng,
Peng Yan,
Yaqian Zhou,
Xipeng Qiu
Abstract:
Large language models optimized with techniques like RLHF have achieved good alignment in being helpful and harmless. However, post-alignment, these language models often exhibit overconfidence, where the expressed confidence does not accurately calibrate with their correctness rate. In this paper, we decompose the language model confidence into the \textit{Uncertainty} about the question and the \textit{Fidelity} to the answer generated by language models. Then, we propose a plug-and-play method to estimate the confidence of language models. Our method has shown good calibration performance by conducting experiments with 6 RLHF-LMs on four MCQA datasets. Moreover, we propose two novel metrics, IPR and CE, to evaluate the calibration of the model, and we have conducted a detailed discussion on \textit{Truly Well-Calibrated Confidence}. Our method could serve as a strong baseline, and we hope that this work will provide some insights into the model confidence calibration.
Submitted 3 April, 2024;
originally announced April 2024.
-
Epsilon dichotomy for twisted linear models
Authors:
Hang Xue,
Pan Yan
Abstract:
Let $E/F$ be a quadratic extension of local nonarchimedean fields of characteristic zero and let $D$ be a quaternion algebra over $F$ containing $E$. In this paper, we study a relation between the existence of twisted linear models on $\mathrm{GL}_n(D)$ and the local root numbers.
Submitted 31 March, 2024;
originally announced April 2024.
-
Client-supervised Federated Learning: Towards One-model-for-all Personalization
Authors:
Peng Yan,
Guodong Long
Abstract:
Personalized Federated Learning (PerFL) is a new machine learning paradigm that delivers personalized models for diverse clients under federated learning settings. Most PerFL methods require extra learning processes on a client to adapt a globally shared model to the client-specific personalized model using its own local data. However, the model adaptation process in PerFL is still an open challenge in the stage of model deployment and test time. This work tackles the challenge by proposing a novel federated learning framework to learn only one robust global model to achieve competitive performance to those personalized models on unseen/test clients in the FL system. Specifically, we design a new Client-Supervised Federated Learning (FedCS) to unravel clients' bias on instances' latent representations so that the global model can learn both client-specific and client-agnostic knowledge. Experimental study shows that the FedCS can learn a robust FL global model for the changing data distributions of unseen/test clients. The FedCS's global model can be directly deployed to the test clients while achieving comparable performance to other personalized FL methods that require model adaptation.
Submitted 28 March, 2024;
originally announced March 2024.
-
Cohomology classes, periods, and special values of Rankin-Selberg $L$-functions
Authors:
Yubo Jin,
Pan Yan
Abstract:
In this article, we give a cohomological interpretation of (a special case of) the integrals constructed by the second named author and Q. Zhang \cite{YanZhang2023} which represent the product of Rankin-Selberg $L$-functions of $\mathrm{GL}_n\times\mathrm{GL}_m$ and $\mathrm{GL}_n\times\mathrm{GL}_{n-m-1}$ for $m<n$. As an application, we prove an algebraicity result for the special values of certain $L$-functions. This work is a generalization of the algebraicity result of Raghuram for $\mathrm{GL}_n\times\mathrm{GL}_{n-1}$ \cite{Raghuram2010} in the special case $m=n-1$, and the results of Mahnkopf \cite{Mahnkopf1998, Mahnkopf2005} in the special case $m=n-2$.
Submitted 26 March, 2024;
originally announced March 2024.
-
CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation
Authors:
Xi Liu,
Ying Guo,
Cheng Zhen,
Tong Li,
Yingying Ao,
Pengfei Yan
Abstract:
Listening head generation aims to synthesize a non-verbal responsive listener head by modeling the correlation between the speaker and the listener in dynamic conversion. The applications of listener agent generation in virtual interaction have promoted many works achieving diverse and fine-grained motion generation. However, they can only manipulate motions through simple emotional labels, but cannot freely control the listener's motions. Since listener agents should have human-like attributes (e.g. identity, personality) which can be freely customized by users, this limits their realism. In this paper, we propose a user-friendly framework called CustomListener to realize the free-form text prior guided listener generation. To achieve speaker-listener coordination, we design a Static to Dynamic Portrait module (SDP), which interacts with speaker information to transform static text into dynamic portrait token with completion rhythm and amplitude information. To achieve coherence between segments, we design a Past Guided Generation Module (PGG) to maintain the consistency of customized listener attributes through the motion prior, and utilize a diffusion-based structure conditioned on the portrait token and the motion prior to realize the controllable generation. To train and evaluate our model, we have constructed two text-annotated listening head datasets based on ViCo and RealTalk, which provide text-video paired labels. Extensive experiments have verified the effectiveness of our model.
Submitted 29 March, 2024; v1 submitted 29 February, 2024;
originally announced March 2024.
-
ChartReformer: Natural Language-Driven Chart Image Editing
Authors:
Pengyu Yan,
Mahesh Bhosale,
Jay Lal,
Bikhyat Adhikari,
David Doermann
Abstract:
Chart visualizations are essential for data interpretation and communication; however, most charts are only accessible in image format and lack the corresponding data tables and supplementary information, making it difficult to alter their appearance for different application scenarios. To eliminate the need for original underlying data and information to perform chart editing, we propose ChartReformer, a natural language-driven chart image editing solution that directly edits the charts from the input images with the given instruction prompts. The key in this method is that we allow the model to comprehend the chart and reason over the prompt to generate the corresponding underlying data table and visual attributes for new charts, enabling precise edits. Additionally, to generalize ChartReformer, we define and standardize various types of chart editing, covering style, layout, format, and data-centric edits. The experiments show promising results for the natural language-driven chart image editing.
Submitted 1 May, 2024; v1 submitted 29 February, 2024;
originally announced March 2024.
-
General Purpose Image Encoder DINOv2 for Medical Image Registration
Authors:
Xinrui Song,
Xuanang Xu,
Pingkun Yan
Abstract:
Existing medical image registration algorithms rely on either dataset specific training or local texture-based features to align images. The former cannot be reliably implemented without large modality-specific training datasets, while the latter lacks global semantics thus could be easily trapped at local minima. In this paper, we present a training-free deformable image registration method, DINO-Reg, leveraging a general purpose image encoder DINOv2 for image feature extraction. The DINOv2 encoder was trained using the ImageNet data containing natural images. We used the pretrained DINOv2 without any finetuning. Our method feeds the DINOv2 encoded features into a discrete optimizer to find the optimal deformable registration field. We conducted a series of experiments to understand the behavior and role of such a general purpose image encoder in the application of image registration. Combined with handcrafted features, our method won the first place in the recent OncoReg Challenge. To our knowledge, this is the first application of general vision foundation models in medical image registration.
Submitted 23 February, 2024;
originally announced February 2024.
-
Multimodal Neurodegenerative Disease Subtyping Explained by ChatGPT
Authors:
Diego Machado Reyes,
Hanqing Chao,
Juergen Hahn,
Li Shen,
Pingkun Yan
Abstract:
Alzheimer's disease (AD) is the most prevalent neurodegenerative disease; yet its currently available treatments are limited to stopping disease progression. Moreover, the effectiveness of these treatments is not guaranteed due to the heterogeneity of the disease. Therefore, it is essential to be able to identify the disease subtypes at a very early stage. Current data-driven approaches are able to classify the subtypes at later stages of AD or related disorders, but struggle when predicting at the asymptomatic or prodromal stage. Moreover, most existing models either lack explainability behind the classification or only use a single modality for the assessment, limiting the scope of their analysis. Thus, we propose a multimodal framework that uses early-stage indicators such as imaging, genetics and clinical assessments to classify AD patients into subtypes at early stages. Similarly, we build prompts and use large language models, such as ChatGPT, to interpret the findings of our model. In our framework, we propose a tri-modal co-attention mechanism (Tri-COAT) to explicitly learn the cross-modal feature associations. Our proposed model outperforms baseline models and provides insight into key cross-modal feature associations supported by known biological mechanisms.
Submitted 31 January, 2024;
originally announced February 2024.
-
Cross-Domain Few-Shot Segmentation via Iterative Support-Query Correspondence Mining
Authors:
Jiahao Nie,
Yun Xing,
Gongjie Zhang,
Pei Yan,
Aoran Xiao,
Yap-Peng Tan,
Alex C. Kot,
Shijian Lu
Abstract:
Cross-Domain Few-Shot Segmentation (CD-FSS) poses the challenge of segmenting novel categories from a distinct domain using only limited exemplars. In this paper, we undertake a comprehensive study of CD-FSS and uncover two crucial insights: (i) the necessity of a fine-tuning stage to effectively transfer the learned meta-knowledge across domains, and (ii) the overfitting risk during the naïve fine-tuning due to the scarcity of novel category examples. With these insights, we propose a novel cross-domain fine-tuning strategy that addresses the challenging CD-FSS tasks. We first design Bi-directional Few-shot Prediction (BFP), which establishes support-query correspondence in a bi-directional manner, crafting augmented supervision to reduce the overfitting risk. Then we further extend BFP into Iterative Few-shot Adaptor (IFA), which is a recursive framework to capture the support-query correspondence iteratively, targeting maximal exploitation of supervisory signals from the sparse novel category samples. Extensive empirical evaluations show that our method significantly outperforms the state-of-the-arts (+7.8\%), which verifies that IFA tackles the cross-domain challenges and mitigates the overfitting simultaneously. The code is available at: https://github.com/niejiahao1998/IFA.
Submitted 13 March, 2024; v1 submitted 16 January, 2024;
originally announced January 2024.
-
SkyMask: Attack-agnostic Robust Federated Learning with Fine-grained Learnable Masks
Authors:
Peishen Yan,
Hao Wang,
Tao Song,
Yang Hua,
Ruhui Ma,
Ningxin Hu,
Mohammad R. Haghighat,
Haibing Guan
Abstract:
Federated Learning (FL) is becoming a popular paradigm for leveraging distributed data and preserving data privacy. However, due to the distributed characteristic, FL systems are vulnerable to Byzantine attacks, in which compromised clients attack the global model by uploading malicious model updates. With the development of layer-level and parameter-level fine-grained attacks, the attacks' stealthiness and effectiveness have been significantly improved. The existing defense mechanisms solely analyze the model-level statistics of individual model updates uploaded by clients to mitigate Byzantine attacks, which are ineffective against fine-grained attacks due to unawareness or overreaction. To address this problem, we propose SkyMask, a new attack-agnostic robust FL system that is the first to leverage fine-grained learnable masks to identify malicious model updates at the parameter level. Specifically, the FL server freezes and multiplies the model updates uploaded by clients with the parameter-level masks, and trains the masks over a small clean dataset (i.e., root dataset) to learn the subtle difference between benign and malicious model updates in a high-dimension space. Our extensive experiments involve different models on three public datasets under state-of-the-art (SOTA) attacks, where the results show that SkyMask achieves up to 14% higher testing accuracy compared with SOTA defense strategies under the same attacks and successfully defends against attacks with malicious clients of a high fraction up to 80%. Code is available at https://github.com/KoalaYan/SkyMask.
Submitted 18 July, 2024; v1 submitted 19 December, 2023;
originally announced December 2023.
-
A continuous cold rubidium atomic beam with enhanced flux and tunable velocity
Authors:
Shengzhe Wang,
Zhixin Meng,
Peiqiang Yan,
Yuanxing Liu,
Yanying Feng
Abstract:
We present a cold atomic beam source based on a two-dimensional (2D)+ magneto-optical trap (MOT), capable of generating a continuous cold beam of $^{87}$Rb atoms with a flux up to $4.3\times10^{9}$ atoms/s, a mean velocity of 10.96(2.20) m/s, and a transverse temperature of 16.90(1.56) $μ$K. Investigating the influence of high cooling laser intensity, we observe a significant population loss of atoms to hyperfine-level dark states. To account for this, we employ a multiple hyperfine level model to calculate the cooling efficiency associated with the population in dark states, subsequently modifying the scattering force. Simulations of beam flux at different cooling and repumping laser intensities using the modified scattering force are in agreement with experimental results. Optimizing repumping and cooling intensities enhances the flux by 50%. The influence of phase modulation on both the pushing and cooling lasers is experimentally studied, revealing that the mean velocity of cold atoms can be tuned from 9.5 m/s to 14.6 m/s with a phase-modulated pushing laser. The versatility of this continuous beam source, featuring high flux, controlled velocity, and narrow transverse temperature, renders it valuable for applications in atom interferometers and clocks, ultimately enhancing bandwidth, sensitivity, and signal contrast in these devices.
Submitted 19 December, 2023;
originally announced December 2023.
-
Empowering Dual-Level Graph Self-Supervised Pretraining with Motif Discovery
Authors:
Pengwei Yan,
Kaisong Song,
Zhuoren Jiang,
Yangyang Kang,
Tianqianjin Lin,
Changlong Sun,
Xiaozhong Liu
Abstract:
While self-supervised graph pretraining techniques have shown promising results in various domains, their application still experiences challenges of limited topology learning, human knowledge dependency, and incompetent multi-level interactions. To address these issues, we propose a novel solution, Dual-level Graph self-supervised Pretraining with Motif discovery (DGPM), which introduces a unique dual-level pretraining structure that orchestrates node-level and subgraph-level pretext tasks. Unlike prior approaches, DGPM autonomously uncovers significant graph motifs through an edge pooling module, aligning learned motif similarities with graph kernel-based similarities. A cross-matching task enables sophisticated node-motif interactions and novel representation learning. Extensive experiments on 15 datasets validate DGPM's effectiveness and generalizability, outperforming state-of-the-art methods in unsupervised representation learning and transfer learning settings. The autonomously discovered motifs demonstrate the potential of DGPM to enhance robustness and interpretability.
Submitted 19 December, 2023;
originally announced December 2023.
-
Prompt Engineering-assisted Malware Dynamic Analysis Using GPT-4
Authors:
Pei Yan,
Shunquan Tan,
Miaohui Wang,
Jiwu Huang
Abstract:
Dynamic analysis methods effectively identify shelled, wrapped, or obfuscated malware, thereby preventing them from invading computers. As a significant representation of dynamic malware behavior, the API (Application Programming Interface) sequence, comprised of consecutive API calls, has progressively become the dominant feature of dynamic analysis methods. Though there have been numerous deep learning models for malware detection based on API sequences, the quality of API call representations produced by those models is limited. These models cannot generate representations for unknown API calls, which weakens both the detection performance and the generalization. Further, the concept drift phenomenon of API calls is prominent. To tackle these issues, we introduce a prompt engineering-assisted malware dynamic analysis using GPT-4. In this method, GPT-4 is employed to create explanatory text for each API call within the API sequence. Afterward, the pre-trained language model BERT is used to obtain the representation of the text, from which we derive the representation of the API sequence. Theoretically, this proposed method is capable of generating representations for all API calls, excluding the necessity for dataset training during the generation process. Utilizing the representation, a CNN-based detection model is designed to extract the feature. We adopt five benchmark datasets to validate the performance of the proposed model. The experimental results reveal that the proposed detection algorithm performs better than the state-of-the-art method (TextCNN). Specifically, in cross-database experiments and few-shot learning experiments, the proposed model achieves excellent detection performance and almost a 100% recall rate for malware, verifying its superior generalization performance. The code is available at: github.com/yan-scnu/Prompted_Dynamic_Detection.
Submitted 13 December, 2023;
originally announced December 2023.
-
Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-Visual Segmentation
Authors:
Qi Yang,
Xing Nie,
Tong Li,
Pengfei Gao,
Ying Guo,
Cheng Zhen,
Pengfei Yan,
Shiming Xiang
Abstract:
Recently, an audio-visual segmentation (AVS) task has been introduced, aiming to group pixels with sounding objects within a given video. This task necessitates a first-ever audio-driven pixel-level understanding of the scene, posing significant challenges. In this paper, we propose an innovative audio-visual transformer framework, termed COMBO, an acronym for COoperation of Multi-order Bilateral relatiOns. For the first time, our framework explores three types of bilateral entanglements within AVS: pixel entanglement, modality entanglement, and temporal entanglement. Regarding pixel entanglement, we employ a Siam-Encoder Module (SEM) that leverages prior knowledge to generate more precise visual features from the foundational model. For modality entanglement, we design a Bilateral-Fusion Module (BFM), enabling COMBO to align corresponding visual and auditory signals bi-directionally. As for temporal entanglement, we introduce an innovative adaptive inter-frame consistency loss according to the inherent rules of temporal. Comprehensive experiments and ablation studies on AVSBench-object (84.7 mIoU on S4, 59.2 mIoU on MS3) and AVSBench-semantic (42.1 mIoU on AVSS) datasets demonstrate that COMBO surpasses previous state-of-the-art methods. Code and more results will be publicly available at https://yannqi.github.io/AVS-COMBO/.
Submitted 7 April, 2024; v1 submitted 11 December, 2023;
originally announced December 2023.
-
A Continuous Dual-Axis Atomic Interferometric Inertial Sensor
Authors:
Pei-Qiang Yan,
Wei-Chen Jia,
Sheng-Zhe Wang,
Yan-Ying Feng
Abstract:
We present an interferometric inertial sensor that utilizes two counter-propagating atomic beams with transverse two-dimensional cooling. By employing three parallel and spatially aligned Raman laser beams for Doppler-sensitive Raman transitions, we successfully generate inertia-sensitive Mach-Zehnder interference fringes with an interrogation length of $2L=54\,\rm{cm}$. The measured rotation and acceleration sensitivities are $0.25\,(μ\rm{rad/s})/\sqrt{Hz}$ and $0.12\,\rm{m}\textit{g}/\rm{\sqrt{Hz}}$, respectively. The sensor's capability to measure rotation and acceleration simultaneously in dynamic environments is validated through comparative analysis with classical sensors under force oscillation in different directions. Additionally, we conduct experiments on a turntable to calibrate the gyroscope's scaling factor and address nonlinearity.
Submitted 28 November, 2023;
originally announced November 2023.
-
Unsupervised convolutional neural network fusion approach for change detection in remote sensing images
Authors:
Weidong Yan,
Pei Yan,
Li Cao
Abstract:
With the rapid development of deep learning, a variety of change detection methods based on deep learning have emerged in recent years. However, these methods usually require a large number of training samples to train the network model, which makes them very expensive. In this paper, we introduce a completely unsupervised shallow convolutional neural network (USCNN) fusion approach for change detection. First, the bi-temporal images are transformed into different feature spaces by using convolution kernels of different sizes to extract multi-scale information from the images. Second, the output features of the bi-temporal images at the same convolution kernels are subtracted to obtain the corresponding difference images, and the difference feature images at the same scale are fused into one feature image by a 1 × 1 convolution layer. Finally, the output features of different scales are concatenated, and a 1 × 1 convolution layer is used to fuse the multi-scale information of the image. The model parameters are obtained by a redesigned sparse function. Our model has three features: the entire training process is conducted in an unsupervised manner, the network architecture is shallow, and the objective function is sparse. Thus, it can be seen as a kind of lightweight network model. Experimental results on four real remote sensing datasets indicate the feasibility and effectiveness of the proposed approach.
Submitted 6 November, 2023;
originally announced November 2023.
-
LatentWarp: Consistent Diffusion Latents for Zero-Shot Video-to-Video Translation
Authors:
Yuxiang Bao,
Di Qiu,
Guoliang Kang,
Baochang Zhang,
Bo Jin,
Kaiye Wang,
Pengfei Yan
Abstract:
Leveraging the generative ability of image diffusion models offers great potential for zero-shot video-to-video translation. The key lies in how to maintain temporal consistency across video frames generated by image diffusion models. Previous methods typically adopt cross-frame attention, i.e., sharing the key and value tokens across the attention of different frames, to encourage temporal consistency. However, in those works, the temporal inconsistency issue may not be thoroughly solved, which limits the fidelity of the generated videos. In this paper, we find that the bottleneck lies in the unconstrained query tokens and propose a new zero-shot video-to-video translation framework, named LatentWarp. Our approach is simple: to constrain the query tokens to be temporally consistent, we further incorporate a warping operation in the latent space. Specifically, based on the optical flow obtained from the original video, we warp the generated latent features of the last frame to align with the current frame during the denoising process. As a result, the corresponding regions across adjacent frames can share closely related query tokens and attention outputs, which further improves latent-level consistency and enhances the visual temporal coherence of the generated videos. Extensive experimental results demonstrate the superiority of LatentWarp in achieving video-to-video translation with temporal coherence.
Submitted 1 November, 2023;
originally announced November 2023.
-
Constructing the Fulde-Ferrell-Larkin-Ovchinnikov state in antiferromagnetic insulator CrOCl
Authors:
Yifan Ding,
Jiadian He,
Shihao Zhang,
Huakun Zuo,
Pingfan Gu,
Jiliang Cai,
Xiaohui Zeng,
Pu Yan,
Kecheng Cao,
Kenji Watanabe,
Takashi Taniguchi,
Peng Dong,
Yiwen Zhang,
Yueshen Wu,
Xiang Zhou,
Jinghui Wang,
Yulin Chen,
Yu Ye,
Jianpeng Liu,
Jun Li
Abstract:
Time reversal symmetry breaking in superconductors, resulting from external magnetic fields or spontaneous magnetization, often leads to unconventional superconducting properties. In this way, a conventional Fulde-Ferrell-Larkin-Ovchinnikov (FFLO) state, characterized by Cooper pairs with nonzero total momentum, may be realized by the Zeeman effect caused by external magnetic fields. Here, we report the observation of superconductivity in a few-layer antiferromagnetic insulator CrOCl by utilizing the superconducting proximity effect with NbSe2 flakes. The superconductivity demonstrates a considerably weak gap of about 0.12 meV, and the in-plane upper critical field exhibits behavior characteristic of the FFLO state at low temperature. Our first-principles calculations indicate that the proximitized superconductivity may exist in the CrOCl layer with Cr vacancies or line defects. Moreover, the FFLO state could be induced by the inherently larger spin splitting in the CrOCl layer. Our findings not only demonstrate the fascinating interaction between superconductivity and magnetism, but also provide a possible path to construct the FFLO state by intrinsic time reversal symmetry breaking and the superconducting proximity effect.
Submitted 31 October, 2023;
originally announced November 2023.
-
Some remarks on strong multiplicity one for paramodular forms
Authors:
Xiyuan Wang,
Zhining Wei,
Pan Yan,
Shaoyun Yi
Abstract:
We establish several refined strong multiplicity one results for paramodular cusp forms by using the spinor and standard $L$-functions and by combining methods from both the automorphic side and the Galois side.
Submitted 26 October, 2023;
originally announced October 2023.
-
STRAW: Structure-Adaptive Weighting Procedure for Large-Scale Spatial Multiple Testing
Authors:
Pengfei Wang,
Pengyu Yan,
Canhui Li
Abstract:
The problem of large-scale spatial multiple testing is often encountered in various scientific research fields, where the signals are usually enriched in some regions while sparse in others. To integrate spatial structure information from nearby locations, we propose a novel approach, called the STRucture-Adaptive Weighting (STRAW) procedure, for large-scale spatial multiple testing. The STRAW procedure is capable of handling a broad range of spatial settings by leveraging a class of weighted p-values and is fully data-driven. Theoretical results show that the proposed method controls the false discovery rate (FDR) at the pre-specified level under some mild conditions. In practice, the local sparsity level, defined as the probability of the null hypothesis being not true, is commonly unknown. To address this issue, we develop a new method for estimating the local sparsity level by employing the kernel-smooth local false discovery rate (Lfdr) statistic. The superior numerical performance of the STRAW procedure is demonstrated by performing extensive simulation studies and a real data analysis.
Submitted 27 September, 2023;
originally announced September 2023.
-
Product of Rankin-Selberg convolutions and a new proof of Jacquet's local converse conjecture
Authors:
Pan Yan,
Qing Zhang
Abstract:
In this article, we construct a family of integrals which represent the product of Rankin-Selberg $L$-functions of $\mathrm{GL}_{l}\times \mathrm{GL}_m$ and of $\mathrm{GL}_{l}\times \mathrm{GL}_n $ when $m+n<l$. When $n=0$, these integrals are those defined by Jacquet--Piatetski-Shapiro--Shalika up to a shift. In this sense, these new integrals generalize Jacquet--Piatetski-Shapiro--Shalika's Rankin-Selberg convolution integrals. We study basic properties of these integrals. In particular, we define local gamma factors using this new family of integrals. As an application, we obtain a new proof of Jacquet's local converse conjecture using these new integrals.
Submitted 19 September, 2023;
originally announced September 2023.
-
Terahertz magnon frequency comb
Authors:
Xianglong Yao,
Zhejunyu Jin,
Zhenyu Wang,
Zhaozhuo Zeng,
Peng Yan
Abstract:
The magnon frequency comb (MFC), a spin-wave spectrum composed of equidistant coherent peaks, is attracting much attention in magnonics. A terahertz (THz) MFC, combining the advantages of the THz and MFC technologies, is highly desired because it would significantly advance MFC applications in ultrafast magnonic metrology, sensing, and communications. Here, we show that the THz MFC can be generated by nonlinear interactions between spin waves and skyrmions in antiferromagnets [Z. Jin et al., arXiv:2301.03211, https://doi.org/10.48550/arXiv.2301.03211]. It is found that the strength of the three-wave mixing between propagating magnons and breathing skyrmions follows a linear dependence on the driving frequency and that the MFC signal can be observed over a broad driving frequency range. Our results extend the working frequency of the MFC to the THz regime, which would have potential applications in ultrafast spintronic devices and promote the development of nonlinear magnonics in antiferromagnets.
Submitted 18 September, 2023;
originally announced September 2023.
-
Spectral Adversarial MixUp for Few-Shot Unsupervised Domain Adaptation
Authors:
Jiajin Zhang,
Hanqing Chao,
Amit Dhurandhar,
Pin-Yu Chen,
Ali Tajer,
Yangyang Xu,
Pingkun Yan
Abstract:
Domain shift is a common problem in clinical applications, where the training images (source domain) and the test images (target domain) follow different distributions. Unsupervised Domain Adaptation (UDA) techniques have been proposed to adapt models trained in the source domain to the target domain. However, those methods require a large number of images from the target domain for model training. In this paper, we propose a novel method for Few-Shot Unsupervised Domain Adaptation (FSUDA), where only a limited number of unlabeled target domain samples are available for training. To accomplish this challenging task, first, a spectral sensitivity map is introduced to characterize the generalization weaknesses of models in the frequency domain. We then developed a Sensitivity-guided Spectral Adversarial MixUp (SAMix) method to generate target-style images that effectively suppress the model sensitivity, which leads to improved model generalizability in the target domain. We demonstrated the proposed method and rigorously evaluated its performance on multiple tasks using several public datasets.
Submitted 3 September, 2023;
originally announced September 2023.
-
SpaDen : Sparse and Dense Keypoint Estimation for Real-World Chart Understanding
Authors:
Saleem Ahmed,
Pengyu Yan,
David Doermann,
Srirangaraj Setlur,
Venu Govindaraju
Abstract:
We introduce a novel bottom-up approach for the extraction of chart data. Our model takes images of charts as inputs and learns to detect keypoints (KP), which are used to reconstruct the components within the plot area. Our novelty lies in detecting a fusion of continuous and discrete KP as predicted heatmaps. A combination of sparse and dense per-pixel objectives coupled with a uni-modal self-attention-based feature-fusion layer is applied to learn KP embeddings. Further leveraging deep metric learning for unsupervised clustering allows us to segment the chart plot area into various objects. By further matching the chart components to the legend, we are able to obtain the data series names. A post-processing threshold is applied to the KP embeddings to refine the object reconstructions and improve accuracy. Our extensive experiments include an evaluation of different modules for KP estimation and the combination of deep layer aggregation and corner pooling approaches. The results of our experiments provide an extensive evaluation for the task of real-world chart data extraction.
Submitted 3 August, 2023;
originally announced August 2023.
-
The ring-shaped shadow of rotating naked singularity with a complete photon sphere
Authors:
Mingzhi Wang,
Guanghai Guo,
Pengfei Yan,
Songbai Chen,
Jiliang Jing
Abstract:
We investigate the shadows of the Konoplya-Zhidenko naked singularity. In the spacetime of the Konoplya-Zhidenko naked singularity, not only an unstable retrograde light ring (LR) but also an unstable prograde LR can exist, leading to the formation of a complete photon sphere (PS). Due to the absence of an event horizon, a dark disc-shaped shadow does not appear; instead, a ring-shaped shadow is observed. The ring-shaped shadow appears as an infinite number of relativistic Einstein rings in the image of the naked singularity. For some parameter values, only the unstable retrograde LR exists, resulting in an incomplete unstable PS and consequently giving rise to an arc-shaped shadow for the Konoplya-Zhidenko naked singularity. The shadow of the Konoplya-Zhidenko naked singularity gradually shifts to the right as the rotation parameter $a$ increases, and gradually becomes smaller as the deformation parameter $|η|$ increases. Moreover, stable LRs and stable photon spherical orbits can also exist in the Konoplya-Zhidenko naked singularity spacetime, but they have no effect on the image of the naked singularity. This study demonstrates that a rotating naked singularity can exhibit not only an arc-shaped shadow but also a ring-shaped shadow.
Submitted 5 June, 2024; v1 submitted 31 July, 2023;
originally announced July 2023.
-
Fact-Checking of AI-Generated Reports
Authors:
Razi Mahmood,
Ge Wang,
Mannudeep Kalra,
Pingkun Yan
Abstract:
With advances in generative artificial intelligence (AI), it is now possible to produce realistic-looking automated reports for preliminary reads of radiology images. This can expedite clinical workflows, improve accuracy and reduce overall costs. However, it is also well-known that such models often hallucinate, leading to false findings in the generated reports. In this paper, we propose a new method of fact-checking of AI-generated reports using their associated images. Specifically, the developed examiner differentiates real and fake sentences in reports by learning the association between an image and sentences describing real or potentially fake findings. To train such an examiner, we first created a new dataset of fake reports by perturbing the findings in the original ground truth radiology reports associated with images. Text encodings of real and fake sentences drawn from these reports are then paired with image encodings to learn the mapping to real/fake labels. The utility of such an examiner is demonstrated for verifying automatically generated reports by detecting and removing fake sentences. Future generative AI approaches can use the resulting tool to validate their reports leading to a more responsible use of AI in expediting clinical workflows.
Submitted 27 July, 2023;
originally announced July 2023.
-
Controllable Guide-Space for Generalizable Face Forgery Detection
Authors:
Ying Guo,
Cheng Zhen,
Pengfei Yan
Abstract:
Recent studies on face forgery detection have shown satisfactory performance for forgery methods included in the training datasets, but the results are less satisfactory for unknown domains. This motivates many works to improve generalization, but forgery-irrelevant information, such as image background and identity, still exists in the features of different domains and causes unexpected clustering, limiting generalization. In this paper, we propose a controllable guide-space (GS) method to enhance the discrimination of different forgery domains, so as to increase the forgery relevance of features and thereby improve generalization. The well-designed guide-space can simultaneously achieve both the proper separation of forgery domains and a large distance between real and forgery domains in an explicit and controllable manner. Moreover, for better discrimination, we use a decoupling module to weaken the interference of forgery-irrelevant correlations between domains. Furthermore, we make adjustments to the decision boundary manifold according to the clustering degree of the same domain features within the neighborhood. Extensive experiments in multiple in-domain and cross-domain settings confirm that our method can achieve state-of-the-art generalization.
Submitted 26 July, 2023;
originally announced July 2023.
-
Evaluating Large Language Models for Radiology Natural Language Processing
Authors:
Zhengliang Liu,
Tianyang Zhong,
Yiwei Li,
Yutong Zhang,
Yi Pan,
Zihao Zhao,
Peixin Dong,
Chao Cao,
Yuxiao Liu,
Peng Shu,
Yaonai Wei,
Zihao Wu,
Chong Ma,
Jiaqi Wang,
Sheng Wang,
Mengyue Zhou,
Zuowei Jiang,
Chunlin Li,
Jason Holmes,
Shaochen Xu,
Lu Zhang,
Haixing Dai,
Kai Zhang,
Lin Zhao,
Yuanhao Chen
, et al. (20 additional authors not shown)
Abstract:
The rise of large language models (LLMs) has marked a pivotal shift in the field of natural language processing (NLP). LLMs have revolutionized a multitude of domains, and they have made a significant impact in the medical field. Large language models are now more abundant than ever, and many of these models exhibit bilingual capabilities, being proficient in both English and Chinese. However, a comprehensive evaluation of these models remains to be conducted. This lack of assessment is especially apparent within the context of radiology NLP. This study seeks to bridge this gap by critically evaluating thirty-two LLMs in interpreting radiology reports, a crucial component of radiology NLP. Specifically, the ability to derive impressions from radiologic findings is assessed. The outcomes of this evaluation provide key insights into the performance, strengths, and weaknesses of these LLMs, informing their practical applications within the medical domain.
Submitted 27 July, 2023; v1 submitted 25 July, 2023;
originally announced July 2023.
-
Soft-tissue Driven Craniomaxillofacial Surgical Planning
Authors:
Xi Fang,
Daeseung Kim,
Xuanang Xu,
Tianshu Kuang,
Nathan Lampen,
Jungwook Lee,
Hannah H. Deng,
Jaime Gateno,
Michael A. K. Liebschner,
James J. Xia,
Pingkun Yan
Abstract:
In craniomaxillofacial (CMF) surgery, the planning of bony movement to achieve a desired facial outcome is a challenging task. Current bone-driven approaches focus on normalizing the bone with the expectation that the facial appearance will be corrected accordingly. However, due to the complex non-linear relationship between bony structure and facial soft tissue, such bone-driven methods are insufficient to correct facial deformities. Despite efforts to simulate facial changes resulting from bony movement, surgical planning still relies on iterative revisions and educated guesses. To address these issues, we propose a soft-tissue driven framework that can automatically create and verify surgical plans. Our framework consists of a bony planner network that estimates the bony movements required to achieve the desired facial outcome and a facial simulator network that can simulate the possible facial changes resulting from the estimated bony movement plans. By combining these two models, we can verify and determine the final bony movement required for planning. The proposed framework was evaluated using a clinical dataset, and our experimental results demonstrate that the soft-tissue driven approach greatly improves the accuracy and efficacy of surgical planning when compared to the conventional bone-driven approach.
Submitted 20 July, 2023;
originally announced July 2023.