-
Fine-Tuning Medical Language Models for Enhanced Long-Contextual Understanding and Domain Expertise
Authors:
Qimin Yang,
Rongsheng Wang,
Jiexin Chen,
Runqi Su,
Tao Tan
Abstract:
Large Language Models (LLMs) have been widely applied in various professional fields. Fine-tuning these models on domain-specific question-and-answer datasets significantly improves their professional domain knowledge and Q&A abilities; for example, medical LLMs fine-tuned on doctor-patient Q&A data exhibit extraordinary disease diagnostic abilities. However, we observed that despite improvements in specific domain knowledge, the performance of medical LLMs in long-context understanding has significantly declined, especially compared to general language models with a similar number of parameters. The purpose of this study is to investigate this decline in long-context understanding in medical LLMs. We designed a series of experiments that conduct open-book professional knowledge exams on all models to evaluate their ability to read long contexts. By adjusting the proportion and quantity of general data and medical data in the process of fine-tuning, we can determine the best data composition to optimize the professional model and achieve a balance between long-context performance and specific domain knowledge.
Submitted 16 July, 2024;
originally announced July 2024.
-
Magnetic and nematic order of Bose-Fermi mixtures in moiré superlattices of 2D semiconductors
Authors:
Feng-Ren Fan,
Tixuan Tan,
Chengxin Xiao,
Wang Yao
Abstract:
We investigate the magnetic orders in a mixture of bosons (excitons) and fermions (electrons or holes) trapped in transition-metal dichalcogenide moiré superlattices. A sizable antiferromagnetic exchange interaction is found between a carrier and an interlayer exciton trapped at different high-symmetry points of the moiré supercell. This interaction, acting at a distance much shorter than the carrier-carrier separation, dominates the magnetic order in the Bose-Fermi mixture, where the carrier sublattice develops ferromagnetism opposite to that in the exciton sublattice. We demonstrate the possibility of increasing the Curie temperature of moiré carriers through electrical tuning of the exciton density in the ground state. In a trilayer moiré system with a p-n-p type band alignment, the exciton-carrier interplay can establish a layered antiferromagnetism for holes confined in the two outer layers. We further reveal a spontaneous nematic order in the Bose-Fermi mixture, arising from the interference between the Coulomb interaction and p-wave interlayer tunneling dictated by the stacking registry.
Submitted 15 July, 2024;
originally announced July 2024.
-
A Proposed S.C.O.R.E. Evaluation Framework for Large Language Models : Safety, Consensus, Objectivity, Reproducibility and Explainability
Authors:
Ting Fang Tan,
Kabilan Elangovan,
Jasmine Ong,
Nigam Shah,
Joseph Sung,
Tien Yin Wong,
Lan Xue,
Nan Liu,
Haibo Wang,
Chang Fu Kuo,
Simon Chesterman,
Zee Kin Yeong,
Daniel SW Ting
Abstract:
A comprehensive qualitative evaluation framework for large language models (LLMs) in healthcare that expands beyond traditional accuracy and quantitative metrics is needed. We propose 5 key aspects for evaluation of LLMs: Safety, Consensus, Objectivity, Reproducibility and Explainability (S.C.O.R.E.). We suggest that S.C.O.R.E. may form the basis for an evaluation framework for future LLM-based models that are safe, reliable, trustworthy, and ethical for healthcare and clinical applications.
Submitted 10 July, 2024;
originally announced July 2024.
-
Enhanced Battery Degradation-Aware Scheduling for Distribution Network with Electric Vehicle Load
Authors:
Vijay Babu Pamshetti,
Wei Zhang,
Andy Man-Fai Ng,
Qingyu Yan,
Kuan Tak Tan
Abstract:
Batteries play a key role in today's power grid. In this paper, we investigate the impact of battery degradation on the distribution network. We formulate a multi-objective framework for optimizing battery scheduling with the goals of minimizing monetary costs and improving network performance. Our framework incorporates energy purchase and battery degradation into the costs and measures the network performance through energy losses and voltage deviation. We propose Bach for battery degradation-aware scheduling based on ε-constraint and fuzzy logic methods. Bach is implemented for the IEEE 33-bus network for an experimental study. The results show the effectiveness of Bach in optimizing costs and performance simultaneously with battery degradation awareness and demonstrate the flexibility of further customization.
Submitted 9 July, 2024;
originally announced July 2024.
-
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
Authors:
Ye Bai,
Jingping Chen,
Jitong Chen,
Wei Chen,
Zhuo Chen,
Chuang Ding,
Linhao Dong,
Qianqian Dong,
Yujiao Du,
Kepan Gao,
Lu Gao,
Yi Guo,
Minglun Han,
Ting Han,
Wenchao Hu,
Xinying Hu,
Yuxiang Hu,
Deyu Hua,
Lu Huang,
Mingkun Huang,
Youjia Huang,
Jishuo Jin,
Fanliu Kong,
Zongwei Lan,
Tianyu Li
, et al. (30 additional authors not shown)
Abstract:
Modern automatic speech recognition (ASR) models are required to accurately transcribe diverse speech signals (from different domains, languages, accents, etc.) given the specific contextual information in various application scenarios. Classic end-to-end models fused with extra language models perform well, but mainly in data-matching scenarios, and are gradually approaching a bottleneck. In this work, we introduce Seed-ASR, a large language model (LLM) based speech recognition model. Seed-ASR is developed based on the framework of audio conditioned LLM (AcLLM), leveraging the capabilities of LLMs by inputting continuous speech representations together with contextual information into the LLM. Through stage-wise large-scale training and the elicitation of context-aware capabilities in LLM, Seed-ASR demonstrates significant improvement over end-to-end models on comprehensive evaluation sets, including multiple domains, accents/dialects and languages. Additionally, Seed-ASR can be further deployed to support specific needs in various scenarios without requiring extra language models. Compared to recently released large ASR models, Seed-ASR achieves a 10%-40% reduction in word (or character, for Chinese) error rates on Chinese and English public test sets, further demonstrating its powerful performance.
Submitted 10 July, 2024; v1 submitted 5 July, 2024;
originally announced July 2024.
-
Non-Adversarial Learning: Vector-Quantized Common Latent Space for Multi-Sequence MRI
Authors:
Luyi Han,
Tao Tan,
Tianyu Zhang,
Xin Wang,
Yuan Gao,
Chunyao Lu,
Xinglong Liang,
Haoran Dou,
Yunzhi Huang,
Ritse Mann
Abstract:
Adversarial learning helps generative models translate MRI from source to target sequence when lacking paired samples. However, implementing MRI synthesis with adversarial learning in clinical settings is challenging due to training instability and mode collapse. To address this issue, we leverage intermediate sequences to estimate the common latent space among multi-sequence MRI, enabling the reconstruction of distinct sequences from the common latent space. We propose a generative model that compresses discrete representations of each sequence to estimate the Gaussian distribution of vector-quantized common (VQC) latent space between multiple sequences. Moreover, we improve the latent space consistency with contrastive learning and increase model stability by domain augmentation. Experiments using the BraTS2021 dataset show that our non-adversarial model outperforms other GAN-based methods, and that the VQC latent space helps our model achieve (1) anti-interference ability, which can eliminate the effects of noise, bias fields, and artifacts, and (2) solid semantic representation ability, with the potential of one-shot segmentation. Our code is publicly available.
Submitted 3 July, 2024;
originally announced July 2024.
-
Mobile-Bench: An Evaluation Benchmark for LLM-based Mobile Agents
Authors:
Shihan Deng,
Weikai Xu,
Hongda Sun,
Wei Liu,
Tao Tan,
Jianfeng Liu,
Ang Li,
Jian Luan,
Bin Wang,
Rui Yan,
Shuo Shang
Abstract:
With the remarkable advancements of large language models (LLMs), LLM-based agents have become a research hotspot in human-computer interaction. However, there is a scarcity of benchmarks available for LLM-based mobile agents. Benchmarking these agents generally faces three main challenges: (1) The inefficiency of UI-only operations imposes limitations to task evaluation. (2) Specific instructions within a singular application lack adequacy for assessing the multi-dimensional reasoning and decision-making capacities of LLM mobile agents. (3) Current evaluation metrics are insufficient to accurately assess the process of sequential actions. To this end, we propose Mobile-Bench, a novel benchmark for evaluating the capabilities of LLM-based mobile agents. First, we expand conventional UI operations by incorporating 103 collected APIs to accelerate the efficiency of task completion. Subsequently, we collect evaluation data by combining real user queries with augmentation from LLMs. To better evaluate different levels of planning capabilities for mobile agents, our data is categorized into three distinct groups: SAST, SAMT, and MAMT, reflecting varying levels of task complexity. Mobile-Bench comprises 832 data entries, with more than 200 tasks specifically designed to evaluate multi-APP collaboration scenarios. Furthermore, we introduce a more accurate evaluation metric, named CheckPoint, to assess whether LLM-based mobile agents reach essential points during their planning and reasoning steps.
Submitted 1 July, 2024;
originally announced July 2024.
-
Artificial Immune System of Secure Face Recognition Against Adversarial Attacks
Authors:
Min Ren,
Yunlong Wang,
Yuhao Zhu,
Yongzhen Huang,
Zhenan Sun,
Qi Li,
Tieniu Tan
Abstract:
Insect production for food and feed presents a promising supplement to ensure food safety and address the adverse impacts of agriculture on climate and environment in the future. However, optimisation is required for insect production to realise its full potential. This can be by targeted improvement of traits of interest through selective breeding, an approach which has so far been underexplored and underutilised in insect farming. Here we present a comprehensive review of the selective breeding framework in the context of insect production. We systematically evaluate adjustments of selective breeding techniques to the realm of insects and highlight the essential components integral to the breeding process. The discussion covers every step of a conventional breeding scheme, such as formulation of breeding objectives, phenotyping, estimation of genetic parameters and breeding values, selection of appropriate breeding strategies, and mitigation of issues associated with genetic diversity depletion and inbreeding. This review combines knowledge from diverse disciplines, bridging the gap between animal breeding, quantitative genetics, evolutionary biology, and entomology, offering an integrated view of the insect breeding research area and uniting knowledge which has previously remained scattered across diverse fields of expertise.
Submitted 26 June, 2024;
originally announced June 2024.
-
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models
Authors:
Guangzhi Sun,
Wenyi Yu,
Changli Tang,
Xianzhao Chen,
Tian Tan,
Wei Li,
Lu Lu,
Zejun Ma,
Yuxuan Wang,
Chao Zhang
Abstract:
Speech understanding as an element of the more generic video understanding using audio-visual large language models (av-LLMs) is a crucial yet understudied aspect. This paper proposes video-SALMONN, a single end-to-end av-LLM for video processing, which can understand not only visual frame sequences, audio events and music, but speech as well. To obtain the fine-grained temporal information required by speech understanding while remaining efficient for other video elements, this paper proposes a novel multi-resolution causal Q-Former (MRC Q-Former) structure to connect pre-trained audio-visual encoders and the backbone large language model. Moreover, dedicated training approaches including the diversity loss and the unpaired audio-visual mixed training scheme are proposed to avoid frames or modality dominance. On the introduced speech-audio-visual evaluation benchmark, video-SALMONN achieves more than 25% absolute accuracy improvements on the video-QA task and over 30% absolute accuracy improvements on audio-visual QA tasks with human speech. In addition, video-SALMONN demonstrates remarkable video comprehension and reasoning abilities on tasks that are unprecedented by other av-LLMs. Our training code and model checkpoints are available at https://github.com/bytedance/SALMONN/.
Submitted 21 June, 2024;
originally announced June 2024.
-
TOI-2374 b and TOI-3071 b: two metal-rich sub-Saturns well within the Neptunian desert
Authors:
Alejandro Hacker,
Rodrigo F. Díaz,
David J. Armstrong,
Jorge Fernández Fernández,
Simon Müller,
Elisa Delgado-Mena,
Sérgio G. Sousa,
Vardan Adibekyan,
Keivan G. Stassun,
Karen A. Collins,
Samuel W. Yee,
Daniel Bayliss,
Allyson Bieryla,
François Bouchy,
R. Paul Butler,
Jeffrey D. Crane,
Xavier Dumusque,
Joel D. Hartman,
Ravit Helled,
Jon Jenkins,
Marcelo Aron F. Keniger,
Hannah Lewis,
Jorge Lillo-Box,
Michael B. Lund,
Louise D. Nielsen
, et al. (18 additional authors not shown)
Abstract:
We report the discovery of two transiting planets detected by the Transiting Exoplanet Survey Satellite (TESS), TOI-2374 b and TOI-3071 b, orbiting a K5V and an F8V star, respectively, with periods of 4.31 and 1.27 days, respectively. We confirm and characterize these two planets with a variety of ground-based and follow-up observations, including photometry, precise radial velocity monitoring and high-resolution imaging. The planetary and orbital parameters were derived from a joint analysis of the radial velocities and photometric data. We found that the two planets have masses of $(57 \pm 4)$ $M_\oplus$ or $(0.18 \pm 0.01)$ $M_J$, and $(68 \pm 4)$ $M_\oplus$ or $(0.21 \pm 0.01)$ $M_J$, respectively, and they have radii of $(6.8 \pm 0.3)$ $R_\oplus$ or $(0.61 \pm 0.03)$ $R_J$ and $(7.2 \pm 0.5)$ $R_\oplus$ or $(0.64 \pm 0.05)$ $R_J$, respectively. These parameters correspond to sub-Saturns within the Neptunian desert, both planets being hot and highly irradiated, with $T_{\rm eq} \approx 745$ $K$ and $T_{\rm eq} \approx 1812$ $K$, respectively, assuming a Bond albedo of 0.5. TOI-3071 b has the hottest equilibrium temperature of all known planets with masses between $10$ and $300$ $M_\oplus$ and radii less than $1.5$ $R_J$. By applying gas giant evolution models we found that both planets, especially TOI-3071 b, are very metal-rich. This challenges standard formation models which generally predict lower heavy-element masses for planets with similar characteristics. We studied the evolution of the planets' atmospheres under photoevaporation and concluded that both are stable against evaporation due to their large masses and likely high metallicities in their gaseous envelopes.
Submitted 18 June, 2024;
originally announced June 2024.
-
Text-aware Speech Separation for Multi-talker Keyword Spotting
Authors:
Haoyu Li,
Baochen Yang,
Yu Xi,
Linfeng Yu,
Tian Tan,
Hao Li,
Kai Yu
Abstract:
For noisy environments, ensuring the robustness of keyword spotting (KWS) systems is essential. While much research has focused on noisy KWS, less attention has been paid to multi-talker mixed speech scenarios. Unlike the usual cocktail party problem where multi-talker speech is separated using speaker clues, the key challenge here is to extract the target speech for KWS based on text clues. To address it, this paper proposes a novel Text-aware Permutation Determinization Training method for multi-talker KWS with a clue-based Speech Separation front-end (TPDT-SS). Our research highlights the critical role of SS front-ends and shows that incorporating keyword-specific clues into these models can greatly enhance the effectiveness. TPDT-SS shows remarkable success in addressing permutation problems in mixed keyword speech, thereby greatly boosting the performance of the backend. Additionally, fine-tuning our system on unseen mixed speech results in further performance improvement.
Submitted 18 June, 2024;
originally announced June 2024.
-
Approximation Algorithms for Smallest Intersecting Balls
Authors:
Jiaqi Zheng,
Tiow-Seng Tan
Abstract:
We study a general smallest intersecting ball problem and its soft-margin variant in high-dimensional Euclidean spaces, which only require the input objects to be compact and convex. These two problems link and unify a series of fundamental problems in computational geometry and machine learning, including smallest enclosing ball, polytope distance, intersection radius, $\ell_1$-loss support vector machine, $\ell_1$-loss support vector data description, and so on. Two general approximation algorithms are presented respectively, and implementation details are given for specific inputs of convex polytopes, reduced polytopes, axis-aligned bounding boxes, balls, and ellipsoids. For most of these inputs, our algorithms are the first results in high-dimensional spaces, and also the first approximation methods. To achieve this, we develop a novel framework for approximating zero-sum games in Euclidean Jordan algebra systems, which may be useful in its own right.
Submitted 17 June, 2024;
originally announced June 2024.
-
Enhancing End-to-End Autonomous Driving with Latent World Model
Authors:
Yingyan Li,
Lue Fan,
Jiawei He,
Yuqi Wang,
Yuntao Chen,
Zhaoxiang Zhang,
Tieniu Tan
Abstract:
End-to-end autonomous driving has garnered widespread attention. Current end-to-end approaches largely rely on the supervision from perception tasks such as detection, tracking, and map segmentation to aid in learning scene representations. However, these methods require extensive annotations, hindering the data scalability. To address this challenge, we propose a novel self-supervised method to enhance end-to-end driving without the need for costly labels. Specifically, our framework LAW uses a LAtent World model to predict future latent features based on the predicted ego actions and the latent feature of the current frame. The predicted latent features are supervised by the features actually observed in the future. This supervision jointly optimizes the latent feature learning and action prediction, which greatly enhances the driving performance. As a result, our approach achieves state-of-the-art performance in both open-loop and closed-loop benchmarks without costly annotations.
Submitted 12 June, 2024;
originally announced June 2024.
-
Can Large Language Models Understand Spatial Audio?
Authors:
Changli Tang,
Wenyi Yu,
Guangzhi Sun,
Xianzhao Chen,
Tian Tan,
Wei Li,
Jun Zhang,
Lu Lu,
Zejun Ma,
Yuxuan Wang,
Chao Zhang
Abstract:
This paper explores enabling large language models (LLMs) to understand spatial information from multichannel audio, a skill currently lacking in auditory LLMs. By leveraging LLMs' advanced cognitive and inferential abilities, the aim is to enhance understanding of 3D environments via audio. We study 3 spatial audio tasks: sound source localization (SSL), far-field speech recognition (FSR), and localisation-informed speech extraction (LSE), achieving notable progress in each task. For SSL, our approach achieves an MAE of $2.70^{\circ}$ on the Spatial LibriSpeech dataset, substantially surpassing the prior benchmark of about $6.60^{\circ}$. Moreover, our model can employ spatial cues to improve FSR accuracy and execute LSE by selectively attending to sounds originating from a specified direction via text prompts, even amidst overlapping speech. These findings highlight the potential of adapting LLMs to grasp physical audio concepts, paving the way for LLM-based agents in 3D environments.
Submitted 14 June, 2024; v1 submitted 12 June, 2024;
originally announced June 2024.
-
Three super-Earths and a possible water world from TESS and ESPRESSO
Authors:
M. J. Hobson,
F. Bouchy,
B. Lavie,
C. Lovis,
V. Adibekyan,
C. Allende Prieto,
Y. Alibert,
S. C. C. Barros,
A. Castro-González,
S. Cristiani,
V. D'Odorico,
M. Damasso,
P. Di Marcantonio,
X. Dumusque,
D. Ehrenreich,
P. Figueira,
R. Génova Santos,
J. I. González Hernández,
J. Lillo-Box,
G. Lo Curto,
C. J. A. P. Martins,
A. Mehner,
G. Micela,
P. Molaro,
N. J. Nunes
, et al. (29 additional authors not shown)
Abstract:
Since 2018, the ESPRESSO spectrograph at the VLT has been hunting for planets in the Southern skies via the RV method. One of its goals is to follow up candidate planets from transit surveys such as the TESS mission, particularly small planets. We analyzed photometry from TESS and ground-based facilities, high-resolution imaging, and RVs from ESPRESSO, HARPS, and HIRES, to confirm and characterize three new planets: TOI-260 b, transiting a late K-dwarf, and TOI-286 b and c, orbiting an early K-dwarf. We also update parameters for the known super-Earth TOI-134 b, hosted by an M-dwarf. TOI-260 b has a $13.475853^{+0.000013}_{-0.000011}$ d period, $4.23 \pm 1.60\,\mathrm{M_\oplus}$ mass and $1.71 \pm 0.08\,\mathrm{R_\oplus}$ radius. For TOI-286 b we find a $4.5117244^{+0.0000031}_{-0.0000027}$ d period, $4.53 \pm 0.78\,\mathrm{M_\oplus}$ mass and $1.42 \pm 0.10\,\mathrm{R_\oplus}$ radius; for TOI-286 c, a $39.361826^{+0.000070}_{-0.000081}$ d period, $3.72 \pm 2.22\,\mathrm{M_\oplus}$ mass and $1.88 \pm 0.12\,\mathrm{R_\oplus}$ radius. For TOI-134 b we obtain a $1.40152604^{+0.00000074}_{-0.00000082}$ d period, $4.07 \pm 0.45\,\mathrm{M_\oplus}$ mass, and $1.63 \pm 0.14\,\mathrm{R_\oplus}$ radius. Circular models are preferred for all, although for TOI-260 b the eccentricity is not well-constrained. We compute bulk densities and place the planets in the context of composition models. TOI-260 b lies within the radius valley, and is most likely a rocky planet. However, the uncertainty on the eccentricity and thus on the mass renders its composition hard to determine. TOI-286 b and c span the radius valley, with TOI-286 b lying below it and having a likely rocky composition, while TOI-286 c is within the valley, close to the upper border, and probably has a significant water fraction. With our updated parameters for TOI-134 b, we obtain a lower density than previous findings, giving a rocky or Earth-like composition.
Submitted 10 June, 2024;
originally announced June 2024.
-
UniUSNet: A Promptable Framework for Universal Ultrasound Disease Prediction and Tissue Segmentation
Authors:
Zehui Lin,
Zhuoneng Zhang,
Xindi Hu,
Zhifan Gao,
Xin Yang,
Yue Sun,
Dong Ni,
Tao Tan
Abstract:
Ultrasound is a widely used imaging modality in clinical practice due to its low cost, portability, and safety. Current research in general AI for healthcare focuses on large language models and general segmentation models, with insufficient attention to solutions addressing both disease prediction and tissue segmentation. In this study, we propose a novel universal framework for ultrasound, namely UniUSNet, which is a promptable framework for ultrasound image classification and segmentation. The universality of this model is derived from its versatility across various aspects. It proficiently manages any ultrasound nature, any anatomical position, and any input type, and excels not only in segmentation tasks but also in classification tasks. We introduce a novel module that incorporates this information as a prompt and seamlessly embeds it within the model's learning process. To train and validate our proposed model, we curated a comprehensive ultrasound dataset from publicly accessible sources, encompassing up to 7 distinct anatomical positions with over 9.7K annotations. Experimental results demonstrate that our model achieves performance comparable to state-of-the-art models, and surpasses both a model trained on a single dataset and an ablated version of the network lacking prompt guidance. Additionally, we conducted zero-shot and fine-tuning experiments on new datasets, which proved that our model possesses strong generalization capabilities and can be effectively adapted to new data at low cost through its adapter module. We will continuously expand the dataset and optimize the task-specific prompting mechanism towards universality in medical ultrasound. Model weights, data processing workflows, and code will be open source to the public (https://github.com/Zehui-Lin/UniUSNet).
Submitted 20 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
Towards Clinical AI Fairness: Filling Gaps in the Puzzle
Authors:
Mingxuan Liu,
Yilin Ning,
Salinelat Teixayavong,
Xiaoxuan Liu,
Mayli Mertens,
Yuqing Shang,
Xin Li,
Di Miao,
Jie Xu,
Daniel Shu Wei Ting,
Lionel Tim-Ee Cheng,
Jasmine Chiat Ling Ong,
Zhen Ling Teo,
Ting Fang Tan,
Narrendar RaviChandran,
Fei Wang,
Leo Anthony Celi,
Marcus Eng Hock Ong,
Nan Liu
Abstract:
The ethical integration of Artificial Intelligence (AI) in healthcare necessitates addressing fairness, a concept that is highly context-specific across medical fields. Extensive studies have been conducted to expand the technical components of AI fairness, while tremendous calls for AI fairness have been raised from healthcare. Despite this, a significant disconnect persists between technical advancements and their practical clinical applications, resulting in a lack of contextualized discussion of AI fairness in clinical settings. Through a detailed evidence gap analysis, our review systematically pinpoints several deficiencies concerning both healthcare data and the provided AI fairness solutions. We highlight the scarcity of research on AI fairness in many medical domains where AI technology is increasingly utilized. Additionally, our analysis highlights a substantial reliance on group fairness, aiming to ensure equality among demographic groups from a macro healthcare system perspective; in contrast, individual fairness, focusing on equity at a more granular level, is frequently overlooked. To bridge these gaps, our review advances actionable strategies for both the healthcare and AI research communities. Beyond applying existing AI fairness methods in healthcare, we further emphasize the importance of involving healthcare professionals to refine AI fairness concepts and methods to ensure contextually relevant and ethically sound AI applications in healthcare.
Submitted 28 May, 2024;
originally announced May 2024.
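The abstract above contrasts group fairness with individual fairness. As a minimal sketch of what a group-fairness check can look like, the following computes a demographic parity gap, i.e. the spread in positive-prediction rates across demographic groups. This metric is a common illustration and is an assumption here, not a method taken from the review; the function name and the toy data are hypothetical.

```python
from collections import defaultdict

def demographic_parity_gap(predictions, groups):
    """Return (gap, rates): the spread of positive-prediction rates across groups.

    predictions: iterable of 0/1 model decisions (hypothetical toy data below).
    groups: iterable of group labels, same length as predictions.
    """
    counts = defaultdict(int)     # number of individuals per group
    positives = defaultdict(int)  # number of positive decisions per group
    for pred, group in zip(predictions, groups):
        counts[group] += 1
        positives[group] += pred
    rates = {g: positives[g] / counts[g] for g in counts}
    gap = max(rates.values()) - min(rates.values())
    return gap, rates

# Toy example: group A receives positive decisions 3/4 of the time, group B 1/4.
predictions = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
gap, rates = demographic_parity_gap(predictions, groups)
print(rates, gap)
```

A gap of zero would mean equal positive rates across groups. It says nothing about whether similar individuals are treated similarly, which is the individual-fairness concern the abstract describes as frequently overlooked.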
-
Benchmarking bosonic modes for quantum information with randomized displacements
Authors:
Christophe H. Valahu,
Tomas Navickas,
Michael J. Biercuk,
Ting Rei Tan
Abstract:
Bosonic modes are prevalent in all aspects of quantum information processing. However, existing tools for characterizing the quality, stability, and noise properties of bosonic modes are limited, especially in a driven setting. Here, we propose, demonstrate, and analyze a bosonic randomized benchmarking (BRB) protocol that uses randomized displacements of the bosonic modes in phase space to determine their quality. We investigate the impact of common analytic error models, such as heating and dephasing, on the distribution of outcomes over randomized displacement trajectories in phase space. We show that analyzing the distinctive behavior of the mean and variance of this distribution - describable as a gamma distribution - enables identification of error processes, and quantitative extraction of error rates and correlations using a minimal number of measurements. We experimentally validate the analytical models by injecting engineered noise into the motional mode of a trapped ion system and performing the bosonic randomized benchmarking protocol, showing good agreement between experiment and theory. Finally, we investigate the intrinsic error properties in our system, identifying the presence of highly correlated dephasing noise as the dominant process.
Submitted 24 May, 2024;
originally announced May 2024.
-
CMB lensing and Lyα forest cross bispectrum from DESI's first-year quasar sample
Authors:
N. G. Karaçaylı,
P. Martini,
D. H. Weinberg,
S. Ferraro,
R. de Belsunce,
J. Aguilar,
S. Ahlen,
E. Armengaud,
D. Brooks,
T. Claybaugh,
A. de la Macorra,
B. Dey,
P. Doel,
K. Fanning,
J. E. Forero-Romero,
S. Gontcho A Gontcho,
A. X. Gonzalez-Morales,
G. Gutierrez,
J. Guy,
K. Honscheid,
D. Kirkby,
T. Kisner,
A. Kremin,
A. Lambert,
M. Landriau
, et al. (28 additional authors not shown)
Abstract:
The squeezed cross-bispectrum between the gravitational lensing in the Cosmic Microwave Background and the 1D Lyα forest power spectrum can constrain bias parameters and break degeneracies between $σ_8$ and other cosmological parameters. We detect this cross-bispectrum with $4.8σ$ significance at an effective redshift $z_\mathrm{eff}=2.4$ using the Planck PR3 lensing map and over 280,000 quasar spectra from the Dark Energy Spectroscopic Instrument's first-year data. We test our measurement against metal contamination and foregrounds such as Galactic extinction and clusters of galaxies by deprojecting the thermal Sunyaev-Zeldovich effect. We compare our results to a tree-level perturbation theory calculation and find reasonable agreement between the model and measurement.
Submitted 23 May, 2024;
originally announced May 2024.
-
Unveiling the Achilles' Heel of NLG Evaluators: A Unified Adversarial Framework Driven by Large Language Models
Authors:
Yiming Chen,
Chen Zhang,
Danqing Luo,
Luis Fernando D'Haro,
Robby T. Tan,
Haizhou Li
Abstract:
The automatic evaluation of natural language generation (NLG) systems presents a long-lasting challenge. Recent studies have highlighted various neural metrics that align well with human evaluations. Yet, the robustness of these evaluators against adversarial perturbations remains largely under-explored due to the unique challenges in obtaining adversarial data for different NLG evaluation tasks. To address the problem, we introduce AdvEval, a novel black-box adversarial framework against NLG evaluators. AdvEval is specially tailored to generate data that yield strong disagreements between human and victim evaluators. Specifically, inspired by the recent success of large language models (LLMs) in text generation and evaluation, we adopt strong LLMs as both the data generator and gold evaluator. Adversarial data are automatically optimized with feedback from the gold and victim evaluator. We conduct experiments on 12 victim evaluators and 11 NLG datasets, spanning tasks including dialogue, summarization, and question evaluation. The results show that AdvEval can lead to significant performance degradation of various victim metrics, thereby validating its efficacy.
Submitted 23 May, 2024;
originally announced May 2024.
-
Wide Binary Orbits are Preferentially Aligned with the Orbits of Small Planets, but Probably Not Hot Jupiters
Authors:
Sam Christian,
Andrew Vanderburg,
Juliette Becker,
Adam L. Kraus,
Logan Pearce,
Karen A. Collins,
Malena Rice,
Eric L. N. Jensen,
David Baker,
Paul Benni,
Allyson Bieryla,
Abraham Binnenfeld,
Kevin I. Collins,
Dennis M. Conti,
Phil Evans,
Eric Girardin,
Joao Gregorio,
Tsevi Mazeh,
Felipe Murgas,
Aviad Panahi,
Francisco J. Pozuelos,
Howard M. Relles,
Fabian Rodriguez Frustaglia,
Richard P. Schwarz,
Gregor Srdoc
, et al. (6 additional authors not shown)
Abstract:
Studying the relative orientations of the orbits of exoplanets and wide-orbiting binary companions (semimajor axis greater than 100 AU) can shed light on how planets form and evolve in binary systems. Previous observations by multiple groups discovered a possible alignment between the orbits of visual binaries and the exoplanets that reside in them. In this study, using data from Gaia DR3 and TESS, we confirm the existence of an alignment between the orbits of small planets $(R<6 R_\oplus)$ and binary systems with semimajor axes below 700 AU ($p=10^{-6}$). However, we find no statistical evidence for alignment between planet and binary orbits for binary semimajor axes greater than 700 AU, and no evidence for alignment of large, closely-orbiting planets (mostly hot Jupiters) and binaries at any separation. The lack of orbital alignment between our large planet sample and their binary companions appears significantly different from our small planet sample, even taking into account selection effects. Therefore, we conclude that any alignment between wide binaries and our sample of large planets (predominantly hot Jupiters) is probably not as strong as what we observe for small planets in binaries with semimajor axes less than 700 AU. The difference in the alignment distribution of hot Jupiters and smaller planets may be attributed to the unique evolutionary mechanisms occurring in systems that form hot Jupiters, including potentially destabilizing secular resonances that set in as the protoplanetary disk dissipates and high-eccentricity migration occurring after the disk is gone.
Submitted 16 May, 2024;
originally announced May 2024.
-
Validation of the DESI 2024 Lyman Alpha Forest BAL Masking Strategy
Authors:
Paul Martini,
A. Cuceu,
L. Ennesser,
A. Brodzeller,
J. Aguilar,
S. Ahlen,
D. Brooks,
T. Claybaugh,
R. de Belsunce,
A. de la Macorra,
Arjun Dey,
P. Doel,
J. E. Forero-Romero,
E. Gaztañaga,
S. Gontcho A Gontcho,
J. Guy,
H. K. Herrera-Alcantar,
K. Honscheid,
N. G. Karaçaylı,
T. Kisner,
A. Kremin,
A. Lambert,
L. Le Guillou,
M. Manera,
A. Meisner
, et al. (22 additional authors not shown)
Abstract:
Broad absorption line quasars (BALs) exhibit blueshifted absorption relative to a number of their prominent broad emission features. These absorption features can contribute to quasar redshift errors and add absorption to the Lyman-alpha (LyA) forest that is unrelated to large-scale structure. We present a detailed analysis of the impact of BALs on the Baryon Acoustic Oscillation (BAO) results with the LyA forest from the first year of data from the Dark Energy Spectroscopic Instrument (DESI). The baseline strategy for the first year analysis is to mask all pixels associated with all BAL absorption features that fall within the wavelength region used to measure the forest. We explore a range of alternate masking strategies and demonstrate that these changes have minimal impact on the BAO measurements with both DESI data and synthetic data. This includes when we mask the BAL features associated with emission lines outside of the forest region to minimize their contribution to redshift errors. We identify differences in the properties of BALs in the synthetic datasets relative to the observational data, as well as use the synthetic observations to characterize the completeness of the BAL identification algorithm, and demonstrate that incompleteness and differences in the BALs between real and synthetic data also do not impact the BAO results for the LyA forest.
Submitted 15 May, 2024;
originally announced May 2024.
-
A Primal-Dual Framework for Symmetric Cone Programming
Authors:
Jiaqi Zheng,
Antonios Varvitsiotis,
Tiow-Seng Tan,
Wayne Lin
Abstract:
In this paper, we introduce a primal-dual algorithmic framework for solving Symmetric Cone Programs (SCPs), a versatile optimization model that unifies and extends Linear, Second-Order Cone (SOCP), and Semidefinite Programming (SDP). Our work generalizes the primal-dual framework for SDPs introduced by Arora and Kale, leveraging a recent extension of the Multiplicative Weights Update method (MWU) to symmetric cones. Going beyond existing works, our framework can handle SOCPs and mixed SCPs, exhibits nearly linear time complexity, and can be effectively parallelized. To illustrate the efficacy of our framework, we employ it to develop approximation algorithms for two geometric optimization problems: the Smallest Enclosing Sphere problem and the Support Vector Machine problem. Our theoretical analyses demonstrate that the two algorithms compute approximate solutions in nearly linear running time and with parallel depth scaling polylogarithmically with the input size. We compare our algorithms against CGAL as well as interior point solvers applied to these problems. Experiments show that our algorithms are highly efficient when implemented on a CPU and achieve substantial speedups when parallelized on a GPU, allowing us to solve large-scale instances of these problems.
Submitted 15 May, 2024;
originally announced May 2024.
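The abstract above builds on the Multiplicative Weights Update (MWU) method. For orientation, here is a minimal sketch of the classic scalar MWU over a finite set of experts. It is explicitly not the symmetric-cone extension or the primal-dual framework of the paper; the function name, the step size, and the synthetic loss matrix are assumptions made for illustration.

```python
import numpy as np

def multiplicative_weights(losses, eta=0.1):
    """Classic MWU over n experts.

    losses: array of shape (T, n) with entries in [0, 1]; row t holds each
            expert's loss in round t (synthetic in the example below).
    eta:    step size, assumed to satisfy 0 < eta <= 1/2.
    Returns the final distribution over experts and the algorithm's total
    expected loss.
    """
    losses = np.asarray(losses, dtype=float)
    num_experts = losses.shape[1]
    weights = np.ones(num_experts)
    total_loss = 0.0
    for round_loss in losses:
        distribution = weights / weights.sum()          # play the normalized weights
        total_loss += float(distribution @ round_loss)  # expected loss this round
        weights = weights * (1.0 - eta * round_loss)    # penalize lossy experts
    return weights / weights.sum(), total_loss

# Synthetic example: expert 1 has losses ten times smaller than the others.
rng = np.random.default_rng(0)
losses = rng.random((200, 3))
losses[:, 1] *= 0.1
final_dist, total_loss = multiplicative_weights(losses, eta=0.1)
best_expert_loss = losses.sum(axis=0).min()
print(final_dist, total_loss, best_expert_loss)
```

The standard guarantee for this scalar version is that the total expected loss is at most (1 + eta) times the best expert's loss plus ln(n)/eta. Roughly speaking, symmetric-cone generalizations replace the weight vector by an element of the cone and the entrywise update by an exponential in the associated Jordan algebra.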
-
Charting the Path Forward: CT Image Quality Assessment -- An In-Depth Review
Authors:
Siyi Xun,
Qiaoyu Li,
Xiaohong Liu,
Guangtao Zhai,
Mingxiang Wu,
Tao Tan
Abstract:
Computed Tomography (CT) is a frequently used imaging technology in the clinical diagnosis of many disorders. However, the huge volume of CT data with non-homogeneous imaging quality poses major challenges for clinical diagnosis, data storage, and management. As a result, the quality assessment of CT images is a crucial problem that demands consideration. This paper examines the history, research advances, and current developments in CT image quality assessment (IQA). In this review, we collected and studied more than 500 CT-IQA publications published before August 2023, and we provide a visualization analysis of keywords and co-citations in the knowledge graph of these papers. Prospects and obstacles for the continued development of CT-IQA are also covered. At present, significant research branches in the CT-IQA domain include phantom studies, artificial intelligence deep-learning reconstruction algorithms, dose reduction opportunities, and virtual monoenergetic reconstruction. Artificial intelligence (AI)-based CT-IQA is also becoming a trend. It increases the accuracy of the CT scanning apparatus, amplifies the impact of the CT system reconstruction algorithm, and creates an effective algorithm for post-processing CT images. AI-based medical IQA offers excellent application opportunities in clinical work. AI can provide uniform quality assessment criteria and more comprehensive guidance across healthcare facilities, and encourage them to recognize one another's images. It will help lower the number of unnecessary tests and associated costs, and enhance the quality of medical imaging and assessment efficiency.
Submitted 30 April, 2024;
originally announced May 2024.
-
6G comprehensive intelligence: network operations and optimization based on Large Language Models
Authors:
Sifan Long,
Fengxiao Tang,
Yangfan Li,
Tiao Tan,
Zhengjie Jin,
Ming Zhao,
Nei Kato
Abstract:
The sixth generation mobile communication standard (6G) can promote the development of the Industrial Internet and the Internet of Things (IoT). To achieve comprehensive intelligent development of the network and provide customers with higher-quality personalized services, this paper proposes a network performance optimization and intelligent operation network architecture based on Large Language Models (LLMs), aiming to build a comprehensive intelligent 6G network system. With more parameters and stronger learning ability, an LLM can more accurately capture patterns and features in data, which enables more accurate content output and high intelligence and provides strong support for related research such as network data security, privacy protection, and health assessment. This paper also presents the design framework of a network health assessment system based on LLMs and focuses on its potential application value. Through the case of a network health management system, it demonstrates that an LLM-based 6G intelligent network system has important practical significance for the comprehensive realization of intelligence.
Submitted 28 April, 2024;
originally announced April 2024.
-
Decidability of Graph Neural Networks via Logical Characterizations
Authors:
Michael Benedikt,
Chia-Hsuan Lu,
Boris Motik,
Tony Tan
Abstract:
We present results concerning the expressiveness and decidability of a popular graph learning formalism, graph neural networks (GNNs), exploiting connections with logic. We use a family of recently-discovered decidable logics involving "Presburger quantifiers". We show how to use these logics to measure the expressiveness of classes of GNNs, in some cases getting exact correspondences between the expressiveness of logics and GNNs. We also employ the logics, and the techniques used to analyze them, to obtain decision procedures for verification problems over GNNs. We complement this with undecidability results for static analysis problems involving the logics, as well as for GNN verification problems.
Submitted 23 May, 2024; v1 submitted 28 April, 2024;
originally announced April 2024.
-
MMA-UNet: A Multi-Modal Asymmetric UNet Architecture for Infrared and Visible Image Fusion
Authors:
Jingxue Huang,
Xilai Li,
Tianshu Tan,
Xiaosong Li,
Tao Ye
Abstract:
Multi-modal image fusion (MMIF) maps useful information from various modalities into the same representation space, thereby producing an informative fused image. However, existing fusion algorithms tend to fuse the multi-modal images symmetrically, causing the loss of shallow information or bias towards a single modality in certain regions of the fusion results. In this study, we analyzed the spatial distribution differences of information in different modalities and showed that encoding features within the same network is not conducive to achieving simultaneous deep feature space alignment for multi-modal images. To overcome this issue, a Multi-Modal Asymmetric UNet (MMA-UNet) was proposed. We separately trained specialized feature encoders for different modalities and implemented a cross-scale fusion strategy to maintain the features from different modalities within the same representation space, ensuring a balanced information fusion process. Furthermore, extensive fusion and downstream task experiments were conducted to demonstrate the efficiency of MMA-UNet in fusing infrared and visible image information, producing visually natural and semantically rich fusion results. Its performance surpasses that of the state-of-the-art comparison fusion methods.
Submitted 11 July, 2024; v1 submitted 26 April, 2024;
originally announced April 2024.
-
Laser excitation of the $^{229}$Th nuclear isomeric transition in a solid-state host
Authors:
R. Elwell,
Christian Schneider,
Justin Jeet,
J. E. S. Terhune,
H. W. T. Morgan,
A. N. Alexandrova,
H. B. Tran Tan,
Andrei Derevianko,
Eric R. Hudson
Abstract:
LiSrAlF$_6$ crystals doped with $^{229}$Th are used in a laser-based search for the nuclear isomeric transition. Two spectroscopic features near the nuclear transition energy are observed. The first is a broad excitation feature that produces red-shifted fluorescence that decays with a timescale of a few seconds. The second is a narrow, laser-linewidth-limited spectral feature at $148.38219(4)_{\textrm{stat}}(20)_{\textrm{sys}}$ nm ($2020407.3(5)_{\textrm{stat}}(30)_{\textrm{sys}}$ GHz) that decays with a lifetime of $568(13)_{\textrm{stat}}(20)_{\textrm{sys}}$ s. This feature is assigned to the excitation of the $^{229}$Th nuclear isomeric state, whose energy is found to be $8.355733(2)_{\textrm{stat}}(10)_{\textrm{sys}}$ eV in $^{229}$Th:LiSrAlF$_6$.
Submitted 18 April, 2024;
originally announced April 2024.
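The abstract above quotes the same spectral feature as a wavelength (nm), a frequency (GHz), and an energy (eV). The short sketch below checks that the three central values are mutually consistent using ν = c/λ and E = hν. It assumes the quoted wavelength is a vacuum wavelength and uses the exact SI values of c, h, and e; the function names are illustrative, and the uncertainties are not propagated.

```python
# Exact SI defining constants (2019 redefinition).
SPEED_OF_LIGHT = 299_792_458.0          # m/s
PLANCK_CONSTANT = 6.626_070_15e-34      # J s
ELEMENTARY_CHARGE = 1.602_176_634e-19   # C, i.e. joules per electronvolt

def wavelength_nm_to_ghz(wavelength_nm):
    """Convert a vacuum wavelength in nanometres to a frequency in GHz."""
    wavelength_m = wavelength_nm * 1e-9
    return SPEED_OF_LIGHT / wavelength_m / 1e9

def ghz_to_ev(frequency_ghz):
    """Convert a frequency in GHz to a photon energy in eV via E = h * nu."""
    return PLANCK_CONSTANT * frequency_ghz * 1e9 / ELEMENTARY_CHARGE

# Central value of the wavelength quoted in the abstract.
freq_ghz = wavelength_nm_to_ghz(148.38219)
energy_ev = ghz_to_ev(freq_ghz)
print(f"{freq_ghz:.1f} GHz, {energy_ev:.6f} eV")
```

Within the quoted precision this reproduces the 2020407.3 GHz and 8.355733 eV central values given in the abstract.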
-
FusionMamba: Dynamic Feature Enhancement for Multimodal Image Fusion with Mamba
Authors:
Xinyu Xie,
Yawen Cui,
Chio-In Ieong,
Tao Tan,
Xiaozhi Zhang,
Xubin Zheng,
Zitong Yu
Abstract:
Multi-modal image fusion aims to combine information from different modes to create a single image with comprehensive information and detailed textures. However, fusion models based on convolutional neural networks encounter limitations in capturing global image features due to their focus on local convolution operations. Transformer-based models, while excelling in global feature modeling, confront computational challenges stemming from their quadratic complexity. Recently, the Selective Structured State Space Model has exhibited significant potential for long-range dependency modeling with linear complexity, offering a promising avenue to address the aforementioned dilemma. In this paper, we propose FusionMamba, a novel dynamic feature enhancement method for multimodal image fusion with Mamba. Specifically, we devise an improved efficient Mamba model for image fusion, integrating an efficient visual state space model with dynamic convolution and channel attention. This refined model not only upholds the performance of Mamba and its global modeling capability but also diminishes channel redundancy while strengthening local feature enhancement. Additionally, we devise a dynamic feature fusion module (DFFM) comprising two dynamic feature enhancement modules (DFEM) and a cross modality fusion mamba module (CMFM). The former serves for dynamic texture enhancement and dynamic difference perception, whereas the latter enhances correlation features between modes and suppresses redundant intermodal information. FusionMamba has yielded state-of-the-art (SOTA) performance across various multimodal medical image fusion tasks (CT-MRI, PET-MRI, SPECT-MRI), the infrared and visible image fusion task (IR-VIS), and a multimodal biomedical image fusion dataset (GFP-PC), which demonstrates that our model generalizes well. The code for FusionMamba is available at https://github.com/millieXie/FusionMamba.
Submitted 20 April, 2024; v1 submitted 15 April, 2024;
originally announced April 2024.
-
Stable Acceleration of a LHe-Free Nb3Sn demo SRF e-linac Based on Conduction Cooling
Authors:
Ziqin Yang,
Yuan He,
Tiancai Jiang,
Feng Bai,
Fengfeng Wang,
Weilong Chen,
Guangze Jiang,
Yimeng Chu,
Hangxu Li,
Bo Zhao,
Guozhen Sun,
Zongheng Xue,
Yugang Zhao,
Zheng Gao,
Yaguang Li,
Pingran Xiong,
Hao Guo,
Liepeng Sun,
Guirong Huang,
Zhijun Wang,
Junhui Zhang,
Teng Tan,
Hongwei Zhao,
Wenlong Zhan
Abstract:
The design, construction, and commissioning of a conduction-cooled Nb3Sn demonstration superconducting radio frequency (SRF) electron accelerator at the Institute of Modern Physics of the Chinese Academy of Sciences (IMP, CAS) will be presented. In the context of engineering application planning for Nb3Sn thin-film SRF cavities within the CiADS project, a 650 MHz 5-cell elliptical cavity was coated using the vapor diffusion method for electron beam acceleration. Through high-precision collaborative control of 10 GM cryocoolers, a slow cooldown of the cavity across 18 K is achieved, accompanied by clearly characteristic magnetic flux expulsion. The horizontal test results of the liquid helium-free (LHe-free) cryomodule show that the cavity can operate steadily at Epk = 6.02 MV/m in continuous wave (CW) mode, and at Epk = 14.90 MV/m in 40% duty cycle pulse mode. The beam acceleration experiment indicates that the maximum average current of the electron beam in the macropulse after acceleration exceeds 200 mA, with a maximum energy gain of 4.6 MeV. The results provide a proof-of-principle validation for the engineering application of Nb3Sn thin-film SRF cavities, highlighting the promising industrial application prospects of a small-scale compact Nb3Sn SRF accelerator driven by commercial cryocoolers.
Submitted 14 April, 2024;
originally announced April 2024.
-
Poincaré disk as a model of squeezed states of a harmonic oscillator
Authors:
Ian Chi,
Martin Fraas,
Tina Tan
Abstract:
Single-mode squeezed states exhibit a direct correspondence with points on the Poincaré disk. In this study, we delve into this correspondence and describe the motions of the disk generated by a quadratic Hamiltonian. This provides a geometric representation of squeezed states and their evolution. We discuss applications in bang-bang and adiabatic control problems involving squeezed states.
Submitted 11 April, 2024;
originally announced April 2024.
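The abstract above identifies single-mode squeezed states with points of the Poincaré disk. A minimal sketch of one common parametrization follows: a squeezing parameter ξ = r e^{iφ} is sent to the disk point ζ = tanh(r) e^{iφ}. This convention, including its sign and phase choices, is an assumption for illustration and may differ from the one used in the paper; the function names are hypothetical.

```python
import cmath
import math

def squeeze_to_disk(r, phi):
    """Map squeezing magnitude r >= 0 and angle phi to a point of the unit disk.

    Assumed convention: zeta = tanh(r) * exp(i * phi), so the vacuum (r = 0)
    sits at the origin and infinite squeezing approaches the boundary circle.
    """
    return math.tanh(r) * cmath.exp(1j * phi)

def hyperbolic_distance_from_origin(zeta):
    """Hyperbolic distance from 0 to zeta in the curvature -1 Poincaré metric."""
    return 2.0 * math.atanh(abs(zeta))

zeta = squeeze_to_disk(0.8, math.pi / 3)
distance = hyperbolic_distance_from_origin(zeta)
print(zeta, distance)
```

Under this convention the hyperbolic distance from the vacuum equals 2r, so equal increments of squeezing correspond to equal hyperbolic steps even though the Euclidean steps shrink towards the boundary.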
-
Validation of the DESI 2024 Ly$α$ forest BAO analysis using synthetic datasets
Authors:
Andrei Cuceu,
Hiram K. Herrera-Alcantar,
Calum Gordon,
Paul Martini,
Julien Guy,
Andreu Font-Ribera,
Alma X. Gonzalez-Morales,
M. Abdul Karim,
J. Aguilar,
S. Ahlen,
E. Armengaud,
A. Bault,
D. Brooks,
T. Claybaugh,
A. de la Macorra,
P. Doel,
K. Fanning,
S. Ferraro,
J. E. Forero-Romero,
E. Gaztañaga,
S. Gontcho A Gontcho,
G. Gutierrez,
K. Honscheid,
C. Howlett,
N. G. Karaçaylı
, et al. (34 additional authors not shown)
Abstract:
The first year of data from the Dark Energy Spectroscopic Instrument (DESI) contains the largest set of Lyman-$α$ (Ly$α$) forest spectra ever observed. This data, collected in the DESI Data Release 1 (DR1) sample, has been used to measure the Baryon Acoustic Oscillation (BAO) feature at redshift $z=2.33$. In this work, we use a set of 150 synthetic realizations of DESI DR1 to validate the DESI 2024 Ly$α$ forest BAO measurement. The synthetic data sets are based on Gaussian random fields using the log-normal approximation. We produce realistic synthetic DESI spectra that include all major contaminants affecting the Ly$α$ forest. The synthetic data sets span a redshift range $1.8<z<3.8$, and are analysed using the same framework and pipeline used for the DESI 2024 Ly$α$ forest BAO measurement. To measure BAO, we use both the Ly$α$ auto-correlation and its cross-correlation with quasar positions. We use the mean of correlation functions from the set of DESI DR1 realizations to show that our model is able to recover unbiased measurements of the BAO position. We also fit each mock individually and study the population of BAO fits in order to validate BAO uncertainties and test our method for estimating the covariance matrix of the Ly$α$ forest correlation functions. Finally, we discuss the implications of our results and identify the needs for the next generation of Ly$α$ forest synthetic data sets, with the top priority being to simulate the effect of BAO broadening due to non-linear evolution.
Submitted 5 May, 2024; v1 submitted 3 April, 2024;
originally announced April 2024.
-
Characterization of contaminants in the Lyman-alpha forest auto-correlation with DESI
Authors:
J. Guy,
S. Gontcho A Gontcho,
E. Armengaud,
A. Brodzeller,
A. Cuceu,
A. Font-Ribera,
H. K. Herrera-Alcantar,
N. G. Karaçaylı,
A. Muñoz-Gutiérrez,
M. Pieri,
I. Pérez-Ràfols,
C. Ramírez-Pérez,
C. Ravoux,
J. Rich,
M. Walther,
M. Abdul Karim,
J. Aguilar,
S. Ahlen,
A. Bault,
D. Brooks,
T. Claybaugh,
R. de la Cruz,
A. de la Macorra,
P. Doel,
K. Fanning
, et al. (39 additional authors not shown)
Abstract:
Baryon Acoustic Oscillations can be measured with sub-percent precision above redshift two with the Lyman-alpha forest auto-correlation and its cross-correlation with quasar positions. This is one of the key goals of the Dark Energy Spectroscopic Instrument (DESI), which started its main survey in May 2021. In this paper we present a study of the contaminants to the Lyman-alpha forest, which are mainly caused by correlated signals introduced by the spectroscopic data processing pipeline as well as astrophysical contaminants due to foreground absorption in the intergalactic medium. Notably, an excess signal caused by the sky background subtraction noise is present in the Lyman-alpha auto-correlation in the first line-of-sight separation bin. We use synthetic data to isolate this contribution, characterize the effect of spectro-photometric calibration noise, and propose a simple model to account for both effects in the analysis of the Lyman-alpha forest. We then measure the auto-correlation of the quasar flux transmission fraction of low redshift quasars, where there is no Lyman-alpha forest absorption but only its contaminants. We demonstrate that we can interpret the data with a two-component model: data processing noise and triply ionized Silicon and Carbon auto-correlations. This result can be used to improve the modeling of the Lyman-alpha auto-correlation function measured with DESI.
Submitted 9 April, 2024; v1 submitted 3 April, 2024;
originally announced April 2024.
-
DESI 2024 VI: Cosmological Constraints from the Measurements of Baryon Acoustic Oscillations
Authors:
DESI Collaboration,
A. G. Adame,
J. Aguilar,
S. Ahlen,
S. Alam,
D. M. Alexander,
M. Alvarez,
O. Alves,
A. Anand,
U. Andrade,
E. Armengaud,
S. Avila,
A. Aviles,
H. Awan,
B. Bahr-Kalus,
S. Bailey,
C. Baltay,
A. Bault,
J. Behera,
S. BenZvi,
A. Bera,
F. Beutler,
D. Bianchi,
C. Blake,
R. Blum
, et al. (178 additional authors not shown)
Abstract:
We present cosmological results from the measurement of baryon acoustic oscillations (BAO) in galaxy, quasar and Lyman-$α$ forest tracers from the first year of observations from the Dark Energy Spectroscopic Instrument (DESI), to be released in the DESI Data Release 1. DESI BAO provide robust measurements of the transverse comoving distance and Hubble rate, or their combination, relative to the sound horizon, in seven redshift bins from over 6 million extragalactic objects in the redshift range $0.1<z<4.2$. DESI BAO data alone are consistent with the standard flat $Λ$CDM cosmological model with a matter density $Ω_\mathrm{m}=0.295\pm 0.015$. Paired with a BBN prior and the robustly measured acoustic angular scale from the CMB, DESI requires $H_0=(68.52\pm0.62)$ km/s/Mpc. In conjunction with CMB anisotropies from Planck and CMB lensing data from Planck and ACT, we find $Ω_\mathrm{m}=0.307\pm 0.005$ and $H_0=(67.97\pm0.38)$ km/s/Mpc. Extending the baseline model with a constant dark energy equation of state parameter $w$, DESI BAO alone require $w=-0.99^{+0.15}_{-0.13}$. In models with a time-varying dark energy equation of state parametrized by $w_0$ and $w_a$, combinations of DESI with CMB or with SN~Ia individually prefer $w_0>-1$ and $w_a<0$. This preference is 2.6$σ$ for the DESI+CMB combination, and persists or grows when SN~Ia are added in, giving results discrepant with the $Λ$CDM model at the $2.5σ$, $3.5σ$ or $3.9σ$ levels for the addition of Pantheon+, Union3, or DES-SN5YR datasets respectively. For the flat $Λ$CDM model with the sum of neutrino mass $\sum m_ν$ free, combining the DESI and CMB data yields an upper limit $\sum m_ν< 0.072$ $(0.113)$ eV at 95% confidence for a $\sum m_ν>0$ $(\sum m_ν>0.059)$ eV prior. These neutrino-mass constraints are substantially relaxed in models beyond $Λ$CDM. [Abridged.]
Submitted 24 April, 2024; v1 submitted 3 April, 2024;
originally announced April 2024.
-
DESI 2024 IV: Baryon Acoustic Oscillations from the Lyman Alpha Forest
Authors:
DESI Collaboration,
A. G. Adame,
J. Aguilar,
S. Ahlen,
S. Alam,
D. M. Alexander,
M. Alvarez,
O. Alves,
A. Anand,
U. Andrade,
E. Armengaud,
S. Avila,
A. Aviles,
H. Awan,
S. Bailey,
C. Baltay,
A. Bault,
J. Bautista,
J. Behera,
S. BenZvi,
F. Beutler,
D. Bianchi,
C. Blake,
R. Blum,
S. Brieden
, et al. (174 additional authors not shown)
Abstract:
We present the measurement of Baryon Acoustic Oscillations (BAO) from the Lyman-$α$ (Ly$α$) forest of high-redshift quasars with the first-year dataset of the Dark Energy Spectroscopic Instrument (DESI). Our analysis uses over $420\,000$ Ly$α$ forest spectra and their correlation with the spatial distribution of more than $700\,000$ quasars. An essential facet of this work is the development of a new analysis methodology on a blinded dataset. We conducted rigorous tests using synthetic data to ensure the reliability of our methodology and findings before unblinding. Additionally, we conducted multiple data splits to assess the consistency of the results and scrutinized various analysis approaches to confirm their robustness. For a given value of the sound horizon ($r_d$), we measure the expansion at $z_{\rm eff}=2.33$ with 2\% precision, $H(z_{\rm eff}) = (239.2 \pm 4.8) (147.09~{\rm Mpc} /r_d)$ km/s/Mpc. Similarly, we present a 2.4\% measurement of the transverse comoving distance to the same redshift, $D_M(z_{\rm eff}) = (5.84 \pm 0.14) (r_d/147.09~{\rm Mpc})$ Gpc. Together with other DESI BAO measurements at lower redshifts, these results are used in a companion paper to constrain cosmological parameters.
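The quoted precisions follow directly from the central values and uncertainties in the abstract. A minimal arithmetic sketch (the function name `fractional_precision` is illustrative and not taken from the paper):

```python
def fractional_precision(value, sigma):
    """Return the relative uncertainty sigma / value."""
    return sigma / value

# Central values and 1-sigma uncertainties as quoted in the abstract.
h_precision = fractional_precision(239.2, 4.8)    # H(z_eff) in km/s/Mpc
dm_precision = fractional_precision(5.84, 0.14)   # D_M(z_eff) in Gpc

print(f"H(z_eff):   {100 * h_precision:.1f}%")
print(f"D_M(z_eff): {100 * dm_precision:.1f}%")
```

Both ratios round to the 2% and 2.4% figures stated above; the common factor involving $r_d$ cancels in each ratio.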
Submitted 12 April, 2024; v1 submitted 3 April, 2024;
originally announced April 2024.
-
DESI 2024 III: Baryon Acoustic Oscillations from Galaxies and Quasars
Authors:
DESI Collaboration,
A. G. Adame,
J. Aguilar,
S. Ahlen,
S. Alam,
D. M. Alexander,
M. Alvarez,
O. Alves,
A. Anand,
U. Andrade,
E. Armengaud,
S. Avila,
A. Aviles,
H. Awan,
S. Bailey,
C. Baltay,
A. Bault,
J. Behera,
S. BenZvi,
F. Beutler,
D. Bianchi,
C. Blake,
R. Blum,
S. Brieden,
A. Brodzeller
, et al. (171 additional authors not shown)
Abstract:
We present the DESI 2024 galaxy and quasar baryon acoustic oscillations (BAO) measurements using over 5.7 million unique galaxy and quasar redshifts in the range 0.1<z<2.1. Divided by tracer type, we utilize 300,017 galaxies from the magnitude-limited Bright Galaxy Survey with 0.1<z<0.4, 2,138,600 Luminous Red Galaxies with 0.4<z<1.1, 2,432,022 Emission Line Galaxies with 0.8<z<1.6, and 856,652 quasars with 0.8<z<2.1, over a ~7,500 square degree footprint. The analysis was blinded at the catalog-level to avoid confirmation bias. All fiducial choices of the BAO fitting and reconstruction methodology, as well as the size of the systematic errors, were determined on the basis of the tests with mock catalogs and the blinded data catalogs. We present several improvements to the BAO analysis pipeline, including enhancing the BAO fitting and reconstruction methods in a more physically-motivated direction, and also present results using combinations of tracers. We present a re-analysis of SDSS BOSS and eBOSS results applying the improved DESI methodology and find scatter consistent with the level of the quoted SDSS theoretical systematic uncertainties. With the total effective survey volume of ~ 18 Gpc$^3$, the combined precision of the BAO measurements across the six different redshift bins is ~0.52%, marking a 1.2-fold improvement over the previous state-of-the-art results using only first-year data. We detect the BAO in all of these six redshift bins. The highest significance of BAO detection is $9.1σ$ at the effective redshift of 0.93, with a constraint of 0.86% placed on the BAO scale. We find our measurements are systematically larger than the prediction of Planck-2018 LCDM model at z<0.8. We translate the results into transverse comoving distance and radial Hubble distance measurements, which are used to constrain cosmological models in our companion paper [abridged].
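As a quick consistency check, the per-tracer counts quoted in the abstract can be summed and compared with the "over 5.7 million" total. This sketch assumes the four tracer samples do not share objects, which the abstract does not state explicitly; the dictionary keys are illustrative labels.

```python
# Per-tracer redshift counts as quoted in the abstract.
tracer_counts = {
    "Bright Galaxy Survey": 300_017,
    "Luminous Red Galaxies": 2_138_600,
    "Emission Line Galaxies": 2_432_022,
    "Quasars": 856_652,
}

total = sum(tracer_counts.values())
print(f"Total redshifts: {total:,}")
```

Under that assumption the sum comes to 5,727,291, in line with the headline figure.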
Submitted 3 April, 2024;
originally announced April 2024.
-
3MOS: Multi-sources, Multi-resolutions, and Multi-scenes dataset for Optical-SAR image matching
Authors:
Yibin Ye,
Xichao Teng,
Shuo Chen,
Yijie Bian,
Tao Tan,
Zhang Li
Abstract:
Optical-SAR image matching is a fundamental task for image fusion and visual navigation. However, all large-scale open SAR datasets for method development are collected from a single platform, resulting in limited satellite types and spatial resolutions. Since images captured by different sensors vary significantly in both geometric and radiometric appearance, existing methods may fail to match corresponding regions containing the same content. In addition, most existing datasets have not been categorized based on the characteristics of different scenes. To encourage the design of more general multi-modal image matching methods, we introduce a large-scale Multi-sources, Multi-resolutions, and Multi-scenes dataset for Optical-SAR image matching (3MOS). It consists of 155K optical-SAR image pairs, including SAR data from six commercial satellites, with resolutions ranging from 1.25m to 12.5m. The data has been classified into eight scenes: urban, rural, plains, hills, mountains, water, desert, and frozen earth. Extensive experiments show that none of the state-of-the-art methods achieves consistently superior performance across different sources, resolutions and scenes. In addition, the distribution of data has a substantial impact on the matching capability of deep learning models, which poses a domain adaptation challenge in optical-SAR image matching. Our data and code will be available at: https://github.com/3M-OS/3MOS.
Submitted 31 March, 2024;
originally announced April 2024.
-
CAT: Exploiting Inter-Class Dynamics for Domain Adaptive Object Detection
Authors:
Mikhail Kennerley,
Jian-Gang Wang,
Bharadwaj Veeravalli,
Robby T. Tan
Abstract:
Domain adaptive object detection aims to adapt detection models to domains where annotated data is unavailable. Existing methods have been proposed to address the domain gap using the semi-supervised student-teacher framework. However, a fundamental issue arises from the class imbalance in the labelled training set, which can result in inaccurate pseudo-labels. The relationship between classes, especially where one class is a majority and the other minority, has a large impact on class bias. We propose Class-Aware Teacher (CAT) to address the class bias issue in the domain adaptation setting. In our work, we approximate the class relationships with our Inter-Class Relation module (ICRm) and exploit it to reduce the bias within the model. In this way, we are able to apply augmentations to highly related classes, both inter- and intra-domain, to boost the performance of minority classes while having minimal impact on majority classes. We further reduce the bias by implementing a class-relation weight to our classification loss. Experiments conducted on various datasets and ablation studies show that our method is able to address the class bias in the domain adaptation setting. On the Cityscapes to Foggy Cityscapes dataset, we attained a 52.5 mAP, a substantial improvement over the 51.2 mAP achieved by the state-of-the-art method.
Submitted 28 March, 2024;
originally announced March 2024.
-
Local operator quench induced by two-dimensional inhomogeneous and homogeneous CFT Hamiltonians
Authors:
Weibo Mao,
Masahiro Nozaki,
Kotaro Tamaoka,
Mao Tian Tan
Abstract:
We explore non-equilibrium processes in two-dimensional conformal field theories (2d CFTs) due to the growth of operators induced by inhomogeneous and homogeneous Hamiltonians by investigating the time dependence of the partition function, energy density, and entanglement entropy. The non-equilibrium processes considered in this paper are constructed out of the Lorentzian and Euclidean time evolution governed by different Hamiltonians. We explore the effect of the time ordering on entanglement dynamics and find that in a free boson CFT and RCFTs this time ordering does not affect the entanglement entropy, while in the holographic CFTs it does. Our main finding is that in the holographic CFTs, the non-unitary time evolution induced by the inhomogeneous Hamiltonian can retain the initial state information longer than the unitary time evolution.
Submitted 2 April, 2024; v1 submitted 23 March, 2024;
originally announced March 2024.
-
Artifact Feature Purification for Cross-domain Detection of AI-generated Images
Authors:
Zheling Meng,
Bo Peng,
Jing Dong,
Tieniu Tan
Abstract:
In the era of AIGC, the fast development of visual content generation technologies, such as diffusion models, brings potential security risks to our society. Existing generated image detection methods suffer from a performance drop when faced with out-of-domain generators and image scenes. To relieve this problem, we propose Artifact Purification Network (APN) to facilitate the artifact extraction from generated images through explicit and implicit purification processes. For the explicit one, a suspicious frequency-band proposal method and a spatial feature decomposition method are proposed to extract artifact-related features. For the implicit one, a training strategy based on mutual information estimation is proposed to further purify the artifact-related features. Experiments show that for cross-generator detection, the average accuracy of APN is 5.6% ~ 16.4% higher than the previous 10 methods on the GenImage dataset and 1.7% ~ 50.1% higher on the DiffusionForensics dataset. For cross-scene detection, APN maintains its high performance. Via visualization analysis, we find that the proposed method extracts flexible forgery patterns and condenses the forgery information diluted in irrelevant features. We also find that the artifact features APN focuses on across generators and scenes are global and diverse. The code will be available on GitHub.
Submitted 17 March, 2024;
originally announced March 2024.
-
NightHaze: Nighttime Image Dehazing via Self-Prior Learning
Authors:
Beibei Lin,
Yeying Jin,
Wending Yan,
Wei Ye,
Yuan Yuan,
Robby T. Tan
Abstract:
Masked autoencoder (MAE) shows that severe augmentation during training produces robust representations for high-level tasks. This paper brings the MAE-like framework to nighttime image enhancement, demonstrating that severe augmentation during training produces strong network priors that are resilient to real-world night haze degradations. We propose a novel nighttime image dehazing method with self-prior learning. Our main novelty lies in the design of severe augmentation, which allows our model to learn robust priors. Unlike MAE that uses masking, we leverage two key challenging factors of nighttime images as augmentation: light effects and noise. During training, we intentionally degrade clear images by blending them with light effects as well as by adding noise, and subsequently restore the clear images. This enables our model to learn clear background priors. By increasing the noise values to approach as high as the pixel intensity values of the glow and light effect blended images, our augmentation becomes severe, resulting in stronger priors. While our self-prior learning is considerably effective in suppressing glow and revealing details of background scenes, in some cases, there are still some undesired artifacts that remain, particularly in the forms of over-suppression. To address these artifacts, we propose a self-refinement module based on the semi-supervised teacher-student framework. Our NightHaze, especially our MAE-like self-prior learning, shows that models trained with severe augmentation effectively improve the visibility of input haze images, approaching the clarity of clear nighttime images. Extensive experiments demonstrate that our NightHaze achieves state-of-the-art performance, outperforming existing nighttime image dehazing methods by a substantial margin of 15.5% for MUSIQ and 23.5% for ClipIQA.
Submitted 12 March, 2024;
originally announced March 2024.
-
VLKEB: A Large Vision-Language Model Knowledge Editing Benchmark
Authors:
Han Huang,
Haitian Zhong,
Tao Yu,
Qiang Liu,
Shu Wu,
Liang Wang,
Tieniu Tan
Abstract:
Recently, knowledge editing on large language models (LLMs) has received considerable attention. Compared to this, editing Large Vision-Language Models (LVLMs) faces extra challenges from diverse data modalities and complicated model components, and data for LVLM editing are limited. The existing LVLM editing benchmark, which comprises three metrics (Reliability, Locality, and Generality), falls short in the quality of synthesized evaluation images and cannot assess whether models apply edited knowledge in relevant content. Therefore, we employ more reliable data collection methods to construct a new Large $\textbf{V}$ision-$\textbf{L}$anguage Model $\textbf{K}$nowledge $\textbf{E}$diting $\textbf{B}$enchmark, $\textbf{VLKEB}$, and extend the Portability metric for more comprehensive evaluation. Leveraging a multi-modal knowledge graph, our image data are bound with knowledge entities. This can be further used to extract entity-related knowledge, which constitutes the base of editing data. We conduct experiments with different editing methods on five LVLMs, and thoroughly analyze how they impact the models. The results reveal strengths and deficiencies of these methods and hopefully provide insights for future research. The code and dataset are available at: $\href{https://github.com/VLKEB/VLKEB}{\text{https://github.com/VLKEB/VLKEB}}$.
Submitted 13 June, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
-
Debiasing Multimodal Large Language Models
Authors:
Yi-Fan Zhang,
Weichen Yu,
Qingsong Wen,
Xue Wang,
Zhang Zhang,
Liang Wang,
Rong Jin,
Tieniu Tan
Abstract:
In the realms of computer vision and natural language processing, Large Vision-Language Models (LVLMs) have become indispensable tools, proficient in generating textual descriptions based on visual inputs. Despite their advancements, our investigation reveals a noteworthy bias in the generated content, where the output is primarily influenced by the prior of the underlying Large Language Models (LLMs) rather than the input image. Our empirical experiments underscore the persistence of this bias, as LVLMs often provide confident answers even in the absence of relevant images or given incongruent visual input. To rectify these biases and redirect the model's focus toward vision information, we introduce two simple, training-free strategies. Firstly, for tasks such as classification or multi-choice question-answering (QA), we propose a ``calibration'' step through affine transformation to adjust the output distribution. This ``Post-Hoc debias'' approach ensures uniform scores for each answer when the image is absent, serving as an effective regularization technique to alleviate the influence of LLM priors. For more intricate open-ended generation tasks, we extend this method to ``Debias sampling'', drawing inspiration from contrastive decoding methods. Furthermore, our investigation sheds light on the instability of LVLMs across various decoding configurations. Through systematic exploration of different settings, we significantly enhance performance, surpassing reported results and raising concerns about the fairness of existing evaluations. Comprehensive experiments substantiate the effectiveness of our proposed strategies in mitigating biases. These strategies not only prove beneficial in minimizing hallucinations but also contribute to the generation of more helpful and precise illustrations.
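One way to picture the calibration step described above is a diagonal affine map that divides each answer's probability by the probability the model assigns when no image is given, then renormalises. The sketch below is an illustrative assumption in the spirit of contextual calibration, not the authors' implementation; the function name `calibrate` and the example probabilities are hypothetical.

```python
def calibrate(p_with_image, p_without_image):
    """Rescale answer probabilities so the image-free prior maps to uniform.

    Assumed form: W = diag(1 / p_without_image), b = 0, followed by
    renormalisation. This is a sketch, not the paper's exact procedure.
    """
    scaled = [p / q for p, q in zip(p_with_image, p_without_image)]
    total = sum(scaled)
    return [s / total for s in scaled]

# Hypothetical scores for a three-way multiple-choice question.
prior = [0.7, 0.2, 0.1]      # model output with the image absent
observed = [0.6, 0.3, 0.1]   # model output with the image present

print(calibrate(prior, prior))      # the prior itself maps to uniform scores
print(calibrate(observed, prior))   # the answer favoured beyond the prior stands out
```

In this toy case the raw output prefers the first answer, while the calibrated output prefers the second, because the second answer gained the most relative to the image-free prior.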
Submitted 27 March, 2024; v1 submitted 8 March, 2024;
originally announced March 2024.
-
Parent Berry curvature and the ideal anomalous Hall crystal
Authors:
Tixuan Tan,
Trithep Devakul
Abstract:
We study a model of electrons moving in a parent band of uniform Berry curvature. At sufficiently high parent Berry curvature, we show that strong repulsive interactions generically lead to the formation of an anomalous Hall crystal: a topological state with spontaneously broken continuous translation symmetry. Our results are established via a mapping to a problem of Wigner crystallization in a regular 2D electron gas. Interestingly, we find that a periodic electrostatic potential induces a competing state with opposite Chern number. Our theory offers a unified perspective for understanding several aspects of the recently observed integer and fractional quantum anomalous Hall effects in rhombohedral multilayer graphene and provides a recipe for engineering new topological states.
Submitted 8 July, 2024; v1 submitted 6 March, 2024;
originally announced March 2024.
-
The Ramsey numbers for trees of order $n$ with maximum degree at least $n-5$ versus the wheel graph of order nine
Authors:
Zhi Yee Chng,
Thomas Britz,
Ta Sheng Tan,
Kok Bin Wong
Abstract:
The Ramsey numbers $R(T_n,W_8)$ are determined for each tree graph $T_n$ of order $n\geq 7$ and maximum degree $Δ(T_n)$ equal to either $n-4$ or $n-5$. These numbers indicate strong support for the conjecture, due to Chen, Zhang and Zhang and to Hafidh and Baskoro, that $R(T_n,W_m) = 2n-1$ for each tree graph $T_n$ of order $n\geq m-1$ with $Δ(T_n)\leq n-m+2$ when $m\geq 4$ is even.
Submitted 4 March, 2024;
originally announced March 2024.
-
Impact of Systematic Redshift Errors on the Cross-correlation of the Lyman-$α$ Forest with Quasars at Small Scales Using DESI Early Data
Authors:
Abby Bault,
David Kirkby,
Julien Guy,
Allyson Brodzeller,
J. Aguilar,
S. Ahlen,
S. Bailey,
D. Brooks,
L. Cabayol-Garcia,
J. Chaves-Montero,
T. Claybaugh,
A. Cuceu,
K. Dawson,
R. de la Cruz,
A. de la Macorra,
A. Dey,
P. Doel,
S. Filbert,
A. Font-Ribera,
J. E. Forero-Romero,
E. Gaztañaga,
S. Gontcho A Gontcho,
C. Gordon,
H. K. Herrera-Alcantar,
K. Honscheid
, et al. (37 additional authors not shown)
Abstract:
The Dark Energy Spectroscopic Instrument (DESI) will measure millions of quasar spectra by the end of its 5-year survey. Quasar redshift errors impact the shape of the Lyman-$α$ forest correlation functions, which can affect cosmological analyses and therefore cosmological interpretations. Using data from the DESI Early Data Release and the first two months of the main survey, we measure the systematic redshift error from an offset in the cross-correlation of the Lyman-$α$ forest with quasars. We find evidence for a redshift-dependent bias causing redshifts to be underestimated with increasing redshift, stemming from improper modeling of the Lyman-$α$ optical depth in the templates used for redshift estimation. New templates were derived for the DESI Year 1 quasar sample at $z > 1.6$ and we found the redshift-dependent bias, $Δr_\parallel$, increased from $-1.94 \pm 0.15$ $h^{-1}$ Mpc to $-0.08 \pm 0.04$ $h^{-1}$ Mpc ($-205 \pm 15~\text{km s}^{-1}$ to $-9.0 \pm 4.0~\text{km s}^{-1}$). These new templates will be used to provide redshifts for the DESI Year 1 quasar sample.
Submitted 12 April, 2024; v1 submitted 27 February, 2024;
originally announced February 2024.
-
Logical Closed Loop: Uncovering Object Hallucinations in Large Vision-Language Models
Authors:
Junfei Wu,
Qiang Liu,
Ding Wang,
Jinghao Zhang,
Shu Wu,
Liang Wang,
Tieniu Tan
Abstract:
Object hallucination has been an Achilles' heel which hinders the broader applications of large vision-language models (LVLMs). Object hallucination refers to the phenomenon that the LVLMs claim non-existent objects in the image. To mitigate object hallucinations, instruction tuning and external model-based detection methods have been proposed, which either require large-scale computational resources or depend on the detection result of external models. However, utilizing the LVLM itself to alleviate object hallucinations remains under-explored. In this work, we adopt the intuition that the LVLM tends to respond logically consistently for existent objects but inconsistently for hallucinated objects. Therefore, we propose a Logical Closed Loop-based framework for Object Hallucination Detection and Mitigation, namely LogicCheckGPT. Specifically, we devise logical consistency probing to raise questions with logical correlations, inquiring about attributes from objects and vice versa. Whether their responses can form a logical closed loop serves as an indicator of object hallucination. As a plug-and-play method, it can be seamlessly applied to all existing LVLMs. Comprehensive experiments conducted on three benchmarks across four LVLMs have demonstrated significant improvements brought by our method, indicating its effectiveness and generality.
Submitted 28 June, 2024; v1 submitted 18 February, 2024;
originally announced February 2024.
-
Personalised Drug Identifier for Cancer Treatment with Transformers using Auxiliary Information
Authors:
Aishwarya Jayagopal,
Hansheng Xue,
Ziyang He,
Robert J. Walsh,
Krishna Kumar Hariprasannan,
David Shao Peng Tan,
Tuan Zea Tan,
Jason J. Pitt,
Anand D. Jeyasekharan,
Vaibhav Rajan
Abstract:
Cancer remains a global challenge due to its growing clinical and economic burden. Its uniquely personal manifestation, which makes treatment difficult, has fuelled the quest for personalized treatment strategies. Thus, genomic profiling is increasingly becoming part of clinical diagnostic panels. Effective use of such panels requires accurate drug response prediction (DRP) models, which are challenging to build due to limited labelled patient data. Previous methods to address this problem have used various forms of transfer learning. However, they do not explicitly model the variable length sequential structure of the list of mutations in such diagnostic panels. Further, they do not utilize auxiliary information (like patient survival) for model training. We address these limitations through a novel transformer based method, which surpasses the performance of state-of-the-art DRP models on benchmark data. We also present the design of a treatment recommendation system (TRS), which is currently deployed at the National University Hospital, Singapore and is being evaluated in a clinical trial.
Submitted 16 February, 2024;
originally announced February 2024.
-
Fine-tuning Large Language Model (LLM) Artificial Intelligence Chatbots in Ophthalmology and LLM-based evaluation using GPT-4
Authors:
Ting Fang Tan,
Kabilan Elangovan,
Liyuan Jin,
Yao Jie,
Li Yong,
Joshua Lim,
Stanley Poh,
Wei Yan Ng,
Daniel Lim,
Yuhe Ke,
Nan Liu,
Daniel Shu Wei Ting
Abstract:
Purpose: To assess the alignment of GPT-4-based evaluation to human clinician experts, for the evaluation of responses to ophthalmology-related patient queries generated by fine-tuned LLM chatbots. Methods: 400 ophthalmology questions and paired answers were created by ophthalmologists to represent commonly asked patient questions, divided into fine-tuning (368; 92%), and testing (40; 8%). We fine-tuned 5 different LLMs, including LLAMA2-7b, LLAMA2-7b-Chat, LLAMA2-13b, and LLAMA2-13b-Chat. For the testing dataset, an additional 8 glaucoma QnA pairs were included. 200 responses to the testing dataset were generated by 5 fine-tuned LLMs for evaluation. A customized clinical evaluation rubric was used to guide GPT-4 evaluation, grounded on clinical accuracy, relevance, patient safety, and ease of understanding. GPT-4 evaluation was then compared against ranking by 5 clinicians for clinical alignment. Results: Among all fine-tuned LLMs, GPT-3.5 scored the highest (87.1%), followed by LLAMA2-13b (80.9%), LLAMA2-13b-Chat (75.5%), LLAMA2-7b-Chat (70%) and LLAMA2-7b (68.8%) based on the GPT-4 evaluation. GPT-4 evaluation demonstrated significant agreement with human clinician rankings, with Spearman and Kendall Tau correlation coefficients of 0.90 and 0.80 respectively, while correlation based on Cohen Kappa was more modest at 0.50. Notably, qualitative analysis and the glaucoma sub-analysis revealed clinical inaccuracies in the LLM-generated responses, which were appropriately identified by the GPT-4 evaluation. Conclusion: The notable clinical alignment of GPT-4 evaluation highlighted its potential to streamline the clinical evaluation of LLM chatbot responses to healthcare-related queries. By complementing the existing clinician-dependent manual grading, this efficient and automated evaluation could assist the validation of future developments in LLM applications for healthcare.
Submitted 15 February, 2024;
originally announced February 2024.
-
A Schmidt's subspace theorem for moving hyperplane targets over function fields
Authors:
Le Giang,
Tran Van Tan,
Nguyen Van Thin
Abstract:
In this paper, we establish a Schmidt's subspace theorem for moving hyperplane targets in projective spaces over function fields.
Submitted 14 February, 2024;
originally announced February 2024.