-
Strong-coupling superconductivity and weak vortex pinning in Ta-doped CsV$_{3}$Sb$_{5}$ single crystals
Authors:
Jinyulin Li,
Wei Xie,
Jinjin Liu,
Qing Li,
Xiang Li,
Huan Yang,
Zhiwei Wang,
Yugui Yao,
Hai-Hu Wen
Abstract:
By measuring the magnetization of pristine and Ta-doped CsV$_{3}$Sb$_{5}$ single crystals, we have carried out systematic studies of the lower critical field, critical current density, and equilibrium magnetization of this kagome system. The lower critical field has been investigated in the two typical samples, and the temperature-dependent lower critical field obtained in the Ta-doped sample can be fitted using a model with two $s$-wave superconducting gaps, yielding a larger gap of $2Δ_{s1}/k_\mathrm{B}T_\mathrm{c}=7.9\;(\pm1.8)$. This indicates a strong-coupling feature of the V-based superconductors. The measured magnetization hysteresis loops allow us to calculate the critical current density, which shows very weak bulk vortex pinning. The magnetization hysteresis loops measured in these two kinds of samples can be well described by a recently proposed generalized phenomenological model, which leads to the determination of many fundamental parameters for these superconductors. Our systematic results and detailed analysis show that this V-based kagome system has features of strong-coupling superconductivity, a relatively large Ginzburg-Landau parameter, and weak vortex coupling.
Submitted 17 April, 2024;
originally announced April 2024.
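As a rough point of reference for the gap ratio quoted in the abstract above, the sketch below compares it with the weak-coupling BCS value $2Δ/k_\mathrm{B}T_\mathrm{c} = 2π/e^{γ} \approx 3.53$. The BCS value is a textbook result and is not taken from the abstract; the variable names are illustrative only.

```python
import math

# Weak-coupling BCS ratio 2*Delta/(k_B*T_c) = 2*pi/exp(gamma),
# where gamma is the Euler-Mascheroni constant (textbook value, not from the abstract).
EULER_GAMMA = 0.5772156649015329
bcs_ratio = 2 * math.pi / math.exp(EULER_GAMMA)

# Larger-gap ratio and its uncertainty, as quoted in the abstract.
measured_ratio = 7.9
measured_err = 1.8

lower_bound = measured_ratio - measured_err
excess_in_sigma = (measured_ratio - bcs_ratio) / measured_err

print(f"BCS weak-coupling ratio: {bcs_ratio:.3f}")
print(f"Quoted ratio: {measured_ratio} +/- {measured_err}")
print(f"Lower bound exceeds BCS value: {lower_bound > bcs_ratio}")
print(f"Excess over BCS in units of the quoted uncertainty: {excess_in_sigma:.2f}")
```

Even the lower end of the quoted range sits above the weak-coupling value, which is consistent with the strong-coupling reading given in the abstract.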
-
Generative AI for Advanced UAV Networking
Authors:
Geng Sun,
Wenwen Xie,
Dusit Niyato,
Hongyang Du,
Jiawen Kang,
Jing Wu,
Sumei Sun,
Ping Zhang
Abstract:
With the impressive achievements of ChatGPT and Sora, generative artificial intelligence (GAI) has received increasing attention. Not limited to the field of content generation, GAI is also widely used to solve problems in wireless communication scenarios due to its powerful learning and generalization capabilities. Therefore, in this article we discuss key applications of GAI in improving unmanned aerial vehicle (UAV) communication and networking performance. Specifically, we first review the key technologies of GAI and the important roles of UAV networking. Then, we show how GAI can improve the communication, networking, and security performance of UAV systems. Subsequently, we propose a novel framework of GAI for advanced UAV networking, and then present a case study of UAV-enabled spectrum map estimation and transmission rate optimization based on the proposed framework to verify the effectiveness of GAI-enabled UAV systems. Finally, we discuss some important open directions.
Submitted 16 April, 2024;
originally announced April 2024.
-
Knowledge-enhanced Visual-Language Pretraining for Computational Pathology
Authors:
Xiao Zhou,
Xiaoman Zhang,
Chaoyi Wu,
Ya Zhang,
Weidi Xie,
Yanfeng Wang
Abstract:
In this paper, we consider the problem of visual representation learning for computational pathology by exploiting large-scale image-text pairs gathered from public resources, along with domain-specific knowledge in pathology. Specifically, we make the following contributions: (i) We curate a pathology knowledge tree that consists of 50,470 informative attributes for 4,718 diseases requiring pathology diagnosis from 32 human tissues. To our knowledge, this is the first comprehensive structured pathology knowledge base; (ii) We develop a knowledge-enhanced visual-language pretraining approach, where we first project pathology-specific knowledge into a latent embedding space via a language model, and use it to guide the visual representation learning; (iii) We conduct thorough experiments to validate the effectiveness of our proposed components, demonstrating significant performance improvement on various downstream tasks, including cross-modal retrieval, zero-shot classification on pathology patches, and zero-shot tumor subtyping on whole slide images (WSIs). All code, models, and the pathology knowledge tree will be released to the research community.
Submitted 15 April, 2024;
originally announced April 2024.
-
Diffusion Models Meet Remote Sensing: Principles, Methods, and Perspectives
Authors:
Yidan Liu,
Jun Yue,
Shaobo Xia,
Pedram Ghamisi,
Weiying Xie,
Leyuan Fang
Abstract:
As a newly emerging advance in deep generative models, diffusion models have achieved state-of-the-art results in many fields, including computer vision, natural language processing, and molecule design. The remote sensing community has also noticed the powerful ability of diffusion models and quickly applied them to a variety of tasks for image processing. Given the rapid increase in research on diffusion models in the field of remote sensing, it is necessary to conduct a comprehensive review of existing diffusion model-based remote sensing papers, to help researchers recognize the potential of diffusion models and provide some directions for further exploration. Specifically, this paper first introduces the theoretical background of diffusion models, and then systematically reviews the applications of diffusion models in remote sensing, including image generation, enhancement, and interpretation. Finally, the limitations of existing remote sensing diffusion models and worthy research directions for further exploration are discussed and summarized.
Submitted 17 April, 2024; v1 submitted 13 April, 2024;
originally announced April 2024.
-
Multi-scale Dynamic and Hierarchical Relationship Modeling for Facial Action Units Recognition
Authors:
Zihan Wang,
Siyang Song,
Cheng Luo,
Songhe Deng,
Weicheng Xie,
Linlin Shen
Abstract:
Human facial action units (AUs) are mutually related in a hierarchical manner: they are not only associated with each other in both spatial and temporal domains, but AUs located in the same or nearby facial regions also show stronger relationships than those in different facial regions. While no existing approach thoroughly models such hierarchical inter-dependencies among AUs, this paper proposes to comprehensively model multi-scale AU-related dynamic and hierarchical spatio-temporal relationships among AUs for recognizing their occurrence. Specifically, we first propose a novel multi-scale temporal differencing network with an adaptive weighting block to explicitly capture facial dynamics across frames at different spatial scales, which specifically considers the heterogeneity of range and magnitude in different AUs' activation. Then, a two-stage strategy is introduced to hierarchically model the relationship among AUs based on their spatial distribution (i.e., local and cross-region AU relationship modelling). Experimental results achieved on BP4D and DISFA show that our approach is the new state-of-the-art in the field of AU occurrence recognition. Our code is publicly available at https://github.com/CVI-SZU/MDHR.
Submitted 9 April, 2024;
originally announced April 2024.
-
Resolving turbulence and drag over textured surfaces using texture-less simulations: the case of slip/no-slip textures
Authors:
Wenxiong Xie,
Chris T Fairhall,
Ricardo García-Mayoral
Abstract:
We study the effect of surface texture on an overlying turbulent flow for the case of textures made of an alternating slip/no-slip pattern, a common model for superhydrophobic surfaces but also a particularly simple form of texture. For texture sizes $L^+ \lesssim 20$, we have previously reported that turbulence remained smooth-wall-like, other than experiencing an apparent origin offset for different flow components. For slip/no-slip textures, this effect reduced to the flow experiencing slip conditions in the streamwise and spanwise directions and zero transpiration at the surface. The overlying turbulence effectively perceived such boundary conditions at least up to texture sizes $L^+ \approx 50$. However, beyond $L^+ \approx 20$ the texture interacted with the overlying turbulence in a non-homogeneous fashion, additional Reynolds stresses arose and turbulence was no longer smooth-wall-like. This is the typical effect of surface texture observed for rough surfaces, and results in an increase in drag relative to smooth-wall flows. In this paper, we argue that this occurs because the texture modifies the overlying turbulence through non-linear, cross-advective terms between the background turbulence and the texture-coherent flow directly induced by the surface topology. To verify this, we conduct homogeneous-slip-length simulations where we introduce additional, forcing terms in the Navier-Stokes equations to capture the effect of this non-linear interaction on the background turbulence. The interaction can then be accounted for without the need to resolve the surface texture. We show that the additional terms quantitatively capture the changes in the flow up to texture sizes $L^+ \approx 70$--$100$, including not just the roughness function but also the flow statistics and structure.
Submitted 8 April, 2024;
originally announced April 2024.
-
Prediction of topotactic transition from black to blue phosphorus induced by surface Br adsorption
Authors:
Hao Tian,
Wenjun Xie,
Maohai Xie,
Chuanhui Zhu,
Hu Xu,
Shuk-Yin Tong
Abstract:
Based on first-principles calculations, we propose a potential route to the as-yet unrealized freestanding blue phosphorus (blueP) through transformation of black phosphorus (blackP) induced by surface bromine (Br) adsorption. Formation of the Br-P bonds disrupts the original sp$^3$ configurations in blackP, generates unpaired p$_z$ electrons, and induces a structural transformation that results in blueP formation by re-pairing the p$_z$ orbitals. Ab initio molecular dynamics simulations confirm that randomly adsorbed Br adatoms on bilayer blackP spontaneously diffuse into specific patterns that give rise to the blueP phase. The resulting Br-passivated blueP nanoribbons are expected to exhibit band gaps tunable over a wide range and high carrier mobilities of the order of 1000 cm$^2$V$^{-1}$s$^{-1}$. This study provides an opportunity to fabricate blueP through conversion from blackP by tuning its surface chemistry.
Submitted 8 April, 2024;
originally announced April 2024.
-
Impact of The Newly Revised Gravitational Redshift of X-ray Burster GS 1826-24 on The Equation of State of Supradense Neutron-Rich Matter
Authors:
Wen-Jie Xie,
Bao-An Li,
Nai-Bo Zhang
Abstract:
Thanks to recent advances in producing rare isotopes and measuring their masses with unprecedented precision, the updated nuclear masses around the waiting-point nucleus $^{64}$Ge in the rapid proton-capture process have led to a significant revision of the surface gravitational redshift of the neutron star (NS) in GS 1826-24 by re-fitting its X-ray burst light curve ({\it X. Zhou et al., Nature Physics {\bf 19}, 1091 (2023)}) using Modules for Experiments in Stellar Astrophysics (MESA). The resulting NS compactness $ξ$ is between 0.244 and 0.342 at 95\% confidence level and its upper boundary is significantly smaller than the maximum $ξ$ previously known. Incorporating this new data within a comprehensive Bayesian statistical framework, we investigate its impact on the Equation of State (EOS) of supradense neutron-rich matter and the required spin frequency for GW190814's minor $m_2$ with mass $2.59\pm 0.05$M$_{\odot}$ to be a rotationally stable pulsar. We found that the EOS of high-density symmetric nuclear matter (SNM) has to be softened significantly while the symmetry energy at supersaturation densities has to be stiffened compared to our prior knowledge from earlier analyses using data from both astrophysical observations and terrestrial nuclear experiments. In particular, the skewness $J_0$ characterizing the stiffness of high-density SNM decreases significantly, while the slope $L$, curvature $K_{\rm{sym}}$, and skewness $J_{\rm{sym}}$ of nuclear symmetry energy all increase appreciably compared to their fiducial values. We also found that the most probable spin rate for the $m_2$ to be a stable pulsar is very close to its mass-shedding limit once the revised redshift data from GS 1826-24 is considered, making the $m_2$ unlikely to be the most massive NS observed so far.
Submitted 15 July, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
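To make the link between compactness and surface redshift in the abstract above concrete, here is a minimal sketch. It assumes that $ξ$ denotes $GM/(Rc^2)$ and that the exterior is Schwarzschild (a non-rotating star), so that $1+z=(1-2ξ)^{-1/2}$. Neither the definition nor the formula is stated in the abstract, and the function name is hypothetical.

```python
import math

def surface_redshift(compactness: float) -> float:
    """Surface gravitational redshift z for a non-rotating star.

    Assumes compactness = G*M/(R*c^2) and a Schwarzschild exterior,
    so that 1 + z = (1 - 2*compactness) ** -0.5.
    """
    if not 0.0 <= compactness < 0.5:
        raise ValueError("compactness must lie in [0, 0.5)")
    return 1.0 / math.sqrt(1.0 - 2.0 * compactness) - 1.0

# 95% bounds on the compactness as quoted in the abstract.
xi_low, xi_high = 0.244, 0.342
z_low = surface_redshift(xi_low)
z_high = surface_redshift(xi_high)
print(f"z ranges from {z_low:.3f} to {z_high:.3f}")
```

Under these assumptions the redshift grows monotonically with compactness, so the quoted bounds on $ξ$ map directly onto bounds on $z$.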
-
ModaLink: Unifying Modalities for Efficient Image-to-PointCloud Place Recognition
Authors:
Weidong Xie,
Lun Luo,
Nanfei Ye,
Yi Ren,
Shaoyi Du,
Minhang Wang,
Jintao Xu,
Rui Ai,
Weihao Gu,
Xieyuanli Chen
Abstract:
Place recognition is an important task for robots and autonomous cars to localize themselves and close loops in pre-built maps. While single-modal sensor-based methods have shown satisfactory performance, cross-modal place recognition that retrieves images from a point-cloud database remains a challenging problem. Current cross-modal methods transform images into 3D points using depth estimation for modality conversion, which is usually computationally intensive and needs expensive labeled data for depth supervision. In this work, we introduce a fast and lightweight framework to encode images and point clouds into place-distinctive descriptors. We propose an effective Field of View (FoV) transformation module to convert point clouds into a modality analogous to images. This module eliminates the need for depth estimation and helps subsequent modules achieve real-time performance. We further design a non-negative factorization-based encoder to extract mutually consistent semantic features between point clouds and images. This encoder yields more distinctive global descriptors for retrieval. Experimental results on the KITTI dataset show that our proposed methods achieve state-of-the-art performance while running in real time. Additional evaluation on the HAOMO dataset covering a 17 km trajectory further shows the practical generalization capabilities. We have released the implementation of our methods as open source at: https://github.com/haomo-ai/ModaLink.git.
Submitted 27 March, 2024;
originally announced March 2024.
-
Grey-informed neural network for time-series forecasting
Authors:
Wanli Xie,
Ruibin Zhao,
Zhenguo Xu,
Tingting Liang
Abstract:
Neural network models have shown outstanding performance and have successfully solved complex problems in various fields. However, the majority of these models are viewed as black boxes and require a significant amount of data for development. Consequently, in situations with limited data, constructing appropriate models becomes challenging due to the lack of transparency and scarcity of data. To tackle these challenges, this study proposes a grey-informed neural network (GINN). The GINN ensures that the output of the neural network follows the differential equation model of the grey system, improving interpretability. Moreover, incorporating prior knowledge from grey system theory enables traditional neural networks to effectively handle small data samples. Our proposed model has been observed to uncover underlying patterns in the real world and produce reliable forecasts based on empirical data.
Submitted 3 April, 2024; v1 submitted 22 March, 2024;
originally announced March 2024.
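For readers unfamiliar with grey system models, the sketch below fits the classical GM(1,1) model, whose whitened form is the differential equation $dx^{(1)}/dt + a\,x^{(1)} = b$ on the accumulated series $x^{(1)}$. This is only the standard textbook model, shown as background. The abstract above does not say which grey differential equation the GINN uses, and the function names and toy data here are hypothetical.

```python
import math

def fit_gm11(series):
    """Fit the classical GM(1,1) grey model x0[k] + a*z[k] = b by least squares."""
    if len(series) < 4:
        raise ValueError("need at least 4 observations")
    # First-order accumulated series x1 and its background (mean) values z.
    x1 = []
    total = 0.0
    for value in series:
        total += value
        x1.append(total)
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(series))]
    y = list(series[1:])
    # Closed-form least squares for y = -a*z + b.
    m = len(z)
    s_z = sum(z)
    s_zz = sum(v * v for v in z)
    s_y = sum(y)
    s_zy = sum(zv * yv for zv, yv in zip(z, y))
    det = m * s_zz - s_z * s_z
    a = (s_z * s_y - m * s_zy) / det
    b = (s_zz * s_y - s_z * s_zy) / det
    return a, b

def predict_gm11(series, a, b, k):
    """Predicted original-series value at index k >= 1 (requires a != 0)."""
    def x1_hat(i):
        # Solution of dx1/dt + a*x1 = b with x1(0) = series[0].
        return (series[0] - b / a) * math.exp(-a * i) + b / a
    return x1_hat(k) - x1_hat(k - 1)

# Toy data: a short geometric series, the kind of small sample grey models target.
data = [2.0 * 1.1 ** k for k in range(6)]
a, b = fit_gm11(data)
forecast = predict_gm11(data, a, b, 6)
print(f"a = {a:.6f}, b = {b:.6f}, one-step forecast = {forecast:.4f}")
```

The fit needs only a handful of points, which is the small-sample property the abstract refers to. A network constrained to follow such an equation would inherit its two interpretable parameters $a$ and $b$.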
-
Anomalous Neutron Nuclear-Magnetic Interference Spectroscopy
Authors:
Chuliang Fu,
Phum Siriviboon,
Artittaya Boonkird,
Michael Landry,
Chen Li,
Weiwei Xie,
Mingda Li
Abstract:
The electron-phonon interaction plays a critical role in materials' electrical, thermal, optical, and superconducting properties. However, measuring the phonon-mode-resolved electron-phonon interaction has been challenging. Here we propose neutron-scattering-based Anomalous Neutron nUclear-Magnetic Interference Spectroscopy (ANUBIS), where the coexistence of neutron nuclear scattering and magnetic scattering leads to an anomalous dynamical structure factor in the presence of the electron-phonon interaction. This anomalous structure factor is linear in the electron-phonon coupling constant at the phonon wavevector, and is directly proportional to the momentum- and energy-resolved dielectric function. The experimental configuration can be achieved using existing polarized inelastic neutron scattering setups, and an order-of-magnitude estimate shows that the anomalous scattering signal is around $10^{-4}$ to $10^{-3}$ relative to phonon scattering, which is achievable at emerging neutron facilities. Our proposal offers an alternative neutron-based metrology to probe crucial electronic properties.
Submitted 20 March, 2024;
originally announced March 2024.
-
Reinforcement Learning with Token-level Feedback for Controllable Text Generation
Authors:
Wendi Li,
Wei Wei,
Kaihe Xu,
Wenfeng Xie,
Dangyang Chen,
Yu Cheng
Abstract:
To meet the requirements of real-world applications, it is essential to control the generations of large language models (LLMs). Prior research has tried to introduce reinforcement learning (RL) into controllable text generation, while most existing methods suffer from overfitting issues (finetuning-based methods) or semantic collapse (post-processing methods). However, current RL methods are generally guided by coarse-grained (sentence/paragraph-level) feedback, which may lead to suboptimal performance owing to semantic twists or progressions within sentences. To tackle this, we propose a novel reinforcement learning algorithm named TOLE, which formulates TOken-LEvel rewards for controllable text generation and employs a "first-quantize-then-noise" paradigm to enhance the robustness of the RL algorithm. Furthermore, TOLE can be flexibly extended to multiple constraints with little computational expense. Experimental results show that our algorithm can achieve superior performance on both single-attribute and multi-attribute control tasks. We have released our code at https://github.com/WindyLee0822/CTG.
Submitted 18 March, 2024;
originally announced March 2024.
-
New constraints on Triton's atmosphere from the 6 October 2022 stellar occultation
Authors:
Ye Yuan,
Chen Zhang,
Fan Li,
Jian Chen,
Yanning Fu,
Chunhai Bai,
Xing Gao,
Yong Wang,
Tuhong Zhong,
Yixing Gao,
Liang Wang,
Donghua Chen,
Yixing Zhang,
Yang Zhang,
Wenpeng Xie,
Shupi Zhang,
Ding Liu,
Jun Cao,
Xiangdong Yin,
Xiaojun Mo,
Jing Liu,
Xinru Han,
Tong Liu,
Yuqiang Chen,
Zhendong Gao
, et al. (25 additional authors not shown)
Abstract:
The atmosphere of Triton was probed directly by observing a ground-based stellar occultation on 6 October 2022. This rare event yielded 23 positive light curves collected from 13 separate observation stations contributing to our campaign. The significance of this event lies in its potential to directly validate the modest pressure fluctuation on Triton, a phenomenon not definitively verified by previous observations, including only five stellar occultations, and the Voyager 2 radio occultation in 1989. Using an approach consistent with a comparable study, we precisely determined a surface pressure of $14.07_{-0.13}^{+0.21}~\mathrm{μbar}$ in 2022. This new pressure rules out any significant monotonic variation in pressure between 2017 and 2022 through direct observations, as it is in alignment with the 2017 value. Additionally, both the pressures in 2017 and 2022 align with the 1989 value. This provides further support for the conclusion drawn from the previous volatile transport model simulation, which is consistent with the observed alignment between the pressures in 1989 and 2017; that is to say, the pressure fluctuation is modest. Moreover, this conclusion suggests the existence of a northern polar cap extended down to at least $45^\circ$N$-60^\circ$N and the presence of nitrogen between $30^\circ$S and $0^\circ$.
Submitted 24 March, 2024; v1 submitted 14 March, 2024;
originally announced March 2024.
-
E2E-MFD: Towards End-to-End Synchronous Multimodal Fusion Detection
Authors:
Jiaqing Zhang,
Mingxiang Cao,
Xue Yang,
Weiying Xie,
Jie Lei,
Daixun Li,
Wenbo Huang,
Yunsong Li
Abstract:
Multimodal image fusion and object detection are crucial for autonomous driving. While current methods have advanced the fusion of texture details and semantic information, their complex training processes hinder broader applications. Addressing this challenge, we introduce E2E-MFD, a novel end-to-end algorithm for multimodal fusion detection. E2E-MFD streamlines the process, achieving high performance with a single training phase. It employs synchronous joint optimization across components to avoid suboptimal solutions tied to individual tasks. Furthermore, it implements a comprehensive optimization strategy in the gradient matrix for shared parameters, ensuring convergence to an optimal fusion detection configuration. Our extensive testing on multiple public datasets reveals E2E-MFD's superior capabilities, showcasing not only visually appealing image fusion but also impressive detection outcomes, such as a 3.9% and 2.0% mAP50 increase on horizontal object detection dataset M3FD and oriented object detection dataset DroneVehicle, respectively, compared to state-of-the-art approaches. The code is released at https://github.com/icey-zhang/E2E-MFD.
Submitted 23 May, 2024; v1 submitted 14 March, 2024;
originally announced March 2024.
-
Layered Kagome Compound Na$_2$Ni$_3$S$_4$ with Topological Flat Band
Authors:
Junyao Ye,
Yihao Lin,
Haozhe Wang,
Zida Song,
Ji Feng,
Weiwei Xie,
Shuang Jia
Abstract:
We report structural and electronic properties of Na$_2$Ni$_3$S$_4$, a quasi-two-dimensional compound composed of alternating layers of [Ni$_3$S$_4$]$^{2-}$ and Na$^{+}$. The compound features a remarkable Ni-based kagome lattice with a square planar configuration of four surrounding S atoms for each Ni atom. Magnetization and electrical measurements reveal a weak paramagnetic insulator with a gap of about 0.5 eV. Our band structure calculation highlights a set of topological flat bands of the kagome lattice derived from the rotated d$_{xz}$-orbital with $C_\mathrm{3}$ + $T$ symmetry in the presence of crystal-field splitting.
Submitted 18 April, 2024; v1 submitted 13 March, 2024;
originally announced March 2024.
-
DeliGrasp: Inferring Object Properties with LLMs for Adaptive Grasp Policies
Authors:
William Xie,
Jensen Lavering,
Nikolaus Correll
Abstract:
Large language models (LLMs) can provide rich physical descriptions of most worldly objects, allowing robots to achieve more informed and capable grasping. We leverage LLMs' common sense physical reasoning and code-writing abilities to infer an object's physical characteristics--mass $m$, friction coefficient $μ$, and spring constant $k$--from a semantic description, and then translate those characteristics into an executable adaptive grasp policy. Using a current-controllable, two-finger gripper with a built-in depth camera, we demonstrate that LLM-generated, physically-grounded grasp policies outperform traditional grasp policies on a custom benchmark of 12 delicate and deformable items including food, produce, toys, and other everyday items, spanning two orders of magnitude in mass and required pick-up force. We also demonstrate how compliance feedback from DeliGrasp policies can aid in downstream tasks such as measuring produce ripeness. Our code and videos are available at: https://deligrasp.github.io
Submitted 30 March, 2024; v1 submitted 12 March, 2024;
originally announced March 2024.
-
AUFormer: Vision Transformers are Parameter-Efficient Facial Action Unit Detectors
Authors:
Kaishen Yuan,
Zitong Yu,
Xin Liu,
Weicheng Xie,
Huanjing Yue,
Jingyu Yang
Abstract:
Facial Action Units (AUs) are a vital concept in the realm of affective computing, and AU detection has always been a hot research topic. Existing methods suffer from overfitting issues due to the utilization of a large number of learnable parameters on scarce AU-annotated datasets or heavy reliance on substantial additional relevant data. Parameter-Efficient Transfer Learning (PETL) provides a promising paradigm to address these challenges, but existing PETL methods are not designed for AU characteristics. Therefore, we apply the PETL paradigm to AU detection, introducing AUFormer and proposing a novel Mixture-of-Knowledge Expert (MoKE) collaboration mechanism. An individual MoKE specific to a certain AU with minimal learnable parameters first integrates personalized multi-scale and correlation knowledge. Then the MoKE collaborates with other MoKEs in the expert group to obtain aggregated information and inject it into the frozen Vision Transformer (ViT) to achieve parameter-efficient AU detection. Additionally, we design a Margin-truncated Difficulty-aware Weighted Asymmetric Loss (MDWA-Loss), which can encourage the model to focus more on activated AUs, differentiate the difficulty of unactivated AUs, and discard potential mislabeled samples. Extensive experiments from various perspectives, including within-domain, cross-domain, data efficiency, and micro-expression domain, demonstrate AUFormer's state-of-the-art performance and robust generalization abilities without relying on additional relevant data. The code for AUFormer is available at https://github.com/yuankaishen2001/AUFormer.
Submitted 9 July, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
-
Yi: Open Foundation Models by 01.AI
Authors:
01.AI,
Alex Young,
Bei Chen,
Chao Li,
Chengen Huang,
Ge Zhang,
Guanwei Zhang,
Heng Li,
Jiangcheng Zhu,
Jianqun Chen,
Jing Chang,
Kaidong Yu,
Peng Liu,
Qiang Liu,
Shawn Yue,
Senbin Yang,
Shiming Yang,
Tao Yu,
Wen Xie,
Wenhao Huang,
Xiaohui Hu,
Xiaoyi Ren,
Xinyao Niu,
Pengcheng Nie
, et al. (7 additional authors not shown)
Abstract:
We introduce the Yi model family, a series of language and multimodal models that demonstrate strong multi-dimensional capabilities. The Yi model family is based on 6B and 34B pretrained language models, then we extend them to chat models, 200K long context models, depth-upscaled models, and vision-language models. Our base models achieve strong performance on a wide range of benchmarks like MMLU, and our finetuned chat models deliver strong human preference rate on major evaluation platforms like AlpacaEval and Chatbot Arena. Building upon our scalable super-computing infrastructure and the classical transformer architecture, we attribute the performance of Yi models primarily to its data quality resulting from our data-engineering efforts. For pretraining, we construct 3.1 trillion tokens of English and Chinese corpora using a cascaded data deduplication and quality filtering pipeline. For finetuning, we polish a small scale (less than 10K) instruction dataset over multiple iterations such that every single instance has been verified directly by our machine learning engineers. For vision-language, we combine the chat language model with a vision transformer encoder and train the model to align visual representations to the semantic space of the language model. We further extend the context length to 200K through lightweight continual pretraining and demonstrate strong needle-in-a-haystack retrieval performance. We show that extending the depth of the pretrained checkpoint through continual pretraining further improves performance. We believe that given our current results, continuing to scale up model parameters using thoroughly optimized data will lead to even stronger frontier models.
Submitted 7 March, 2024;
originally announced March 2024.
-
Quantum critical fluctuations generate intensely magnetic field-resilient superconductivity in UTe2
Authors:
Z. Wu,
T. I. Weinberger,
A. J. Hickey,
D. V. Chichinadze,
D. Shaffer,
A. Cabala,
H. Chen,
M. Long,
T. J. Brumm,
W. Xie,
Y. Lin,
Y. Skourski,
Z. Zengwei,
D. E. Graf,
V. Sechovsky,
G. G. Lonzarich,
M. Valiska,
F. M. Grosche,
A. G. Eaton
Abstract:
Quantum critical phase boundaries (QCPBs) -- where a continuous phase transition occurs at zero temperature -- have been found to nucleate novel electronic states in a number of strongly correlated materials. Emergent electronic phases, such as unconventional superconductivity, frequently occur in close proximity to a QCPB. However, the antagonism between magnetic field and superconductivity generally suppresses such superconductive phases to modest magnetic field ranges, except in notable cases such as the high-$T_\text{c}$ cuprates. Here we show that the heavy fermion Kondo-lattice system UTe$_2$ possesses a QCPB at remarkably high magnetic fields $\sim$ 50 T, underpinning an extremely high field superconducting state that persists to fields $> \sim$ 70 T despite its relatively modest transition temperature of $T_\text{c} \approx$ 2 K. Whereas previous studies found this superconducting state to exist exclusively inside a field-polarised paramagnetic host-phase accessed following a first-order metamagnetic transition, for magnetic field tilt angles in the rotation plane connecting the reciprocal (010) and (101) vectors, we find an extended region of this superconductivity outside the field-polarised state. In this special rotation plane we also observe a pronounced suppression of the metamagnetic transition towards zero temperature, revealing that the metamagnetic transition surface ends at a QCPB. The superconducting $T_\text{c}$ exhibits a strong angular dependence and is enhanced close to the QCPB, where the onset of superconductivity is stretched over a surprisingly large magnetic field range reaching as low as 30 T. We model our data by a phenomenological Ginzburg-Landau theory, and show how an extended quantum critical line -- rather than a more conventional QCPB at a singular point -- anchors the remarkable high magnetic field phase landscape of UTe$_2$.
Submitted 4 March, 2024;
originally announced March 2024.
-
Offline Fictitious Self-Play for Competitive Games
Authors:
Jingxiao Chen,
Weiji Xie,
Weinan Zhang,
Yong Yu,
Ying Wen
Abstract:
Offline Reinforcement Learning (RL) has received significant interest due to its ability to improve policies in previously collected datasets without online interactions. Despite its success in the single-agent setting, offline multi-agent RL remains a challenge, especially in competitive games. Firstly, unaware of the game structure, it is impossible to interact with the opponents and conduct a major learning paradigm, self-play, for competitive games. Secondly, real-world datasets cannot cover all the state and action space in the game, resulting in barriers to identifying Nash equilibrium (NE). To address these issues, this paper introduces Off-FSP, the first practical model-free offline RL algorithm for competitive games. We start by simulating interactions with various opponents by adjusting the weights of the fixed dataset with importance sampling. This technique allows us to learn best responses to different opponents and employ the Offline Self-Play learning framework. In this framework, we further implement Fictitious Self-Play (FSP) to approximate NE. In partially covered real-world datasets, our methods show the potential to approach NE by incorporating any single-agent offline RL method. Experimental results in Leduc Hold'em Poker show that our method significantly improves performances compared with state-of-the-art baselines.
Submitted 29 February, 2024;
originally announced March 2024.
-
Mass production and performance study on the 20-inch PMT acrylic protection covers in JUNO
Authors:
Miao He,
Zhonghua Qin,
Diru Wu,
Meihang Xu,
Wan Xie,
Fang Chen,
Xiaoping Jing,
Genhua Yin,
Shengjiong Yin,
Linhua Gu,
Xiaofeng Xia,
Qinchang Wang
Abstract:
The Jiangmen Underground Neutrino Observatory is a neutrino experiment that incorporates 20,012 20-inch photomultiplier tubes (PMTs) and 25,600 3-inch PMTs. A dedicated system was designed to protect the PMTs from an implosion chain reaction underwater. As a crucial element of the protection system, over 20,000 acrylic covers were manufactured through injection molding, ensuring high dimensional precision, mechanical strength, and transparency. This paper presents the manufacturing technology, mass production process, and performance characteristics of the acrylic covers.
Submitted 25 February, 2024;
originally announced February 2024.
-
Foot In The Door: Understanding Large Language Model Jailbreaking via Cognitive Psychology
Authors:
Zhenhua Wang,
Wei Xie,
Baosheng Wang,
Enze Wang,
Zhiwen Gui,
Shuoyoucheng Ma,
Kai Chen
Abstract:
Large Language Models (LLMs) have gradually become the gateway for people to acquire new knowledge. However, attackers can break the model's security protection ("jail") to access restricted information, which is called "jailbreaking." Previous studies have shown the weakness of current LLMs when confronted with such jailbreaking attacks. Nevertheless, comprehension of the intrinsic decision-making mechanism within the LLMs upon receipt of jailbreak prompts is noticeably lacking. Our research provides a psychological explanation of the jailbreak prompts. Drawing on cognitive consistency theory, we argue that the key to jailbreak is guiding the LLM to achieve cognitive coordination in an erroneous direction. Further, we propose an automatic black-box jailbreaking method based on the Foot-in-the-Door (FITD) technique. This method progressively induces the model to answer harmful questions via multi-step incremental prompts. We instantiated a prototype system to evaluate the jailbreaking effectiveness on 8 advanced LLMs, yielding an average success rate of 83.9%. This study builds a psychological perspective on the explanatory insights into the intrinsic decision-making logic of LLMs.
Submitted 23 February, 2024;
originally announced February 2024.
-
Towards Building Multilingual Language Model for Medicine
Authors:
Pengcheng Qiu,
Chaoyi Wu,
Xiaoman Zhang,
Weixiong Lin,
Haicheng Wang,
Ya Zhang,
Yanfeng Wang,
Weidi Xie
Abstract:
The development of open-source, multilingual medical language models can benefit a wide, linguistically diverse audience from different regions. To promote this domain, we present contributions from the following: First, we construct a multilingual medical corpus, containing approximately 25.5B tokens encompassing 6 main languages, termed as MMedC, enabling auto-regressive domain adaptation for general LLMs; Second, to monitor the development of multilingual medical LLMs, we propose a multilingual medical multi-choice question-answering benchmark with rationale, termed as MMedBench; Third, we have assessed a number of open-source large language models (LLMs) on our benchmark, along with those further auto-regressive trained on MMedC. Our final model, MMed-Llama 3, with only 8B parameters, achieves superior performance compared to all other open-source models on both MMedBench and English benchmarks, even rivaling GPT-4. In conclusion, in this work, we present a large-scale corpus, a benchmark and a series of models to support the development of multilingual medical LLMs.
Submitted 2 June, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
-
Slot-VLM: SlowFast Slots for Video-Language Modeling
Authors:
Jiaqi Xu,
Cuiling Lan,
Wenxuan Xie,
Xuejin Chen,
Yan Lu
Abstract:
Video-Language Models (VLMs), powered by the advancements in Large Language Models (LLMs), are charting new frontiers in video understanding. A pivotal challenge is the development of an efficient method to encapsulate video content into a set of representative tokens to align with LLMs. In this work, we introduce Slot-VLM, a novel framework designed to generate semantically decomposed video tokens, in terms of object-wise and event-wise visual representations, to facilitate LLM inference. Particularly, we design a SlowFast Slots module, i.e., SF-Slots, that adaptively aggregates the dense video tokens from the CLIP vision encoder to a set of representative slots. In order to take into account both the spatial object details and the varied temporal dynamics, SF-Slots is built with a dual-branch structure. The Slow-Slots branch focuses on extracting object-centric slots from features at high spatial resolution but low (slow) frame sample rate, emphasizing detailed object information. Conversely, Fast-Slots branch is engineered to learn event-centric slots from high temporal sample rate but low spatial resolution features. These complementary slots are combined to form the vision context, serving as the input to the LLM for efficient question answering. Our experimental results demonstrate the effectiveness of our Slot-VLM, which achieves the state-of-the-art performance on video question-answering.
Submitted 20 February, 2024;
originally announced February 2024.
-
InstaGen: Enhancing Object Detection by Training on Synthetic Dataset
Authors:
Chengjian Feng,
Yujie Zhong,
Zequn Jie,
Weidi Xie,
Lin Ma
Abstract:
In this paper, we present a novel paradigm to enhance the ability of object detector, e.g., expanding categories or improving detection performance, by training on synthetic dataset generated from diffusion models. Specifically, we integrate an instance-level grounding head into a pre-trained, generative diffusion model, to augment it with the ability of localising instances in the generated images. The grounding head is trained to align the text embedding of category names with the regional visual feature of the diffusion model, using supervision from an off-the-shelf object detector, and a novel self-training scheme on (novel) categories not covered by the detector. We conduct thorough experiments to show that, this enhanced version of diffusion model, termed as InstaGen, can serve as a data synthesizer, to enhance object detectors by training on its generated samples, demonstrating superior performance over existing state-of-the-art methods in open-vocabulary (+4.5 AP) and data-sparse (+1.2 to 5.2 AP) scenarios. Project page with code: https://fcjian.github.io/InstaGen.
Submitted 8 April, 2024; v1 submitted 8 February, 2024;
originally announced February 2024.
-
Regularized MIP Model for Optimal Power Flow with Energy Storage Systems and its Applications
Authors:
Dahye Han,
Nan Jiang,
Santanu S. Dey,
Weijun Xie
Abstract:
Incorporating energy storage systems (ESS) into power systems has been studied in many recent works, where binary variables are often introduced to model the complementary nature of battery charging and discharging. A conventional approach for these ESS optimization problems is to relax binary variables and convert the problem into a linear program. However, such linear programming relaxation models can yield unrealistic fractional solutions, such as simultaneous charging and discharging. In this paper, we develop a regularized Mixed-Integer Programming (MIP) model for the ESS optimal power flow (OPF) problem. We prove that under mild conditions, the proposed regularized model admits a zero integrality gap with its linear programming relaxation; hence, it can be solved efficiently. By studying the properties of the regularized MIP model, we show that its optimal solution is also near-optimal to the original ESS OPF problem, thereby providing a valid and tight upper bound for the ESS OPF problem. The use of the regularized MIP model allows us to solve two intractable problems: a two-stage stochastic ESS OPF problem and a trilevel network contingency problem.
Submitted 6 February, 2024;
originally announced February 2024.
-
Boosting Adversarial Transferability across Model Genus by Deformation-Constrained Warping
Authors:
Qinliang Lin,
Cheng Luo,
Zenghao Niu,
Xilin He,
Weicheng Xie,
Yuanbo Hou,
Linlin Shen,
Siyang Song
Abstract:
Adversarial examples generated by a surrogate model typically exhibit limited transferability to unknown target systems. To address this problem, many transferability enhancement approaches (e.g., input transformation and model augmentation) have been proposed. However, they show poor performances in attacking systems having different model genera from the surrogate model. In this paper, we propose a novel and generic attacking strategy, called Deformation-Constrained Warping Attack (DeCoWA), that can be effectively applied to cross model genus attack. Specifically, DeCoWA firstly augments input examples via an elastic deformation, namely Deformation-Constrained Warping (DeCoW), to obtain rich local details of the augmented input. To avoid severe distortion of global semantics led by random deformation, DeCoW further constrains the strength and direction of the warping transformation by a novel adaptive control strategy. Extensive experiments demonstrate that the transferable examples crafted by our DeCoWA on CNN surrogates can significantly hinder the performance of Transformers (and vice versa) on various tasks, including image classification, video action recognition, and audio recognition. Code is made available at https://github.com/LinQinLiang/DeCoWA.
Submitted 6 February, 2024;
originally announced February 2024.
-
Distributionally Fair Stochastic Optimization using Wasserstein Distance
Authors:
Qing Ye,
Grani A. Hanasusanto,
Weijun Xie
Abstract:
A traditional stochastic program under a finite population typically seeks to optimize efficiency by maximizing the expected profits or minimizing the expected costs, subject to a set of constraints. However, implementing such optimization-based decisions can have varying impacts on individuals, and when assessed using the individuals' utility functions, these impacts may differ substantially across demographic groups delineated by sensitive attributes, such as gender, race, age, and socioeconomic status. As each group comprises multiple individuals, a common remedy is to enforce group fairness, which necessitates the measurement of disparities in the distributions of utilities across different groups. This paper introduces the concept of Distributionally Fair Stochastic Optimization (DFSO) based on the Wasserstein fairness measure. The DFSO aims to minimize distributional disparities among groups, quantified by the Wasserstein distance, while adhering to an acceptable level of inefficiency. Our analysis reveals that: (i) the Wasserstein fairness measure recovers the demographic parity fairness prevalent in binary classification literature; (ii) this measure can approximate the well-known Kolmogorov-Smirnov fairness measure with considerable accuracy; and (iii) despite DFSO's biconvex nature, the epigraph of the Wasserstein fairness measure is generally Mixed-Integer Convex Programming Representable (MICP-R). Additionally, we introduce two distinct lower bounds for the Wasserstein fairness measure: the Jensen bound, applicable to the general Wasserstein fairness measure, and the Gelbrich bound, specific to the type-2 Wasserstein fairness measure. We establish the exactness of the Gelbrich bound and quantify the theoretical difference between the Wasserstein fairness measure and the Gelbrich bound.
Submitted 8 February, 2024; v1 submitted 2 February, 2024;
originally announced February 2024.
-
DRSM: efficient neural 4d decomposition for dynamic reconstruction in stationary monocular cameras
Authors:
Weixing Xie,
Xiao Dong,
Yong Yang,
Qiqin Lin,
Jingze Chen,
Junfeng Yao,
Xiaohu Guo
Abstract:
With the popularity of monocular videos generated by video sharing and live broadcasting applications, reconstructing and editing dynamic scenes in stationary monocular cameras has become a special but anticipated technology. In contrast to scene reconstructions that exploit multi-view observations, the problem of modeling a dynamic scene from a single view is significantly more under-constrained and ill-posed. Inspired by recent progress in neural rendering, we present a novel framework to tackle 4D decomposition problem for dynamic scenes in monocular cameras. Our framework utilizes decomposed static and dynamic feature planes to represent 4D scenes and emphasizes the learning of dynamic regions through dense ray casting. Inadequate 3D clues from a single-view and occlusion are also particular challenges in scene reconstruction. To overcome these difficulties, we propose deep supervised optimization and ray casting strategies. With experiments on various videos, our method generates higher-fidelity results than existing methods for single-view dynamic scene representation.
Submitted 1 February, 2024;
originally announced February 2024.
-
On Tractability, Complexity, and Mixed-Integer Convex Programming Representability of Distributionally Favorable Optimization
Authors:
Nan Jiang,
Weijun Xie
Abstract:
Distributionally Favorable Optimization (DFO) is an important framework for decision-making under uncertainty, with applications across fields such as reinforcement learning, online learning, robust statistics, chance-constrained programming, and two-stage stochastic optimization without relatively complete recourse. In contrast to the traditional Distributionally Robust Optimization (DRO) paradigm, DFO presents a unique challenge -- the application of the inner infimum operator often fails to retain the convexity. In light of this challenge, we study the tractability and complexity of DFO. We establish sufficient and necessary conditions for determining when DFO problems are tractable or intractable. Despite the typical nonconvex nature of DFO problems, our findings show that they are mixed-integer convex programming representable (MICP-R), thereby enabling solutions via standard optimization solvers. Finally, we numerically validate the efficacy of our MICP-R formulations.
Submitted 31 January, 2024;
originally announced January 2024.
-
Synchformer: Efficient Synchronization from Sparse Cues
Authors:
Vladimir Iashin,
Weidi Xie,
Esa Rahtu,
Andrew Zisserman
Abstract:
Our objective is audio-visual synchronization with a focus on 'in-the-wild' videos, such as those on YouTube, where synchronization cues can be sparse. Our contributions include a novel audio-visual synchronization model, and training that decouples feature extraction from synchronization modelling through multi-modal segment-level contrastive pre-training. This approach achieves state-of-the-art performance in both dense and sparse settings. We also extend synchronization model training to AudioSet, a million-scale 'in-the-wild' dataset, investigate evidence attribution techniques for interpretability, and explore a new capability for synchronization models: audio-visual synchronizability.
Submitted 29 January, 2024;
originally announced January 2024.
-
Time-resolved Spectral Properties of Fermi-GBM Bright Long Gamma-Ray Bursts
Authors:
Wan-Kai Wang,
Wei Xie,
Zhi-Fu Gao,
Shuo Xiao,
Ai-Jun Dong,
Bin Zhang,
Qi-Jun Zhi
Abstract:
The prompt emission mechanism of gamma-ray bursts (GRBs) is still unclear, and the time-resolved spectral analysis of GRBs is a powerful tool for studying their underlying physical processes. We performed a detailed time-resolved spectral analysis of 78 bright long GRB samples detected by Fermi/Gamma-ray Burst Monitor (GBM). A total of 1490 spectra were obtained and their properties were studied using a typical Band-shape model. Firstly, the parameter distribution of the time-resolved spectra is given as follows: the low-energy spectral index $α\sim -0.72$, the high-energy spectral index $β\sim -2.42$, the peak energy $E_{\rm p} \sim 221.69 \,\rm{keV}$, and the energy flux $F \sim 7.49\times 10^{-6} \rm{\, erg\,cm^{-2}\,s^{-1}}$. More than 80\% of the bursts exhibit the hardest low-energy spectral index $α_{\rm max}$ exceeding the synchrotron limit (-2/3). Secondly, the evolution patterns of $α$ and $E_{\rm p}$ were statistically analyzed. The results show that for multi-pulse GRBs the intensity-tracking pattern is more common than the hard-to-soft pattern in the evolution of both $E_{\rm p}$ and $α$. The hard-to-soft pattern is generally shown in single-pulse GRBs or in the initial pulse of multi-pulse GRBs. Finally, we found a significant positive correlation between $F$ and $E_{\rm p}$, with half of the samples exhibiting a positive correlation between $F$ and $α$. We discussed the spectral evolution of different radiation models. The diversity of spectral evolution patterns indicates that there may be more than one radiation mechanism occurring in the gamma-ray burst radiation process, including photospheric radiation and synchrotron radiation. However, it may also involve only one radiation mechanism, but more complicated physical details need to be considered.
Submitted 26 January, 2024;
originally announced January 2024.
-
Study of SVOM/ECLAIRs inhomogeneities in the detection plane below 8 keV and their mitigation for the trigger performances
Authors:
Wenjin Xie,
Bertrand Cordier,
Nicolas Dagoneau,
Stéphane Schanne,
Jean-Luc Atteia,
Laurent Bouchet,
Olivier Godet
Abstract:
The Space-based multi-band astronomical Variable Objects Monitor (SVOM) is a Chinese-French mission dedicated to the study of the transient sky. It is scheduled to start operations in 2024. ECLAIRs is a coded-mask telescope with a large field of view. It is designed to detect and localize gamma-ray bursts (GRBs) in the energy range from 4 keV up to 120 keV. In 2021, the ECLAIRs telescope underwent various calibration campaigns in vacuum test-chambers to evaluate its performance. Between 4 and 8 keV, the counting response of the detection plane shows inhomogeneities between pixels from different production batches. The efficiency inhomogeneity is caused by low-efficiency pixels (LEPs) from one of the two batches, together with high-threshold pixels (HTPs) whose threshold was raised to avoid cross-talk effects. In addition, some unexpected noise was found in the detection plane regions close to the heat pipes. We study the impact of these inhomogeneities and of the heat-pipe noise at low energies on the ECLAIRs onboard triggers. We propose different strategies in order to mitigate these impacts and to improve the onboard trigger performance. We analyzed the data from the calibration campaigns and performed simulations with the ground model of the ECLAIRs trigger software in order to design and evaluate the different strategies. Most of the impact of HTPs can be corrected for by excluding HTPs from the trigger processing. To correct for the impact of LEPs, an efficiency correction in the shadowgram seems to be a good solution. An effective solution for the heat-pipe noise is selecting the noisy pixels and ignoring their data in the 4--8 keV band during the data analysis.
Submitted 24 January, 2024;
originally announced January 2024.
-
Vanishing vortex creep at the transition from ordered to disordered vortex phases in Ba$_{0.64}$K$_{0.36}$Fe$_2$As$_2$
Authors:
Yu-Hao Liu,
Wei Xie,
Hai-Hu Wen
Abstract:
By measuring the dynamical and conventional magnetization relaxation of Ba$_{0.64}$K$_{0.36}$Fe$_2$As$_2$ single crystals, we found a strong second magnetization peak (SMP) effect in the magnetization hysteresis loops. There is a kink in the magnetization at a field between the valley and the maximum of the magnetization. Interestingly, the magnetization relaxation rate has a deep minimum at the field of the kink, indicating a diminished vortex creep. The relaxation rate at this field is clearly smaller than the so-called universal lower limit of the relaxation rate, characterized by $S_0\approx Gi^{1/2}(T/T_\mathrm{c})$. This diminished vortex creep is associated with the origin of the SMP effect and is attributed to strongly hindered flux motion during the transition from the quasi-ordered to the disordered vortex phase.
Submitted 23 January, 2024;
originally announced January 2024.
-
Wideband Beamforming for RIS Assisted Near-Field Communications
Authors:
Ji Wang,
Jian Xiao,
Yixuan Zou,
Wenwu Xie,
Yuanwei Liu
Abstract:
A near-field wideband beamforming scheme is investigated for reconfigurable intelligent surface (RIS) assisted multiple-input multiple-output (MIMO) systems, in which a deep learning-based end-to-end (E2E) optimization framework is proposed to maximize the system spectral efficiency. To deal with the near-field double beam split effect, the base station is equipped with a frequency-dependent hybrid precoding architecture by introducing sub-connected true time delay (TTD) units, while two specific RIS architectures, namely true time delay-based RIS (TTD-RIS) and virtual subarray-based RIS (SA-RIS), are exploited to realize the frequency-dependent passive beamforming at the RIS. Furthermore, efficient E2E beamforming models without explicit channel state information are proposed, which jointly exploit the uplink channel training module and the downlink wideband beamforming module. In the proposed network architecture of the E2E models, the classical communication signal processing methods, i.e., polarized filtering and sparsity transform, are leveraged to develop a signal-guided beamforming network. Numerical results show that the proposed E2E models have superior beamforming performance and robustness compared with conventional beamforming benchmarks. Furthermore, the tradeoff between the beamforming gain and the hardware complexity is investigated for different frequency-dependent RIS architectures, in which the TTD-RIS can achieve better spectral efficiency than the SA-RIS while requiring additional energy consumption and hardware cost.
Submitted 3 July, 2024; v1 submitted 20 January, 2024;
originally announced January 2024.
-
Photoproduction of lepton pair in ultra-relativistic heavy-ion collisions
Authors:
Kewei Yu,
Jiazhen Peng,
Shuang Li,
Kejun Wu,
Wei Xie,
Fei Sun
Abstract:
Dilepton production provides a unique probe of the strong electromagnetic field produced in heavy-ion collisions. To map out the behavior of its transverse momentum broadening, we present a theoretical model based on the equivalent photon approximation, and then we update it to make direct comparisons with the recent experimental measurements. We find that the model calculations describe well not only the average transverse momentum squared of $e^{+}e^{-}$ pairs in Au--Au collisions at $\sqrt{s_{\rm NN}}=200$ GeV, but also the acoplanarity of $μ^{+}μ^{-}$ pairs in Pb--Pb collisions at $\sqrt{s_{\rm NN}}=5.02$ TeV. Furthermore, the model predictions are also able to reproduce the measured dependencies of the pair mass and the transverse momentum squared.
Submitted 12 June, 2024; v1 submitted 19 January, 2024;
originally announced January 2024.
-
Unraveling collisional energy loss of a heavy quark in quark-gluon plasma
Authors:
Jiazhen Peng,
Kewei Yu,
Shuang Li,
Wei Xiong,
Fei Sun,
Wei Xie
Abstract:
At leading order in the QCD coupling constant, we compute the energy loss per traveling distance of a heavy quark, $dE/dz$, from elastic scattering off thermal quarks and gluons at a temperature $T$, including the thermal perturbative description of soft scatterings ($-t<-t^{\ast}$) and a perturbative QCD-based calculation for hard collisions ($-t>-t^{\ast}$). Within this soft-hard factorization model, we find that the full result for $dE/dz$ shows only a mild sensitivity to the intermediate cutoff $t^{\ast}$, supporting the validity of the soft-hard approach within the temperature region of interest. We re-derive the analytic formula for $dE/dz$ in the high-energy approximation, $E_{1}\gg m^{2}_{1}/T$, where $E_{1}$ is the injected heavy quark energy and $m_{1}$ is its mass. We find that the soft logarithmic contribution, $dE/dz\propto \ln(-t^{\ast}/m^{2}_{D})$, arises from the $t$-channel scattering off thermal partons, while the hard logarithmic term, $dE/dz\propto \ln[E_{1}T/(-t^{\ast})]$, stems from the $t$-channel scattering off thermal partons, and the term $dE/dz\propto \ln(E_{1}T/m^{2}_{1})$ comes from the $s$- and $u$-channel scattering off gluons. The sum of these contributions cancels the $t^{\ast}$-dependence, as observed in the full result. A mass hierarchy, $dE/dz(\mathrm{charm})>dE/dz(\mathrm{bottom})$, is observed. Our full results are crucial for a better description of heavy quark transport in the QCD medium, in particular at low and moderate energy. We also calculate the energy loss by imposing Einstein's relationship. The resulting values appear to be systematically larger than those obtained without imposing Einstein's relationship.
Submitted 19 January, 2024;
originally announced January 2024.
-
Suppression of metal-to-insulator transition and stabilization of superconductivity by pressure in Re3Ge7
Authors:
S. Huyan,
E. Mun,
H. Wang,
T. J. Slade,
Z. Li,
J. Schmidt,
R. A. Ribeiro,
W. Xie,
S. L. Bud'ko,
P. C. Canfield
Abstract:
The effect of pressure on the low-temperature states of Re3Ge7 is investigated by electrical resistance, Hall resistance, and magnetization measurements. At ambient pressure, the temperature-dependent resistance of Re3Ge7 behaves quasi-linearly from room temperature down to 60 K, then undergoes a two-step metal-to-insulator transition (MIT) at temperatures T1 = 59.4 K and T2 = 58.7 K, which may be related to a structural phase transition or the occurrence of charge density wave ordering. Upon applying pressure, the two-step (T1, T2) MIT splits into three steps (T1, T2 and T3) above 1 GPa, and all traces of MITs are fully suppressed by ~8 GPa. Subsequently, the onset of bulk superconductivity (SC) occurs between 10.8 and 12.2 GPa and persists to our highest pressure of 26.8 GPa. At 12.2 GPa the superconducting transition temperature, Tc, and upper critical field, Hc2, reach their maxima of Tc (onset) ~5.9 K and Hc2 (1.8 K) ~ 14 kOe. Our results not only present the observation of SC under high pressure in Re3Ge7 but also delineate the interplay between SC and other competing electronic states by creating a T - p phase diagram for this potentially topologically nontrivial system Re3Ge7.
Submitted 17 January, 2024;
originally announced January 2024.
-
Enabling Collaborative Clinical Diagnosis of Infectious Keratitis by Integrating Expert Knowledge and Interpretable Data-driven Intelligence
Authors:
Zhengqing Fang,
Shuowen Zhou,
Zhouhang Yuan,
Yuxuan Si,
Mengze Li,
Jinxu Li,
Yesheng Xu,
Wenjia Xie,
Kun Kuang,
Yingming Li,
Fei Wu,
Yu-Feng Yao
Abstract:
Although data-driven artificial intelligence (AI) in medical image diagnosis has shown impressive performance in silico, the lack of interpretability makes it difficult to incorporate the "black box" into clinicians' workflows. To make the diagnostic patterns learned from data understandable by clinicians, we develop an interpretable model, knowledge-guided diagnosis model (KGDM), that provides a visualized reasoning process containing AI-based biomarkers and retrieved cases with the same diagnostic patterns. It embraces clinicians' prompts into the interpreted reasoning through human-AI interaction, leading to potentially enhanced safety and more accurate predictions. This study investigates the performance, interpretability, and clinical utility of KGDM in the diagnosis of infectious keratitis (IK), which is the leading cause of corneal blindness. The classification performance of KGDM is evaluated on a prospective validation dataset, an external testing dataset, and a publicly available testing dataset. The diagnostic odds ratios (DOR) of the interpreted AI-based biomarkers are effective, ranging from 3.011 to 35.233, and exhibit diagnostic patterns consistent with clinical experience. Moreover, a human-AI collaborative diagnosis test is conducted, and the participants with collaboration achieved a performance exceeding that of both humans and AI. By synergistically integrating interpretability and interaction, this study facilitates the convergence of clinicians' expertise and data-driven intelligence. The improved performance of inexperienced ophthalmologists with the aid of AI-based biomarkers, as well as the improved AI prediction through intervention from experienced ones, demonstrates a promising diagnostic paradigm for infectious keratitis using KGDM, which holds the potential for extension to other diseases where experienced medical practitioners are limited and the safety of AI is a concern.
Submitted 13 January, 2024;
originally announced January 2024.
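The diagnostic odds ratio (DOR) quoted in the abstract above is conventionally defined from a 2x2 confusion table as DOR = (TP x TN) / (FP x FN). The following is a minimal sketch of that standard definition only; the function name and the counts in the example are hypothetical and are not taken from the paper, whose exact evaluation procedure is not described in this listing.

```python
def diagnostic_odds_ratio(tp: int, fp: int, fn: int, tn: int) -> float:
    """Return the diagnostic odds ratio, DOR = (TP * TN) / (FP * FN).

    The ratio is undefined when FP or FN is zero; this sketch raises
    instead of applying a continuity correction.
    """
    if fp == 0 or fn == 0:
        raise ValueError("DOR is undefined when FP or FN is zero")
    return (tp * tn) / (fp * fn)


# Hypothetical counts for illustration only (not data from the study).
example_dor = diagnostic_odds_ratio(tp=80, fp=20, fn=20, tn=80)
print(example_dor)  # 16.0
```

A DOR of 1 means the biomarker does not discriminate between the two groups; larger values indicate stronger discrimination.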
-
DA-BEV: Unsupervised Domain Adaptation for Bird's Eye View Perception
Authors:
Kai Jiang,
Jiaxing Huang,
Weiying Xie,
Yunsong Li,
Ling Shao,
Shijian Lu
Abstract:
Camera-only Bird's Eye View (BEV) has demonstrated great potential in environment perception in a 3D space. However, most existing studies were conducted under a supervised setup, which cannot scale well while handling various new data. Unsupervised domain adaptive BEV, which learns effectively from various unlabelled target data, is far under-explored. In this work, we design DA-BEV, the first domain adaptive camera-only BEV framework that addresses domain adaptive BEV challenges by exploiting the complementary nature of image-view features and BEV features. DA-BEV introduces the idea of query into the domain adaptation framework to derive useful information from image-view and BEV features. It consists of two query-based designs, namely, query-based adversarial learning (QAL) and query-based self-training (QST), which exploit image-view features or BEV features to regularize the adaptation of the other. Extensive experiments show that DA-BEV achieves superior domain adaptive BEV perception performance consistently across multiple datasets and tasks such as 3D object detection and 3D scene segmentation.
Submitted 12 January, 2024;
originally announced January 2024.
-
Domain Adaptation for Large-Vocabulary Object Detectors
Authors:
Kai Jiang,
Jiaxing Huang,
Weiying Xie,
Jie Lei,
Yunsong Li,
Ling Shao,
Shijian Lu
Abstract:
Large-vocabulary object detectors (LVDs) aim to detect objects of many categories, which learn super objectness features and can locate objects accurately while applied to various downstream data. However, LVDs often struggle in recognizing the located objects due to domain discrepancy in data distribution and object vocabulary. At the other end, recent vision-language foundation models such as CLIP demonstrate superior open-vocabulary recognition capability. This paper presents KGD, a Knowledge Graph Distillation technique that exploits the implicit knowledge graphs (KG) in CLIP for effectively adapting LVDs to various downstream domains. KGD consists of two consecutive stages: 1) KG extraction that employs CLIP to encode downstream domain data as nodes and their feature distances as edges, constructing KG that inherits the rich semantic relations in CLIP explicitly; and 2) KG encapsulation that transfers the extracted KG into LVDs to enable accurate cross-domain object classification. In addition, KGD can extract both visual and textual KG independently, providing complementary vision and language knowledge for object localization and object classification in detection tasks over various downstream domains. Experiments over multiple widely adopted detection benchmarks show that KGD outperforms the state-of-the-art consistently by large margins.
Submitted 10 May, 2024; v1 submitted 12 January, 2024;
originally announced January 2024.
-
SwiMDiff: Scene-wide Matching Contrastive Learning with Diffusion Constraint for Remote Sensing Image
Authors:
Jiayuan Tian,
Jie Lei,
Jiaqing Zhang,
Weiying Xie,
Yunsong Li
Abstract:
With recent advancements in aerospace technology, the volume of unlabeled remote sensing image (RSI) data has increased dramatically. Effectively leveraging this data through self-supervised learning (SSL) is vital in the field of remote sensing. However, current methodologies, particularly contrastive learning (CL), a leading SSL method, encounter specific challenges in this domain. Firstly, CL often mistakenly identifies geographically adjacent samples with similar semantic content as negative pairs, leading to confusion during model training. Secondly, as an instance-level discriminative task, it tends to neglect the essential fine-grained features and complex details inherent in unstructured RSIs. To overcome these obstacles, we introduce SwiMDiff, a novel self-supervised pre-training framework designed for RSIs. SwiMDiff employs a scene-wide matching approach that effectively recalibrates labels to recognize data from the same scene as false negatives. This adjustment makes CL more applicable to the nuances of remote sensing. Additionally, SwiMDiff seamlessly integrates CL with a diffusion model. Through the implementation of pixel-level diffusion constraints, we enhance the encoder's ability to capture both the global semantic information and the fine-grained features of the images more comprehensively. Our proposed framework significantly enriches the information available for downstream tasks in remote sensing. Demonstrating exceptional performance in change detection and land-cover classification tasks, SwiMDiff proves its substantial utility and value in the field of remote sensing.
Submitted 10 January, 2024;
originally announced January 2024.
-
Distribution-aware Interactive Attention Network and Large-scale Cloud Recognition Benchmark on FY-4A Satellite Image
Authors:
Jiaqing Zhang,
Jie Lei,
Weiying Xie,
Kai Jiang,
Mingxiang Cao,
Yunsong Li
Abstract:
Accurate cloud recognition and warning are crucial for various applications, including in-flight support, weather forecasting, and climate research. However, recent deep learning algorithms have predominantly focused on detecting cloud regions in satellite imagery, with insufficient attention to the specificity required for accurate cloud recognition. This limitation inspired us to develop the novel FY-4A-Himawari-8 (FYH) dataset, which includes nine distinct cloud categories and uses precise domain adaptation methods to align 70,419 image-label pairs in terms of projection, temporal resolution, and spatial resolution, thereby facilitating the training of supervised deep learning networks. Given the complexity and diversity of cloud formations, we have thoroughly analyzed the challenges inherent to cloud recognition tasks, examining the intricate characteristics and distribution of the data. To effectively address these challenges, we designed a Distribution-aware Interactive-Attention Network (DIAnet), which preserves pixel-level details through a high-resolution branch and a parallel multi-resolution cross-branch. We also integrated a distribution-aware loss (DAL) to mitigate the imbalance across cloud categories. An Interactive Attention Module (IAM) further enhances the robustness of feature extraction combined with spatial and channel information. Empirical evaluations on the FYH dataset demonstrate that our method outperforms other cloud recognition networks, achieving superior performance in terms of mean Intersection over Union (mIoU). The code for implementing DIAnet is available at https://github.com/icey-zhang/DIAnet.
Submitted 6 January, 2024;
originally announced January 2024.
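The abstract above reports results in terms of mean Intersection over Union (mIoU). As a rough illustration of the metric itself, the sketch below computes mIoU over flattened per-pixel label sequences. It is not the DIAnet evaluation code: the function name is hypothetical, and skipping classes that appear in neither the prediction nor the target is an assumption made here, which may differ from the convention used in the paper.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean Intersection over Union across classes for flat label sequences.

    Assumption: classes absent from both pred and target are skipped
    rather than counted as IoU = 0 or IoU = 1.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both maps (intersection) or in either map (union).
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:
            ious.append(inter / union)
    if not ious:
        raise ValueError("no class is present in pred or target")
    return sum(ious) / len(ious)


# Toy two-class example: IoU(class 0) = 1/2, IoU(class 1) = 2/3.
score = mean_iou(pred=[0, 0, 1, 1], target=[0, 1, 1, 1], num_classes=2)
print(round(score, 4))  # 0.5833
```

For the nine cloud categories mentioned in the abstract, `num_classes` would be 9 and the inputs would be the flattened predicted and reference label maps.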
-
Multimodal Informative ViT: Information Aggregation and Distribution for Hyperspectral and LiDAR Classification
Authors:
Jiaqing Zhang,
Jie Lei,
Weiying Xie,
Geng Yang,
Daixun Li,
Yunsong Li
Abstract:
In multimodal land cover classification (MLCC), a common challenge is the redundancy in data distribution, where irrelevant information from multiple modalities can hinder the effective integration of their unique features. To tackle this, we introduce the Multimodal Informative Vit (MIVit), a system with an innovative information aggregate-distributing mechanism. This approach redefines redundancy levels and integrates performance-aware elements into the fused representation, facilitating the learning of semantics in both forward and backward directions. MIVit stands out by significantly reducing redundancy in the empirical distribution of each modality's separate and fused features. It employs oriented attention fusion (OAF) for extracting shallow local features across modalities in horizontal and vertical dimensions, and a Transformer feature extractor for extracting deep global features through long-range attention. We also propose an information aggregation constraint (IAC) based on mutual information, designed to remove redundant information and preserve complementary information within embedded features. Additionally, the information distribution flow (IDF) in MIVit enhances performance-awareness by distributing global classification information across different modalities' feature maps. This architecture also addresses missing modality challenges with lightweight independent modality classifiers, reducing the computational load typically associated with Transformers. Our results show that MIVit's bidirectional aggregate-distributing mechanism between modalities is highly effective, achieving an average overall accuracy of 95.56% across three multimodal datasets. This performance surpasses current state-of-the-art methods in MLCC. The code for MIVit is accessible at https://github.com/icey-zhang/MIViT.
Submitted 23 January, 2024; v1 submitted 6 January, 2024;
originally announced January 2024.
-
Double-domed temperature-pressure phase diagram found for CePd3S4
Authors:
S. Huyan,
T. J. Slade,
H. Wang,
R. Flint,
R. A. Ribeiro,
W. Xie,
S. L. Bud'ko,
P. C. Canfield
Abstract:
CePd3S4 exhibits interplay between ferromagnetism (FM), quadrupolar order, and the Kondo effect at low temperatures, with an FM transition temperature that is much higher than the value expected from the de Gennes scaling of the heavier RPd3S4 compounds. In this work, we investigated the electrical transport and magnetic properties of CePd3S4 under pressure up through 12 GPa so as to better understand the interplay between electronic and magnetic phases in this material. Our findings show that the low-pressure FM state is suddenly replaced by a new magnetically ordered phase, most likely antiferromagnetic, that spans from ~ 7 GPa to ~ 11 GPa. Whereas this could be described as an example of avoided quantum criticality, given that clear changes in resistance and Hall data are detected near 6.3 GPa for all temperatures below 300 K, it is also possible that the change in ground state is a response to a pressure-induced change in structure. The lack of any discernible change in the pressure dependence of the room temperature unit cell parameter/volume across this whole pressure range suggests that this change in structure is either more subtle than could be detected by our measurements (i.e., development of a weak, new wave vector) or the transition is electronic (such as a Lifshitz transition).
Submitted 5 January, 2024;
originally announced January 2024.
-
FedDiff: Diffusion Model Driven Federated Learning for Multi-Modal and Multi-Clients
Authors:
DaiXun Li,
Weiying Xie,
ZiXuan Wang,
YiBing Lu,
Yunsong Li,
Leyuan Fang
Abstract:
With the rapid development of imaging sensor technology in the field of remote sensing, multi-modal remote sensing data fusion has emerged as a crucial research direction for land cover classification tasks. While diffusion models have made great progress in generative models and image classification tasks, existing models primarily focus on single-modality and single-client control, that is, the diffusion process is driven by a single modality in a single computing node. To facilitate the secure fusion of heterogeneous data from clients, it is necessary to enable distributed multi-modal control, such as merging the hyperspectral data of organization A and the LiDAR data of organization B privately on each base station client. In this study, we propose a multi-modal collaborative diffusion federated learning framework called FedDiff. Our framework establishes a dual-branch diffusion model feature extraction setup, where the data from the two modalities are fed into separate branches of the encoder. Our key insight is that diffusion models driven by different modalities are inherently complementary in terms of potential denoising steps on which bilateral connections can be built. Considering the challenge of private and efficient communication between multiple clients, we embed the diffusion model into the federated learning communication structure, and introduce a lightweight communication module. Qualitative and quantitative experiments validate the superiority of our framework in terms of image quality and conditional consistency.
Submitted 15 November, 2023;
originally announced January 2024.
-
TR-DETR: Task-Reciprocal Transformer for Joint Moment Retrieval and Highlight Detection
Authors:
Hao Sun,
Mingyao Zhou,
Wenjing Chen,
Wei Xie
Abstract:
Video moment retrieval (MR) and highlight detection (HD) based on natural language queries are two highly related tasks, which aim to obtain relevant moments within videos and highlight scores of each video clip. Recently, several methods have been devoted to building DETR-based networks to solve both MR and HD jointly. These methods simply add two separate task heads after multi-modal feature extraction and feature interaction, achieving good performance. Nevertheless, these approaches underutilize the reciprocal relationship between two tasks. In this paper, we propose a task-reciprocal transformer based on DETR (TR-DETR) that focuses on exploring the inherent reciprocity between MR and HD. Specifically, a local-global multi-modal alignment module is first built to align features from diverse modalities into a shared latent space. Subsequently, a visual feature refinement is designed to eliminate query-irrelevant information from visual features for modal interaction. Finally, a task cooperation module is constructed to refine the retrieval pipeline and the highlight score prediction process by utilizing the reciprocity between MR and HD. Comprehensive experiments on QVHighlights, Charades-STA and TVSum datasets demonstrate that TR-DETR outperforms existing state-of-the-art methods. Codes are available at https://github.com/mingyao1120/TR-DETR.
Submitted 4 January, 2024; v1 submitted 4 January, 2024;
originally announced January 2024.
-
Joint Multi-Facts Reasoning Network For Complex Temporal Question Answering Over Knowledge Graph
Authors:
Rikui Huang,
Wei Wei,
Xiaoye Qu,
Wenfeng Xie,
Xianling Mao,
Dangyang Chen
Abstract:
A Temporal Knowledge Graph (TKG) is an extension of a regular knowledge graph that attaches a time scope to each fact. Existing temporal knowledge graph question answering (TKGQA) models solely approach simple questions, owing to the prior assumption that each question only contains a single temporal fact with explicit/implicit temporal constraints. Hence, they perform poorly on questions that involve multiple temporal facts. In this paper, we propose the Joint Multi Facts Reasoning Network (JMFRN) to jointly reason over multiple temporal facts for accurately answering complex temporal questions. Specifically, JMFRN first retrieves question-related temporal facts from the TKG for each entity of the given complex question. For joint reasoning, we design two different attention modules (i.e., entity-aware and time-aware), which are suitable for universal settings, to aggregate the entity and timestamp information of the retrieved facts. Moreover, to filter out answers of the incorrect type, we introduce an additional answer type discrimination task. Extensive experiments demonstrate that our proposed method significantly outperforms the state-of-the-art on the well-known complex temporal question benchmark TimeQuestions.
Submitted 4 January, 2024;
originally announced January 2024.
-
Exploring Hyperspectral Anomaly Detection with Human Vision: A Small Target Aware Detector
Authors:
Jitao Ma,
Weiying Xie,
Yunsong Li
Abstract:
Hyperspectral anomaly detection (HAD) aims to localize pixel points whose spectral features differ from the background. HAD is essential in scenarios of unknown or camouflaged target features, such as water quality monitoring, crop growth monitoring and camouflaged target detection, where prior information of targets is difficult to obtain. Existing HAD methods aim to objectively detect and distinguish background and anomalous spectra, which can be achieved almost effortlessly by human perception. However, the underlying processes of human visual perception are thought to be quite complex. In this paper, we analyze hyperspectral image (HSI) features under human visual perception, and transfer the solution process of HAD to the more robust feature space for the first time. Specifically, we propose a small target aware detector (STAD), which introduces saliency maps to capture HSI features closer to human visual perception. STAD not only extracts more anomalous representations, but also reduces the impact of low-confidence regions through a proposed small target filter (STF). Furthermore, considering the possibility of HAD algorithms being applied to edge devices, we propose a knowledge distillation strategy from a fully connected network to a convolutional network. It can learn the spectral and spatial features of the HSI while lightening the network. We train the network on the HAD100 training set and validate the proposed method on the HAD100 test set. Our method provides a new solution space for HAD that is closer to human visual perception with high confidence. Extensive experiments on real HSI with multiple method comparisons demonstrate the excellent performance and unique potential of the proposed method. The code is available at https://github.com/majitao-xd/STAD-HAD.
Submitted 2 January, 2024;
originally announced January 2024.
-
Accelerating Discovery of Novel and Bioactive Ligands With Pharmacophore-Informed Generative Models
Authors:
Weixin Xie,
Jianhang Zhang,
Qin Xie,
Chaojun Gong,
Youjun Xu,
Luhua Lai,
Jianfeng Pei
Abstract:
Deep generative models have made significant advances in accelerating drug discovery by generating bioactive chemicals against desired targets. Nevertheless, most generated compounds that have been validated for potent bioactivity exhibit unsatisfactory levels of structural novelty, thereby providing limited inspiration to human medicinal chemists. The challenge for generative models lies in producing compounds that are both bioactive and novel, rather than merely making minor modifications to known actives present in the training set. Recognizing the utility of pharmacophores in facilitating scaffold hopping, we developed TransPharmer, an innovative generative model that integrates ligand-based interpretable pharmacophore fingerprints with a generative pre-training transformer (GPT) for de novo molecule generation. TransPharmer demonstrates superior performance across tasks involving unconditioned distribution learning, de novo generation and scaffold elaboration under pharmacophoric constraints. Its distinct exploration mode within the local chemical space renders it particularly useful for scaffold hopping, producing compounds that are structurally novel while pharmaceutically related. The efficacy of TransPharmer is validated through two case studies involving the dopamine receptor D2 (DRD2) and polo-like kinase 1 (PLK1). Notably, in the case of PLK1, three of the four designed compounds that were synthesized exhibit submicromolar activities, with the most potent one, IIP0943, demonstrating a potency of 5.1 nM. Featuring a new scaffold of 4-(benzo[b]thiophen-7-yloxy)pyrimidine, IIP0943 also exhibits high selectivity for PLK1. These results demonstrate that TransPharmer is a powerful tool for the discovery of novel and bioactive ligands.
Submitted 2 January, 2024;
originally announced January 2024.