-
Reconfigurable Intelligent Surface for Sensing, Communication, and Computation: Perspectives, Challenges, and Opportunities
Authors:
Bin Li,
Wancheng Xie,
Zesong Fei
Abstract:
Forthcoming 6G networks have two predominant features of wide coverage and sufficient computation capability. To support the promising applications, Integrated Sensing, Communication, and Computation (ISCC) has been considered as a vital enabler by completing the computation of raw data to achieve accurate environmental sensing. To help the ISCC networks better support the comprehensive services of radar detection, data transmission and edge computing, Reconfigurable Intelligent Surface (RIS) can be employed to boost the transmission rate and the wireless coverage by smartly tuning the electromagnetic characteristics of the environment. In this article, we propose an RIS-assisted ISCC framework and exploit the RIS benefits for improving radar sensing, communication and computing functionalities via cross-layer design, while discussing the key challenges. Then, two generic application scenarios are presented, i.e., unmanned aerial vehicles and Internet of vehicles. Finally, numerical results demonstrate the superiority of RIS-assisted ISCC, followed by a range of future research directions.
Submitted 16 July, 2024;
originally announced July 2024.
-
VISA: Reasoning Video Object Segmentation via Large Language Models
Authors:
Cilin Yan,
Haochen Wang,
Shilin Yan,
Xiaolong Jiang,
Yao Hu,
Guoliang Kang,
Weidi Xie,
Efstratios Gavves
Abstract:
Existing Video Object Segmentation (VOS) relies on explicit user instructions, such as categories, masks, or short phrases, restricting their ability to perform complex video segmentation requiring reasoning with world knowledge. In this paper, we introduce a new task, Reasoning Video Object Segmentation (ReasonVOS). This task aims to generate a sequence of segmentation masks in response to implicit text queries that require complex reasoning abilities based on world knowledge and video contexts, which is crucial for structured environment understanding and object-centric interactions, pivotal in the development of embodied AI. To tackle ReasonVOS, we introduce VISA (Video-based large language Instructed Segmentation Assistant), to leverage the world knowledge reasoning capabilities of multi-modal LLMs while possessing the ability to segment and track objects in videos with a mask decoder. Moreover, we establish a comprehensive benchmark consisting of 35,074 instruction-mask sequence pairs from 1,042 diverse videos, which incorporates complex world knowledge reasoning into segmentation tasks for instruction-tuning and evaluation purposes of ReasonVOS models. Experiments conducted on 8 datasets demonstrate the effectiveness of VISA in tackling complex reasoning segmentation and vanilla referring segmentation in both video and image domains. The code and dataset are available at https://github.com/cilinyan/VISA.
Submitted 15 July, 2024;
originally announced July 2024.
-
Ising-type quantum spin liquid state in PrMgAl$_{11}$O$_{19}$
Authors:
N. Li,
A. Rutherford,
Y. Y. Wang,
H. Liang,
Q. J. Li,
Z. J. Zhang,
H. Wang,
W. Xie,
H. D. Zhou,
X. F. Sun
Abstract:
We have grown single crystals of PrMgAl$_{11}$O$_{19}$, an ideal triangular-lattice antiferromagnet, and performed magnetic susceptibility, specific heat and thermal conductivity measurements at low temperatures. The main results are as follows: (i) The temperature-dependent susceptibility shows a negligible in-plane response and the isothermal magnetization curves confirm the easy axis along the $c$ axis. (ii) The specific heat measurements reveal the absence of long-range magnetic order down to 60 mK, and the power-law temperature dependence indicates the existence of gapless magnetic excitations in the system. (iii) The ultralow-temperature thermal conductivity exhibits a negligibly small residual term ($κ_0/T$) and a strong spin-phonon scattering effect, suggesting that the spin excitations are also involved. Our results further demonstrate that PrMgAl$_{11}$O$_{19}$ is a rare quantum spin liquid candidate with Ising-like anisotropy.
Submitted 15 July, 2024;
originally announced July 2024.
-
Triggering the Untriggered: The First Einstein Probe-Detected Gamma-Ray Burst 240219A and Its Implications
Authors:
Yi-Han Iris Yin,
Bin-Bin Zhang,
Jun Yang,
Hui Sun,
Chen Zhang,
Yi-Xuan Shao,
You-Dong Hu,
Zi-Pei Zhu,
Dong Xu,
Li An,
He Gao,
Xue-Feng Wu,
Bing Zhang,
Alberto Javier Castro-Tirado,
Shashi B. Pandey,
Arne Rau,
Weihua Lei,
Wei Xie,
Giancarlo Ghirlanda,
Luigi Piro,
Paul O'Brien,
Eleonora Troja,
Peter Jonker,
Yun-Wei Yu,
Jie An
, et al. (26 additional authors not shown)
Abstract:
The Einstein Probe (EP) achieved its first detection and localization of a bright X-ray flare, EP240219a, on February 19, 2024, during its commissioning phase. Subsequent targeted searches triggered by the EP240219a alert identified a faint, untriggered gamma-ray burst (GRB) in the archived data of Fermi/GBM, Swift/BAT, Insight-HXMT/HE and INTEGRAL/SPI-ACS. The EP/WXT light curve reveals a long duration of approximately 160 seconds with a slow decay, whereas the Fermi/GBM light curve shows a total duration of approximately 70 seconds. The peak in the Fermi/GBM light curve occurs slightly later with respect to the peak seen in the EP/WXT light curve. Our spectral analysis shows that a single cutoff power-law model effectively describes the joint EP/WXT-Fermi/GBM spectra in general, indicating coherent broad emission typical of GRBs. The model yielded a photon index of $\sim -1.70 \pm 0.05$ and a peak energy of $\sim 257 \pm 134$ keV. After detection of GRB 240219A, long-term observations identified several candidates in optical and radio wavelengths, none of which was confirmed as the afterglow counterpart during subsequent optical and near-infrared follow-ups. The analysis of GRB 240219A classifies it as an X-ray rich GRB with a high peak energy, presenting both challenges and opportunities for studying the physical origins of X-ray flashes (XRFs), X-ray rich GRBs (XRRs), and classical GRBs (C-GRBs). Furthermore, linking the cutoff power-law component to non-thermal synchrotron radiation suggests that the burst is driven by a Poynting flux-dominated outflow.
Submitted 14 July, 2024;
originally announced July 2024.
-
ENOVA: Autoscaling towards Cost-effective and Stable Serverless LLM Serving
Authors:
Tao Huang,
Pengfei Chen,
Kyoka Gong,
Jocky Hawk,
Zachary Bright,
Wenxin Xie,
Kecheng Huang,
Zhi Ji
Abstract:
With the increasing popularity of large language model (LLM) backend systems, it is common and necessary to deploy stable serverless serving of LLMs on multi-GPU clusters with autoscaling. However, challenges remain because the diversity and co-location of applications in multi-GPU clusters lead to low service quality and GPU utilization. To address them, we build ENOVA, a deployment, monitoring and autoscaling service for serverless LLM serving. ENOVA deconstructs the execution process of the LLM service comprehensively, based on which ENOVA designs a configuration recommendation module for automatic deployment on any GPU cluster and a performance detection module for autoscaling. On top of them, ENOVA implements a deployment execution engine for multi-GPU cluster scheduling. The experimental results show that ENOVA significantly outperforms other state-of-the-art methods and is suitable for wide deployment in large online systems.
Submitted 17 May, 2024;
originally announced July 2024.
-
Pseudosymmetry in Tetragonal Perovskite SrIrO$_3$ Synthesized under High Pressure
Authors:
Haozhe Wang,
Alberto de la Torre,
Joseph T. Race,
Qiaochu Wang,
Jacob P. C. Ruff,
Patrick M. Woodward,
Kemp W. Plumb,
David Walker,
Weiwei Xie
Abstract:
In this study, we report a tetragonal perovskite structure of SrIrO$_3$ (P4/mmm, a = 3.9362(9) Å, c = 7.880(3) Å) synthesized at 6 GPa and 1400 $°$C, employing the ambient pressure monoclinic SrIrO$_3$ with distorted 6H structure as a precursor. The crystal structure of tetragonal SrIrO$_3$ was evaluated on the basis of single crystal and powder X-ray diffraction. A cubic indexing was observed, attributed to overlooked superlattice reflections. Weak fractional peaks in the H and K dimensions suggest possible structure modulation by oxygen defects. Magnetization study reveals weak paramagnetic behavior down to 2 K, indicative of the interplay between spin-orbit coupling, electron correlations, and crystal electric field. Additionally, measurements of electrical resistivity display metallic behavior with an upturn at about 54 K, ascribed to weak electron localization and possible structural defects.
Submitted 10 July, 2024;
originally announced July 2024.
-
Bayesian Inference of Fine-Features of Nuclear Equation of State from Future Neutron Star Radius Measurements to 0.1km Accuracy
Authors:
Bao-An Li,
Xavier Grundler,
Wen-Jie Xie,
Nai-Bo Zhang
Abstract:
To more precisely constrain the Equation of State (EOS) of supradense neutron-rich nuclear matter, future high-precision X-ray and gravitational wave observatories are proposed to measure the radii of neutron stars (NSs) with an accuracy better than about 0.1 km. However, it remains unclear what particular aspects (other than the stiffness generally spoken of in the literature) of the EOS and to what precision they will be better constrained. In this work, within a Bayesian framework using a meta-model EOS for NSs, we infer the posterior probability distribution functions (PDFs) of incompressibility $K_{0}$ and skewness $J_{0}$ of symmetric nuclear matter (SNM) as well as the slope $L$, curvature $K_{\rm{sym}}$, and skewness $J_{\rm{sym}}$ characterizing the density dependence of nuclear symmetry energy $E_{\rm{sym}}(ρ)$, respectively, from mean values of NS radii consistent with existing observations and an expected accuracy $ΔR$ ranging from about 1.0 km to 0.1 km. We found that (1) the $ΔR$ has little effect on inferring the stiffness of SNM at suprasaturation densities, (2) smaller $ΔR$ reveals more accurately not only the PDFs but also pairwise correlations among parameters characterizing high-density $E_{\rm{sym}}(ρ)$, (3) a double-peak feature of the PDF($K_{\rm{sym}}$) corresponding to the strong $K_{\rm{sym}}-J_{\rm{sym}}$ and $K_{\rm{sym}}-L$ anti-correlations is revealed when $ΔR$ is less than about 0.2 km, (4) the high-precision radius measurement for canonical NSs is more useful than that for massive ones for constraining the EOS of nucleonic matter around (2-3) times the saturation density of SNM.
Submitted 10 July, 2024;
originally announced July 2024.
-
SurgicalGaussian: Deformable 3D Gaussians for High-Fidelity Surgical Scene Reconstruction
Authors:
Weixing Xie,
Junfeng Yao,
Xianpeng Cao,
Qiqin Lin,
Zerui Tang,
Xiao Dong,
Xiaohu Guo
Abstract:
Dynamic reconstruction of deformable tissues in endoscopic video is a key technology for robot-assisted surgery. Recent reconstruction methods based on neural radiance fields (NeRFs) have achieved remarkable results in the reconstruction of surgical scenes. However, based on implicit representation, NeRFs struggle to capture the intricate details of objects in the scene and cannot achieve real-time rendering. In addition, restricted single-view perception and occluded instruments also pose special challenges in surgical scene reconstruction. To address these issues, we develop SurgicalGaussian, a deformable 3D Gaussian Splatting method to model dynamic surgical scenes. Our approach models the spatio-temporal features of soft tissues at each time stamp via a forward-mapping deformation MLP and regularization to constrain local 3D Gaussians to comply with consistent movement. With the depth initialization strategy and tool mask-guided training, our method can remove surgical instruments and reconstruct high-fidelity surgical scenes. Through experiments on various surgical videos, our network outperforms existing methods in many aspects, including rendering quality, rendering speed and GPU usage. The project page can be found at https://surgicalgaussian.github.io.
Submitted 6 July, 2024;
originally announced July 2024.
-
SAM Fewshot Finetuning for Anatomical Segmentation in Medical Images
Authors:
Weiyi Xie,
Nathalie Willems,
Shubham Patil,
Yang Li,
Mayank Kumar
Abstract:
We propose a straightforward yet highly effective few-shot fine-tuning strategy for adapting the Segment Anything (SAM) to anatomical segmentation tasks in medical images. Our novel approach revolves around reformulating the mask decoder within SAM, leveraging few-shot embeddings derived from a limited set of labeled images (few-shot collection) as prompts for querying anatomical objects captured in image embeddings. This innovative reformulation greatly reduces the need for time-consuming online user interactions for labeling volumetric images, such as exhaustively marking points and bounding boxes to provide prompts slice by slice. With our method, users can manually segment a few 2D slices offline, and the embeddings of these annotated image regions serve as effective prompts for online segmentation tasks. Our method prioritizes the efficiency of the fine-tuning process by exclusively training the mask decoder through caching mechanisms while keeping the image encoder frozen. Importantly, this approach is not limited to volumetric medical images, but can generically be applied to any 2D/3D segmentation task. To thoroughly evaluate our method, we conducted extensive validation on four datasets, covering six anatomical segmentation tasks across two modalities. Furthermore, we conducted a comparative analysis of different prompting options within SAM and the fully-supervised nnU-Net. The results demonstrate the superior performance of our method compared to SAM employing only point prompts (approximately 50% improvement in IoU) and show that it performs on par with fully supervised methods whilst reducing the requirement of labeled data by at least an order of magnitude.
Submitted 5 July, 2024;
originally announced July 2024.
-
Semi-Supervised Segmentation via Embedding Matching
Authors:
Weiyi Xie,
Nathalie Willems,
Nikolas Lessmann,
Tom Gibbons,
Daniele De Massari
Abstract:
Deep convolutional neural networks are widely used in medical image segmentation but require many labeled images for training. Annotating three-dimensional medical images is a time-consuming and costly process. To overcome this limitation, we propose a novel semi-supervised segmentation method that leverages mostly unlabeled images and a small set of labeled images in training. Our approach involves assessing prediction uncertainty to identify reliable predictions on unlabeled voxels from the teacher model. These voxels serve as pseudo-labels for training the student model. In voxels where the teacher model produces unreliable predictions, pseudo-labeling is carried out based on voxel-wise embedding correspondence using reference voxels from labeled images. We applied this method to automate hip bone segmentation in CT images, achieving notable results with just 4 CT scans. The proposed approach yielded a Hausdorff distance with 95th percentile (HD95) of 3.30 and IoU of 0.929, surpassing existing methods achieving HD95 (4.07) and IoU (0.927) at their best.
Submitted 5 July, 2024;
originally announced July 2024.
-
Structural Constraint Integration in Generative Model for Discovery of Quantum Material Candidates
Authors:
Ryotaro Okabe,
Mouyang Cheng,
Abhijatmedhi Chotrattanapituk,
Nguyen Tuan Hung,
Xiang Fu,
Bowen Han,
Yao Wang,
Weiwei Xie,
Robert J. Cava,
Tommi S. Jaakkola,
Yongqiang Cheng,
Mingda Li
Abstract:
Billions of organic molecules are known, but only a tiny fraction of the functional inorganic materials have been discovered, a particularly relevant problem to the community searching for new quantum materials. Recent advancements in machine-learning-based generative models, particularly diffusion models, show great promise for generating new, stable materials. However, integrating geometric patterns into materials generation remains a challenge. Here, we introduce Structural Constraint Integration in the GENerative model (SCIGEN). Our approach can modify any trained generative diffusion model by strategic masking of the denoised structure with a diffused constrained structure prior to each diffusion step to steer the generation toward constrained outputs. Furthermore, we mathematically prove that SCIGEN effectively performs conditional sampling from the original distribution, which is crucial for generating stable constrained materials. We generate eight million compounds using Archimedean lattices as prototype constraints, with over 10% surviving a multi-staged stability pre-screening. High-throughput density functional theory (DFT) on 26,000 survived compounds shows that over 50% passed structural optimization at the DFT level. Since the properties of quantum materials are closely related to geometric patterns, our results indicate that SCIGEN provides a general framework for generating quantum materials candidates.
Submitted 5 July, 2024;
originally announced July 2024.
-
Pressure Tuning the Mixture of Eu$^{2+}$ and Eu$^{3+}$ in Eu$_4$Bi$_6$Se$_{13}$
Authors:
Mingyu Xu,
Jose L. Gonzalez Jimenez,
Greeshma C. Jose,
Artittaya Boonkird,
Chengkun Xing,
Chelsea Harrod,
Xinle Li,
Haidong Zhou,
Alyssa Gaiser,
Xianglin Ke,
Wenli Bi,
Mingda Li,
Weiwei Xie
Abstract:
An investigation of the crystallographic, electronic, and magnetic characteristics, especially the mixed valences of Eu$^{2+}$ and Eu$^{3+}$ under pressure, of a novel europium-based bismuth selenide compound, Eu$_4$Bi$_6$Se$_{13}$, is presented. This new compound adopts a monoclinic crystal structure classified under the P$2_1$/m space group (#11). It exhibits distinctive structural features, including substantial Eu-Se coordination numbers, Bi-Se ladders, and linear chains of Eu atoms that propagate along the b-axis. Electronic resistivity assessments indicate that Eu$_{4}$Bi$_{6}$Se$_{13}$ exhibits weak metallic behaviors. Magnetic characterization reveals uniaxial magnetic anisotropy, with a notable spin transition at approximately 1.2 T when the magnetic field is oriented along the b-axis. This behavior, coupled with the specific Eu-Eu interatomic distances and the magnetic saturation observed at low fields, supports the identification of metamagnetic properties attributable to the flipping of europium spins. The Curie-Weiss analysis of the magnetic susceptibility measured both perpendicular and parallel to the b-axis and high-pressure partial fluorescence yield (PFY) results detected by X-ray absorption spectroscopy (XAS) reveal the tendency of the material to enter a mixed valent state where the trivalent state becomes more prominent with the pressure increase or temperature decrease.
Submitted 28 June, 2024;
originally announced July 2024.
-
A Sanity Check for AI-generated Image Detection
Authors:
Shilin Yan,
Ouxiang Li,
Jiayin Cai,
Yanbin Hao,
Xiaolong Jiang,
Yao Hu,
Weidi Xie
Abstract:
With the rapid development of generative models, discerning AI-generated content has evoked increasing attention from both industry and academia. In this paper, we conduct a sanity check on "whether the task of AI-generated image detection has been solved". To start with, we present the Chameleon dataset, consisting of AI-generated images that are genuinely challenging for human perception. To quantify the generalization of existing methods, we evaluate 9 off-the-shelf AI-generated image detectors on the Chameleon dataset. Upon analysis, almost all models classify AI-generated images as real ones. Later, we propose AIDE (AI-generated Image DEtector with Hybrid Features), which leverages multiple experts to simultaneously extract visual artifacts and noise patterns. Specifically, to capture the high-level semantics, we utilize CLIP to compute the visual embedding. This effectively enables the model to discern AI-generated images based on semantics or contextual information. Secondly, we select the highest frequency patches and the lowest frequency patches in the image, and compute the low-level patchwise features, aiming to detect AI-generated images by low-level artifacts, for example, noise pattern, anti-aliasing, etc. When evaluated on existing benchmarks, for example, AIGCDetectBenchmark and GenImage, AIDE achieves +3.5% and +4.6% improvements over state-of-the-art methods, and on our proposed challenging Chameleon benchmark, it also achieves promising results, although the problem of detecting AI-generated images is far from being solved. The dataset, codes, and pre-trained models will be published at https://github.com/shilinyan99/AIDE.
Submitted 27 June, 2024;
originally announced June 2024.
-
MatchTime: Towards Automatic Soccer Game Commentary Generation
Authors:
Jiayuan Rao,
Haoning Wu,
Chang Liu,
Yanfeng Wang,
Weidi Xie
Abstract:
Soccer is a globally popular sport with a vast audience. In this paper, we consider constructing an automatic soccer game commentary model to improve the audiences' viewing experience. In general, we make the following contributions: First, observing the prevalent video-text misalignment in existing datasets, we manually annotate timestamps for 49 matches, establishing a more robust benchmark for soccer game commentary generation, termed SN-Caption-test-align; Second, we propose a multi-modal temporal alignment pipeline to automatically correct and filter the existing dataset at scale, creating a higher-quality soccer game commentary dataset for training, denoted as MatchTime; Third, based on our curated dataset, we train an automatic commentary generation model, named MatchVoice. Extensive experiments and ablation studies have demonstrated the effectiveness of our alignment pipeline, and training models on the curated datasets achieves state-of-the-art performance for commentary generation, showcasing that better alignment can lead to significant performance improvements in downstream tasks.
Submitted 26 June, 2024;
originally announced June 2024.
-
LLMs for Doctors: Leveraging Medical LLMs to Assist Doctors, Not Replace Them
Authors:
Wenya Xie,
Qingying Xiao,
Yu Zheng,
Xidong Wang,
Junying Chen,
Ke Ji,
Anningzhe Gao,
Xiang Wan,
Feng Jiang,
Benyou Wang
Abstract:
The recent success of Large Language Models (LLMs) has had a significant impact on the healthcare field, providing patients with medical advice, diagnostic information, and more. However, due to a lack of professional medical knowledge, patients are easily misled by generated erroneous information from LLMs, which may result in serious medical problems. To address this issue, we focus on tuning the LLMs to be medical assistants who collaborate with more experienced doctors. We first conduct a two-stage survey by inspiration-feedback to gain a broad understanding of the real needs of doctors for medical assistants. Based on this, we construct a Chinese medical dataset called DoctorFLAN to support the entire workflow of doctors, which includes 92K Q&A samples from 22 tasks and 27 specialists. Moreover, we evaluate LLMs in doctor-oriented scenarios by constructing the DoctorFLAN-test containing 550 single-turn Q&A and DotaBench containing 74 multi-turn conversations. The evaluation results indicate that being a medical assistant still poses challenges for existing open-source models, but DoctorFLAN can help them significantly. It demonstrates that the doctor-oriented dataset and benchmarks we construct can complement existing patient-oriented work and better promote medical LLMs research.
Submitted 25 June, 2024;
originally announced June 2024.
-
RaTEScore: A Metric for Radiology Report Generation
Authors:
Weike Zhao,
Chaoyi Wu,
Xiaoman Zhang,
Ya Zhang,
Yanfeng Wang,
Weidi Xie
Abstract:
This paper introduces a novel, entity-aware metric, termed as Radiological Report (Text) Evaluation (RaTEScore), to assess the quality of medical reports generated by AI models. RaTEScore emphasizes crucial medical entities such as diagnostic outcomes and anatomical details, and is robust against complex medical synonyms and sensitive to negation expressions. Technically, we developed a comprehensive medical NER dataset, RaTE-NER, and trained an NER model specifically for this purpose. This model enables the decomposition of complex radiological reports into constituent medical entities. The metric itself is derived by comparing the similarity of entity embeddings, obtained from a language model, based on their types and relevance to clinical significance. Our evaluations demonstrate that RaTEScore aligns more closely with human preference than existing metrics, validated both on established public benchmarks and our newly proposed RaTE-Eval benchmark.
Submitted 24 June, 2024;
originally announced June 2024.
-
Single-Layer Fe-Cu Interphase in Ferritic Steels Stabilized by Magnetic Friedel Oscillations
Authors:
Wen-Qiang Xie,
Jin-Li Cao,
Wen-Tong Geng
Abstract:
Copper precipitation is a technique extensively deployed in steel strengthening. Being as tiny as a few nanometers in diameter, the Cu precipitates present a real challenge to experimental techniques in determining their composition. The late Professor Morris Fine called it a mystery when addressing the discrepancy between the low solubility of Fe in bulk Cu and the remarkable content of Fe in Cu precipitates according to atom probe tomography measurements. With a thorough search using rigorous first-principles density functional theory calculations, we are surprised to find that the interface between Cu precipitate and Fe matrix is neither immiscibly sharp, nor commonly miscible over a range of atomic layers, but rather manifests itself as a single-layer mixed Fe-Cu interphase. Our detailed analysis reveals that spin polarization is a key factor in defining such an interphase. The magnetic Friedel oscillations through spin polarization are the driving force that stabilizes it. When the single-layer Fe-Cu interphase is counted in, the content of Fe in Cu precipitates would be significant due to their small size. Our finding is a strong demonstration that quantum mechanical effects such as Friedel oscillations can also have explicit consequences in structural materials. The accurate knowledge of the Fe-Cu interface is crucial for the design of advanced high-strength steels.
Submitted 23 June, 2024;
originally announced June 2024.
-
Orbit symmetry breaking in MXene implements enhanced soft bioelectronic implants
Authors:
Yizhang Wu,
Yuan Li,
Yihan Liu,
Dashuai Zhu,
Sicheng Xing,
Noah Lambert,
Hannah Weisbecker,
Siyuan Liu,
Brayden Davis,
Lin Zhang,
Meixiang Wang,
Gongkai Yuan,
Chris Zhoufan You,
Anran Zhang,
Cate Duncan,
Wanrong Xie,
Yihang Wang,
Yong Wang,
Sreya Kanamurlapudi,
Garcia-Guzman Evert,
Arjun Putcha,
Michael D. Dickey,
Ke Huang,
Wubin Bai
Abstract:
Bioelectronic implants with soft mechanics, biocompatibility, and excellent electrical performance enable biomedical implants to record electrophysiological signals and execute interventions within internal organs, promising to revolutionize the diagnosis, monitoring, and treatment of various pathological conditions. However, challenges remain in reducing the excessive impedance at the bioelectronic-tissue interface and thus improving the efficacy of electrophysiological signaling and intervention. Here, we devise orbit symmetry breaking in MXene, a low-cost, scalable, biocompatible, and conductive 2D layered material. The resulting material, which we refer to as OBXene, exhibits low bioelectronic-tissue impedance originating from out-of-plane charge transfer. Furthermore, the Schottky-induced piezoelectricity stemming from the asymmetric orbital configuration of OBXene facilitates interlayered charge transport in the device. In this study, we report an OBXene-based cardiac patch applied on the left ventricular epicardium of both rodent and porcine models to enable spatiotemporal epicardium mapping and pacing, while coupling the wireless and battery-free operation for long-term real-time recording and closed-loop stimulation.
Submitted 19 June, 2024;
originally announced June 2024.
-
Resilience of international oil trade networks under extreme event shock-recovery simulations
Authors:
Na Wei,
Wen-Jie Xie,
Wei-Xing Zhou
Abstract:
With the frequent occurrence of black swan events, the global energy security situation has become increasingly complex and severe. Assessing the resilience of the international oil trade network (iOTN) is crucial for evaluating its ability to withstand extreme shocks and recover thereafter, ensuring energy security. We overcome the limitations of discrete historical data by developing a simulation model for extreme event shock-recovery in the iOTNs. We introduce a network efficiency indicator to measure oil resource allocation efficiency and evaluate network performance. Then, we construct a resilience index to explore the resilience of the iOTNs from the dimensions of resistance and recoverability. Our findings indicate that extreme events can lead to sharp declines in the performance of the iOTNs, especially when economies with significant trading positions and relations suffer shocks. The upward trend in recoverability and resilience reflects the self-organizing nature of the iOTNs, demonstrating their capacity for optimizing their own structure and functionality. Unlike traditional energy security research based solely on discrete historical data or resistance indicators, our model evaluates resilience from multiple dimensions, offering insights for global energy governance systems while providing diverse perspectives for various economies to mitigate risks and uphold energy security.
Submitted 17 June, 2024;
originally announced June 2024.
-
Numerical Modeling of Transient Absorption in Hybrid Dual-Plasmonic Au/CuS Nanostructures
Authors:
Atefeh Habibpourmoghadam,
Wenyong Xie,
Patrick Bessel,
André Niebur,
Artsiom Antanovich,
Dirk Dorfs,
Jannika Lauth,
Antonio Calà Lesina
Abstract:
Transient absorption in plasmonic materials has recently attracted the attention of the chemistry and optics communities as a technique to understand dynamic processes and hot-carrier generation on ultrafast timescales. In this context, hybrid Au/CuS nanostructures were recently investigated via ultrafast pump-probe transient absorption spectroscopy, revealing an exotic dual-plasmonic behavior. Namely, the excitation of a localized surface plasmon resonance (LSPR) in Au (pump at 551 nm) or CuS (pump at 1051 nm) leads to a transient response in the counterpart. This phenomenon was attributed to Landau damping, which stems from hot-carrier generation and injection mechanisms at the interface between the two materials. Here, we employ numerical modeling to further clarify the origin of this response in hybrid Au/CuS nanostructures. The geometry of the hybrid nanostructures is first investigated via steady-state simulations (probe only), confirming a UFO-shaped configuration. We clarify the role of the size ratio between Au and CuS. Finally, we present the simulation of transient absorption in the pump-probe regime, which qualitatively replicates our experimental observations, thus identifying the plasmonic response modified via Landau damping as the main governing mechanism. Our numerical approach provides an important tool for the modeling of transient absorption spectroscopy and can support experimental research on dual-plasmonic materials for applications in spectroscopy, photocatalysis, thermoplasmonics, sensing, and energy harvesting.
Submitted 14 June, 2024;
originally announced June 2024.
-
Jet modification via $π^0$-hadron correlations in Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV
Authors:
PHENIX Collaboration,
N. J. Abdulameer,
U. Acharya,
A. Adare,
S. Afanasiev,
C. Aidala,
N. N. Ajitanand,
Y. Akiba,
H. Al-Bataineh,
J. Alexander,
M. Alfred,
K. Aoki,
N. Apadula,
L. Aphecetche,
J. Asai,
H. Asano,
E. T. Atomssa,
R. Averbeck,
T. C. Awes,
B. Azmoun,
V. Babintsev,
M. Bai,
G. Baksay,
L. Baksay,
A. Baldisseri
, et al. (510 additional authors not shown)
Abstract:
High-momentum two-particle correlations are a useful tool for studying jet-quenching effects in the quark-gluon plasma. Angular correlations between neutral-pion triggers and charged hadrons with transverse momenta in the range 4--12~GeV/$c$ and 0.5--7~GeV/$c$, respectively, have been measured by the PHENIX experiment in 2014 for Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$~GeV. Suppression is observed in the yield of high-momentum jet fragments opposite the trigger particle, which indicates jet suppression stemming from in-medium partonic energy loss, while enhancement is observed for low-momentum particles. The ratio and differences between the yield in Au$+$Au collisions and $p$$+$$p$ collisions, $I_{AA}$ and $Δ_{AA}$, as a function of the trigger-hadron azimuthal separation, $Δφ$, are measured for the first time at the Relativistic Heavy Ion Collider. These results better quantify how the yield of low-$p_T$ associated hadrons is enhanced at wide angle, which is crucial for studying energy loss as well as medium-response effects.
Submitted 12 June, 2024;
originally announced June 2024.
-
Bayesian inference of nuclear incompressibility from proton elliptic flow in central Au+Au collisions at 400 MeV/nucleon
Authors:
J. M. Wang,
X. G. Deng,
W. J. Xie,
B. A. Li,
Y. G. Ma
Abstract:
The incompressibility $K$ of symmetric nuclear matter (SNM) is inferred in a Bayesian analysis of proton elliptic flow in mid-central Au + Au collisions at $E = 400$ MeV/nucleon using a Gaussian process (GP) emulator of the isospin-dependent quantum molecular dynamics (IQMD) model for heavy-ion collisions, with or without considering the momentum dependence of single-nucleon potentials. Consistent but with smaller quantified uncertainties than previous results from forward modeling of the collective flow in heavy-ion collisions using IQMD, considering the momentum dependence of nucleon potentials, $K=191.3^{+3.7}_{-6.3}$ MeV at 68\% confidence level, indicating a very soft SNM equation of state, is inferred from the combined data of the rapidity and transverse momentum dependence of the proton elliptic flow in the Au+Au collisions considered. Ignoring the momentum dependence of single-nucleon potentials, the extracted value for $K$ is $234.7^{+14.6}_{-11.4}$ MeV, in agreement with its fiducial value derived from giant resonance studies.
Submitted 11 June, 2024;
originally announced June 2024.
-
Mitigating Boundary Ambiguity and Inherent Bias for Text Classification in the Era of Large Language Models
Authors:
Zhenyi Lu,
Jie Tian,
Wei Wei,
Xiaoye Qu,
Yu Cheng,
Wenfeng Xie,
Dangyang Chen
Abstract:
Text classification is a crucial task encountered frequently in practical scenarios, yet it is still under-explored in the era of large language models (LLMs). This study shows that LLMs are vulnerable to changes in the number and arrangement of options in text classification. Our extensive empirical analyses reveal that the key bottleneck arises from ambiguous decision boundaries and inherent biases towards specific tokens and positions. To mitigate these issues, we make the first attempt and propose a novel two-stage classification framework for LLMs. Our approach is grounded in the empirical observation that pairwise comparisons can effectively alleviate boundary ambiguity and inherent bias. Specifically, we begin with a self-reduction technique to efficiently narrow down numerous options, which contributes to a reduced decision space and a faster comparison process. Subsequently, pairwise contrastive comparisons are employed in a chain-of-thought manner to draw out nuances and distinguish confusable options, thus refining the ambiguous decision boundary. Extensive experiments on four datasets (Banking77, HWU64, LIU54, and Clinic150) verify the effectiveness of our framework. Furthermore, benefitting from our framework, various LLMs can achieve consistent improvements. Our code and data are available at https://github.com/Chuge0335/PC-CoT.
Submitted 11 June, 2024;
originally announced June 2024.
-
Break the Chain: Large Language Models Can be Shortcut Reasoners
Authors:
Mengru Ding,
Hanmeng Liu,
Zhizhang Fu,
Jian Song,
Wenbo Xie,
Yue Zhang
Abstract:
Recent advancements in Chain-of-Thought (CoT) reasoning utilize complex modules but are hampered by high token consumption, limited applicability, and challenges in reproducibility. This paper conducts a critical evaluation of CoT prompting, extending beyond arithmetic to include complex logical and commonsense reasoning tasks, areas where standard CoT methods fall short. We propose the integration of human-like heuristics and shortcuts into language models (LMs) through "break the chain" strategies. These strategies disrupt traditional CoT processes using controlled variables to assess their efficacy. Additionally, we develop innovative zero-shot prompting strategies that encourage the use of shortcuts, enabling LMs to quickly exploit reasoning clues and bypass detailed procedural steps. Our comprehensive experiments across various LMs, both commercial and open-source, reveal that LMs maintain effective performance with "break the chain" strategies. We also introduce ShortcutQA, a dataset specifically designed to evaluate reasoning through shortcuts, compiled from competitive tests optimized for heuristic reasoning tasks such as forward/backward reasoning and simplification. Our analysis confirms that ShortcutQA not only poses a robust challenge to LMs but also serves as an essential benchmark for enhancing reasoning efficiency in AI.
Submitted 4 June, 2024;
originally announced June 2024.
-
PGSR: Planar-based Gaussian Splatting for Efficient and High-Fidelity Surface Reconstruction
Authors:
Danpeng Chen,
Hai Li,
Weicai Ye,
Yifan Wang,
Weijian Xie,
Shangjin Zhai,
Nan Wang,
Haomin Liu,
Hujun Bao,
Guofeng Zhang
Abstract:
Recently, 3D Gaussian Splatting (3DGS) has attracted widespread attention due to its high-quality rendering, and ultra-fast training and rendering speed. However, due to the unstructured and irregular nature of Gaussian point clouds, it is difficult to guarantee geometric reconstruction accuracy and multi-view consistency simply by relying on image reconstruction loss. Although many studies on surface reconstruction based on 3DGS have emerged recently, the quality of their meshes is generally unsatisfactory. To address this problem, we propose a fast planar-based Gaussian splatting reconstruction representation (PGSR) to achieve high-fidelity surface reconstruction while ensuring high-quality rendering. Specifically, we first introduce an unbiased depth rendering method, which directly renders the distance from the camera origin to the Gaussian plane and the corresponding normal map based on the Gaussian distribution of the point cloud, and divides the two to obtain the unbiased depth. We then introduce single-view geometric, multi-view photometric, and geometric regularization to preserve global geometric accuracy. We also propose a camera exposure compensation model to cope with scenes with large illumination variations. Experiments on indoor and outdoor scenes show that our method achieves fast training and rendering while maintaining high-fidelity rendering and geometric reconstruction, outperforming 3DGS-based and NeRF-based methods.
Submitted 10 June, 2024;
originally announced June 2024.
-
Position Debiasing Fine-Tuning for Causal Perception in Long-Term Dialogue
Authors:
Shixuan Fan,
Wei Wei,
Wendi Li,
Xian-Ling Mao,
Wenfeng Xie,
Dangyang Chen
Abstract:
The core of the dialogue system is to generate relevant, informative, and human-like responses based on extensive dialogue history. Recently, the dialogue generation domain has seen mainstream adoption of large language models (LLMs), due to their powerful capability in generating utterances. However, such models have a natural deficiency, namely an inherent position bias, which may lead them to pay more attention to nearby utterances instead of causally relevant ones, resulting in irrelevant and generic responses in long-term dialogue. To alleviate this problem, we propose a novel method, named Causal Perception long-term Dialogue framework (CPD), which employs a perturbation-based causal variable discovery method to extract causally relevant utterances from the dialogue history and enhances model causal perception during fine-tuning. Specifically, a local-position awareness method is proposed in CPD for inter-sentence position correlation elimination, which helps models extract causally relevant utterances based on perturbations. Then, a causal-perception fine-tuning strategy is also proposed to enhance the capability of discovering the causal invariant factors, by differently perturbing causally relevant and non-causally relevant utterances for response generation. Experimental results on two datasets show that our proposed method can effectively alleviate the position bias for multiple LLMs and achieve significant progress compared with existing baselines.
Submitted 4 June, 2024;
originally announced June 2024.
-
Improving Segment Anything on the Fly: Auxiliary Online Learning and Adaptive Fusion for Medical Image Segmentation
Authors:
Tianyu Huang,
Tao Zhou,
Weidi Xie,
Shuo Wang,
Qi Dou,
Yizhe Zhang
Abstract:
The current variants of the Segment Anything Model (SAM), which include the original SAM and Medical SAM, still lack the capability to produce sufficiently accurate segmentation for medical images. In medical imaging contexts, it is not uncommon for human experts to rectify segmentations of specific test samples after SAM generates its segmentation predictions. These rectifications typically entail manual or semi-manual corrections employing state-of-the-art annotation tools. Motivated by this process, we introduce a novel approach that leverages the advantages of online machine learning to enhance Segment Anything (SA) during test time. We employ rectified annotations to perform online learning, with the aim of improving the segmentation quality of SA on medical images. To improve the effectiveness and efficiency of online learning when integrated with large-scale vision models like SAM, we propose a new method called Auxiliary Online Learning (AuxOL). AuxOL creates and applies a small auxiliary model (specialist) in conjunction with SAM (generalist), and entails adaptive online-batch and adaptive segmentation fusion. Experiments conducted on eight datasets covering four medical imaging modalities validate the effectiveness of the proposed method. Our work proposes and validates a new, practical, and effective approach for enhancing SA on downstream segmentation tasks (e.g., medical image segmentation).
Submitted 2 June, 2024;
originally announced June 2024.
-
LLMs Could Autonomously Learn Without External Supervision
Authors:
Ke Ji,
Junying Chen,
Anningzhe Gao,
Wenya Xie,
Xiang Wan,
Benyou Wang
Abstract:
In the quest for super-human performance, Large Language Models (LLMs) have traditionally been tethered to human-annotated datasets and predefined training objectives, a process that is both labor-intensive and inherently limited. This paper presents a transformative approach: Autonomous Learning for LLMs, a self-sufficient learning paradigm that frees models from the constraints of human supervision. This method endows LLMs with the ability to self-educate through direct interaction with text, akin to a human reading and comprehending literature. Our approach eliminates the reliance on annotated data, fostering an Autonomous Learning environment where the model independently identifies and reinforces its knowledge gaps. Empirical results from our comprehensive experiments, which utilized a diverse array of learning materials and were evaluated against standard public quizzes, reveal that Autonomous Learning outstrips the performance of both Pre-training and Supervised Fine-Tuning (SFT), as well as retrieval-augmented methods. These findings underscore the potential of Autonomous Learning to not only enhance the efficiency and effectiveness of LLM training but also to pave the way for the development of more advanced, self-reliant AI systems.
Submitted 6 June, 2024; v1 submitted 1 June, 2024;
originally announced June 2024.
-
Fast characterization of multiplexed single-electron pumps with machine learning
Authors:
N. Schoinas,
Y. Rath,
S. Norimoto,
W. Xie,
P. See,
J. P. Griffiths,
C. Chen,
D. A. Ritchie,
M. Kataoka,
A. Rossi,
I. Rungger
Abstract:
We present an efficient machine learning based automated framework for the fast tuning of single-electron pump devices into current quantization regimes. It uses a sparse measurement approach based on an iterative active learning algorithm to take targeted measurements in the gate voltage parameter space. When compared to conventional parameter scans, our automated framework allows us to decrease the number of measurement points by about an order of magnitude. This corresponds to an eight-fold decrease in the time required to determine quantization errors, which are estimated via an exponential extrapolation of the first current plateau embedded into the algorithm. We show the robustness of the framework by characterizing 28 individual devices arranged in a GaAs/AlGaAs multiplexer array, which we use to identify a subset of devices suitable for parallel operation at communal gate voltages. The method opens up the possibility to efficiently scale the characterization of such multiplexed devices to a large number of pumps.
Submitted 31 May, 2024;
originally announced May 2024.
-
Generative AI for Deep Reinforcement Learning: Framework, Analysis, and Use Cases
Authors:
Geng Sun,
Wenwen Xie,
Dusit Niyato,
Fang Mei,
Jiawen Kang,
Hongyang Du,
Shiwen Mao
Abstract:
As a form of artificial intelligence (AI) technology based on interactive learning, deep reinforcement learning (DRL) has been widely applied across various fields and has achieved remarkable accomplishments. However, DRL faces certain limitations, including low sample efficiency and poor generalization. Therefore, in this paper, we present how to leverage generative AI (GAI) to address these issues and enhance the performance of DRL algorithms. We first introduce several classic GAI and DRL algorithms and demonstrate the applications of GAI-enhanced DRL algorithms. Then, we discuss how to use GAI to improve DRL algorithms from the data and policy perspectives. Subsequently, we introduce a framework that demonstrates an actual and novel integration of GAI with DRL, i.e., GAI-enhanced DRL. Additionally, we provide a case study of the framework on UAV-assisted integrated near-field/far-field communication to validate the performance of the proposed framework. Moreover, we present several future directions. Finally, the related code is available at: https://xiewenwen22.github.io/GAI-enhanced-DRL.
Submitted 30 May, 2024;
originally announced May 2024.
-
Grade Like a Human: Rethinking Automated Assessment with Large Language Models
Authors:
Wenjing Xie,
Juxin Niu,
Chun Jason Xue,
Nan Guan
Abstract:
While large language models (LLMs) have been used for automated grading, they have not yet achieved the same level of performance as humans, especially when it comes to grading complex questions. Existing research on this topic focuses on a particular step in the grading procedure: grading using predefined rubrics. However, grading is a multifaceted procedure that encompasses other crucial steps, such as grading rubrics design and post-grading review. There has been a lack of systematic research exploring the potential of LLMs to enhance the entire grading process.
In this paper, we propose an LLM-based grading system that addresses the entire grading procedure, including the following key components: 1) Developing grading rubrics that not only consider the questions but also the student answers, which can more accurately reflect students' performance. 2) Under the guidance of grading rubrics, providing accurate and consistent scores for each student, along with customized feedback. 3) Conducting post-grading review to better ensure accuracy and fairness. Additionally, we collected a new dataset named OS from a university operating system course and conducted extensive experiments on both our new dataset and the widely used Mohler dataset. Experiments demonstrate the effectiveness of our proposed approach, providing some new insights for developing automated grading systems based on LLMs.
Submitted 30 May, 2024;
originally announced May 2024.
-
Effect of Halogen Substituents on Charge Transport Properties of n-type Organic Semiconductors: A Theoretical Study
Authors:
Sara Roosta,
Marcus Elstner,
Weiwei Xie
Abstract:
Organic semiconductors (OSCs) have received much attention as promising materials for electronic devices. In this study, we investigate the impact of halogen groups on the charge transport properties of n-type OSC-6,13 bis ((triisopropylsilyl) ethynyl)-5,7,12,14-tetraazapentacene (TIPS-TAP). The computed mobilities for TAPs substituted with F and Cl exhibit excellent agreement with the experimental values, while the simulation overestimates the electron mobility for TAP. Interestingly, the mobility of TIPS-TAP-4F is significantly lower than that of TIPS-TAP-4Cl/Br, despite their similar packing structures. This discrepancy can be attributed to the strong electron-withdrawing effect of fluoride, leading to reduced electron transfer integrals and increased reorganization energy. While molecular packing is widely accepted as a dominant factor in charge transport in OSCs, our study highlights the essential role of electronic effects in OSC charge transport. This study provides new insights into the understanding of the charge transport mechanism in OSCs.
Submitted 27 May, 2024;
originally announced May 2024.
-
Spinons in a new Shastry-Sutherland lattice magnet Pr$_2$Ga$_2$BeO$_7$
Authors:
N. Li,
A. Brassington,
M. F. Shu,
Y. Y. Wang,
H. Liang,
Q. J. Li,
X. Zhao,
P. J. Baker,
H. Kikuchi,
T. Masuda,
G. Duan,
C. Liu,
H. Wang,
W. Xie,
R. Zhong,
J. Ma,
R. Yu,
H. D. Zhou,
X. F. Sun
Abstract:
Identifying the elusive spinon excitations in quantum spin liquid (QSL) materials is a long-sought goal. Recently, thermal conductivity ($κ$) has emerged as a decisive probe because the fermionic nature of spinons leads to a characteristic nonzero linear $κ_0/T$ term while approaching zero Kelvin. So far, only a few systems have been reported to exhibit such a term. Here, we report a $κ_0/T \approx$ 0.01 WK$^{-2}$m$^{-1}$, the largest $κ_0/T$ value ever observed in magnetic oxide QSL candidates, in a new quantum magnet Pr$_2$Ga$_2$BeO$_7$ with a Shastry-Sutherland lattice (SSL). Its QSL nature is further supported by the power-law temperature dependence of the specific heat, a plateau of muon spin relaxation rate, and gapless inelastic neutron spectra. Our theoretical analysis reveals that the introduction of XY spin anisotropy is the key for Pr$_2$Ga$_2$BeO$_7$ to be the first QSL realized on the SSL, after more than four decades of extensive studies on this celebrated magnetically frustrated lattice.
Submitted 22 May, 2024;
originally announced May 2024.
-
Temperature-dependent Structural Evolution of Ruddlesden-Popper Bilayer Nickelate La$_3$Ni$_2$O$_7$
Authors:
Haozhe Wang,
Haidong Zhou,
Weiwei Xie
Abstract:
A recent $J. Am. Chem. Soc.$ Article (DOI: 10.1021/jacs.3c13094) details a pressure-temperature ($P$-$T$) phase diagram for the Ruddlesden-Popper bilayer nickelate La$_3$Ni$_2$O$_7$ (LNO-2222) using synchrotron X-ray diffraction. This study identifies a phase transition from $Amam$ (#63) to $Fmmm$ (#69) within the temperature range of 104 K to 120 K under initial pressure and attributes the $I4/mmm$ (#139) space group to the structure responsible for the superconductivity of LNO-2222. Herein, we examine the temperature-dependent structural evolution of LNO-2222 single crystals at ambient pressure. Contrary to symmetry increase and the established $Amam$-$Fmmm$ phase boundary, we observe an enhancement in the $Cmcm$ reflections as temperature decreases. This work not only establishes a benchmark method for single crystal structure studies of LNO-2222 using laboratory X-rays, but also enhances the understanding of the complex crystallographic behavior of this system, contributing insights to further experimental and theoretical explorations.
Submitted 14 May, 2024;
originally announced May 2024.
-
Magnetic properties of the quasi-XY Shastry-Sutherland magnet Er$_2$Be$_2$SiO$_7$
Authors:
A. Brassington,
Q. Ma,
G. Sala,
A. I. Kolesnikov,
K. M. Taddei,
Y. Wu,
E. S. Choi,
H. Wang,
W. Xie,
J. Ma,
H. D. Zhou,
A. A. Aczel
Abstract:
Polycrystalline and single crystal samples of the insulating Shastry-Sutherland compound Er$_2$Be$_2$SiO$_7$ were synthesized via a solid-state reaction and the floating zone method respectively. The crystal structure, Er single ion anisotropy, zero-field magnetic ground state, and magnetic phase diagrams along high-symmetry crystallographic directions were investigated by bulk measurement techniques, x-ray and neutron diffraction, and neutron spectroscopy. We establish that Er$_2$Be$_2$SiO$_7$ crystallizes in a tetragonal space group with planes of orthogonal Er dimers and a strong preference for the Er moments to lie in the local plane perpendicular to each dimer bond. We also find that this system has a non-collinear ordered ground state in zero field with a transition temperature of 0.841 K consisting of antiferromagnetic dimers and in-plane moments. Finally, we mapped out the $H-T$ phase diagrams for Er$_2$Be$_2$SiO$_7$ along the directions $H \parallel$ [001], [100], and [110]. While an increasing in-plane field simply induces a phase transition to a field-polarized phase, we identify three metamagnetic transitions before the field-polarized phase is established in the $H \parallel$ [001] case. This complex behavior establishes insulating Er$_2$Be$_2$SiO$_7$ and other isostructural family members as promising candidates for uncovering exotic magnetic properties and phenomena that can be readily compared to theoretical predictions of the exactly soluble Shastry-Sutherland model.
Submitted 13 May, 2024;
originally announced May 2024.
-
Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis
Authors:
Tianci Bi,
Xiaoyi Zhang,
Zhizheng Zhang,
Wenxuan Xie,
Cuiling Lan,
Yan Lu,
Nanning Zheng
Abstract:
Significant progress has been made in scene text detection models since the rise of deep learning, but scene text layout analysis, which aims to group detected text instances as paragraphs, has not kept pace. Previous works either handle text detection and grouping with separate models, or train a unified model from scratch. None of them makes full use of already well-trained text detectors and easily obtainable detection datasets. In this paper, we present Text Grouping Adapter (TGA), a module that can enable the utilization of various pre-trained text detectors to learn layout analysis, allowing us to adopt a well-trained text detector right off the shelf or just fine-tune it efficiently. Designed to be compatible with various text detector architectures, TGA takes detected text regions and image features as universal inputs to assemble text instance features. To capture broader contextual information for layout analysis, we propose to predict text group masks from text instance features by one-to-many assignment. Our comprehensive experiments demonstrate that, even with frozen pre-trained models, incorporating our TGA into various pre-trained text detectors and text spotters can achieve superior layout analysis performance, simultaneously inheriting generalized text detection ability from pre-training. In the case of full parameter fine-tuning, we can further improve layout analysis performance.
Submitted 13 May, 2024;
originally announced May 2024.
-
OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning
Authors:
Dan Qiao,
Yi Su,
Pinzheng Wang,
Jing Ye,
Wenjing Xie,
Yuechi Zhou,
Yuyang Ding,
Zecheng Tang,
Jikai Wang,
Yixin Ji,
Yue Wang,
Pei Guo,
Zechen Sun,
Zikang Zhang,
Juntao Li,
Pingfu Chao,
Wenliang Chen,
Guohong Fu,
Guodong Zhou,
Qiaoming Zhu,
Min Zhang
Abstract:
Large Language Models (LLMs) have played an important role in many fields due to their powerful capabilities. However, their massive number of parameters leads to high deployment requirements and incurs significant inference costs, which impedes their practical applications. Training smaller models is an effective way to address this problem. Therefore, we introduce OpenBA-V2, a 3.4B model derived from multi-stage compression and continual pre-training from the original 15B OpenBA model. OpenBA-V2 utilizes more data, more flexible training objectives, and techniques such as layer pruning, neural pruning, and vocabulary pruning to achieve a compression rate of 77.3% with minimal performance loss. OpenBA-V2 demonstrates competitive performance compared to other open-source models of similar size, achieving results close to or on par with the 15B OpenBA model in downstream tasks such as common sense reasoning and Named Entity Recognition (NER). OpenBA-V2 illustrates that LLMs can be compressed into smaller ones with minimal performance loss by employing advanced training objectives and data strategies, which may help deploy LLMs in resource-limited scenarios.
Submitted 9 May, 2024;
originally announced May 2024.
-
Pressure induced metallization and loss of surface magnetism in FeSi
Authors:
Yuhang Deng,
Farhad Taraporevala,
Haozhe Wang,
Eric Lee-Wong,
Camilla M. Moir,
Jinhyuk Lim,
Shubham Sinha,
Weiwei Xie,
James Hamlin,
Yogesh Vohra,
M. Brian Maple
Abstract:
Single crystalline FeSi samples with a conducting surface state (CSS) were studied under high pressure ($\textit{P}$) and magnetic field ($\textit{B}$) by means of electrical resistance ($\textit{R}$) measurements to explore how the bulk semiconducting state and the surface state are tuned by the application of pressure. We found that the energy gap ($Δ$) associated with the semiconducting bulk phase begins to close abruptly at a critical pressure ($P_{cr}$) of ~10 GPa and the bulk material becomes metallic with no obvious sign of any emergent phases or non-Fermi liquid behavior in $\textit{R}$($\textit{T}$) in the neighborhood of $P_{cr}$ above 3 K. Moreover, the metallic phase appears to remain at near-ambient pressure upon release of the pressure. Interestingly, the hysteresis in the $\textit{R}$($\textit{T}$) curve associated with the magnetically ordered CSS decreases with pressure and vanishes at $P_{cr}$, while the slope of the $\textit{R}$($\textit{B}$) curve, d$\textit{R}$/d$\textit{B}$, which has a negative value for $\textit{P}$ < $P_{cr}$, decreases in magnitude with $\textit{P}$ and changes sign at $P_{cr}$. Thus, the CSS and the corresponding two-dimensional magnetic order collapse at $P_{cr}$ where the energy gap $Δ$ of the bulk material starts to close abruptly, revealing the connection between the CSS and the semiconducting bulk state in FeSi.
Submitted 7 May, 2024;
originally announced May 2024.
-
Adjoint Sensitivity Analysis on Multi-Scale Bioprocess Stochastic Reaction Network
Authors:
Keilung Choy,
Wei Xie
Abstract:
Motivated by the pressing challenges in digital twin development for biomanufacturing systems, we introduce an adjoint sensitivity analysis (SA) approach to expedite the learning of mechanistic model parameters. In this paper, we consider enzymatic stochastic reaction networks representing a multi-scale bioprocess mechanistic model that allows us to integrate disparate data from diverse production processes and leverage the information from existing macro-kinetic and genome-scale models. To support forward prediction and backward reasoning, we develop a convergent adjoint SA algorithm studying how the perturbations of model parameters and inputs (e.g., initial state) propagate through enzymatic reaction networks and impact output trajectory predictions. This SA can provide a sample-efficient and interpretable way to assess the sensitivities between inputs and outputs, accounting for their causal dependencies. Our empirical study underscores the resilience of these sensitivities and, through them, provides a deeper understanding of the regulatory mechanisms behind bioprocesses.
Submitted 28 June, 2024; v1 submitted 7 May, 2024;
originally announced May 2024.
-
Digital Twin Calibration for Biological System-of-Systems: Cell Culture Manufacturing Process
Authors:
Fuqiang Cheng,
Wei Xie,
Hua Zheng
Abstract:
Biomanufacturing innovation relies on efficient Design of Experiments (DoE) to optimize processes and product quality. Traditional DoE methods, ignoring the underlying bioprocessing mechanisms, often suffer from a lack of interpretability and sample efficiency. This limitation motivates us to create a new optimal learning approach for digital twin model calibration. In this study, we consider the cell culture process multi-scale mechanistic model, also known as Biological System-of-Systems (Bio-SoS). This model with a modular design, composed of sub-models, allows us to integrate data across various production processes. To calibrate the Bio-SoS digital twin, we evaluate the mean squared error of model prediction and develop a computational approach to quantify the impact of the parameter estimation error of individual sub-models on the prediction accuracy of the digital twin, which can guide sample-efficient and interpretable DoE.
Submitted 28 June, 2024; v1 submitted 6 May, 2024;
originally announced May 2024.
-
Double Self-Sustainable Reconfigurable Intelligent Surfaces Aided Wireless Communications
Authors:
Ji Wang,
Suhong Luo,
Yixuan Li,
Wenwu Xie,
Xingwang Li,
Arumugam Nallanathan
Abstract:
A double self-sustainable reconfigurable intelligent surfaces (RISs) assisted multi-user multiple input multiple output (MIMO) system is investigated. Two RISs are equipped with energy harvesting circuits to achieve self-sustainable transmission. The aim is to minimize the transmission power at the base station (BS), while guaranteeing the quality of service (QoS) requirements of the users and meeting the power consumption requirements of the RISs. A block coordinate descent (BCD) algorithm based on the penalty-based method and successive convex approximation (SCA) is employed to alternately optimize the active beamforming at the BS and the phase shifts and amplitude coefficients of the two RISs. Simulation results show that the required power consumption at the BS for the proposed double self-sustainable RISs system is significantly reduced compared to conventional RIS systems.
Submitted 7 May, 2024; v1 submitted 5 May, 2024;
originally announced May 2024.
-
Linear Noise Approximation Assisted Bayesian Inference on Mechanistic Model of Partially Observed Stochastic Reaction Network
Authors:
Wandi Xu,
Wei Xie
Abstract:
To support mechanism online learning and facilitate digital twin development for biomanufacturing processes, this paper develops an efficient Bayesian inference approach for a partially observed enzymatic stochastic reaction network (SRN), a fundamental building block of the multi-scale bioprocess mechanistic model. To tackle the critical challenges brought by the nonlinear stochastic differential equations (SDEs)-based mechanistic model with partially observed states and measurement errors, an interpretable Bayesian updating linear noise approximation (LNA) metamodel, incorporating the structure information of the mechanistic model, is proposed to approximate the likelihood of observations. Then, an efficient posterior sampling approach is developed by utilizing the gradients of the derived likelihood to speed up the convergence of Markov Chain Monte Carlo (MCMC). The empirical study demonstrates that the proposed approach has a promising performance.
△ Less
Submitted 28 June, 2024; v1 submitted 4 May, 2024;
originally announced May 2024.
-
High-quality Surface Reconstruction using Gaussian Surfels
Authors:
Pinxuan Dai,
Jiamin Xu,
Wenxiang Xie,
Xinguo Liu,
Huamin Wang,
Weiwei Xu
Abstract:
We propose a novel point-based representation, Gaussian surfels, to combine the advantages of the flexible optimization procedure in 3D Gaussian points and the surface alignment property of surfels. This is achieved by directly setting the z-scale of 3D Gaussian points to 0, effectively flattening the original 3D ellipsoid into a 2D ellipse. Such a design provides clear guidance to the optimizer. By treating the local z-axis as the normal direction, it greatly improves optimization stability and surface alignment. While the derivatives to the local z-axis computed from the covariance matrix are zero in this setting, we design a self-supervised normal-depth consistency loss to remedy this issue. Monocular normal priors and foreground masks are incorporated to enhance the quality of the reconstruction, mitigating issues related to highlights and background. We propose a volumetric cutting method to aggregate the information of Gaussian surfels so as to remove erroneous points in depth maps generated by alpha blending. Finally, we apply the screened Poisson reconstruction method to the fused depth maps to extract the surface mesh. Experimental results show that our method demonstrates superior performance in surface reconstruction compared to state-of-the-art neural volume rendering and point-based rendering methods.
Submitted 29 April, 2024; v1 submitted 27 April, 2024;
originally announced April 2024.
-
Multicontinuum homogenization in perforated domains
Authors:
Wei Xie,
Yalchin Efendiev,
Yunqing Huang,
Wing Tat Leung,
Yin Yang
Abstract:
In this paper, we develop a general framework for multicontinuum homogenization in perforated domains. The simulations of problems in perforated domains are expensive and, in many applications, coarse-grid macroscopic models are developed. Previous approaches include homogenization, multiscale finite element methods, and so on. In our paper, we design multicontinuum homogenization based on our recently proposed framework. In this setting, we distinguish different spatial regions in perforations based on their sizes. For example, very thin perforations are considered as one continuum, while larger perforations are considered as another continuum. By differentiating perforations in this way, we are able to predict flows in each of them more accurately. We present a framework by formulating cell problems for each continuum using appropriate constraints for the solution averages and their gradients. These cell problem solutions are used in a multiscale expansion and in deriving novel macroscopic systems for multicontinuum homogenization. Our proposed approaches are designed for problems without scale separation. We present numerical results for two-continuum problems and demonstrate the accuracy of the proposed methods.
Submitted 26 April, 2024;
originally announced April 2024.
-
CEM-GMsFEM for Poisson equations in heterogeneous perforated domains
Authors:
Wei Xie,
Yin Yang,
Eric Chung,
Yunqing Huang
Abstract:
In this paper, we propose a novel multiscale model reduction strategy tailored to address the Poisson equation within heterogeneous perforated domains. The numerical simulation of this intricate problem is impeded by its multiscale characteristics, necessitating an exceptionally fine mesh to adequately capture all relevant details. To overcome the challenges inherent in the multiscale nature of the perforations, we introduce a coarse space constructed using the Constraint Energy Minimizing Generalized Multiscale Finite Element Method (CEM-GMsFEM). This involves constructing basis functions through a sequence of local energy minimization problems over eigenspaces containing localized information pertaining to the heterogeneities. Through our analysis, we demonstrate that the oversampling layers depend on the local eigenvalues, thereby implicating the local geometry as well. Additionally, we provide numerical examples to illustrate the efficacy of the proposed scheme.
Submitted 26 April, 2024;
originally announced April 2024.
-
Made to Order: Discovering monotonic temporal changes via self-supervised video ordering
Authors:
Charig Yang,
Weidi Xie,
Andrew Zisserman
Abstract:
Our objective is to discover and localize monotonic temporal changes in a sequence of images. To achieve this, we exploit a simple proxy task of ordering a shuffled image sequence, with `time' serving as a supervisory signal since only changes that are monotonic with time can give rise to the correct ordering. We also introduce a flexible transformer-based model for general-purpose ordering of image sequences of arbitrary length with built-in attribution maps. After training, the model successfully discovers and localizes monotonic changes while ignoring cyclic and stochastic ones. We demonstrate applications of the model in multiple video settings covering different scene and object types, discovering both object-level and environmental changes in unseen sequences. We also demonstrate that the attention-based attribution maps function as effective prompts for segmenting the changing regions, and that the learned representations can be used for downstream applications. Finally, we show that the model achieves the state of the art on standard benchmarks for ordering a set of images.
Submitted 25 April, 2024;
originally announced April 2024.
-
RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis
Authors:
Xiaoman Zhang,
Chaoyi Wu,
Ziheng Zhao,
Jiayu Lei,
Ya Zhang,
Yanfeng Wang,
Weidi Xie
Abstract:
Developing generalist foundation models has recently attracted tremendous attention among researchers in the field of AI for Medicine (AI4Medicine). A pivotal insight in developing these models is their reliance on dataset scaling, which emphasizes the need for open-source medical image datasets that incorporate diverse supervision signals across various imaging modalities. In this paper, we introduce RadGenome-Chest CT, a comprehensive, large-scale, region-guided 3D chest CT interpretation dataset based on CT-RATE. Specifically, we leverage the latest powerful universal segmentation and large language models to extend the original datasets (over 25,692 non-contrast 3D chest CT volumes and reports from 20,000 patients) from the following aspects: (i) organ-level segmentation masks covering 197 categories, which provide intermediate reasoning visual clues for interpretation; (ii) 665 K multi-granularity grounded reports, where each sentence of the report is linked to the corresponding anatomical region of CT volume in the form of a segmentation mask; (iii) 1.3 M grounded VQA pairs, where questions and answers are all linked with reference segmentation masks, enabling models to associate visual evidence with textual explanations. All grounded reports and VQA pairs in the validation set have gone through manual verification to ensure dataset quality. We believe that RadGenome-Chest CT can significantly advance the development of multimodal medical foundation models, by training to generate texts based on given segmentation regions, which is unattainable with previous relevant datasets. We will release all segmentation masks, grounded reports, and VQA pairs to facilitate further research and development in this field.
Submitted 25 April, 2024;
originally announced April 2024.
-
AutoAD III: The Prequel -- Back to the Pixels
Authors:
Tengda Han,
Max Bain,
Arsha Nagrani,
Gül Varol,
Weidi Xie,
Andrew Zisserman
Abstract:
Generating Audio Description (AD) for movies is a challenging task that requires fine-grained visual understanding and an awareness of the characters and their names. Currently, visual language models for AD generation are limited by a lack of suitable training data, and also their evaluation is hampered by using performance measures not specialized to the AD domain. In this paper, we make three contributions: (i) We propose two approaches for constructing AD datasets with aligned video data, and build training and evaluation datasets using these. These datasets will be publicly released; (ii) We develop a Q-former-based architecture which ingests raw video and generates AD, using frozen pre-trained visual encoders and large language models; and (iii) We provide new evaluation metrics to benchmark AD quality that are well-matched to human performance. Taken together, we improve the state of the art on AD generation.
Submitted 22 April, 2024;
originally announced April 2024.
-
Hyperspectral Anomaly Detection with Self-Supervised Anomaly Prior
Authors:
Yidan Liu,
Weiying Xie,
Kai Jiang,
Jiaqing Zhang,
Yunsong Li,
Leyuan Fang
Abstract:
The majority of existing hyperspectral anomaly detection (HAD) methods use the low-rank representation (LRR) model to separate the background and anomaly components, where the anomaly component is optimized by handcrafted sparse priors (e.g., $\ell_{2,1}$-norm). However, this may not be ideal since they overlook the spatial structure present in anomalies and make the detection result largely dependent on manually set sparsity. To tackle these problems, we redefine the optimization criterion for the anomaly component in the LRR model with a self-supervised network called self-supervised anomaly prior (SAP). This prior is obtained by the pretext task of self-supervised learning, which is customized to learn the characteristics of hyperspectral anomalies. Specifically, this pretext task is a classification task to distinguish the original hyperspectral image (HSI) and the pseudo-anomaly HSI, where the pseudo-anomaly is generated from the original HSI and designed as a prism with arbitrary polygon bases and arbitrary spectral bands. In addition, a dual-purified strategy is proposed to provide a more refined background representation with an enriched background dictionary, facilitating the separation of anomalies from complex backgrounds. Extensive experiments on various hyperspectral datasets demonstrate that the proposed SAP offers a more accurate and interpretable solution than other advanced HAD methods.
Submitted 20 April, 2024;
originally announced April 2024.
-
Moving Object Segmentation: All You Need Is SAM (and Flow)
Authors:
Junyu Xie,
Charig Yang,
Weidi Xie,
Andrew Zisserman
Abstract:
The objective of this paper is motion segmentation -- discovering and segmenting the moving objects in a video. This is a much studied area with numerous careful, and sometimes complex, approaches and training schemes including: self-supervised learning, learning from synthetic datasets, object-centric representations, amodal representations, and many more. Our interest in this paper is to determine if the Segment Anything model (SAM) can contribute to this task. We investigate two models for combining SAM with optical flow that harness the segmentation power of SAM with the ability of flow to discover and group moving objects. In the first model, we adapt SAM to take optical flow, rather than RGB, as an input. In the second, SAM takes RGB as an input, and flow is used as a segmentation prompt. These surprisingly simple methods, without any further modifications, outperform all previous approaches by a considerable margin in both single and multi-object benchmarks. We also extend these frame-level segmentations to sequence-level segmentations that maintain object identity. Again, this simple model outperforms previous methods on multiple video object segmentation benchmarks.
Submitted 18 April, 2024;
originally announced April 2024.