-
ASteISR: Adapting Single Image Super-resolution Pre-trained Model for Efficient Stereo Image Super-resolution
Authors:
Yuanbo Zhou,
Yuyang Xue,
Wei Deng,
Xinlin Zhang,
Qinquan Gao,
Tong Tong
Abstract:
Despite advances in the paradigm of pre-training then fine-tuning in low-level vision tasks, significant challenges persist, particularly the increased size of pre-trained models, which drives up memory usage and training time. Another common concern is the unsatisfactory results obtained when pre-trained single-image models are applied directly to the multi-image domain. In this paper, we propose an efficient method for transferring a pre-trained single-image super-resolution (SISR) transformer network to the domain of stereo image super-resolution (SteISR) through parameter-efficient fine-tuning (PEFT). Specifically, we introduce stereo adapters and spatial adapters, which are incorporated into the pre-trained SISR transformer network. The pre-trained SISR model is then frozen, so that only the adapters are fine-tuned, using stereo datasets alone. With this training method, we improve the SISR model's performance on stereo images by 0.79 dB on the Flickr1024 dataset. The method trains only 4.8% of the original model parameters and achieves state-of-the-art performance on four commonly used SteISR benchmarks. Compared with the more complex full fine-tuning approach, our method reduces training time and memory consumption by 57% and 15%, respectively.
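To make the parameter-efficient idea concrete, here is a minimal sketch of a generic residual bottleneck adapter in the style of Houlsby et al. It is not the stereo or spatial adapter design from the paper: the hidden width, bottleneck size, layer count, and the rough `12 * d^2` per-block cost of the frozen backbone are all hypothetical. It shows why training only small adapters on a frozen backbone leaves a small trainable fraction, and why a zero-initialized up-projection (a common choice) makes the adapter an identity mapping at the start of fine-tuning.

```python
import random

def linear_params(d_in, d_out, bias=True):
    """Number of parameters in a dense layer mapping d_in -> d_out."""
    return d_in * d_out + (d_out if bias else 0)

def adapter_params(d_model, bottleneck):
    """Down-projection plus up-projection of a bottleneck adapter."""
    return linear_params(d_model, bottleneck) + linear_params(bottleneck, d_model)

def matvec(weights, x):
    """Multiply a row-major weight matrix by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def adapter_forward(x, w_down, w_up):
    """Residual adapter: x + W_up @ relu(W_down @ x) (biases omitted for brevity)."""
    hidden = [max(0.0, h) for h in matvec(w_down, x)]
    delta = matvec(w_up, hidden)
    return [xi + di for xi, di in zip(x, delta)]

# Hypothetical sizes, chosen only for illustration.
D_MODEL, BOTTLENECK, N_LAYERS = 180, 16, 24
frozen_per_layer = 12 * D_MODEL * D_MODEL  # rough stand-in for one frozen transformer block
trainable = N_LAYERS * adapter_params(D_MODEL, BOTTLENECK)
total = N_LAYERS * frozen_per_layer + trainable
print(f"trainable fraction: {trainable / total:.1%}")

random.seed(0)
w_down = [[random.gauss(0, 0.02) for _ in range(D_MODEL)] for _ in range(BOTTLENECK)]
w_up = [[0.0] * BOTTLENECK for _ in range(D_MODEL)]  # zero init -> identity at start
x = [random.gauss(0, 1) for _ in range(D_MODEL)]
assert adapter_forward(x, w_down, w_up) == x
```

Only `w_down` and `w_up` would receive gradients in this setup; the backbone weights stay fixed, which is where the memory and training-time savings of PEFT come from.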
Submitted 3 July, 2024;
originally announced July 2024.
-
Distinguishing Surface and Bulk Electromagnetism via Their Dynamics in an Intrinsic Magnetic Topological Insulator
Authors:
Khanh Duy Nguyen,
Woojoo Lee,
Jianchen Dang,
Tongyao Wu,
Gabriele Berruto,
Chenhui Yan,
Chi Ian Jess Ip,
Haoran Lin,
Qiang Gao,
Seng Huat Lee,
Binghai Yan,
Chaoxing Liu,
Zhiqiang Mao,
Xiao-Xiao Zhang,
Shuolong Yang
Abstract:
The indirect exchange interaction between local magnetic moments via surface electrons has long been predicted to bolster the surface ferromagnetism in magnetic topological insulators (MTIs), which facilitates the quantum anomalous Hall effect. This unconventional effect is critical to determining the operating temperatures of future topotronic devices. However, experimental confirmation of this mechanism remains elusive, especially in intrinsic MTIs. Here we combine time-resolved photoemission spectroscopy with time-resolved magneto-optical Kerr effect measurements to elucidate the unique electromagnetism at the surface of the intrinsic MTI MnBi2Te4. Theoretical modeling based on 2D Ruderman-Kittel-Kasuya-Yosida interactions captures the initial quenching of a surface-rooted exchange gap within a factor of two but overestimates the bulk demagnetization by one order of magnitude. This mechanism directly explains the sizable gap in the quasi-2D electronic state and the nonzero residual magnetization in even-layer MnBi2Te4. Furthermore, it leads to efficient light-induced demagnetization comparable to state-of-the-art magnetophotonic crystals, promising effective manipulation of magnetism and topological orders for future topotronics.
Submitted 28 June, 2024;
originally announced July 2024.
-
A Set-based Approach for Feature Extraction of 3D CAD Models
Authors:
Peng Xu,
Qi Gao,
Ying-Jie Wu
Abstract:
Feature extraction is a critical technology to realize the automatic transmission of feature information throughout product life cycles. As CAD models primarily capture the 3D geometry of products, feature extraction heavily relies on geometric information. However, existing feature extraction methods often yield inaccurate outcomes due to the diverse interpretations of geometric information. This report presents a set-based feature extraction approach to address this uncertainty issue. Unlike existing methods that seek accurate feature results, our approach aims to transform the uncertainty of geometric information into a set of feature subgraphs. First, we define the convexity of basic geometric entities and introduce the concept of two-level attributed adjacency graphs. Second, a feature extraction workflow is designed to determine feature boundaries and identify feature subgraphs from CAD models. This set of feature subgraphs can be used for further feature recognition. A feature extraction system is programmed using C++ and UG/Open to demonstrate the feasibility of our proposed approach.
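The attributed adjacency graph mentioned above is a classic representation in feature recognition: faces become nodes, and shared edges become arcs labeled convex or concave. The sketch below is a simplified, single-level illustration, not the authors' two-level graph or extraction workflow. It assumes edge convexity is already known and uses a simple heuristic, treating each group of faces connected through concave edges as one candidate feature subgraph. The face names are hypothetical.

```python
from collections import defaultdict

def concave_components(edges):
    """Group faces connected through concave edges into candidate feature subgraphs.

    edges: iterable of (face_a, face_b, convexity), convexity in {"convex", "concave"}.
    Returns one sorted face list per connected component of the concave-edge graph.
    """
    adjacency = defaultdict(set)
    for a, b, convexity in edges:
        if convexity == "concave":
            adjacency[a].add(b)
            adjacency[b].add(a)
    seen, components = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal restricted to concave edges.
        stack, component = [start], []
        seen.add(start)
        while stack:
            face = stack.pop()
            component.append(face)
            for nxt in adjacency[face]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(sorted(component))
    return components

# Hypothetical block with a rectangular slot cut into its top face:
# the slot rim is convex, while the walls meet the slot floor at concave edges.
edges = [
    ("top", "side", "convex"),
    ("top", "slot_wall_1", "convex"),
    ("top", "slot_wall_2", "convex"),
    ("slot_wall_1", "slot_floor", "concave"),
    ("slot_wall_2", "slot_floor", "concave"),
]
print(concave_components(edges))
```

In a set-based approach, such subgraphs would be kept as candidates for a later recognition step rather than being committed to a single feature interpretation.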
Submitted 22 May, 2024;
originally announced June 2024.
-
Harvesting Events from Multiple Sources: Towards a Cross-Document Event Extraction Paradigm
Authors:
Qiang Gao,
Zixiang Meng,
Bobo Li,
Jun Zhou,
Fei Li,
Chong Teng,
Donghong Ji
Abstract:
Document-level event extraction aims to extract structured event information from unstructured text. However, a single document often contains limited event information, and the roles of different event arguments may be biased by the information source. This paper addresses these limitations of traditional document-level event extraction by proposing the task of cross-document event extraction (CDEE), which integrates event information from multiple documents to provide a comprehensive perspective on events. We construct a novel cross-document event extraction dataset, CLES, which contains 20,059 documents and 37,688 mention-level events, over 70% of which are cross-document. To build a benchmark, we propose a CDEE pipeline with five steps: event extraction, coreference resolution, entity normalization, role normalization, and entity-role resolution. Our CDEE pipeline achieves about 72% F1 in end-to-end cross-document event extraction, highlighting the difficulty of this task. Our work opens a new line of information extraction research and will attract new research attention.
Submitted 23 June, 2024;
originally announced June 2024.
-
Enhancing Cross-Document Event Coreference Resolution by Discourse Structure and Semantic Information
Authors:
Qiang Gao,
Bobo Li,
Zixiang Meng,
Yunlong Li,
Jun Zhou,
Fei Li,
Chong Teng,
Donghong Ji
Abstract:
Existing cross-document event coreference resolution models, which either compute mention similarity directly or enhance mention representations by extracting event arguments (such as location, time, agent, and patient), lack the ability to utilize document-level information. As a result, they struggle to capture long-distance dependencies. This shortcoming leads to underwhelming performance in determining coreference for events whose argument information relies on long-distance dependencies. In light of these limitations, we propose constructing document-level Rhetorical Structure Theory (RST) trees and cross-document Lexical Chains to model the structural and semantic information of documents. Cross-document heterogeneous graphs are then constructed, and a graph attention network (GAT) is used to learn event representations. Finally, a pair scorer calculates the similarity between each pair of events, and coreferent events are identified using a standard clustering algorithm. Additionally, because existing cross-document event coreference datasets are limited to English, we have developed a large-scale Chinese cross-document event coreference dataset, comprising 53,066 event mentions and 4,476 clusters, to fill this gap. Applied to the English and Chinese datasets, our model outperforms all baselines by large margins.
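The final step, turning pairwise coreference scores into event clusters, can be illustrated with a minimal sketch. The abstract only says that a standard clustering algorithm is used, so the choice here is an assumption: pairs scoring above a threshold are linked and clusters are the connected components (via union-find), which is equivalent to single-linkage agglomerative clustering cut at that threshold. The threshold, mention IDs, and scores are hypothetical.

```python
def cluster_mentions(mentions, pair_scores, threshold=0.5):
    """Link mention pairs scoring >= threshold and return connected components.

    pair_scores: dict mapping (mention_a, mention_b) -> similarity in [0, 1].
    """
    parent = {m: m for m in mentions}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving keeps trees shallow
            m = parent[m]
        return m

    for (a, b), score in pair_scores.items():
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for m in mentions:
        clusters.setdefault(find(m), []).append(m)
    return sorted(sorted(c) for c in clusters.values())

mentions = ["e1", "e2", "e3", "e4"]
scores = {("e1", "e2"): 0.9, ("e2", "e3"): 0.7, ("e3", "e4"): 0.2, ("e1", "e4"): 0.1}
print(cluster_mentions(mentions, scores))
```

Note that transitive linking can merge e1 and e3 even if their direct score were low; stricter schemes such as average linkage trade recall for precision.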
Submitted 22 June, 2024;
originally announced June 2024.
-
V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results
Authors:
Jiaqi Wang,
Yuhang Zang,
Pan Zhang,
Tao Chu,
Yuhang Cao,
Zeyi Sun,
Ziyu Liu,
Xiaoyi Dong,
Tong Wu,
Dahua Lin,
Zeming Chen,
Zhi Wang,
Lingchen Meng,
Wenhao Yao,
Jianwei Yang,
Sihong Wu,
Zhineng Chen,
Zuxuan Wu,
Yu-Gang Jiang,
Peixi Wu,
Bosong Chai,
Xuan Nie,
Longquan Yan,
Zeyu Wang,
Qifan Zhou
, et al. (9 additional authors not shown)
Abstract:
Detecting objects in real-world scenes is a complex task due to various challenges, including the vast range of object categories and potential encounters with previously unknown or unseen objects. These challenges necessitate the development of public benchmarks and competitions to advance the field of object detection. Inspired by the success of previous COCO and LVIS Challenges, we organize the V3Det Challenge 2024 in conjunction with the 4th Open World Vision Workshop: Visual Perception via Learning in an Open World (VPLOW) at CVPR 2024, Seattle, US. This challenge aims to push the boundaries of object detection research and encourage innovation in this field. The V3Det Challenge 2024 consists of two tracks: 1) Vast Vocabulary Object Detection: this track focuses on detecting objects from a large set of 13,204 categories, testing a detection algorithm's ability to recognize and locate diverse objects. 2) Open Vocabulary Object Detection: this track goes a step further, requiring algorithms to detect objects from an open set of categories, including unknown objects. In the following sections, we provide a comprehensive summary and analysis of the solutions submitted by participants. By analyzing the methods and solutions presented, we aim to inspire future research directions in vast-vocabulary and open-vocabulary object detection, driving progress in this field. Challenge homepage: https://v3det.openxlab.org.cn/challenge
Submitted 17 June, 2024;
originally announced June 2024.
-
Technique Report of CVPR 2024 PBDL Challenges
Authors:
Ying Fu,
Yu Li,
Shaodi You,
Boxin Shi,
Linwei Chen,
Yunhao Zou,
Zichun Wang,
Yichen Li,
Yuze Han,
Yingkai Zhang,
Jianan Wang,
Qinglin Liu,
Wei Yu,
Xiaoqian Lv,
Jianing Li,
Shengping Zhang,
Xiangyang Ji,
Yuanpei Chen,
Yuhan Zhang,
Weihang Peng,
Liwen Zhang,
Zhe Xu,
Dingyong Gou,
Cong Li,
Senyan Xu
, et al. (75 additional authors not shown)
Abstract:
The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the physical image-formation process to recover scene properties such as shape, reflectance, light distribution, and medium properties from images. In recent years, deep learning has shown promising improvements for various vision tasks, and when combined with physics-based vision, these approaches can enhance the robustness and accuracy of vision systems. This technical report summarizes the outcomes of the Physics-Based Vision Meets Deep Learning (PBDL) 2024 challenge, held at a CVPR 2024 workshop. The challenge consisted of eight tracks focusing on Low-Light Enhancement and Detection as well as High Dynamic Range (HDR) Imaging. This report details the objectives, methodologies, and results of each track, highlighting the top-performing solutions and their innovative approaches.
Submitted 12 July, 2024; v1 submitted 15 June, 2024;
originally announced June 2024.
-
DSL-FIQA: Assessing Facial Image Quality via Dual-Set Degradation Learning and Landmark-Guided Transformer
Authors:
Wei-Ting Chen,
Gurunandan Krishnan,
Qiang Gao,
Sy-Yen Kuo,
Sizhuo Ma,
Jian Wang
Abstract:
Generic Face Image Quality Assessment (GFIQA) evaluates the perceptual quality of facial images, which is crucial in improving image restoration algorithms and selecting high-quality face images for downstream tasks. We present a novel transformer-based method for GFIQA, which is aided by two unique mechanisms. First, a Dual-Set Degradation Representation Learning (DSL) mechanism uses facial images with both synthetic and real degradations to decouple degradation from content, ensuring generalizability to real-world scenarios. This self-supervised method learns degradation features on a global scale, providing a robust alternative to conventional methods that use local patch information in degradation learning. Second, our transformer leverages facial landmarks to emphasize visually salient parts of a face image in evaluating its perceptual quality. We also introduce a balanced and diverse Comprehensive Generic Face IQA (CGFIQA-40k) dataset of 40K images carefully designed to overcome the biases, in particular the imbalances in skin tone and gender representation, in existing datasets. Extensive analysis and evaluation demonstrate the robustness of our method, marking a significant improvement over prior methods.
Submitted 13 June, 2024;
originally announced June 2024.
-
Pandora: Towards General World Model with Natural Language Actions and Video States
Authors:
Jiannan Xiang,
Guangyi Liu,
Yi Gu,
Qiyue Gao,
Yuting Ning,
Yuheng Zha,
Zeyu Feng,
Tianhua Tao,
Shibo Hao,
Yemin Shi,
Zhengzhong Liu,
Eric P. Xing,
Zhiting Hu
Abstract:
World models simulate future states of the world in response to different actions. They facilitate interactive content creation and provide a foundation for grounded, long-horizon reasoning. Current foundation models do not fully meet the capabilities of general world models: large language models (LLMs) are constrained by their reliance on the language modality and their limited understanding of the physical world, while video models lack interactive action control over the world simulations. This paper takes a step towards building a general world model by introducing Pandora, a hybrid autoregressive-diffusion model that simulates world states by generating videos and allows real-time control with free-text actions. Pandora achieves domain generality, video consistency, and controllability through large-scale pretraining and instruction tuning. Crucially, Pandora bypasses the cost of training from scratch by integrating a pretrained LLM (7B) and a pretrained video model, requiring only additional lightweight finetuning. We present extensive outputs from Pandora across diverse domains (indoor/outdoor, natural/urban, human/robot, 2D/3D, etc.). The results indicate the great potential of building stronger general world models with larger-scale training.
Submitted 12 June, 2024;
originally announced June 2024.
-
Constraints on Ultra Heavy Dark Matter Properties from Dwarf Spheroidal Galaxies with LHAASO Observations
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
In this work, we search for signals of ultra-heavy dark matter in data from the Large High Altitude Air Shower Observatory (LHAASO). We look for possible gamma-ray emission from dark matter annihilation or decay in 16 dwarf spheroidal galaxies within the field of view of LHAASO. Dwarf spheroidal galaxies are among the most promising targets for indirect detection of dark matter, as they have a low astrophysical $\gamma$-ray background and a large amount of dark matter. By analyzing more than 700 days of LHAASO observational data, we detect no significant dark matter signal from 1 TeV to 1 EeV. Accordingly, we derive the most stringent constraints on the ultra-heavy dark matter annihilation cross-section up to EeV. Constraints on the dark matter lifetime in the decay mode are also derived.
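For readers less familiar with indirect searches, the expected gamma-ray flux from a dwarf galaxy is usually written in the standard form below, shown here for self-conjugate (e.g. Majorana) dark matter. Normalization conventions vary between analyses (for example, an extra factor of 1/2 for non-self-conjugate particles), so this is not necessarily the exact expression used in the paper.

$$
\frac{d\Phi_\gamma}{dE} = \frac{\langle \sigma v \rangle}{8\pi m_\chi^2}\,\frac{dN_\gamma}{dE}\,J,
\qquad
J = \int_{\Delta\Omega}\int_{\rm l.o.s.} \rho_\chi^2(\ell,\Omega)\,d\ell\,d\Omega
$$

$$
\frac{d\Phi_\gamma}{dE} = \frac{1}{4\pi m_\chi \tau_\chi}\,\frac{dN_\gamma}{dE}\,D,
\qquad
D = \int_{\Delta\Omega}\int_{\rm l.o.s.} \rho_\chi(\ell,\Omega)\,d\ell\,d\Omega
$$

Here $m_\chi$ is the dark matter mass, $dN_\gamma/dE$ is the photon spectrum per annihilation or decay for a given channel, and $\rho_\chi$ is the dark matter density along the line of sight. Because $J$ and $D$ are fixed by each galaxy's dark matter profile, a non-detection translates into an upper limit on $\langle\sigma v\rangle$ and a lower limit on $\tau_\chi$ for each assumed mass and channel.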
Submitted 12 June, 2024;
originally announced June 2024.
-
Stratified Avatar Generation from Sparse Observations
Authors:
Han Feng,
Wenchao Ma,
Quankai Gao,
Xianwei Zheng,
Nan Xue,
Huijuan Xu
Abstract:
Estimating 3D full-body avatars from AR/VR devices is essential for creating immersive experiences in AR/VR applications. This task is challenging due to the limited input from head-mounted devices, which capture only sparse observations of the head and hands. Predicting full-body avatars, particularly the lower body, from these sparse observations presents significant difficulties. In this paper, we are inspired by an inherent property of the kinematic tree defined in the Skinned Multi-Person Linear (SMPL) model: the upper body and lower body share only one common ancestor node, which opens the possibility of decoupled reconstruction. We propose a stratified approach that decouples the conventional full-body avatar reconstruction pipeline into two stages, reconstructing the upper body first and then the lower body conditioned on the previous stage. To implement this straightforward idea, we leverage the latent diffusion model as a powerful probabilistic generator and train it to follow the latent distribution of decoupled motions learned by a VQ-VAE encoder-decoder model. Extensive experiments on the AMASS mocap dataset demonstrate that our method achieves state-of-the-art performance in the reconstruction of full-body motions.
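The kinematic-tree property that motivates the stratified design can be checked directly. The sketch below assumes the commonly used 24-joint SMPL parent array (joint 0 is the pelvis, joints 1 and 2 are the hips, and joint 3 is the first spine joint). It splits the tree into the lower-body subtrees rooted at the hips and the upper-body subtree rooted at the spine, then confirms that the only joint that is an ancestor of both groups is the pelvis.

```python
# Commonly used SMPL parent indices for 24 joints (assumed here); -1 marks the root.
SMPL_PARENTS = [-1, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8,
                9, 9, 9, 12, 13, 14, 16, 17, 18, 19, 20, 21]

def ancestors(joint, parents=SMPL_PARENTS):
    """Return the set of strict ancestors of a joint."""
    result = set()
    while parents[joint] != -1:
        joint = parents[joint]
        result.add(joint)
    return result

def subtree(root, parents=SMPL_PARENTS):
    """Return the root joint together with all of its descendants."""
    return {j for j in range(len(parents)) if j == root or root in ancestors(j, parents)}

lower_body = subtree(1) | subtree(2)  # left and right leg chains
upper_body = subtree(3)               # spine, arms and head
lower_anc = set().union(*(ancestors(j) for j in lower_body))
upper_anc = set().union(*(ancestors(j) for j in upper_body))
shared = lower_anc & upper_anc        # joints that are ancestors of both groups
print(sorted(lower_body), sorted(upper_body), shared)
```

Since the two groups meet only at the root, a second-stage model can condition the lower body on the already reconstructed upper body through that single shared node.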
Submitted 3 June, 2024; v1 submitted 30 May, 2024;
originally announced May 2024.
-
Preservation of Topological Surface States in Millimeter-Scale Transferred Membranes
Authors:
Chi Ian Jess Ip,
Qiang Gao,
Khanhy Du Nguyen,
Chenhui Yan,
Gangbin Yan,
Eli Hoenig,
Thomas S. Marchese,
Minghao Zhang,
Woojoo Lee,
Hossein Rokni,
Ying Shirley Meng,
Chong Liu,
Shuolong Yang
Abstract:
Ultrathin topological insulator membranes are building blocks of exotic quantum matter. However, traditional epitaxy of these materials does not allow stacking in arbitrary orders, while mechanical exfoliation from bulk crystals is also challenging due to the non-negligible interlayer coupling in these materials. Here we liberate millimeter-scale films of the topological insulator Bi$_2$Se$_3$, grown by molecular beam epitaxy, with thicknesses down to 3 quintuple layers. We characterize the preservation of the topological surface states and quantum well states in transferred Bi$_{2}$Se$_{3}$ films using angle-resolved photoemission spectroscopy. Leveraging the photon-energy-dependent surface sensitivity, photoemission spectra taken with $6$ eV and $21.2$ eV photons reveal a transfer-induced migration of the topological surface states from the top to the inner layers. By establishing clear electronic structures of the transferred films and unveiling the wavefunction relocation of the topological surface states, our work lays the physical foundation for the future fabrication of artificially stacked topological materials with single-layer precision.
Submitted 21 May, 2024;
originally announced May 2024.
-
Featuring nuanced electronic band structure in gapped multilayer graphene
Authors:
Jin Jiang,
Qixuan Gao,
Zekang Zhou,
Cheng Shen,
Mario Di Luca,
Emily Hajigeorgiou,
Kenji Watanabe,
Takashi Taniguchi,
Mitali Banerjee
Abstract:
Moiré systems featuring flat electronic bands exhibit a vast landscape of emergent exotic quantum states, making them one of the most resourceful platforms in condensed matter physics in recent years. Tuning these systems via the twist angle and the electric field greatly enhances our understanding of their strongly correlated ground states. Here, we report a technique to investigate the fine details of band structures in dual-gated multilayer graphene systems. We use the Landau levels of a decoupled monolayer graphene to extract the electric-field-dependent charge neutrality point gap of bilayer graphene. We then extend this method to analyze the evolution of the band gap and the flat bandwidth in twisted mono-bilayer graphene. The band gap is maximal at the same displacement field at which the flat bandwidth is minimal, indicating the strongest electron-electron correlation, a conclusion supported by the direct observation of a strongly correlated phase emerging there. Moreover, we extract integer and fractional gaps to further demonstrate the strength of this method. Our technique offers a new perspective and paves the way for a better understanding of the electronic band structure in versatile flat band systems.
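As a reminder of the energy scale of the monolayer Landau levels used as a probe, monolayer graphene has the textbook spectrum $E_N = \operatorname{sgn}(N)\, v_F \sqrt{2 e \hbar B |N|}$. The sketch below evaluates it assuming a typical Fermi velocity of $10^6$ m/s, an illustrative value rather than one extracted in the paper.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge, C
HBAR = 1.054571817e-34      # reduced Planck constant, J s

def graphene_landau_level_mev(n, b_tesla, v_fermi=1.0e6):
    """Energy of Landau level n in monolayer graphene, in meV.

    E_N = sgn(N) * v_F * sqrt(2 * e * hbar * B * |N|); v_fermi (m/s) is an assumed typical value.
    """
    energy_joule = math.copysign(1.0, n) * v_fermi * math.sqrt(2 * E_CHARGE * HBAR * b_tesla * abs(n))
    return energy_joule / E_CHARGE * 1e3

# Levels N = -2..2 at a hypothetical field of 10 T.
for n in range(-2, 3):
    print(n, round(graphene_landau_level_mev(n, 10.0), 1))
```

Unlike a conventional 2D electron gas, the levels are not equally spaced: they grow as $\sqrt{|N| B}$, with a zero-energy level at $N = 0$.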
Submitted 10 June, 2024; v1 submitted 21 May, 2024;
originally announced May 2024.
-
Data quality control system and long-term performance monitor of the LHAASO-KM2A
Authors:
Zhen Cao,
F. Aharonian,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
W. Bian,
A. V. Bukevich,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
H. X. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. Chen
, et al. (263 additional authors not shown)
Abstract:
The KM2A is the largest sub-array of the Large High Altitude Air Shower Observatory (LHAASO). It consists of 5216 electromagnetic particle detectors (EDs) and 1188 muon detectors (MDs). The data recorded by the EDs and MDs are used to reconstruct the primary parameters of cosmic-ray and gamma-ray showers, which are then used for physical analysis in gamma-ray astronomy and cosmic-ray physics. To ensure the reliability of the LHAASO-KM2A data, a three-level quality control system has been established. It monitors the status of detector units, the stability of reconstructed parameters, and the performance of the array based on observations of the Crab Nebula and the Moon shadow. This paper introduces the control system and its application to the LHAASO-KM2A data collected from August 2021 to July 2023. During this period, the pointing and angular resolution of the array were stable. The results obtained from the two methods, based on observations of the Moon shadow and the Crab Nebula, are consistent with each other. According to the observation of the Crab Nebula at energies from 25 TeV to 100 TeV, the time-averaged pointing errors are estimated to be $-0.003^{\circ} \pm 0.005^{\circ}$ and $0.001^{\circ} \pm 0.006^{\circ}$ in the R.A. and Dec directions, respectively.
Submitted 13 June, 2024; v1 submitted 20 May, 2024;
originally announced May 2024.
-
Semantic Gesticulator: Semantics-Aware Co-Speech Gesture Synthesis
Authors:
Zeyi Zhang,
Tenglong Ao,
Yuyao Zhang,
Qingzhe Gao,
Chuan Lin,
Baoquan Chen,
Libin Liu
Abstract:
In this work, we present Semantic Gesticulator, a novel framework designed to synthesize realistic gestures accompanying speech with strong semantic correspondence. Semantically meaningful gestures are crucial for effective non-verbal communication, but such gestures often fall within the long tail of the distribution of natural human motion. The sparsity of these movements makes it challenging for deep learning-based systems, trained on moderately sized datasets, to capture the relationship between the movements and the corresponding speech semantics. To address this challenge, we develop a generative retrieval framework based on a large language model. This framework efficiently retrieves suitable semantic gesture candidates from a motion library in response to the input speech. To construct this motion library, we summarize a comprehensive list of commonly used semantic gestures based on findings in linguistics, and we collect a high-quality motion dataset encompassing both body and hand movements. We also design a novel GPT-based model with strong generalization capabilities to audio, capable of generating high-quality gestures that match the rhythm of speech. Furthermore, we propose a semantic alignment mechanism to efficiently align the retrieved semantic gestures with the GPT's output, ensuring the naturalness of the final animation. Our system demonstrates robustness in generating gestures that are rhythmically coherent and semantically explicit, as evidenced by a comprehensive collection of examples. User studies confirm the quality and human-likeness of our results, and show that our system outperforms state-of-the-art systems in terms of semantic appropriateness by a clear margin.
Submitted 16 May, 2024; v1 submitted 16 May, 2024;
originally announced May 2024.
-
Discovery of Very-high-energy Gamma-ray Emissions from the Low Luminosity AGN NGC 4278 by LHAASO
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (255 additional authors not shown)
Abstract:
The first source catalog of the Large High Altitude Air Shower Observatory reported the detection of a very-high-energy gamma-ray source, 1LHAASO J1219+2915. In this paper, we carry out a further detailed study of the spectral and temporal behavior of this point-like source. The best-fit position of the TeV source ($\rm{RA}=185.05^{\circ}\pm0.04^{\circ}$, $\rm{Dec}=29.25^{\circ}\pm0.03^{\circ}$) is compatible with NGC 4278 within $\sim0.03$ degrees. Variability analysis indicates variability on a timescale of a few months in the TeV band, consistent with low-frequency observations. Based on these observations, we report the detection of TeV $\gamma$-ray emission from the low-luminosity AGN NGC 4278. The LHAASO-WCDA observations during the active period reach a significance of $8.8\,\sigma$, with a best-fit photon spectral index $\varGamma=2.56\pm0.14$ and a flux $f_{1-10\,\rm{TeV}}=(7.0\pm1.1_{\rm{sta}}\pm0.35_{\rm{syst}})\times10^{-13}\,\rm{photons\,cm^{-2}\,s^{-1}}$, or approximately $5\%$ of the Crab Nebula flux. The discovery of VHE emission from NGC 4278 indicates that the compact, weak radio jet can efficiently accelerate particles and emit TeV photons.
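An integral flux such as $f_{1-10\,\rm{TeV}}$ is obtained by integrating the fitted differential spectrum over energy. The sketch below assumes a pure power law $dN/dE = N_0 (E/E_0)^{-\Gamma}$; the abstract does not state the spectral model's normalization, so $N_0$ and $E_0$ are placeholders, and only $\Gamma = 2.56$ is taken from the text. The closed-form integral is cross-checked against a midpoint-rule integration in log-energy.

```python
import math

def integral_flux(n0, e0, gamma, e_min, e_max):
    """Photon flux between e_min and e_max for dN/dE = n0 * (E / e0) ** (-gamma).

    With n0 in photons cm^-2 s^-1 TeV^-1 and energies in TeV, the result is in photons cm^-2 s^-1.
    """
    if math.isclose(gamma, 1.0):
        return n0 * e0 * math.log(e_max / e_min)
    return n0 * e0 / (1.0 - gamma) * ((e_max / e0) ** (1.0 - gamma) - (e_min / e0) ** (1.0 - gamma))

def integral_flux_numeric(n0, e0, gamma, e_min, e_max, steps=10_000):
    """Midpoint-rule integration in log-energy, used as a cross-check."""
    log_min, log_max = math.log(e_min), math.log(e_max)
    width = (log_max - log_min) / steps
    total = 0.0
    for i in range(steps):
        energy = math.exp(log_min + (i + 0.5) * width)
        total += n0 * (energy / e0) ** (-gamma) * energy * width  # dE = E dlnE
    return total

# Placeholder normalization at E0 = 1 TeV; only the spectral index comes from the abstract.
args = dict(n0=1.0, e0=1.0, gamma=2.56, e_min=1.0, e_max=10.0)
print(integral_flux(**args), integral_flux_numeric(**args))
```

Integrating in log-energy keeps the step count small even over wide energy ranges, since steep power laws vary smoothly in $\ln E$.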
Submitted 13 May, 2024;
originally announced May 2024.
-
HandS3C: 3D Hand Mesh Reconstruction with State Space Spatial Channel Attention from RGB images
Authors:
Zixun Jiao,
Xihan Wang,
Zhaoqiang Xia,
Lianhe Shao,
Quanli Gao
Abstract:
Reconstructing the hand mesh from a single RGB image is challenging because hands are often occluded by other objects. Most previous works exploit additional information and adopt attention mechanisms to improve 3D reconstruction performance, which in turn increases computational complexity. To achieve a performance-preserving architecture with high computational efficiency, in this work we propose a simple but effective 3D hand mesh reconstruction network, HandS3C, which is the first to incorporate a state space model into the task of hand mesh reconstruction. In the network, we design a novel state-space spatial-channel attention module that extends the effective receptive field, extracts hand features in the spatial dimension, and enhances regional hand features in the channel dimension. This helps to reconstruct a complete and detailed hand mesh. Extensive experiments on well-known datasets with heavy occlusions (such as FREIHAND, DEXYCB, and HO3D) demonstrate that HandS3C achieves state-of-the-art performance while maintaining a minimal number of parameters.
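HandS3C builds on state space models (SSMs). As a rough illustration of the recurrence such models are built on, and not of the HandS3C attention module itself (whose internals the abstract does not describe), the sketch below runs a scalar linear SSM discretized with zero-order hold. The function name `ssm_scan` and all parameter values are illustrative assumptions.

```python
import math
from typing import List


def ssm_scan(xs: List[float], a: float, b: float, c: float, dt: float) -> List[float]:
    """Run a scalar linear state space model over a 1-D input sequence.

    Continuous-time model: dh/dt = a*h + b*x,  y = c*h  (a must be nonzero).
    Discretized with zero-order hold (ZOH), as is common in SSM layers.
    """
    if a == 0:
        raise ValueError("this ZOH formula assumes a != 0")
    a_bar = math.exp(dt * a)          # discrete state transition
    b_bar = (a_bar - 1.0) / a * b     # ZOH-discretized input weight
    h = 0.0
    ys = []
    for x in xs:
        h = a_bar * h + b_bar * x     # state carries information from all earlier inputs
        ys.append(c * h)              # linear readout
    return ys


# Step response of a stable system (a < 0) settles at -c*b/a, which is 1.0 here.
ys = ssm_scan([1.0] * 200, a=-1.0, b=1.0, c=1.0, dt=0.1)
```

Because the state `h` is carried across the whole sequence, each output can depend on every earlier input at constant cost per step, which is one common intuition for why SSM-based modules can enlarge the effective receptive field cheaply.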
Submitted 14 May, 2024; v1 submitted 2 May, 2024;
originally announced May 2024.
-
On the duality in constant-roll inflation
Authors:
Yue Wang,
Qing Gao,
Shengqing Gao,
Yungui Gong
Abstract:
There is a duality in the observables $n_s$, $r$ and the inflaton potential between large and small $η_H$ for constant-roll inflation if the slow-roll parameter $ε_H$ is negligible. In general, the duality between $η_H$ and $\barη_H$ does not hold for the background evolution of inflation. For some particular solutions of constant-roll inflation with constant $η_H$, we find that in the small-field approximation the potential takes a quadratic form and remains unchanged when $η_H$ is replaced by $\barη_H=3-η_H$. If the scalar field is small and the contribution of $ε_H$ is negligible, we find that there exist a logarithmic duality and a duality between large and small $η_H$ for the primordial curvature perturbation in inflationary models with the quadratic potential.
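A short leading-order derivation may help show where the $η_H \to 3-η_H$ symmetry of the scalar spectral index comes from. It is a standard textbook-style argument, not the paper's own derivation, and it assumes the convention $η_H \equiv -\ddot\phi/(H\dot\phi)$, constant $η_H$, and all terms in $ε_H$ dropped; it covers $n_s$ only, not $r$ or the potential.

```latex
% Leading order in \epsilon_H, constant \eta_H; z = a\dot\phi/H and aH \simeq -1/\tau
\frac{z'}{z} = aH\,(1 + \epsilon_H - \eta_H)
\;\Rightarrow\;
\frac{z''}{z} \simeq a^2H^2\,(1-\eta_H)(2-\eta_H) = \frac{\nu^2 - \tfrac14}{\tau^2},
\qquad \nu = \left|\eta_H - \tfrac32\right|

% Superhorizon curvature spectrum P_\zeta \propto k^{3-2\nu}:
n_s - 1 = 3 - 2\nu = 3 - \left|2\eta_H - 3\right|

% Duality: with \bar\eta_H = 3 - \eta_H, |2\bar\eta_H - 3| = |3 - 2\eta_H|, so n_s(\bar\eta_H) = n_s(\eta_H).
```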
Submitted 29 April, 2024;
originally announced April 2024.
-
Real-Time 4K Super-Resolution of Compressed AVIF Images. AIS 2024 Challenge Survey
Authors:
Marcos V. Conde,
Zhijun Lei,
Wen Li,
Cosmin Stejerean,
Ioannis Katsavounidis,
Radu Timofte,
Kihwan Yoon,
Ganzorig Gankhuyag,
Jiangtao Lv,
Long Sun,
Jinshan Pan,
Jiangxin Dong,
Jinhui Tang,
Zhiyuan Li,
Hao Wei,
Chenyang Ge,
Dongyang Zhang,
Tianle Liu,
Huaian Chen,
Yi Jin,
Menghan Zhou,
Yiqiang Yan,
Si Gao,
Biao Wu,
Shaoli Liu
, et al. (50 additional authors not shown)
Abstract:
This paper introduces a novel benchmark as part of the AIS 2024 Real-Time Image Super-Resolution (RTSR) Challenge, which aims to upscale compressed images from 540p to 4K resolution (4x factor) in real time on commercial GPUs. For this, we use a diverse test set of 4K images ranging from digital art to gaming and photography. The images are compressed using the modern AVIF codec instead of JPEG. All the proposed methods improve PSNR fidelity over Lanczos interpolation and process each image in under 10 ms. Of the 160 participants, 25 teams submitted their code and models. The solutions present novel designs tailored for memory efficiency and runtime on edge devices. This survey describes the best solutions for real-time SR of compressed high-resolution images.
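Fidelity in the challenge is reported as PSNR against the 4K ground truth. A minimal PSNR sketch in plain Python follows, assuming 8-bit pixel values flattened into a list; the challenge's exact evaluation script (color space, border cropping) is not described here, so treat this as an approximation. The helper `upscaled_size` is illustrative and simply confirms that a 960x540 (540p) input at 4x gives 3840x2160 (4K UHD).

```python
import math
from typing import Sequence, Tuple


def psnr(reference: Sequence[float], restored: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio (dB) between two flattened images of equal size."""
    if len(reference) != len(restored) or len(reference) == 0:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def upscaled_size(width: int, height: int, scale: int = 4) -> Tuple[int, int]:
    """Output resolution for an integer upscaling factor."""
    return width * scale, height * scale


print(upscaled_size(960, 540))  # (3840, 2160)
```

A baseline score is obtained by comparing a Lanczos-upscaled image with the ground truth using the same function; a submission "improves fidelity" when its PSNR exceeds that baseline.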
Submitted 25 April, 2024;
originally announced April 2024.
-
NTIRE 2024 Challenge on Low Light Image Enhancement: Methods and Results
Authors:
Xiaoning Liu,
Zongwei Wu,
Ao Li,
Florin-Alexandru Vasluianu,
Yulun Zhang,
Shuhang Gu,
Le Zhang,
Ce Zhu,
Radu Timofte,
Zhi Jin,
Hongjun Wu,
Chenxi Wang,
Haitao Ling,
Yuanhao Cai,
Hao Bian,
Yuxin Zheng,
Jing Lin,
Alan Yuille,
Ben Shao,
Jin Guo,
Tianli Liu,
Mohao Wu,
Yixu Feng,
Shuo Hou,
Haotian Lin
, et al. (87 additional authors not shown)
Abstract:
This paper reviews the NTIRE 2024 low light image enhancement challenge, highlighting the proposed solutions and results. The aim of this challenge is to discover an effective network design or solution capable of generating brighter, clearer, and visually appealing results under a variety of conditions, including ultra-high resolution (4K and beyond), non-uniform illumination, backlighting, extreme darkness, and night scenes. A total of 428 participants registered for the challenge, and 22 teams ultimately made valid submissions. This paper evaluates the state-of-the-art advancements in enhancing low-light images, reflecting the significant progress and creativity in this field.
Submitted 22 April, 2024;
originally announced April 2024.
-
Thermodynamic constraints on kinetic perturbations of homogeneous driven diffusions
Authors:
Qi Gao,
Hyun-Myung Chun,
Jordan M. Horowitz
Abstract:
We analyze the static response to kinetic perturbations of nonequilibrium steady states that can be modeled as diffusions. We demonstrate that kinetic response is purely a nonequilibrium effect, measuring the degree to which the Fluctuation-Dissipation Theorem is violated out of equilibrium. For driven diffusions in a flat landscape, we further demonstrate that such response is constrained by the strength of the nonequilibrium driving via quantitative inequalities.
Submitted 15 April, 2024;
originally announced April 2024.
-
LLM Reasoners: New Evaluation, Library, and Analysis of Step-by-Step Reasoning with Large Language Models
Authors:
Shibo Hao,
Yi Gu,
Haotian Luo,
Tianyang Liu,
Xiyan Shao,
Xinyuan Wang,
Shuhua Xie,
Haodi Ma,
Adithya Samavedhi,
Qiyue Gao,
Zhen Wang,
Zhiting Hu
Abstract:
Generating accurate step-by-step reasoning is essential for Large Language Models (LLMs) to address complex problems and to enhance robustness and interpretability. Despite the surge of research on advanced reasoning approaches, systematically analyzing the diverse LLMs and reasoning strategies in generating reasoning chains remains a significant challenge. The difficulties stem from the lack of two key elements: (1) an automatic method for evaluating the generated reasoning chains on different tasks, and (2) a unified formalism and implementation of the diverse reasoning approaches for systematic comparison. This paper aims to close the gap: (1) We introduce AutoRace for fully automated reasoning chain evaluation. Existing metrics rely on expensive human annotations or on pre-defined LLM prompts that do not adapt to different tasks. In contrast, AutoRace automatically creates detailed evaluation criteria tailored to each task and uses GPT-4 for accurate evaluation following the criteria. (2) We develop LLM Reasoners, a library for standardized modular implementation of existing and new reasoning algorithms, under a unified formulation of the search, reward, and world model components. With the new evaluation and library, (3) we conduct an extensive study of different reasoning approaches (e.g., CoT, ToT, RAP). The analysis reveals interesting findings about different factors contributing to reasoning, including reward guidance, breadth versus depth in search, the world model, and prompt formats.
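The search/reward/world-model decomposition can be illustrated with a toy beam search. This is a hypothetical sketch of the general idea only, not the LLM Reasoners library API: `beam_search`, the arithmetic toy task, and every other name are invented for illustration, and in a real reasoner the world model and reward would typically be LLM calls rather than Python functions.

```python
from typing import Callable, List, Tuple

State = int
Action = str
Beam = Tuple[float, List[Action], State]  # (cumulative reward, action trace, state)


def beam_search(
    init: State,
    actions: Callable[[State], List[Action]],
    world_model: Callable[[State, Action], State],  # predicts the next state
    reward: Callable[[State, Action], float],       # scores a single step
    is_terminal: Callable[[State], bool],
    beam_width: int = 2,
    max_depth: int = 5,
) -> Beam:
    beam: List[Beam] = [(0.0, [], init)]
    for _ in range(max_depth):
        candidates: List[Beam] = []
        for score, trace, state in beam:
            if is_terminal(state):
                candidates.append((score, trace, state))  # keep finished chains as-is
                continue
            for a in actions(state):
                candidates.append((score + reward(state, a), trace + [a], world_model(state, a)))
        candidates.sort(key=lambda c: c[0], reverse=True)  # stable: ties keep insertion order
        beam = candidates[:beam_width]
    return beam[0]


# Hypothetical toy task: reach 10 from 1 using the steps "+1" and "*2".
TARGET = 10


def toy_world_model(s: State, a: Action) -> State:
    return s + 1 if a == "+1" else s * 2


def toy_reward(s: State, a: Action) -> float:
    return -abs(TARGET - toy_world_model(s, a))  # closer next state = higher reward


best = beam_search(1, lambda s: ["+1", "*2"], toy_world_model, toy_reward, lambda s: s == TARGET)
```

Swapping only the search routine (e.g. greedy decoding for a CoT-like setting, or a tree search) while keeping the reward and world model fixed is the kind of controlled comparison such a decomposition makes possible.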
Submitted 8 April, 2024;
originally announced April 2024.
-
High-Discriminative Attribute Feature Learning for Generalized Zero-Shot Learning
Authors:
Yu Lei,
Guoshuai Sheng,
Fangfang Li,
Quanxue Gao,
Cheng Deng,
Qin Li
Abstract:
Zero-shot learning (ZSL) aims to recognize new classes without prior exposure to their samples, relying on semantic knowledge from observed classes. However, current attention-based models may overlook the transferability of visual features and the distinctiveness of attribute localization when learning regional features in images. Additionally, they often neglect attributes shared among different objects. Highly discriminative attribute features are crucial for identifying and distinguishing unseen classes. To address these issues, we propose an innovative approach called High-Discriminative Attribute Feature Learning for Generalized Zero-Shot Learning (HDAFL). HDAFL optimizes visual features by learning attribute features to obtain discriminative visual embeddings. Specifically, HDAFL uses multiple convolutional kernels to automatically learn image regions highly correlated with attributes, eliminating irrelevant interference in image features. Furthermore, we introduce a Transformer-based attribute discrimination encoder to enhance the discriminative capability among attributes. The method also employs a contrastive loss to alleviate dataset biases and enhance the transferability of visual features, facilitating better semantic transfer between seen and unseen classes. Experimental results on three widely used datasets demonstrate the effectiveness of HDAFL.
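For readers new to the setting, the common attribute-based inference step in (generalized) zero-shot learning works as follows: a model predicts an attribute vector for an image, and the class whose attribute signature is most compatible wins, with both seen and unseen classes as candidates in the generalized setting. The sketch below is this generic baseline recipe with cosine compatibility, not HDAFL itself; the class names and attribute values are hypothetical.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def predict_class(predicted_attributes: List[float], class_attributes: Dict[str, List[float]]) -> str:
    """Pick the class whose attribute signature best matches the predicted attributes."""
    return max(class_attributes, key=lambda c: cosine(predicted_attributes, class_attributes[c]))


# Hypothetical attribute order: [striped, four-legged, flies]
CLASS_ATTRIBUTES = {
    "zebra": [1.0, 1.0, 0.0],  # could be an unseen class: known only through its attributes
    "horse": [0.0, 1.0, 0.0],
    "eagle": [0.0, 0.0, 1.0],
}
```

The quality of the predicted attribute vector is what a method like HDAFL aims to improve; the final compatibility step stays simple.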
Submitted 7 April, 2024;
originally announced April 2024.
-
Fuzzy K-Means Clustering without Cluster Centroids
Authors:
Han Lu,
Fangfang Li,
Quanxue Gao,
Cheng Deng,
Chris Ding,
Qianqian Wang
Abstract:
Fuzzy K-Means clustering is a critical technique in unsupervised data analysis. However, the performance of popular Fuzzy K-Means algorithms is sensitive to the selection of initial cluster centroids and is also affected by noise when updating mean cluster centroids. To address these challenges, this paper proposes a novel Fuzzy K-Means clustering algorithm that entirely eliminates the reliance on cluster centroids, obtaining membership matrices solely through distance matrix computation. This innovation enhances flexibility in distance measurement between sample points, thus improving the algorithm's performance and robustness. The paper also establishes theoretical connections between the proposed model and popular Fuzzy K-Means clustering techniques. Experimental results on several real datasets demonstrate the effectiveness of the algorithm.
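For context, the classical Fuzzy K-Means (fuzzy c-means) iteration alternates a membership update computed from point-to-centroid distances with a weighted-mean centroid update; the latter is the centroid dependence the paper sets out to remove. The 1-D sketch below shows only this classical baseline, not the proposed centroid-free algorithm; the function name and the defaults (fuzzifier m = 2, fixed iteration count) are illustrative assumptions.

```python
from typing import List, Tuple


def fuzzy_kmeans_1d(xs: List[float], centroids: List[float], m: float = 2.0,
                    iters: int = 50, eps: float = 1e-12) -> Tuple[List[float], List[List[float]]]:
    """Classical centroid-based Fuzzy K-Means on 1-D data.

    Returns (centroids, memberships) where memberships[i][j] is the degree to
    which point i belongs to cluster j; each row sums to 1.
    """
    c = list(centroids)
    u: List[List[float]] = []
    for _ in range(iters):
        # Membership update from point-to-centroid distances:
        # u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1))
        u = []
        for x in xs:
            d = [abs(x - cj) + eps for cj in c]  # eps avoids division by zero
            u.append([1.0 / sum((dj / dk) ** (2.0 / (m - 1.0)) for dk in d) for dj in d])
        # Centroid update: mean weighted by u_ij ** m (depends on the current centroids)
        c = [sum((u[i][j] ** m) * xs[i] for i in range(len(xs)))
             / sum(u[i][j] ** m for i in range(len(xs)))
             for j in range(len(c))]
    return c, u


centers, memberships = fuzzy_kmeans_1d([0.0, 0.1, 0.2, 10.0, 10.1, 10.2], centroids=[1.0, 9.0])
```

The result depends on the `centroids` argument passed in, which is the initialization sensitivity described above; an approach that derives memberships from the point-to-point distance matrix alone avoids that input.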
Submitted 7 April, 2024;
originally announced April 2024.
-
LHAASO-KM2A detector simulation using Geant4
Authors:
Zhen Cao,
F. Aharonian,
Q. An,
Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen,
S. Z. Chen
, et al. (254 additional authors not shown)
Abstract:
KM2A is one of the main sub-arrays of LHAASO, working on gamma-ray astronomy and cosmic-ray physics at energies above 10 TeV. Detector simulation is an important foundation for estimating detector performance and for data analysis. Simulating the KM2A detector in the Geant4 framework is a big challenge because numerous photons must be tracked in a large number of detector units (>6000) spanning a large altitude difference (30 m) and a huge area (1.3 km^2). In this paper, the design of G4KM2A, the KM2A simulation code based on Geant4, is introduced. G4KM2A is optimized mainly for memory consumption to avoid memory overflow. Several simplifications are used to significantly speed up the execution of G4KM2A, reducing the running time by a factor of at least 30 compared to full detector simulation. Comparisons of particle distributions and core/angular resolution between simulation and experimental data for the full KM2A array are also presented and show good agreement.
Submitted 7 April, 2024;
originally announced April 2024.
-
Mirror-3DGS: Incorporating Mirror Reflections into 3D Gaussian Splatting
Authors:
Jiarui Meng,
Haijie Li,
Yanmin Wu,
Qiankun Gao,
Shuzhou Yang,
Jian Zhang,
Siwei Ma
Abstract:
3D Gaussian Splatting (3DGS) has marked a significant breakthrough in 3D scene reconstruction and novel view synthesis. However, 3DGS, much like its predecessor Neural Radiance Fields (NeRF), struggles to accurately model physical reflections, particularly in mirrors, which are ubiquitous in real-world scenes. As a result, reflections are mistakenly perceived as separate entities that physically exist, leading to inaccurate reconstructions and inconsistent reflective properties across viewpoints. To address this challenge, we introduce Mirror-3DGS, a rendering framework designed to handle mirror geometries and reflections, enabling the generation of realistic mirror reflections. By incorporating mirror attributes into 3DGS and leveraging the principle of plane mirror imaging, Mirror-3DGS constructs a mirrored viewpoint that observes the scene from behind the mirror, enhancing the realism of scene renderings. Extensive assessments on both synthetic and real-world scenes show that our method renders novel views with enhanced fidelity in real time, surpassing the state-of-the-art Mirror-NeRF, particularly within the challenging mirror regions. Our code will be made publicly available for reproducible research.
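The plane mirror imaging principle itself is simple geometry: what a camera sees in a planar mirror matches (up to a left-right flip) what a virtual camera reflected to the other side of the mirror plane would see. The sketch below reflects a camera center and viewing direction across a plane n . x + d = 0. It illustrates that principle under the assumption of a known unit-normal mirror plane and is not Mirror-3DGS's implementation; all names are hypothetical.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def reflect_point(p: Vec3, n: Vec3, d: float) -> Vec3:
    """Mirror image of point p across the plane n . x + d = 0 (n must be unit length)."""
    s = 2.0 * (_dot(n, p) + d)  # twice the signed distance from p to the plane
    return (p[0] - s * n[0], p[1] - s * n[1], p[2] - s * n[2])


def reflect_direction(v: Vec3, n: Vec3) -> Vec3:
    """Mirror a direction vector (no translation term, so d is not needed)."""
    s = 2.0 * _dot(n, v)
    return (v[0] - s * n[0], v[1] - s * n[1], v[2] - s * n[2])


# Camera at z = 3 looking toward a mirror lying in the plane z = 0.
mirror_n, mirror_d = (0.0, 0.0, 1.0), 0.0
virtual_center = reflect_point((1.0, 2.0, 3.0), mirror_n, mirror_d)  # (1.0, 2.0, -3.0)
virtual_forward = reflect_direction((0.0, 0.0, -1.0), mirror_n)      # (0.0, 0.0, 1.0)
```

A reflection has determinant -1, so if all three camera axes are reflected the resulting frame is left-handed; an implementation has to account for this, for example by flipping the rendered image horizontally.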
Submitted 1 April, 2024;
originally announced April 2024.
-
Interpretable Multi-View Clustering Based on Anchor Graph Tensor Factorization
Authors:
Jing Li,
Quanxue Gao,
Cheng Deng,
Qianqian Wang,
Ming Yang
Abstract:
Clustering methods based on anchor graphs have gained significant attention due to their strong clustering performance and ability to process large-scale data. One common approach is to learn bipartite graphs with K connected components, which avoids the need for post-processing. However, this approach has strict parameter requirements and may not always obtain K connected components. To address this issue, an alternative approach is to directly obtain the cluster label matrix by performing non-negative matrix factorization (NMF) on the anchor graph. Nevertheless, existing multi-view clustering methods based on anchor graph factorization lack adequate cluster interpretability for the decomposed matrices and often overlook inter-view information. We address this limitation by using non-negative tensor factorization to decompose an anchor graph tensor that combines anchor graphs from multiple views. This approach allows us to consider inter-view information comprehensively. The decomposed tensors, namely the sample indicator tensor and the anchor indicator tensor, enhance the interpretability of the factorization. Extensive experiments validate the effectiveness of this method.
Submitted 31 March, 2024;
originally announced April 2024.
-
Different intermediate water cluster with distinct nucleation dynamics among mono layer ice nucleation
Authors:
Yuheng Zhao,
Yi Qin Gao
Abstract:
Recent first-principles calculations revealed distinctive dynamic behavior in water-molecule rotation during the melting of highly confined water, indicating a notable time-scale separation in diffusion. In this short paper, we conduct molecular dynamics (MD) simulations to explore rotational dynamics during mono-layer ice nucleation and to investigate possible intermediate states characterized by differences in the rotation of water molecules. Our study reveals two types of ice clusters that have a similar ice geometric structure but distinctly different rotational behaviors. In terms of molecular rotation, one type is ice-like (ILC) and can be regarded as small ice nuclei, while the other is supercooled-liquid-water-like (SCC). We found that these intermediate clusters are associated with distinct nucleation pathways, thermodynamic properties, and phase-transition dynamics, which yields an unexpectedly complex picture of mono-layer ice nucleation.
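Rotational behavior of water molecules is commonly quantified with an orientational autocorrelation function, for example $C_1(t)=\langle \mathbf{u}(t_0)\cdot\mathbf{u}(t_0+t)\rangle$ for a molecular axis $\mathbf{u}$ such as the dipole. The abstract does not state which rotational metric separates ILC from SCC clusters, so the sketch below is only an illustrative assumption of how such a measure could be computed from an MD trajectory of one molecule's axis; `rotational_acf` is a hypothetical name.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def rotational_acf(axis: Sequence[Vec3], max_lag: int) -> List[float]:
    """First-order rotational autocorrelation C1(lag) = <u(t0) . u(t0 + lag)>.

    `axis` is one molecule's orientation vector (e.g. its dipole) per MD frame;
    the average runs over all available time origins t0.
    """
    if not 0 <= max_lag < len(axis):
        raise ValueError("max_lag must be smaller than the trajectory length")
    units = []
    for v in axis:
        norm = math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)
        units.append((v[0] / norm, v[1] / norm, v[2] / norm))
    acf = []
    for lag in range(max_lag + 1):
        n_origins = len(units) - lag
        total = sum(
            units[t][0] * units[t + lag][0]
            + units[t][1] * units[t + lag][1]
            + units[t][2] * units[t + lag][2]
            for t in range(n_origins)
        )
        acf.append(total / n_origins)
    return acf


# Molecule rotating steadily by 0.3 rad per frame in the xy-plane: C1(lag) = cos(0.3 * lag).
spinning = [(math.cos(0.3 * k), math.sin(0.3 * k), 0.0) for k in range(20)]
```

A molecule locked in an ice-like lattice keeps C1 near 1 over long lags, whereas liquid-like reorientation makes it decay; comparing such curves per cluster is one way rotational differences between clusters of similar geometry could show up.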
Submitted 26 March, 2024;
originally announced March 2024.
-
Innovative Quantitative Analysis for Disease Progression Assessment in Familial Cerebral Cavernous Malformations
Authors:
Ruige Zong,
Tao Wang,
Chunwang Li,
Xinlin Zhang,
Yuanbin Chen,
Longxuan Zhao,
Qixuan Li,
Qinquan Gao,
Dezhi Kang,
Fuxin Lin,
Tong Tong
Abstract:
Familial cerebral cavernous malformation (FCCM) is a hereditary disorder characterized by abnormal vascular structures within the central nervous system. FCCM lesions are often numerous and intricate, making quantitative analysis of the lesions a labor-intensive task. Consequently, clinicians face challenges in quantitatively assessing the severity of lesions and determining whether lesions have progressed. To alleviate this problem, we propose a quantitative statistical framework for FCCM, comprising an efficient annotation module, an FCCM lesion segmentation module, and an FCCM lesion quantitative statistics module. Building on efficient data annotation, our framework segments FCCM lesions precisely, achieving a Dice coefficient of 93.22\%. More importantly, we focus on quantitative lesion statistics, which we combine with image registration to enable quantitative comparison of lesions across different examinations of the same patient, and we have established a visualization framework that allows doctors to comprehensively compare and analyze lesions. The experimental results demonstrate that our framework not only obtains objective, accurate, and comprehensive quantitative statistical information, providing a quantitative assessment method for studies of disease progression and drug efficacy, but also considerably reduces the workload of manual lesion measurement and statistics, assisting clinical decision-making for FCCM and accelerating progress in FCCM clinical research. This highlights the potential of the framework for practical application in FCCM clinical research and clinical decision-making. The code is available at https://github.com/6zrg/Quantitative-Statistics-of-FCCM.
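The reported segmentation score is the Dice coefficient, $2|P\cap G|/(|P|+|G|)$ for a predicted mask $P$ and a ground-truth mask $G$. A minimal sketch over flattened binary masks follows; returning 1.0 when both masks are empty is an assumed convention, and the paper's exact evaluation protocol (for example per-lesion versus per-volume averaging) is not stated here.

```python
from typing import Sequence


def dice(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Dice coefficient between two flattened binary masks of equal length."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    return 1.0 if total == 0 else 2.0 * intersection / total  # both empty: perfect agreement
```

For example, a prediction covering two voxels, one of which matches a single-voxel ground truth, scores 2 * 1 / (2 + 1), about 0.667.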
Submitted 23 March, 2024;
originally announced March 2024.
-
GaussianFlow: Splatting Gaussian Dynamics for 4D Content Creation
Authors:
Quankai Gao,
Qiangeng Xu,
Zhe Cao,
Ben Mildenhall,
Wenchao Ma,
Le Chen,
Danhang Tang,
Ulrich Neumann
Abstract:
Creating 4D fields of Gaussian Splatting from images or videos is a challenging task due to its under-constrained nature. While the optimization can draw photometric reference from the input videos or be regulated by generative models, directly supervising Gaussian motions remains underexplored. In this paper, we introduce a novel concept, Gaussian flow, which connects the dynamics of 3D Gaussians and pixel velocities between consecutive frames. The Gaussian flow can be efficiently obtained by splatting Gaussian dynamics into the image space. This differentiable process enables direct dynamic supervision from optical flow. Our method significantly benefits 4D dynamic content generation and 4D novel view synthesis with Gaussian Splatting, especially for content with rich motion that existing methods struggle to handle. The common color-drifting issue in 4D generation is also resolved with improved Gaussian dynamics. Superior visual quality in extensive experiments demonstrates our method's effectiveness. Quantitative and qualitative evaluations show that our method achieves state-of-the-art results on both 4D generation and 4D novel view synthesis. Project page: https://zerg-overmind.github.io/GaussianFlow.github.io/
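As a much-simplified illustration of supervising Gaussian motion with optical flow, one can project a single Gaussian's mean at two consecutive frames, treat the 2D displacement as that Gaussian's flow, and penalize its L1 distance to an optical-flow estimate. This is an assumption-laden sketch, not the paper's formulation, which splats the dynamics of each Gaussian (including how its 2D shape changes) and weights contributions per pixel; the pinhole intrinsics and all function names are hypothetical.

```python
from typing import List, Tuple

Vec2 = Tuple[float, float]
Vec3 = Tuple[float, float, float]
Intrinsics = Tuple[float, float, float, float]  # (fx, fy, cx, cy)


def project(p: Vec3, k: Intrinsics) -> Vec2:
    """Pinhole projection of a camera-space point with z > 0 to pixel coordinates."""
    fx, fy, cx, cy = k
    x, y, z = p
    return (fx * x / z + cx, fy * y / z + cy)


def center_flow(mean_t: Vec3, mean_next: Vec3, k: Intrinsics) -> Vec2:
    """2D displacement of one Gaussian's projected mean between consecutive frames."""
    u0, u1 = project(mean_t, k), project(mean_next, k)
    return (u1[0] - u0[0], u1[1] - u0[1])


def flow_l1_loss(pred: List[Vec2], target: List[Vec2]) -> float:
    """Mean L1 distance between predicted flows and optical-flow targets."""
    return sum(abs(p[0] - t[0]) + abs(p[1] - t[1]) for p, t in zip(pred, target)) / len(pred)


K = (100.0, 100.0, 50.0, 50.0)
flow = center_flow((0.0, 0.0, 2.0), (0.2, 0.0, 2.0), K)  # the mean moves 10 px to the right
```

Because every step is differentiable in the Gaussian means, a loss of this form can push 3D motion toward agreement with 2D optical flow, which is the kind of direct supervision the abstract describes.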
Submitted 13 May, 2024; v1 submitted 18 March, 2024;
originally announced March 2024.
-
BAGS: Building Animatable Gaussian Splatting from a Monocular Video with Diffusion Priors
Authors:
Tingyang Zhang,
Qingzhe Gao,
Weiyu Li,
Libin Liu,
Baoquan Chen
Abstract:
Animatable 3D reconstruction has significant applications across various fields but still relies primarily on handcrafted creation by artists. Recently, some studies have successfully constructed animatable 3D models from monocular videos. However, these approaches require sufficient view coverage of the object in the input video and typically incur significant time and computational costs for training and rendering, which restricts their practical applications. In this work, we propose a method to build animatable 3D Gaussian Splatting from monocular video with diffusion priors. The 3D Gaussian representations significantly accelerate training and rendering, and the diffusion priors allow the method to learn 3D models from limited viewpoints. We also present a rigid regularization to enhance the utilization of the priors. We perform an extensive evaluation across various real-world videos, demonstrating superior performance compared to current state-of-the-art methods.
Submitted 17 March, 2024;
originally announced March 2024.
-
Measurements of All-Particle Energy Spectrum and Mean Logarithmic Mass of Cosmic Rays from 0.3 to 30 PeV with LHAASO-KM2A
Authors:
The LHAASO Collaboration,
Zhen Cao,
F. Aharonian,
Q. An,
A. Axikegu,
Y. X. Bai,
Y. W. Bao,
D. Bastieri,
X. J. Bi,
Y. J. Bi,
J. T. Cai,
Q. Cao,
W. Y. Cao,
Zhe Cao,
J. Chang,
J. F. Chang,
A. M. Chen,
E. S. Chen,
Liang Chen,
Lin Chen,
Long Chen,
M. J. Chen,
M. L. Chen,
Q. H. Chen,
S. H. Chen
, et al. (256 additional authors not shown)
Abstract:
We present measurements of the all-particle energy spectrum and mean logarithmic mass of cosmic rays in the energy range 0.3-30 PeV, using data collected by LHAASO-KM2A between September 2021 and December 2022 and a nearly composition-independent energy reconstruction method that achieves unprecedented accuracy. Our analysis places the knee at $3.67 \pm 0.05 \pm 0.15$ PeV. Below the knee, the spectral index is found to be $-2.7413 \pm 0.0004 \pm 0.0050$, while above the knee it is $-3.128 \pm 0.005 \pm 0.027$, with the sharpness of the transition measured with a statistical error of 2%. The mean logarithmic mass of cosmic rays is heavier than helium over almost the entire measured energy range. It decreases from 1.7 at 0.3 PeV to 1.3 at 3 PeV, a 24% decline following a power law with an index of $-0.1200 \pm 0.0003 \pm 0.0341$. This is equivalent to an increase in the abundance of light components. Above the knee, the mean logarithmic mass exhibits a power-law trend toward heavier components, which is the reverse of the behavior observed in the all-particle energy spectrum. Additionally, the knee position and the change in power-law index are approximately the same. These findings suggest that the knee observed in the all-particle spectrum corresponds to the knee of the light component, rather than of the medium-heavy components.
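A knee with a finite transition sharpness is often modeled as a smoothly broken power law. The sketch below assumes that functional form (the abstract does not state the exact fit function), plugs in the central knee energy and spectral indices quoted above, and uses placeholder values for the normalization and sharpness, which are not LHAASO results. The local logarithmic slope then moves from about -2.74 below the knee to about -3.13 above it.

```python
import math


def broken_power_law(e_pev: float, norm: float = 1.0, e_knee: float = 3.67,
                     gamma1: float = -2.7413, gamma2: float = -3.128,
                     sharpness: float = 5.0) -> float:
    """Smoothly broken power law dN/dE (arbitrary units), with E in PeV.

    gamma1, gamma2 and e_knee are the central values quoted above; `norm` and
    `sharpness` are placeholders, not the fitted LHAASO values.
    """
    return norm * e_pev ** gamma1 * (1.0 + (e_pev / e_knee) ** sharpness) ** ((gamma2 - gamma1) / sharpness)


def local_index(e_pev: float, h: float = 1e-4) -> float:
    """Logarithmic slope d ln(dN/dE) / d ln E by central differences."""
    up = math.log(broken_power_law(e_pev * math.exp(h)))
    down = math.log(broken_power_law(e_pev * math.exp(-h)))
    return (up - down) / (2.0 * h)


for e in (0.3, 3.67, 30.0):
    print(e, round(local_index(e), 3))
```

In this form the slope at exactly the knee energy is the average of the two indices, and a larger `sharpness` makes the transition between them more abrupt.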
Submitted 26 March, 2024; v1 submitted 15 March, 2024;
originally announced March 2024.
-
The fundamental plane of blazars based on the black hole spin-mass energy
Authors:
Xu Zhang,
Dingrong Xiong,
Quangui Gao,
Guiqin Yang,
Fangwu Lu,
Weiwei Na,
Longhua Qin
Abstract:
We examine the fundamental plane of 91 blazars, including FSRQs and BL Lacs, with known X-ray luminosity ($L_X$), radio luminosity ($L_R$), and black hole mass ($M$) measurements, to probe the relationship between the jet and accretion in blazars. The fundamental plane of blazars is $\log L_R = (0.273\pm0.059)\log L_X + (0.695\pm0.191)\log M + (25.457\pm2.728)$, and $\log L_R = (0.190\pm0.049)\log L_X + (0.475\pm0.157)\log M + (28.568\pm2.245)$ after accounting for the beaming factor. Our results suggest that the jets of blazars are connected with accretion. We introduce the black hole spin energy as a new variable to correct the black hole mass and explore the effect of black hole spin on the fundamental relation. We find that the fundamental plane of blazars is affected by black hole spin, similar to previous work on AGNs. We additionally examine a new fundamental plane based on the black hole spin-mass energy ($M_{spin}$). The new fundamental plane ($\log L_R = (0.332\pm0.081)\log L_X + (0.502\pm0.091)\log M_{spin} + (22.606\pm3.346)$, with R-squared $=0.575$) shows that $M_{spin}$ yields a better correlation coefficient than $M$ for the fundamental plane of blazars. These results support the view that black hole spin should be considered an important factor in studying the fundamental plane of blazars, and they may further our understanding of the Blandford-Znajek process in blazars.
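Read as a predictor, each plane maps $\log L_X$ and $\log M$ to an expected $\log L_R$. The sketch below plugs in the central best-fit coefficients quoted above and ignores their uncertainties; the luminosity and mass units are not stated in the abstract and are assumed to match those used in the fit (commonly erg s$^{-1}$ and solar masses), and the example inputs are arbitrary.

```python
from typing import Tuple

Plane = Tuple[float, float, float]  # (coefficient of log L_X, coefficient of log M, constant)

# Central best-fit values quoted in the abstract (uncertainties ignored).
PLANE_OBSERVED: Plane = (0.273, 0.695, 25.457)
PLANE_BEAMING_CORRECTED: Plane = (0.190, 0.475, 28.568)


def predicted_log_lr(log_lx: float, log_mass: float, plane: Plane = PLANE_OBSERVED) -> float:
    """Evaluate log L_R = xi_X * log L_X + xi_M * log M + b."""
    xi_x, xi_m, b = plane
    return xi_x * log_lx + xi_m * log_mass + b


# Arbitrary example inputs: log L_X = 44, log M = 8.5.
print(round(predicted_log_lr(44.0, 8.5), 4))                           # 43.3765
print(round(predicted_log_lr(44.0, 8.5, PLANE_BEAMING_CORRECTED), 4))  # 40.9655
```

The spin-corrected plane has the same shape, with $\log M_{spin}$ in place of $\log M$ and coefficients (0.332, 0.502, 22.606).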
Submitted 14 March, 2024;
originally announced March 2024.
-
ε-Neural Thompson Sampling of Deep Brain Stimulation for Parkinson Disease Treatment
Authors:
Hao-Lun Hsu,
Qitong Gao,
Miroslav Pajic
Abstract:
Deep Brain Stimulation (DBS) is an effective intervention for alleviating the motor symptoms of Parkinson's disease (PD). Traditional commercial DBS devices can only deliver fixed-frequency periodic pulses to the basal ganglia (BG) regions of the brain, i.e., continuous DBS (cDBS). However, such devices generally suffer from energy inefficiency and side effects, such as speech impairment. Recent research has focused on adaptive DBS (aDBS) to overcome the limitations of cDBS. Specifically, reinforcement learning (RL) based approaches have been developed to adapt the frequency of the stimuli in order to achieve both energy efficiency and treatment efficacy. However, RL approaches generally require a significant amount of training data and computational resources, making it intractable to integrate RL policies into the real-time embedded systems needed for aDBS. In contrast, contextual multi-armed bandits (CMAB) generally offer better sample efficiency than RL. In this study, we propose a CMAB solution for aDBS. Specifically, we define the context as the signals capturing irregular neuronal firing activity in the BG regions (i.e., beta-band power spectral density), while each arm represents a (discretized) pulse frequency of the stimulation. Moreover, an ε-exploring strategy is introduced on top of the classic Thompson sampling method, leading to an algorithm called ε-Neural Thompson sampling (ε-NeuralTS), such that the learned CMAB policy can better balance exploration and exploitation of the BG environment. The ε-NeuralTS algorithm is evaluated using a computational BG model that captures the neuronal activity in PD patients' brains. The results show that our method outperforms both existing cDBS methods and CMAB baselines.
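The ε-exploring idea can be sketched as follows: with probability ε the policy picks an arm (a discretized stimulation frequency) uniformly at random; otherwise it draws one sample from each arm's posterior over expected reward and plays the argmax, as in ordinary Thompson sampling. As a simplifying assumption, this sketch replaces the neural reward model and the beta-band context of ε-NeuralTS with an independent Gaussian posterior per arm (known noise variance, conjugate updates); the class name and parameters are hypothetical.

```python
import random


class EpsilonGaussianTS:
    """Hypothetical epsilon-exploring Thompson sampling with a Gaussian posterior per arm.

    Simplification: context-free, with no neural network (unlike epsilon-NeuralTS)."""

    def __init__(self, n_arms, epsilon=0.1, prior_var=1.0, noise_var=1.0, seed=0):
        self.epsilon = epsilon
        self.noise_var = noise_var
        self.rng = random.Random(seed)
        # Posterior of each arm's mean reward, kept as precision and
        # precision-weighted mean (prior mean is 0).
        self.precision = [1.0 / prior_var] * n_arms
        self.weighted_sum = [0.0] * n_arms

    def select(self):
        n = len(self.precision)
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(n)  # epsilon-exploration: uniform random arm
        # Thompson step: sample a plausible mean reward for every arm, play the best one
        samples = [self.rng.gauss(s / p, (1.0 / p) ** 0.5)
                   for p, s in zip(self.precision, self.weighted_sum)]
        return max(range(n), key=samples.__getitem__)

    def update(self, arm, reward):
        # Conjugate Gaussian update with known observation noise variance
        self.precision[arm] += 1.0 / self.noise_var
        self.weighted_sum[arm] += reward / self.noise_var
```

Once an arm's posterior is concentrated, the Thompson step almost always picks it, so the ε branch is what keeps a fixed fraction of pulses exploring other frequencies.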
Submitted 11 March, 2024;
originally announced March 2024.
-
Scrambling Transition in Free Fermion Systems Induced by a Single Impurity
Authors:
Qucheng Gao,
Tianci Zhou,
Pengfei Zhang,
Xiao Chen
Abstract:
In quantum many-body systems, interactions play a crucial role in the emergence of information scrambling. When particles interact throughout the system, the entanglement between them can lead to a rapid and chaotic spreading of quantum information, typically probed by the growth in operator size in the Heisenberg picture. In this study, we explore whether the operator undergoes scrambling when particles interact solely through a single impurity in generic spatial dimensions, focusing on fermion systems with spatial and temporal random hoppings. By connecting the dynamics of the operator to the symmetric exclusion process with a source term, we demonstrate the presence of an escape-to-scrambling transition when tuning the interaction strength for fermions in three dimensions. By comparison, systems in lower dimensions are proven to scramble at arbitrarily weak interactions unless the hopping becomes sufficiently long-ranged. Our predictions are validated using both a Brownian circuit with a single Majorana fermion per site and a solvable Brownian SYK model with a large local Hilbert space dimension. This suggests that the theoretical picture is universal for free fermion systems with spatial and temporal randomness.
Submitted 5 March, 2024;
originally announced March 2024.
-
Mars 2.0: A Toolchain for Modeling, Analysis, Verification and Code Generation of Cyber-Physical Systems
Authors:
Bohua Zhan,
Xiong Xu,
Qiang Gao,
Zekun Ji,
Xiangyu Jin,
Shuling Wang,
Naijun Zhan
Abstract:
We introduce Mars 2.0 for modeling, analysis, verification and code generation of Cyber-Physical Systems. Mars 2.0 builds on Mars 1.0 with several important extensions and improvements, allowing cyber-physical systems to be designed using a combination of AADL and Simulink/Stateflow, which together provide a unified graphical framework for modeling the functionality, physicality and architecture of the system to be developed. For a safety-critical system, formal analysis and verification of its combined AADL and Simulink/Stateflow model can be conducted via the following steps. First, the toolchain automatically translates AADL and Simulink/Stateflow models into Hybrid CSP (HCSP), an extension of CSP for formally modeling hybrid systems. Second, the HCSP processes can be simulated using the HCSP simulator and, to complement the necessarily incomplete coverage of simulation, verified using the Hybrid Hoare Logic prover in Isabelle/HOL as well as the more automated HHLPy prover. Finally, implementations in SystemC or C can be automatically generated from the verified HCSP processes. The translation from AADL and Simulink/Stateflow to HCSP, and the translation from HCSP to SystemC or C, are both formally proven correct. This approach allows model-driven design of safety-critical cyber-physical systems based on graphical and formal models and proven-correct translation procedures. We demonstrate the use of the toolchain on several benchmarks of varying complexity, including several industrial-sized examples.
Submitted 5 March, 2024;
originally announced March 2024.
-
Lithium Abundances from the LAMOST Med-Resolution Survey Data Release 9
Authors:
Ming-Yi Ding,
Jian-Rong Shi,
Hong-liang Yan,
Chun-Qian Li,
Qi Gao,
Tian-Yi Chen,
Jing-Hua Zhang,
Shuai Liu,
Xiao-Jin Xie,
Yao-Jia Tang,
Ze-Ming Zhou,
Jiang-Tao Wang
Abstract:
Lithium is a fragile but crucial chemical element in the universe that exhibits interesting and complex behaviors. Thanks to the massive spectroscopic data from the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) medium-resolution survey (MRS), we can investigate lithium abundances in a large and diverse sample of stars, which can provide vital help in studying the origin and evolution of lithium. In this work, we use the Li 6,707.8 Å line to derive the lithium abundance through a template-matching method. A catalog of precise lithium abundances is presented for 795,384 spectra corresponding to 455,752 stars from LAMOST MRS Data Release (DR) 9. Comparing our results with external high-resolution references, we find good consistency, with a typical deviation of σA(Li) ~ 0.2 dex. We also analyze the internal errors using stars that have multiple LAMOST MRS observations; these errors can be as low as 0.1 dex when the signal-to-noise ratio (S/N) of the spectra exceeds 20. In addition, our results indicate that a small fraction of giant stars still exhibit surprisingly high lithium content: 967 stars are identified as Li-rich giants with A(Li) > 1.5 dex, accounting for ~2.6% of our samples. If one takes into account the fact that nearly all stars deplete lithium during the main sequence, the fraction of Li-rich stars may considerably exceed 2.6%. This new catalog covers a wide range of stellar evolutionary stages, from pre-main sequence to giants, and will aid further study of the chemical evolution of lithium.
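Template matching of this kind amounts to choosing, from a grid of model spectra computed at different A(Li), the one that minimizes χ² against the observed normalized flux around the Li line. The sketch below assumes a toy template generator (`toy_li_profile`, a single Gaussian absorption line whose depth saturates with A(Li)) in place of real synthetic spectra, and then applies the A(Li) > 1.5 dex Li-rich giant cut quoted in the abstract. All function names and the line-profile model are hypothetical.

```python
import math

LI_RICH_THRESHOLD = 1.5  # dex; the A(Li) cut for Li-rich giants quoted in the abstract


def toy_li_profile(wavelengths, a_li, center=6707.8, width=0.3):
    """Hypothetical normalized spectrum with one Gaussian Li I absorption line.

    The depth saturates as A(Li) grows; a toy stand-in for real synthetic templates."""
    depth = 0.8 * (1.0 - math.exp(-10 ** (a_li - 2.0)))
    return [1.0 - depth * math.exp(-0.5 * ((w - center) / width) ** 2)
            for w in wavelengths]


def match_template(flux, sigma, templates):
    """Return the A(Li) key whose template minimizes chi^2 against the observed flux."""
    def chi2(model):
        return sum(((f - m) / sigma) ** 2 for f, m in zip(flux, model))
    return min(templates, key=lambda a_li: chi2(templates[a_li]))


def is_li_rich_giant(a_li, is_giant):
    # Only giants count; main-sequence stars with high A(Li) are not flagged here
    return is_giant and a_li > LI_RICH_THRESHOLD
```

A real pipeline would also interpolate between grid points and fold in stellar parameters (Teff, log g, [Fe/H]) when selecting templates; the grid search above only shows the χ² selection step.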
Submitted 4 March, 2024;
originally announced March 2024.
-
One-Step Multi-View Clustering Based on Transition Probability
Authors:
Wenhui Zhao,
Quanxue Gao,
Guangfei Li,
Cheng Deng,
Ming Yang
Abstract:
Large-scale multi-view clustering algorithms based on the anchor graph have shown promising performance and efficiency and have been extensively explored in recent years. Despite their success, current methods lack interpretability in the clustering process and do not sufficiently consider the complementary information across different views. To address these shortcomings, we introduce One-Step Multi-View Clustering Based on Transition Probability (OSMVC-TP). This method adopts a probabilistic approach that treats the anchor graph as the transition probabilities from samples to anchor points. Our method directly learns the transition probabilities from anchor points to categories and then calculates the transition probabilities from samples to categories, thus obtaining soft label matrices for both samples and anchor points and enhancing the interpretability of clustering. Furthermore, to maintain label consistency across different views, we apply a Schatten p-norm constraint on the tensor composed of the soft labels. This approach effectively harnesses the complementary information among the views. Extensive experiments have confirmed the effectiveness and robustness of OSMVC-TP.
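The chaining of transition probabilities described above is a product of row-stochastic matrices: $P(\text{category} \mid \text{sample}) = \sum_k P(\text{anchor}_k \mid \text{sample})\, P(\text{category} \mid \text{anchor}_k)$. The sketch below shows only this single-view composition step, with a hand-written anchor-to-category matrix standing in for the one OSMVC-TP learns; the multi-view Schatten p-norm constraint is not modeled, and the function names are hypothetical.

```python
def matmul(a, b):
    """Dense matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def sample_to_category(p_sample_anchor, p_anchor_category):
    """Compose sample->anchor and anchor->category transition matrices.

    Both inputs are row-stochastic, so each row of the product is a soft label
    (a distribution over categories) for one sample; argmax gives a hard label."""
    p = matmul(p_sample_anchor, p_anchor_category)
    labels = [max(range(len(row)), key=row.__getitem__) for row in p]
    return p, labels
```

For example, with three samples, three anchors and two categories, a sample that moves mostly to anchors strongly associated with category 0 receives a soft label concentrated on category 0, and its rows still sum to 1.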
Submitted 3 March, 2024;
originally announced March 2024.
-
GROUNDHOG: Grounding Large Language Models to Holistic Segmentation
Authors:
Yichi Zhang,
Ziqiao Ma,
Xiaofeng Gao,
Suhaila Shakiah,
Qiaozi Gao,
Joyce Chai
Abstract:
Most multimodal large language models (MLLMs) learn language-to-object grounding through causal language modeling where grounded objects are captured by bounding boxes as sequences of location tokens. This paradigm lacks pixel-level representations that are important for fine-grained visual understanding and diagnosis. In this work, we introduce GROUNDHOG, an MLLM developed by grounding Large Language Models to holistic segmentation. GROUNDHOG incorporates a masked feature extractor and converts extracted features into visual entity tokens for the MLLM backbone, which then connects groundable phrases to unified grounding masks by retrieving and merging the entity masks. To train GROUNDHOG, we carefully curated M3G2, a grounded visual instruction tuning dataset with Multi-Modal Multi-Grained Grounding, by harvesting a collection of segmentation-grounded datasets with rich annotations. Our experimental results show that GROUNDHOG achieves superior performance on various language grounding tasks without task-specific fine-tuning, and significantly reduces object hallucination. GROUNDHOG also demonstrates better grounding towards complex forms of visual input and provides easy-to-understand diagnosis in failure cases.
Submitted 16 April, 2024; v1 submitted 26 February, 2024;
originally announced February 2024.
-
Label Learning Method Based on Tensor Projection
Authors:
Jing Li,
Quanxue Gao,
Qianqian Wang,
Cheng Deng,
Deyan Xie
Abstract:
Multi-view clustering methods based on anchor graphs have attracted wide attention due to their high efficiency and effectiveness. In order to avoid post-processing, most existing anchor graph-based methods learn bipartite graphs with connected components. However, such methods are highly sensitive to parameter settings, and in some cases it may not be possible to obtain bipartite graphs with clear connected components. To this end, we propose a label learning method based on tensor projection (LLMTP). Specifically, we project the anchor graph into the label space through an orthogonal projection matrix to obtain cluster labels directly. Since the spatial structure information of multi-view data may be partly ignored when each view is projected separately, we extend the matrix projection to a tensor projection, so that the spatial structure information between views can be fully utilized. In addition, we introduce tensor Schatten $p$-norm regularization to make the clustering label matrices of different views as consistent as possible. Extensive experiments demonstrate the effectiveness of the proposed method.
Submitted 26 February, 2024;
originally announced February 2024.
-
Anchor-free Clustering based on Anchor Graph Factorization
Authors:
Shikun Mei,
Fangfang Li,
Quanxue Gao,
Ming Yang
Abstract:
Anchor-based methods are a pivotal approach to clustering large-scale data. However, these methods typically entail two distinct stages: selecting anchor points and constructing an anchor graph. This separation, along with the initialization of anchor points, significantly influences the overall performance of the algorithm. To mitigate these issues, we introduce a novel method termed Anchor-free Clustering based on Anchor Graph Factorization (AFCAGF). AFCAGF learns the anchor graph directly, requiring only the computation of pairwise distances between samples. This process, achievable through straightforward optimization, circumvents the need to select anchor points explicitly. More concretely, our approach enhances the Fuzzy k-means clustering algorithm (FKM) with a new manifold learning technique that obviates the need to initialize cluster centers. Additionally, we extend the membership matrix between cluster centers and samples in FKM into an anchor graph encompassing multiple anchor points and samples. Applying Non-negative Matrix Factorization (NMF) to this anchor graph allows cluster labels to be derived directly, eliminating the need for further post-processing. To solve the proposed model, we implement an alternating optimization algorithm that ensures convergence. Empirical evaluations on various real-world datasets demonstrate the superior efficacy of our algorithm compared to traditional approaches.
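To illustrate how NMF on a nonnegative sample-by-anchor graph can yield labels without a separate post-processing step, the sketch below factorizes $V \approx WH$ with the standard Lee-Seung multiplicative updates for the Frobenius loss and assigns each sample the argmax of its row of $W$. This is plain NMF on a given matrix, not AFCAGF's joint model (which learns the anchor graph through its FKM-based objective); the function names are hypothetical.

```python
import random


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm ||V - W H||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


def nmf_labels(v, rank, iters=300, seed=0, eps=1e-12):
    """Hypothetical sketch: NMF by multiplicative updates, then argmax labels from W.

    v is an n x m nonnegative matrix (rows = samples, columns = anchors)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random initialization keeps the multiplicative updates nonnegative
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(n)]
    labels = [max(range(rank), key=row.__getitem__) for row in w]
    return w, h, labels
```

The multiplicative form never produces negative entries, and the reconstruction error is non-increasing across iterations, which is the usual reason this update rule is chosen for a quick NMF sketch.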
Submitted 23 February, 2024;
originally announced February 2024.
-
Formally Verified C Code Generation from Hybrid Communicating Sequential Processes
Authors:
Shuling Wang,
Zekun Ji,
Bohua Zhan,
Xiong Xu,
Qiang Gao,
Naijun Zhan
Abstract:
Hybrid Communicating Sequential Processes (HCSP) is a formal model for hybrid systems, including primitives for evolution along an ordinary differential equation (ODE), communication, and parallel composition. Code generation is needed to convert HCSP models into code that can be executed in practice, and the correctness of this conversion is essential to ensure that the generated code accurately reflects the formal model. In this paper, we propose a code generation algorithm from HCSP to C, using the POSIX library for concurrency. The main difficulties are how to bridge the gap between the synchronized communication model of HCSP and the use of mutexes for synchronization in C, and how to discretize evolution along ODEs while supporting interruption of ODE evolution by communication. To prove the correctness of code generation, we define a formal semantics for POSIX C and build transition system models for both HCSP and C programs. We then define an approximate bisimulation relation between traces of transition systems, and show that under certain robustness conditions for HCSP, the generated C program is approximately bisimilar to the original model. Finally, we evaluate the code generation algorithm on a detailed model for automatic cruise control, showing its utility on real-world examples.
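One of the difficulties named above, discretizing ODE evolution that a communication may interrupt, can be illustrated in a few lines. The sketch below advances $\dot{x} = f(x)$ with forward Euler while a domain constraint holds, clipping the final step so it lands exactly on the (assumed known) time at which a communication partner becomes ready. It is a hypothetical Python illustration of the idea, not the paper's generated C code or its discretization scheme; `evolve_until` and its parameters are invented for illustration.

```python
def evolve_until(f, x0, h, t_comm, domain):
    """Hypothetical forward-Euler discretization of ODE evolution <x' = f(x) & domain(x)>
    that is interrupted when a communication becomes ready at time t_comm.

    Returns (t, x) at the moment evolution stops, either because the domain
    constraint is violated or because the communication interrupts it."""
    t, x = 0.0, x0
    while t < t_comm and domain(x):
        step = min(h, t_comm - t)  # never step past the communication time
        x = x + step * f(x)
        t += step
    return t, x
```

Because Euler steps can overshoot a domain boundary by up to one step, the discrete trace only approximates the continuous one; this kind of bounded deviation is what an approximate bisimulation relation is meant to capture.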
Submitted 26 February, 2024; v1 submitted 23 February, 2024;
originally announced February 2024.
-
MachineLearnAthon: An Action-Oriented Machine Learning Didactic Concept
Authors:
Michal Tkáč,
Jakub Sieber,
Lara Kuhlmann,
Matthias Brueggenolte,
Alexandru Rinciog,
Michael Henke,
Artur M. Schweidtmann,
Qinghe Gao,
Maximilian F. Theisen,
Radwa El Shawi
Abstract:
Machine Learning (ML) techniques are now encountered across disciplines, from the social sciences through the natural sciences to engineering. The broad application of ML and the accelerating pace of its evolution lead to an increasing need for dedicated teaching concepts aimed at making the application of this technology more reliable and responsible. However, teaching ML is a daunting task. Aside from the methodological complexity of ML algorithms, with respect to both theory and implementation, the interdisciplinary and empirical nature of the field needs to be taken into consideration. This paper introduces the MachineLearnAthon format, an innovative didactic concept designed to be inclusive for students of different disciplines with heterogeneous levels of mathematics, programming and domain expertise. At the heart of the concept lie ML challenges, which use industrial data sets to solve real-world problems. These cover the entire ML pipeline, from data preparation through deployment to evaluation, promoting data literacy and practical skills.
Submitted 29 January, 2024;
originally announced January 2024.
-
Low-resolution Prior Equilibrium Network for CT Reconstruction
Authors:
Yijie Yang,
Qifeng Gao,
Yuping Duan
Abstract:
The unrolling method has been investigated for learning variational models in X-ray computed tomography. However, it has been observed that directly unrolling the regularization model through gradient descent does not produce satisfactory results. In this paper, we present a novel deep learning-based CT reconstruction model in which a low-resolution image is introduced to obtain an effective regularization term that improves the network's robustness. Our approach constructs the backbone network by algorithm unrolling, realized using a deep equilibrium architecture. We theoretically discuss the convergence of the proposed low-resolution prior equilibrium model and provide conditions that guarantee convergence. Experimental results on both sparse-view and limited-angle reconstruction problems demonstrate that our end-to-end low-resolution prior equilibrium model outperforms other state-of-the-art methods in terms of noise reduction, contrast-to-noise ratio, and preservation of edge details.
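A deep equilibrium model replaces a fixed number of unrolled iterations with the fixed point $z^* = f(z^*, x)$ of a single update map, and a standard sufficient condition for convergence is that $f$ is a contraction in $z$ (Lipschitz constant below 1, by the Banach fixed-point theorem). The sketch below finds that fixed point by plain iteration, using as a stand-in update one gradient step on the toy objective $\tfrac12\|z - x\|^2 + \tfrac{\lambda}{2}\|z\|^2$; the learned network, the low-resolution prior, and the CT forward operator of the paper are not modeled, and all names are hypothetical.

```python
def gradient_step(z, x, lam=1.0, step=0.25):
    """One gradient step on 0.5*||z - x||^2 + 0.5*lam*||z||^2 (toy regularizer).

    As a map in z it is a contraction with factor |1 - step*(1 + lam)| = 0.5 here."""
    return [zi - step * ((zi - xi) + lam * zi) for zi, xi in zip(z, x)]


def fixed_point(f, x, z0, tol=1e-10, max_iter=1000):
    """Forward pass of a deep equilibrium layer: iterate z <- f(z, x) until it stops moving."""
    z = z0
    for k in range(max_iter):
        z_next = f(z, x)
        if max(abs(a - b) for a, b in zip(z_next, z)) < tol:
            return z_next, k + 1
        z = z_next
    raise RuntimeError("fixed-point iteration did not converge")
```

For this toy objective the equilibrium is $z^* = x / (1 + \lambda)$, so with $\lambda = 1$ the iteration converges to $x/2$ at a geometric rate of 0.5 per step.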
Submitted 18 April, 2024; v1 submitted 28 January, 2024;
originally announced January 2024.
-
Prescribed mean curvature hypersurfaces in conformal product manifolds
Authors:
Qiang Gao,
Hengyu Zhou
Abstract:
In this paper we establish the existence of prescribed mean curvature (PMC) hypersurfaces in conformal product manifolds with (possibly empty) $C^{1,α}$ fixed graphical boundaries under a barrier condition. This generalizes Gerhardt's result in conformally flat spaces and provides new examples of the Plateau problem for PMC hypersurfaces with clear topology in high dimensions. In addition, if the PMC function satisfies a quasi-decreasing condition, such PMC hypersurfaces are $C^1$ graphs.
Submitted 25 January, 2024;
originally announced January 2024.
-
Observation of possible excitonic charge density waves and metal-insulator transitions in atomically thin semimetals
Authors:
Qiang Gao,
Yang-hao Chan,
Pengfei Jiao,
Haiyang Chen,
Shuaishuai Yin,
Kanjanaporn Tangprapha,
Yichen Yang,
Xiaolong Li,
Zhengtai Liu,
Dawei Shen,
Shengwei Jiang,
Peng Chen
Abstract:
Charge density wave (CDW) is a collective quantum phenomenon with a charge modulation in solids [1-2]. Condensation of electron and hole pairs with finite momentum can lead to such an ordered state [3-7]. However, lattice symmetry breaking, manifested as the softening of phonon modes, can occur simultaneously, which makes it difficult to disentangle the origin of the transition [8-14]. Here, we report a condensed phase in low-dimensional HfTe2, where angle-resolved photoemission spectroscopy (ARPES) measurements show a metal-insulator transition upon lowering the temperature in single triatomic layer (TL) HfTe2. A full gap opening, renormalization of the bands, and emergence of replica bands at the M point are observed at low temperatures, indicating formation of a CDW in the ground state. Raman spectroscopy shows no sign of lattice distortion within the detection limit. The results are corroborated by first-principles calculations, demonstrating the electronic origin of the CDW. With additional layers, the phase transition is suppressed and completely destroyed at 3 TL because of the increased screening around the Fermi surface. Interestingly, a small amount of electron doping in the 1 TL film during growth significantly raises the transition temperature (TC), which is attributed to a reduced screening effect and a more balanced electron and hole carrier density. Our results indicate a CDW formation mechanism consistent with the excitonic insulator phase in low-dimensional HfTe2 and open up opportunities for realizing novel quantum states based on exciton condensation.
Submitted 24 January, 2024;
originally announced January 2024.
-
Prospects for Joint Detection of Gravitational Waves with Counterpart Gamma-Ray Bursts Detected by the HADAR Experiment
Authors:
Pei-Jin Hu,
Qi-Ling Chen,
Tian-Lu Chen,
Ming-Ming Kang,
Yi-Qing Guo,
Dan-Zeng Luo-Bu,
You-Liang Feng,
Qi Gao,
Quan-Bu Gou,
Hong-Bo Hu,
Hai-Jin Li,
Cheng Liu,
Mao-Yuan Liu,
Wei Liu,
Xiang-Li Qian,
Bing-Qiang Qiao,
Jing-Jing Su,
Hui-Ying Sun,
Xu Wang,
Zhen Wang,
Guang-Guang Xin,
Chao-Wen Yang,
Yu-Hua Yao,
Qiang Yuan,
Yi Zhang
Abstract:
The detection of GW170817/GRB170817A implied a strong association between short gamma-ray bursts (SGRBs) and binary neutron star (BNS) mergers, which produce gravitational waves (GWs). More evidence is needed to confirm the association and reveal the physical processes of BNS mergers. The upcoming High Altitude Detection of Astronomical Radiation (HADAR) experiment, with its wide field of view (FOV) and large effective area above tens of GeV, is promising for the prompt detection of very-high-energy (VHE; > 10 GeV) SGRBs. The aim of this paper is to simulate and analyse GW/SGRB joint detections by future GW detector networks in synergy with HADAR, including the second-generation LIGO, Virgo and KAGRA and the third-generation ET and CE. We provide a brief introduction to the HADAR experiment for SGRB simulations and its expected SGRB detections. For GW simulations, we adopt a phenomenological model to describe GWs produced by BNS mergers and introduce signal-to-noise ratios (SNRs) as detector responses. Following a theoretical analysis, we compute the redshift-dependent efficiency functions of GW detector networks. We then construct the simulation of GW detection by Monte Carlo sampling. As a check, we compare the simulated results for the LIGO-Virgo O2 and O3 runs with their actual detections. The combination of GW and SGRB models is then discussed for joint detection, including parameter correlations, triggered SNRs and efficiency skymaps. The estimated joint detection rates are 0.09-2.52 per year for the LHVK network with HADAR under different possible configurations, and approximately 0.27-7.89 per year for the ET+CE network with HADAR.
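The Monte Carlo construction of a detection efficiency can be illustrated with a deliberately simple toy model: place sources uniformly in volume inside a sphere of radius $d_{max}$, assume the SNR scales as $1/d$ from a reference value, and count the fraction above a detection threshold. The $1/d$ scaling, the threshold of 8, and all parameter names are assumptions of this sketch; it ignores redshift, source orientation, antenna patterns, and network combination, all of which a real simulation like the one described above must include.

```python
import random


def detection_efficiency(snr_ref, d_ref, d_max, threshold=8.0, n=200_000, seed=0):
    """Toy Monte Carlo estimate of the fraction of sources detected.

    Assumptions: Euclidean space, sources uniform in volume within d_max,
    SNR = snr_ref * d_ref / d, detection when SNR >= threshold."""
    rng = random.Random(seed)
    detected = 0
    for _ in range(n):
        # Inverse-CDF sampling for a uniform-in-volume radius; 1 - random() lies in (0, 1],
        # so d is never zero.
        d = d_max * (1.0 - rng.random()) ** (1.0 / 3.0)
        if snr_ref * d_ref / d >= threshold:
            detected += 1
    return detected / n
```

In this toy model the answer is known analytically, $\left(\min(d_{thr}, d_{max})/d_{max}\right)^3$ with $d_{thr} = d_{ref}\,\mathrm{SNR}_{ref}/\mathrm{SNR}_{thr}$, which makes it a convenient sanity check of the sampling, in the same spirit as the paper's comparison of simulated and actual O2/O3 detections.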
Submitted 20 January, 2024;
originally announced January 2024.
-
Autosen: improving automatic wifi human sensing through cross-modal autoencoder
Authors:
Qian Gao,
Yanling Hao,
Yuanwei Liu
Abstract:
WiFi human sensing is highly regarded for its low-cost and privacy advantages in recognizing human activities. However, its effectiveness is largely confined to controlled, single-user, line-of-sight settings, limited by data collection complexities and the scarcity of labeled datasets. Traditional cross-modal methods, aimed at mitigating these limitations by enabling self-supervised learning without labeled data, struggle to extract meaningful features from amplitude-phase combinations. In response, we introduce AutoSen, an innovative automatic WiFi sensing solution that departs from conventional approaches. AutoSen establishes a direct link between amplitude and phase through automated cross-modal autoencoder learning. This autoencoder efficiently extracts valuable features from unlabeled CSI data, encompassing amplitude and phase information while eliminating their respective unique noises. These features are then leveraged for specific tasks using few-shot learning techniques. AutoSen's performance is rigorously evaluated on a publicly accessible benchmark dataset, demonstrating its exceptional capabilities in automatic WiFi sensing through the extraction of comprehensive cross-modal features.
Submitted 8 January, 2024;
originally announced January 2024.
-
D3PRefiner: A Diffusion-based Denoise Method for 3D Human Pose Refinement
Authors:
Danqi Yan,
Qing Gao,
Yuepeng Qian,
Xinxing Chen,
Chenglong Fu,
Yuquan Leng
Abstract:
Three-dimensional (3D) human pose estimation using a monocular camera has gained increasing attention due to its ease of implementation and the abundance of data available from daily life. However, owing to the inherent depth ambiguity in images, the accuracy of existing monocular camera-based 3D pose estimation methods remains unsatisfactory, and the estimated 3D poses usually contain substantial noise. By examining the histogram of this noise, we find that each dimension of the noise follows a certain distribution, which suggests that a neural network could learn the mapping between noisy poses and ground truth poses. In this work, in order to obtain more accurate 3D poses, a Diffusion-based 3D Pose Refiner (D3PRefiner) is proposed to refine the output of any existing 3D pose estimator. We first introduce a conditional multivariate Gaussian distribution to model the distribution of noisy 3D poses, using paired 2D poses and noisy 3D poses as conditions to achieve greater accuracy. Additionally, we leverage the architecture of current diffusion models to convert the distribution of noisy 3D poses into ground truth 3D poses. To evaluate the effectiveness of the proposed method, two state-of-the-art sequence-to-sequence 3D pose estimators are used as base 3D pose estimation models, and the proposed method is evaluated on different types of 2D poses and different lengths of the input sequence. Experimental results demonstrate that the proposed architecture can significantly improve the performance of current sequence-to-sequence 3D pose estimators, with a reduction of at least 10.3% in the mean per joint position error (MPJPE) and at least 11.0% in the Procrustes MPJPE (P-MPJPE).
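MPJPE, the metric behind the reported improvement, is the Euclidean distance between predicted and ground-truth 3D joint positions, averaged over all joints and frames; the "reduction" is the relative drop in that value. The sketch below computes both under the assumption that poses are given as nested lists of `(x, y, z)` tuples in consistent units; the function names and the example numbers are hypothetical. P-MPJPE additionally applies a rigid Procrustes alignment (rotation, translation, and scale) of each prediction to the ground truth before the same averaging, which this sketch omits.

```python
import math


def mpjpe(pred, gt):
    """Mean per joint position error.

    pred, gt: lists of frames, each frame a list of (x, y, z) joint positions."""
    dists = [math.dist(p, g) for frame_p, frame_g in zip(pred, gt)
             for p, g in zip(frame_p, frame_g)]
    return sum(dists) / len(dists)


def relative_reduction(before, after):
    """Fractional error reduction, e.g. 0.103 for a 10.3% improvement."""
    return (before - after) / before
```

For instance, a hypothetical baseline MPJPE of 50.0 mm refined to 44.85 mm corresponds to a relative reduction of 0.103, i.e. the 10.3% figure quoted above.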
Submitted 8 January, 2024;
originally announced January 2024.
-
Nearest-Neighboring Pairing of Monolayer NbSe2 Facilitates the Emergence of Topological Superconducting States
Authors:
Yizhi Li,
Quan Gao,
Yanru Li,
Jianxin Zhong,
Lijun Meng
Abstract:
NbSe2, which simultaneously exhibits superconductivity and spin-orbit coupling, is anticipated to pave the way for topological superconductivity and unconventional electron pairing. In this paper, we systematically study topological superconducting (TSC) phases in monolayer NbSe2 by mixing on-site s-wave pairing (ps) with nearest-neighbor pairing (psA1) within a tight-binding model. We observe rich phases with both fixed and sensitive Chern numbers (CNs) depending on the chemical potential (μ) and the out-of-plane magnetic field (Vz). As psA1 increases, the TSC phase exhibits matching and mismatching features according to whether a bulk-boundary correspondence (BBC) holds. Strikingly, the introduction of mixed-wave pairing significantly reduces the critical Vz needed to form TSC phases compared with pure s-wave pairing. Moreover, the TSC phase can be realized even at Vz = 0 under appropriate μ and psA1, as identified by the robust topological edge states (TESs) of ribbons. Additionally, the mixed pairing influences the hybridization of bulk and edge states, resulting in a matching/mismatching BBC with localized/oscillating TESs on the ribbon. Our findings are helpful for realizing TSC states experimentally, as well as for designing and regulating TSC materials.
Submitted 4 January, 2024;
originally announced January 2024.