An integrated electro-optically tunable multi-channel interference cavity laser

Authors: Junxia Zhou, Yiran Zhu, Botao Fu, Jinming Chen, Huiting Song, Zhihao Zhang, Jianping Yu, Jian Liu, Min Wang, Jia Qi, Ya Cheng

Abstract: We demonstrated a continuously tunable laser system by butt coupling a reflective semiconductor optical amplifier (RSOA) chip with a thin-film lithium niobate (TFLN) based multi-channel interference (MCI) cavity chip. This hybrid integrated laser allows for fine-tuning of the laser wavelength from 1538 nm to 1560 nm with a resolution of 0.014 nm and a side-mode suppression ratio (SMSR) exceeding 30 dB. The MCI cavity chip is fabricated using the photolithography assisted chemo-mechanical etching (PLACE) technique. The developed laser has an output power of approximately 10 μW, which can be further amplified to 70 mW using a commercial erbium-doped fiber amplifier (EDFA) without significant broadening of the laser linewidth.

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.09162 [pdf, other]

EMMA: Your Text-to-Image Diffusion Model Can Secretly Accept Multi-Modal Prompts

Authors: Yucheng Han, Rui Wang, Chi Zhang, Juntao Hu, Pei Cheng, Bin Fu, Hanwang Zhang

Abstract: Recent advancements in image generation have enabled the creation of high-quality images from text conditions. However, when facing multi-modal conditions, such as text combined with reference appearances, existing methods struggle to balance multiple conditions effectively, typically showing a preference for one modality over others. To address this challenge, we introduce EMMA, a novel image generation model accepting multi-modal prompts built upon the state-of-the-art text-to-image (T2I) diffusion model, ELLA. EMMA seamlessly incorporates additional modalities alongside text to guide image generation through an innovative Multi-modal Feature Connector design, which effectively integrates textual and supplementary modal information using a special attention mechanism. By freezing all parameters in the original T2I diffusion model and only adjusting some additional layers, we reveal an interesting finding that the pre-trained T2I diffusion model can secretly accept multi-modal prompts. This interesting property facilitates easy adaptation to different existing frameworks, making EMMA a flexible and effective tool for producing personalized and context-aware images and even videos. Additionally, we introduce a strategy to assemble learned EMMA modules to produce images conditioned on multiple modalities simultaneously, eliminating the need for additional training with mixed multi-modal prompts.
Extensive experiments demonstrate the effectiveness of EMMA in maintaining high fidelity and detail in generated images, showcasing its potential as a robust solution for advanced multi-modal conditional image generation tasks.

Submitted 13 June, 2024; originally announced June 2024.

Comments: https://tencentqqgylab.github.io/EMMA

arXiv:2406.04594 [pdf, other]

Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach

Authors: Jianbo Dong, Bin Luo, Jun Zhang, Pengcheng Zhang, Fei Feng, Yikai Zhu, Ang Liu, Zian Chen, Yi Shi, Hairong Jiao, Gang Lu, Yu Guan, Ennan Zhai, Wencong Xiao, Hanyu Zhao, Man Yuan, Siran Yang, Xiang Li, Jiamang Wang, Rui Men, Jianwei Zhang, Huang Zhong, Dennis Cai, Yuan Xie, Binzhang Fu

Abstract: The emergence of Large Language Models (LLMs) has necessitated the adoption of parallel training techniques, involving the deployment of thousands of GPUs to train a single model. Unfortunately, we have found that the efficiency of current parallel training is often suboptimal, largely due to the following two main issues. Firstly, hardware failures are inevitable, leading to interruptions in the training tasks. The inability to quickly identify the faulty components results in a substantial waste of GPU resources. Secondly, since GPUs must wait for parameter synchronization to complete before proceeding to the next round of computation, network congestion can greatly increase the waiting time for GPUs. To address these challenges, this paper introduces a communication-driven solution, namely C4. The key insights of C4 are twofold. First, in parallel training, collective communication exhibits periodic and homogeneous characteristics, so any anomalies are certainly due to some form of hardware malfunction. By leveraging this feature, C4 can rapidly identify the faulty components, swiftly isolate the anomaly, and restart the task, thereby avoiding resource wastage caused by delays in anomaly detection. Second, the predictable communication model of collective communication, involving few large flows, allows C4 to efficiently execute traffic planning, substantially reducing network congestion.
C4 has been extensively implemented across our production systems, cutting error-induced overhead by roughly 30% and enhancing runtime performance by about 15% for certain applications with moderate communication costs.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2405.20853 [pdf, other]

MeshXL: Neural Coordinate Field for Generative 3D Foundation Models

Authors: Sijin Chen, Xin Chen, Anqi Pang, Xianfang Zeng, Wei Cheng, Yijun Fu, Fukun Yin, Yanru Wang, Zhibin Wang, Chi Zhang, Jingyi Yu, Gang Yu, Bin Fu, Tao Chen

Abstract: The polygon mesh representation of 3D data exhibits great flexibility, fast rendering speed, and storage efficiency, and is widely preferred in various applications. However, given its unstructured graph representation, the direct generation of high-fidelity 3D meshes is challenging. Fortunately, with a pre-defined ordering strategy, 3D meshes can be represented as sequences, and the generation process can be seamlessly treated as an auto-regressive problem. In this paper, we validate that the Neural Coordinate Field (NeurCF), an explicit coordinate representation with implicit neural embeddings, is a simple-yet-effective representation for large-scale sequential mesh modeling. After that, we present MeshXL, a family of generative pre-trained auto-regressive models, which addresses the process of 3D mesh generation with modern large language model approaches. Extensive experiments show that MeshXL is able to generate high-quality 3D meshes, and can also serve as foundation models for various down-stream applications.

Submitted 18 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

arXiv:2405.09121 [pdf, other]

Dirac Fermions and Topological Phases in Magnetic Topological Insulator Films

Authors: Kai-Zhi Bai, Bo Fu, Shun-Qing Shen

Abstract: We develop a Dirac fermion theory for topological phases in magnetic topological insulator films. The theory is based on exact solutions of the energies and the wave functions for an effective model of the three-dimensional topological insulator (TI) film. It is found that the TI film consists of a pair of massless or massive Dirac fermions for the surface states, and a series of massive Dirac fermions for the bulk states. The massive Dirac fermion always carries zero or integer quantum Hall conductance when the valence band is fully occupied while the massless Dirac fermion carries a one-half quantum Hall conductance when the chemical potential is located around the Dirac point for a finite range. The magnetic exchange interaction in the magnetic layers in the film can be used to manipulate either the masses or chirality of the Dirac fermions and gives rise to distinct topological phases, which cover the known topological insulating phases, such as quantum anomalous Hall effect, quantum spin Hall effect and axion effect, and also the novel topological metallic phases, such as half quantized Hall effect, half quantum mirror Hall effect, and metallic quantum anomalous Hall effect.

Submitted 15 May, 2024; originally announced May 2024.

arXiv:2405.08686 [pdf]

Antiferromagnetic Quantum Anomalous Hall Effect Modulated by Spin Flips and Flops

Authors: Zichen Lian, Yongchao Wang, Yongqian Wang, Yang Feng, Zehao Dong, Shuai Yang, Liangcai Xu, Yaoxin Li, Bohan Fu, Yuetan Li, Wanjun Jiang, Chang Liu, Jinsong Zhang, Yayu Wang

Abstract: The interplay between nontrivial band topology and layered antiferromagnetism in MnBi2Te4 has opened up a new avenue for exploring topological phases of matter. Representative examples include the quantum anomalous Hall effect and axion insulator state observed in odd and even number layers of MnBi2Te4, when the top and bottom surfaces have parallel and antiparallel spin alignments respectively. The rich and complex spin dynamics associated with the van der Waals antiferromagnetic order is expected to generate novel topological phases and phase transitions that are unique to MnBi2Te4. Here we fabricate a device of 7-septuple-layer MnBi2Te4 covered with an AlOx capping layer, which enables the investigation of the antiferromagnetic quantum anomalous Hall effect over wide parameter spaces. By tuning the gate voltage and perpendicular magnetic field, we uncover a cascade of quantum phase transitions that can be attributed to the influence of spin configurations on charge transport. Furthermore, we find that an in-plane magnetic field enhances both the coercive field and exchange gap of the surface state, in sharp contrast to that in the ferromagnetic quantum anomalous Hall state. We propose that these peculiar features arise from the spin flip and flop transitions inherent to van der Waals antiferromagnets. The versatile tunability of the quantum anomalous Hall effect in MnBi2Te4 paves the way for potential applications in topological antiferromagnetic spintronics.

Submitted 14 May, 2024; originally announced May 2024.

Comments: 16 pages, 4 figures

arXiv:2405.08677 [pdf]

Towards the Quantized Anomalous Hall effect in AlO$_x$-capped MnBi$_2$Te$_4$

Authors: Yongqian Wang, Bohan Fu, Yongchao Wang, Zicheng Lian, Shuai Yang, Yaoxin Li, Liangcai Xu, Zhiting Gao, Wanjun Jiang, Jinsong Zhang, Yayu Wang, Chang Liu

Abstract: The quantum anomalous Hall effect in layered antiferromagnet MnBi$_2$Te$_4$ harbors a rich interplay between magnetism and topology, holding significant promise for low-power electronic devices and topological antiferromagnetic spintronics. In recent years, MnBi$_2$Te$_4$ has garnered considerable attention as the only known material to exhibit the antiferromagnetic quantum anomalous Hall effect. However, this field faces significant challenges, as realizing quantized transport at zero magnetic field depends critically on fabricating high-quality devices. In this article, we address the detrimental influences of fabrication on MnBi$_2$Te$_4$ by simply depositing an AlO$_x$ thin layer on the surface prior to fabrication. Optical contrast and magnetotransport measurements on over 50 samples demonstrate that AlO$_x$ can effectively preserve the pristine state of the samples and significantly enhance the anomalous Hall effect towards quantization. Scaling analysis reveals the Berry curvature dominated mechanism of the anomalous Hall effect at various magnetic configurations. By adjusting the gate voltage, we uncover a gate independent antiferromagnetism in MnBi$_2$Te$_4$. Our experiments not only pave the way for fabricating high-quality transport devices but also advance the exploration of exotic quantum physics in 2D materials.

Submitted 19 May, 2024; v1 submitted 14 May, 2024; originally announced May 2024.

Comments: 21 pages, 4 figures

arXiv:2405.05590 [pdf, other]

TroLLoc: Logic Locking and Layout Hardening for IC Security Closure against Hardware Trojans

Authors: Fangzhou Wang, Qijing Wang, Lilas Alrahis, Bangqi Fu, Shui Jiang, Xiaopeng Zhang, Ozgur Sinanoglu, Tsung-Yi Ho, Evangeline F. Y. Young, Johann Knechtel

Abstract: Due to cost benefits, supply chains of integrated circuits (ICs) are largely outsourced nowadays. However, passing ICs through various third-party providers gives rise to many security threats, like piracy of IC intellectual property or insertion of hardware Trojans, i.e., malicious circuit modifications. In this work, we proactively and systematically protect the physical layouts of ICs against post-design insertion of Trojans. Toward that end, we propose TroLLoc, a novel scheme for IC security closure that employs, for the first time, logic locking and layout hardening in unison. TroLLoc is fully integrated into a commercial-grade design flow, and TroLLoc is shown to be effective, efficient, and robust. Our work provides in-depth layout and security analysis considering the challenging benchmarks of the ISPD'22/23 contests for security closure. We show that TroLLoc successfully renders layouts resilient, with reasonable overheads, against (i) general prospects for Trojan insertion as in the ISPD'22 contest, (ii) actual Trojan insertion as in the ISPD'23 contest, and (iii) potential second-order attacks where adversaries would first (i.e., before Trojan insertion) try to bypass the locking defense, e.g., using advanced machine learning attacks. Finally, we release all our artifacts for independent verification [2].

Submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.00940 [pdf, other]

Computing Threshold Circuits with Bimolecular Void Reactions in Step Chemical Reaction Networks

Authors: Rachel Anderson, Bin Fu, Aiden Massie, Gourab Mukhopadhyay, Adrian Salinas, Robert Schweller, Evan Tomai, Tim Wylie

Abstract: Step Chemical Reaction Networks (step CRNs) are an augmentation of the Chemical Reaction Network (CRN) model where additional species may be introduced to the system in a sequence of ``steps.'' We study step CRN systems using a weak subset of reaction rules, \emph{void} rules, in which molecular species can only be deleted. We demonstrate that step CRNs with only void rules of size (2,0) can simulate threshold formulas (TFs) under linear resources. These limited systems can also simulate threshold \emph{circuits} (TCs) by modifying the volume of the system to be exponential. We then prove a matching exponential lower bound on the required volume for simulating threshold circuits in a step CRN with (2,0)-size rules under a restricted \emph{gate-wise} simulation, thus showing our construction is optimal for simulating circuits in this way.

Submitted 1 May, 2024; originally announced May 2024.

Comments: arXiv admin note: text overlap with arXiv:2402.08220

arXiv:2404.16931 [pdf, other]

Type-I two-Higgs-doublet model and gravitational waves from domain walls bounded by strings

Authors: Bowen Fu, Anish Ghoshal, Stephen F. King, Moinul Hossain Rahat

Abstract: The spontaneous breaking of a $U(1)$ symmetry via an intermediate discrete symmetry may yield a hybrid topological defect of "domain walls bounded by cosmic strings". The decay of this defect network leads to a unique gravitational wave signal spanning many orders in observable frequencies, that can be distinguished from signals generated by other sources. We investigate the production of gravitational waves from this mechanism in the context of the type-I two-Higgs-doublet model extended by a $U(1)_R$ symmetry, that simultaneously accommodates the seesaw mechanism, anomaly cancellation, and eliminates flavour-changing neutral currents. The gravitational wave spectrum produced by the string-bounded-wall network can be detected for $U(1)_R$ breaking scale from $10^{12}$ to $10^{15}$ GeV in forthcoming interferometers including LISA and Einstein Telescope, with a distinctive $f^{3}$ slope and inflexion in the frequency range between microhertz and hertz.

Submitted 25 April, 2024; originally announced April 2024.

Comments: 20 pages, 4 figures

arXiv:2404.04926 [pdf, other]

On the cosmological abundance of magnetic monopoles

Authors: Chen Zhang, Shi-Hao Zhang, Bowen Fu, Jing-Fei Zhang, Xin Zhang

Abstract: We demonstrate that Debye shielding cannot be employed to constrain the cosmological abundance of magnetic monopoles, contrary to what is stated in the previous literature. Current model-independent bounds on the monopole abundance are then revisited for unit Dirac magnetic charge. We find that the Andromeda Parker bound can be employed to set an upper limit on the monopole flux at the level of $F_M\lesssim 5.3\times 10^{-19}\,\text{cm}^{-2}\text{s}^{-1}\text{sr}^{-1}$ for a monopole mass $10^{13}\,\text{GeV}/c^2\lesssim m\lesssim 10^{16}\,\text{GeV}/c^2$, which is more stringent than the MACRO direct search limit by two orders of magnitude. This translates into stringent constraints on the monopole density parameter $\Omega_M$ at the level of $10^{-7}-10^{-4}$ depending on the mass. For larger monopole masses the scenarios in which magnetic monopoles account for all or the majority of dark matter are disfavored.

Submitted 15 April, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

Comments: 24 pages, 2 figures. The treatment of the Lorentz boost factor is corrected with main results unaffected. References added

arXiv:2403.11974 [pdf, other]

OUCopula: Bi-Channel Multi-Label Copula-Enhanced Adapter-Based CNN for Myopia Screening Based on OU-UWF Images

Authors: Yang Li, Qiuyi Huang, Chong Zhong, Danjuan Yang, Meiyan Li, A. H. Welsh, Aiyi Liu, Bo Fu, Catherien C. Liu, Xingtao Zhou

Abstract: Myopia screening using cutting-edge ultra-widefield (UWF) fundus imaging is potentially significant for ophthalmic outcomes. Current multidisciplinary research between ophthalmology and deep learning (DL) concentrates primarily on disease classification and diagnosis using single-eye images, largely ignoring joint modeling and prediction for Oculus Uterque (OU, both eyes). Inspired by the complex relationships between OU and the high correlation between the (continuous) outcome labels (Spherical Equivalent and Axial Length), we propose a framework of copula-enhanced adapter convolutional neural network (CNN) learning with OU UWF fundus images (OUCopula) for joint prediction of multiple clinical scores. We design a novel bi-channel multi-label CNN that can (1) take bi-channel image inputs subject to both high correlation and heterogeneity (by sharing the same backbone network and employing adapters to parameterize the channel-wise discrepancy), and (2) incorporate correlation information between continuous output labels (using a copula). Solid experiments show that OUCopula achieves satisfactory performance in myopia score prediction compared to backbone models. Moreover, OUCopula can far exceed the performance of models constructed for single-eye inputs. Importantly, our study also hints at the potential extension of the bi-channel model to a multi-channel paradigm and the generalizability of OUCopula across various backbone CNNs.

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.05851 [pdf, other]

Interest-Aware Joint Caching, Computing, and Communication Optimization for Mobile VR Delivery in MEC Networks

Authors: Baojie Fu, Tong Tang, Dapeng Wu, Ruyan Wang

Abstract: In the upcoming B5G/6G era, virtual reality (VR) over wireless has become a typical application, which is an inevitable trend in the development of video. However, in immersive and interactive VR experiences, VR services typically exhibit high delay, while simultaneously posing challenges for the energy consumption of local devices. To address these issues, this paper aims to improve the performance of the VR service in the edge-terminal cooperative system. Specifically, we formulate a problem of joint caching, computing, and communication VR service policy, by optimizing the weighted sum of overall VR delivery delay and energy consumption of local devices. For the purpose of designing the optimal VR service policy, the optimization problem is decoupled into three independent subproblems to be solved separately. To enhance the caching efficiency within the network, a bidirectional encoder representations from transformers (BERT)-based user interest analysis method is first proposed to characterize the content requesting behavior accurately. On the basis of this, a service cost minimum-maximization problem is formulated with consideration of performance fairness among users. Thereafter, the joint caching and computing scheme is derived for each user with a given allocation of communication resources, while a bisection-based communication scheme is acquired with the given information on the joint caching and computing policy.
With alternating optimization, an optimal policy for joint caching, computing and communication based on user interest can be finally obtained. Simulation results are presented to demonstrate the superiority of the proposed user interest-aware caching scheme and the effectiveness of the joint caching, computing and communication optimization policy with consideration of user fairness.

Submitted 9 March, 2024; originally announced March 2024.

arXiv:2403.05135 [pdf, other]

ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment

Authors: Xiwei Hu, Rui Wang, Yixiao Fang, Bin Fu, Pei Cheng, Gang Yu

Abstract: Diffusion models have demonstrated remarkable performance in the domain of text-to-image generation. However, most widely used models still employ CLIP as their text encoder, which constrains their ability to comprehend dense prompts, encompassing multiple objects, detailed attributes, complex relationships, long-text alignment, etc. In this paper, we introduce an Efficient Large Language Model Adapter, termed ELLA, which equips text-to-image diffusion models with powerful Large Language Models (LLM) to enhance text alignment without training of either U-Net or LLM. To seamlessly bridge two pre-trained models, we investigate a range of semantic alignment connector designs and propose a novel module, the Timestep-Aware Semantic Connector (TSC), which dynamically extracts timestep-dependent conditions from LLM. Our approach adapts semantic features at different stages of the denoising process, assisting diffusion models in interpreting lengthy and intricate prompts over sampling timesteps. Additionally, ELLA can be readily incorporated with community models and tools to improve their prompt-following capabilities. To assess text-to-image models in dense prompt following, we introduce Dense Prompt Graph Benchmark (DPG-Bench), a challenging benchmark consisting of 1K dense prompts. Extensive experiments demonstrate the superiority of ELLA in dense prompt following compared to state-of-the-art methods, particularly in multiple object compositions involving diverse attributes and relationships.

Submitted 8 March, 2024; originally announced March 2024.

Comments: Project Page: https://ella-diffusion.github.io/

arXiv:2402.08220 [pdf, other]

Computing Threshold Circuits with Void Reactions in Step Chemical Reaction Networks

Authors: Rachel Anderson, Alberto Avila, Bin Fu, Timothy Gomez, Elise Grizzell, Aiden Massie, Gourab Mukhopadhyay, Adrian Salinas, Robert Schweller, Evan Tomai, Tim Wylie

Abstract: We introduce a new model of \emph{step} Chemical Reaction Networks (step CRNs), motivated by the step-wise addition of materials in standard lab procedures. Step CRNs have ordered reactants that transform into products via reaction rules over a series of steps. We study an important subset of weak reaction rules, \emph{void} rules, in which chemical species may only be deleted but never changed. We demonstrate the capabilities of these simple limited systems to simulate threshold circuits and compute functions using various configurations of rule sizes and step constructions, and prove that without steps, void rules are incapable of these computations, which further motivates the step model. Additionally, we prove the coNP-completeness of verifying if a given step CRN computes a function, holding even for $O(1)$ step systems.

Submitted 13 February, 2024; originally announced February 2024.

arXiv:2402.02654 [pdf, ps, other]

Half Quantum Mirror Hall Effect

Authors: Bo Fu, Kai-Zhi Bai, Shun-Qing Shen

Abstract: We report the discovery of the half-quantized mirror Hall effect, a novel quantum-anomaly induced by mirror symmetry in a strong topological insulator (TI) film. These films are known to host a pair of gapless Dirac cones associated with surface electrons. Our findings reveal that mirror symmetry assigns a unique mirror parity to each Dirac cone, resulting in a half-quantized Hall conductance of $\pm\frac{e^{2}}{2h}$ for each cone. Despite the total electric Hall conductance being null due to time-reversal invariance, the difference in the Hall conductance between the two cones yields a quantized Hall conductance of $\frac{e^{2}}{h}$ for the difference in mirror currents. The effect of helical edge mirror current, a crucial feature of this quantum effect, can be determined by means of electrical measurements. Overall, the half-quantum mirror Hall effect reveals a new type of mirror-symmetry induced quantum anomaly in a time-reversal invariant lattice system, giving rise to a topological metallic state of matter with time-reversal invariance.

Submitted 4 February, 2024; originally announced February 2024.

Comments: 26 pages, 3 figures

arXiv:2401.07752 [pdf, other]

Theoretical Design of Mono-Elemental Ferroelectricity with Tunable Spin Textures in Bilayer Tellurium

Authors: Jiajun Zhu, Botao Fu, Heyun Zhao, Wanbiao Hu

Abstract: 2D ferroelectricity with switchable electric polarization has drawn widespread attention in condensed matter physics due to its crucial applications in non-volatile memory and ferroelectric spin devices. Despite recent progress in 2D ferroelectrics, achieving mono-elemental ferroelectricity remains a great challenge because most nonmetallic mono-elemental materials are stabilized in nonpolar crystal structures. In this work, we theoretically designed mono-elemental ferroelectricity with tunable and significant spin textures in bilayer tellurium (BL-Te). Comprehensive quantitative polarization calculations demonstrate that asymmetric stacking in BL-Te can generate out-of-plane (OOP) polarization with a magnitude of 0.78 pC/m. This polarization stems from distinguishing interlayer and intra-layer contributions. Moreover, these stacked BL-Te, characterized by significant spin-orbit coupling, serve as an ideal platform for investigating both conventional spin polarization and layer-dependent/hidden spin polarization through ferroelectric reversion. Our work not only broadens the category of 2D mono-elemental ferroelectrics but also offers a new platform for multifunctional nanodevices.

Submitted 15 January, 2024; originally announced January 2024.

Comments: 7 pages, 4 figures

arXiv:2401.03165 [pdf, other]

doi 10.1038/s42005-023-01508-2

Non-orthogonal cavity modes near exceptional points in the far field

Authors: Jingnan Yang, Shushu Shi, Sai Yan, Rui Zhu, Xiaoming Zhao, Yi Qin, Bowen Fu, Xiqing Chen, Hancong Li, Zhanchun Zuo, Kuijuan Jin, Qihuang Gong, Xiulai Xu

Abstract: Non-orthogonal eigenstates are a fundamental feature of non-Hermitian systems and are accompanied by the emergence of nontrivial features. However, the platforms to explore non-Hermitian mode couplings mainly measure near-field effects, and the far-field behaviour remains mostly unexplored. Here, we study how a microcavity with non-Hermitian mode coupling exhibits eigenstate non-orthogonality by investigating the spatial field and the far-field polarization of cavity modes. The non-Hermiticity arises from asymmetric backscattering, which is controlled by integrating two scatterers of different size and location into a microdisk. We observe that the spatial field overlap of two modes increases abruptly to its maximum value, whilst different far-field elliptical polarizations of two modes coalesce when approaching an exceptional point. We demonstrate such features experimentally by measuring the far-field polarization from the fabricated microdisks. Our work reveals the non-orthogonality in the far-field degree of freedom, and the integrability of the microdisks paves a way to integrate more non-Hermitian optical properties into nanophotonic systems.

Submitted 6 January, 2024; originally announced January 2024.

Comments: 11 pages, 4 figures

Journal ref: Communications Physics 7, 13 (2024)

arXiv:2312.15645 [pdf, other]

Conditional Variational Autoencoder for Sign Language Translation with Cross-Modal Alignment

Authors: Rui Zhao, Liang Zhang, Biao Fu, Cong Hu, Jinsong Su, Yidong Chen

Abstract: Sign language translation (SLT) aims to convert continuous sign language videos into textual sentences. As a typical multi-modal task, there exists an inherent modality gap between sign language videos and spoken language text, which makes the cross-modal alignment between visual and textual modalities crucial. However, previous studies tend to rely on an intermediate sign gloss representation to help alleviate the cross-modal problem, thereby neglecting the alignment across modalities, which may lead to compromised results. To address this issue, we propose a novel framework based on Conditional Variational autoencoder for SLT (CV-SLT) that facilitates direct and sufficient cross-modal alignment between sign language videos and spoken language text. Specifically, our CV-SLT consists of two paths with two Kullback-Leibler (KL) divergences to regularize the outputs of the encoder and decoder, respectively. In the prior path, the model solely relies on visual information to predict the target text; whereas in the posterior path, it simultaneously encodes visual information and textual knowledge to reconstruct the target text. The first KL divergence optimizes the conditional variational autoencoder and regularizes the encoder outputs, while the second KL divergence performs a self-distillation from the posterior path to the prior path, ensuring the consistency of decoder outputs.
We further enhance the integration of textual information to the posterior path by employing a shared Attention Residual Gaussian Distribution (ARGD), which considers the textual information in the posterior path as a residual component relative to the prior path. Extensive experiments conducted on public datasets (PHOENIX14T and CSL-daily) demonstrate the effectiveness of our framework, achieving new state-of-the-art results while significantly alleviating the cross-modal representation discrepancy.

Submitted 25 December, 2023; originally announced December 2023.

Comments: Accepted as conference paper by AAAI24. The code and models are available at https://github.com/rzhao-zhsq/CV-SLT

arXiv:2312.13913 [pdf, other]

Paint3D: Paint Anything 3D with Lighting-Less Texture Diffusion Models

Authors: Xianfang Zeng, Xin Chen, Zhongqi Qi, Wen Liu, Zibo Zhao, Zhibin Wang, Bin Fu, Yong Liu, Gang Yu

Abstract: This paper presents Paint3D, a novel coarse-to-fine generative framework that is capable of producing high-resolution, lighting-less, and diverse 2K UV texture maps for untextured 3D meshes conditioned on text or image inputs. The key challenge addressed is generating high-quality textures without embedded illumination information, which allows the textures to be re-lighted or re-edited within modern graphics pipelines. To achieve this, our method first leverages a pre-trained depth-aware 2D diffusion model to generate view-conditional images and perform multi-view texture fusion, producing an initial coarse texture map. However, as 2D models cannot fully represent 3D shapes and disable lighting effects, the coarse texture map exhibits incomplete areas and illumination artifacts. To resolve this, we train separate UV Inpainting and UVHD diffusion models specialized for the shape-aware refinement of incomplete areas and the removal of illumination artifacts. Through this coarse-to-fine process, Paint3D can produce high-quality 2K UV textures that maintain semantic consistency while being lighting-less, significantly advancing the state-of-the-art in texturing 3D objects.

Submitted 22 December, 2023; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: Project Website: https://github.com/OpenTexture/Paint3D

arXiv:2312.13771 [pdf, other]

AppAgent: Multimodal Agents as Smartphone Users

Authors: Chi Zhang, Zhao Yang, Jiaxuan Liu, Yucheng Han, Xin Chen, Zebiao Huang, Bin Fu, Gang Yu

Abstract: Recent advancements in large language models (LLMs) have led to the creation of intelligent agents capable of performing complex tasks. This paper introduces a novel LLM-based multimodal agent framework designed to operate smartphone applications. Our framework enables the agent to operate smartphone applications through a simplified action space, mimicking human-like interactions such as tapping and swiping. This novel approach bypasses the need for system back-end access, thereby broadening its applicability across diverse apps. Central to our agent's functionality is its innovative learning method. The agent learns to navigate and use new apps either through autonomous exploration or by observing human demonstrations. This process generates a knowledge base that the agent refers to for executing complex tasks across different applications. To demonstrate the practicality of our agent, we conducted extensive testing over 50 tasks in 10 different applications, including social media, email, maps, shopping, and sophisticated image editing tools. The results affirm our agent's proficiency in handling a diverse array of high-level tasks.

Submitted 21 December, 2023; v1 submitted 21 December, 2023; originally announced December 2023.

Comments: Project Page is https://appagent-official.github.io/

arXiv:2312.10347 [pdf]

Integrated multi-color Raman microlasers with ultra-low pump levels in single high-Q lithium niobate microdisks

Authors: Guanghui Zhao, Jintian Lin, Botao Fu, Renhong Gao, Chuntao Li, Ni Yao, Jianglin Guan, Minghui Li, Min Wang, Lingling Qiao, Ya Cheng

Abstract: Photonic integrated Raman microlasers, particularly discrete multi-color lasers which are crucial for extending the emission wavelength range of chip-scale laser sources to much shorter wavelengths, are highly in demand for various spectroscopy, microscopy analysis, and biological detection. However, integrated multi-color Raman microlasers have yet to be demonstrated because of the requirement of high-Q microresonators possessing large second-order nonlinearity and strong Raman phonon branches, and the challenge of the cavity-enhanced multi-photon hyper-Raman scattering parametric process. In this work, integrated multi-color Raman lasers have been demonstrated for the first time at weak pump levels, via the excitation of high-Q (>6 × 10^6) phase-matched modes in single thin-film lithium niobate (TFLN) microresonators by dispersion engineering. Raman lasing was observed at 1712 nm for a 1546-nm pump threshold power of only 620 μW. Furthermore, multi-color Raman lasers were realized at discrete wavelengths of 1712 nm, 813 nm, 533 nm and 406 nm with pump levels as low as 1.60 mW, which is more than two orders of magnitude lower than the current records (i.e., 200 mW) in bulk resonators, allowed by the fulfillment of the requisite conditions consisting of broadband natural phase matching, multiple resonances and high Q-factors.

Submitted 16 December, 2023; originally announced December 2023.

Comments: 16 pages, 4 figures

arXiv:2312.02663 [pdf, other]

FaceStudio: Put Your Face Everywhere in Seconds

Authors: Yuxuan Yan, Chi Zhang, Rui Wang, Yichao Zhou, Gege Zhang, Pei Cheng, Gang Yu, Bin Fu

Abstract: This study investigates identity-preserving image synthesis, an intriguing task in image generation that seeks to maintain a subject's identity while adding a personalized, stylistic touch. Traditional methods, such as Textual Inversion and DreamBooth, have made strides in custom image creation, but they come with significant drawbacks. These include the need for extensive resources and time for fine-tuning, as well as the requirement for multiple reference images. To overcome these challenges, our research introduces a novel approach to identity-preserving synthesis, with a particular focus on human images. Our model leverages a direct feed-forward mechanism, circumventing the need for intensive fine-tuning, thereby facilitating quick and efficient image generation. Central to our innovation is a hybrid guidance framework, which combines stylized images, facial images, and textual prompts to guide the image generation process. This unique combination enables our model to produce a variety of applications, such as artistic portraits and identity-blended images. Our experimental results, including both qualitative and quantitative evaluations, demonstrate the superiority of our method over existing baseline models and previous works, particularly in its remarkable efficiency and ability to preserve the subject's identity with high fidelity.

Submitted 6 December, 2023; v1 submitted 5 December, 2023; originally announced December 2023.

Comments: Project homepage: https://icoz69.github.io/facestudio/

arXiv:2312.01735 [pdf, other]

Weighted Q-learning for optimal dynamic treatment regimes with MNAR covariates

Authors: Jian Sun, Li Su, Bo Fu

Abstract: Dynamic treatment regimes (DTRs) formalize medical decision-making as a sequence of rules for different stages, mapping patient-level information to recommended treatments. In practice, estimating an optimal DTR using observational data from electronic medical record (EMR) databases can be complicated by covariates that are missing not at random (MNAR) due to informative monitoring of patients. Since complete case analysis can result in consistent estimation of outcome model parameters under the assumption of outcome-independent missingness, Q-learning is a natural approach to accommodating MNAR covariates. However, the backward induction algorithm used in Q-learning can introduce challenges, as MNAR covariates at later stages can result in MNAR pseudo-outcomes at earlier stages, leading to suboptimal DTRs, even if the longitudinal outcome variables are fully observed. To address this unique missing data problem in DTR settings, we propose two weighted Q-learning approaches where inverse probability weights for missingness of the pseudo-outcomes are obtained through estimating equations with valid nonresponse instrumental variables or sensitivity analysis. Asymptotic properties of the weighted Q-learning estimators are derived and the finite-sample performance of the proposed methods is evaluated and compared with alternative methods through extensive simulation studies.
Using EMR data from the Medical Information Mart for Intensive Care database, we apply the proposed methods to investigate the optimal fluid strategy for sepsis patients in intensive care units.

Submitted 22 February, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

arXiv:2311.16483 [pdf, other]

ChartLlama: A Multimodal LLM for Chart Understanding and Generation

Authors: Yucheng Han, Chi Zhang, Xin Chen, Xu Yang, Zhibin Wang, Gang Yu, Bin Fu, Hanwang Zhang

Abstract: Multi-modal large language models have demonstrated impressive performances on most vision-language tasks. However, the model generally lacks the understanding capabilities for specific domain data, particularly when it comes to interpreting chart figures. This is mainly due to the lack of relevant multi-modal instruction tuning datasets. In this article, we create a high-quality instruction-tuning dataset leveraging GPT-4. We develop a multi-step data generation process in which different steps are responsible for generating tabular data, creating chart figures, and designing instruction tuning data separately. Our method's flexibility enables us to generate diverse, high-quality instruction-tuning data consistently and efficiently while maintaining a low resource expenditure. Additionally, it allows us to incorporate a wider variety of chart and task types not yet featured in existing datasets. Next, we introduce ChartLlama, a multi-modal large language model that we've trained using our created dataset. ChartLlama outperforms all prior methods in ChartQA, Chart-to-text, and Chart-extraction evaluation benchmarks. Additionally, ChartLlama significantly improves upon the baseline in our specially compiled chart dataset, which includes new chart and task types. The results of ChartLlama confirm the value and huge potential of our proposed data generation method in enhancing chart comprehension.

Submitted 27 November, 2023; originally announced November 2023.

Comments: Code and model on https://tingxueronghua.github.io/ChartLlama/

arXiv:2311.14189 [pdf, other]

D-SCo: Dual-Stream Conditional Diffusion for Monocular Hand-Held Object Reconstruction

Authors: Bowen Fu, Gu Wang, Chenyangguang Zhang, Yan Di, Ziqin Huang, Zhiying Leng, Fabian Manhardt, Xiangyang Ji, Federico Tombari

Abstract: Reconstructing hand-held objects from a single RGB image is a challenging task in computer vision. In contrast to prior works that utilize deterministic modeling paradigms, we employ a point cloud denoising diffusion model to account for the probabilistic nature of this problem. At the core, we introduce centroid-fixed dual-stream conditional diffusion for monocular hand-held object reconstruction (D-SCo), tackling two predominant challenges. First, to prevent the object centroid from deviating, we utilize a novel hand-constrained centroid fixing paradigm, enhancing the stability of diffusion and reverse processes and the precision of feature projection. Second, we introduce a dual-stream denoiser to semantically and geometrically model hand-object interactions with a novel unified hand-object semantic embedding, enhancing the reconstruction performance of the hand-occluded region of the object. Experiments on the synthetic ObMan dataset and three real-world datasets HO3D, MOW and DexYCB demonstrate that our approach can surpass all other state-of-the-art methods. Codes will be released.

Submitted 22 March, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

arXiv:2311.11910 [pdf, other]

Generalization of Fitness Exercise Recognition from Doppler Measurements by Domain-adaption and Few-Shot Learning

Authors: Biying Fu, Naser Damer, Florian Kirchbuchner, Arjan Kuijper

Abstract: In previous works, a mobile application was developed using an unmodified commercial off-the-shelf smartphone to recognize whole-body exercises. The working principle was based on ultrasound Doppler sensing with the device's built-in hardware. Applying such a lab-environment trained model to realistic application variations causes a significant drop in performance, and thus decimates its applicability. The reasons for the reduced performance can be manifold. It could be induced by the user, environment, and device variations in realistic scenarios. Such scenarios are often more complex and diverse, which can be challenging to anticipate in the initial training data. To study and overcome this issue, this paper presents a database with controlled and uncontrolled subsets of fitness exercises. We propose two concepts to utilize small adaption data to successfully improve model generalization in an uncontrolled environment, increasing the recognition accuracy two- to six-fold compared to the baseline for different users.

Submitted 20 November, 2023; originally announced November 2023.

Comments: accepted at International Conference on Pattern Recognition (ICPR) workshop 2021

arXiv:2311.11106 [pdf, other]

ShapeMatcher: Self-Supervised Joint Shape Canonicalization, Segmentation, Retrieval and Deformation

Authors: Yan Di, Chenyangguang Zhang, Chaowei Wang, Ruida Zhang, Guangyao Zhai, Yanyan Li, Bowen Fu, Xiangyang Ji, Shan Gao

Abstract: In this paper, we present ShapeMatcher, a unified self-supervised learning framework for joint shape canonicalization, segmentation, retrieval and deformation. Given a partially-observed object in an arbitrary pose, we first canonicalize the object by extracting point-wise affine-invariant features, disentangling the inherent structure of the object from its pose and size. These learned features are then leveraged to predict semantically consistent part segmentation and corresponding part centers. Next, our lightweight retrieval module aggregates the features within each part as its retrieval token and compares all the tokens with source shapes from a pre-established database to identify the most geometrically similar shape. Finally, we deform the retrieved shape in the deformation module to tightly fit the input object by harnessing part center guided neural cage deformation. The key insight of ShapeMatcher is the simultaneous training of the four highly-associated processes: canonicalization, segmentation, retrieval, and deformation, leveraging cross-task consistency losses for mutual supervision. Extensive experiments on synthetic datasets PartNet, ComplementMe, and real-world dataset Scan2CAD demonstrate that ShapeMatcher surpasses competitors by a large margin.

Submitted 11 March, 2024; v1 submitted 18 November, 2023; originally announced November 2023.

Comments: CVPR2024

arXiv:2311.09186 [pdf]

Structure responsible for the superconducting state in La3Ni2O7 at high pressure and low temperature conditions

Authors: Luhong Wang, Yan Li, Shengyi Xie, Fuyang Liu, Hualei Sun, Caoxin Huang, Yang Gao, Takeshi Nakagawa, Boyang Fu, Bo Dong, Zhenhui Cao, Runze Yu, Saori I. Kawaguchi, Hirokazu Kadobayashi, Meng Wang, Changqing Jin, Ho-kwang Mao, Haozhe Liu

Abstract: Very recently, a new superconductor with Tc = 80 K was reported in nickelate (La3Ni2O7) at around 15-40 GPa conditions (Nature, 621, 493, 2023) [1], which is the second type of unconventional superconductor, besides the cuprates, with Tc above liquid nitrogen temperature. However, the phase diagram plotted in this report was mostly based on the transport measurement at low temperature and high pressure conditions, and the assumed corresponding X-ray diffraction (XRD) results were obtained at room temperature. This encouraged us to carry out in situ high pressure and low temperature synchrotron XRD experiments to determine which phase is responsible for the high Tc state. In addition to the phase transition from orthorhombic Amam structure to orthorhombic Fmmm structure, a tetragonal phase with space group of I4/mmm was discovered when the sample was compressed to 19 GPa at 40 K where the superconductivity takes place in La3Ni2O7. The calculations based on this tetragonal structure reveal that the electronic states near the Fermi energy were mainly dominated by the eg orbitals (3dz2 and 3dx2-y2) of Ni atoms, which are located in the oxygen octahedral crystal field. The correlation between Tc and this structural evolution, especially Ni-O octahedra regularity and the in-plane Ni-O-Ni bonding angles, is analyzed. This work sheds new light on identifying the most likely phase responsible for superconductivity in the double layered nickelate.

Submitted 21 November, 2023; v1 submitted 15 November, 2023; originally announced November 2023.

arXiv:2311.03967 [pdf, other]

CeCNN: Copula-enhanced convolutional neural networks in joint prediction of refraction error and axial length based on ultra-widefield fundus images

Authors: Chong Zhong, Yang Li, Danjuan Yang, Meiyan Li, Xingyao Zhou, Bo Fu, Catherine C. Liu, A. H. Welsh

Abstract: Ultra-widefield (UWF) fundus images are replacing traditional fundus images in screening, detection, prediction, and treatment of complications related to myopia because their much broader visual range is advantageous for highly myopic eyes. Spherical equivalent (SE) is extensively used as the main myopia outcome measure, and axial length (AL) has drawn increasing interest as an important ocular component for assessing myopia. Cutting-edge studies show that SE and AL are strongly correlated. Using the joint information from SE and AL is potentially better than using either separately. In the deep learning community, though there is research on multiple-response tasks with a 3D image biomarker, dependence among responses is only sporadically taken into consideration. Inspired by the spirit that information extracted from the data by statistical methods can improve the prediction accuracy of deep learning models, we formulate a class of multivariate response regression models with a higher-order tensor biomarker, for the bivariate tasks of regression-classification and regression-regression. Specifically, we propose a copula-enhanced convolutional neural network (CeCNN) framework that incorporates the dependence between responses through a Gaussian copula (with parameters estimated from a warm-up CNN) and uses the induced copula-likelihood loss with the backbone CNNs. We establish the statistical framework and algorithms for the aforementioned two bivariate tasks.
We show that the CeCNN has better prediction accuracy after adding the dependency information to the backbone models. The modeling and the proposed CeCNN algorithm are applicable beyond the UWF scenario and can be effective with other backbones beyond ResNet and LeNet.

Submitted 1 June, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

arXiv:2311.00340 [pdf, other]

Two-dimensional double-kagome-lattice nitrogene: a direct band gap semiconductor with nontrivial corner state

Authors: Wenzhang Li, Qin He, Xiao-Ping Li, Da-Shuai Ma, Botao Fu

Abstract: Based on first-principles calculations, we predict that nitrogen atoms can assemble into a single-layer double kagome lattice (DKL), which possesses the characteristics of an intrinsic direct band gap semiconductor, boasting a substantial band gap of 3.460 eV. The DKL structure results in a flat valence band with high effective mass and a conduction band with small effective mass arising from Dirac electrons. These distinctive band edges lead to a significant disparity in carrier mobilities, with electron mobility being four orders of magnitude higher than that of holes. The presence of the flat band in DKL-nitrogene can be further discerned through the enhanced optical absorption and correlated effects as exemplified by hole-induced ferromagnetism. Interestingly, DKL-nitrogene exhibits inherent second-order topological states, confirmed by a non-trivial second Stiefel-Whitney number and the presence of 1D floating edge states and 0D corner states within the bulk band gap. Additionally, the robust N-N bonds and the lattice's bending structure ensure thermodynamic stability and mechanical stiffness. These attributes make it exceptionally stable for potential applications in nano-devices.

Submitted 1 November, 2023; originally announced November 2023.

Comments: 9 pages, 4 figures

arXiv:2310.16345 [pdf, other]

Constructing the Equation of State of QCD in a functional QCD based scheme

Authors: Yi Lu, Fei Gao, Bao-Chi Fu, Hui-Chao Song, Yu-Xin Liu

Abstract: We construct the equation of state (EoS) of QCD based on the finite chemical potential information from the functional QCD approaches, with the assistance of the lattice QCD EoS. The obtained EoS is consistent with the up-to-date estimations of the QCD phase diagram, including a phase transition temperature at zero chemical potential of $T=155$ MeV, the curvature of the transition line $κ=0.016$ and also a critical end point at $(T,μ_B)=(118, 600)$ MeV. Specifically, the phase diagram mapping is achieved by incorporating the order parameters into the EoS, namely the dynamical quark mass for the chiral phase transition together with the Polyakov loop parameter for the deconfinement phase transition. We also implement the EoS in hydrodynamic simulations to compute the particle yields, ratios and collective flow, and find that our obtained EoS agrees well with the commonly used one based on the combination of lattice QCD simulation and hadron resonance gas model.

Submitted 26 October, 2023; v1 submitted 25 October, 2023; originally announced October 2023.

Comments: 8 pages, 12 figures

arXiv:2310.15161 [pdf, other]

SAM-Med3D

Authors: Haoyu Wang, Sizheng Guo, Jin Ye, Zhongying Deng, Junlong Cheng, Tianbin Li, Jianpin Chen, Yanzhou Su, Ziyan Huang, Yiqing Shen, Bin Fu, Shaoting Zhang, Junjun He, Yu Qiao

Abstract: Although the Segment Anything Model (SAM) has demonstrated impressive performance in 2D natural image segmentation, its application to 3D volumetric medical images reveals significant shortcomings, namely suboptimal performance and unstable prediction, necessitating an excessive number of prompt points to attain the desired outcomes. These issues can hardly be addressed by fine-tuning SAM on medical data because the original 2D structure of SAM neglects 3D spatial information. In this paper, we introduce SAM-Med3D, the most comprehensive study to modify SAM for 3D medical images. Our approach is characterized by its comprehensiveness in two primary aspects: firstly, by comprehensively reformulating SAM to a thorough 3D architecture trained on a comprehensively processed large-scale volumetric medical dataset; and secondly, by providing a comprehensive evaluation of its performance. Specifically, we train SAM-Med3D with over 131K 3D masks and 247 categories. Our SAM-Med3D excels at capturing 3D spatial information, exhibiting competitive performance with significantly fewer prompt points than the top-performing fine-tuned SAM in the medical domain. We then evaluate its capabilities across 15 datasets and analyze it from multiple perspectives, including anatomical structures, modalities, targets, and generalization abilities. Our approach, compared with SAM, showcases pronouncedly enhanced efficiency and broad segmentation capabilities for 3D volumetric medical images. Our code is released at https://github.com/uni-medical/SAM-Med3D.

Submitted 29 October, 2023; v1 submitted 23 October, 2023; originally announced October 2023.

arXiv:2310.13819 [pdf, other]

LanPose: Language-Instructed 6D Object Pose Estimation for Robotic Assembly

Authors: Bowen Fu, Sek Kun Leong, Yan Di, Jiwen Tang, Xiangyang Ji

Abstract: Comprehending natural language instructions is a critical skill for robots to cooperate effectively with humans. In this paper, we aim to learn 6D poses for robotic assembly by natural language instructions. For this purpose, Language-Instructed 6D Pose Regression Network (LanPose) is proposed to jointly predict the 6D poses of the observed object and the corresponding assembly position. Our proposed approach is based on the fusion of geometric and linguistic features, which allows us to finely integrate multi-modality input and map it to the 6D pose in SE(3) space by the cross-attention mechanism and the language-integrated 6D pose mapping module, respectively. To validate the effectiveness of our approach, an integrated robotic system is established to precisely and robustly perceive, grasp, manipulate and assemble blocks by language commands. 98.09 and 93.55 in ADD(-S)-0.1d are derived for the prediction of 6D object pose and 6D assembly pose, respectively. Both quantitative and qualitative results demonstrate the effectiveness of our proposed language-instructed 6D pose estimation methodology and its potential to enable robots to better understand and execute natural language instructions.

Submitted 20 October, 2023; originally announced October 2023.

Comments: 8 pages

arXiv:2310.11696 [pdf, other]

MOHO: Learning Single-view Hand-held Object Reconstruction with Multi-view Occlusion-Aware Supervision

Authors: Chenyangguang Zhang, Guanlong Jiao, Yan Di, Gu Wang, Ziqin Huang, Ruida Zhang, Fabian Manhardt, Bowen Fu, Federico Tombari, Xiangyang Ji

Abstract: Previous works concerning single-view hand-held object reconstruction typically rely on supervision from 3D ground-truth models, which are hard to collect in the real world. In contrast, readily accessible hand-object videos offer a promising training data source, but they only give heavily occluded object observations. In this paper, we present a novel synthetic-to-real framework to exploit Multi-view Occlusion-aware supervision from hand-object videos for Hand-held Object reconstruction (MOHO) from a single image, tackling two predominant challenges in such a setting: hand-induced occlusion and object's self-occlusion. First, in the synthetic pre-training stage, we render a large-scale synthetic dataset SOMVideo with hand-object images and multi-view occlusion-free supervisions, adopted to address hand-induced occlusion in both 2D and 3D spaces. Second, in the real-world finetuning stage, MOHO leverages the amodal-mask-weighted geometric supervision to mitigate the unfaithful guidance caused by the hand-occluded supervising views in the real world. Moreover, domain-consistent occlusion-aware features are amalgamated in MOHO to resist object's self-occlusion for inferring the complete object shape. Extensive experiments on HO3D and DexYCB datasets demonstrate 2D-supervised MOHO gains superior results against 3D-supervised methods by a large margin.

Submitted 13 March, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

Comments: CVPR 2024

arXiv:2310.04153 [pdf, other]

Fair coins tend to land on the same side they started: Evidence from 350,757 flips

Authors: František Bartoš, Alexandra Sarafoglou, Henrik R. Godmann, Amir Sahrani, David Klein Leunk, Pierre Y. Gui, David Voss, Kaleem Ullah, Malte J. Zoubek, Franziska Nippold, Frederik Aust, Felipe F. Vieira, Chris-Gabriel Islam, Anton J. Zoubek, Sara Shabani, Jonas Petter, Ingeborg B. Roos, Adam Finnemann, Aaron B. Lob, Madlen F. Hoffstadt, Jason Nak, Jill de Ron, Koen Derks, Karoline Huth, Sjoerd Terpstra , et al. (25 additional authors not shown)

Abstract: Many people have flipped coins but few have stopped to ponder the statistical and physical intricacies of the process. In a preregistered study we collected $350{,}757$ coin flips to test the counterintuitive prediction from a physics model of human coin tossing developed by Diaconis, Holmes, and Montgomery (DHM; 2007). The model asserts that when people flip an ordinary coin, it tends to land on the same side it started -- DHM estimated the probability of a same-side outcome to be about 51%. Our data lend strong support to this precise prediction: the coins landed on the same side more often than not, $\text{Pr}(\text{same side}) = 0.508$, 95% credible interval (CI) [$0.506$, $0.509$], $\text{BF}_{\text{same-side bias}} = 2359$. Furthermore, the data revealed considerable between-people variation in the degree of this same-side bias. Our data also confirmed the generic prediction that when people flip an ordinary coin -- with the initial side-up randomly determined -- it is equally likely to land heads or tails: $\text{Pr}(\text{heads}) = 0.500$, 95% CI [$0.498$, $0.502$], $\text{BF}_{\text{heads-tails bias}} = 0.182$. Furthermore, this lack of heads-tails bias does not appear to vary across coins. Additional exploratory analyses revealed that the within-people same-side bias decreased as more coins were flipped, an effect that is consistent with the possibility that practice makes people flip coins in a less wobbly fashion.
Our data therefore provide strong evidence that when some (but not all) people flip a fair coin, it tends to land on the same side it started. Our data provide compelling statistical support for the DHM physics model of coin tossing.

Submitted 2 June, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

arXiv:2309.15373 [pdf, other]

Human-robot Matching and Routing for Multi-robot Tour Guiding under Time Uncertainty

Authors: Bo Fu, Tribhi Kathuria, Denise Rizzo, Matthew Castanier, X. Jessie Yang, Maani Ghaffari, Kira Barton

Abstract: This work presents a framework for multi-robot tour guidance in a partially known environment with uncertainty, such as a museum. A simultaneous matching and routing problem (SMRP) is formulated to match the humans with robot guides according to their requested places of interest (POIs) and generate the routes for the robots according to uncertain time estimation. A large neighborhood search algorithm is developed to efficiently find sub-optimal low-cost solutions for the SMRP. The scalability and optimality of the multi-robot planner are evaluated computationally. The largest case tested involves 50 robots, 250 humans, and 50 POIs. A photo-realistic multi-robot simulation was developed to verify the tour guiding performance in an uncertain indoor environment.

Submitted 26 September, 2023; originally announced September 2023.

Comments: ICRA 2022 Workshop Paper (https://sites.google.com/view/icra22ws-cor-wotf/accepted-papers). arXiv admin note: substantial text overlap with arXiv:2201.10635

MSC Class: 93A16

arXiv:2309.14739 [pdf]

Designing superhard magnetic material in clathrate β-C3N2 through atom embeddedness

Authors: Liping Sun, Botao Fu, Jing Chang

Abstract: Designing new compounds with the coexistence of diverse physical properties is of great significance for broad applications in multifunctional electronic devices. In this work, based on density functional theory, we predict the coexistence of mechanical superhardness and controllable magnetism in the clathrate material β-C3N2 through the implantation of an external atom into the intrinsic cage structure. Taking hydrogen-doping (H@β-C3N2) and fluorine-doping (F@β-C3N2) as examples, our calculations indicate that these two doped configurations are stable and that they are an antiferromagnetic semiconductor and a ferromagnetic semimetal, respectively. These intriguing magnetic phase transitions originate from their distinctive band structure around the Fermi level and can be well understood by the 3D Hubbard model with half-filling occupation and the Stoner model. Moreover, high Vickers hardnesses of 49.0 GPa for H@β-C3N2 and 48.2 GPa for F@β-C3N2 are obtained, suggesting they are clathrate superhard materials like their host. Therefore, the incorporation of H and F in β-C3N2 gives rise to a new type of superhard antiferromagnetic semiconductor and superhard ferromagnetic semimetal, respectively, which could have potential applications in harsh conditions. Our work provides an effective strategy to design a new class of highly desirable multifunctional materials with excellent mechanical properties and magnetic properties, which may give rise to spintronic applications in superhard materials in the future.

Submitted 26 September, 2023; originally announced September 2023.

Comments: 14 pages, 5 figures

arXiv:2309.11189 [pdf, ps, other]

Increasing Ticketing Allocative Efficiency Using Marginal Price Auction Theory

Authors: Boxiang Fu

Abstract: Most modern ticketing systems rely on a first-come-first-served or randomized allocation system to determine the allocation of tickets. Such systems have received considerable backlash in recent years due to their inequitable allotment and allocative inefficiency. We analyze a ticketing protocol based on a variation of the marginal price auction system. Users submit bids to the protocol based on their own utilities. The protocol awards tickets to the highest bidders and determines the final ticket price paid by all bidders using the lowest winning submitted bid. A game-theoretic proof is provided to show that the protocol more efficiently allocates the tickets to the bidders with the highest utilities. We also prove that the protocol extracts more economic rents for the event organizers and that ticket scalping is non-optimal under time-invariant bidder utilities.

Submitted 20 September, 2023; originally announced September 2023.

Comments: 12 pages, 7 figures

arXiv:2309.09724 [pdf, other]

Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering

Authors: Chi Zhang, Wei Yin, Gang Yu, Zhibin Wang, Tao Chen, Bin Fu, Joey Tianyi Zhou, Chunhua Shen

Abstract: In this study, we address the challenge of 3D scene structure recovery from monocular depth estimation. While traditional depth estimation methods leverage labeled datasets to directly predict absolute depth, recent advancements advocate for mix-dataset training, enhancing generalization across diverse scenes. However, such mixed dataset training yields depth predictions only up to an unknown scale and shift, hindering accurate 3D reconstructions. Existing solutions necessitate extra 3D datasets or geometry-complete depth annotations, constraints that limit their versatility. In this paper, we propose a learning framework that trains models to predict geometry-preserving depth without requiring extra data or annotations. To produce realistic 3D structures, we render novel views of the reconstructed scenes and design loss functions to promote depth estimation consistency across different views. Comprehensive experiments underscore our framework's superior generalization capabilities, surpassing existing state-of-the-art methods on several benchmark datasets without leveraging extra training information. Moreover, our innovative loss functions empower the model to autonomously recover domain-specific scale-and-shift coefficients using solely unlabeled images.

Submitted 18 September, 2023; originally announced September 2023.

Comments: Accepted by ICCV2023

arXiv:2309.06184 [pdf, other]

Design monolayer iodinenes based on halogen bond and tiling theory

Authors: Kejun Yu, Botao Fu, Runwu Zhang, Da-shuai Ma, Xiao-ping Li, Zhi-Ming Yu, Cheng-Cheng Liu, Yugui Yao

Abstract: Xenes, two-dimensional (2D) monolayers composed of a single element, with graphene as a typical representative, have attracted widespread attention. Most of the previous Xenes, with X from group-IIIA to group-VIA elements, have bonding characteristics of covalent bonds. In this work, we for the first time unveil the pivotal role of a halogen bond, which is a distinctive type of bonding with interaction strength between that of a covalent bond and a van der Waals interaction, in 2D group-VIIA monolayers. Combining the ingenious non-edge-to-edge tiling theory and state-of-the-art ab initio method with refined local density functional M06-L, we provide a precise and effective bottom-up construction of 2D iodine monolayer sheets, iodinenes, primarily governed by halogen bonds, and successfully design a category of stable iodinenes, encompassing herringbone, Pythagorean, gyrated truncated hexagonal, i.e. diatomic-kagome, and gyrated hexagonal tiling patterns. These iodinene structures exhibit a wealth of properties, such as flat bands, nontrivial topology, and fascinating optical characteristics, offering valuable insights and guidance for future experimental investigations. Our work not only unveils the unexplored halogen bonding mechanism in 2D materials but also opens a new avenue for designing other non-covalent bonding 2D materials.

Submitted 28 October, 2023; v1 submitted 12 September, 2023; originally announced September 2023.

Comments: 6 pages, 4 figures

arXiv:2309.01755 [pdf, other]

Flat-band and multi-dimensional fermions in Pb10(PO4)6O4

Authors: Botao Fu, Qin He, Xiao-Ping Li

Abstract: Employing a combination of first-principles calculations and low-energy effective models, we present a comprehensive investigation on the electronic structure of Pb$_{10}$(PO$_{4}$)$_{6}$O$_{4}$, which exhibits remarkable quasi-one-dimensional flat bands around the Fermi level that contain novel multi-dimensional fermions. These flat bands predominantly originate from the $p_x/p_y$ orbitals of the oxygen molecules chain at $4e$ Wyckoff positions, and thus can be well-captured by a four-band tight-binding model. Furthermore, the abundant crystal symmetry inherent to Pb$_{10}$(PO$_{4}$)$_{6}$O$_{4}$ provides an ideal platform for the emergence of various multi-dimensional fermions, including a 0D four-fold degenerate Dirac fermion with quadratic dispersion, a 1D quadratic/linear nodal-line (QNL/LNL) fermion along symmetric $k$-paths, a 1D hourglass nodal-line (HNL) fermion linked to the Dirac fermion, and a 2D symmetry-enforced nodal surface (NS) found on the $k_z$=$π$ plane. Moreover, when considering the weak ferromagnetic order, Pb$_{10}$(PO$_{4}$)$_{6}$O$_{4}$ transforms into a rare semi-half-metal, which is characterized by the presence of the Dirac fermion and HNL fermion at the Fermi level for a single spin channel exhibiting 100$\%$ spin polarization. Our findings reveal the coexistence of flat bands, diverse topological semimetal states and ferromagnetism within Pb$_{10}$(PO$_{4}$)$_{6}$O$_{4}$, which may provide valuable insights for further exploring the intriguing interplay between superconductivity and exotic electronic states.

Submitted 4 September, 2023; originally announced September 2023.

Comments: 8 pages, 4 figures

arXiv:2309.00778 [pdf]

Generation of Kerr soliton microcomb in a normally dispersed lithium niobate microdisk resonator by mode trimming

Authors: Botao Fu, Renhong Gao, Ni Yao, Haisu Zhang, Chuntao Li, Jintian Lin, Min Wang, Lingling Qiao, Ya Cheng

Abstract: Anomalous microresonator dispersion is mandatory for Kerr soliton microcomb formation, which depends critically on the geometry of the microresonator and can hardly be tuned after the structure is made. To date, cavity-based microcombs have only been generated with fundamental whispering gallery modes (WGMs) of anomalous dispersion in microresonators. Moreover, microcomb generation in highly Raman-active platforms such as lithium niobate (LN) microresonators frequently suffers from stimulated Raman scattering and mode crossing due to the existence of multiple families of high-order WGMs. Here, we reveal a unique Kerr soliton microcomb generation mechanism through mode trimming in a weakly perturbed LN microdisk resonator. Remarkably, the soliton comb is generated with fundamental WGMs of normal dispersion and is free from the mode crossing and Raman scattering effects. A robust soliton with a spectrum spanning from 1450 nm to 1620 nm is generated at an on-chip pump power of 35 mW. Our discovery offers a powerful solution to circumvent the stringent requirements on high-precision dispersion engineering and termination of Raman excitation for soliton generation in the high-Q microdisk.

Submitted 1 September, 2023; originally announced September 2023.

Comments: 16 pages and 5 figures

arXiv:2308.16404 [pdf, other]

Deformation Robust Text Spotting with Geometric Prior

Authors: Xixuan Hao, Aozhong Zhang, Xianze Meng, Bin Fu

Abstract: The goal of text spotting is to perform text detection and recognition in an end-to-end manner. Although the diversity of luminosity and orientation in scene texts has been widely studied, the font diversity and shape variance of the same character are ignored in recent works, since most characters in natural images are rendered in standard fonts. To solve this problem, we present a Chinese Artistic Dataset, termed as ARText, which contains 33,000 artistic images with rich shape deformation and font diversity. Based on this database, we develop a deformation robust text spotting method (DR TextSpotter) to solve the recognition problem of complex deformation of characters in different fonts. Specifically, we propose a geometric prior module to highlight the important features based on the unsupervised landmark detection sub-network. A graph convolution network is further constructed to fuse the character features and landmark features, and then performs semantic reasoning to enhance the discrimination for different characters. The experiments are conducted on ARText and IC19-ReCTS datasets. Our results demonstrate the effectiveness of our proposed method.

Submitted 30 August, 2023; originally announced August 2023.

arXiv:2308.10253 [pdf, other]

StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data

Authors: Yanda Li, Chi Zhang, Gang Yu, Zhibin Wang, Bin Fu, Guosheng Lin, Chunhua Shen, Ling Chen, Yunchao Wei

Abstract: The remarkable multimodal capabilities demonstrated by OpenAI's GPT-4 have sparked significant interest in the development of multimodal Large Language Models (LLMs). A primary research objective of such models is to align visual and textual modalities effectively while comprehending human instructions. Current methodologies often rely on annotations derived from benchmark datasets to construct image-dialogue datasets for training purposes, akin to instruction tuning in LLMs. However, these datasets often exhibit domain bias, potentially constraining the generative capabilities of the models. In an effort to mitigate these limitations, we propose a novel data collection methodology that synchronously synthesizes images and dialogues for visual instruction tuning. This approach harnesses the power of generative models, marrying the abilities of ChatGPT and text-to-image generative models to yield a diverse and controllable dataset with varied image content. Additionally, datasets can be arbitrarily scaled. This not only provides greater flexibility compared to existing methodologies but also significantly enhances several model capabilities. Our research includes comprehensive experiments conducted on various datasets. The results emphasize substantial enhancements in more than ten commonly assessed capabilities. Additionally, our model achieves state-of-the-art results across multiple widely recognized multimodal benchmarks.

Submitted 27 December, 2023; v1 submitted 20 August, 2023; originally announced August 2023.

Comments: Project page: https://github.com/icoz69/StableLLAVA

arXiv:2308.07936 [pdf, other]

doi 10.1016/j.physletb.2024.138821

The spin alignment of vector mesons with light front quarks

Authors: Baochi Fu, Fei Gao, Yuxin Liu, Huichao Song

Abstract: The global spin alignment of the vector meson has been observed in relativistic heavy ion collisions, but is still under heated debate in the theoretical community. Here we propose to apply the light front framework to explain this phenomenon since the light front form explicitly describes the hadron spin including both the quark spin and the orbital angular momentum. After applying the light front spinor, we find that the spin alignment in the polarization of vector mesons with $ρ_{00}>1/3$ can be naturally manifested and in particular, the obtained spin alignment for the $φ$ meson is in good agreement with the experimental data. This implies that to explain the spin alignment it is important to properly include the contribution from the gluon interactions that are presented in terms of the orbital angular momentum of the hadron bound state.

Submitted 8 July, 2024; v1 submitted 13 August, 2023; originally announced August 2023.

Comments: 7 pages, 2 figures, matched to the published version

Journal ref: Phys. Lett. B 855, 138821 (2024)

arXiv:2308.07398 [pdf, other]

Local geometry of special pieces of nilpotent orbits

Authors: Baohua Fu, Daniel Juteau, Paul Levy, Eric Sommers

Abstract: The nilpotent cone of a simple Lie algebra is partitioned into locally closed subvarieties called special pieces, each containing exactly one special orbit. Lusztig conjectured that each special piece is the quotient of some smooth variety by a precise finite group $H$, a result proved for the classical types by Kraft and Procesi. The present work is about exceptional types. Our main result is a local version of Lusztig's conjecture: the intersection of a special piece with a Slodowy slice transverse to the minimal orbit in the piece is isomorphic to the quotient of a vector space by $H$. Along the way, we complete our previous work on the generic singularities of nilpotent orbit closures, by providing proofs for the last two `exotic' singularities. Four further, non-isolated, exotic singularities are studied: we show that quotients $\overline{{\mathcal O}_{\text{mini}}(\mathfrak{so}_8)}/\mathfrak{S}_4$, $S^2({\mathbb C}^2/μ_3)$, $S^3({\mathbb C}^2/μ_2)$ and $\overline{{\mathcal O}_{\text{mini}}(\mathfrak{sl}_3)}/\mathfrak{S}_4$ occur as Slodowy slice singularities between nilpotent orbits in types $F_4$, $E_6$, $E_7$ and $E_8$ respectively. We also extend, to fields other than ${\mathbb C}$, the results of Brylinski and Kostant on shared orbit pairs. In the course of our analysis, we discover a shared pair which is missing from Brylinski and Kostant's classification.

Submitted 20 February, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

Comments: 37 pages; some new results added, in particular, two new slice singularities identified

MSC Class: 17B08 14L30

arXiv:2308.05963 [pdf, ps, other]

doi 10.1103/PhysRevB.108.L241407

Metallic Quantized Anomalous Hall Effect without Chiral Edge States

Authors: Kai-Zhi Bai, Bo Fu, Zhenyu Zhang, Shun-Qing Shen

Abstract: The quantum anomalous Hall effect (QAHE) is a topological state of matter with a quantized Hall resistance. It has been observed in some two-dimensional insulating materials such as magnetic topological insulator films and twisted bilayer graphene. These materials are insulating in the bulk, but possess chiral edge states carrying the edge current around the systems. Here we discover a metallic QAHE in a topological insulator film with a magnetic sandwich heterostructure, in which the Hall conductance is quantized to $e^{2}/h$, but the longitudinal conductance remains finite. This effect is attributed to the existence of a pair of massless Dirac cones of surface fermions, with each contributing half of the Hall conductance due to the quantum anomaly. It is not characterized by a Chern number and is not associated with any chiral edge states. Our study offers novel insights into topological transport phenomena and topological metallic states of matter.

Submitted 11 August, 2023; originally announced August 2023.

Comments: 6 pages, 4 figures

Journal ref: Phys. Rev. B 108, L241407 (2023)

arXiv:2308.05799 [pdf, other]

doi 10.1103/PhysRevD.109.055025

Testing Realistic $SO(10)$ SUSY GUTs with Proton Decay and Gravitational Waves

Authors: Bowen Fu, Stephen F. King, Luca Marsili, Silvia Pascoli, Jessica Turner, Ye-Ling Zhou

Abstract: We present a comprehensive analysis of a supersymmetric $SO(10)$ Grand Unified Theory, which is broken to the Standard Model via the breaking of two intermediate symmetries. The spontaneous breaking of the first intermediate symmetry, $B-L$, leads to the generation of cosmic strings and right-handed neutrino masses and further to an observable cosmological background of gravitational waves and the generation of light neutrino masses via the type-I seesaw mechanism. Supersymmetry breaking manifests as sparticle masses below the $B-L$ breaking scale but far above the electroweak scale due to proton decay limits. This naturally pushes the $B-L$ breaking scale close to the GUT scale, leading to the formation of metastable cosmic strings, which can provide a gravitational wave spectrum consistent with the recent Pulsar Timing Array observations. We perform a detailed analysis of this model using two-loop renormalisation group equations, including threshold corrections, to determine the symmetry-breaking scale consistent with the recent Pulsar Timing Array signals such as the NANOGrav 15-year data and testable by the next-generation limits on proton decay from Hyper-K and JUNO. Simultaneously, we find the regions of the model parameter space that can predict the measured quark and lepton masses and mixing, the baryon asymmetry of our Universe, and a viable dark matter candidate, and that can be tested by a combination of neutrinoless double beta decay searches and limits on the sum of neutrino masses.

Submitted 29 March, 2024; v1 submitted 10 August, 2023; originally announced August 2023.

Comments: 15 pages, 6 figures

Report number: IPPP/23/41

Journal ref: Phys.Rev.D 109 (2024) 5, 055025

arXiv:2308.04718 [pdf, other]

doi 10.1103/PhysRevB.109.075113

Signature of Parity Anomaly: Crossover from One Half to Integer Quantized Hall Conductance in a Finite Magnetic Field

Authors: Huan-Wen Wang, Bo Fu, Shun-Qing Shen

Abstract: The pursuit of understanding the parity anomaly in condensed matter systems has led to significant advancements in both theoretical and experimental research in recent years. In this study, we explore the parity anomaly of massless Dirac fermions in a semimagnetic topological insulator (TI) thin film subjected to a finite magnetic field. Our findings reveal an anomalous half-quantized Hall conductance arising from the occupied electronic states far below the Fermi level, which is directly associated with the parity anomaly. This observation demonstrates a crossover from one-half quantized Hall conductance in a metallic phase at zero field to one or zero quantized Hall conductance in the insulating phase at a strong field in the presence of disorder, serving as a key indicator for confirming the parity anomaly. Our work provides valuable insights into the intricate relationship between band topology in condensed matter systems and the quantum anomaly in quantum field theory.

Submitted 9 August, 2023; originally announced August 2023.

Comments: 7 pages, 3 figures

Journal ref: Phys. Rev. B 109, 075113 (2024)

Showing 1–50 of 300 results for author: Fu, B