doi 10.1145/3637528.3671519

Unified Dual-Intent Translation for Joint Modeling of Search and Recommendation

Authors: Yuting Zhang, Yiqing Wu, Ruidong Han, Ying Sun, Yongchun Zhu, Xiang Li, Wei Lin, Fuzhen Zhuang, Zhulin An, Yongjun Xu

Abstract: Recommendation systems, which assist users in discovering their preferred items among numerous options, have served billions of users across various online platforms. Intuitively, users' interactions with items are highly driven by their unchanging inherent intents (e.g., always preferring high-quality items) and changing demand intents (e.g., wanting a T-shirt in summer but a down jacket in winter). However, both types of intents are implicitly expressed in the recommendation scenario, posing challenges in leveraging them for accurate intent-aware recommendations. Fortunately, in the search scenario, often found alongside recommendation on the same online platform, users express their demand intents explicitly through their query words. Intuitively, in both scenarios, a user shares the same inherent intent and the interactions may be influenced by the same demand intent. It is therefore feasible to utilize the interaction data from both scenarios to reinforce the dual intents for joint intent-aware modeling. But the joint modeling should deal with two problems: 1) accurately modeling users' implicit demand intents in recommendation; 2) modeling the relation between the dual intents and the interactive items. To address these problems, we propose a novel model named Unified Dual-Intents Translation for joint modeling of Search and Recommendation (UDITSR). To accurately simulate users' demand intents in recommendation, we utilize real queries from search data as supervision information to guide their generation. To explicitly model the relation among the triplet <inherent intent, demand intent, interactive item>, we propose a dual-intent translation propagation mechanism to learn the triplet in the same semantic space via embedding translations. Extensive experiments demonstrate that UDITSR outperforms SOTA baselines both in search and recommendation tasks.

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2406.16507 [pdf, other]

Statistical ranking with dynamic covariates

Authors: Pinjun Dong, Ruijian Han, Binyan Jiang, Yiming Xu

Abstract: We consider a covariate-assisted ranking model grounded in the Plackett--Luce framework. Unlike existing works focusing on pure covariates or individual effects with fixed covariates, our approach integrates individual effects with dynamic covariates. This added flexibility enhances realistic ranking yet poses significant challenges for analyzing the associated estimation procedures. This paper makes an initial attempt to address these challenges. We begin by discussing the sufficient and necessary condition for the model's identifiability. We then introduce an efficient alternating maximization algorithm to compute the maximum likelihood estimator (MLE). Under suitable assumptions on the topology of comparison graphs and dynamic covariates, we establish a quantitative uniform consistency result for the MLE with convergence rates characterized by the asymptotic graph connectivity. The proposed graph topology assumption holds for several popular random graph models under optimal leading-order sparsity conditions. A comprehensive numerical study is conducted to corroborate our theoretical findings and demonstrate the application of the proposed model to real-world datasets, including horse racing and tennis competitions.

Submitted 8 July, 2024; v1 submitted 24 June, 2024; originally announced June 2024.

Comments: Compared with the previous arXiv version, we provide a better theoretical result
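
Note on the entry above: the abstract does not spell out the model, so the following is only a minimal sketch of a Plackett-Luce log-likelihood in which each item's score is an individual effect plus a linear covariate term. The score form `u_i + x_i . beta`, the function name `pl_log_likelihood`, and all argument names are assumptions made for illustration, not the authors' parametrization. Covariates are passed per comparison, which is one simple way to let them vary over time.

```python
import math

def pl_log_likelihood(ranking, effects, covariates, beta):
    """Log-likelihood of one observed ranking (best item first).

    Assumed score for item i in this comparison: effects[i] + covariates[i] . beta
    (a hypothetical parametrization; the paper's exact form may differ).
    """
    scores = {
        i: effects[i] + sum(x * b for x, b in zip(covariates[i], beta))
        for i in ranking
    }
    remaining = list(ranking)
    ll = 0.0
    for item in ranking[:-1]:  # the last remaining item is chosen with probability 1
        # log-sum-exp over the items still available, shifted for numerical stability
        m = max(scores[j] for j in remaining)
        log_denom = m + math.log(sum(math.exp(scores[j] - m) for j in remaining))
        ll += scores[item] - log_denom
        remaining.remove(item)
    return ll
```

With equal scores every ordering is equally likely, so a ranking of two items has log-likelihood `-log 2` and a ranking of three items has `-log 6`.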

arXiv:2406.15478 [pdf]

Impact of the Top SiO2 Interlayer Thickness on Memory Window of Si Channel FeFET with TiN/SiO2/Hf0.5Zr0.5O2/SiOx/Si (MIFIS) Gate Structure

Authors: Tao Hu, Xianzhou Shao, Mingkai Bai, Xinpei Jia, Saifei Dai, Xiaoqing Sun, Runhao Han, Jia Yang, Xiaoyu Ke, Fengbin Tian, Shuai Yang, Junshuai Chai, Hao Xu, Xiaolei Wang, Wenwu Wang, Tianchun Ye

Abstract: We study the impact of top SiO2 interlayer thickness on the memory window (MW) of Si channel ferroelectric field-effect transistor (FeFET) with TiN/SiO2/Hf0.5Zr0.5O2/SiOx/Si (MIFIS) gate structure. We find that the MW increases with the increasing thickness of the top SiO2 interlayer, and such an increase exhibits a two-stage linear dependence. The physical origin is the presence of the different interfacial charges trapped at the top SiO2/Hf0.5Zr0.5O2 interface. Moreover, we investigate the dependence of endurance characteristics on initial MW. We find that the endurance characteristic degrades as the initial MW increases. By inserting a 3.4 nm SiO2 dielectric interlayer between the gate metal TiN and the ferroelectric Hf0.5Zr0.5O2, we achieve an MW of 6.3 V and retention over 10 years. Our work is helpful in the device design of FeFET.

Submitted 16 June, 2024; originally announced June 2024.

Comments: 6 pages, 12 figures. arXiv admin note: substantial text overlap with arXiv:2404.15825

arXiv:2406.08301 [pdf, other]

Jet modification via $π^0$-hadron correlations in Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$ GeV

Authors: PHENIX Collaboration, N. J. Abdulameer, U. Acharya, A. Adare, S. Afanasiev, C. Aidala, N. N. Ajitanand, Y. Akiba, H. Al-Bataineh, J. Alexander, M. Alfred, K. Aoki, N. Apadula, L. Aphecetche, J. Asai, H. Asano, E. T. Atomssa, R. Averbeck, T. C. Awes, B. Azmoun, V. Babintsev, M. Bai, G. Baksay, L. Baksay, A. Baldisseri, et al. (510 additional authors not shown)

Abstract: High-momentum two-particle correlations are a useful tool for studying jet-quenching effects in the quark-gluon plasma. Angular correlations between neutral-pion triggers and charged hadrons with transverse momenta in the range 4--12~GeV/$c$ and 0.5--7~GeV/$c$, respectively, have been measured by the PHENIX experiment in 2014 for Au$+$Au collisions at $\sqrt{s_{_{NN}}}=200$~GeV. Suppression is observed in the yield of high-momentum jet fragments opposite the trigger particle, which indicates jet suppression stemming from in-medium partonic energy loss, while enhancement is observed for low-momentum particles. The ratio and differences between the yield in Au$+$Au collisions and $p$$+$$p$ collisions, $I_{AA}$ and $Δ_{AA}$, as a function of the trigger-hadron azimuthal separation, $Δφ$, are measured for the first time at the Relativistic Heavy Ion Collider. These results better quantify how the yield of low-$p_T$ associated hadrons is enhanced at wide angle, which is crucial for studying energy loss as well as medium-response effects.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 534 authors from 83 institutions, 12 pages, 7 figures. v1 is version submitted to Physical Review C. HEPdata tables for the points plotted in figures for this and previous PHENIX publications are (or will be) publicly available at http://www.phenix.bnl.gov/papers.html

arXiv:2406.07579 [pdf, other]

GFPack++: Improving 2D Irregular Packing by Learning Gradient Field with Attention

Authors: Tianyang Xue, Lin Lu, Yang Liu, Mingdong Wu, Hao Dong, Yanbin Zhang, Renmin Han, Baoquan Chen

Abstract: 2D irregular packing is a classic combinatorial optimization problem with various applications, such as material utilization and texture atlas generation. This NP-hard problem requires efficient algorithms to optimize space utilization. Conventional numerical methods suffer from slow convergence and high computational cost. Existing learning-based methods, such as the score-based diffusion model, also have limitations, such as no rotation support, frequent collisions, poor adaptability to arbitrary boundaries, and slow inference. The difficulty of learning from teacher packing is to capture the complex geometric relationships among packing examples, which include the spatial (position, orientation) relationships of objects, their geometric features, and container boundary conditions. Representing these relationships in latent space is challenging. We propose GFPack++, an attention-based gradient field learning approach that addresses this challenge. It consists of two pivotal strategies: attention-based geometry encoding for effective feature encoding and attention-based relation encoding for learning complex relationships. We investigate the utilization distribution between the teacher and inference data and design a weighting function to prioritize tighter teacher data during training, enhancing learning effectiveness. Our diffusion model supports continuous rotation and outperforms existing methods on various datasets. We achieve higher space utilization than several widely used baselines, inference one order of magnitude faster than the previous diffusion-based method, and promising generalization to arbitrary boundaries. We plan to release our source code and datasets to support further research in this direction.

Submitted 9 June, 2024; originally announced June 2024.

arXiv:2406.02908 [pdf, other]

The HS-CMU Dataset for Diagnosing Benign and Malignant Diseases through Hysteroscopy

Authors: Ruxue Han, Yuantao Xie, Kangze You, Lijun Cao, Hua Li

Abstract: Hysteroscopy enables direct visualization of morphological changes in the endometrium, serving as an important means for screening, diagnosing, and treating intrauterine lesions. Accurate identification of the benign or malignant nature of diseases is crucial. However, the complexity and variability of uterine morphology increase the difficulty of identification, leading to missed diagnoses and misdiagnoses, often requiring the expertise of experienced gynecologists and pathologists. Here, we provide the video and image dataset of hysteroscopic examinations conducted at Beijing Chaoyang Hospital, Capital Medical University (named the HS-CMU dataset), recording videos of 175 patients undergoing hysteroscopic surgery to explore the uterine cavity. These data were obtained using corresponding supporting software. From these videos, 3385 high-quality images from 8 categories were selected to form the HS-CMU dataset. These images were annotated by two experienced obstetricians and gynecologists using labelme software. We hope that this dataset can be used as an auxiliary tool for the diagnosis of intrauterine benign and malignant diseases.

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2405.18284 [pdf, other]

Adaptive debiased SGD in high-dimensional GLMs with streaming data

Authors: Ruijian Han, Lan Luo, Yuanhang Luo, Yuanyuan Lin, Jian Huang

Abstract: Online statistical inference facilitates real-time analysis of sequentially collected data, making it different from traditional methods that rely on static datasets. This paper introduces a novel approach to online inference in high-dimensional generalized linear models, where we update regression coefficient estimates and their standard errors upon each new data arrival. In contrast to existing methods that either require full dataset access or large-dimensional summary statistics storage, our method operates in a single-pass mode, significantly reducing both time and space complexity. The core of our methodological innovation lies in an adaptive stochastic gradient descent algorithm tailored for dynamic objective functions, coupled with a novel online debiasing procedure. This allows us to maintain low-dimensional summary statistics while effectively controlling optimization errors introduced by the dynamically changing loss functions. We demonstrate that our method, termed the Approximated Debiased Lasso (ADL), not only mitigates the need for the bounded individual probability condition but also significantly improves numerical performance. Numerical experiments demonstrate that the proposed ADL method consistently exhibits robust performance across various covariance matrix structures.

Submitted 1 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

Comments: 37 pages, 4 figures

arXiv:2405.18102 [pdf, ps, other]

Apportionment with Weighted Seats

Authors: Julian Chingoma, Ulle Endriss, Ronald de Haan, Adrian Haret, Jan Maly

Abstract: Apportionment is the task of assigning resources to entities with different entitlements in a fair manner, and specifically a manner that is as proportional as possible. The best-known application concerns the assignment of parliamentary seats to political parties based on their share in the popular vote. Here we enrich the standard model of apportionment by associating each seat with a weight that reflects the value of that seat, for example because seats come with different roles, such as chair or treasurer, that have different (objective) values. We define several apportionment methods and natural fairness requirements for this new setting, and study the extent to which our methods satisfy our requirements. Our findings show that full fairness is harder to achieve than in the standard apportionment setting. At the same time, for relaxations of those requirements we can achieve stronger results than in the more general model of weighted fair division, where the values of objects are subjective.

Submitted 28 May, 2024; originally announced May 2024.

arXiv:2405.17792 [pdf, other]

JUNO Sensitivity to Invisible Decay Modes of Neutrons

Authors: JUNO Collaboration, Angel Abusleme, Thomas Adam, Kai Adamowicz, Shakeel Ahmad, Rizwan Ahmed, Sebastiano Aiello, Fengpeng An, Qi An, Giuseppe Andronico, Nikolay Anfimov, Vito Antonelli, Tatiana Antoshkina, João Pedro Athayde Marcondes de André, Didier Auguste, Weidong Bai, Nikita Balashov, Wander Baldini, Andrea Barresi, Davide Basilico, Eric Baussan, Marco Bellato, Marco Beretta, Antonio Bergnoli, Daniel Bick, et al. (635 additional authors not shown)

Abstract: We explore the decay of bound neutrons into invisible particles (e.g., $n\rightarrow 3 ν$ or $nn \rightarrow 2 ν$) in the JUNO liquid scintillator detector. The invisible decay includes two decay modes: $ n \rightarrow { inv} $ and $ nn \rightarrow { inv} $. The invisible decays of $s$-shell neutrons in $^{12}{\rm C}$ will leave a highly excited residual nucleus. Subsequently, some de-excitation modes of the excited residual nuclei can produce a time- and space-correlated triple coincidence signal in the JUNO detector. Based on a full Monte Carlo simulation informed with the latest available data, we estimate all backgrounds, including inverse beta decay events of the reactor antineutrino $\bar{ν}_e$, natural radioactivity, cosmogenic isotopes and neutral current interactions of atmospheric neutrinos. Pulse shape discrimination and multivariate analysis techniques are employed to further suppress backgrounds. With two years of exposure, JUNO is expected to give an order of magnitude improvement compared to the current best limits. After 10 years of data taking, the JUNO expected sensitivities at a 90% confidence level are $τ/B( n \rightarrow { inv} ) > 5.0 \times 10^{31} \, {\rm yr}$ and $τ/B( nn \rightarrow { inv} ) > 1.4 \times 10^{32} \, {\rm yr}$.

Submitted 27 May, 2024; originally announced May 2024.

Comments: 28 pages, 7 figures, 4 tables

arXiv:2405.13153 [pdf, other]

Max-sliced Wasserstein concentration and uniform ratio bounds of empirical measures on RKHS

Authors: Ruiyu Han, Cynthia Rush, Johannes Wiesel

Abstract: Optimal transport and the Wasserstein distance $\mathcal{W}_p$ have recently seen a number of applications in the fields of statistics, machine learning, data science, and the physical sciences. These applications are however severely restricted by the curse of dimensionality, meaning that the number of data points needed to estimate these problems accurately increases exponentially in the dimension. To alleviate this problem, a number of variants of $\mathcal{W}_p$ have been introduced. We focus here on one of these variants, namely the max-sliced Wasserstein metric $\overline{\mathcal{W}}_p$. This metric reduces the high-dimensional minimization problem given by $\mathcal{W}_p$ to a maximum of one-dimensional measurements in an effort to overcome the curse of dimensionality. In this note we derive concentration results and upper bounds on the expectation of $\overline{\mathcal{W}}_p$ between the true and empirical measure on unbounded reproducing kernel Hilbert spaces. We show that, under quite generic assumptions, probability measures concentrate uniformly fast in one-dimensional subspaces, at (nearly) parametric rates. Our results rely on an improvement of currently known bounds for $\overline{\mathcal{W}}_p$ in the finite-dimensional case.

Submitted 21 May, 2024; originally announced May 2024.
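
Note on the entry above: the abstract describes $\overline{\mathcal{W}}_p$ as a maximum of one-dimensional measurements without writing it out. For orientation, the usual finite-dimensional definition is sketched below. The symbols $\theta$, $\mathbb{S}^{d-1}$ and $P_\theta$ are notation chosen here, and the assumption that the RKHS version takes the supremum over the unit sphere of the Hilbert space is ours, not taken from the abstract.

```latex
% Max-sliced Wasserstein distance between probability measures \mu, \nu on R^d
% (standard finite-dimensional form; notation is an assumption of this note).
\[
  \overline{\mathcal{W}}_p(\mu, \nu)
  \;=\;
  \sup_{\theta \in \mathbb{S}^{d-1}}
  \mathcal{W}_p\bigl( (P_\theta)_{\#}\mu,\; (P_\theta)_{\#}\nu \bigr),
  \qquad
  P_\theta(x) = \langle \theta, x \rangle ,
\]
% where (P_\theta)_{\#}\mu is the pushforward of \mu under the projection P_\theta,
% so each term inside the supremum is a Wasserstein distance on the real line.
```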

arXiv:2405.07810 [pdf, ps, other]

Sharp localization on the first supercritical stratum for Liouville frequencies

Authors: Rui Han

Abstract: We establish Anderson localization for Schrödinger operators with even analytic potentials on the first supercritical stratum for Liouville frequencies in the sharp regime $\{E: L(ω,E)>β(ω)>0, κ(ω,E)=1\}$, with $κ(ω,E)$ being Avila's acceleration. This paper builds on the large deviation measure estimate and complexity bound scheme, originally developed for Diophantine frequencies by Bourgain, Goldstein and Schlag \cites{BG,BGS1,BGS2}, and the improved complexity bounds in \cite{HS1}. Additionally, it strengthens the large deviation estimates for weak Liouville frequencies in \cite{HZ}. We also introduce new ideas to handle Liouville frequencies in a sharp way.

Submitted 13 May, 2024; originally announced May 2024.

arXiv:2405.02965 [pdf, other]

Robust Collaborative Perception without External Localization and Clock Devices

Authors: Zixing Lei, Zhenyang Ni, Ruize Han, Shuo Tang, Dingju Wang, Chen Feng, Siheng Chen, Yanfeng Wang

Abstract: A consistent spatial-temporal coordination across multiple agents is fundamental for collaborative perception, which seeks to improve perception abilities through information exchange among agents. To achieve this spatial-temporal alignment, traditional methods depend on external devices to provide localization and clock signals. However, hardware-generated signals could be vulnerable to noise and potentially malicious attacks, jeopardizing the precision of spatial-temporal alignment. Rather than relying on external hardware, this work proposes a novel approach: aligning by recognizing the inherent geometric patterns within the perceptual data of various agents. Following this spirit, we propose a robust collaborative perception system that operates independently of external localization and clock devices. The key module of our system, FreeAlign, constructs a salient object graph for each agent based on its detected boxes and uses a graph neural network to identify common subgraphs between agents, leading to accurate relative pose and time. We validate FreeAlign on both real-world and simulated datasets. The results show that the FreeAlign-empowered robust collaborative perception system performs comparably to systems relying on precise localization and clock devices.

Submitted 31 May, 2024; v1 submitted 5 May, 2024; originally announced May 2024.

Comments: 6 pages, accepted to ICRA 2024

arXiv:2404.15958 [pdf, other]

Platooning of Heterogeneous Vehicles with Actuation Delays: Theoretical and Experimental Results

Authors: Redmer de Haan, Lorenzo Redi, Tom van der Sande, Erjen Lefeber

Abstract: In this paper we present a prediction-based Cooperative Adaptive Cruise Controller for vehicles with actuation delay, applicable within heterogeneous platoons. We provide a stability analysis for the discrete-time implementation of this controller, which shows the effect of the used sampling times and can be used for selecting appropriate controller gains. The theoretical results are validated by means of experiments using full scale vehicles. This is an extended version of a paper with the same title (submitted to IFAC TDS 2024). Additional mathematical details are provided in this extended version.

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2404.15825 [pdf]

Impact of Top SiO2 interlayer Thickness on Memory Window of Si Channel FeFET with TiN/SiO2/Hf0.5Zr0.5O2/SiOx/Si (MIFIS) Gate Structure

Authors: Tao Hu, Xianzhou Shao, Mingkai Bai, Xinpei Jia, Saifei Dai, Xiaoqing Sun, Runhao Han, Jia Yang, Xiaoyu Ke, Fengbin Tian, Shuai Yang, Junshuai Chai, Hao Xu, Xiaolei Wang, Wenwu Wang, Tianchun Ye

Abstract: We study the impact of top SiO2 interlayer thickness on the memory window of Si channel FeFET with TiN/SiO2/Hf0.5Zr0.5O2/SiOx/Si (MIFIS) gate structure. The memory window increases with thicker top SiO2. We realize a memory window of 6.3 V for 3.4 nm top SiO2. Moreover, we find that the endurance characteristic degrades as the initial memory window increases.

Submitted 24 April, 2024; originally announced April 2024.

Comments: 4 pages, 7 figures

arXiv:2404.10541 [pdf, other]

MPCOM: Robotic Data Gathering with Radio Mapping and Model Predictive Communication

Authors: Zhiyou Ji, Guoliang Li, Ruihua Han, Shuai Wang, Bing Bai, Wei Xu, Kejiang Ye, Chengzhong Xu

Abstract: Robotic data gathering (RDG) is an emerging paradigm that navigates a robot to harvest data from remote sensors. However, motion planning in this paradigm needs to maximize the RDG efficiency instead of the navigation efficiency, for which the existing motion planning methods become inefficient, as they plan robot trajectories merely according to motion factors. This paper proposes radio map guided model predictive communication (MPCOM), which navigates the robot with both grid and radio maps for shape-aware collision avoidance and communication-aware trajectory generation in a dynamic environment. The proposed MPCOM is able to trade off the time spent on reaching the goal, avoiding collision, and improving communication. MPCOM captures high-order signal propagation characteristics using radio maps and incorporates the map-guided communication regularizer into the motion planning block. Experiments in IRSIM and CARLA simulators show that the proposed MPCOM outperforms other benchmarks in both LOS and NLOS cases. Real-world testing based on car-like robots is also provided to demonstrate the effectiveness of MPCOM in indoor environments.

Submitted 16 April, 2024; originally announced April 2024.

Comments: submitted to IROS

arXiv:2404.01693 [pdf, other]

HeMeNet: Heterogeneous Multichannel Equivariant Network for Protein Multitask Learning

Authors: Rong Han, Wenbing Huang, Lingxiao Luo, Xinyan Han, Jiaming Shen, Zhiqiang Zhang, Jun Zhou, Ting Chen

Abstract: Understanding and leveraging the 3D structures of proteins is central to a variety of biological and drug discovery tasks. While deep learning has been applied successfully for structure-based protein function prediction tasks, current methods usually employ distinct training for each task. However, each of the tasks is of small size, and such a single-task strategy hinders the models' performance and generalization ability. As some labeled 3D protein datasets are biologically related, combining multi-source datasets for larger-scale multi-task learning is one way to overcome this problem. In this paper, we propose a neural network model to address multiple tasks jointly upon the input of 3D protein structures. In particular, we first construct a standard structure-based multi-task benchmark called Protein-MT, consisting of 6 biologically relevant tasks, including affinity prediction and property prediction, integrated from 4 public datasets. Then, we develop a novel graph neural network for multi-task learning, dubbed Heterogeneous Multichannel Equivariant Network (HeMeNet), which is E(3) equivariant and able to capture heterogeneous relationships between different atoms. Besides, HeMeNet can achieve task-specific learning via the task-aware readout mechanism. Extensive evaluations on our benchmark verify the effectiveness of multi-task learning, and our model generally surpasses state-of-the-art models.

Submitted 2 April, 2024; originally announced April 2024.

arXiv:2403.16430 [pdf]

doi 10.1145/3636534.3649382

AeroBridge: Autonomous Drone Handoff System for Emergency Battery Service

Authors: Avishkar Seth, Alice James, Endrowednes Kuantama, Richard Han, Subhas Mukhopadhyay

Abstract: This paper proposes an Emergency Battery Service (EBS) for drones in which an EBS drone flies to a drone in the field with a depleted battery and transfers a fresh battery to the exhausted drone. The authors present a unique battery transfer mechanism and drone localization that uses the Cross Marker Position (CMP) method. The main challenges include a stable and balanced transfer that precisely localizes the receiver drone. The proposed EBS drone mitigates the effects of downwash due to the vertical proximity between the drones by implementing diagonal alignment with the receiver, reducing the distance to 0.5 m between the two drones. CFD analysis shows that diagonal instead of perpendicular alignment minimizes turbulence, and the authors verify the actual system for change in output airflow and thrust measurements. The CMP marker-based localization method enables position lock for the EBS drone with up to 0.9 cm accuracy. The performance of the transfer mechanism is validated experimentally by successful mid-air transfer in 5 seconds, where the EBS drone is within 0.5 m vertical distance from the receiver drone, wherein 4 m/s turbulence does not affect the transfer process.

Submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.13667 [pdf, other]

DanceCamera3D: 3D Camera Movement Synthesis with Music and Dance

Authors: Zixuan Wang, Jia Jia, Shikun Sun, Haozhe Wu, Rong Han, Zhenyu Li, Di Tang, Jiaqing Zhou, Jiebo Luo

Abstract: Choreographers determine what the dances look like, while cameramen determine the final presentation of dances. Recently, various methods and datasets have showcased the feasibility of dance synthesis. However, camera movement synthesis with music and dance remains an unsolved challenging problem due to the scarcity of paired data. Thus, we present DCM, a new multi-modal 3D dataset, which for the first time combines camera movement with dance motion and music audio. This dataset encompasses 108 dance sequences (3.2 hours) of paired dance-camera-music data from the anime community, covering 4 music genres. With this dataset, we uncover that dance camera movement is multifaceted and human-centric, and possesses multiple influencing factors, making dance camera synthesis a more challenging task compared to camera or dance synthesis alone. To overcome these difficulties, we propose DanceCamera3D, a transformer-based diffusion model that incorporates a novel body attention loss and a condition separation strategy. For evaluation, we devise new metrics measuring camera movement quality, diversity, and dancer fidelity. Utilizing these metrics, we conduct extensive experiments on our DCM dataset, providing both quantitative and qualitative evidence showcasing the effectiveness of our DanceCamera3D model. Code and video demos are available at https://github.com/Carmenw1203/DanceCamera3D-Official.

Submitted 20 March, 2024; originally announced March 2024.

Comments: Accepted to CVPR 2024

arXiv:2403.06828 [pdf, other]

NeuPAN: Direct Point Robot Navigation with End-to-End Model-based Learning

Authors: Ruihua Han, Shuai Wang, Shuaijun Wang, Zeqing Zhang, Jianjun Chen, Shijie Lin, Chengyang Li, Chengzhong Xu, Yonina C. Eldar, Qi Hao, Jia Pan

Abstract: Navigating a nonholonomic robot in a cluttered environment requires extremely accurate perception and locomotion for collision avoidance. This paper presents NeuPAN: a real-time, highly-accurate, map-free, robot-agnostic, and environment-invariant robot navigation solution. Leveraging a tightly-coupled perception-locomotion framework, NeuPAN has two key innovations compared to existing approaches: 1) it directly maps raw points to a learned multi-frame distance space, avoiding error propagation from perception to control; 2) it is interpretable from an end-to-end model-based learning perspective, enabling provable convergence. The crux of NeuPAN is to solve a high-dimensional end-to-end mathematical model with various point-level constraints using the plug-and-play (PnP) proximal alternating-minimization network (PAN) with neurons in the loop. This allows NeuPAN to generate real-time, end-to-end, physically-interpretable motions directly from point clouds, which seamlessly integrates data- and knowledge-engines, where its network parameters are adjusted via back propagation. We evaluate NeuPAN on car-like robot, wheel-legged robot, and passenger autonomous vehicle, in both simulated and real-world environments. Experiments demonstrate that NeuPAN outperforms various benchmarks, in terms of accuracy, efficiency, robustness, and generalization capability across various environments, including the cluttered sandbox, office, corridor, and parking lot. We show that NeuPAN works well in unstructured environments with arbitrary-shape undetectable objects, making impassable ways passable.

Submitted 11 March, 2024; originally announced March 2024.

Comments: Submitted to TRO

arXiv:2403.03541 [pdf, other]

Seamless Virtual Reality with Integrated Synchronizer and Synthesizer for Autonomous Driving

Authors: He Li, Ruihua Han, Zirui Zhao, Wei Xu, Qi Hao, Shuai Wang, Chengzhong Xu

Abstract: Virtual reality (VR) is a promising data engine for autonomous driving (AD). However, data fidelity in this paradigm is often degraded by VR inconsistency, for which the existing VR approaches become ineffective, as they ignore the inter-dependency between low-level VR synchronizer designs (i.e., data collector) and high-level VR synthesizer designs (i.e., data processor). This paper presents a seamless virtual reality SVR platform for AD, which mitigates such inconsistency, enabling VR agents to interact with each other in a shared symbiotic world. The crux to SVR is an integrated synchronizer and synthesizer IS2 design, which consists of a drift-aware lidar-inertial synchronizer for VR colocation and a motion-aware deep visual synthesis network for augmented reality image generation. We implement SVR on car-like robots in two sandbox platforms, achieving a cm-level VR colocalization accuracy and 3.2% VR image deviation, thereby avoiding missed collisions or model clippings. Experiments show that the proposed SVR reduces the intervention times, missed turns, and failure rates compared to other benchmarks. The SVR-trained neural network can handle unseen situations in real-world environments, by leveraging its knowledge learnt from the VR space.

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2403.01513 [pdf]

CDSE-UNet: Enhancing COVID-19 CT Image Segmentation with Canny Edge Detection and Dual-Path SENet Feature Fusion

Authors: Jiao Ding, Jie Chang, Renrui Han, Li Yang

Abstract: Accurate segmentation of COVID-19 CT images is crucial for reducing the severity and mortality rates associated with COVID-19 infections. In response to blurred boundaries and high variability characteristic of lesion areas in COVID-19 CT images, we introduce CDSE-UNet: a novel UNet-based segmentation model that integrates Canny operator edge detection and a dual-path SENet feature fusion mechanism. This model enhances the standard UNet architecture by employing the Canny operator for edge detection in sample images, paralleling this with a similar network structure for semantic feature extraction. A key innovation is the Double SENet Feature Fusion Block, applied across corresponding network layers to effectively combine features from both image paths. Moreover, we have developed a Multiscale Convolution approach, replacing the standard Convolution in UNet, to adapt to the varied lesion sizes and shapes. This addition not only aids in accurately classifying lesion edge pixels but also significantly improves channel differentiation and expands the capacity of the model. Our evaluations on public datasets demonstrate CDSE-UNet's superior performance over other leading models, particularly in segmenting large and small lesion areas, accurately delineating lesion edges, and effectively suppressing noise.

Submitted 3 March, 2024; originally announced March 2024.

arXiv:2402.02108 [pdf, other]

From Synthetic to Real: Unveiling the Power of Synthetic Data for Video Person Re-ID

Authors: Xiangqun Zhang, Ruize Han, Wei Feng

Abstract: In this paper, we study a new problem of cross-domain video based person re-identification (Re-ID). Specifically, we take the synthetic video dataset as the source domain for training and use the real-world videos for testing, which significantly reduces the dependence on real training data collection and annotation. To unveil the power of synthetic data for video person Re-ID, we first propose a self-supervised domain invariant feature learning strategy for both static and temporal features. Then, to further improve the person identification ability in the target domain, we develop a mean-teacher scheme with the self-supervised ID consistency loss. Experimental results on four real datasets verify the rationality of cross-synthetic-real domain adaption and the effectiveness of our method. We are also surprised to find that the synthetic data performs even better than the real data in the cross-domain setting.

Submitted 3 February, 2024; originally announced February 2024.

arXiv:2402.01808 [pdf, other]

KS-Net: Multi-band joint speech restoration and enhancement network for 2024 ICASSP SSI Challenge

Authors: Guochen Yu, Runqiang Han, Chenglin Xu, Haoran Zhao, Nan Li, Chen Zhang, Xiguang Zheng, Chao Zhou, Qi Huang, Bing Yu

Abstract: This paper presents the speech restoration and enhancement system created by the 1024K team for the ICASSP 2024 Speech Signal Improvement (SSI) Challenge. Our system consists of a generative adversarial network (GAN) in complex-domain for speech restoration and a fine-grained multi-band fusion module for speech enhancement. In the blind test set of SSI, the proposed system achieves an overall mean opinion score (MOS) of 3.49 based on ITU-T P.804 and a Word Accuracy Rate (WAcc) of 0.78 for the real-time track, as well as an overall P.804 MOS of 3.43 and a WAcc of 0.78 for the non-real-time track, ranking 1st in both tracks.

Submitted 2 February, 2024; originally announced February 2024.

Comments: Accepted to ICASSP 2024; Rank 1st in ICASSP 2024 Speech Signal Improvement (SSI) Challenge

arXiv:2401.17617 [pdf, other]

Unveiling the Power of Self-supervision for Multi-view Multi-human Association and Tracking

Authors: Wei Feng, Feifan Wang, Ruize Han, Zekun Qian, Song Wang

Abstract: Multi-view multi-human association and tracking (MvMHAT) is a new but important problem for multi-person scene video surveillance, aiming to track a group of people over time in each view, as well as to identify the same person across different views at the same time, which is different from previous MOT and multi-camera MOT tasks only considering the over-time human tracking. This way, the videos for MvMHAT require more complex annotations while containing more information for self learning. In this work, we tackle this problem with a self-supervised learning aware end-to-end network. Specifically, we propose to take advantage of the spatial-temporal self-consistency rationale by considering three properties of reflexivity, symmetry and transitivity. Besides the reflexivity property that naturally holds, we design the self-supervised learning losses based on the properties of symmetry and transitivity, for both appearance feature learning and assignment matrix optimization, to associate the multiple humans over time and across views. Furthermore, to promote the research on MvMHAT, we build two new large-scale benchmarks for the network training and testing of different algorithms. Extensive experiments on the proposed benchmarks verify the effectiveness of our method. We have released the benchmark and code to the public.

Submitted 31 January, 2024; originally announced January 2024.

arXiv:2401.17488 [pdf, other]

A universal pairing gap measurement proposal by dynamical excitations in 2D doped attractive Fermi-Hubbard model with spin-orbit coupling

Authors: Huaisong Zhao, Rui Han, Ling Qin, Feng Yuan, Peng Zou

Abstract: By calculating dynamical structure factor of two-dimensional doped attractive Fermi-Hubbard model with Rashba spin-orbit coupling, we not only investigate collective modes and single-particle excitations of the system during the phase transition between Bardeen-Cooper-Schrieffer superfluid and topological superfluid, but also propose a universal method to measure the pairing gap in an optical lattice system. Our numerical results show that the area of the molecular excitation peak at the transferred momentum ${\bf q}=\left[\pi,\pi\right]$ is proportional to the square of the pairing gap in the system with Rashba SOC. In particular, this method is very sensitive to the pairing gap. This goes on verifying that this method is universal to measure the pairing gap in a doped optical lattice with Rashba SOC. These theoretical results are important for experimentally measuring the pairing gap and studying the topological superfluid in an optical lattice.

Submitted 30 January, 2024; originally announced January 2024.

Comments: 12 pages, 8 figures

arXiv:2401.15839 [pdf, other]

Swarm: Cost-Efficient Video Content Distribution with a Peer-to-Peer System

Authors: Dehui Wei, Jiao Zhang, Haozhe Li, Zhichen Xue, Yajie Peng, Xiaofei Pang, Rui Han, Yan Ma, Jialin Li

Abstract: As ByteDance's business expands, the substantial infrastructure expenses associated with centralized Content Delivery Network (CDN) networks have rendered content distribution costs prohibitively high. In response, we embarked on exploring a peer-to-peer (P2P) network as a promising solution to alleviate the escalating costs of content distribution. However, the decentralized nature of P2P often introduces performance challenges, given the diversity and dispersion of peer devices. This study introduces Swarm, ByteDance's innovative hybrid system for video streaming. Swarm seamlessly integrates the robustness of a conventional CDN with the cost-efficiency of a decentralized P2P network. Its primary aim is to provide users with reliable streaming quality while minimizing traffic expenses. To achieve this, Swarm employs a centralized control plane comprised of a tracker cluster, overseeing a data plane with numerous edge residual resources. The tracker also takes on the responsibility of mapping clients to servers. Addressing the performance disparities among individual peer servers, Swarm utilizes our proprietary multipath parallel transmission method for communication between clients and peer servers. Operating stably for six years, Swarm now manages over a hundred thousand peer servers, serving nearly a hundred million users daily and saving the company hundreds of millions of RMB annually. Experimental results affirm that, while significantly cutting costs, Swarm performs on par with traditional CDNs.

Submitted 28 January, 2024; originally announced January 2024.

arXiv:2401.08463 [pdf, other]

Statistical inference for pairwise comparison models

Authors: Ruijian Han, Wenlu Tang, Yiming Xu

Abstract: Pairwise comparison models have been widely used for utility evaluation and ranking across various fields. The increasing scale of problems today underscores the need to understand statistical inference in these models when the number of subjects diverges, a topic currently lacking in the literature except in a few special instances. To partially address this gap, this paper establishes a near-optimal asymptotic normality result for the maximum likelihood estimator in a broad class of pairwise comparison models, as well as a non-asymptotic convergence rate for each individual subject under comparison. The key idea lies in identifying the Fisher information matrix as a weighted graph Laplacian, which can be studied via a meticulous spectral analysis. Our findings provide a unified theory for performing statistical inference in a wide range of pairwise comparison models beyond the Bradley--Terry model, benefiting practitioners with theoretical guarantees for their use. Simulations utilizing synthetic data are conducted to validate the asymptotic normality result, followed by a hypothesis test using a tennis competition dataset.

Submitted 2 April, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

Comments: 28 pages; include additional results

arXiv:2401.05670 [pdf, other]

Effects of zero and reversed magnetic shear on resistive wall modes in a limiter tokamak plasma

Authors: Sui Wan, Ping Zhu, Haolong Li, Rui Han

Abstract: Advanced tokamak scenarios often feature equilibriums with zero and reversed magnetic shear. To isolate and investigate their impacts on the resistive wall mode (RWM) instability analytically, we construct a series of cylindrical limiter equilibriums with reversed magnetic shear in the core and zero magnetic shear towards plasma edge, as a prototype of the configurations in advanced tokamak scenarios. Uniform plasma pressure is assumed, so that we can focus our analysis on the current-driven RWMs. Based on the reduced ideal MHD equations, analytical solutions for the $n=1$ resistive wall mode are obtained, which indicate that increasing the reversal of magnetic shear in the core region enhances the RWM instability, whereas the widened region of zero shear near edge leads to lower growth rate of RWM, except when the $q$ value with zero magnetic shear approaches rational values. On the other hand, enhanced positive shear at plasma edge is found to be stabilizing on RWM. NIMROD calculation results confirm these analytical findings.

Submitted 11 January, 2024; originally announced January 2024.

arXiv:2401.03697 [pdf, other]

An audio-quality-based multi-strategy approach for target speaker extraction in the MISP 2023 Challenge

Authors: Runduo Han, Xiaopeng Yan, Weiming Xu, Pengcheng Guo, Jiayao Sun, He Wang, Quan Lu, Ning Jiang, Lei Xie

Abstract: This paper describes our audio-quality-based multi-strategy approach for the audio-visual target speaker extraction (AVTSE) task in the Multi-modal Information based Speech Processing (MISP) 2023 Challenge. Specifically, our approach adopts different extraction strategies based on the audio quality, striking a balance between interference removal and speech preservation, which benefits the back-end automatic speech recognition (ASR) systems. Experiments show that our approach achieves a character error rate (CER) of 24.2% and 33.2% on the Dev and Eval set, respectively, obtaining the second place in the challenge.

Submitted 6 March, 2024; v1 submitted 8 January, 2024; originally announced January 2024.

Comments: Accepted by ICASSP 2024

arXiv:2401.00769 [pdf, other]

Cavitation bubble dynamics inside a droplet suspended in a different host fluid

Authors: Shuai Li, Zhesheng Zhao, A-Man Zhang, Rui Han

Abstract: In this paper, we present a theoretical, experimental, and numerical study of the dynamics of cavitation bubbles inside a droplet suspended in another host fluid. On the theoretical side, we provided a modified Rayleigh collapse time and natural frequency for spherical bubbles in our particular context, characterized by the density ratio between the two liquids and the bubble-to-droplet size ratio. Regarding the experimental aspect, experiments were carried out for laser-induced cavitation bubbles inside oil-in-water (O/W) or water-in-oil (W/O) droplets. Two distinct fluid-mixing mechanisms were unveiled in the two systems, respectively. In the case of O/W droplets, a liquid jet emerges around the end of the bubble collapse phase, effectively penetrating the droplet interface. We offer a detailed analysis of the criteria governing jet penetration, involving the standoff parameter and impact velocity of the bubble jet on the droplet surface. Conversely, in the scenario involving W/O droplets, the bubble traverses the droplet interior, inducing global motion and eventually leading to droplet pinch-off when the local Weber number exceeds a critical value. This phenomenon is elucidated through the equilibrium between interfacial and kinetic energies. Lastly, our boundary integral model faithfully reproduces the essential physics of nonspherical bubble dynamics observed in the experiments. We conduct a parametric study spanning a wide parameter space to investigate bubble-droplet interactions. The insights from this study could serve as a valuable reference for practical applications in the field of ultrasonic emulsification, pharmacy, etc.

Submitted 1 January, 2024; originally announced January 2024.

Comments: 31 pages, 17 figures; accepted by Journal of Fluid Mechanics

arXiv:2312.16829 [pdf]

Enlargement of Memory Window of Si Channel FeFET by Inserting Al2O3 Interlayer on Ferroelectric Hf0.5Zr0.5O2

Authors: Tao Hu, Xiaoqing Sun, Mingkai Bai, Xinpei Jia, Saifei Dai, Tingting Li, Runhao Han, Yajing Ding, Hongyang Fan, Yuanyuan Zhao, Junshuai Chai, Hao Xu, Mengwei Si, Xiaolei Wang, Wenwu Wang

Abstract: In this work, we demonstrate the enlargement of the memory window of Si channel FeFET with ferroelectric Hf0.5Zr0.5O2 by gate-side dielectric interlayer engineering. By inserting an Al2O3 dielectric interlayer between TiN gate metal and ferroelectric Hf0.5Zr0.5O2, we achieve a memory window of 3.2 V with endurance of ~105 cycles and retention over 10 years. The physical origin of memory window enlargement is clarified to be charge trapping at the Al2O3/Hf0.5Zr0.5O2 interface, which has an opposite charge polarity to the trapped charges at the Hf0.5Zr0.5O2/SiOx interface.

Submitted 28 December, 2023; originally announced December 2023.

Comments: 3 pages, 6 figures

arXiv:2312.13722 [pdf, other]

BAE-Net: A Low complexity and high fidelity Bandwidth-Adaptive neural network for speech super-resolution

Authors: Guochen Yu, Xiguang Zheng, Nan Li, Runqiang Han, Chengshi Zheng, Chen Zhang, Chao Zhou, Qi Huang, Bing Yu

Abstract: Speech bandwidth extension (BWE) has demonstrated promising performance in enhancing the perceptual speech quality in real communication systems. Most existing BWE research primarily focuses on fixed upsampling ratios, disregarding the fact that the effective bandwidth of captured audio may fluctuate frequently due to various capturing devices and transmission conditions. In this paper, we propose a novel streaming adaptive bandwidth extension solution dubbed BAE-Net, which is suitable to handle the low-resolution speech with unknown and varying effective bandwidth. To address the challenges of recovering both the high-frequency magnitude and phase speech content blindly, we devise a dual-stream architecture that incorporates the magnitude inpainting and phase refinement. For potential applications on edge devices, this paper also introduces BAE-NET-lite, which is a lightweight, streaming and efficient framework. Quantitative results demonstrate the superiority of BAE-Net in terms of both performance and computational efficiency when compared with existing state-of-the-art BWE methods.

Submitted 21 December, 2023; originally announced December 2023.

Comments: Accepted to ICASSP 2024

arXiv:2312.10085 [pdf]

Phylogeny of Twenty-One Mammals

Authors: Ray Han

Abstract: Phylogeny can be inferred using two sources of data from an organism: morphological data and molecular data. Historically, phylogenies were usually inferred using morphological characters, but some morphological features may not necessarily indicate shared heritage. With the introduction of molecular phylogenies, the base sequence of genes, or amino acid sequence of proteins can be compared to find the number of similarities or differences to ascertain levels of relatedness between species. These two types of phylogenies are to be taken as a data-driven hypothesis about the evolutionary history of the studied organisms, and a consensus is drawn from the comparison between the different phylogenies built from the two sources of data, utilizing different methods.

Submitted 11 December, 2023; originally announced December 2023.

arXiv:2312.07630 [pdf, other]

Building Universal Foundation Models for Medical Image Analysis with Spatially Adaptive Networks

Authors: Lingxiao Luo, Xuanzhong Chen, Bingda Tang, Xinsheng Chen, Rong Han, Chengpeng Hu, Yujiang Li, Ting Chen

Abstract: Recent advancements in foundation models, typically trained with self-supervised learning on large-scale and diverse datasets, have shown great potential in medical image analysis. However, due to the significant spatial heterogeneity of medical imaging data, current models must tailor specific structures for different datasets, making it challenging to leverage the abundant unlabeled data. In this work, we propose a universal foundation model for medical image analysis that processes images with heterogeneous spatial properties using a unified structure. To accomplish this, we propose spatially adaptive networks (SPAD-Nets), a family of networks that dynamically adjust the structures to adapt to the spatial properties of input images, to build such a universal foundation model. We pre-train a spatial adaptive visual tokenizer (SPAD-VT) and then a spatial adaptive Vision Transformer (SPAD-ViT) via masked image modeling (MIM) on 55 public medical image datasets. The pre-training data comprises over 9 million image slices, representing the largest, most comprehensive, and most diverse dataset to our knowledge for pre-training universal foundation models for medical image analysis. The experimental results on downstream medical image classification and segmentation tasks demonstrate the superior performance and label efficiency of our model. Our code is available at https://github.com/function2-llx/PUMIT.

Submitted 23 January, 2024; v1 submitted 12 December, 2023; originally announced December 2023.

arXiv:2312.01326 [pdf, other]

OA-ECBVC: A Cooperative Collision-free Encirclement and Capture Approach in Cluttered Environments

Authors: Xinyi Wang, Yulong Ding, Yizhou Chen, Ruihua Han, Lele Xi, Ben M. Chen

Abstract: This article investigates the practical scenarios of chasing an adversarial evader in an unbounded environment with cluttered obstacles. We propose a Voronoi-based decentralized algorithm for multiple pursuers to encircle and capture the evader by reacting to collisions. An efficient approach is presented for constructing an obstacle-aware evader-centered bounded Voronoi cell (OA-ECBVC), which strictly ensures collision avoidance in various obstacle scenarios when pursuing the evader. The evader can be efficiently enclosed in a convex hull given random initial configurations. Furthermore, to cooperatively capture the evader, each pursuer continually compresses the boundary of its OA-ECBVC to quickly reduce the movement space of the evader while maintaining encirclement. Our OA-ECBVC algorithm is validated in various simulated environments with different dynamic systems of robots. Real-time performance of resisting uncertainties shows the superior reliability of our method for deployment on multiple robot platforms.

Submitted 3 December, 2023; originally announced December 2023.

Comments: 7 pages, 7 figures, conference

arXiv:2312.01175 [pdf]

High Q and high gradient performance of the first medium-temperature baking 1.3 GHz cryomodule

Authors: Jiyuan Zhai, Weimin Pan, Feisi He, Rui Ge, Zhenghui Mi, Peng Sha, Song Jin, Ruixiong Han, Qunyao Wang, Haiying Lin, Guangwei Wang, Mei Li, Minjing Sang, Liangrui Sun, Rui Ye, Tongxian Zhao, Shaopeng Li, Keyu Zhu, Baiqi Liu, Xiaolong Wang, Xiangchen Yang, Xiaojuan Bian, Xiangzhen Zhang, Huizhou Ma, Xuwen Dai, et al. (14 additional authors not shown)

Abstract: The world's first 1.3 GHz cryomodule containing eight 9-cell superconducting radio-frequency (RF) cavities treated by medium-temperature furnace baking (mid-T bake) was developed, assembled and tested at IHEP for the Dalian Advanced Light Source (DALS) and CEPC R&D. The 9-cell cavities in the cryomodule achieved an unprecedented highest average Q0 of 3.8E10 at 16 MV/m and 3.6E10 at 21 MV/m in the horizontal test. The cryomodule can operate stably up to a total CW RF voltage greater than 191 MV, with an average cavity CW accelerating gradient of more than 23 MV/m. The results significantly exceed the specifications of CEPC, DALS and the other high repetition rate free electron laser facilities (LCLS-II, LCLS-II-HE, SHINE, S3FEL). There is evidence that the mid-T bake cavity may not require fast cool-down or long processing time in the cryomodule. This paper reviews the cryomodule performance and discusses some important issues in cryomodule assembly and testing.

Submitted 2 December, 2023; originally announced December 2023.

Comments: 5 pages, 6 figures

arXiv:2312.00741 [pdf, ps, other]

Crystal: Enhancing Blockchain Mining Transparency with Quorum Certificate

Authors: Jianyu Niu, Fangyu Gai, Runchao Han, Ren Zhang, Yinqian Zhang, Chen Feng

Abstract: Researchers have discovered a series of theoretical attacks against Bitcoin's Nakamoto consensus; the most damaging ones are selfish mining, double-spending, and consistency delay attacks. These attacks have one common cause: block withholding. This paper proposes Crystal, which leverages quorum certificates to resist block withholding misbehavior. Crystal continuously elects committees from miners and requires each block to have a quorum certificate, i.e., a set of signatures issued by members of its committee. Consequently, an attacker has to publish its blocks to obtain quorum certificates, rendering block withholding impossible. To build Crystal, we design a novel two-round committee election in a Sybil-resistant, unpredictable and non-interactive way, and a reward mechanism to incentivize miners to follow the protocol. Our analysis and evaluations show that Crystal can significantly mitigate selfish mining and double-spending attacks. For example, in Bitcoin, an attacker with 30% of the total computation power will succeed in double-spending attacks with a probability of 15.6% to break the 6-confirmation rule; however, in Crystal, the success probability for the same attacker falls to 0.62%. We provide formal end-to-end safety proofs for Crystal, ensuring no unknown attacks will be introduced. To the best of our knowledge, Crystal is the first protocol that prevents selfish mining and double-spending attacks while providing safety proof.

Submitted 1 December, 2023; originally announced December 2023.

Comments: 17 pages, 9 figures

arXiv:2311.08983 [pdf, other]

Edge Accelerated Robot Navigation With Collaborative Motion Planning

Authors: Guoliang Li, Ruihua Han, Shuai Wang, Fei Gao, Yonina C. Eldar, Chengzhong Xu

Abstract: Low-cost distributed robots suffer from limited onboard computing power, resulting in excessive computation time when navigating in cluttered environments. This paper presents Edge Accelerated Robot Navigation (EARN), to achieve real-time collision avoidance by adopting collaborative motion planning (CMP). As such, each robot can dynamically switch between a conservative motion planner executed locally to guarantee safety (e.g., path-following) and an aggressive motion planner executed non-locally to guarantee efficiency (e.g., overtaking). In contrast to existing motion planning approaches that ignore the interdependency between low-level motion planning and high-level resource allocation, EARN adopts model predictive switching (MPS) that maximizes the expected switching gain with respect to robot states and actions under computation and communication resource constraints. The MPS problem is solved by a tightly-coupled decision making and motion planning framework based on bilevel mixed-integer nonlinear programming and penalty dual decomposition. We validate the performance of EARN in indoor simulation, outdoor simulation, and real-world environments. Experiments show that EARN achieves significantly smaller navigation time and higher success rates than state-of-the-art navigation approaches.

Submitted 25 June, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

Comments: 12 pages, 13 figures, 3 tables, to appear in IEEE/ASME Transactions on Mechatronics

arXiv:2311.06563 [pdf, ps, other]

All instances of MONOTONE 3-SAT-(3,1) are satisfiable

Authors: Hannah Van Santvliet, Ronald de Haan

Abstract: The satisfiability problem is NP-complete but there are subclasses where all the instances are satisfiable. For this, restrictions on the shape of the formula are made. Darman and Döcker show that the subclass MONOTONE $3$-SAT-($k$,1) with $k \geq 5$ proves to be NP-complete and pose the open question whether instances of MONOTONE $3$-SAT-(3,1) are satisfiable. This paper shows that all instances of MONOTONE $3$-SAT-(3,1) are satisfiable using the new concept of color-structures.

Submitted 8 December, 2023; v1 submitted 11 November, 2023; originally announced November 2023.

Comments: 14 pages, 10 figures

arXiv:2311.00387

Mapping electrostatic potential in electrolyte solution

Authors: Bo Huang, Yining Yang, Ruinong Han, Keke Chen, Zhiyuan Wang, Longteng Yun, Yian Wang, Haowei Chen, Yingchao Du, Yuxia Hao, Peng Lv, Haoran Ma, Pengju Ji, Yuemei Tan, Lianmin Zheng, Lihong Liu, Renkai Li, Jie Yang

Abstract: Mapping the electrostatic potential (ESP) distribution around ions in electrolyte solution is crucial for the establishment of a microscopic understanding of electrolyte solution properties. For solutions in the bulk phase, it has not been possible to measure the ESP distribution on the Angstrom scale. Here we show that a liquid electron scattering experiment using a state-of-the-art relativistic electron beam can be used to measure the Debye screening length of aqueous LiCl, KCl, and KI solutions across a wide range of concentrations. We observe that the Debye screening length is long-ranged at low concentration and short-ranged at high concentration, providing key insight into the decades-long debate over whether the impact of ions in water is long-ranged or short-ranged. In addition, we show that the measured ESP can be used to retrieve the non-local dielectric function of electrolyte solution, which can serve as a promising route to investigate the electrostatic origin of special ion effects. Our observations show that interaction, as one of the two fundamental perspectives for understanding electrolyte solution, can provide much richer information than structure.

Submitted 1 February, 2024; v1 submitted 1 November, 2023; originally announced November 2023.

Comments: The small-angle signal in Fig. 2 C-H is highly likely to be an experimental artifact, because the electron beam is placed too close to the edge of the liquid sheet. This artifact invalidates the main conclusion of the paper

arXiv:2310.10982 [pdf, other]

SICNav: Safe and Interactive Crowd Navigation using Model Predictive Control and Bilevel Optimization

Authors: Sepehr Samavi, James R. Han, Florian Shkurti, Angela P. Schoellig

Abstract: Robots need to predict and react to human motions to navigate through a crowd without collisions. Many existing methods decouple prediction from planning, which does not account for the interaction between robot and human motions and can lead to the robot getting stuck. We propose SICNav, a Model Predictive Control (MPC) method that jointly solves for robot motion and predicted crowd motion in closed-loop. We model each human in the crowd to be following an Optimal Reciprocal Collision Avoidance (ORCA) scheme and embed that model as a constraint in the robot's local planner, resulting in a bilevel nonlinear MPC optimization problem. We use a KKT-reformulation to cast the bilevel problem as a single level and use a nonlinear solver to optimize. Our MPC method can influence pedestrian motion while explicitly satisfying safety constraints in a single-robot multi-human environment. We analyze the performance of SICNav in two simulation environments and indoor experiments with a real robot to demonstrate safe robot motion that can influence the surrounding humans. We also validate the trajectory forecasting performance of ORCA on a human trajectory dataset.

Submitted 27 May, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

Comments: Currently under review for IEEE Transactions on Robotics (T-RO)

arXiv:2310.08149 [pdf, ps, other]

Semi-relativistic antisymmetrized molecular dynamics for energetic neutron production in intermediate energy heavy-ion reactions

Authors: Q. Hu, G. Y. Tian, R. Wada, X. Q. Liu, W. P. Lin, H. Zheng, Y. P. Zhang, Z. Q. Chen, R. Han, M. R. Huang

Abstract: Relativistic corrections have been made in the non-relativistic antisymmetrized molecular dynamics (AMD) simulations to apply to the high energy neutron production in the $^{12}$C+$^{12}$C and $^{16}$O+$^{12}$C collisions at incident energies of 290 and 400 MeV/nucleon. The corrections are made in kinematics alone and no nucleon-nucleon inelastic scatterings nor meson productions are taken into account, and AMD with the relativistic corrections is called semi-relativistic AMD. The three-nucleon collision (3NC) and Fermi boost in the collision processes are taken into account in the non-relativistic AMD. Since the relativistic corrections tend to compensate each other, the difference between the semi-relativistic and non-relativistic results becomes small. High energy tails of the available experimental neutron double differential cross sections, especially at larger angles, are well reproduced by AMD with the 3NC term both with non-relativistic and semi-relativistic simulations. These results indicate that the high energy neutrons are dominantly produced by the 3NC process in this incident energy range.

Submitted 12 October, 2023; originally announced October 2023.

arXiv:2310.07378 [pdf, other]

RLaGA: A Reinforcement Learning Augmented Genetic Algorithm For Searching Real and Diverse Marker-Based Landing Violations

Authors: Linfeng Liang, Yao Deng, Kye Morton, Valtteri Kallinen, Alice James, Avishkar Seth, Endrowednes Kuantama, Subhas Mukhopadhyay, Richard Han, Xi Zheng

Abstract: Automated landing for Unmanned Aerial Vehicles (UAVs), like multirotor drones, requires intricate software encompassing control algorithms, obstacle avoidance, and machine vision, especially when landing markers assist. Failed landings can lead to significant costs from damaged drones or payloads and the time spent seeking alternative landing solutions. Therefore, it's important to fully test auto-landing systems through simulations before deploying them in the real world to ensure safety. This paper proposes RLaGA, a reinforcement learning (RL) augmented search-based testing framework, which constructs diverse and real marker-based landing cases that involve safety violations. Specifically, RLaGA introduces a genetic algorithm (GA) to conservatively search for diverse static environment configurations offline and RL to aggressively manipulate dynamic objects' trajectories online to find potential vulnerabilities in the target deployment environment. Quantitative results reveal that our method generates up to 22.19% more violation cases and nearly doubles the diversity of generated violation cases compared to baseline methods. Qualitatively, our method can discover those corner cases which would be missed by state-of-the-art algorithms. We demonstrate that select types of these corner cases can be confirmed via real-world testing with drones in the field.

Submitted 11 October, 2023; v1 submitted 11 October, 2023; originally announced October 2023.

arXiv:2310.04409 [pdf, other]

Metal-Optic Nanophotonic Modulators in Standard CMOS Technology

Authors: Mohamed ElKabbash, Sivan Trajtenberg-Mills, Isaac Harris, Saumil Bandyopadhyay, Mohamed I Ibrahim, Archer Wang, Xibi Chen, Cole Brabec, Hasan Z. Yildiz, Ruonan Han, Dirk Englund

Abstract: Integrating nanophotonics with electronics promises revolutionary applications, from LiDAR to holographic displays. Although silicon photonics is maturing, realizing active nanophotonics in the ubiquitous bulk CMOS processes remains challenging. We introduce a fabless approach to embed active nanophotonics in bulk CMOS by co-designing the back-end-of-line metal layers for optical functionality. Using a 65nm CMOS process, we create plasmonic liquid crystal modulators with switching speeds 100x faster than commercial technologies. This zero-change nanophotonics method could equip mass-produced chips with optical communications, sensing and imaging. Embedding nanophotonics in the dominant electronics platform democratizes nanofabrication, spawning technologies from chip-scale LiDAR to holographic light-field displays.

Submitted 16 November, 2023; v1 submitted 6 October, 2023; originally announced October 2023.

arXiv:2310.04369 [pdf, other]

MBTFNet: Multi-Band Temporal-Frequency Neural Network For Singing Voice Enhancement

Authors: Weiming Xu, Zhouxuan Chen, Zhili Tan, Shubo Lv, Runduo Han, Wenjiang Zhou, Weifeng Zhao, Lei Xie

Abstract: A typical neural speech enhancement (SE) approach mainly handles speech and noise mixtures, which is not optimal for singing voice enhancement scenarios. Music source separation (MSS) models treat vocals and various accompaniment components equally, which may reduce performance compared to the model that only considers vocal enhancement. In this paper, we propose a novel multi-band temporal-frequency neural network (MBTFNet) for singing voice enhancement, which particularly removes background music, noise and even backing vocals from singing recordings. MBTFNet combines inter and intra-band modeling for better processing of full-band signals. Dual-path modeling is introduced to expand the receptive field of the model. We propose an implicit personalized enhancement (IPE) stage based on signal-to-noise ratio (SNR) estimation, which further improves the performance of MBTFNet. Experiments show that our proposed model significantly outperforms several state-of-the-art SE and MSS models.

Submitted 6 October, 2023; originally announced October 2023.

arXiv:2309.09217 [pdf, other]

CryoAlign: feature-based method for global and local 3D alignment of EM density maps

Authors: Bintao He, Fa Zhang, Chenjie Feng, Jianyi Yang, Xin Gao, Renmin Han

Abstract: Advances in cryo-electron imaging technologies have led to a rapidly increasing number of density maps. Alignment and comparison of density maps play a crucial role in interpreting structural information, such as conformational heterogeneity analysis using global alignment and atomic model assembly through local alignment. Here, we propose a fast and accurate global and local cryo-electron microscopy density map alignment method CryoAlign, which leverages local density feature descriptors to capture spatial structure similarities. CryoAlign is the first feature-based EM map alignment tool, in which the employment of feature-based architecture enables the rapid establishment of point pair correspondences and robust estimation of alignment parameters. Extensive experimental evaluations demonstrate the superiority of CryoAlign over the existing methods in both alignment accuracy and speed.

Submitted 17 September, 2023; originally announced September 2023.

arXiv:2309.07109 [pdf, ps, other]

Real-time Monitoring for the Next Core-Collapse Supernova in JUNO

Authors: Angel Abusleme, Thomas Adam, Shakeel Ahmad, Rizwan Ahmed, Sebastiano Aiello, Muhammad Akram, Abid Aleem, Fengpeng An, Qi An, Giuseppe Andronico, Nikolay Anfimov, Vito Antonelli, Tatiana Antoshkina, Burin Asavapibhop, João Pedro Athayde Marcondes de André, Didier Auguste, Weidong Bai, Nikita Balashov, Wander Baldini, Andrea Barresi, Davide Basilico, Eric Baussan, Marco Bellato, Marco Beretta, Antonio Bergnoli, et al. (606 additional authors not shown)

Abstract: The core-collapse supernova (CCSN) is considered one of the most energetic astrophysical events in the universe. The early and prompt detection of neutrinos before (pre-SN) and during the supernova (SN) burst presents a unique opportunity for multi-messenger observations of CCSN events. In this study, we describe the monitoring concept and present the sensitivity of the system to pre-SN and SN neutrinos at the Jiangmen Underground Neutrino Observatory (JUNO), a 20 kton liquid scintillator detector currently under construction in South China. The real-time monitoring system is designed to ensure both prompt alert speed and comprehensive coverage of progenitor stars. It incorporates prompt monitors on the electronic board as well as online monitors at the data acquisition stage. Assuming a false alert rate of 1 per year, this monitoring system exhibits sensitivity to pre-SN neutrinos up to a distance of approximately 1.6 (0.9) kiloparsecs and SN neutrinos up to about 370 (360) kiloparsecs for a progenitor mass of 30 solar masses, considering both normal and inverted mass ordering scenarios. The pointing ability of the CCSN is evaluated by analyzing the accumulated event anisotropy of inverse beta decay interactions from pre-SN or SN neutrinos. This, along with the early alert, can play a crucial role in facilitating follow-up multi-messenger observations of the next galactic or nearby extragalactic CCSN.

Submitted 4 December, 2023; v1 submitted 13 September, 2023; originally announced September 2023.

Comments: 24 pages, 9 figures, accepted for publication in JCAP

arXiv:2309.03423 [pdf, ps, other]

Non-perturbative localization for quasi-periodic Jacobi block matrices

Authors: Rui Han, Wilhelm Schlag

Abstract: We prove non-perturbative Anderson localization for quasi-periodic Jacobi block matrix operators assuming non-vanishing of all Lyapunov exponents. The base dynamics on tori $\mathbb{T}^b$ is assumed to be a Diophantine rotation. Results on arithmetic localization are obtained for $b=1$, and applications to the skew shift, stacked graphene, XY spin chains, and coupled Harper models are discussed.

Submitted 6 September, 2023; originally announced September 2023.

Comments: 56 pages

arXiv:2308.15053 [pdf, other]

Adapting Text-based Dialogue State Tracker for Spoken Dialogues

Authors: Jaeseok Yoon, Seunghyun Hwang, Ran Han, Jeonguk Bang, Kee-Eung Kim

Abstract: Although there have been remarkable advances in dialogue systems through the dialogue systems technology competition (DSTC), it remains one of the key challenges to build a robust task-oriented dialogue system with a speech interface. Most of the progress has been made for text-based dialogue systems since there are abundant datasets with written corpora while those with spoken dialogues are very scarce. However, as can be seen from voice assistant systems such as Siri and Alexa, it is of practical importance to transfer the success to spoken dialogues. In this paper, we describe our engineering effort in building a highly successful model that participated in the speech-aware dialogue systems technology challenge track in DSTC11. Our model consists of three major modules: (1) automatic speech recognition error correction to bridge the gap between the spoken and the text utterances, (2) text-based dialogue system (D3ST) for estimating the slots and values using slot descriptions, and (3) post-processing for recovering the error of the estimated slot value. Our experiments show that it is important to use an explicit automatic speech recognition error correction module, post-processing, and data augmentation to adapt a text-based dialogue state tracker for spoken dialogue corpora.

Submitted 9 January, 2024; v1 submitted 29 August, 2023; originally announced August 2023.

Comments: 8 pages, 5 figures, In Proceedings of The Eleventh Dialog System Technology Challenge, Association for Computational Linguistics

arXiv:2308.14289 [pdf, other]

Heterogeneous integration of spin-photon interfaces with a scalable CMOS platform

Authors: Linsen Li, Lorenzo De Santis, Isaac Harris, Kevin C. Chen, Yihuai Gao, Ian Christen, Matthew Trusheim, Hyeongrak Choi, Yixuan Song, Carlos Errando-Herranz, Jiahui Du, Yong Hu, Genevieve Clark, Mohamed I. Ibrahim, Gerald Gilbert, Ruonan Han, Dirk Englund

Abstract: Color centers in diamonds have emerged as a leading solid-state platform for advancing quantum technologies, satisfying the DiVincenzo criteria and recently achieving a quantum advantage in secret key distribution. Recent theoretical works estimate that general-purpose quantum computing using local quantum communication networks will require millions of physical qubits to encode thousands of logical qubits, which presents a substantial challenge to the hardware architecture at this scale. To address the unanswered scaling problem, in this work, we first introduce a scalable hardware modular architecture "Quantum System-on-Chip" (QSoC) that features compact two-dimensional arrays "quantum microchiplets" (QMCs) containing tin-vacancy (SnV-) spin qubits integrated on a cryogenic application-specific integrated circuit (ASIC). We demonstrate crucial architectural subcomponents, including (1) QSoC fabrication via a lock-and-release method for large-scale heterogeneous integration; (2) a high-throughput calibration of the QSoC for spin qubit spectral inhomogeneous registration; (3) spin qubit spectral tuning functionality for inhomogeneous compensation; (4) efficient spin-state preparation and measurement for improved spin and optical properties. QSoC architecture supports full connectivity for quantum memory arrays in a set of different resonant frequencies and offers the possibility for further scaling the number of solid-state physical qubits via larger and denser QMC arrays and optical frequency multiplexing networking.

Submitted 20 December, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

Comments: 26 pages, 15 figures. Comments welcome

Showing 1–50 of 427 results for author: Han, R