-
DMM: Disparity-guided Multispectral Mamba for Oriented Object Detection in Remote Sensing
Authors:
Minghang Zhou,
Tianyu Li,
Chaofan Qiao,
Dongyu Xie,
Guoqing Wang,
Ningjuan Ruan,
Lin Mei,
Yang Yang
Abstract:
Multispectral oriented object detection faces challenges due to both inter-modal and intra-modal discrepancies. Recent studies often rely on transformer-based models to address these issues and achieve cross-modal fusion detection. However, the quadratic computational complexity of transformers limits their performance. Inspired by the efficiency and lower complexity of Mamba in long-sequence tasks, we propose Disparity-guided Multispectral Mamba (DMM), a multispectral oriented object detection framework composed of a Disparity-guided Cross-modal Fusion Mamba (DCFM) module, a Multi-scale Target-aware Attention (MTA) module, and a Target-Prior Aware (TPA) auxiliary task. The DCFM module leverages disparity information between modalities to adaptively merge features from RGB and IR images, mitigating inter-modal conflicts. The MTA module enhances feature representation by focusing on relevant target regions within the RGB modality, addressing intra-modal variations. The TPA auxiliary task uses single-modal labels to guide the optimization of the MTA module, ensuring that it focuses on targets and their local context. Extensive experiments on the DroneVehicle and VEDAI datasets demonstrate the effectiveness of our method, which outperforms state-of-the-art methods while maintaining computational efficiency. Code will be available at https://github.com/Another-0/DMM.
Submitted 10 July, 2024;
originally announced July 2024.
-
VDAC Solvation Free Energy Calculation by a Nonuniform Size Modified Poisson-Boltzmann Ion Channel Model
Authors:
Liam Jemison,
Matthew Stahl,
Ranjan K. Dash,
Dexuan Xie
Abstract:
The Voltage-Dependent Anion Channel (VDAC) protein is the primary conduit for the regulated passage of ions and metabolites into and out of mitochondria. Calculating its solvation free energy is crucial for understanding its stability, function, and interactions within the cellular environment. In this paper, we introduce a total solvation free energy, $E$, which is the sum of electrostatic, ideal gas, and excess free energies, along with a non-polar energy to yield a zero of $E$ in the absence of charges. We develop numerical schemes for computing $E$ and update the current mesh generation package to accelerate the generation of tetrahedral meshes and improve the quality of meshes for computing $E$. By integrating these schemes and the updated mesh package with our non-uniform size modified Poisson-Boltzmann ion channel (nuSMPBIC), SMPBIC, and PBIC finite element packages, the PDB2PQR package, and the OPM database, we create the VDAC Solvation Free Energy Calculation (VSFEC) package. Using the VSFEC package, we perform comparison tests on the nuSMPBIC, SMPBIC, and PBIC models by using six VDAC proteins and various ionic solutions containing up to four ionic species, including ATP$^{4-}$ and Ca$^{2+}$. We also conduct tests with different neutral voltages and permittivity constants to explore the varying patterns of $E$. Our test results underscore the importance of considering non-uniform ionic size effects and demonstrate the high performance of the VSFEC package in calculating the solvation free energy of VDAC proteins.
Submitted 25 May, 2024;
originally announced July 2024.
-
On the minimal number of closed geodesics on positively curved Finsler spheres
Authors:
Huagui Duan,
Dong Xie
Abstract:
In this paper, we prove that for every Finsler metric $F$ on $S^n$ $(n\ge 4)$ with reversibility $\lambda$ and flag curvature $K$ satisfying $(\frac{2n-3}{n-1})^2 (\frac{\lambda}{\lambda+1})^2<K\le 1$ and $\lambda<\frac{n-1}{n-2}$, there exist at least $n$ prime closed geodesics on $(S^n,F)$, which solves a conjecture of Katok and Anosov for such positively curved spheres when $n$ is even. Furthermore, if the number of closed geodesics on such a positively curved Finsler $S^n$ is finite, then there exist at least $2\left[\frac{n}{2}\right]-1$ non-hyperbolic closed geodesics.
Submitted 21 June, 2024;
originally announced June 2024.
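The two hypotheses in the Finsler-sphere abstract interact: the reversibility cap $\lambda<\frac{n-1}{n-2}$ is exactly what keeps the curvature interval nonempty, because at $\lambda=\frac{n-1}{n-2}$ one has $\frac{\lambda}{\lambda+1}=\frac{n-1}{2n-3}$, so the lower bound equals 1. The short sketch below evaluates the bound with exact rational arithmetic. The function names are illustrative, not from the paper; it assumes the standard fact that the reversibility of a Finsler metric satisfies $\lambda\ge 1$, and reads $[\cdot]$ as the integer part.

```python
from fractions import Fraction


def curvature_lower_bound(n: int, lam: Fraction) -> Fraction:
    """Lower end of the pinching interval ((2n-3)/(n-1))^2 (lam/(lam+1))^2 < K <= 1."""
    return (Fraction(2 * n - 3, n - 1) * lam / (lam + 1)) ** 2


def hypotheses_hold(n: int, lam: Fraction) -> bool:
    """n >= 4, reversibility lam >= 1 (always true for Finsler metrics), lam < (n-1)/(n-2)."""
    return n >= 4 and lam >= 1 and lam < Fraction(n - 1, n - 2)


def guaranteed_counts(n: int) -> tuple[int, int]:
    """(prime closed geodesics, non-hyperbolic ones if finitely many) = (n, 2[n/2] - 1)."""
    return n, 2 * (n // 2) - 1


if __name__ == "__main__":
    for n in (4, 5, 6):
        lam = Fraction(1)  # reversible case, e.g. a Riemannian metric
        print(n, hypotheses_hold(n, lam), curvature_lower_bound(n, lam), guaranteed_counts(n))
```

In the reversible case $\lambda=1$ the lower bound is 25/36, 49/64 and 81/100 for $n=4,5,6$, while pushing $\lambda$ up to $\frac{n-1}{n-2}$ drives it to 1 and empties the interval.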
-
LRM-Zero: Training Large Reconstruction Models with Synthesized Data
Authors:
Desai Xie,
Sai Bi,
Zhixin Shu,
Kai Zhang,
Zexiang Xu,
Yi Zhou,
Sören Pirk,
Arie Kaufman,
Xin Sun,
Hao Tan
Abstract:
We present LRM-Zero, a Large Reconstruction Model (LRM) trained entirely on synthesized 3D data, achieving high-quality sparse-view 3D reconstruction. The core of LRM-Zero is our procedural 3D dataset, Zeroverse, which is automatically synthesized from simple primitive shapes with random texturing and augmentations (e.g., height fields, boolean differences, and wireframes). Unlike previous 3D datasets (e.g., Objaverse), which are often captured or crafted by humans to approximate real 3D data, Zeroverse completely ignores realistic global semantics but is rich in complex geometric and texture details that are locally similar to, or even more intricate than, those of real objects. We demonstrate that LRM-Zero, trained with our fully synthesized Zeroverse, can achieve high visual quality in the reconstruction of real-world objects, competitive with models trained on Objaverse. We also analyze several critical design choices of Zeroverse that contribute to LRM-Zero's capability and training stability. Our work demonstrates that 3D reconstruction, one of the core tasks in 3D vision, can potentially be addressed without the semantics of real-world objects. Zeroverse's procedural synthesis code and interactive visualization are available at: https://desaixie.github.io/lrm-zero/.
Submitted 13 June, 2024;
originally announced June 2024.
-
FreeV: Free Lunch For Vocoders Through Pseudo Inversed Mel Filter
Authors:
Yuanjun Lv,
Hai Li,
Ying Yan,
Junhui Liu,
Danming Xie,
Lei Xie
Abstract:
Vocoders reconstruct speech waveforms from acoustic features and play a pivotal role in modern TTS systems. Frequency-domain GAN vocoders such as Vocos and APNet2 have recently advanced rapidly, outperforming time-domain models in inference speed while achieving comparable audio quality. However, these frequency-domain vocoders have large parameter counts, which introduce an extra memory burden. Inspired by PriorGrad and SpecGrad, we use the pseudo-inverse of the mel filter to roughly estimate the amplitude spectrum as an initialization. This simple initialization significantly reduces the number of parameters the vocoder needs. Based on APNet2 and our streamlined amplitude prediction branch, we propose FreeV. Compared with its counterpart APNet2, FreeV achieves a 1.8 times inference speedup with nearly half the parameters. Meanwhile, FreeV outperforms APNet2 in resynthesis quality, marking a step forward in pursuing real-time, high-fidelity speech synthesis. Code and checkpoints are available at: https://github.com/BakerBunker/FreeV
Submitted 12 June, 2024;
originally announced June 2024.
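FreeV's central trick, per its abstract, is to initialize the amplitude spectrum with the pseudo-inverse of the mel filterbank. The NumPy sketch below shows that initialization under stated assumptions: it builds an HTK-style triangular mel filterbank by hand (FreeV's actual mel formula, sample rate, FFT size and number of mel bands are not given in the abstract), and clamps the estimate to a small positive floor so that it is a valid amplitude. The function names are hypothetical.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)  # HTK-style mel scale (assumption)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(sr=22050, n_fft=1024, n_mels=80):
    """Triangular filters M of shape (n_mels, n_fft // 2 + 1)."""
    n_freq = n_fft // 2 + 1
    freqs = np.linspace(0.0, sr / 2.0, n_freq)  # centre frequency of each rFFT bin
    hz_pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_freq))
    for i in range(n_mels):
        left, center, right = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (freqs - left) / (center - left)
        falling = (right - freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def pinv_amplitude_init(mel_spec, fb, floor=1e-5):
    """Rough amplitude spectrum A0 = max(pinv(M) @ mel, floor); mel_spec: (n_mels, frames)."""
    return np.maximum(np.linalg.pinv(fb) @ mel_spec, floor)
```

Because $MM^{+}M=M$ holds for any matrix, re-applying the filterbank to the unclamped estimate reproduces the input mel spectrogram exactly. Presumably this is why a lightweight amplitude branch only needs to refine the estimate rather than build it from scratch.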
-
An adaptive parameter estimator for poor-quality spectral data of white dwarfs
Authors:
Duo Xie,
Jiangchuan Zhang,
Yude Bu,
Zhenping Yi,
Meng Liu,
Xiaoming Kong
Abstract:
White dwarfs represent the end stage for 97% of stars, making precise parameter measurement crucial for understanding stellar evolution. Traditional estimation methods involve fitting spectra or photometry, which require high-quality data. In recent years, machine learning has played a crucial role in processing spectral data due to its speed, automation, and accuracy. However, two common issues have been identified. First, most studies rely on data with high signal-to-noise ratios (SNR > 10), leaving many poor-quality datasets underutilized. Second, existing machine learning models, primarily based on convolutional networks, recurrent networks, and their variants, cannot capture both the spatial and sequential information of spectra simultaneously. To address these challenges, we designed the Estimator Network (EstNet), an advanced algorithm integrating multiple techniques, including Residual Networks, Squeeze and Excitation Attention, Gated Recurrent Units, Adaptive Loss, and Monte-Carlo Dropout Layers. We conducted parameter estimation on 5,965 poor-quality white dwarf spectra (R~1800, SNR~1.17), achieving average percentage errors of 14.86% for effective temperature and 3.97% for surface gravity. These results are significantly better than those of other mainstream algorithms and consistent with the outcomes of traditional theoretical spectrum fitting methods. In the future, our algorithms will be applied to large-scale parameter estimation for the Chinese Space Station Telescope and the Large Synoptic Survey Telescope.
Submitted 6 June, 2024;
originally announced June 2024.
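EstNet lists Monte-Carlo dropout among its components. The generic technique keeps dropout active at inference time and summarizes many stochastic forward passes by their mean (the estimate) and standard deviation (an uncertainty proxy). The sketch below applies it to a hypothetical two-layer NumPy regressor with fixed weights; it illustrates the technique only and is not EstNet's architecture.

```python
import numpy as np


def mc_dropout_predict(x, w1, w2, p=0.2, n_samples=100, seed=0):
    """Monte Carlo dropout: aggregate n_samples stochastic forward passes.

    x: (batch, d_in); w1: (d_in, d_hidden); w2: (d_hidden, d_out).
    Returns (mean, std), each of shape (batch, d_out).
    """
    rng = np.random.default_rng(seed)
    hidden = np.maximum(0.0, x @ w1)  # ReLU hidden layer (deterministic part)
    preds = []
    for _ in range(n_samples):
        if p > 0:
            keep = rng.random(hidden.shape) >= p  # keep each unit with probability 1 - p
            h = hidden * keep / (1.0 - p)  # inverted-dropout rescaling
        else:
            h = hidden
        preds.append(h @ w2)
    preds = np.stack(preds)  # (n_samples, batch, d_out)
    return preds.mean(axis=0), preds.std(axis=0)
```

With p = 0 every pass is identical and the spread collapses to zero; a wide spread for a given spectrum flags an estimate that deserves less trust, which matters for very noisy inputs such as the SNR~1.17 spectra mentioned above.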
-
An Efficient Finite Element Solver for a Nonuniform size-modified Poisson-Nernst-Planck Ion Channel Model
Authors:
Dexuan Xie
Abstract:
This paper presents an efficient finite element iterative method for solving a nonuniform size-modified Poisson-Nernst-Planck ion channel (SMPNPIC) model, along with an SMPNPIC program package that works for an ion channel protein with a three-dimensional crystallographic structure and an ionic solvent with multiple ionic species. In particular, the SMPNPIC model is constructed and then reformulated by novel mathematical techniques so that each iteration of the method involves only linear boundary value problems and nonlinear algebraic systems, circumventing the numerical difficulties caused by the strong nonlinearities, strong asymmetries, and strong differential equation coupling of the SMPNPIC model. To further improve the method's efficiency, an efficient modified Newton iterative method is adapted to solve each related nonlinear algebraic system. Numerical results for a voltage-dependent anion channel (VDAC) and a mixture solution of four ionic species demonstrate the method's convergence, the package's high performance, and the importance of considering nonuniform ion size effects. They also partially validate the SMPNPIC model through the anion selectivity property of VDAC.
Submitted 2 May, 2024;
originally announced May 2024.
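The SMPNPIC abstract relies on a modified Newton method for each nonlinear algebraic system but does not say which modification is used. The sketch below shows one common variant, the chord method, which evaluates the Jacobian once at the initial guess and reuses it in every iteration, trading Newton's quadratic convergence for linear convergence in exchange for much cheaper iterations. The 2x2 test system and all names are hypothetical, not the SMPNPIC equations.

```python
import numpy as np


def chord_newton(F, J, x0, tol=1e-12, max_iter=200):
    """Modified (chord) Newton: x_{k+1} = x_k - J(x_0)^{-1} F(x_k)."""
    x = np.asarray(x0, dtype=float)
    # The Jacobian is inverted once at the initial guess. A large sparse solver
    # would instead factor it once (e.g. LU) and reuse the factors each step.
    J0_inv = np.linalg.inv(J(x))
    for k in range(1, max_iter + 1):
        step = J0_inv @ F(x)
        x = x - step
        if np.linalg.norm(step) < tol:
            return x, k
    raise RuntimeError("chord Newton iteration did not converge")


def toy_residual(v):
    """Toy system with root (1, 2): x^2 + y - 3 = 0, x + y^2 - 5 = 0."""
    return np.array([v[0] ** 2 + v[1] - 3.0, v[0] + v[1] ** 2 - 5.0])


def toy_jacobian(v):
    return np.array([[2.0 * v[0], 1.0], [1.0, 2.0 * v[1]]])


if __name__ == "__main__":
    root, iters = chord_newton(toy_residual, toy_jacobian, [1.2, 1.8])
    print(root, iters)
```

The frozen Jacobian only works when the initial guess is reasonably close to the solution; in an outer iteration such as the one described above, the previous iterate usually supplies such a guess.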
-
Quantum metrology in a driven-dissipation down-conversion system beyond the parametric approximation
Authors:
Dong Xie,
Chunling Xu
Abstract:
We investigate quantum metrology in a degenerate down-conversion system composed of a pump mode and two degenerate signal modes. In the conventional parametric approximation, the pump mode is treated as a constant rather than as a quantum operator. We obtain the measurement precision of the coupling strength between the pump mode and the two degenerate signal modes beyond the parametric approximation. Without dissipation, the super-Heisenberg limit can be obtained when the initial state is the direct product of a classical state and a quantum state. This does not require entanglement resources, which are difficult to prepare. When the pump mode suffers from single-photon dissipation, the measurement uncertainty of the coupling strength approaches 0 as the coupling strength approaches 0 under coherent driving. Direct photon detection is proved to be the optimal measurement. This result remains unchanged when the signal modes suffer from two-photon dissipation. When the signal modes also suffer from single-mode dissipation, information about the coupling strength can still be obtained in the steady state. In addition, the measurement uncertainty of the coupling strength can also approach 0 and become independent of the noise temperature as the system approaches the critical point between the normal and superradiant phases. Finally, we show that a driven-dissipation down-conversion system can be used as a precise quantum sensor to measure the driving strength.
Submitted 24 April, 2024;
originally announced April 2024.
-
Deep learning-driven pulmonary arteries and veins segmentation reveals demography-associated pulmonary vasculature anatomy
Authors:
Yuetan Chu,
Gongning Luo,
Longxi Zhou,
Shaodong Cao,
Guolin Ma,
Xianglin Meng,
Juexiao Zhou,
Changchun Yang,
Dexuan Xie,
Ricardo Henao,
Xigang Xiao,
Lianming Wu,
Zhaowen Qiu,
Xin Gao
Abstract:
Pulmonary artery-vein segmentation is crucial for diagnosing pulmonary diseases and for surgical planning, and is traditionally achieved by Computed Tomography Pulmonary Angiography (CTPA). However, concerns about adverse health effects from the contrast agents used in CTPA have constrained its clinical utility. In contrast, identifying arteries and veins on non-contrast CT, a conventional and low-cost clinical examination, has long been considered impossible. Here we propose a High-abundant Pulmonary Artery-vein Segmentation (HiPaS) framework that achieves accurate artery-vein segmentation on both non-contrast CT and CTPA across various spatial resolutions. HiPaS first performs spatial normalization on raw CT scans via a super-resolution module, and then iteratively obtains segmentation results at different branch levels, using the low-level vessel segmentation as a prior for high-level vessel segmentation. We trained and validated HiPaS on our multi-center dataset of 1,073 CT volumes with meticulous manual annotation. Both quantitative experiments and clinical evaluation demonstrated the superior performance of HiPaS, which achieved a Dice score of 91.8% and a sensitivity of 98.0%. Further experiments demonstrated that HiPaS segmentation on non-contrast CT is non-inferior to segmentation on CTPA. Using HiPaS, we conducted an anatomical study of pulmonary vasculature in 10,613 participants in China (five sites), discovering a new association between pulmonary vessel abundance and sex and age: after controlling for lung volume, vessel abundance is significantly higher in females than in males and decreases slightly with age (p < 0.0001). By achieving accurate artery-vein segmentation, HiPaS opens a promising avenue for clinical diagnosis and for understanding pulmonary physiology in a non-invasive manner.
Submitted 11 April, 2024;
originally announced April 2024.
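The HiPaS abstract reports a Dice score and a sensitivity. For a predicted binary mask P and a ground-truth mask G, these are Dice = 2|P ∩ G| / (|P| + |G|) and sensitivity = TP / (TP + FN). The NumPy sketch below computes both for a single mask pair. How the paper averages over artery and vein labels, scans, or branch levels is not stated in the abstract and is left out, and the empty-mask conventions are assumptions.

```python
import numpy as np


def dice_score(pred, target):
    """Dice = 2|P & G| / (|P| + |G|) for binary masks of equal shape."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return float(2.0 * np.logical_and(pred, target).sum() / denom)


def sensitivity(pred, target):
    """Sensitivity (recall) = TP / (TP + FN): fraction of true vessel voxels recovered."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    tp = np.logical_and(pred, target).sum()
    fn = np.logical_and(~pred, target).sum()
    return float(tp / (tp + fn)) if tp + fn > 0 else 1.0
```

Sensitivity ignores false positives, so a model can trade extra spurious voxels for a higher sensitivity. This is why it is usually reported next to an overlap measure such as Dice.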
-
Modularity for $\mathcal{W}$-algebras and affine Springer fibres
Authors:
Peng Shan,
Dan Xie,
Wenbin Yan
Abstract:
We construct a bijection between admissible representations for an affine Lie algebra $\mathfrak{g}$ at boundary admissible levels and $\mathbb{C}^\times$ fixed points in homogeneous elliptic affine Springer fibres for the Langlands dual affine Lie algebra $\mathfrak{g}^\vee$. Using this bijection, we relate the modularity of the characters of admissible representations to Cherednik's Verlinde algebra construction coming from double affine Hecke algebras. Finally, we show that the expected behaviors of simple modules under quantized Drinfeld-Sokolov reductions are compatible with the reductions from affine Springer fibres to affine Spaltenstein varieties. This yields (modulo some conjectures) a similar bijection for irreducible representations of $\mathcal{W}$-algebras, as well as an interpretation for their modularity properties.
Submitted 31 March, 2024;
originally announced April 2024.
-
Learning Expressive And Generalizable Motion Features For Face Forgery Detection
Authors:
Jingyi Zhang,
Peng Zhang,
Jingjing Wang,
Di Xie,
Shiliang Pu
Abstract:
Previous face forgery detection methods mainly focus on appearance features, which may be easily attacked by sophisticated manipulation. Since most current face manipulation methods generate fake faces frame by frame, without considering consistency and coordination across frames, artifacts in frame sequences are more effective cues for face forgery detection. However, current sequence-based face forgery detection methods directly use general video classification networks, which discard the special and discriminative motion information for face manipulation detection. To this end, we propose an effective sequence-based forgery detection framework built on an existing video classification method. To make the motion features more expressive for manipulation detection, we propose an alternative motion consistency block to replace the original motion feature module. To make the learned features more generalizable, we propose an auxiliary anomaly detection block. With these two specially designed improvements, a general video classification network achieves promising results on three popular face forgery datasets.
Submitted 8 March, 2024;
originally announced March 2024.
-
CLIP-Gaze: Towards General Gaze Estimation via Visual-Linguistic Model
Authors:
Pengwei Yin,
Guanzhong Zeng,
Jingjing Wang,
Di Xie
Abstract:
Gaze estimation methods often suffer significant performance degradation when evaluated across domains, due to the domain gap between the training and testing data. Existing methods try to address this issue with various domain generalization approaches, but with little success because of the limited diversity of gaze datasets in terms of appearance, wearables, and image quality. To overcome these limitations, we propose a novel framework called CLIP-Gaze that uses a pre-trained vision-language model to leverage its transferable knowledge. Our framework is the first to apply the vision-and-language cross-modality approach to the gaze estimation task. Specifically, we extract gaze-relevant features by pushing them away from gaze-irrelevant features, which can be flexibly constructed via language descriptions. To learn more suitable prompts, we propose a personalized context optimization method for text prompt tuning. Furthermore, we use the relationships among gaze samples to refine the distribution of gaze-relevant features, thereby improving the generalization capability of the gaze estimation model. Extensive experiments demonstrate the excellent performance of CLIP-Gaze over existing methods on four cross-domain evaluations.
Submitted 8 March, 2024;
originally announced March 2024.
-
Arbitrary-Scale Point Cloud Upsampling by Voxel-Based Network with Latent Geometric-Consistent Learning
Authors:
Hang Du,
Xuejun Yan,
Jingjing Wang,
Di Xie,
Shiliang Pu
Abstract:
Recently, arbitrary-scale point cloud upsampling has become increasingly popular due to its efficiency and convenience in practical applications. To achieve this, most previous approaches formulate it as a surface approximation problem and employ point-based networks to learn surface representations. However, learning surfaces from sparse point clouds is challenging, and thus these approaches often suffer from low-fidelity geometry approximation. To address this, we propose an arbitrary-scale Point cloud Upsampling framework using a Voxel-based Network (PU-VoxelNet). Thanks to the completeness and regularity inherited from the voxel representation, voxel-based networks can provide a predefined grid space to approximate the 3D surface, and an arbitrary number of points can be reconstructed according to the predicted density distribution within each grid cell. However, we find that imprecise density predictions cause inaccurate grid sampling. To address this issue, a density-guided grid resampling method is developed to generate high-fidelity points while effectively avoiding sampling outliers. Further, to improve fine-grained details, we present an auxiliary training supervision that enforces latent geometric consistency among local surface patches. Extensive experiments indicate that the proposed approach outperforms state-of-the-art approaches not only at fixed upsampling rates but also for arbitrary-scale upsampling.
Submitted 8 March, 2024;
originally announced March 2024.
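The PU-VoxelNet abstract notes that a voxel grid lets an arbitrary number of points be reconstructed from a predicted density per grid cell. The NumPy sketch below illustrates only that basic idea, under stated assumptions: the density array stands in for a hypothetical network output, point counts are allocated by largest-remainder rounding, and each point is placed uniformly inside its cell. It is not the paper's density-guided grid resampling, which is specifically designed to avoid the outliers a naive sampler like this can produce.

```python
import numpy as np


def sample_points_from_density(density, n_points, voxel_size=1.0, seed=0):
    """Draw exactly n_points points, spread over voxels in proportion to density.

    density: (D, H, W) array of nonnegative per-cell weights.
    Returns an (n_points, 3) array of coordinates in the grid's frame.
    """
    rng = np.random.default_rng(seed)
    probs = density.ravel() / density.sum()
    raw = probs * n_points
    counts = np.floor(raw).astype(int)
    # Largest-remainder rounding so the per-cell counts sum exactly to n_points.
    shortfall = n_points - counts.sum()
    if shortfall > 0:
        order = np.argsort(raw - counts)[::-1]
        counts[order[:shortfall]] += 1
    cell_ids = np.repeat(np.arange(probs.size), counts)
    corners = np.stack(np.unravel_index(cell_ids, density.shape), axis=1)
    offsets = rng.random((cell_ids.size, 3))  # uniform position inside each cell
    return (corners + offsets) * voxel_size
```

Changing `n_points` is all it takes to change the upsampling rate, which is the property that makes the voxel formulation attractive for arbitrary-scale output.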
-
Intelligent Traffic Monitoring with Distributed Acoustic Sensing
Authors:
Dongzi Xie,
Xinming Wu,
Zhixiang Guo,
Heting Hong,
Baoshan Wang,
Yingjiao Rong
Abstract:
Distributed Acoustic Sensing (DAS) is promising for traffic monitoring, but its large data volume and its sensitivity to vibrations, which introduces noise, pose computational challenges. To address this, we propose a two-step deep-learning workflow with high efficiency and noise immunity for DAS-based traffic monitoring, focusing on instance segmentation of vehicle trajectories and velocity estimation. Our approach begins by generating a diverse synthetic DAS dataset with labeled vehicle signals, tackling the lack of training labels in this field. This dataset is used to train a Convolutional Neural Network (CNN) to detect linear vehicle trajectories in the noisy DAS data in the time-space domain. However, due to significant noise, these trajectories are often fragmented and incomplete. To enhance accuracy, we introduce a second step involving the Hough transform, which converts detected linear features into point-like energy clusters in the Hough domain. Another CNN is then employed to focus on these energy clusters, identifying the most significant points. The inverse Hough transform is applied to these points to reconstruct complete, distinct, and noise-free linear vehicle trajectories in the time-space domain. The Hough transform plays a crucial role by enforcing a local linearity constraint on the trajectories, enhancing continuity and noise immunity, and facilitating the separation of individual trajectories and the estimation of vehicle velocities (indicated by trajectory slopes in the Hough domain). Our method has proven effective on real-world datasets, demonstrating its value for real-time processing of DAS data and its applicability to similar traffic monitoring scenarios. All related code and data are available at https://github.com/TTMuTian/itm/.
Submitted 5 March, 2024;
originally announced March 2024.
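The DAS workflow's second step relies on the fact that a straight trajectory x = v·t + b in the time-space domain collapses to a single peak in a Hough accumulator, and the slope coordinate of that peak is the vehicle velocity. The NumPy sketch below uses a simplified slope-intercept parameterization. The classic Hough transform uses the normal form ρ = t·cos θ + x·sin θ, and the paper's exact variant, grid spacing and CNN-based peak picking are not reproduced; a plain argmax stands in for the CNN. Velocity is in the input's distance units per time unit, and all names are hypothetical.

```python
import numpy as np


def hough_line_peak(t, x, velocities, b_edges):
    """Vote in a (velocity, intercept) accumulator for lines x = v * t + b.

    Returns (best_velocity, best_intercept_bin_center, accumulator).
    """
    acc = np.zeros((len(velocities), len(b_edges) - 1), dtype=int)
    for i, v in enumerate(velocities):
        b = x - v * t  # intercept implied by each point for slope v
        acc[i], _ = np.histogram(b, bins=b_edges)  # one vote per point
    i, j = np.unravel_index(np.argmax(acc), acc.shape)  # strongest line
    return velocities[i], 0.5 * (b_edges[j] + b_edges[j + 1]), acc
```

Points on a common line all vote for the same (v, b) cell, while scattered noise spreads its votes thinly. This is the local-linearity constraint the abstract credits for noise immunity. Separating several vehicles would require picking multiple peaks (the abstract's second CNN, or non-maximum suppression) instead of a single argmax.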
-
On duality of four dimensional $\mathcal{N}=1$ gauge theory
Authors:
Yuanyuan Fang,
Jing Feng,
Dan Xie
Abstract:
We show that Seiberg-like duality of $\mathcal{N}=1$ gauge theory coupled with tensor chiral fields and fundamental chiral fields works if the meson spectrum built from the tensor fields takes a particular form: a) it should be truncated; b) the $R$ charges of the tensor fields $\{R_a\}$ and of the truncated mesons $\{R_j\}$ take very special values. The meson spectrum for which the duality works is encoded elegantly in the factorization of the polynomial $y^n-1=\Phi_{+}\Phi_{-}$. Our analysis covers many known $\mathcal{N}=1$ dualities and generates a large class of new examples.
Submitted 4 March, 2024;
originally announced March 2024.
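As a worked illustration of a factorization of the form $y^n-1=\Phi_{+}\Phi_{-}$ from the duality abstract (which does not say which splits correspond to which meson spectra): over the rationals, $y^n-1$ factors into cyclotomic polynomials, written $C_d(y)$ here to avoid a clash with the paper's $\Phi_{\pm}$. Any split into two factors with rational coefficients therefore groups these cyclotomic factors, up to constants. For $n=4$:

```latex
y^n - 1 = \prod_{d \mid n} C_d(y), \qquad
y^4 - 1 = \underbrace{(y-1)}_{C_1}\,\underbrace{(y+1)}_{C_2}\,\underbrace{(y^2+1)}_{C_4},
\qquad \text{e.g. } \Phi_{+} = y^2 - 1,\ \Phi_{-} = y^2 + 1 .
```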
-
"Lossless" Compression of Deep Neural Networks: A High-dimensional Neural Tangent Kernel Approach
Authors:
Lingyu Gu,
Yongqi Du,
Yuan Zhang,
Di Xie,
Shiliang Pu,
Robert C. Qiu,
Zhenyu Liao
Abstract:
Modern deep neural networks (DNNs) are extremely powerful; however, this comes at the price of greater depth and more parameters per layer, making their training and inference more computationally demanding. In an attempt to address this key limitation, efforts have been devoted to the compression (e.g., sparsification and/or quantization) of these large-scale machine learning models, so that they can be deployed on low-power IoT devices. In this paper, building upon recent advances in neural tangent kernel (NTK) and random matrix theory (RMT), we provide a novel compression approach for wide and fully-connected deep neural nets. Specifically, we demonstrate that in the high-dimensional regime where the number of data points $n$ and their dimension $p$ are both large, and under a Gaussian mixture model for the data, there exists asymptotic spectral equivalence between the NTK matrices for a large family of DNN models. This theoretical result enables "lossless" compression of a given DNN, in the sense that the compressed network yields asymptotically the same NTK as the original (dense and unquantized) network, with its weights and activations taking values only in $\{ 0, \pm 1 \}$ up to a scaling. Experiments on both synthetic and real-world data support the advantages of the proposed compression scheme, with code available at https://github.com/Model-Compression/Lossless_Compression.
Submitted 29 February, 2024;
originally announced March 2024.
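The lossless-compression abstract ends with weights taking values only in {0, ±1} up to a scaling. The NumPy sketch below shows what such a ternary weight tensor looks like, using a common threshold-and-scale heuristic: entries with magnitude at most 0.7·mean|w| are zeroed, where the 0.7 factor is an assumption borrowed from ternary-weight-network practice. This swapped-in sketch illustrates only the output format. The paper derives its compressed network from NTK spectral equivalence, which this code does not implement.

```python
import numpy as np


def ternarize(w, threshold_ratio=0.7):
    """Quantize w to alpha * t with t in {-1, 0, +1} (threshold-and-scale heuristic).

    Entries with |w| <= threshold_ratio * mean(|w|) become 0; the rest keep their sign.
    For that fixed pattern t, alpha = mean |w| over the kept entries minimizes
    ||w - alpha * t||^2.
    """
    w = np.asarray(w, dtype=float)
    delta = threshold_ratio * np.mean(np.abs(w))
    keep = np.abs(w) > delta
    ternary = np.sign(w) * keep  # values in {-1, 0, +1}
    alpha = float(np.abs(w[keep]).mean()) if keep.any() else 0.0
    return alpha, ternary
```

Storing `ternary` needs under two bits per weight plus one shared float `alpha`, which is the kind of memory saving that motivates deployment on low-power devices.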
-
Label Learning Method Based on Tensor Projection
Authors:
Jing Li,
Quanxue Gao,
Qianqian Wang,
Cheng Deng,
Deyan Xie
Abstract:
Multi-view clustering methods based on anchor graphs have attracted wide attention due to their high efficiency and effectiveness. To avoid post-processing, most existing anchor graph-based methods learn bipartite graphs with connected components. However, such methods depend heavily on parameter settings, and in some cases it may not be possible to obtain bipartite graphs with clear connected components. To this end, we propose a label learning method based on tensor projection (LLMTP). Specifically, we project the anchor graph into the label space through an orthogonal projection matrix to obtain cluster labels directly. Since the spatial structure information of multi-view data may be partly ignored when each view is projected separately, we extend the matrix projection transformation to tensor projection, so that the spatial structure information across views can be fully utilized. In addition, we introduce tensor Schatten $p$-norm regularization to make the clustering label matrices of different views as consistent as possible. Extensive experiments demonstrate the effectiveness of the proposed method.
Submitted 26 February, 2024;
originally announced February 2024.
-
A PNP ion channel deep learning solver with local neural network and finite element input data
Authors:
Hwi Lee,
Zhen Chao,
Harris Cobb,
Yingjie Liu,
Dexuan Xie
Abstract:
In this paper, a deep learning method for solving an improved one-dimensional Poisson-Nernst-Planck ion channel (PNPic) model, called the PNPic deep learning solver, is presented. In particular, it combines a novel local neural network scheme with an effective PNPic finite element solver. Since the input data of the neural network scheme involve only a small local patch of coarse grid solutions, which the finite element solver can quickly produce, the PNPic deep learning solver can be trained much faster than any corresponding conventional global neural network solver. Once properly trained, it can output a predicted PNPic solution with much higher accuracy than the low-cost coarse grid solutions, and it can reflect different perturbation cases on the parameters, ion channel subregions, interface and boundary values, etc. Consequently, the PNPic deep learning solver can generate highly accurate numerical solutions for a family of PNPic models. As an initial study, two types of numerical tests were done by perturbing one and two parameters of the PNPic model, respectively, along with tests that used a few perturbed interface positions of the model as training samples. These tests demonstrate that the PNPic deep learning solver can generate highly accurate PNPic numerical solutions.
Submitted 30 March, 2024; v1 submitted 30 January, 2024;
originally announced January 2024.
-
Carve3D: Improving Multi-view Reconstruction Consistency for Diffusion Models with RL Finetuning
Authors:
Desai Xie,
Jiahao Li,
Hao Tan,
Xin Sun,
Zhixin Shu,
Yi Zhou,
Sai Bi,
Sören Pirk,
Arie E. Kaufman
Abstract:
Multi-view diffusion models, obtained by applying Supervised Finetuning (SFT) to text-to-image diffusion models, have driven recent breakthroughs in text-to-3D research. However, due to the limited size and quality of existing 3D datasets, they still suffer from multi-view inconsistencies and Neural Radiance Field (NeRF) reconstruction artifacts. We argue that multi-view diffusion models can benefit from further Reinforcement Learning Finetuning (RLFT), which allows models to learn from the data generated by themselves and improve beyond their dataset limitations during SFT. To this end, we introduce Carve3D, an improved RLFT algorithm coupled with a novel Multi-view Reconstruction Consistency (MRC) metric, to enhance the consistency of multi-view diffusion models. To measure the MRC metric on a set of multi-view images, we compare them with their corresponding NeRF renderings at the same camera viewpoints. The resulting model, which we denote as Carve3DM, demonstrates better multi-view consistency and NeRF reconstruction quality than existing models. Our results suggest that pairing SFT with Carve3D's RLFT is essential for developing multi-view-consistent diffusion models, mirroring the standard Large Language Model (LLM) alignment pipeline. Our code, training and testing data, and video results are available at: https://desaixie.github.io/carve-3d.
Submitted 9 April, 2024; v1 submitted 21 December, 2023;
originally announced December 2023.
-
New ternary self-orthogonal codes and related LCD codes from weakly regular plateaued functions
Authors:
Dengcheng Xie,
Shixin Zhu,
Yang Li
Abstract:
A linear code is said to be self-orthogonal if it is contained in its dual. Self-orthogonal codes are of interest because of their important applications, such as for constructing linear complementary dual (LCD) codes and quantum codes. In this paper, we construct several new families of ternary self-orthogonal codes by employing weakly regular plateaued functions. Their parameters and weight distributions are completely determined. Then we apply these self-orthogonal codes to construct several new families of ternary LCD codes. As a consequence, we obtain many (almost) optimal ternary self-orthogonal codes and LCD codes.
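The defining property in the first sentence (a code contained in its dual) can be tested directly from a generator matrix. Over a prime field $\mathbb{F}_q$, $C \subseteq C^\perp$ holds exactly when every pair of generator rows, including each row with itself, has zero inner product, i.e. $GG^T = 0 \pmod q$. The sketch below checks this for the ternary tetracode. The tetracode is a standard textbook example, not one of the paper's constructions, and the function name is hypothetical.

```python
import numpy as np


def is_self_orthogonal(generator: np.ndarray, q: int = 3) -> bool:
    """Check C ⊆ C^⊥ for the linear code over GF(q) spanned by the rows of
    `generator`. Assumes q is prime, so arithmetic mod q is field arithmetic."""
    g = np.asarray(generator, dtype=np.int64) % q
    return bool(np.all((g @ g.T) % q == 0))


# Ternary tetracode [4, 2, 3]: self-dual, hence self-orthogonal.
tetracode = np.array([[1, 1, 1, 0],
                      [0, 1, 2, 1]])
print(is_self_orthogonal(tetracode))                 # True
print(is_self_orthogonal(np.array([[1, 1, 0, 0]])))  # False: 1 + 1 = 2, not 0 mod 3
```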
Submitted 7 December, 2023;
originally announced December 2023.
-
Thermometry with a Dissipative Heavy Impurity
Authors:
Dong Xie,
Chunling Xu
Abstract:
Improving the precision of low-temperature measurements is important for both fundamental science and advanced quantum technology applications. However, the measurement precision of a temperature $T$ usually diverges as $T$ tends to 0. Here, by using a heavy impurity to measure the temperature of a Bose gas, we obtain the Landau bound on precision, $δ^2 T\propto T^2$, which avoids the divergence. Moreover, when the initial momentum of the heavy impurity is fixed and nonzero, the measurement precision can scale as $δ^2 T\propto T^3$, beating the Landau bound. We derive the momentum distribution of the heavy impurity at any moment and obtain the optimal measurement precision of the temperature by calculating the Fisher information. We find that increasing the expectation value of the initial momentum can help improve the measurement precision. In addition, momentum measurement is the optimal temperature measurement when the initial momentum is fixed and nonzero, whereas kinetic energy measurement is optimal when the expectation value of the initial momentum is 0. Finally, we show that the temperatures of two Bose gases can be measured simultaneously; the simultaneous measurement precision is proportional to $T^2$ when both temperatures are close to $T$.
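A standard textbook calculation shows how the $T^2$ scaling arises in the equilibrium case. It is offered as an illustration under stated assumptions, not as the paper's derivation. Suppose the impurity (mass $M$) has fully thermalized, so that one momentum component $P$ follows a zero-mean Maxwell–Boltzmann (Gaussian) distribution with variance $\sigma^2 = M k_B T$. For a zero-mean Gaussian whose variance depends on the parameter, the Fisher information is $(\partial_T \sigma^2)^2 / (2\sigma^4)$. Over $\nu$ repetitions, the Cramér–Rao bound then gives:

```latex
p(P \mid T) = \frac{1}{\sqrt{2\pi M k_B T}}
              \exp\!\left(-\frac{P^2}{2 M k_B T}\right),
\qquad
F(T) = \frac{\left(\partial_T \sigma^2\right)^2}{2\sigma^4}
     = \frac{(M k_B)^2}{2 (M k_B T)^2}
     = \frac{1}{2T^2},
\qquad
\delta^2 T \ge \frac{1}{\nu F(T)} = \frac{2T^2}{\nu}.
```

The relative error $\delta T / T \ge \sqrt{2/\nu}$ therefore stays finite as $T \to 0$ instead of diverging.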
Submitted 1 December, 2023;
originally announced December 2023.
-
On rank two theories with eight supercharges part II: Lefschetz pencils
Authors:
Dan Xie
Abstract:
The global Seiberg-Witten (SW) geometries for rank two theories with eight supercharges are studied. The theory is deformed generically so that there are only the simplest $I_1$ or $\tilde{I}_1$ singularities on the Coulomb branch, which geometrically give the so-called Lefschetz pencils. The local singularity was shown to be determined by the conjugacy class of the mapping class group (MCG); the global study is then reduced to the following questions about the MCG: a) find the factorization of the MCG element of the singular fiber into positive products of Dehn twists (which give the $I_1$ or $\tilde{I}_1$ singularities); b) find the factorization of the identity element in terms of Dehn twists. We solve these two MCG problems for most rank two theories. The results are very helpful in determining the IR physics of all vacua of 4d SCFTs. Our approach is combinatorial, and many aspects can be straightforwardly generalized to the study of higher rank theories.
Submitted 27 November, 2023;
originally announced November 2023.
-
MCAD: Multi-teacher Cross-modal Alignment Distillation for efficient image-text retrieval
Authors:
Youbo Lei,
Feifei He,
Chen Chen,
Yingbin Mo,
Si Jia Li,
Defeng Xie,
Haonan Lu
Abstract:
Due to the success of large-scale visual-language pretraining (VLP) models and the widespread use of image-text retrieval in industry, it is now critically necessary to reduce model size and streamline mobile-device deployment. Single- and dual-stream model structures are commonly used in image-text retrieval with the goal of closing the semantic gap between textual and visual modalities. While single-stream models use deep feature fusion to achieve more accurate cross-modal alignment, dual-stream models are better at offline indexing and fast inference. We propose a Multi-teacher Cross-modality Alignment Distillation (MCAD) technique to integrate the advantages of single- and dual-stream models. By incorporating the fused single-stream features into the image and text features of the dual-stream model, we formulate new modified teacher similarity distributions and features. Then, we conduct both distribution and feature distillation to boost the capability of the student dual-stream model, achieving high retrieval performance without increasing inference complexity. Extensive experiments demonstrate the remarkable performance and high efficiency of MCAD on image-text retrieval tasks. Furthermore, we implement a lightweight CLIP model on Snapdragon/Dimensity chips with only $\sim$100M running memory and $\sim$8.0ms search latency, enabling mobile-device deployment of VLP models.
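MCAD distills teacher similarity distributions into a dual-stream student. A common way to implement distribution distillation is to softmax-normalize each row of an image-to-text similarity matrix with a temperature. The loss is then the KL divergence from each teacher row to the corresponding student row. This formulation is assumed here and is not taken from the paper. The helper names and the temperature `tau` are also hypothetical.

```python
import numpy as np


def row_softmax(x: np.ndarray, tau: float) -> np.ndarray:
    """Softmax over each row of x / tau."""
    z = x / tau
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def distribution_distillation_loss(teacher_sim: np.ndarray,
                                   student_sim: np.ndarray,
                                   tau: float = 0.05) -> float:
    """Mean over rows of KL(teacher || student) between similarity distributions.

    teacher_sim, student_sim: (num_images, num_texts) similarity matrices,
    e.g. cosine similarities of L2-normalized embeddings.
    """
    p_t = row_softmax(np.asarray(teacher_sim, dtype=float), tau)
    p_s = row_softmax(np.asarray(student_sim, dtype=float), tau)
    eps = 1e-12  # guards against log(0)
    kl = np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=1)
    return float(kl.mean())


teacher = np.array([[0.9, 0.1], [0.2, 0.8]])
student = np.array([[0.6, 0.4], [0.5, 0.5]])
loss = distribution_distillation_loss(teacher, student)  # positive; zero when rows match
```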
Submitted 1 April, 2024; v1 submitted 30 October, 2023;
originally announced October 2023.
-
Adapt Anything: Tailor Any Image Classifiers across Domains And Categories Using Text-to-Image Diffusion Models
Authors:
Weijie Chen,
Haoyu Wang,
Shicai Yang,
Lei Zhang,
Wei Wei,
Yanning Zhang,
Luojun Lin,
Di Xie,
Yueting Zhuang
Abstract:
We do not pursue a novel method in this paper, but aim to study whether a modern text-to-image diffusion model can tailor any task-adaptive image classifier across domains and categories. Existing domain adaptive image classification works exploit both source and target data for domain alignment so as to transfer the knowledge learned from the labeled source data to the unlabeled target data. However, with the development of text-to-image diffusion models, we wonder whether the high-fidelity synthetic data from a text-to-image generator can serve as a surrogate for real-world source data. In this way, we do not need to collect and annotate source data for each domain adaptation task in a one-for-one manner. Instead, we utilize only one off-the-shelf text-to-image model to synthesize images with category labels derived from the corresponding text prompts. We then leverage the surrogate data as a bridge to transfer the knowledge embedded in the task-agnostic text-to-image generator to the task-oriented image classifier via domain adaptation. Such a one-for-all adaptation paradigm allows us to adapt anything in the world using only one text-to-image generator together with the corresponding unlabeled target data. Extensive experiments validate the feasibility of the proposed idea, which even surpasses state-of-the-art domain adaptation works that use source data collected and annotated in the real world.
Submitted 25 October, 2023;
originally announced October 2023.
-
Three dimensional quotient singularity and 4d $\mathcal{N}=1$ AdS/CFT correspondence
Authors:
Yuanyuan Fang,
Jing Feng,
Dan Xie
Abstract:
We systematically study the AdS/CFT correspondence induced by D3-branes probing three-dimensional Gorenstein quotient singularities $\mathbb{C}^3/G$. The field theory is given by the McKay quiver, which has a vanishing NSVZ beta function assuming that all the chiral fields have $U(1)_R$ charge $\frac{2}{3}$. Various physical quantities, such as the quiver Hilbert series, superconformal index, and central charges, are computed, and they match exactly with those computed from the singularity. We also study relevant deformations of these theories and find the dual geometries, thereby generating many new interesting AdS/CFT pairs. The quiver gauge theories defined using finite subgroups of $SO(3)$ have some interesting features, for example in their Seiberg duality behavior.
Submitted 24 October, 2023;
originally announced October 2023.
-
Zero-Shot Emotion Transfer For Cross-Lingual Speech Synthesis
Authors:
Yuke Li,
Xinfa Zhu,
Yi Lei,
Hai Li,
Junhui Liu,
Danming Xie,
Lei Xie
Abstract:
Zero-shot emotion transfer in cross-lingual speech synthesis aims to transfer emotion from an arbitrary speech reference in the source language to the synthetic speech in the target language. Building such a system faces challenges of unnatural foreign accents and difficulty in modeling the shared emotional expressions of different languages. Building on the DelightfulTTS neural architecture, this paper addresses these challenges by introducing specifically-designed modules to model the language-specific prosody features and language-shared emotional expressions separately. Specifically, the language-specific speech prosody is learned by a non-autoregressive predictive coding (NPC) module to improve the naturalness of the synthetic cross-lingual speech. The shared emotional expression between different languages is extracted from a pre-trained self-supervised model HuBERT with strong generalization capabilities. We further use hierarchical emotion modeling to capture more comprehensive emotions across different languages. Experimental results demonstrate the proposed framework's effectiveness in synthesizing bi-lingual emotional speech for the monolingual target speaker without emotional training data.
Submitted 5 October, 2023;
originally announced October 2023.
-
Hyperelliptic families and 4d $\mathcal{N}=2$ SCFT
Authors:
Dan Xie,
Zekai Yu
Abstract:
We classify four dimensional $\mathcal{N}=2$ SCFTs whose Seiberg-Witten (SW) geometries can be written as hyperelliptic families. Using the special Kähler condition of the SW geometry, we reduce the problem to one-parameter quasi-homogeneous hyperelliptic families $y^2=f(x,t)$. The classification is given by further demanding that the complex algebraic surface defined by $y^2=f(x,t)$ has an isolated singularity. We then write down the full SW geometry by looking at mini-versal deformations of the one-parameter family, and the SW differential is also written down. The detailed physical data for these theories are found by matching them with other known constructions. Our solutions recover the known rank one and rank two results, and give some infinite sequences valid at arbitrary ranks. We also study $Z_2$ quotients of the above hyperelliptic families, which give rise to $B$ type and $D$ type conformal gauge theories, and further generalizations.
Submitted 4 October, 2023;
originally announced October 2023.
-
An Effective Two-stage Training Paradigm Detector for Small Dataset
Authors:
Zheng Wang,
Dong Xie,
Hanzhi Wang,
Jiang Tian
Abstract:
Pre-training a model from a limited amount of labeled data has always been viewed as a challenging task. In this report, an effective and robust solution, the two-stage training paradigm YOLOv8 detector (TP-YOLOv8), is designed for the object detection track of the VIPriors Challenge 2023. First, the backbone of YOLOv8 is pre-trained as the encoder using the masked image modeling technique. Then the detector is fine-tuned with elaborate augmentations. During the test stage, test-time augmentation (TTA) is used to enhance each model, and weighted box fusion (WBF) is implemented to further boost the performance. With this design, our approach achieves 30.4% average precision (averaged over IoU thresholds from 0.50 to 0.95) on the DelftBikes test set, ranking 4th on the leaderboard.
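The fusion step can be illustrated with a simplified, single-class version of weighted boxes fusion (WBF) as described by Solovyev et al. The procedure works as follows:

1. Sort the boxes from all models by confidence.
2. Greedily assign each box to the fused box it overlaps most, provided the IoU exceeds a threshold; otherwise start a new cluster.
3. Recompute each fused box as the score-weighted average of its members.
4. Scale the cluster confidence down when fewer boxes than models support it.

The IoU threshold of 0.55, the function names, and the omission of class labels and per-model weights are assumptions of this sketch, not details of the TP-YOLOv8 setup.

```python
import numpy as np


def iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return float(inter / union) if union > 0 else 0.0


def weighted_box_fusion(boxes_per_model, scores_per_model, iou_thr=0.55):
    """Simplified single-class WBF: returns a list of (fused_box, confidence)."""
    n_models = len(boxes_per_model)
    entries = [(float(s), np.asarray(b, dtype=float))
               for boxes, scores in zip(boxes_per_model, scores_per_model)
               for b, s in zip(boxes, scores)]
    entries.sort(key=lambda e: e[0], reverse=True)  # most confident boxes first

    clusters, fused = [], []  # member (score, box) pairs and the fused box per cluster
    for score, box in entries:
        overlaps = [iou(f, box) for f in fused]
        if overlaps and max(overlaps) > iou_thr:
            i = int(np.argmax(overlaps))  # best-matching fused box
            clusters[i].append((score, box))
            w = np.array([s for s, _ in clusters[i]])
            stacked = np.stack([bx for _, bx in clusters[i]])
            fused[i] = (w[:, None] * stacked).sum(axis=0) / w.sum()  # score-weighted mean
        else:
            clusters.append([(score, box)])
            fused.append(box.copy())

    results = []
    for members, box in zip(clusters, fused):
        mean_score = float(np.mean([s for s, _ in members]))
        # Down-weight clusters supported by fewer boxes than there are models.
        conf = mean_score * min(len(members), n_models) / n_models
        results.append((box, conf))
    return results


boxes_a, scores_a = [[0, 0, 10, 10], [50, 50, 60, 60]], [0.9, 0.8]
boxes_b, scores_b = [[1, 1, 11, 11]], [0.6]
demo = weighted_box_fusion([boxes_a, boxes_b], [scores_a, scores_b])
```

Unlike non-maximum suppression, which keeps one box and discards the rest, WBF averages the coordinates of overlapping predictions. This is why it suits ensembles of TTA-enhanced models.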
Submitted 11 September, 2023;
originally announced September 2023.
-
An Alternative Formation Scenario for Uranium-rich Giants: Engulfing an Earth-like Planet
Authors:
Dian Xie,
Chunhua Zhu,
Sufen Guo,
Helei Liu,
Guoliang Lü
Abstract:
The actinides, such as uranium (U), are typically synthesized through the rapid neutron-capture process (r-process), which can occur in core-collapse supernovae or double neutron star mergers. There exist nine r-process giant stars exhibiting conspicuous U abundances, commonly referred to as U-rich giants. However, the origins of these U-rich giants remain ambiguous. We propose an alternative formation scenario for these U-rich giants whereby a red giant (RG) engulfs an Earth-like planet. To approximate the process of an RG engulfing an Earth-like planet, we employ an accretion model wherein the RG assimilates material from the planet. Our findings demonstrate that this engulfment event can considerably enhance the presence of heavy elements originating from Earth-like planets on the surfaces of very metal-poor stars (Z = 0.00001), while its impact on solar-metallicity stars is comparatively modest. Importantly, the structural and evolutionary properties of both very metal-poor and solar-metallicity stars remain largely unaffected. Notably, our engulfment model effectively accounts for the observed U abundances in known U-rich giants. Furthermore, the evolutionary trajectories of U abundances on the surfaces of RGs subsequent to the engulfment of Earth-like planets encompass all known U-rich giants. Therefore, it is plausible that U-rich giants are formed when an RG engulfs an Earth-like planet.
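A simple mass balance shows why the effect is large for very metal-poor stars but modest at solar metallicity. Assume, as a simplification that is not the paper's accretion model, that a planet of mass $M_p$ with uranium mass fraction $X_p$ is instantly and fully mixed into a convective envelope of mass $M_{\rm env}$ with uranium mass fraction $X_{\rm env}$:

```latex
X_{\rm new} = \frac{M_{\rm env} X_{\rm env} + M_p X_p}{M_{\rm env} + M_p},
\qquad
\frac{X_{\rm new}}{X_{\rm env}} \approx 1 + \frac{M_p}{M_{\rm env}}\left(\frac{X_p}{X_{\rm env}} - 1\right)
\quad (M_p \ll M_{\rm env}).
```

$X_{\rm env}$ roughly tracks the star's metallicity. The ratio $X_p/X_{\rm env}$, and hence the surface enhancement, is therefore far larger at $Z = 0.00001$ than at solar metallicity for the same engulfed planet.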
Submitted 12 July, 2023;
originally announced July 2023.
-
ShredGP: Guitarist Style-Conditioned Tablature Generation
Authors:
Pedro Sarmento,
Adarsh Kumar,
Dekun Xie,
CJ Carr,
Zack Zukowski,
Mathieu Barthet
Abstract:
GuitarPro format tablatures are a type of digital music notation that encapsulates information about guitar playing techniques and fingerings. We introduce ShredGP, a GuitarPro tablature generative Transformer-based model conditioned to imitate the style of four distinct iconic electric guitarists. In order to assess the idiosyncrasies of each guitar player, we adopt a computational musicology methodology by analysing features computed from the tokens yielded by the DadaGP encoding scheme. Statistical analyses of the features reveal significant differences between the four guitarists. We trained two variants of the ShredGP model, one using a multi-instrument corpus, the other using solo guitar data. We present a BERT-based model for guitar player classification and use it to evaluate the generated examples. Overall, results from the classifier show that ShredGP is able to generate content congruent with the style of the targeted guitar player. Finally, we reflect on prospective applications of ShredGP in human-AI music interaction.
Submitted 11 July, 2023;
originally announced July 2023.
-
Mirror symmetry for circle compactified 4d $\mathcal{N}=2$ SCFTs
Authors:
Peng Shan,
Dan Xie,
Wenbin Yan
Abstract:
We propose a mirror symmetry for 4d $\mathcal{N}=2$ superconformal field theories (SCFTs) compactified on a circle with finite size. The mirror symmetry involves vertex operator algebra (VOA) describing the Schur sector (containing Higgs branch) of 4d theory, and the Coulomb branch of the effective 3d theory. The basic feature of the mirror symmetry is that many representational properties of VOA are matched with geometric properties of the Coulomb branch moduli space. Our proposal is verified for a large class of Argyres-Douglas (AD) theories engineered from M5 branes, whose VOAs are W-algebras, and Coulomb branches are the Hitchin moduli spaces. VOA data such as simple modules, Zhu's algebra, and modular properties are matched with geometric properties like $\mathbb{C}^*$-fixed varieties in Hitchin fibers, cohomologies, and some DAHA representations. We also mention relationships to 3d symplectic duality.
Submitted 27 June, 2023;
originally announced June 2023.
-
The Skill-Task Matching Model: Mechanism, Model Structure, and Algorithm
Authors:
Da Xie,
WeiGuo Yang
Abstract:
We distinguished between the expected and actual profit of a firm. We proposed that, beyond maximizing profit, a firm's goal also encompasses minimizing the gap between expected and actual profit. Firms strive to enhance their capability to transform projects into reality through a process of trial and error, evident as a cyclical iterative optimization process. To characterize this iterative mechanism, we developed the Skill-Task Matching Model, extending the task approach in both multidimensional and iterative manners. We vectorized jobs and employees into task and skill vector spaces, respectively, while treating production techniques as a skill-task matching matrix and business strategy as a task value vector. In our model, the process of stabilizing production techniques and optimizing business strategies corresponds to the recalibration of parameters within the skill-task matching matrix and the task value vector. We constructed a feed-forward neural network algorithm to run this model and demonstrated how it can augment operational efficiency.
Submitted 31 October, 2023; v1 submitted 21 June, 2023;
originally announced June 2023.
-
Impact of Large Language Models on Generating Software Specifications
Authors:
Danning Xie,
Byungwoo Yoo,
Nan Jiang,
Mijung Kim,
Lin Tan,
Xiangyu Zhang,
Judy S. Lee
Abstract:
Software specifications are essential for ensuring the reliability of software systems. Existing specification extraction approaches, however, suffer from limited generalizability and require manual efforts. The recent emergence of Large Language Models (LLMs), which have been successfully applied to numerous software engineering tasks, offers a promising avenue for automating this process. In this paper, we conduct the first empirical study to evaluate the capabilities of LLMs for generating software specifications from software comments or documentation. We evaluate LLMs' performance with Few Shot Learning (FSL), which enables LLMs to generalize from a small number of examples, as well as with different prompt construction strategies, and compare the performance of LLMs with traditional approaches. Additionally, we conduct a comparative diagnosis of the failure cases from both LLMs and traditional methods, identifying their unique strengths and weaknesses. Lastly, we conduct extensive experiments on 15 state-of-the-art LLMs, evaluating their performance and cost-effectiveness for generating software specifications.
Our results show that with FSL, LLMs outperform traditional methods (by 5.6%), and more sophisticated prompt construction strategies can further enlarge this performance gap (by 5.1 to 10.0%). Yet, LLMs suffer from their own unique challenges, such as ineffective prompts and the lack of domain knowledge, which together account for 53 to 60% of LLM-unique failures. The strong performance of open-source models (e.g., StarCoder) makes closed-source models (e.g., GPT 3 Davinci) less desirable due to size and cost. Our study offers valuable insights for future research to improve specification generation.
Submitted 2 October, 2023; v1 submitted 5 June, 2023;
originally announced June 2023.
-
The Effect of Ionic Spin on Multiferroic of Orthorhombic Perovskite
Authors:
Kaiyang Gao,
Jiyu Shen,
Zeyi Lu,
Jiajun Mo,
Guoqing Liu,
Zhongjin Wu,
Chenying Gong,
Dong Xie,
Yanfang Xia,
Min Liu
Abstract:
To investigate the influence of ionic spin on the coupling between ferromagnetism and ferroelectricity in type-II multiferroic perovskites, we prepared the multiferroic perovskite Er0.9La0.1Cr0.8Fe0.2O3 (ELCFO) using the sol-gel method, and explored the macroscopic magnetic properties of ELCFO through Mössbauer spectroscopy and magnetic measurements. The thermomagnetic curve was analyzed to examine the state and change of each ionic spin in the ELCFO system over different temperature ranges, and the role of ionic spin in the coupling between ferromagnetism and ferroelectricity was investigated. This study provides a theoretical basis for further research on multiferroic perovskites and has practical implications.
Submitted 27 May, 2023;
originally announced June 2023.
-
Leveraging Generative Models to Recover Variable Names from Stripped Binary
Authors:
Xiangzhe Xu,
Zhuo Zhang,
Zian Su,
Ziyang Huang,
Shiwei Feng,
Yapeng Ye,
Nan Jiang,
Danning Xie,
Siyuan Cheng,
Lin Tan,
Xiangyu Zhang
Abstract:
Decompilation aims to recover the source code form of a binary executable. It has many security applications such as malware analysis, vulnerability detection and code hardening. A prominent challenge in decompilation is to recover variable names. We propose a novel technique that leverages the strengths of generative models while suppressing potential hallucinations and overcoming the input token limitation. We build a prototype, GenNm, from a pre-trained generative model Code-Llama. We fine-tune GenNm on decompiled functions, and leverage program analysis to validate the results produced by the generative model. GenNm includes names from callers and callees while querying a function, providing rich contextual information within the model's input token limitation. Our results show that GenNm improves the state-of-the-art from 48.1% to 57.9% in the most challenging setup where a query function is not seen in the training dataset.
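GenNm adds names from callers and callees to a query while staying within the model's input token limit, which amounts to packing context under a budget. The sketch below is minimal and illustrative, not GenNm's actual procedure. The function name is hypothetical, a whitespace-based token count stands in for a real tokenizer, and the greedy callers-then-callees policy is an assumption that the paper does not specify.

```python
def build_query(target: str, callers: list[str], callees: list[str], budget: int) -> str:
    """Pack a target function with as much caller/callee context as fits.

    Hypothetical helper: token counts are approximated by whitespace
    splitting, and callers are tried before callees (an assumed policy).
    The target itself is always included.
    """
    def n_tokens(text: str) -> int:
        return len(text.split())

    parts = [target]
    used = n_tokens(target)
    for snippet in callers + callees:
        cost = n_tokens(snippet)
        if used + cost > budget:
            continue  # skip context that would exceed the token budget
        parts.append(snippet)
        used += cost
    return "\n\n".join(parts)


query = build_query("int f ( int a1 )", ["call f ( v3 )"], ["g ( a1 , buf , len )"], budget=12)
# The 6-token target and 5-token caller fit; the 8-token callee would exceed 12 and is skipped.
```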
△ Less
Submitted 30 April, 2024; v1 submitted 4 June, 2023;
originally announced June 2023.
-
Single Domain Dynamic Generalization for Iris Presentation Attack Detection
Authors:
Yachun Li,
Jingjing Wang,
Yuhui Chen,
Di Xie,
Shiliang Pu
Abstract:
Iris presentation attack detection (PAD) has achieved great success under intra-domain settings but easily degrades on unseen domains. Conventional domain generalization methods mitigate the gap by learning domain-invariant features. However, they ignore the discriminative information in the domain-specific features. Moreover, we usually face a more realistic scenario in which only a single domain is available for training. To tackle the above issues, we propose a Single Domain Dynamic Generalization (SDDG) framework, which simultaneously exploits domain-invariant and domain-specific features on a per-sample basis and learns to generalize to various unseen domains with numerous natural images. Specifically, a dynamic block is designed to adaptively adjust the network with a dynamic adaptor, and an information maximization loss is further incorporated to increase diversity. The whole network is integrated into the meta-learning paradigm. We generate amplitude-perturbed images and cover diverse domains with natural images, so the network can learn to generalize to the perturbed domains in the meta-test phase. Extensive experiments show that the proposed method is effective and outperforms the state of the art on the LivDet-Iris 2017 dataset.
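The abstract mentions amplitude-perturbed images but does not give the perturbation. One common Fourier-domain scheme keeps an image's phase and interpolates its amplitude spectrum with that of another image, here a natural image. The sketch below is offered only as an illustration of that general idea, not as SDDG's exact procedure. For brevity it works on a 1-D signal with a naive DFT; the 2-D image case applies the same step to a 2-D transform. `amplitude_perturb` and the mixing weight `lam` are hypothetical names.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a sequence of numbers."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(X):
    """Inverse of `dft`."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n for t in range(n)]


def amplitude_perturb(src, nat, lam):
    """Mix the Fourier amplitude of `src` with that of `nat`, keeping the phase of `src`."""
    if len(src) != len(nat):
        raise ValueError("signals must have the same length")
    S, N = dft(src), dft(nat)
    mixed = []
    for s, m in zip(S, N):
        amp = (1 - lam) * abs(s) + lam * abs(m)  # interpolated amplitude
        mixed.append(amp * cmath.exp(1j * cmath.phase(s)))  # phase taken from src
    # Real inputs give a conjugate-symmetric spectrum, so the result is real up to rounding.
    return [v.real for v in idft(mixed)]


print(amplitude_perturb([1.0, 2.0, 0.5, -1.0], [0.0, 1.0, 0.0, 1.0], 0.5))
```

Keeping the phase tends to preserve structure while the amplitude carries much of the low-level "style", which is why this family of augmentations is used to simulate domain shift. With `lam = 0` the input is returned unchanged.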
Submitted 22 May, 2023;
originally announced May 2023.
-
Quantum estimation of tripartite coupling in Spin-Magnon-Mechanical Hybrid Systems
Authors:
Dong Xie,
Chunling Xu
Abstract:
Tripartite interactions play a fundamental role in quantum information processing and quantum technology. However, it is generally difficult to realize strong tripartite coupling. We investigate the estimation of a tripartite coupling strength in a hybrid setup composed of a single nitrogen-vacancy (NV) center and a micromagnet. A time-independent parametric drive can be utilized to increase the estimation precision of the tripartite coupling strength. By calculating the quantum Fisher information (QFI), we obtain the optimal estimation precision by measuring the eigenstate of the tripartite system. At the critical position, the QFI diverges because the preparation time of the eigenstate diverges. When the system is subject to dissipation, the QFI near the critical point of the driven-dissipative phase transition is obtained analytically. Direct intensity measurement is the optimal measurement near the dissipative phase transition point. In addition, we quantify the robustness of an imperfect measurement operator by the measurement noise susceptibility based on the error propagation formula. We find that direct intensity measurement is sufficiently robust against small measurement disturbances from a coherent drive, but it can be disturbed by nonlinear anti-harmonic measurement noise, especially near the critical point.
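As background for the QFI used above, the sketch below computes the QFI of a pure-state family numerically. It uses the fidelity expansion $|\langle\psi(\theta)|\psi(\theta+d)\rangle| \approx 1 - F_Q d^2/8$ and the quantum Cramér-Rao bound $\delta\theta \ge 1/\sqrt{\nu F_Q}$ for $\nu$ repetitions. The two qubit families are illustrative toy examples with known QFI (1 and 4), not the NV-micromagnet model of the paper. The function names are hypothetical.

```python
import cmath
import math


def overlap(a, b):
    """Inner product <a|b> of two state vectors given as lists of complex amplitudes."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def qfi_pure(state, theta, d=1e-3):
    """Finite-difference QFI of a pure-state family |psi(theta)>.

    Uses |<psi(theta)|psi(theta + d)>| ≈ 1 - F_Q d^2 / 8 for small d.
    """
    fidelity = abs(overlap(state(theta), state(theta + d)))
    return 8.0 * (1.0 - fidelity) / d**2


def cramer_rao_bound(fq, repetitions):
    """Quantum Cramér-Rao bound on the standard deviation of an unbiased estimator."""
    return 1.0 / math.sqrt(repetitions * fq)


def rotated_qubit(theta):
    # cos(theta/2)|0> + sin(theta/2)|1>, for which F_Q = 1 exactly
    return [complex(math.cos(theta / 2)), complex(math.sin(theta / 2))]


def phase_qubit(theta):
    # (|0> + e^{2i theta}|1>) / sqrt(2), for which F_Q = 4 exactly
    return [1 / math.sqrt(2), cmath.exp(2j * theta) / math.sqrt(2)]


fq = qfi_pure(phase_qubit, 0.3)
print(fq, cramer_rao_bound(fq, 100))
```

A diverging QFI, as at the critical point described in the abstract, would make this bound go to zero. The finite-difference step `d` trades truncation error against floating-point cancellation.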
Submitted 21 May, 2023;
originally announced May 2023.
-
Superconformal indices of $\mathcal{N}=4$ Chern-Simons matter theories
Authors:
Bohan Li,
Dan Xie,
WenBin Yan
Abstract:
Gaiotto and Witten found that one can construct 3d $\mathcal{N}=4$ Chern-Simons matter theories by using $\mathcal{N}=4$ SCFTs whose momentum maps of global symmetries satisfy a special condition. Usually, one uses free hypermultiplets and twisted hypermultiplets, and more recently it was found that strongly coupled theories such as the 3d version of the $T_N$ theory and Argyres-Douglas matter can also be used. In this paper, we compute the superconformal indices of these $\mathcal{N}=4$ theories and derive their Coulomb/Higgs limits. Our results determine the moduli space of vacua, which is used to check various interesting mirror symmetries involving CSM theories and usual $\mathcal{N}=4$ gauge theories.
Submitted 15 May, 2023; v1 submitted 15 May, 2023;
originally announced May 2023.
-
Edit Everything: A Text-Guided Generative System for Images Editing
Authors:
Defeng Xie,
Ruichen Wang,
Jian Ma,
Chen Chen,
Haonan Lu,
Dong Yang,
Fobo Shi,
Xiaodong Lin
Abstract:
We introduce a new generative system called Edit Everything, which can take image and text inputs and produce image outputs. Edit Everything allows users to edit images using simple text instructions. Our system designs prompts to guide the visual module in generating the requested images. Experiments demonstrate that Edit Everything facilitates the implementation of the visual aspects of Stable Diffusion with the use of the Segment Anything Model and CLIP. Our system is publicly available at https://github.com/DefengXie/Edit_Everything.
Submitted 27 April, 2023;
originally announced April 2023.
-
Pseudo-periodic map and classification of theories with eight supercharges
Authors:
Dan Xie
Abstract:
The classification of one-parameter local Coulomb branch solutions of theories with eight supercharges is given, under the assumption that the solution is a genus $g$ fibration of Riemann surfaces. The crucial point is that a certain conjugacy class (a so-called pseudo-periodic map of negative type) in the mapping class group determines the topological type of the degeneration. The classification of such conjugacy classes has a simple combinatorial description. Each such conjugacy class gives rise to a dual graph from which a 3d mirror quiver gauge theory can be derived; this theory is then used to identify the low energy theory (assuming a generic deformation). Some global Seiberg-Witten geometries are given by using the topological data of the degeneration. The geometric setup unifies 4d $\mathcal{N}=2$ SCFTs (such as the $T_n$ theory and Argyres-Douglas theory), 5d $\mathcal{N}=1$ SCFTs, 6d $(1,0)$ SCFTs, 4d IR free theories, and 4d asymptotically free theories in a single combinatorial framework.
Submitted 26 April, 2023;
originally announced April 2023.
-
A Hardy-Littlewood type Theorem and a Heinz type inequality
Authors:
Shaolin Chen,
Hidetaka Hamada,
Dou Xie
Abstract:
The main aim of this paper is to investigate the Hardy-Littlewood type Theorem and the Heinz type inequality on functions induced by a differential operator. We first prove a more general Hardy-Littlewood type theorem for the Dirichlet solution of a differential operator that depends on $α>0$ over the unit ball $\mathbb{B}^n$ of $\mathbb{R}^n$ with $n\geq 2$, related to the Lipschitz type space defined by a fast majorant. We find that the case $α>0$ is completely different from the case $α=0$. Then a more general Heinz type inequality for the Dirichlet solution of a differential operator is also established in the case $α>n-2$.
Submitted 24 April, 2023;
originally announced April 2023.
-
Exceptional point in self-consistent Markovian master equations
Authors:
Dong Xie,
Chunling Xu
Abstract:
An exceptional point (EP) is a non-Hermitian degeneracy at which both eigenvalues and eigenstates coalesce. Using the conventional local Markovian master equation, an EP can be constructed via parity-time (PT) or anti-PT symmetry in a system composed of coupled subsystems. However, the coupling between the two subsystems makes the conventional local Markovian master equation inconsistent. Using the self-consistent Markovian master equation, we show that there is no EP in a system composed of two bosonic subsystems. We further prove that the conventional local master equation is valid only when the coupling strength is much smaller than the difference in resonance frequency between the two subsystems, rather than the resonance frequencies themselves. In a system composed of three bosonic subsystems, an EP can be obtained by adiabatically eliminating one of the three subsystems.
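For context, the conventional picture that the abstract questions can be seen in the textbook PT-symmetric dimer. Two modes at frequency $\omega$ with balanced loss and gain $\gamma$ and coupling $g$ give the effective non-Hermitian Hamiltonian $H = \begin{pmatrix}\omega - i\gamma & g\\ g & \omega + i\gamma\end{pmatrix}$ with eigenvalues $\omega \pm \sqrt{g^2-\gamma^2}$. An EP appears at $g=\gamma$. The sketch below assumes this effective model. It illustrates the local-master-equation intuition only, not the self-consistent treatment of the paper.

```python
import cmath
import math


def pt_dimer_eigenvalues(omega, gamma, g):
    """Eigenvalues of H = [[omega - i*gamma, g], [g, omega + i*gamma]] (balanced loss and gain)."""
    root = cmath.sqrt(g**2 - gamma**2)
    return omega + root, omega - root


def pt_dimer_eigenvector(omega, gamma, g, lam):
    """Normalized eigenvector (g, lam - omega + i*gamma) of H for eigenvalue lam (requires g != 0)."""
    if g == 0:
        raise ValueError("coupling g must be nonzero")
    v = (complex(g), lam - omega + 1j * gamma)
    norm = math.sqrt(abs(v[0]) ** 2 + abs(v[1]) ** 2)
    return (v[0] / norm, v[1] / norm)


# Unbroken phase (g > gamma): real eigenvalues, two distinct eigenvectors.
print(pt_dimer_eigenvalues(1.0, 0.5, 1.0))
# Broken phase (g < gamma): complex-conjugate eigenvalues.
print(pt_dimer_eigenvalues(1.0, 1.0, 0.5))
# EP (g == gamma): eigenvalues and eigenvectors coalesce.
lam_p, lam_m = pt_dimer_eigenvalues(1.0, 1.0, 1.0)
print(pt_dimer_eigenvector(1.0, 1.0, 1.0, lam_p), pt_dimer_eigenvector(1.0, 1.0, 1.0, lam_m))
```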
Submitted 2 April, 2023;
originally announced April 2023.
-
Rethinking the Approximation Error in 3D Surface Fitting for Point Cloud Normal Estimation
Authors:
Hang Du,
Xuejun Yan,
Jingjing Wang,
Di Xie,
Shiliang Pu
Abstract:
Most existing approaches for point cloud normal estimation aim to locally fit a geometric surface and calculate the normal from the fitted surface. Recently, learning-based methods have adopted a routine of predicting point-wise weights to solve the weighted least-squares surface fitting problem. Despite achieving remarkable progress, these methods overlook the approximation error of the fitting problem, resulting in a less accurate fitted surface. In this paper, we first carry out an in-depth analysis of the approximation error in the surface fitting problem. Then, in order to bridge the gap between estimated and precise surface normals, we present two basic design principles: 1) apply the $Z$-direction Transform to rotate local patches for better surface fitting with a lower approximation error; 2) model the error of the normal estimation as a learnable term. We implement these two principles using deep neural networks and integrate them with state-of-the-art (SOTA) normal estimation methods in a plug-and-play manner. Extensive experiments verify that our approaches benefit point cloud normal estimation and push the frontier of state-of-the-art performance on both synthetic and real-world datasets.
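The weighted least-squares fitting step described above can be sketched at its lowest order. The sketch fits a height field $z = dx + ey + f$ to a local patch with per-point weights and reads off the normal $(-d, -e, 1)/\|\cdot\|$. It assumes the patch already faces roughly along $+z$; a height-field fit degrades for steep patches, which is the situation the paper's $Z$-direction Transform addresses by rotating patches first. Here the weights are given by hand rather than predicted by a network, and the function names are hypothetical.

```python
import math


def solve3(A, b):
    """Solve a 3x3 linear system A x = b by Cramer's rule."""
    def det(M):
        return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
                - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
                + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

    D = det(A)
    if abs(D) < 1e-12:
        raise ValueError("degenerate patch: normal equations are singular")
    out = []
    for col in range(3):
        M = [row[:] for row in A]
        for r in range(3):
            M[r][col] = b[r]
        out.append(det(M) / D)
    return out


def weighted_plane_normal(points, weights):
    """Fit z = d*x + e*y + f by weighted least squares and return the unit normal."""
    # Normal equations (X^T W X) beta = X^T W z, with rows X_i = (x_i, y_i, 1).
    A = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for (x, y, z), w in zip(points, weights):
        row = (x, y, 1.0)
        for i in range(3):
            rhs[i] += w * row[i] * z
            for j in range(3):
                A[i][j] += w * row[i] * row[j]
    d, e, _ = solve3(A, rhs)
    n = (-d, -e, 1.0)  # gradient of the height field gives the normal direction
    norm = math.sqrt(sum(c * c for c in n))
    return tuple(c / norm for c in n)


patch = [(x, y, 0.5 * x - 0.2 * y + 1.0) for x in (-1.0, 0.0, 1.0) for y in (-1.0, 0.0, 1.0)]
print(weighted_plane_normal(patch, [1.0] * len(patch)))
```

Down-weighting a point (weight 0 removes it entirely) is how learned point-wise weights suppress outliers in this kind of fit. Higher-order fits replace the plane with a polynomial surface, using the same normal-equation structure.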
Submitted 30 March, 2023;
originally announced March 2023.
-
An Effective and Differentially Private Protocol for Secure Distributed Cardinality Estimation
Authors:
Pinghui Wang,
Chengjin Yang,
Dongdong Xie,
Junzhou Zhao,
Hui Li,
Jing Tao,
Xiaohong Guan
Abstract:
Counting the number of distinct elements distributed over multiple data holders is a fundamental problem with many real-world applications ranging from crowd counting to network monitoring. Although a number of space- and computation-efficient sketch methods (e.g., the Flajolet-Martin sketch and the HyperLogLog sketch) for cardinality estimation have been proposed to solve the above problem, these sketch methods are insecure when considering privacy concerns related to the use of each data holder's personal dataset. Although a recently proposed protocol successfully implements the well-known Flajolet-Martin (FM) sketch on a secret-sharing based multiparty computation (MPC) framework for solving the problem of private distributed cardinality estimation (PDCE), we observe that this MPC-FM protocol is not differentially private. In addition, the MPC-FM protocol is computationally expensive, which limits its application to data holders with limited computation resources. To address the above issues, in this paper we propose a novel protocol, DP-DICE, which is computationally efficient and differentially private for solving the problem of PDCE. Experimental results show that DP-DICE achieves orders-of-magnitude speedup and reduces the estimation error by several times in comparison with state-of-the-art methods under the same security requirements.
Submitted 4 February, 2023;
originally announced February 2023.
-
1st Place Solution for ECCV 2022 OOD-CV Challenge Object Detection Track
Authors:
Wei Zhao,
Binbin Chen,
Weijie Chen,
Shicai Yang,
Di Xie,
Shiliang Pu,
Yueting Zhuang
Abstract:
The OOD-CV challenge is an out-of-distribution generalization task. To solve this problem in the object detection track, we propose a simple yet effective Generalize-then-Adapt (G&A) framework, which is composed of a two-stage domain generalization part and a one-stage domain adaptation part. The domain generalization part is implemented by a Supervised Model Pretraining stage, which uses source data for model warm-up, and a Weakly Semi-Supervised Model Pretraining stage, which uses both source data with box-level labels and auxiliary data (ImageNet-1K) with image-level labels to boost performance. The domain adaptation part is implemented as a Source-Free Domain Adaptation paradigm, which uses only the pre-trained model and the unlabeled target data to further optimize the model in a self-supervised manner. The proposed G&A framework helped us achieve first place on the object detection leaderboard of the OOD-CV challenge. Code will be released at https://github.com/hikvision-research/OOD-CV.
Submitted 11 January, 2023;
originally announced January 2023.
-
1st Place Solution for ECCV 2022 OOD-CV Challenge Image Classification Track
Authors:
Yilu Guo,
Xingyue Shi,
Weijie Chen,
Shicai Yang,
Di Xie,
Shiliang Pu,
Yueting Zhuang
Abstract:
The OOD-CV challenge is an out-of-distribution generalization task. In this challenge, our core solution can be summarized as: Noisy Label Learning Is A Strong Test-Time Domain Adaptation Optimizer. Briefly, our main pipeline consists of two stages: a pre-training stage for domain generalization and a test-time training stage for domain adaptation. We exploit only labeled source data in the pre-training stage and only unlabeled target data in the test-time training stage. In the pre-training stage, we propose a simple yet effective Mask-Level Copy-Paste data augmentation strategy that enhances out-of-distribution generalization so as to resist the shape, pose, context, texture, occlusion, and weather domain shifts in this challenge. In the test-time training stage, we use the pre-trained model to assign noisy labels to the unlabeled target data, and propose a Label-Periodically-Updated DivideMix method for noisy label learning. After integrating Test-Time Augmentation and Model Ensemble strategies, our solution ranks first on the Image Classification Leaderboard of the OOD-CV Challenge. Code will be released at https://github.com/hikvision-research/OOD-CV.
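The abstract names a Mask-Level Copy-Paste augmentation without giving details. The sketch below shows only the generic core of mask-based copy-paste: object pixels selected by a binary mask are pasted from a source image onto a background image. It assumes same-sized images stored as nested lists and omits the object rescaling, placement, and blending a real pipeline would likely add. `mask_copy_paste` is a hypothetical name, not the authors' code.

```python
def mask_copy_paste(src, mask, dst):
    """Paste the pixels of `src` where `mask` is 1 onto a copy of `dst`.

    Images are H x W nested lists of pixel values (e.g. grey levels or RGB tuples);
    `mask` is an H x W nested list of 0/1 values marking the object.
    """
    if not (len(src) == len(mask) == len(dst)):
        raise ValueError("src, mask and dst must have the same height")
    out = []
    for s_row, m_row, d_row in zip(src, mask, dst):
        if not (len(s_row) == len(m_row) == len(d_row)):
            raise ValueError("src, mask and dst must have the same width")
        # Object pixels come from the source; everything else keeps the background.
        out.append([s if m else d for s, m, d in zip(s_row, m_row, d_row)])
    return out


foreground = [[9, 9, 9], [9, 9, 9], [9, 9, 9]]
background = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
object_mask = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
print(mask_copy_paste(foreground, object_mask, background))
```

Pasting at mask level rather than box level keeps the object's silhouette and puts it in a new context. This is one plausible way such an augmentation could target the context shifts listed in the abstract.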
Submitted 11 January, 2023;
originally announced January 2023.
-
NeRF-Gaze: A Head-Eye Redirection Parametric Model for Gaze Estimation
Authors:
Pengwei Yin,
Jiawu Dai,
Jingjing Wang,
Di Xie,
Shiliang Pu
Abstract:
Gaze estimation is the fundamental basis for many visual tasks. Yet, the high cost of acquiring gaze datasets with 3D annotations hinders the optimization and application of gaze estimation models. In this work, we propose a novel Head-Eye redirection parametric model based on Neural Radiance Fields, which allows dense gaze data generation with view consistency and accurate gaze direction. Moreover, our head-eye redirection parametric model can decouple the face and eyes for separate neural rendering, which allows separate control of the attributes of the face, identity, illumination, and eye gaze direction. Thus, diverse 3D-aware gaze datasets can be obtained by manipulating the latent codes belonging to different face attributes in an unsupervised manner. Extensive experiments on several benchmarks demonstrate the effectiveness of our method in domain generalization and domain adaptation for gaze estimation tasks.
Submitted 30 December, 2022;
originally announced December 2022.
-
On low rank 4d $\mathcal{N}=2$ SCFTs
Authors:
Bohan Li,
Dan Xie,
Wenbin Yan
Abstract:
There are two major ways of constructing 4d $\mathcal{N}=2$ superconformal field theories (SCFTs): the first is putting a 6d $(2,0)$ theory on a punctured Riemann surface (class-S theory), and the second is putting type IIB string theory on a 3d canonical singularity. As there is interest in low-rank theories, we search all possibilities arising from the above two constructions. Most of these theories are engineered by class-S theory with irregular singularities, and we find a universal formula for the rank of the theory so that a complete search is possible. We then compute various physical quantities of these theories, such as the central charges, flavor symmetry, associated vertex operator algebra, and Higgs branch. One interesting consequence of our results is the prediction of many new isomorphisms of 2d vertex operator algebras.
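As background on the central charges mentioned above, and not a result of this paper, one standard check for a rank-$r$ 4d $\mathcal{N}=2$ SCFT is the Shapere-Tachikawa relation. Here $\Delta_i$ are the scaling dimensions of the Coulomb branch operators. The worked case is the rank-one Argyres-Douglas theory $(A_1, A_2)$, with $a = 43/120$ and $c = 11/30$:

```latex
2a - c = \frac{1}{4}\sum_{i=1}^{r}\left(2\Delta_i - 1\right),
\qquad
(A_1, A_2):\ r = 1,\ \Delta_1 = \tfrac{6}{5}
\ \Rightarrow\ 2a - c = \tfrac{1}{4}\left(\tfrac{12}{5} - 1\right) = \tfrac{7}{20}
= 2\cdot\tfrac{43}{120} - \tfrac{11}{30}.
```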
Submitted 21 January, 2023; v1 submitted 6 December, 2022;
originally announced December 2022.
-
On rank two theories with eight supercharges part I: local singularities
Authors:
Dan Xie
Abstract:
A complete study of local singularities of rank two $\mathcal{N}=2$ Coulomb branch geometry is given. The low energy theory associated with each local singularity is identified: it can be a superconformal field theory (SCFT), an IR free gauge theory, or a combination of them. Various invariants of local singularities, which are essential for the study of the global Coulomb branch, are also listed. As a first application, global Coulomb branches with only the simplest local singularities in the bulk are given for 4d theories (including SCFTs and asymptotically free theories), 5d KK theories, and 6d KK theories; these examples appear to cover all the findings in the literature and suggest that there are more possibilities. More general global Coulomb branch geometries will be discussed in the sequel to this paper.
Submitted 5 December, 2022;
originally announced December 2022.
-
Preserving background sound in noise-robust voice conversion via multi-task learning
Authors:
Jixun Yao,
Yi Lei,
Qing Wang,
Pengcheng Guo,
Ziqian Ning,
Lei Xie,
Hai Li,
Junhui Liu,
Danming Xie
Abstract:
Background sound is an informative form of art that helps provide a more immersive experience in real-application voice conversion (VC) scenarios. However, prior research on VC, which mainly focuses on clean voices, pays little attention to VC with background sound. The critical problems for preserving background sound in VC are the inevitable speech distortion introduced by the neural separation model and the cascade mismatch between the source separation model and the VC model. In this paper, we propose an end-to-end framework via multi-task learning that sequentially cascades a source separation (SS) module, a bottleneck feature extraction module, and a VC module. Specifically, the source separation task explicitly considers critical phase information and confines the distortion caused by the imperfect separation process. The source separation task, the typical VC task, and the unified task share a uniform reconstruction loss constrained by joint training to reduce the mismatch between the SS and VC modules. Experimental results demonstrate that our proposed framework significantly outperforms the baseline systems while achieving quality and speaker similarity comparable to VC models trained with clean data.
Submitted 6 November, 2022;
originally announced November 2022.