-
Uhlmann-quench, A Protocol to Measure the Uhlmann Phase
Authors:
Jia-Chen Tang,
Xu-Yang Hou,
Zheng Zhou,
Hao Guo,
Chih-Chun Chien
Abstract:
Dynamic quantum phase transitions (DQPT) following quantum quenches exhibit singular behavior of the overlap of the initial and evolved states. Here we present a formalism to incorporate a geometric phase into quench dynamics of mixed quantum states, a process named the Uhlmann quench, based on a mixed-state generalization of the Berry phase known as the Uhlmann phase. It has been shown that the geometric condition of the Uhlmann phase is incompatible with Hamiltonian dynamics, making its realization and measurement a challenging task. Nevertheless, we formulate the evolution of the purification of the density matrix, which not only respects the dynamics according to the density matrix but also incorporates the Uhlmann parallel-transport condition to generate a geometric phase following a quantum quench. For cyclic processes exemplified by a spin-1/2 system, geometric DQPTs can emerge with both singular behavior in the overlap and jumps of the geometric phase. Moreover, the Uhlmann phase is generated at the end of each cycle. The Uhlmann quench thus offers a route for investigating the interplay between quantum dynamics and geometric processes in mixed states.
Submitted 16 July, 2024;
originally announced July 2024.
-
UNIT: Backdoor Mitigation via Automated Neural Distribution Tightening
Authors:
Siyuan Cheng,
Guangyu Shen,
Kaiyuan Zhang,
Guanhong Tao,
Shengwei An,
Hanxi Guo,
Shiqing Ma,
Xiangyu Zhang
Abstract:
Deep neural networks (DNNs) have demonstrated effectiveness in various fields. However, DNNs are vulnerable to backdoor attacks, which inject a unique pattern, called a trigger, into the input to cause misclassification to an attacker-chosen target label. While existing works have proposed various methods to mitigate backdoor effects in poisoned models, they tend to be less effective against recent advanced attacks. In this paper, we introduce a novel post-training defense technique, UNIT, that can effectively eliminate backdoor effects for a variety of attacks. Specifically, UNIT approximates a unique and tight activation distribution for each neuron in the model. It then proactively dispels substantially large activation values that exceed the approximated boundaries. Our experimental results demonstrate that UNIT outperforms 7 popular defense methods against 14 existing backdoor attacks, including 2 advanced attacks, using only 5% of clean training data. UNIT is also cost efficient. The code is accessible at https://github.com/Megum1/UNIT.
Submitted 16 July, 2024;
originally announced July 2024.
-
Fast Matrix Multiplications for Lookup Table-Quantized LLMs
Authors:
Han Guo,
William Brandon,
Radostin Cholakov,
Jonathan Ragan-Kelley,
Eric P. Xing,
Yoon Kim
Abstract:
The deployment of large language models (LLMs) is often constrained by memory bandwidth, where the primary bottleneck is the cost of transferring model parameters from the GPU's global memory to its registers. When coupled with custom kernels that fuse the dequantization and matmul operations, weight-only quantization can thus enable faster inference by reducing the amount of memory movement. However, developing high-performance kernels for weight-quantized LLMs presents substantial challenges, especially when the weights are compressed to non-evenly-divisible bit widths (e.g., 3 bits) with non-uniform, lookup table (LUT) quantization. This paper describes FLUTE, a flexible lookup table engine for LUT-quantized LLMs, which uses offline restructuring of the quantized weight matrix to minimize bit manipulations associated with unpacking, and vectorization and duplication of the lookup table to mitigate shared memory bandwidth constraints. At batch sizes < 32 and quantization group size of 128 (typical in LLM inference), the FLUTE kernel can be 2-4x faster than existing GEMM kernels. As an application of FLUTE, we explore a simple extension to lookup table-based NormalFloat quantization and apply it to quantize LLaMA3 to various configurations, obtaining competitive quantization performance against strong baselines while obtaining an end-to-end throughput increase of 1.5 to 2 times.
Submitted 15 July, 2024;
originally announced July 2024.
-
All Roads Lead to Rome: Unveiling the Trajectory of Recommender Systems Across the LLM Era
Authors:
Bo Chen,
Xinyi Dai,
Huifeng Guo,
Wei Guo,
Weiwen Liu,
Yong Liu,
Jiarui Qin,
Ruiming Tang,
Yichao Wang,
Chuhan Wu,
Yaxiong Wu,
Hao Zhang
Abstract:
Recommender systems (RS) are vital for managing information overload and delivering personalized content, responding to users' diverse information needs. The emergence of large language models (LLMs) offers a new horizon for redefining recommender systems with vast general knowledge and reasoning capabilities. Standing across this LLM era, we aim to integrate recommender systems into a broader picture, and pave the way for more comprehensive solutions for future research. Therefore, we first offer a comprehensive overview of the technical progression of recommender systems, particularly focusing on language foundation models and their applications in recommendation. We identify two evolution paths of modern recommender systems -- via list-wise recommendation and conversational recommendation. These two paths finally converge at LLM agents with superior capabilities of long-term memory, reflection, and tool intelligence. Along these two paths, we point out that the information effectiveness of the recommendation is increased, while the user's acquisition cost is decreased. Technical features, research methodologies, and inherent challenges for each milestone along the path are carefully investigated -- from traditional list-wise recommendation to LLM-enhanced recommendation to recommendation with LLM agents. Finally, we highlight several unresolved challenges crucial for the development of future personalization technologies and interfaces and discuss the future prospects.
Submitted 14 July, 2024;
originally announced July 2024.
-
Don't Fear Peculiar Activation Functions: EUAF and Beyond
Authors:
Qianchao Wang,
Shijun Zhang,
Dong Zeng,
Zhaoheng Xie,
Hengtao Guo,
Feng-Lei Fan,
Tieyong Zeng
Abstract:
In this paper, we propose a new super-expressive activation function called the Parametric Elementary Universal Activation Function (PEUAF). We demonstrate the effectiveness of PEUAF through systematic and comprehensive experiments on various industrial and image datasets, including CIFAR10, Tiny-ImageNet, and ImageNet. Moreover, we significantly generalize the family of super-expressive activation functions, whose existence has been demonstrated in several recent works by showing that any continuous function can be approximated to any desired accuracy by a fixed-size network with a specific super-expressive activation function. Specifically, our work addresses two major bottlenecks impeding the development of super-expressive activation functions: the limited identification of super-expressive functions, which raises doubts about their broad applicability, and their often peculiar forms, which lead to skepticism regarding their scalability and practicality in real-world applications.
Submitted 11 July, 2024;
originally announced July 2024.
-
How coronal mass ejections are influenced by the morphology and toroidal flux of their source magnetic flux ropes?
Authors:
J. H. Guo,
L. Linan,
S. Poedts,
Y. Guo,
B. Schmieder,
A. Lani,
Y. W. Ni,
M. Brchnelova,
B. Perri,
T. Baratashvili,
S. T. Li,
P. F. Chen
Abstract:
Coronal mass ejections (CMEs) stand as intense eruptions of magnetized plasma from the Sun, playing a pivotal role in driving significant changes in the heliospheric environment. Deducing the properties of CMEs from their progenitors in solar source regions is crucial for space weather forecasting. The primary objective of this paper is to establish a connection between CMEs and their progenitors in solar source regions, enabling us to infer the magnetic structures of CMEs before their full development. To this end, we create a dataset comprising a magnetic flux rope series with varying projection shapes, sizes and toroidal fluxes, using the Regularized Biot-Savart Laws (RBSL). Thereafter, we simulate the propagation of these flux ropes from the solar surface to a distance of 25$R_{\odot}$ with our global coronal MHD model, COCONUT. Our parametric survey reveals significant impacts of source flux ropes on the consequent CMEs. We find that the projection shape can influence the magnetic structures of CMEs at 20$R_{\odot}$, albeit with minimal impacts on the propagation speed. However, these impacts diminish as source flux ropes become fat. In terms of toroidal flux, our simulation results demonstrate a pronounced correlation with the propagation speed of CMEs, as well as with the success of the eruption. This work builds a bridge between the CMEs in the outer corona and their progenitors in solar source regions. Our parametric survey suggests that the projection shape, cross-section radius and toroidal flux of source flux ropes are crucial parameters in predicting magnetic structures and propagation speed of CMEs, providing valuable insights for space weather prediction.
Submitted 12 July, 2024;
originally announced July 2024.
-
Coloring the intersection of two matroids
Authors:
Eli Berger,
He Guo
Abstract:
A 2006 result of Aharoni and the first author of this paper states that for any two natural numbers p, q, where p divides q, if a matroid M is p-colorable and a matroid N is q-colorable then M \cap N is (p+q)-colorable. In this paper we show that the assumption that p divides q is in fact redundant, and we also prove that M \cap N is even (p+q)-list-colorable.
The result uses topology and relies on a new parameter yielding a lower bound for the topological connectivity of the intersection of two matroids.
Submitted 12 July, 2024;
originally announced July 2024.
-
The list chromatic number of the intersection of two generalized partition matroids
Authors:
He Guo
Abstract:
A famous theorem of Galvin states that the list chromatic number of the intersection of two partition matroids equals its chromatic number. Kiraly and Berczi et al. conjectured that this equality holds for any two matroids. We prove this conjecture and a conjecture by Aharoni--Berger for any two generalized partition matroids.
Submitted 11 July, 2024;
originally announced July 2024.
-
Intersections of matroids
Authors:
Ron Aharoni,
Eli Berger,
He Guo,
Dani Kotlar
Abstract:
We study simplicial complexes (hypergraphs closed under taking subsets) that are the intersection of a given number k of matroids. We prove bounds on their chromatic numbers (the minimum number of edges required to cover the ground set) and their list chromatic numbers. Settling a conjecture of Kiraly and Berczi et al., we prove that the list chromatic number is at most k times the chromatic number. Following in the footsteps of Edmonds, who considered the case k=2, we study three polytopes associated with k-tuples of matroids, and prove bounds on the distances between them. The tools used are in part topological.
Submitted 11 July, 2024;
originally announced July 2024.
-
Electronic Correlation and Pseudogap-like Behavior of High-Temperature Superconductor La3Ni2O7
Authors:
Yidian Li,
Xian Du,
Yantao Cao,
Cuiying Pei,
Mingxin Zhang,
Wenxuan Zhao,
Kaiyi Zhai,
Runzhe Xu,
Zhongkai Liu,
Zhiwei Li,
Jinkui Zhao,
Gang Li,
Yanpeng Qi,
Hanjie Guo,
Yulin Chen,
Lexian Yang
Abstract:
High-temperature superconductivity (HTSC) remains one of the most challenging and fascinating mysteries in condensed matter physics. Recently, superconductivity with a transition temperature exceeding the liquid-nitrogen temperature was discovered in La3Ni2O7 at high pressure, which provides a new platform to explore unconventional HTSC. In this work, using high-resolution angle-resolved photoemission spectroscopy and ab-initio calculations, we systematically investigate the electronic structure of La3Ni2O7 at ambient pressure. Our experiments are in good agreement with ab-initio calculations after considering an orbital-dependent band renormalization effect. The strong electron correlation effect pushes a flat band of $d_{z^2}$ orbital character below the Fermi level ($E_F$); this band is predicted to lie right at $E_F$ under high pressure. Moreover, the $d_{x^2-y^2}$ band shows a pseudogap-like behavior with suppressed spectral weight and a diminished quasiparticle peak near $E_F$. Our findings provide important insights into the electronic structure of La3Ni2O7, which will shed light on the understanding of the unconventional superconductivity in nickelates.
Submitted 10 July, 2024;
originally announced July 2024.
-
Visualization of Unconventional Rashba Band and Vortex Zero Mode in Topological Superconductor Candidate AuSn$_{4}$
Authors:
Yuhan Ye,
Rui Song,
Hongqin Xiao,
Guoyu Xian,
Hui Guo,
Haitao Yang,
Hui Chen,
Hong-Jun Gao
Abstract:
Topological superconductivity (TSC) is a promising platform to host Majorana zero modes (MZMs) for topological quantum computing. Recently, the noble metal alloy AuSn$_{4}$ has been identified as an intrinsic surface TSC. However, the atomic visualization of its nontrivial surface states and MZM remains elusive. Here, we report the direct observation of unconventional surface states and a vortex zero mode at the gold (Au) terminated surfaces of AuSn$_{4}$, by ultra-low scanning tunneling microscope/spectroscopy. Distinct from the trivial metallic bulk states at tin (Sn) surfaces, the Au terminated surface exhibits pronounced surface states near the Fermi level. Our density functional theory calculations indicate that these states arise from unconventional Rashba bands, where two Fermi circles from different bands share identical helical spin textures, chiralities, and group velocities in the same direction. Furthermore, we find that although the superconducting gap, critical temperature, and anisotropic in-plane critical field are almost identical on Au and Sn terminated surfaces, the in-gap bound states inside Abrikosov vortex cores show significant differences. The vortex on Sn terminated surfaces exhibits a conventional Caroli-de Gennes-Matricon bound state, while the Au surface shows a sharp zero-energy core state with a long non-splitting distance, resembling an MZM in a non-quantum-limit condition. This distinction may result from the dominant contribution of unconventional Rashba bands near the Fermi energy from the Au terminated surface. Our results provide a new platform for studying unconventional Rashba bands and MZMs in superconductors.
Submitted 9 July, 2024; v1 submitted 8 July, 2024;
originally announced July 2024.
-
Quantitative measurement of viscosity in two-dimensional electron fluids
Authors:
Yihang Zeng,
Haoyu Guo,
Olivia M. Ghosh,
Kenji Watanabe,
Takashi Taniguchi,
Leonid S. Levitov,
Cory R. Dean
Abstract:
Electron hydrodynamics is an emerging framework that describes the dynamics of interacting electron systems as conventional fluids. While evidence for hydrodynamic-like transport is reported in a variety of two-dimensional materials, precise quantitative measurement of the core parameter, electron viscosity, remains challenging. In this work, we demonstrate that magnetoresistance in Corbino-shaped graphene devices offers a simultaneous ohmmeter/viscometer, allowing us to disentangle the individual Ohmic and viscous contributions to the transport response, even in the mixed flow regime. Most surprisingly, we find that in both monolayer and bilayer graphene, the effective electron-electron scattering rate scales linearly with temperature, at odds with the $T$-squared dependence expected from conventional Fermi liquid theory, but consistent with a recently identified tomographic flow regime, which was theoretically conjectured to be generic for two-dimensional charged fluids.
Submitted 6 July, 2024;
originally announced July 2024.
-
A Precise Fitting Formula for Gravitational Wave Spectra from Phase Transitions
Authors:
Huai-ke Guo,
Fazlollah Hajkarim,
Kuver Sinha,
Graham White,
Yang Xiao
Abstract:
Obtaining a precise form for the predicted gravitational wave (GW) spectrum from a phase transition is a topic of great relevance for beyond Standard Model (BSM) physicists. Currently, the most sophisticated semi-analytic framework for estimating the dominant contribution to the spectrum is the sound shell model; however, full calculations within this framework can be computationally expensive, especially for large-scale scans. The community therefore generally manages with fit functions to the GW spectrum, the most widely used of which is a single broken power law. We provide a more precise fit function based on the sound shell model: our fit function features a double broken power law with two frequency breaks corresponding to the two characteristic length scales of the problem -- inter-bubble spacing and thickness of sound shells, the second of which is neglected in the single broken power law fit. Compared to previously proposed fits, we demonstrate that our fit function more faithfully captures the GW spectrum coming from a full calculation of the sound shell model, over most of the space of the thermodynamic parameters governing the phase transition. The physical origins of the fit parameters and their dependence on the thermodynamic parameters are studied in the underlying sound shell model: in particular, we perform a series of detailed scans for these quantities over the plane of the strength of the phase transition ($α$) and the bubble wall velocity ($v_w$). Wherever possible, we comment on the physical interpretations of these scans. The result of our study can be used to generate accurate GW spectra with our fit function, given initial inputs of $α$, $v_w$, $β/H$ (nucleation rate parameter) and $T_n$ (nucleation temperature) for the relevant BSM scenario.
Submitted 2 July, 2024;
originally announced July 2024.
-
Modeling the Nonlinear Power Spectrum in Low-redshift HI Intensity Mapping
Authors:
Zhixing Li,
Laura Wolz,
Hong Guo,
Steven Cunnington,
Yi Mao
Abstract:
We present a simulation-based framework to forecast the HI power spectrum on non-linear scales ($k\gtrsim 1\ {\rm Mpc^{-1}}$), as measured by interferometer arrays like MeerKAT in the low-redshift ($z\leq 1.0$) universe. Building on a galaxy-based HI mock catalog, we meticulously consider various factors, including the emission line profiles of HI discs and some observational settings, and explore their impacts on the HI power spectrum. While it is relatively insensitive to the profile shape of the HI emission line at these scales, we identify a strong correlation with the profile width, that is, the Full Width at Half Maximum (FWHM, also known as $W_{\rm 50}$ in observations) in this work. By modeling $W_{\rm 50}$ as a function of $v_{\rm max}$, we assign each HI source an emission line profile and find that the resulting HI power spectrum is comparatively close to results from particles in the IllustrisTNG hydrodynamical simulation. After implementing $k$-space cuts matching the MeerKAT data, our prediction replicates the trend of the measurements obtained by MeerKAT at $z\approx 0.44$, though with a significantly lower amplitude. Utilizing a Markov chain Monte Carlo sampling method, we constrain the parameter $A_{W_{\rm 50}}$ in the $W_{\rm 50}$ models and $Ω_{\rm HI}$ with the MeerKAT measurements and find that a strong degeneracy exists between these two parameters.
Submitted 2 July, 2024;
originally announced July 2024.
-
Understanding the Broad-line Region of Active Galactic Nuclei with Photoionization. I. the Moderate-Accretion Regime
Authors:
Qiaoya Wu,
Yue Shen,
Hengxiao Guo,
Scott F. Anderson,
W. N. Brandt,
Catherine J. Grier,
Patrick B. Hall,
Luis C. Ho,
Yasaman Homayouni,
Keith Horne,
Jennifer I-Hsiu Li,
Donald P. Schneider
Abstract:
Over three decades of reverberation mapping (RM) studies on local broad-line active galactic nuclei (AGNs) have measured reliable black-hole (BH) masses for $> 100$ AGNs. These RM measurements reveal a significant correlation between the Balmer broad-line region size and the AGN optical luminosity (the $R-L$ relation). Recent RM studies for AGN samples with more diverse BH accretion parameters (e.g., mass and Eddington ratio) reveal a substantial intrinsic dispersion around the average $R-L$ relation, suggesting variations in the overall spectral energy distribution shape as functions of accretion parameters. Here we perform a detailed photoionization investigation of expected broad-line properties as functions of accretion parameters, using the latest models for the AGN continuum implemented in {\tt qsosed}. We compare theoretical predictions with observations of a sample of 67 $z\lesssim0.5$ reverberation-mapped AGNs with both rest-frame optical and UV spectra in the moderate-accretion regime (Eddington ratio $λ_{\rm Edd}\equiv L/L_{\rm Edd}<0.5$). The UV/optical line strengths and their dependences on accretion parameters can be reasonably well reproduced by the locally-optimally-emitting cloud (LOC) photoionization models. We provide quantitative recipes that use optical/UV line flux ratios to infer the ionizing continuum, which is not directly observable. In addition, photoionization models with universal values of ionization parameter ($\log U_{\rm H}=-2$) and hydrogen density ($\log n({\rm H})=12$) can qualitatively reproduce the observed global $R-L$ relation for the current AGN sample. However, such models fail to reproduce the observed trend of decreasing BLR size with $L/L_{\rm Edd}$ at fixed optical luminosity, which may imply that the gas density increases with the accretion rate.
Submitted 1 July, 2024;
originally announced July 2024.
-
Collaborative Graph Exploration with Reduced Pose-SLAM Uncertainty via Submodular Optimization
Authors:
Ruofei Bai,
Shenghai Yuan,
Hongliang Guo,
Pengyu Yin,
Wei-Yun Yau,
Lihua Xie
Abstract:
This paper considers the collaborative graph exploration problem in GPS-denied environments, where a group of robots are required to cover a graph environment while maintaining reliable pose estimations in collaborative simultaneous localization and mapping (SLAM). Considering both objectives presents challenges for multi-robot pathfinding, as it involves the expensive covariance inference for SLAM uncertainty evaluation, especially considering various combinations of robots' paths. To reduce the computational complexity, we propose an efficient two-stage strategy where exploration paths are first generated for quick coverage, and then enhanced by adding informative and distance-efficient loop-closing actions, called loop edges, along the paths for reliable pose estimation. We formulate the latter problem as a non-monotone submodular maximization problem by relating SLAM uncertainty with pose graph topology, which (1) facilitates more efficient evaluation of SLAM uncertainty than covariance inference, and (2) allows the application of approximation algorithms in submodular optimization to provide optimality guarantees. We further introduce the ordering heuristics to improve objective values while preserving the optimality bound. Simulation experiments over randomly generated graph environments verify the efficiency of our methods in finding paths for quick coverage and enhanced pose graph reliability, and benchmark the performance of the approximation algorithms and the greedy-based algorithm in the loop edge selection problem. Our implementations will be open-source at https://github.com/bairuofei/CGE.
Submitted 1 July, 2024;
originally announced July 2024.
-
Atomic sheaves on hyper-Kähler manifolds via Bridgeland moduli spaces
Authors:
Hanfei Guo,
Zhiyu Liu
Abstract:
In this paper, we provide new examples of 1-obstructed and atomic sheaves on an infinite series of locally complete families of projective hyper-Kähler manifolds. More precisely,
(1) we prove that the fixed loci of the natural anti-symplectic involutions on the moduli spaces of stable objects in the Kuznetsov component $\mathcal{K}u(X)$ of a Gushel--Mukai fourfold $X$ are 1-obstructed Lagrangian submanifolds,
(2) we construct a family of immersed atomic Lagrangian submanifolds on each moduli space of stable objects in $\mathcal{K}u(X)$, and
(3) we construct non-rigid projectively hyperholomorphic twisted bundles on any hyper-Kähler manifold of $\mathrm{K3^{[n]}}$-type for infinitely many $n$.
Additionally, we discuss examples of atomic Lagrangian submanifolds satisfying $b_1=20$ in a family of hyper-Kähler manifolds of $\mathrm{K3^{[2]}}$-type, as well as atomic sheaves supported on non-atomic Lagrangians.
Submitted 14 July, 2024; v1 submitted 27 June, 2024;
originally announced June 2024.
-
Prior-Informed AGN-Host Spectral Decomposition Using PyQSOFit
Authors:
Wenke Ren,
Hengxiao Guo,
Yue Shen,
John D. Silverman,
Colin J. Burke,
Shu Wang,
Junxian Wang
Abstract:
We introduce an improved method for decomposing the emission of active galactic nuclei (AGN) and their host galaxies using templates from principal component analysis (PCA). This approach integrates prior information from PCA with a penalized pixel fitting mechanism which improves the precision and effectiveness of the decomposition process. Specifically, we have reduced the degeneracy and over-fitting in AGN-host decomposition, particularly for those with low signal-to-noise ratios (SNR), where traditional methods tend to fail. By applying our method to 76,565 SDSS Data Release 16 quasars with $z<0.8$, we achieve a success rate of $\approx$ 94%, thus establishing the largest host-decomposed spectral catalog of quasars to date. Our fitting results consider the impact of the host galaxy on the overestimation of the AGN luminosity and black hole mass ($M_{\rm BH}$). Furthermore, we obtained stellar velocity dispersion ($σ_*$) measurements for 4,137 quasars. The slope of the $M_{\rm BH}-σ_*$ relation in this subsample is generally consistent with previous quasar studies beyond the local universe. Our method provides a robust and efficient approach to disentangle the AGN and host galaxy components across a wide range of SNRs and redshifts.
Submitted 25 June, 2024;
originally announced June 2024.
-
LOGCAN++: Adaptive Local-global class-aware network for semantic segmentation of remote sensing imagery
Authors:
Xiaowen Ma,
Rongrong Lian,
Zhenkai Wu,
Hongbo Guo,
Mengting Ma,
Sensen Wu,
Zhenhong Du,
Siyang Song,
Wei Zhang
Abstract:
Remote sensing images are usually characterized by complex backgrounds, scale and orientation variations, and large intra-class variance. General semantic segmentation methods usually fail to fully address these issues, and thus their performance on remote sensing image segmentation is limited. In this paper, we propose LOGCAN++, a semantic segmentation model customized for remote sensing images, which is made up of a Global Class Awareness (GCA) module and several Local Class Awareness (LCA) modules. The GCA module captures global representations for class-level context modeling to reduce the interference of background noise. The LCA module generates local class representations as intermediate perceptual elements to indirectly associate pixels with the global class representations, with the aim of handling the large intra-class variance problem. In particular, we introduce affine transformations in the LCA module for adaptive extraction of local class representations to effectively tolerate scale and orientation variations in remotely sensed images. Extensive experiments on three benchmark datasets show that LOGCAN++ outperforms current mainstream general and remote sensing semantic segmentation methods and achieves a better trade-off between speed and accuracy. Code is available at https://github.com/xwmaxwma/rssegmentation.
Submitted 1 July, 2024; v1 submitted 24 June, 2024;
originally announced June 2024.
-
UniCoder: Scaling Code Large Language Model via Universal Code
Authors:
Tao Sun,
Linzheng Chai,
Jian Yang,
Yuwei Yin,
Hongcheng Guo,
Jiaheng Liu,
Bing Wang,
Liqun Yang,
Zhoujun Li
Abstract:
Intermediate reasoning or acting steps have successfully improved large language models (LLMs) for handling various downstream natural language processing (NLP) tasks. When applying LLMs for code generation, recent works mainly focus on directing the models to articulate intermediate natural-language reasoning steps, as in chain-of-thought (CoT) prompting, and then output code with the natural language or other structured intermediate steps. However, such output is not suitable for code translation or generation tasks, since standard CoT differs from code in its logical structure and form of expression. In this work, we introduce the universal code (UniCode) as the intermediate representation. It is a description of algorithm steps using a mix of programming-language conventions, such as assignment operators, conditional operators, and loops. We then collect an instruction dataset, UniCoder-Instruct, to train our model UniCoder on multi-task learning objectives. UniCoder-Instruct comprises natural-language questions, code solutions, and the corresponding universal code. The alignment between the intermediate universal code representation and the final code solution significantly improves the quality of the generated code. The experimental results demonstrate that UniCoder with the universal code outperforms previous prompting methods by a large margin, showcasing the effectiveness of the structural clues in pseudo-code.
Submitted 24 June, 2024;
originally announced June 2024.
-
Gradient enhanced ADMM Algorithm for dynamic optimal transport on surfaces
Authors:
Guozhi Dong,
Hailong Guo,
Chengrun Jiang,
Zuoqiang Shi
Abstract:
A gradient enhanced ADMM algorithm for optimal transport on general surfaces is proposed in this paper. Based on Benamou and Brenier's dynamical formulation, we combine gradient recovery techniques on surfaces with the ADMM algorithm, not only improving the computational accuracy, but also providing a novel method to deal with dual variables in the algorithm. This method avoids the use of staggered grids, has better accuracy, and is more robust compared to other averaging techniques.
Submitted 23 June, 2024;
originally announced June 2024.
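For readers unfamiliar with the dynamical formulation named in the abstract above, the standard Benamou-Brenier problem can be written as follows. This is the textbook statement transcribed to a closed surface, not the paper's discretization; the symbols $Γ$ (the surface), $ρ$ (density), $v$ (tangential velocity field), and $\nabla_Γ\cdot$ (surface divergence) are notational assumptions made here.

```latex
W_2^2(\rho_0,\rho_1)
  = \min_{(\rho,\,v)} \int_0^1 \!\! \int_{\Gamma} \rho(x,t)\,\lvert v(x,t)\rvert^2 \,\mathrm{d}s\,\mathrm{d}t
\quad \text{subject to} \quad
\partial_t \rho + \nabla_{\Gamma}\!\cdot(\rho\, v) = 0,
\qquad \rho(\cdot,0)=\rho_0,\quad \rho(\cdot,1)=\rho_1 .
```

With the momentum variable $m = ρ v$ the objective becomes $\lvert m\rvert^2/ρ$, which is jointly convex in $(ρ, m)$, and the constraint becomes linear; this convex structure is what makes splitting methods such as ADMM applicable.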
-
Secure Combination of Untrusted Time information Based on Optimized Dempster-Shafer Theory
Authors:
Yang Li,
Yujie Luo,
Yichen Zhang,
Ao Sun,
Wei Huang,
Shuai Zhang,
Tao Zhang,
Chuang Zhou,
Li Ma,
Jie Yang,
Mei Wu,
Heng Wang,
Yan Pan,
Yun Shao,
Xing Chen,
Ziyang Chen,
Song Yu,
Hong Guo,
Bingjie Xu
Abstract:
Secure precision time synchronization is important for applications of Cyber-Physical Systems. However, several attacks, especially the Time Delay Attack (TDA), seriously deteriorate the performance of time synchronization systems. The multiple-path scheme is regarded as an effective security countermeasure to decrease the influence of TDA. However, an effective secure combination algorithm is still missing for precision time synchronization. In this paper, a secure combination algorithm based on Dempster-Shafer theory is proposed for the multiple-path method. Special optimizations are made to the combination algorithm to solve potential problems due to untrusted evidence. Theoretical simulation shows that the proposed algorithm works much better than the Fault Tolerant Algorithm (FTA) and the attack detection method based on a single path. An experimental demonstration proves the feasibility and superiority of the proposed algorithm, where time stability of 27.97 ps, 1.57 ps, and 1.12 ps at averaging times of 1 s, 10 s, and 100 s is achieved under TDA and local clock jump. The proposed algorithm can be used to improve the security and resilience of many important synchronization protocols, such as NTP, PTP, and TWFTT.
Submitted 19 June, 2024;
originally announced June 2024.
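As background for the Dempster-Shafer entry above, here is a minimal sketch of the classical Dempster rule of combination, which fuses two bodies of evidence and renormalizes away their conflict. It is not the optimized algorithm of the paper, whose details are not given in the abstract. The two-hypothesis frame (`"ok"`, `"attacked"`), the function name `combine_dempster`, and the mass values are all hypothetical illustrations.

```python
from itertools import product


def combine_dempster(m1, m2):
    """Fuse two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined = {}
    conflict = 0.0
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        weight = mass_b * mass_c
        intersection = b & c
        if intersection:
            # Agreeing evidence supports the common subset.
            combined[intersection] = combined.get(intersection, 0.0) + weight
        else:
            # Disjoint focal sets contribute to the conflict mass K.
            conflict += weight
    if conflict >= 1.0:
        raise ValueError("total conflict: the two sources cannot be combined")
    # Renormalize by 1 - K so the fused masses sum to one.
    return {a: w / (1.0 - conflict) for a, w in combined.items()}


# Hypothetical evidence from two synchronization paths (illustrative numbers).
OK, ATTACKED = "ok", "attacked"
path_1 = {frozenset({OK}): 0.6, frozenset({OK, ATTACKED}): 0.4}
path_2 = {frozenset({ATTACKED}): 0.5, frozenset({OK, ATTACKED}): 0.5}

fused = combine_dempster(path_1, path_2)
for subset, mass in sorted(fused.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(subset), round(mass, 4))
```

In this example the conflict mass is 0.3, so each surviving product is divided by 0.7. Highly conflicting sources make this normalization unstable, which is one known weakness of the plain rule when some evidence is untrusted.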
-
TraceNet: Segment one thing efficiently
Authors:
Mingyuan Wu,
Zichuan Liu,
Haozhen Zheng,
Hongpeng Guo,
Bo Chen,
Xin Lu,
Klara Nahrstedt
Abstract:
Efficient single instance segmentation is essential for unlocking features in mobile imaging applications, such as capture or editing. Existing on-the-fly mobile imaging applications limit the segmentation task to portraits or the salient subject due to computational constraints. Instance segmentation, despite recent developments towards efficient networks, is still heavy due to the cost of computation on the entire image to identify all instances. To address this, we propose and formulate a one-tap-driven single instance segmentation task that segments a single instance selected by a user via a positive tap. This task, in contrast to the broader task of segmenting anything as suggested in the Segment Anything Model, focuses on efficient segmentation of a single instance specified by the user. To solve this problem, we present TraceNet, which explicitly locates the selected instance by way of receptive field tracing. TraceNet identifies image regions that are related to the user tap, and heavy computations are only performed on selected regions of the image. Therefore, overall computation cost and memory consumption are reduced during inference. We evaluate the performance of TraceNet on instance IoU averaged over taps and the proportion of the region that a user tap can fall into for a high-quality single-instance mask. Experimental results on MS-COCO and LVIS demonstrate the effectiveness and efficiency of the proposed approach. TraceNet can jointly achieve efficiency and interactivity, filling the gap between the need for efficient mobile inference and the recent research trend towards multimodal and interactive segmentation models.
Submitted 21 June, 2024;
originally announced June 2024.
-
GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models
Authors:
Shilong Li,
Yancheng He,
Hangyu Guo,
Xingyuan Bu,
Ge Bai,
Jie Liu,
Jiaheng Liu,
Xingwei Qu,
Yangguang Li,
Wanli Ouyang,
Wenbo Su,
Bo Zheng
Abstract:
Long-context capabilities are essential for large language models (LLMs) to tackle complex and long-input tasks. Despite numerous efforts made to optimize LLMs for long contexts, challenges persist in robustly processing long inputs. In this paper, we introduce GraphReader, a graph-based agent system designed to handle long texts by structuring them into a graph and employing an agent to explore this graph autonomously. Upon receiving a question, the agent first undertakes a step-by-step analysis and devises a rational plan. It then invokes a set of predefined functions to read node content and neighbors, facilitating a coarse-to-fine exploration of the graph. Throughout the exploration, the agent continuously records new insights and reflects on current circumstances to optimize the process until it has gathered sufficient information to generate an answer. Experimental results on the LV-Eval dataset reveal that GraphReader, using a 4k context window, consistently outperforms GPT-4-128k across context lengths from 16k to 256k by a large margin. Additionally, our approach demonstrates superior performance on four challenging single-hop and multi-hop benchmarks.
Submitted 20 June, 2024;
originally announced June 2024.
-
Fluctuation Spectrum of Critical Fermi Surfaces
Authors:
Haoyu Guo
Abstract:
We investigate the low-energy effective theory of a Fermi surface coupled to an Ising-nematic quantum critical point in (2+1) spacetime dimensions with translation symmetry. We formulate the system using the large $N$ Yukawa-SYK model, whose saddle point is described by the Migdal-Eliashberg equations. The low-energy physics can be revealed by studying the Gaussian fluctuation spectrum around the saddle point, which is generated by the Bethe-Salpeter kernel $K_\text{BS}$. Based on the Ward identities, we propose an inner product on the space of two point functions, which reveals a large number of soft modes of $K_\text{BS}$. These soft modes parameterize deformation of the Fermi surface, and their fluctuation eigenvalues describe their decay rates. We analytically compute these eigenvalues for a circular Fermi surface, and we discover the odd-parity modes to be parametrically longer-lived than the even-parity modes, due to the kinematic constraint of fermions scattering on a convex FS. The sign of the eigenvalues signals an instability of the Ising-nematic quantum critical point at zero temperature for a convex Fermi surface. At finite temperature, the system can be stabilized by thermal fluctuations of the critical boson. We derive an effective action that describes the soft-mode dynamics, and it leads to a linearized Boltzmann equation, where the real part of the soft-mode eigenvalues can be interpreted as the collision rates. The structure of the effective action is similar to the theory of linear bosonization of a Fermi surface. As an application, we investigate the hydrodynamic transport of non-Fermi liquid. Analyzing the Boltzmann equation, we obtain a conventional hydrodynamic transport regime and a tomographic transport regime. In both regimes, the conductance of the system in finite geometry can be a sharp indicator for the soft-mode dynamics and non-Fermi liquid physics.
Submitted 6 July, 2024; v1 submitted 18 June, 2024;
originally announced June 2024.
-
LLM4MSR: An LLM-Enhanced Paradigm for Multi-Scenario Recommendation
Authors:
Yuhao Wang,
Yichao Wang,
Zichuan Fu,
Xiangyang Li,
Xiangyu Zhao,
Huifeng Guo,
Ruiming Tang
Abstract:
As the demand for more personalized recommendation grows and a dramatic boom in commercial scenarios arises, the study of multi-scenario recommendation (MSR), which uses the data from all scenarios to simultaneously improve their recommendation performance, has attracted much attention. However, existing methods tend to integrate insufficient scenario knowledge and neglect learning personalized cross-scenario preferences, thus leading to suboptimal performance and inadequate interpretability. Meanwhile, though large language models (LLMs) have shown great capability in reasoning and capturing semantic information, their high inference latency and high computation cost of tuning hinder their implementation in industrial recommender systems. To fill these gaps, we propose an effective, efficient, and interpretable LLM-enhanced paradigm, LLM4MSR, in this work. Specifically, we first leverage the LLM to uncover multi-level knowledge, including scenario correlations and users' cross-scenario interests, from the designed scenario- and user-level prompts without fine-tuning the LLM, then adopt hierarchical meta networks to generate multi-level meta layers to explicitly improve the scenario-aware and personalized recommendation capability. Our experiments on KuaiSAR-small, KuaiSAR, and Amazon datasets validate three significant advantages of LLM4MSR: (i) the effectiveness and compatibility with different multi-scenario backbone models (achieving 1.5%, 1%, and 40% AUC improvement on three datasets), (ii) high efficiency and deployability on industrial recommender systems, and (iii) improved interpretability. The implemented code and data are available to ease reproduction.
Submitted 18 June, 2024;
originally announced June 2024.
-
LLM-enhanced Reranking in Recommender Systems
Authors:
Jingtong Gao,
Bo Chen,
Xiangyu Zhao,
Weiwen Liu,
Xiangyang Li,
Yichao Wang,
Zijian Zhang,
Wanyu Wang,
Yuyang Ye,
Shanru Lin,
Huifeng Guo,
Ruiming Tang
Abstract:
Reranking is a critical component in recommender systems, playing an essential role in refining the output of recommendation algorithms. Traditional reranking models have focused predominantly on accuracy, but modern applications demand consideration of additional criteria such as diversity and fairness. Existing reranking approaches often fail to harmonize these diverse criteria effectively at the model level. Moreover, these models frequently encounter challenges with scalability and personalization due to their complexity and the varying significance of different reranking criteria in diverse scenarios. In response, we introduce a comprehensive reranking framework enhanced by LLM, designed to seamlessly integrate various reranking criteria while maintaining scalability and facilitating personalized recommendations. This framework employs a fully connected graph structure, allowing the LLM to simultaneously consider multiple aspects such as accuracy, diversity, and fairness through a coherent Chain-of-Thought (CoT) process. A customizable input mechanism is also integrated, enabling the tuning of the language model's focus to meet specific reranking needs. We validate our approach using three popular public datasets, where our framework demonstrates superior performance over existing state-of-the-art reranking models in balancing multiple criteria. The code for this implementation is publicly available.
Submitted 20 June, 2024; v1 submitted 18 June, 2024;
originally announced June 2024.
-
GeoGPT4V: Towards Geometric Multi-modal Large Language Models with Geometric Image Generation
Authors:
Shihao Cai,
Keqin Bao,
Hangyu Guo,
Jizhi Zhang,
Jun Song,
Bo Zheng
Abstract:
Large language models have seen widespread adoption in math problem-solving. However, in geometry problems that usually require visual aids for better understanding, even the most advanced multi-modal models currently still face challenges in effectively using image information. High-quality data is crucial for enhancing the geometric capabilities of multi-modal models, yet existing open-source datasets and related efforts are either too challenging for direct model learning or suffer from misalignment between text and images. To overcome this issue, we introduce a novel pipeline that leverages GPT-4 and GPT-4V to generate relatively basic geometry problems with aligned text and images, facilitating model learning. We have produced a dataset of 4.9K geometry problems and combined it with 19K open-source data to form our GeoGPT4V dataset. Experimental results demonstrate that the GeoGPT4V dataset significantly improves the geometry performance of various models on the MathVista and MathVision benchmarks. The code is available at https://github.com/Lanyu0303/GeoGPT4V_Project
Submitted 17 June, 2024;
originally announced June 2024.
-
Toward Optimal LLM Alignments Using Two-Player Games
Authors:
Rui Zheng,
Hongyi Guo,
Zhihan Liu,
Xiaoying Zhang,
Yuanshun Yao,
Xiaojun Xu,
Zhaoran Wang,
Zhiheng Xi,
Tao Gui,
Qi Zhang,
Xuanjing Huang,
Hang Li,
Yang Liu
Abstract:
The standard Reinforcement Learning from Human Feedback (RLHF) framework primarily focuses on optimizing the performance of large language models using pre-collected prompts. However, collecting prompts that provide comprehensive coverage is both tedious and challenging, and often fails to include scenarios that LLMs need to improve on the most. In this paper, we investigate alignment through the lens of two-agent games, involving iterative interactions between an adversarial and a defensive agent. The adversarial agent's task at each step is to generate prompts that expose the weakness of the defensive agent. In return, the defensive agent seeks to improve its responses to these newly identified prompts it struggled with, based on feedback from the reward model. We theoretically demonstrate that this iterative reinforcement learning optimization converges to a Nash Equilibrium for the game induced by the agents. Experimental results in safety scenarios demonstrate that learning in such a competitive environment not only fully trains agents but also leads to policies with enhanced generalization capabilities for both adversarial and defensive agents.
Submitted 16 June, 2024;
originally announced June 2024.
-
A Late-Stage Bitemporal Feature Fusion Network for Semantic Change Detection
Authors:
Chenyao Zhou,
Haotian Zhang,
Han Guo,
Zhengxia Zou,
Zhenwei Shi
Abstract:
Semantic change detection is an important task in geoscience and earth observation. By producing a semantic change map for each temporal phase, both the land use/land cover categories and change information can be interpreted. Recently, some multi-task learning based semantic change detection methods have been proposed to decompose the task into semantic segmentation and binary change detection subtasks. However, previous works comprise triple branches in an entangled manner, which may not be optimal and makes it hard to adopt foundation models. Besides, the lack of explicit refinement of bitemporal features during fusion may cause low accuracy. In this letter, we propose a novel late-stage bitemporal feature fusion network to address these issues. Specifically, we propose a local global attentional aggregation module to strengthen feature fusion, and a local global context enhancement module to highlight pivotal semantics. Comprehensive experiments are conducted on two public datasets, SECOND and Landsat-SCD. Quantitative and qualitative results show that our proposed model achieves new state-of-the-art performance on both datasets.
Submitted 15 June, 2024;
originally announced June 2024.
-
MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation
Authors:
Ruibo Fu,
Shuchen Shi,
Hongming Guo,
Tao Wang,
Chunyu Qiang,
Zhengqi Wen,
Jianhua Tao,
Xin Qi,
Yi Lu,
Xiaopeng Wang,
Zhiyong Wang,
Yukun Liu,
Xuefei Liu,
Shuai Zhang,
Guanjun Li
Abstract:
Foley audio, critical for enhancing the immersive experience in multimedia content, faces significant challenges in the AI-generated content (AIGC) landscape. Despite advancements in AIGC technologies for text and image generation, foley audio dubbing remains rudimentary due to difficulties in cross-modal scene matching and content correlation. Current text-to-audio (TTA) technology, which relies on detailed and acoustically relevant textual descriptions, falls short in practical video dubbing applications. Existing datasets like AudioSet, AudioCaps, Clotho, Sound-of-Story, and WavCaps do not fully meet the requirements of real-world foley audio dubbing tasks. To address this, we introduce the Multi-modal Image and Narrative Text Dubbing Dataset (MINT), designed to enhance mainstream dubbing tasks such as literary story audiobook dubbing and image/silent video dubbing. Besides, to address the limitations of existing TTA technology in understanding and planning complex prompts, a Foley Audio Content Planning, Generation, and Alignment (CPGA) framework is proposed, which includes a content planning module leveraging large language models for comprehension of complex multi-modal prompts. Additionally, the training process is optimized using Proximal Policy Optimization based reinforcement learning, significantly improving the alignment and auditory realism of generated foley audio. Experimental results demonstrate that our approach significantly advances the field of foley audio dubbing, providing robust solutions for the challenges of multi-modal dubbing. Even when utilizing the relatively lightweight GPT-2 model, our framework outperforms open-source multimodal large models such as LLaVA, DeepSeek-VL, and Moondream2. The dataset is available at https://github.com/borisfrb/MINT .
Submitted 15 June, 2024;
originally announced June 2024.
-
Improving the Accuracy of Halo Mass Based Statistics For Fast Approximate N-body Simulations
Authors:
Yiheng Wu,
Hong Guo,
Volker Springel
Abstract:
Approximate N-body methods, such as FastPM and COLA, have been successful in modelling halo and galaxy clustering statistics, but their low resolution on small scales is a limitation for applications that require high precision. Full N-body simulations can provide better accuracy but are too computationally expensive for a quick exploration of cosmological parameters. This paper presents a method for correcting distinct haloes identified in fast N-body simulations, so that various halo statistics improve to a percent level accuracy. The scheme seeks to find empirical corrections to halo properties such that the virial mass is the same as that of a corresponding halo in a full N-body simulation. The modified outer density contour of the corrected halo is determined on the basis of the FastPM settings and the number of particles inside the halo. This method only changes some parameters of the halo finder, and does not require any extra CPU-cost. We demonstrate that the adjusted halo catalogues of FastPM simulations significantly improve the precision of halo mass-based statistics from redshifts $z=0.0$ to $1.0$, and that our calibration can be applied to different cosmologies without needing to be recalibrated.
Submitted 14 June, 2024;
originally announced June 2024.
-
Enhancing Voice Wake-Up for Dysarthria: Mandarin Dysarthria Speech Corpus Release and Customized System Design
Authors:
Ming Gao,
Hang Chen,
Jun Du,
Xin Xu,
Hongxiao Guo,
Hui Bu,
Jianxing Yang,
Ming Li,
Chin-Hui Lee
Abstract:
Smart home technology has gained widespread adoption, facilitating effortless control of devices through voice commands. However, individuals with dysarthria, a motor speech disorder, face challenges due to the variability of their speech. This paper addresses the wake-up word spotting (WWS) task for dysarthric individuals, aiming to integrate them into real-world applications. To support this, we release the open-source Mandarin Dysarthria Speech Corpus (MDSC), a dataset designed for dysarthric individuals in home environments. MDSC encompasses information on age, gender, disease types, and intelligibility evaluations. Furthermore, we perform comprehensive experimental analysis on MDSC, highlighting the challenges encountered. We also develop a customized dysarthria WWS system that showcases robustness in handling intelligibility and achieving exceptional performance. MDSC will be released on https://www.aishelltech.com/AISHELL_6B.
Submitted 13 June, 2024;
originally announced June 2024.
-
UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner
Authors:
Dongchao Yang,
Haohan Guo,
Yuanyuan Wang,
Rongjie Huang,
Xiang Li,
Xu Tan,
Xixin Wu,
Helen Meng
Abstract:
Large language models (LLMs) have demonstrated strong capabilities in text understanding and generation, but cannot be directly applied to cross-modal tasks without fine-tuning. This paper proposes a cross-modal in-context learning approach that enables frozen LLMs to perform multiple audio tasks in a few-shot style without any parameter update. Specifically, we propose a novel LLM-driven audio codec model, LLM-Codec, to transfer the audio modality into the textual space, i.e., representing audio tokens with words or sub-words in the vocabulary of LLMs, while keeping high audio reconstruction quality. The key idea is to reduce the modality heterogeneity between text and audio by compressing the audio modality into the token space of a well-trained LLM. Thus, the audio representation can be viewed as a new foreign language, and LLMs can learn this new foreign language from several demonstrations. In experiments, we investigate the performance of the proposed approach across multiple audio understanding and generation tasks, e.g., speech emotion classification, audio classification, text-to-speech generation, and speech enhancement. The experimental results demonstrate that LLMs equipped with the proposed LLM-Codec, named UniAudio 1.5, can achieve the expected functions in simple scenarios when prompted with only a few examples. This validates the feasibility and effectiveness of the proposed cross-modal in-context learning approach. To facilitate research on few-shot audio task learning and multi-modal LLMs, we have open-sourced the LLM-Codec model.
Submitted 14 June, 2024;
originally announced June 2024.
-
MSz: An Efficient Parallel Algorithm for Correcting Morse-Smale Segmentations in Error-Bounded Lossy Compressors
Authors:
Yuxiao Li,
Xin Liang,
Bei Wang,
Yongfeng Qiu,
Lin Yan,
Hanqi Guo
Abstract:
This research explores a novel paradigm for preserving topological segmentations in existing error-bounded lossy compressors. Today's lossy compressors rarely consider preserving topologies such as Morse-Smale complexes, and the discrepancies in topology between original and decompressed datasets could potentially result in erroneous interpretations or even incorrect scientific conclusions. In this paper, we focus on preserving Morse-Smale segmentations in 2D/3D piecewise linear scalar fields, targeting the precise reconstruction of minimum/maximum labels induced by the integral line of each vertex. The key is to derive a series of edits during compression time; the edits are applied to the decompressed data, leading to an accurate reconstruction of segmentations while keeping the error within the prescribed error bound. To this end, we developed a workflow that fixes extrema and integral lines alternately until convergence within finite iterations; we accelerate each workflow component with shared-memory/GPU parallelism to make the performance practical for coupling with compressors. We demonstrate use cases with fluid dynamics, ocean, and cosmology application datasets, with significant acceleration on an NVIDIA A100 GPU.
Submitted 5 July, 2024; v1 submitted 5 April, 2024;
originally announced June 2024.
-
Strange metal and superconductor in the two-dimensional Yukawa-Sachdev-Ye-Kitaev model
Authors:
Chenyuan Li,
Davide Valentinis,
Aavishkar A. Patel,
Haoyu Guo,
Jörg Schmalian,
Subir Sachdev,
Ilya Esterlis
Abstract:
The two-dimensional Yukawa-Sachdev-Ye-Kitaev (YSYK) model provides a universal theory of quantum phase transitions in metals in the presence of quenched random spatial fluctuations in the local position of the quantum critical point. It has a Fermi surface coupled to a scalar field by spatially random Yukawa interactions. We present full numerical solutions of a self-consistent disorder-averaged analysis of the YSYK model in both the normal and superconducting states, obtaining electronic spectral functions, frequency-dependent conductivity, and superfluid stiffness. Our results reproduce key aspects of observations in the cuprates as analyzed by Michon et al. (Nat. Comm. 14, 3033 (2023)). We also find a regime of increasing zero-temperature superfluid stiffness with decreasing superconducting critical temperature, as is observed in bulk cuprates.
Submitted 19 June, 2024; v1 submitted 11 June, 2024;
originally announced June 2024.
-
McEval: Massively Multilingual Code Evaluation
Authors:
Linzheng Chai,
Shukai Liu,
Jian Yang,
Yuwei Yin,
Ke Jin,
Jiaheng Liu,
Tao Sun,
Ge Zhang,
Changyu Ren,
Hongcheng Guo,
Zekun Wang,
Boyang Wang,
Xianjie Wu,
Bing Wang,
Tongliang Li,
Liqun Yang,
Sufeng Duan,
Zhoujun Li
Abstract:
Code large language models (LLMs) have shown remarkable advances in code understanding, completion, and generation tasks. Programming benchmarks, comprising a selection of code challenges and corresponding test cases, serve as a standard to evaluate the capability of different LLMs in such tasks. However, most existing benchmarks focus primarily on Python and remain restricted to a limited number of languages, with samples in other languages translated from the Python samples (e.g., MultiPL-E), which degrades data diversity. To further facilitate research on code LLMs, we propose a massively multilingual code benchmark covering 40 programming languages (McEval) with 16K test samples, which substantially pushes the limits of code LLMs in multilingual scenarios. The benchmark contains challenging code completion, understanding, and generation evaluation tasks, along with a finely curated massively multilingual instruction corpus, McEval-Instruct. In addition, we introduce an effective multilingual coder, mCoder, trained on McEval-Instruct to support multilingual programming language generation. Extensive experimental results on McEval show that a substantial gap remains between open-source models and closed-source LLMs (e.g., GPT-series models) in numerous languages. The instruction corpora, evaluation benchmark, and leaderboard are available at https://mceval.github.io/.
Submitted 11 June, 2024;
originally announced June 2024.
-
Single-Codec: Single-Codebook Speech Codec towards High-Performance Speech Generation
Authors:
Hanzhao Li,
Liumeng Xue,
Haohan Guo,
Xinfa Zhu,
Yuanjun Lv,
Lei Xie,
Yunlin Chen,
Hao Yin,
Zhifei Li
Abstract:
The multi-codebook speech codec enables the application of large language models (LLMs) in TTS but bottlenecks efficiency and robustness due to multi-sequence prediction. To avoid this obstacle, we propose Single-Codec, a single-codebook single-sequence codec, which employs a disentangled VQ-VAE to decouple speech into a time-invariant embedding and a phonetically-rich discrete sequence. Furthermore, the encoder is enhanced with 1) contextual modeling with a BLSTM module to exploit the temporal information, 2) a hybrid sampling module to alleviate distortion from upsampling and downsampling, and 3) a resampling module to encourage discrete units to carry more phonetic information. Compared with multi-codebook codecs, e.g., EnCodec and TiCodec, Single-Codec demonstrates higher reconstruction quality with a lower bandwidth of only 304bps. The effectiveness of Single-Codec is further validated by LLM-TTS experiments, showing improved naturalness and intelligibility.
Submitted 11 June, 2024;
originally announced June 2024.
-
NeRSP: Neural 3D Reconstruction for Reflective Objects with Sparse Polarized Images
Authors:
Yufei Han,
Heng Guo,
Koki Fukai,
Hiroaki Santo,
Boxin Shi,
Fumio Okura,
Zhanyu Ma,
Yunpeng Jia
Abstract:
We present NeRSP, a Neural 3D reconstruction technique for Reflective surfaces with Sparse Polarized images. Reflective surface reconstruction is extremely challenging as specular reflections are view-dependent and thus violate the multiview consistency for multiview stereo. On the other hand, sparse image inputs, as a practical capture setting, commonly cause incomplete or distorted results due to the lack of correspondence matching. This paper jointly handles the challenges from sparse inputs and reflective surfaces by leveraging polarized images. We derive photometric and geometric cues from the polarimetric image formation model and multiview azimuth consistency, which jointly optimize the surface geometry modeled via implicit neural representation. Based on the experiments on our synthetic and real datasets, we achieve the state-of-the-art surface reconstruction results with only 6 views as input.
Submitted 11 June, 2024;
originally announced June 2024.
-
Existence and uniqueness of ground state solutions for the planar Schrödinger-Newton equation on the disc
Authors:
Hui Guo,
Zhiwen Long,
Tao Wang
Abstract:
This paper is concerned with the existence and qualitative properties of positive ground state solutions for the planar Schrödinger-Newton equation on the disc. First, we prove the existence and radial symmetry of all the positive ground state solutions by employing the symmetric decreasing rearrangement and Talenti's inequality. Next, we develop Newton's theorem and then use the contraction mapping principle to establish the uniqueness of the positive ground state solution for the Schrödinger-Newton equation on the disc in the two-dimensional case. Finally, we show that the unique positive ground state solution converges to the trivial solution as the radius $R$ tends to infinity, which is totally different from the higher-dimensional case in \cite{Guo-Wang-Yi}.
Submitted 10 June, 2024;
originally announced June 2024.
-
Characterization of Recirculating Waveguide Meshes Based on an Optimization Method with a Parameter Space Reduction Technology
Authors:
Ran Tao,
Jifang Qiu,
Yuchen Chen,
Bowen Zhang,
Yan Li,
Hongxiang Guo,
Jian Wu
Abstract:
Fabrication imperfections must be considered during configuration to ensure that the setup is suitable for the actual fabricated programmable photonic integrated circuits (PPICs). Therefore, characterization of imperfections is crucial but difficult, especially for PPICs made from recirculating waveguide meshes. The flexibility required by these meshes demands a more complex topology and compact TBU structure, complicating the characterization. In this paper, we propose a characterization method applicable to recirculating waveguide meshes based on an optimization approach, along with a step-by-step procedure to reduce the parameter space of optimization, allowing the imperfect parameters of each individual component within the waveguide mesh to be characterized. To the best of our knowledge, this method can greatly broaden the range of characterized parameters compared to currently reported methods. To verify the effectiveness of our method, we used the characterized parameters to build a multi-frequency model of a mesh with fabrication errors and successfully demonstrated accurate prediction of its behavior. Furthermore, we applied our method to implementations of 6 different kinds of FIR/IIR filters, to further prove its effectiveness in configuring applications on meshes with fabrication errors. Finally, our method was carried out under various scenarios considering beam splitter splitting ratio variance, inaccurate measurements of the mesh, and imprecise TBU insertion loss characterization, to demonstrate its robustness under practical conditions.
Submitted 8 June, 2024;
originally announced June 2024.
-
AgentGym: Evolving Large Language Model-based Agents across Diverse Environments
Authors:
Zhiheng Xi,
Yiwen Ding,
Wenxiang Chen,
Boyang Hong,
Honglin Guo,
Junzhe Wang,
Dingwen Yang,
Chenyang Liao,
Xin Guo,
Wei He,
Songyang Gao,
Lu Chen,
Rui Zheng,
Yicheng Zou,
Tao Gui,
Qi Zhang,
Xipeng Qiu,
Xuanjing Huang,
Zuxuan Wu,
Yu-Gang Jiang
Abstract:
Building generalist agents that can handle diverse tasks and evolve themselves across different environments is a long-term goal in the AI community. Large language models (LLMs) are considered a promising foundation to build such agents due to their generalized capabilities. Current approaches either have LLM-based agents imitate expert-provided trajectories step-by-step, requiring human supervision, which is hard to scale and limits environmental exploration; or they let agents explore and learn in isolated environments, resulting in specialist agents with limited generalization. In this paper, we take the first step towards building generally-capable LLM-based agents with self-evolution ability. We identify a trinity of ingredients: 1) diverse environments for agent exploration and learning, 2) a trajectory set to equip agents with basic capabilities and prior knowledge, and 3) an effective and scalable evolution method. We propose AgentGym, a new framework featuring a variety of environments and tasks for broad, real-time, uni-format, and concurrent agent exploration. AgentGym also includes a database with expanded instructions, a benchmark suite, and high-quality trajectories across environments. Next, we propose a novel method, AgentEvol, to investigate the potential of agent self-evolution beyond previously seen data across tasks and environments. Experimental results show that the evolved agents can achieve results comparable to SOTA models. We release the AgentGym suite, including the platform, dataset, benchmark, checkpoints, and algorithm implementations. The AgentGym suite is available on https://github.com/WooooDyy/AgentGym.
Submitted 6 June, 2024;
originally announced June 2024.
-
Beyond a binary theorizing of prosociality
Authors:
Chen Shen,
Zhixue He,
Hao Guo,
Shuyue Hu,
Jun Tanimoto,
Lei Shi,
Petter Holme
Abstract:
A stylized experiment, the public goods game, has taught us the peculiar reproducible fact that humans tend to contribute more to shared resources than expected from economically rational assumptions. There have been two competing explanations for this phenomenon: either contributing to the public good is an innate human trait (the prosocial preference hypothesis) or a transitory effect while learning the game (the confused learner hypothesis). We use large-scale experimental data from a novel experimental design to distinguish between these two hypotheses. By monitoring the effects of zealots (persistently cooperating bots) and varying the participants' awareness of them, we find a considerably more complex scenario than previously reported. People indeed have a prosocial bias, but not to the degree that they always forego taking action to increase their profit. While our findings end the simplistic theorizing of prosociality in the public goods game, an observed positive, cooperative response to zealots has actionable policy implications.
Submitted 6 June, 2024;
originally announced June 2024.
-
Robust Knowledge Distillation Based on Feature Variance Against Backdoored Teacher Model
Authors:
Jinyin Chen,
Xiaoming Zhao,
Haibin Zheng,
Xiao Li,
Sheng Xiang,
Haifeng Guo
Abstract:
Benefiting from well-trained deep neural networks (DNNs), model compression has attracted special attention for equipment with limited computing resources, especially edge devices. Knowledge distillation (KD) is one of the widely used compression techniques for edge deployment, obtaining a lightweight student model from a well-trained teacher model released on public platforms. However, it has been empirically observed that a backdoor in the teacher model will be transferred to the student model during KD. Although numerous KD methods have been proposed, most of them focus on distilling a high-performing student model without considering robustness. Besides, some research adopts KD techniques as effective backdoor mitigation tools, but these approaches fail to perform model compression at the same time. Consequently, achieving both objectives of robust KD, i.e., student model performance and backdoor mitigation, remains an open problem. To address these issues, we propose RobustKD, a robust knowledge distillation method that compresses the model while mitigating backdoors based on feature variance. Specifically, RobustKD differs from previous works in three key aspects: (1) effectiveness: by distilling the feature map of the teacher model after detoxification, the main task performance of the student model is comparable to that of the teacher model; (2) robustness: by reducing the characteristic variance between the teacher model and the student model, it mitigates the backdoor of the student model under a backdoored teacher model scenario; (3) generality: RobustKD still performs well across multiple data models (e.g., WRN 28-4, Pyramid-200) and diverse DNNs (e.g., ResNet50, MobileNet).
Submitted 1 June, 2024;
originally announced June 2024.
-
Addressing Index Collapse of Large-Codebook Speech Tokenizer with Dual-Decoding Product-Quantized Variational Auto-Encoder
Authors:
Haohan Guo,
Fenglong Xie,
Dongchao Yang,
Hui Lu,
Xixin Wu,
Helen Meng
Abstract:
VQ-VAE, as a mainstream approach of speech tokenizer, has been troubled by ``index collapse'', where only a small number of codewords are activated in large codebooks. This work proposes product-quantized (PQ) VAE with more codebooks but fewer codewords to address this problem and build large-codebook speech tokenizers. It encodes speech features into multiple VQ subspaces and composes them into codewords in a larger codebook. Besides, to utilize each VQ subspace well, we also enhance PQ-VAE via a dual-decoding training strategy with the encoding and quantized sequences. The experimental results demonstrate that PQ-VAE addresses ``index collapse'' effectively, especially for larger codebooks. The model with the proposed training strategy further improves codebook perplexity and reconstruction quality, outperforming other multi-codebook VQ approaches. Finally, PQ-VAE demonstrates its effectiveness in language-model-based TTS, supporting higher-quality speech generation with larger codebooks.
Submitted 5 June, 2024;
originally announced June 2024.
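The abstract above describes encoding a feature into several small VQ subspaces and composing the per-subspace codewords into one codeword of a larger codebook. The following is a minimal sketch of that general product-quantization idea, not the paper's implementation: the function names (`nearest`, `pq_encode`, `compose_index`), the toy codebooks, and the mixed-radix composition rule are all illustrative assumptions, since the abstract does not say how indices are composed.

```python
from typing import List, Sequence


def nearest(codebook: Sequence[Sequence[float]], vec: Sequence[float]) -> int:
    """Return the index of the codeword closest to vec (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )


def pq_encode(feature: Sequence[float],
              codebooks: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Split feature into equal-width chunks and quantize each chunk
    with its own small codebook (hypothetical helper)."""
    width = len(feature) // len(codebooks)
    return [
        nearest(cb, feature[k * width:(k + 1) * width])
        for k, cb in enumerate(codebooks)
    ]


def compose_index(indices: Sequence[int], sizes: Sequence[int]) -> int:
    """Combine per-subspace indices into a single index in the product
    codebook using mixed-radix encoding (an assumed composition rule)."""
    combined = 0
    for idx, size in zip(indices, sizes):
        combined = combined * size + idx
    return combined


# Toy example: two subspaces with 2 and 3 codewords give an
# effective codebook of 2 * 3 = 6 composed codewords.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],
]
feature = [0.9, 1.1, 1.8, 2.1]
indices = pq_encode(feature, codebooks)
token = compose_index(indices, [len(cb) for cb in codebooks])
```

The effective codebook size is the product of the subspace sizes, so a large token vocabulary can be built while each individual codebook stays small enough for its codewords to be used.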
-
Uncertainty of Joint Neural Contextual Bandit
Authors:
Hongbo Guo,
Zheqing Zhu
Abstract:
Contextual bandit learning is increasingly favored in modern large-scale recommendation systems. To better utilize the contextual information and available user or item features, the integration of neural networks has been introduced to enhance contextual bandit learning and has triggered significant interest from both academia and industry. However, a major challenge arises when implementing a disjoint neural contextual bandit solution in large-scale recommendation systems, where each item or user may correspond to a separate bandit arm. The huge number of items to recommend poses a significant hurdle for real-world production deployment. This paper focuses on a joint neural contextual bandit solution which serves all recommending items in one single model. The output consists of a predicted reward $μ$, an uncertainty $σ$ and a hyper-parameter $α$ which balances exploitation and exploration, e.g., $μ+ ασ$.
The tuning of the parameter $α$ is typically heuristic and complex in practice due to its stochastic nature. To address this challenge, we provide both theoretical analysis and experimental findings regarding the uncertainty $σ$ of the joint neural contextual bandit model. Our analysis reveals that $σ$ demonstrates an approximate square root relationship with the size of the last hidden layer $F$ and an inverse square root relationship with the amount of training data $N$, i.e., $σ\propto \sqrt{\frac{F}{N}}$. The experiments, conducted with real industrial data, align with the theoretical analysis, help in understanding model behaviors, and assist hyper-parameter tuning during both offline training and online deployment.
Submitted 4 June, 2024;
originally announced June 2024.
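The scoring rule $μ+ ασ$ and the scaling $σ\propto \sqrt{\frac{F}{N}}$ stated in the abstract above can be illustrated with a few lines of arithmetic. This is a sketch under stated assumptions, not the authors' code: the function names `ucb_score` and `rescale_sigma` and all numeric values are hypothetical, and the rescaling simply applies the proportionality from the abstract to a reference measurement.

```python
import math


def ucb_score(mu: float, sigma: float, alpha: float) -> float:
    """Exploration-adjusted score mu + alpha * sigma, as in the abstract."""
    return mu + alpha * sigma


def rescale_sigma(sigma_ref: float, f_ref: int, n_ref: int,
                  f_new: int, n_new: int) -> float:
    """Predict the uncertainty at a new last-hidden-layer size F and training
    set size N, assuming sigma is proportional to sqrt(F / N).
    The reference values are hypothetical measurements."""
    return sigma_ref * math.sqrt((f_new / n_new) / (f_ref / n_ref))


# Hypothetical reference: sigma = 0.2 with F = 64 and N = 1000.
# Quadrupling the training data at fixed F should halve sigma.
sigma_more_data = rescale_sigma(0.2, 64, 1000, 64, 4000)

# With a smaller sigma, the same alpha yields a smaller exploration bonus.
score_before = ucb_score(mu=1.0, sigma=0.2, alpha=2.0)
score_after = ucb_score(mu=1.0, sigma=sigma_more_data, alpha=2.0)
```

Under this relationship, a change in $F$ or $N$ shifts the scale of $σ$, so a value of $α$ tuned for one configuration may need to be adjusted by the inverse factor to keep the exploration bonus comparable.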
-
SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models
Authors:
Dongchao Yang,
Dingdong Wang,
Haohan Guo,
Xueyuan Chen,
Xixin Wu,
Helen Meng
Abstract:
In this study, we propose a simple and efficient Non-Autoregressive (NAR) text-to-speech (TTS) system based on diffusion, named SimpleSpeech. Its simplicity shows in three aspects: (1) It can be trained on a speech-only dataset, without any alignment information; (2) It directly takes plain text as input and generates speech in an NAR way; (3) It models speech in a finite and compact latent space, which alleviates the modeling difficulty of diffusion. More specifically, we propose a novel speech codec model (SQ-Codec) with scalar quantization. SQ-Codec effectively maps the complex speech signal into a finite and compact latent space, named the scalar latent space. Benefiting from SQ-Codec, we apply a novel transformer diffusion model in the scalar latent space of SQ-Codec. We train SimpleSpeech on 4k hours of a speech-only dataset, and it shows natural prosody and voice cloning ability. Compared with previous large-scale TTS models, it presents significant improvements in speech quality and generation speed. Demos are released.
Submitted 14 June, 2024; v1 submitted 4 June, 2024;
originally announced June 2024.
-
Measurement of Electron Antineutrino Oscillation Amplitude and Frequency via Neutron Capture on Hydrogen at Daya Bay
Authors:
Daya Bay collaboration,
F. P. An,
W. D. Bai,
A. B. Balantekin,
M. Bishai,
S. Blyth,
G. F. Cao,
J. Cao,
J. F. Chang,
Y. Chang,
H. S. Chen,
H. Y. Chen,
S. M. Chen,
Y. Chen,
Y. X. Chen,
Z. Y. Chen,
J. Cheng,
J. Cheng,
Y. -C. Cheng,
Z. K. Cheng,
J. J. Cherwinka,
M. C. Chu,
J. P. Cummings,
O. Dalager,
F. S. Deng
, et al. (177 additional authors not shown)
Abstract:
This Letter reports the first measurement of the oscillation amplitude and frequency of reactor antineutrinos at Daya Bay via neutron capture on hydrogen using 1958 days of data. With over 3.6 million signal candidates, an optimized candidate selection, improved treatment of backgrounds and efficiencies, refined energy calibration, and an energy response model for the capture-on-hydrogen sensitive region, the relative $\overline{ν}_{e}$ rates and energy spectra variation among the near and far detectors gives $\sin^2 2θ_{13} = 0.0759_{-0.0049}^{+0.0050}$ and $Δm^2_{32} = (2.72^{+0.14}_{-0.15})\times10^{-3}$ eV$^2$ assuming the normal neutrino mass ordering, and $Δm^2_{32} = (-2.83^{+0.15}_{-0.14})\times10^{-3}$ eV$^2$ for the inverted neutrino mass ordering. This estimate of $\sin^2 2θ_{13}$ is consistent with and essentially independent of the one obtained using the capture-on-gadolinium sample at Daya Bay. The combination of these two results yields $\sin^2 2θ_{13}= 0.0833\pm0.0022$, which represents an 8% relative improvement in precision over the Daya Bay full 3158-day capture-on-gadolinium result.
Submitted 3 June, 2024;
originally announced June 2024.
-
Filament eruption by multiple reconnections
Authors:
Y. Liu,
G. P. Ruan,
B. Schmieder,
J. H. Guo,
Y. Chen,
R. S. Zheng,
J. T. Su,
B. Wang
Abstract:
Filament eruption is a common phenomenon in solar activity, but the triggering mechanism is not well understood. We focus our study on a filament eruption located in a complex nest of three active regions close to a coronal hole. The filament eruption is observed at multiple wavelengths: by the GONG, the STEREO, the SUTRI, and the AIA and Helioseismic and Magnetic Imager (HMI) on board the SDO. Thanks to high temporal-resolution observations, we were able to analyze the evolution of the fine structure of the filament in detail. The filament changes direction during the eruption, which is followed by a halo coronal mass ejection detected by the LASCO on board the SOHO. A Type III radio burst was also registered at the time of the eruption. To investigate the process of the eruption, we analyzed the magnetic topology of the filament region adopting a nonlinear force-free-field (NLFFF) extrapolation method and the polytropic global magnetohydrodynamic (MHD) modeling. We modeled the filament by embedding a twisted flux rope with the regularized Biot-Savart Laws (RBSL) method in the ambient magnetic field. The extrapolation results show that magnetic reconnection occurs in a fan-spine configuration resulting in a circular flare ribbon. The global modeling of the corona demonstrates that there was an interaction between the filament and open field lines, causing a deflection of the filament in the direction of the observed CME eruption and dimming area. The modeling supports the following scenario: magnetic reconnection not only occurs with the filament itself (the flux rope) but also with the background magnetic field lines and open field lines of the coronal hole located to the east of the flux rope. This multiwavelength analysis indicates that the filament undergoes multiple magnetic reconnections on small and large scales with a drifting of the flux rope.
Submitted 2 June, 2024;
originally announced June 2024.
-
Correlated Electronic Structure and Density-Wave Gap in Trilayer Nickelate La4Ni3O10
Authors:
X. Du,
Y. D. Li,
Y. T. Cao,
C. Y. Pei,
M. X. Zhang,
W. X. Zhao,
K. Y. Zhai,
R. Z. Xu,
Z. K. Liu,
Z. W. Li,
J. K. Zhao,
G. Li,
Y. L. Chen,
Y. P. Qi,
H. J. Guo,
L. X. Yang
Abstract:
The discovery of pressurized superconductivity at 80 K in La3Ni2O7 officially brings nickelates into the family of high-temperature superconductors, which gives rise to not only new insights but also mysteries in the strongly correlated superconductivity. More recently, the sibling compound La4Ni3O10 was also shown to be superconducting below about 25 K under pressure, further boosting the popularity of nickelates in the Ruddlesden-Popper phase. In this study, combining high-resolution angle-resolved photoemission spectroscopy and ab initio calculation, we systematically investigate the electronic structures of La4Ni3O10 at ambient pressure. We reveal a high resemblance of La4Ni3O10 with La3Ni2O7 in the orbital-dependent fermiology and electronic structure, suggesting a similar electronic correlation between the two compounds. The temperature-dependent measurements imply an orbital-dependent energy gap related to the density-wave transition in La4Ni3O10. By comparing the theoretical pressure-dependent electronic structure, clues about the superconducting high-pressure phase can be deduced from the ambient measurements, providing crucial information for deciphering the unconventional superconductivity in nickelates.
Submitted 30 May, 2024;
originally announced May 2024.