Delayed luminescence and thermoluminescence in laboratory-grown diamonds

Authors: Jiahui Zhao, Ben L. Green, Ben G. Breeze, Hengxin Yuan, Troy Ardon, Wuyi Wang, Mark E. Newton

Abstract: Blue-green phosphorescence/thermoluminescence is most commonly observed in diamonds following excitation at or above the indirect band gap and has been explained by a substitutional nitrogen-boron donor-acceptor pair recombination model. Orange and red phosphorescence have also been frequently observed in lab-grown near-colourless high-pressure high-temperature diamonds following optical excitation, and their luminescence mechanisms are shown to be different from that of the blue-green phosphorescence. The physics of the orange and red luminescence and phosphorescence bands, including the optical-excitation dependency (UV-NIR), temperature dependency (20 - 573 K), and related charge transfer processes, is investigated by a combination of self-built time-resolved imaging/spectroscopic techniques. In this paper, an alternative model for long-lived phosphorescence based on charge trapping is proposed to explain the orange phosphorescence/thermoluminescence band. Additionally, the red phosphorescence band is attributed to a point defect which possibly has a three-level phosphorescence system.

Submitted 16 July, 2024; originally announced July 2024.

Comments: 11 pages, 9 figures

arXiv:2407.11745 [pdf, other]

Universal Sound Separation with Self-Supervised Audio Masked Autoencoder

Authors: Junqi Zhao, Xubo Liu, Jinzheng Zhao, Yi Yuan, Qiuqiang Kong, Mark D. Plumbley, Wenwu Wang

Abstract: Universal sound separation (USS) is a task of separating mixtures of arbitrary sound sources. Typically, universal separation models are trained from scratch in a supervised manner, using labeled data. Self-supervised learning (SSL) is an emerging deep learning approach that leverages unlabeled data to obtain task-agnostic representations, which can benefit many downstream tasks. In this paper, we propose integrating a self-supervised pre-trained model, namely the audio masked autoencoder (A-MAE), into a universal sound separation system to enhance its separation performance. We employ two strategies to utilize SSL embeddings: freezing or updating the parameters of A-MAE during fine-tuning. The SSL embeddings are concatenated with the short-time Fourier transform (STFT) to serve as input features for the separation model. We evaluate our methods on the AudioSet dataset, and the experimental results indicate that the proposed methods successfully enhance the separation performance of a state-of-the-art ResUNet-based USS model.

Submitted 16 July, 2024; originally announced July 2024.

arXiv:2407.11727 [pdf, ps, other]

Measurement of the branching fraction of $D^+_s\to \ell^+ν_\ell$ via $e^+e^-\to D^{*+}_{s} D^{*-}_{s}$

Authors: BESIII Collaboration, M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere , et al. (634 additional authors not shown)

Abstract: Based on $10.64~\mathrm{fb}^{-1}$ of $e^+e^-$ collision data taken at center-of-mass energies between 4.237 and 4.699 GeV with the BESIII detector, we study the leptonic $D^+_s$ decays using the $e^+e^-\to D^{*+}_{s} D^{*-}_{s}$ process. The branching fractions of $D_s^+\to\ell^+ν_{\ell}\,(\ell=μ,τ)$ are measured to be $\mathcal{B}(D_s^+\toμ^+ν_μ)=(\bfmuv)\%$ and $\mathcal{B}(D_s^+\toτ^+ν_τ)=(\bftauv)\%$, respectively. The product of the decay constant and Cabibbo-Kobayashi-Maskawa matrix element $|V_{cs}|$ is determined to be $f_{D_s^+}|V_{cs}|=(\mufdsxvcsresult)_{μν}~\mathrm{MeV}$ and $f_{D_s^+}|V_{cs}|=(\taufdsxvcsresult)_{τν}~\mathrm{MeV}$, respectively. Taking the value of $|V_{cs}|$ from a global fit in the Standard Model, we obtain ${f_{D^+_s}}=(\mufdsresult)_{μν}$\,MeV and ${f_{D^+_s}}=(\taufdsresult)_{τν}$\,MeV, respectively. Conversely, taking the value for $f_{D_s^+}$ from the latest lattice quantum chromodynamics calculation, we obtain $|V_{cs}| =(\muvcsresult)_{μν}$ and $|V_{cs}| = (\tauvcsresult)_{τν}$, respectively.

Submitted 16 July, 2024; originally announced July 2024.

Comments: 27 pages, 13 figures

arXiv:2407.11721 [pdf, other]

Inferring the mass content of galaxy clusters with satellite kinematics and Jeans Anisotropic modeling

Authors: Rui Shi, Wenting Wang, Zhaozhou Li, Ling Zhu, Alexander Smith, Shaun Cole, Hongyu Gao, Xiaokai Chen, Qingyang Li, Jiaxin Han

Abstract: Satellite galaxies can be used to indicate the dynamical mass of galaxy groups and clusters. In this study, we apply the axis-symmetric Jeans Anisotropic Multi-Gaussian Expansion (JAM) modeling to satellite galaxies in 28 galaxy clusters selected from the TNG300-1 simulation with halo mass of $\log_{10}M_{200}/M_\odot>14.3$. If using true bound satellites as tracers, the best constrained total mass within the half-mass radius of satellites, $M(<r_\mathrm{half})$, and the virial mass, $M_{200}$, have average biases of -0.01 and $0.03$~dex, with average scatters of 0.11~dex and 0.15~dex. If selecting companions in redshift space with line-of-sight depth of 2,000~km/s, the biases are -0.06 and $0.01$~dex, while the scatters are 0.12 and 0.18~dex for $M(<r_\mathrm{half})$ and $M_{200}$. By comparing the best-fitting and actual density profiles, we find $\sim$29% of best-fitting density profiles show very good agreement with the truth, $\sim$32% display over or under estimates at most of the radial range with biased $M(<r_\mathrm{half})$, and 39% show under/over estimates in central regions and over/under estimates in the outskirts, with good constraints on $M(<r_\mathrm{half})$, yet most of the best constraints are still consistent with the true profiles within 1-$σ$ statistical uncertainties for the three circumstances. Using a mock DESI Bright Galaxy Survey catalog with the effect of fiber incompleteness, we find DESI fiber assignments and the choice of flux limits barely modify the velocity dispersion profiles and are thus unlikely to affect the dynamical modeling outcomes. Our results show that with current and future deep spectroscopic surveys, JAM can be a powerful tool to constrain the underlying density profiles of individual massive galaxy clusters.

Submitted 16 July, 2024; originally announced July 2024.

Comments: accepted by ApJ

arXiv:2407.11610 [pdf, other]

MergeNet: Explicit Mesh Reconstruction from Sparse Point Clouds via Edge Prediction

Authors: Weimin Wang, Yingxu Deng, Zezeng Li, Yu Liu, Na Lei

Abstract: This paper introduces a novel method for reconstructing meshes from sparse point clouds by predicting edge connection. Existing implicit methods usually produce superior smooth and watertight meshes due to the isosurface extraction algorithms (e.g., Marching Cubes). However, these methods become memory and computationally intensive with increasing resolution. Explicit methods are more efficient by directly forming the face from points. Nevertheless, the challenge of selecting appropriate faces from enormous candidates often leads to undesirable faces and holes. Moreover, the reconstruction performance of both approaches tends to degrade when the point cloud gets sparse. To this end, we propose MEsh Reconstruction via edGE (MergeNet), which converts mesh reconstruction into local connectivity prediction problems. Specifically, MergeNet learns to extract the features of candidate edges and regress their distances to the underlying surface. Consequently, the predicted distance is utilized to filter out edges that lie on surfaces. Finally, the meshes are reconstructed by refining the triangulations formed by these edges. Extensive experiments on synthetic and real-scanned datasets demonstrate the superiority of MergeNet to SoTA explicit methods.

Submitted 16 July, 2024; originally announced July 2024.

arXiv:2407.11474 [pdf, other]

Search for the rare $Λ_c^+ \to p μ^+ μ^-$ decay

Authors: LHCb collaboration, R. Aaij, A. S. W. Abdelmotteleb, C. Abellan Beteta, F. Abudinén, T. Ackernley, A. A. Adefisoye, B. Adeva, M. Adinolfi, P. Adlarson, C. Agapopoulou, C. A. Aidala, Z. Ajaltouni, S. Akar, K. Akiba, P. Albicocco, J. Albrecht, F. Alessio, M. Alexander, Z. Aliouche, P. Alvarez Cartelle, R. Amalric, S. Amato, J. L. Amey, Y. Amhis , et al. (1062 additional authors not shown)

Abstract: A search for the nonresonant $Λ_c^+ \to p μ^+ μ^-$ decay is performed using proton-proton collision data recorded at a centre-of-mass energy of 13 TeV by the LHCb experiment, corresponding to an integrated luminosity of 5.4 fb$^{-1}$. No evidence for the decay is found in the dimuon invariant-mass regions where the expected contribution of resonances is subdominant. The upper limit on the branching fraction of the $Λ_c^+ \to p μ^+ μ^-$ decay is determined to be $2.9~(3.2) \times 10^{-8}$ at 90% (95%) confidence level. The branching fractions in the dimuon invariant-mass regions dominated by the $η$, $ρ$ and $ω$ resonances are also determined.

Submitted 16 July, 2024; originally announced July 2024.

Comments: All figures and tables, along with any supplementary material and additional information, are available at https://cern.ch/lhcbproject/Publications/p/LHCb-PAPER-2024-005.html (LHCb public pages)

Report number: LHCb-PAPER-2024-005, CERN-EP-2024-158

arXiv:2407.11410 [pdf, other]

High-energy neutrino emission from tidal disruption event outflow-cloud interactions

Authors: Hanji Wu, Kai Wang, Wei Wang

Abstract: Tidal disruption events (TDEs), characterized by their luminous transients and high-velocity outflows, have emerged as plausible sources of high-energy neutrinos contributing to the diffuse neutrino flux. In this study, we calculate the contribution of TDEs to the diffuse neutrino flux by employing the outflow-cloud model within the TDE framework. Our analysis indicates that the contribution of TDEs becomes negligible when the redshift $Z$ exceeds 2. Employing a set of fiducial values, which includes outflow energy $E_{\rm kin}=10^{51}$ erg, a proton spectrum cutoff energy $E_{\rm p,max}=100$ PeV, a volume TDE rate $\dot{N}=8 \times 10^{-7}\ \rm Mpc^{-3}\ year^{-1}$, covering fraction of clouds $C_V=0.1$, energy conversion efficiency in the shock $η=0.1$, and a proton spectrum index $Γ=-1.7$, we find that TDEs can account for approximately 80\% of the contribution at energies around 0.3 PeV. Additionally, TDEs still contribute around 18\% to the IceCube data below 0.1 PeV and the total contribution is $\sim 24^{+2}_{-15}\%$. We also discuss in detail the potential influence of various parameter values on the results. With the IceCube data, we impose constraints on the combination of the physical parameters, i.e., $C_{f}=\dot{N}E_{\rm kin}C_{\rm v}η$. Future observations or theoretical considerations would fix some physical parameters, which will help to constrain some individual parameters of TDEs.

Submitted 16 July, 2024; originally announced July 2024.

Comments: 12 pages, 10 figures, accepted for publication in PRD
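
As a rough illustration of the parameter combination quoted in the abstract above, $C_{f}=\dot{N}E_{\rm kin}C_{\rm v}η$, the sketch below multiplies the fiducial values listed there. The function and variable names are hypothetical, and the result is only the product of those fiducial inputs, not a constraint reported by the authors.

```python
# Fiducial values as quoted in the abstract (names are illustrative only).
N_DOT = 8e-7   # volume TDE rate, Mpc^-3 yr^-1
E_KIN = 1e51   # outflow kinetic energy, erg
C_V = 0.1      # covering fraction of clouds (dimensionless)
ETA = 0.1      # energy conversion efficiency in the shock (dimensionless)


def combined_parameter(n_dot, e_kin, c_v, eta):
    """Return C_f = N_dot * E_kin * C_v * eta, in erg Mpc^-3 yr^-1."""
    return n_dot * e_kin * c_v * eta


c_f = combined_parameter(N_DOT, E_KIN, C_V, ETA)
print(f"C_f = {c_f:.1e} erg Mpc^-3 yr^-1")
```

Because the four factors enter only through their product, a measurement that fixes $C_f$ leaves the individual parameters degenerate, which is presumably why the abstract notes that independent information is needed to constrain them separately.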

arXiv:2407.10892 [pdf, other]

First Measurement of Solar $^8$B Neutrino Flux through Coherent Elastic Neutrino-Nucleus Scattering in PandaX-4T

Authors: PandaX Collaboration, Zihao Bo, Wei Chen, Xun Chen, Yunhua Chen, Zhaokan Cheng, Xiangyi Cui, Yingjie Fan, Deqing Fang, Zhixing Gao, Lisheng Geng, Karl Giboni, Xunan Guo, Xuyuan Guo, Zichao Guo, Chencheng Han, Ke Han, Changda He, Jinrong He, Di Huang, Houqi Huang, Junting Huang, Ruquan Hou, Yu Hou, Xiangdong Ji , et al. (77 additional authors not shown)

Abstract: The PandaX-4T liquid xenon detector at the China Jinping Underground Laboratory is used to measure the solar $^8$B neutrino flux by detecting neutrinos through coherent scattering with xenon nuclei. Data samples requiring the coincidence of scintillation and ionization signals (paired), as well as unpaired ionization-only signals (US2), are selected with energy thresholds of approximately 1.1 keV (0.33 keV) nuclear recoil energy. Combining the commissioning run and the first science run of PandaX-4T, total exposures of 1.25 and 1.04 tonne$\cdot$year are collected for the paired and US2 data, respectively. After unblinding, 3 and 332 events are observed with an expectation of 2.8$\pm$0.5 and 251$\pm$32 background events, for the paired and US2 data, respectively. A combined analysis yields a best-fit $^8$B neutrino signal of 3.5 (75) events from the paired (US2) data sample, with $\sim$37\% uncertainty, and the background-only hypothesis is disfavored at 2.64$σ$ significance. This gives a solar $^8$B neutrino flux of ($8.4\pm3.1$)$\times$10$^6$ cm$^{-2}$s$^{-1}$, consistent with the standard solar model prediction. This is the first indication of solar $^8$B neutrino ``fog'' in a dark matter direct detection experiment.

Submitted 15 July, 2024; originally announced July 2024.
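
A quick arithmetic check on the numbers quoted in the abstract above: the relative uncertainty of the flux ($8.4\pm3.1$)$\times$10$^6$ cm$^{-2}$s$^{-1}$ can be compared with the quoted $\sim$37\% signal uncertainty. This is only a sketch with hypothetical names; it assumes the flux uncertainty is symmetric and does not reproduce the collaboration's fit.

```python
# Flux central value and uncertainty as quoted in the abstract, in cm^-2 s^-1.
FLUX_CENTRAL = 8.4e6
FLUX_ERROR = 3.1e6


def relative_uncertainty(central, error):
    """Return the fractional uncertainty error / central."""
    return error / central


rel = relative_uncertainty(FLUX_CENTRAL, FLUX_ERROR)
print(f"relative flux uncertainty = {rel:.1%}")
```

The ratio rounds to 37%, matching the fractional uncertainty stated for the best-fit signal.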

arXiv:2407.10540 [pdf, other]

Sudden polarization angle jumps of the repeating fast radio burst FRB 20201124A

Authors: J. R. Niu, W. Y. Wang, J. C. Jiang, Y. Qu, D. J. Zhou, W. W. Zhu, K. J. Lee, J. L. Han, B. Zhang, D. Li, S. Cao, Z. Y. Fang, Y. Feng, Q. Y. Fu, P. Jiang, W. C. Jing, J. Li, Y. Li, R. Luo, L. Q. Meng, C. C. Miao, X. L. Miao, C. H. Niu, Y. C. Pan, B. J. Wang , et al. (19 additional authors not shown)

Abstract: We report the first detection of polarization angle (PA) orthogonal jumps, a phenomenon previously only observed from radio pulsars, from a fast radio burst (FRB) source FRB 20201124A. We find three cases of orthogonal jumps in over two thousand bursts, all resembling those observed in pulsar single pulses. We propose that the jumps are due to the superposition of two orthogonal emission modes that could only be produced in a highly magnetized plasma, and they are caused by the line of sight sweeping across a rotating magnetosphere. The shortest jump timescale is of the order of one millisecond, which hints that the emission modes come from regions smaller than the light cylinder of most pulsars or magnetars. This discovery provides convincing evidence that FRB emission originates from the complex magnetosphere of a magnetar, suggesting an FRB emission mechanism that is analogous to radio pulsars despite a huge luminosity difference between the two types of objects.

Submitted 15 July, 2024; originally announced July 2024.

Comments: 10 pages, 5 figures, submitted to ApJL

arXiv:2407.10404 [pdf, ps, other]

On the higher-rank Askey-Wilson algebras

Authors: Wanxia Wang, Shilin Yang

Abstract: In this paper, a new algebra ${\mathcal A}(n)$, which is generated by an upper triangular generating matrix with triple relations, is introduced. It is shown that there exists an isomorphism between the algebra ${\mathcal A}(n)$ and the higher Askey-Wilson algebra ${\mathfrak{aw}}(n)$ introduced by Crampé, Frappat et al. Furthermore, we establish a series of automorphisms of ${\mathcal A}(n),$ which satisfy braid group relations and coincide with those in ${\mathfrak{aw}}(n).$

Submitted 14 July, 2024; originally announced July 2024.

Comments: 36 pages

MSC Class: 16T10; 33D45; 81R12

arXiv:2407.10373 [pdf, other]

Mutual Learning for Acoustic Matching and Dereverberation via Visual Scene-driven Diffusion

Authors: Jian Ma, Wenguan Wang, Yi Yang, Feng Zheng

Abstract: Visual acoustic matching (VAM) is pivotal for enhancing the immersive experience, and the task of dereverberation is effective in improving audio intelligibility. Existing methods treat each task independently, overlooking the inherent reciprocity between them. Moreover, these methods depend on paired training data, which is challenging to acquire, impeding the utilization of extensive unpaired data. In this paper, we introduce MVSD, a mutual learning framework based on diffusion models. MVSD considers the two tasks symmetrically, exploiting the reciprocal relationship to facilitate learning from inverse tasks and overcome data scarcity. Furthermore, we employ the diffusion model as foundational conditional converters to circumvent the training instability and over-smoothing drawbacks of conventional GAN architectures. Specifically, MVSD employs two converters: one for VAM called reverberator and one for dereverberation called dereverberator. The dereverberator judges whether the reverberation audio generated by the reverberator sounds as if it were in the conditional visual scenario, and vice versa. By forming a closed loop, these two converters can generate informative feedback signals to optimize the inverse tasks, even with easily acquired one-way unpaired data. Extensive experiments on two standard benchmarks, i.e., SoundSpaces-Speech and Acoustic AVSpeech, show that our framework can improve the performance of the reverberator and dereverberator and better match specified visual scenarios.

Submitted 14 July, 2024; originally announced July 2024.

Comments: ECCV 2024; Project page: https://hechang25.github.io/MVSD

arXiv:2407.10324 [pdf, other]

Stability and dynamics of massive vortices in two-component Bose-Einstein condensates

Authors: J. D'Ambroise, W. Wang, C. Ticknor, R. Carretero-González, P. G. Kevrekidis

Abstract: The study of structures involving vortices in one component and bright solitary waves in another has a time-honored history in two-component atomic Bose-Einstein condensates. In the present work, we revisit this topic extending considerations well past the near-integrable regime of nearly equal scattering lengths. Instead, we focus on stationary states and spectral stability of such structures for large values of the inter-component interaction coefficient. We find that the state can manifest dynamical instabilities for suitable parameter values. We also explore a phenomenological, yet quantitatively accurate upon suitable tuning, particle model which, in line also with earlier works, offers the potential of accurately following the associated stability and dynamical features. Finally, we probe the dynamics of the unstable vortex-bright structure, observing an unprecedented, to our knowledge, instability scenario in which the oscillatory instability leads to a patch of vorticity that harbors and eventually ejects multiple vortex-bright structures.

Submitted 14 July, 2024; originally announced July 2024.

arXiv:2407.10271 [pdf, other]

Building holographic code from the boundary

Authors: Wei Wang

Abstract: Holographic quantum error-correcting code, the quantum-information structure hypothesized for the AdS/CFT correspondence, has been attracting increasing attention in new directions interrelating the studies of quantum gravity and quantum simulation. In this work, we initiate a novel approach for building holographic code that can be generally applied in potentially broad and interdisciplinary contexts. Our approach takes an "opposite" route to the conventional paradigm that is based on bulk tensor-networks. As illustrated in an exact model, we start from scalable descriptions of boundary qudits which can guide succinct quantum-circuit simulations, and rigorously show how the bulk qudits and the encoding structure emerge from boundary entanglement patterns. By investigating the entanglement patterns, we systematically unfold the hypothetical structure for bulk reconstruction and the details of the Ryu-Takayanagi formula in the formalism of operator-algebra quantum error correction, demonstrating desired properties that are not yet proved in the established models. Our work might offer a fresh perspective for the study of holographic code.

Submitted 14 July, 2024; originally announced July 2024.

arXiv:2407.10213 [pdf]

Spatio-temporal breather dynamics in microcomb soliton crystals

Authors: Futai Hu, Abhinav Kumar Vinod, Wenting Wang, Hsiao-Hsuan Chin, James F. McMillan, Ziyu Zhan, Yuan Meng, Mali Gong, Chee Wei Wong

Abstract: Solitons, the distinct balance between nonlinearity and dispersion, provide a route toward ultrafast electromagnetic pulse shaping, high-harmonic generation, real-time image processing, and RF photonic communications. Here we newly explore and observe the spatio-temporal breather dynamics of optical soliton crystals in frequency microcombs, examining spatial breathers, chaos transitions, and dynamical deterministic switching in nonlinear measurements and theory. To understand the breather solitons, we describe their dynamical routes and two example transitional maps of the ensemble spatial breathers, with and without chaos initiation. We elucidate the physical mechanisms of the breather dynamics in the soliton crystal microcombs, in the interaction plane limit cycles and in the domain-wall understanding with parity symmetry breaking from third order dispersion. We present maps of the accessible nonlinear regions, the breather frequency dependences on third order dispersion and avoided mode crossing strengths, and the transition between the collective breather spatiotemporal states. Our range of measurements matches well with our first-principles theory and nonlinear modeling. To image these soliton ensembles and their breathers, we further constructed panoramic temporal imaging for simultaneous fast and slow axis two dimensional mapping of the breathers. In the phase differential sampling, we present two dimensional evolution maps of soliton crystal breathers, including with defects, in both stable breathers and breathers with drift. Our fundamental studies contribute to the understanding of nonlinear dynamics in soliton crystal complexes, their spatiotemporal dependences, and their stability-existence zones.

Submitted 14 July, 2024; originally announced July 2024.

arXiv:2407.10200 [pdf, other]

Shape2Scene: 3D Scene Representation Learning Through Pre-training on Shape Data

Authors: Tuo Feng, Wenguan Wang, Ruijie Quan, Yi Yang

Abstract: Current 3D self-supervised learning methods of 3D scenes face a data desert issue, resulting from the time-consuming and expensive collecting process of 3D scene data. Conversely, 3D shape datasets are easier to collect. Despite this, existing pre-training strategies on shape data offer limited potential for 3D scene understanding due to significant disparities in point quantities. To tackle these challenges, we propose Shape2Scene (S2S), a novel method that learns representations of large-scale 3D scenes from 3D shape data. We first design multiscale and high-resolution backbones for shape and scene level 3D tasks, i.e., MH-P (point-based) and MH-V (voxel-based). MH-P/V establishes direct paths to high-resolution features that capture deep semantic information across multiple scales. This pivotal nature makes them suitable for a wide range of 3D downstream tasks that tightly rely on high-resolution features. We then employ a Shape-to-Scene strategy (S2SS) to amalgamate points from various shapes, creating a random pseudo scene (comprising multiple objects) for training data, mitigating disparities between shapes and scenes. Finally, a point-point contrastive loss (PPC) is applied for the pre-training of MH-P/V. In PPC, the inherent correspondence (i.e., point pairs) is naturally obtained in S2SS. Extensive experiments have demonstrated the transferability of 3D representations learned by MH-P/V across shape-level and scene-level 3D tasks. MH-P achieves notable performance on well-known point cloud datasets (93.8% OA on ScanObjectNN and 87.6% instance mIoU on ShapeNetPart). MH-V also achieves promising performance in 3D semantic segmentation and 3D object detection.

Submitted 14 July, 2024; originally announced July 2024.

Comments: ECCV 2024; Project page: https://github.com/FengZicai/S2S

arXiv:2407.10167 [pdf, other]

Key-Point-Driven Mathematical Reasoning Distillation of Large Language Model

Authors: Xunyu Zhu, Jian Li, Yong Liu, Can Ma, Weiping Wang

Abstract: Large Language Models (LLMs) have demonstrated exceptional proficiency in mathematical reasoning tasks due to their extensive parameter counts and training on vast datasets. Despite these capabilities, deploying LLMs is hindered by their computational demands. Distilling LLM mathematical reasoning into Smaller Language Models (SLMs) has emerged as a solution to this challenge, although these smaller models often suffer from errors in calculation and semantic understanding. Prior work has proposed Program-of-Thought Distillation (PoTD) to avoid calculation error. To further address semantic understanding errors, we propose Key-Point-Driven Mathematical Reasoning Distillation (KPDD). KPDD enhances the reasoning performance of SLMs by breaking down the problem-solving process into three stages: Core Question Extraction, Problem-Solving Information Extraction, and Step-by-Step Solution. This method is further divided into KPDD-CoT, which generates Chain-of-Thought rationales, and KPDD-PoT, which creates Program-of-Thought rationales. The experimental results show that KPDD-CoT significantly improves reasoning abilities, while KPDD-PoT achieves state-of-the-art performance in mathematical reasoning tasks. Our approach effectively mitigates misunderstanding errors, advancing the deployment of efficient and capable SLMs.

Submitted 14 July, 2024; originally announced July 2024.

Comments: arXiv admin note: text overlap with arXiv:2401.11864

arXiv:2407.10132 [pdf, other]

Optimal Kernel Choice for Score Function-based Causal Discovery

Authors: Wenjie Wang, Biwei Huang, Feng Liu, Xinge You, Tongliang Liu, Kun Zhang, Mingming Gong

Abstract: Score-based methods have demonstrated their effectiveness in discovering causal relationships by scoring different causal structures based on their goodness of fit to the data. Recently, Huang et al. proposed a generalized score function that can handle general data distributions and causal relationships by modeling the relations in reproducing kernel Hilbert space (RKHS). The selection of an appropriate kernel within this score function is crucial for accurately characterizing causal relationships and ensuring precise causal discovery. However, the current method involves manual heuristic selection of kernel parameters, making the process tedious and less likely to ensure optimality. In this paper, we propose a kernel selection method within the generalized score function that automatically selects the optimal kernel that best fits the data. Specifically, we model the generative process of the variables involved in each step of the causal graph search procedure as a mixture of independent noise variables. Based on this model, we derive an automatic kernel selection method by maximizing the marginal likelihood of the variables involved in each search step. We conduct experiments on both synthetic data and real-world benchmarks, and the results demonstrate that our proposed method outperforms heuristic kernel selection methods.

Submitted 14 July, 2024; originally announced July 2024.

Comments: Accepted by ICML2024

arXiv:2407.10119 [pdf, ps, other]

Affine and cyclotomic Schur categories

Authors: Linliang Song, Weiqiang Wang

Abstract: Using the affine web category introduced in a prequel as a building block, we formulate a diagrammatic $\Bbbk$-linear monoidal category, the affine Schur category, for any commutative ring $\Bbbk$. We then formulate diagrammatic categories, the cyclotomic Schur categories, with arbitrary parameters at positive integral levels. Integral bases consisting of elementary diagrams are obtained for affine and cyclotomic Schur categories. A second diagrammatic basis, called a double SST basis, for any such cyclotomic Schur category is also established, leading to a conjectural higher level RSK correspondence. We show that the endomorphism algebras with the double SST bases are isomorphic to degenerate cyclotomic Schur algebras with their cellular bases, providing a first diagrammatic presentation of the latter. The presentations for the affine and cyclotomic Schur categories are much simplified when $\Bbbk$ is a field of characteristic zero.

Submitted 14 July, 2024; originally announced July 2024.

Comments: 50 pages, many figures

arXiv:2407.10109 [pdf]

Hardware-Efficient and Reliable Coherent DSCM Systems Enabled by Single-Pilot-Tone-Based Polarization Demultiplexing

Authors: Wei Wang, Dongdong Zou, Weihao Ni, Fan Li

Abstract: Recently, coherent digital subcarrier multiplexing (DSCM) technology has become an attractive solution for next-generation ultra-high-speed datacenter interconnects (DCIs). To meet the requirements of low cost and low power consumption in DCI applications, a comprehensive simplification of the coherent DSCM system has been investigated. The pilot-tone-based polarization demultiplexing (PT-PDM) technique, known for its low power consumption and ultra-fast polarization tracking capabilities, has emerged as a compelling alternative to the power-hungry N-tap adaptive multi-input multiple-output (MIMO) equalizer. However, the effectiveness of this PT-PDM technique is extremely vulnerable to the receiver-side XY-skew (Rx-XY-skew), which is revealed in this paper for the first time. Then, a pilot-tone-enabled modified Godard phase detector (PT-MGPD) scheme is proposed to realize Rx-XY-skew estimation, serving as the prerequisite for the successful implementation of the PT-PDM and simplification of the adaptive equalizer. Both simulation and experiment are conducted to evaluate the accuracy of the proposed PT-MGPD scheme. The results show that it can achieve accurate estimation with an error of less than 0.3 ps. In addition, a low-complexity, high-spectral-efficiency, and ultra-fast polarization demultiplexing method based on a single pilot tone (SPT) is proposed for the DSCM system in this work. Based on the proposed PT-MGPD and SPT schemes, the conventional N-tap MIMO equalizer serving each subcarrier can be successfully pruned into two polarization-independent single-input single-output equalizers, and there is no performance penalty even if the polarization rotation speed reaches 10 Mrad/s. According to the results, the proposed schemes provide a hardware-efficient and reliable coherent DSCM solution for next-generation ultra-high-speed DCIs.

Submitted 14 July, 2024; originally announced July 2024.

arXiv:2407.09792 [pdf, other]

Language-Augmented Symbolic Planner for Open-World Task Planning

Authors: Guanqi Chen, Lei Yang, Ruixing Jia, Zhe Hu, Yizhou Chen, Wei Zhang, Wenping Wang, Jia Pan

Abstract: Enabling robotic agents to perform complex long-horizon tasks has been a long-standing goal in robotics and artificial intelligence (AI). Despite the potential shown by large language models (LLMs), their planning capabilities remain limited to short-horizon tasks and they are unable to replace the symbolic planning approach. Symbolic planners, on the other hand, may encounter execution errors due to their common assumption of complete domain knowledge which is hard to manually prepare for an open-world setting. In this paper, we introduce a Language-Augmented Symbolic Planner (LASP) that integrates pre-trained LLMs to enable conventional symbolic planners to operate in an open-world environment where only incomplete knowledge of action preconditions, objects, and properties is initially available. In case of execution errors, LASP can utilize the LLM to diagnose the cause of the error based on the observation and interact with the environment to incrementally build up its knowledge base necessary for accomplishing the given tasks. Experiments demonstrate that LASP is proficient in solving planning problems in the open-world setting, performing well even in situations where there are multiple gaps in the knowledge.

Submitted 13 July, 2024; originally announced July 2024.

Comments: Accepted by Robotics: Science and Systems (RSS) 2024

arXiv:2407.09693 [pdf, other]

A Mathematical Framework, a Taxonomy of Modeling Paradigms, and a Suite of Learning Techniques for Neural-Symbolic Systems

Authors: Charles Dickens, Connor Pryor, Changyu Gao, Alon Albalak, Eriq Augustine, William Wang, Stephen Wright, Lise Getoor

Abstract: The field of Neural-Symbolic (NeSy) systems is growing rapidly. Proposed approaches show great promise in achieving symbiotic unions of neural and symbolic methods. However, each NeSy system differs in fundamental ways. There is a pressing need for a unifying theory to illuminate the commonalities and differences in approaches and enable further progress. In this paper, we introduce Neural-Symbolic Energy-Based Models (NeSy-EBMs), a unifying mathematical framework for discriminative and generative modeling with probabilistic and non-probabilistic NeSy approaches. We utilize NeSy-EBMs to develop a taxonomy of modeling paradigms focusing on a system's neural-symbolic interface and reasoning capabilities. Additionally, we introduce a suite of learning techniques for NeSy-EBMs. Importantly, NeSy-EBMs allow the derivation of general expressions for gradients of prominent learning losses, and we provide four learning approaches that leverage methods from multiple domains, including bilevel and stochastic policy optimization. Finally, we present Neural Probabilistic Soft Logic (NeuPSL), an open-source NeSy-EBM library designed for scalability and expressivity, facilitating real-world application of NeSy systems. Through extensive empirical analysis across multiple datasets, we demonstrate the practical advantages of NeSy-EBMs in various tasks, including image classification, graph node labeling, autonomous vehicle situation awareness, and question answering.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.09648 [pdf, other]

3x2: 3D Object Part Segmentation by 2D Semantic Correspondences

Authors: Anh Thai, Weiyao Wang, Hao Tang, Stefan Stojanov, Matt Feiszli, James M. Rehg

Abstract: 3D object part segmentation is essential in computer vision applications. While substantial progress has been made in 2D object part segmentation, the 3D counterpart has received less attention, in part due to the scarcity of annotated 3D datasets, which are expensive to collect. In this work, we propose to leverage a few annotated 3D shapes or richly annotated 2D datasets to perform 3D object part segmentation. We present our novel approach, termed 3-By-2, which achieves SOTA performance on different benchmarks with various granularity levels. By using features from pretrained foundation models and exploiting semantic and geometric correspondences, we are able to overcome the challenges of limited 3D annotations. Our approach leverages available 2D labels, enabling effective 3D object part segmentation. Our method 3-By-2 can accommodate various part taxonomies and granularities, demonstrating interesting part label transfer ability across different object categories. Project website: https://ngailapdi.github.io/projects/3by2/.

Submitted 12 July, 2024; originally announced July 2024.

Comments: Accepted to ECCV 2024

arXiv:2407.09121 [pdf, other]

Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training

Authors: Youliang Yuan, Wenxiang Jiao, Wenxuan Wang, Jen-tse Huang, Jiahao Xu, Tian Liang, Pinjia He, Zhaopeng Tu

Abstract: This study addresses a critical gap in safety tuning practices for Large Language Models (LLMs) by identifying and tackling a refusal position bias within safety tuning data, which compromises the models' ability to appropriately refuse generating unsafe content. We introduce a novel approach, Decoupled Refusal Training (DeRTa), designed to empower LLMs to refuse compliance to harmful prompts at any response position, significantly enhancing their safety capabilities. DeRTa incorporates two novel components: (1) Maximum Likelihood Estimation (MLE) with Harmful Response Prefix, which trains models to recognize and avoid unsafe content by appending a segment of harmful response to the beginning of a safe response, and (2) Reinforced Transition Optimization (RTO), which equips models with the ability to transition from potential harm to safety refusal consistently throughout the harmful response sequence. Our empirical evaluation, conducted using LLaMA3 and Mistral model families across six attack scenarios, demonstrates that our method not only improves model safety without compromising performance but also surpasses well-known models such as GPT-4 in defending against attacks. Importantly, our approach successfully defends against recent advanced attack methods (e.g., CodeAttack) that have jailbroken GPT-4 and LLaMA3-70B-Instruct. Our code and data can be found at https://github.com/RobustNLP/DeRTa.

Submitted 12 July, 2024; originally announced July 2024.

arXiv:2407.08785 [pdf, ps, other]

A kinetic Nash inequality and precise boundary behavior of the kinetic Fokker-Planck equation

Authors: Christopher Henderson, Giacomo Lucertini, Weinan Wang

Abstract: In this paper, we prove a kinetic Nash type inequality and adapt it to a new functional inequality for functions in a kinetic Sobolev space with absorbing boundary conditions on the half-space. As an application, we address the boundary behavior of the kinetic Fokker-Planck equations in the half-space. Our main result is the sharp regularity of the solution at the absorbing boundary and grazing set.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 35 pages, 2 figures

MSC Class: 35Q84; 35K65; 26D10; 35A23

arXiv:2407.08183 [pdf, other]

The white-light superflares from cool stars in GWAC triggers

Authors: Guang-Wei Li, Liang Wang, Hai-Long Yuan, Li-Ping Xin, Jing Wang, Chao Wu, Hua-Li Li, Hasitieer Haerken, Wei-Hua Wang, Hong-Bo Cai, Xu-Hui Han, Yang Xu, Lei Huang, Xiao-Meng Lu, Jian-Ying Bai, Xiang-Yu Wang, Zi-Gao Dai, En-Wei Liang, Jian-Yan Wei

Abstract: M-type stars are the ones that flare most frequently, but how large their maximum flare energy can be is still unknown. We present 163 flares from 162 individual M2 through L1-type stars that triggered the GWAC, with flare energies ranging from $10^{32.2}$ to $10^{36.4}$ erg. The flare amplitudes range from $\triangle G = 0.84$ to $\sim 10$ mag. Flare energy increases with stellar surface temperature ($T_{\rm eff}$) but both $\triangle G$ and equivalent duration $\log_{10}(ED)$ seem to be independent of $T_{\rm eff}$. Combining periods detected from light curves of TESS and K2, spectra from LAMOST, SDSS and the 2.16 m Telescope, and the Gaia DR3 data, we found that these GWAC flare stars are young. For the stars that have spectra, we found that these stars are in or very near to the saturation region, and $\log_{10}(L_{\rm H\alpha}/L_{\rm bol})$ is lower for M7-L1 stars than for M2-M6 stars. We also studied the relation between GWAC flare bolometric energy $E_{\rm bol}$ and stellar hemispherical area $S$, and found that $\log_{10}E_{\rm bol}$ (in erg) increases with increasing $S$ (in cm$^2$), and the maximum flare energy $\log_{10}E_{\rm bol, max} \geqslant \log_{10}S + 14.25$. For M7-L1 stars, there seem to be other factors limiting their maximum flare energies in addition to stellar hemispherical area.

Submitted 11 July, 2024; originally announced July 2024.

Comments: 18 pages, 11 figures, 4 tables

arXiv:2407.08133 [pdf, other]

Nonverbal Interaction Detection

Authors: Jianan Wei, Tianfei Zhou, Yi Yang, Wenguan Wang

Abstract: This work addresses a new challenge of understanding human nonverbal interaction in social contexts. Nonverbal signals pervade virtually every communicative act. Our gestures, facial expressions, postures, gaze, even physical appearance all convey messages, without anything being said. Despite their critical role in social life, nonverbal signals receive very limited attention as compared to the linguistic counterparts, and existing solutions typically examine nonverbal cues in isolation. Our study marks the first systematic effort to enhance the interpretation of multifaceted nonverbal signals. First, we contribute a novel large-scale dataset, called NVI, which is meticulously annotated to include bounding boxes for humans and corresponding social groups, along with 22 atomic-level nonverbal behaviors under five broad interaction types. Second, we establish a new task NVI-DET for nonverbal interaction detection, which is formalized as identifying triplets in the form <individual, group, interaction> from images. Third, we propose a nonverbal interaction detection hypergraph (NVI-DEHR), a new approach that explicitly models high-order nonverbal interactions using hypergraphs. Central to the model is a dual multi-scale hypergraph that adeptly addresses individual-to-individual and group-to-group correlations across varying scales, facilitating interactional feature learning and eventually improving interaction prediction. Extensive experiments on NVI show that NVI-DEHR improves various baselines significantly in NVI-DET. It also exhibits leading performance on HOI-DET, confirming its versatility in supporting related tasks and strong generalization ability. We hope that our study will offer the community new avenues to explore nonverbal signals in more depth.

Submitted 14 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

Comments: ECCV 2024; Project page: https://github.com/weijianan1/NVI

arXiv:2407.08127 [pdf, other]

Prediction Exposes Your Face: Black-box Model Inversion via Prediction Alignment

Authors: Yufan Liu, Wanqian Zhang, Dayan Wu, Zheng Lin, Jingzi Gu, Weiping Wang

Abstract: Model inversion (MI) attack reconstructs the private training data of a target model given its output, posing a significant threat to deep learning models and data privacy. On one hand, most existing MI methods focus on searching for latent codes to represent the target identity, yet this iterative optimization-based scheme consumes a huge number of queries to the target model, making it unrealistic, especially in the black-box scenario. On the other hand, some training-based methods launch an attack through a single forward inference, but fail to directly learn high-level mappings from prediction vectors to images. Addressing these limitations, we propose a novel Prediction-to-Image (P2I) method for black-box MI attack. Specifically, we introduce the Prediction Alignment Encoder to map the target model's output prediction into the latent code of StyleGAN. In this way, prediction vector space can be well aligned with the more disentangled latent space, thus establishing a connection between prediction vectors and the semantic facial features. During the attack phase, we further design the Aligned Ensemble Attack scheme to integrate complementary facial attributes of target identity for better reconstruction. Experimental results show that our method outperforms other SOTAs, e.g., compared with RLB-MI, our method improves attack accuracy by 8.5% and reduces query numbers by 99% on dataset CelebA.

Submitted 10 July, 2024; originally announced July 2024.

Comments: Accepted by ECCV 2024

arXiv:2407.07924 [pdf, other]

Solving General Natural-Language-Description Optimization Problems with Large Language Models

Authors: Jihai Zhang, Wei Wang, Siyan Guo, Li Wang, Fangquan Lin, Cheng Yang, Wotao Yin

Abstract: Optimization problems seek to find the best solution to an objective under a set of constraints, and have been widely investigated in real-world applications. Modeling and solving optimization problems in a specific domain typically require a combination of domain knowledge, mathematical skills, and programming ability, making it difficult for general users and even domain professionals. In this paper, we propose a novel framework called OptLLM that augments LLMs with external solvers. Specifically, OptLLM accepts user queries in natural language, converts them into mathematical formulations and programming code, and calls the solvers to calculate the results for decision-making. In addition, OptLLM supports multi-round dialogues to gradually refine the modeling and solving of optimization problems. To illustrate the effectiveness of OptLLM, we provide tutorials on three typical optimization applications and conduct experiments on both prompt-based GPT models and a fine-tuned Qwen model using a large-scale self-developed optimization dataset. Experimental results show that OptLLM works with various LLMs, and the fine-tuned model achieves an accuracy boost compared to the prompt-based models. Some features of the OptLLM framework have been available for trial since June 2023 (https://opt.alibabacloud.com/chat or https://opt.aliyun.com/chat).

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.07651 [pdf, other]

Study of the decay and production properties of $D_{s1}(2536)$ and $D_{s2}^*(2573)$

Authors: M. Ablikim, M. N. Achasov, P. Adlarson, O. Afedulidis, X. C. Ai, R. Aliberti, A. Amoroso, Q. An, Y. Bai, O. Bakina, I. Balossino, Y. Ban, H. -R. Bao, V. Batozskaya, K. Begzsuren, N. Berger, M. Berlowski, M. Bertani, D. Bettoni, F. Bianchi, E. Bianco, A. Bortone, I. Boyko, R. A. Briere, A. Brueggemann, et al. (645 additional authors not shown)

Abstract: The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ processes are studied using data samples collected with the BESIII detector at center-of-mass energies from 4.530 to 4.946~GeV. The absolute branching fractions of $D_{s1}(2536)^- \rightarrow \bar{D}^{*0}K^-$ and $D_{s2}^*(2573)^- \rightarrow \bar{D}^0K^-$ are measured for the first time to be $(35.9\pm 4.8\pm 3.5)\%$ and $(37.4\pm 3.1\pm 4.6)\%$, respectively. The measurements are in tension with predictions based on the assumption that the $D_{s1}(2536)$ and $D_{s2}^*(2573)$ are dominated by a bare $c\bar{s}$ component. The $e^+e^-\rightarrow D_s^+D_{s1}(2536)^-$ and $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ cross sections are measured, and a resonant structure at around 4.6~GeV with a width of 50~MeV is observed for the first time with a statistical significance of $15\sigma$ in the $e^+e^-\rightarrow D_s^+D^*_{s2}(2573)^-$ process. It could be the $Y(4626)$ found by the Belle collaboration in the $D_s^+D_{s1}(2536)^{-}$ final state, since they have similar masses and widths. There is also evidence for a structure at around 4.75~GeV in both processes.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07504 [pdf, other]

Pan-cancer Histopathology WSI Pre-training with Position-aware Masked Autoencoder

Authors: Kun Wu, Zhiguo Jiang, Kunming Tang, Jun Shi, Fengying Xie, Wei Wang, Haibo Wu, Yushan Zheng

Abstract: Large-scale pre-training models have promoted the development of histopathology image analysis. However, existing self-supervised methods for histopathology images focus on learning patch features, while there is still a lack of available pre-training models for WSI-level feature learning. In this paper, we propose a novel self-supervised learning framework for pan-cancer WSI-level representation pre-training with the designed position-aware masked autoencoder (PAMA). Meanwhile, we propose the position-aware cross-attention (PACA) module with a kernel reorientation (KRO) strategy and an anchor dropout (AD) mechanism. The KRO strategy can capture the complete semantic structure and eliminate ambiguity in WSIs, and the AD contributes to enhancing the robustness and generalization of the model. We evaluated our method on 6 large-scale datasets from multiple organs for pan-cancer classification tasks. The results have demonstrated the effectiveness of PAMA in generalized and discriminative WSI representation learning and pan-cancer WSI pre-training. The proposed method was also compared with 7 WSI analysis methods. The experimental results have indicated that our proposed PAMA is superior to the state-of-the-art methods. The code and checkpoints are available at https://github.com/WkEEn/PAMA.

Submitted 15 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07487 [pdf, other]

Review-LLM: Harnessing Large Language Models for Personalized Review Generation

Authors: Qiyao Peng, Hongtao Liu, Hongyan Xu, Qing Yang, Minglai Shao, Wenjun Wang

Abstract: Product review generation is an important task in recommender systems, which could provide explanation and persuasiveness for the recommendation. Recently, Large Language Models (LLMs, e.g., ChatGPT) have shown superior text modeling and generating ability, which could be applied in review generation. However, directly applying LLMs to generate reviews might be hampered by the "polite" phenomenon of LLMs and may fail to produce personalized reviews (e.g., negative reviews). In this paper, we propose Review-LLM, which customizes LLMs for personalized review generation. Firstly, we construct the prompt input by aggregating user historical behaviors, which include corresponding item titles and reviews. This enables the LLMs to capture user interest features and review writing style. Secondly, we incorporate ratings as indicators of satisfaction into the prompt, which could further improve the model's understanding of user preferences and the sentiment tendency control of generated reviews. Finally, we feed the prompt text into LLMs, and use Supervised Fine-Tuning (SFT) to make the model generate personalized reviews for the given user and target item. Experimental results on the real-world dataset show that our fine-tuned model could achieve better review generation performance than existing closed-source LLMs.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07433 [pdf, other]

Controllable Navigation Instruction Generation with Chain of Thought Prompting

Authors: Xianghao Kong, Jinyu Chen, Wenguan Wang, Hang Su, Xiaolin Hu, Yi Yang, Si Liu

Abstract: Instruction generation is a vital and multidisciplinary research area with broad applications. Existing instruction generation models are limited to generating instructions in a single style from a particular dataset, and the style and content of generated instructions cannot be controlled. Moreover, most existing instruction generation methods also disregard the spatial modeling of the navigation environment. Leveraging the capabilities of Large Language Models (LLMs), we propose C-Instructor, which utilizes the chain-of-thought-style prompt for style-controllable and content-controllable instruction generation. Firstly, we propose a Chain of Thought with Landmarks (CoTL) mechanism, which guides the LLM to identify key landmarks and then generate complete instructions. CoTL renders generated instructions easier to follow and offers greater controllability over the manipulation of landmark objects. Furthermore, we present a Spatial Topology Modeling Task to facilitate the understanding of the spatial structure of the environment. Finally, we introduce a Style-Mixed Training policy, harnessing the prior knowledge of LLMs to enable style control for instruction generation based on different prompts within a single model instance. Extensive experiments demonstrate that instructions generated by C-Instructor outperform those generated by previous methods in text metrics, navigation guidance evaluation, and user studies.

Submitted 16 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

Comments: ECCV 2024

arXiv:2407.07056 [pdf, other]

CAPformer: Compression-Aware Pre-trained Transformer for Low-Light Image Enhancement

Authors: Wei Wang, Zhi Jin

Abstract: Low-Light Image Enhancement (LLIE) has advanced with the surge in phone photography demand, yet many existing methods neglect compression, a crucial concern for resource-constrained phone photography, which hinders their effectiveness. In this study, we investigate the effects of JPEG compression on low-light images and reveal substantial information loss caused by JPEG due to widespread low pixel values in dark areas. Hence, we propose the Compression-Aware Pre-trained Transformer (CAPformer), employing a novel pre-training strategy to learn lossless information from uncompressed low-light images. Additionally, the proposed Brightness-Guided Self-Attention (BGSA) mechanism enhances rational information gathering. Experiments demonstrate the superiority of our approach in mitigating compression effects on LLIE, showcasing its potential for improving LLIE in resource-constrained scenarios.

Submitted 10 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.06953 [pdf, other]

SP-Chain: Boosting Intra-Shard and Cross-Shard Security and Performance in Blockchain Sharding

Authors: Mingzhe Li, You Lin, Wei Wang, Jin Zhang

Abstract: A promising way to overcome the scalability limitations of current blockchains is sharding, which splits transaction processing among multiple, smaller groups of nodes. A well-performing blockchain sharding system requires both high performance and high security from both the intra- and cross-shard perspectives. However, existing protocols either fall short on security or sacrifice considerable performance for it. In this paper, we propose SP-Chain, a blockchain sharding system with enhanced Security and Performance from both the intra- and cross-shard perspectives. For the intra-shard aspect, we design a two-phase concurrent voting scheme to provide high system throughput and low transaction confirmation latency. Moreover, we propose an efficient unbiased leader rotation scheme to ensure high performance under malicious behavior. For the cross-shard aspect, a proof-assisted efficient cross-shard transaction processing mechanism is proposed to guard cross-shard transactions with low overhead. We implement SP-Chain based on Harmony, and evaluate its performance via large-scale deployment. Extensive evaluations suggest that SP-Chain can process more than 10,000 tx/sec under malicious behaviors with a confirmation latency of 7.6s in a network of 4,000 nodes.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.06943 [pdf, other]

A Starter's Kit for Concentric Tube Robots

Authors: Kalina Bonofiglio, Wenpeng Wang, Ethan R. Wilke, Adri Rajaraman, Loris Fichera

Abstract: Concentric Tube Robots (CTRs) have garnered significant interest within the surgical robotics community because of their flexibility, dexterity, and ease of miniaturization. However, mastering the unique kinematics and design principles of CTRs can be challenging for newcomers to the field. In this paper, we present an educational kit aimed at lowering the barriers to entry into concentric tube robot research. Our goal is to provide accessible learning resources for CTRs, bridging the knowledge gap between traditional robotic arms and these specialized devices. The proposed kit includes (1) an open-source design and assembly instructions for an economical (cost of materials $\approx$ 700 USD) modular CTR; (2) a set of self-study materials to learn the basics of CTR modeling and control, including automatically-graded assignments. To evaluate the effectiveness of our educational kit, we conducted a human subjects study involving first-year graduate students in engineering. Over a four-week period, participants -- none of whom had any prior knowledge of concentric tube robots -- successfully built their first CTR using the provided materials, implemented the robot's kinematics in MATLAB, and conducted a tip-tracking experiment with an optical tracking device. Our findings suggest that the proposed kit facilitates learning and hands-on experience with CTRs, and furthermore, it has the potential to help early-stage graduate students get rapidly started with CTR research.
By disseminating these resources, we hope to broaden participation in concentric tube robot research to a wider and more diverse group of researchers.

Submitted 9 July, 2024; originally announced July 2024.

arXiv:2407.06865 [pdf, ps, other]

Affine $\imath$quantum groups and Steinberg varieties of type C

Authors: Changjian Su, Weiqiang Wang

Abstract: We provide a geometric realization of the quasi-split affine $\imath$quantum group of type AIII$_{2n-1}^{(τ)}$ in terms of equivariant K-groups of non-connected Steinberg varieties of type C. This uses a new Drinfeld type presentation of this affine $\imath$quantum group which admits very nontrivial Serre relations. We then construct à la Springer a family of finite-dimensional standard modules and irreducible modules of this $\imath$quantum group, and provide a composition multiplicity formula of the standard modules.

Submitted 13 July, 2024; v1 submitted 9 July, 2024; originally announced July 2024.

Comments: References updated

arXiv:2407.06844 [pdf, other]

Dynamic Correlation Learning and Regularization for Multi-Label Confidence Calibration

Authors: Tianshui Chen, Weihang Wang, Tao Pu, Jinghui Qin, Zhijing Yang, Jie Liu, Liang Lin

Abstract: Modern visual recognition models often display overconfidence due to their reliance on complex deep neural networks and one-hot target supervision, resulting in unreliable confidence scores that necessitate calibration. While current confidence calibration techniques primarily address single-label scenarios, there is a lack of focus on more practical and generalizable multi-label contexts. This paper introduces the Multi-Label Confidence Calibration (MLCC) task, aiming to provide well-calibrated confidence scores in multi-label scenarios. Unlike single-label images, multi-label images contain multiple objects, leading to semantic confusion and further unreliability in confidence scores. Existing single-label calibration methods, based on label smoothing, fail to account for category correlations, which are crucial for addressing semantic confusion, thereby yielding sub-optimal performance. To overcome these limitations, we propose the Dynamic Correlation Learning and Regularization (DCLR) algorithm, which leverages multi-grained semantic correlations to better model semantic confusion for adaptive regularization. DCLR learns dynamic instance-level and prototype-level similarities specific to each category, using these to measure semantic correlations across different categories. With this understanding, we construct adaptive label vectors that assign higher values to categories with strong correlations, thereby facilitating more effective regularization.
We establish an evaluation benchmark, re-implementing several advanced confidence calibration algorithms and applying them to leading multi-label recognition (MLR) models for fair comparison. Through extensive experiments, we demonstrate the superior performance of DCLR over existing methods in providing reliable confidence scores in multi-label scenarios.

Submitted 9 July, 2024; originally announced July 2024.

Comments: submitted to TIP

arXiv:2407.06540 [pdf, other]

General and Task-Oriented Video Segmentation

Authors: Mu Chen, Liulei Li, Wenguan Wang, Ruijie Quan, Yi Yang

Abstract: We present GvSeg, a general video segmentation framework for addressing four different video segmentation tasks (i.e., instance, semantic, panoptic, and exemplar-guided) while maintaining an identical architectural design. Currently, there is a trend towards developing general video segmentation solutions that can be applied across multiple tasks. This streamlines research endeavors and simplifies deployment. However, such a highly homogenized framework in current design, where each element maintains uniformity, could overlook the inherent diversity among different tasks and lead to suboptimal performance. To tackle this, GvSeg: i) provides a holistic disentanglement and modeling for segment targets, thoroughly examining them from the perspective of appearance, position, and shape, and on this basis, ii) reformulates the query initialization, matching and sampling strategies in alignment with the task-specific requirement. These architecture-agnostic innovations empower GvSeg to effectively address each unique task by accommodating the specific properties that characterize them. Extensive experiments on seven gold-standard benchmark datasets demonstrate that GvSeg surpasses all existing specialized/general solutions by a significant margin on four different video segmentation tasks.

Submitted 9 July, 2024; originally announced July 2024.

Comments: ECCV 2024; Project page: https://github.com/kagawa588/GvSeg

arXiv:2407.06426 [pdf, other]

DebUnc: Mitigating Hallucinations in Large Language Model Agent Communication with Uncertainty Estimations

Authors: Luke Yoffe, Alfonso Amayuelas, William Yang Wang

Abstract: To enhance Large Language Model (LLM) capabilities, multi-agent debates have been introduced, where multiple LLMs discuss solutions to a problem over several rounds of debate. However, LLMs often produce incorrect responses that appear deceptively confident, which can mislead other agents. This is partly because agents do not express their confidence levels during standard debates. To address this, we introduce DebUnc, a multi-agent debate framework that uses uncertainty metrics to assess agent confidence levels. We adapted the LLM attention mechanism to adjust token weights based on confidence levels and also explored using textual prompts to convey confidence. Our evaluations across various benchmarks show that attention-based methods are particularly effective, and that as uncertainty metrics evolve, performance will continue to increase. The code is available at https://github.com/lukeyoffe/debunc

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05984 [pdf, other]

MBA-Net: SAM-driven Bidirectional Aggregation Network for Ovarian Tumor Segmentation

Authors: Yifan Gao, Wei Xia, Wenkui Wang, Xin Gao

Abstract: Accurate segmentation of ovarian tumors from medical images is crucial for early diagnosis, treatment planning, and patient management. However, the diverse morphological characteristics and heterogeneous appearances of ovarian tumors pose significant challenges to automated segmentation methods. In this paper, we propose MBA-Net, a novel architecture that integrates the powerful segmentation capabilities of the Segment Anything Model (SAM) with domain-specific knowledge for accurate and robust ovarian tumor segmentation. MBA-Net employs a hybrid encoder architecture, where the encoder consists of a prior branch, which inherits the SAM encoder to capture robust segmentation priors, and a domain branch, specifically designed to extract domain-specific features. The bidirectional flow of information between the two branches is facilitated by the robust feature injection network (RFIN) and the domain knowledge integration network (DKIN), enabling MBA-Net to leverage the complementary strengths of both branches. We extensively evaluate MBA-Net on the public multi-modality ovarian tumor ultrasound dataset and the in-house multi-site ovarian tumor MRI dataset. Our proposed method consistently outperforms state-of-the-art segmentation approaches. Moreover, MBA-Net demonstrates superior generalization capability across different imaging modalities and clinical sites.

Submitted 8 July, 2024; originally announced July 2024.

Comments: MICCAI 2024

arXiv:2407.05862 [pdf, other]

Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning

Authors: Bin Ren, Guofeng Mei, Danda Pani Paudel, Weijie Wang, Yawei Li, Mengyuan Liu, Rita Cucchiara, Luc Van Gool, Nicu Sebe

Abstract: Contrastive learning (CL) for Vision Transformers (ViTs) in image domains has achieved performance comparable to CL for traditional convolutional backbones. However, in 3D point cloud pretraining with ViTs, masked autoencoder (MAE) modeling remains dominant. This raises the question: Can we take the best of both worlds? To answer this question, we first empirically validate that integrating MAE-based point cloud pre-training with the standard contrastive learning paradigm, even with meticulous design, can lead to a decrease in performance. To address this limitation, we reintroduce CL into the MAE-based point cloud pre-training paradigm by leveraging the inherent contrastive properties of MAE. Specifically, rather than relying on extensive data augmentation as commonly used in the image domain, we randomly mask the input tokens twice to generate contrastive input pairs. Subsequently, a weight-sharing encoder and two identically structured decoders are utilized to perform masked token reconstruction. Additionally, we propose that for an input token masked by both masks simultaneously, the reconstructed features should be as similar as possible. This naturally establishes an explicit contrastive constraint within the generative MAE-based pre-training paradigm, resulting in our proposed method, Point-CMAE. Consequently, Point-CMAE effectively enhances the representation quality and transfer performance compared to its MAE counterpart.
Experimental evaluations across various downstream applications, including classification, part segmentation, and few-shot learning, demonstrate the efficacy of our framework in surpassing state-of-the-art techniques under standard ViTs and single-modal settings. The source code and trained models are available at: https://github.com/Amazingren/Point-CMAE.

Submitted 8 July, 2024; originally announced July 2024.

Comments: Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning

arXiv:2407.05834 [pdf, other]

Extraction of fissile isotope antineutrino spectra using feedforward neural network

Authors: Jian Chen, Jun Wang, Wei Wang, Yuehuan Wei

Abstract: Precise measurement of antineutrino spectra produced by isotope fission in reactors is of great significance for studying neutrino oscillations, refining nuclear databases, and addressing the reactor antineutrino anomaly. This work reports a method utilizing a feedforward neural network (FNN) model to decompose the reconstructed measured prompt energy spectrum observed by a short-baseline reactor neutrino experiment and extract the antineutrino spectra produced by the fission of major isotopes such as $^{235}$U, $^{238}$U, $^{239}$Pu, and $^{241}$Pu in a nuclear reactor. We present two training strategies for this model and compare them with the traditional $χ^2$ minimization method, analyzing the same set of pseudo-data for a total exposure of $(2.9\times 5\times 1800)~\rm{GW_{th}\cdot tons\cdot days}$. The results show that the FNN model not only converges faster and better during the fitting process but also achieves relative errors in the extracted spectra within 1% in the $2-8$ MeV range, outperforming the $χ^2$ minimization method. The feasibility and superiority of this method have been validated in this study.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05718 [pdf, other]

A Factuality and Diversity Reconciled Decoding Method for Knowledge-Grounded Dialogue Generation

Authors: Chenxu Yang, Zheng Lin, Chong Tian, Liang Pang, Lanrui Wang, Zhengyang Tong, Qirong Ho, Yanan Cao, Weiping Wang

Abstract: Grounding external knowledge can enhance the factuality of responses in dialogue generation. However, excessive emphasis on it might result in a lack of engaging and diverse expressions. Through the introduction of randomness in sampling, current approaches can increase the diversity. Nevertheless, such sampling methods could undermine the factuality in dialogue generation. In this study, to discover a solution for advancing creativity without relying on questionable randomness and to subtly reconcile the factuality and diversity within the source-grounded paradigm, a novel method named DoGe is proposed. DoGe can dynamically alternate between the utilization of internal parameter knowledge and external source knowledge based on the model's factual confidence. Extensive experiments on three widely-used datasets show that DoGe can not only enhance response diversity but also maintain factuality, and it significantly surpasses various other decoding-strategy baselines.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05700 [pdf, other]

InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct

Authors: Yutong Wu, Di Huang, Wenxuan Shi, Wei Wang, Lingzhe Gao, Shihao Liu, Ziyuan Nan, Kaizhao Yuan, Rui Zhang, Xishan Zhang, Zidong Du, Qi Guo, Yewen Pu, Dawei Yin, Xing Hu, Yunji Chen

Abstract: Recent advancements in open-source code large language models (LLMs) have demonstrated remarkable coding abilities by fine-tuning on the data generated from powerful closed-source LLMs such as GPT-3.5 and GPT-4 for instruction tuning. This paper explores how to further improve an instruction-tuned code LLM by generating data from itself rather than querying closed-source LLMs. Our key observation is the misalignment between the translation of formal and informal languages: translating formal language (i.e., code) to informal language (i.e., natural language) is more straightforward than the reverse. Based on this observation, we propose INVERSE-INSTRUCT, which summarizes instructions from code snippets instead of the reverse. Specifically, given an instruction tuning corpus for code and the resulting instruction-tuned code LLM, we ask the code LLM to generate additional high-quality instructions for the original corpus through code summarization and self-evaluation. Then, we fine-tune the base LLM on the combination of the original corpus and the self-generated one, which yields a stronger instruction-tuned LLM. We present a series of code LLMs named InverseCoder, which surpasses the performance of the original code LLMs on a wide range of benchmarks, including Python text-to-code generation, multilingual coding, and data-science code generation.

Submitted 8 July, 2024; originally announced July 2024.

arXiv:2407.05690 [pdf, other]

Pruning Large Language Models to Intra-module Low-rank Architecture with Transitional Activations

Authors: Bowen Shen, Zheng Lin, Daren Zha, Wei Liu, Jian Luan, Bin Wang, Weiping Wang

Abstract: Structured pruning fundamentally reduces computational and memory overheads of large language models (LLMs) and offers a feasible solution for end-side LLM deployment. Structurally pruned models remain dense and high-precision, highly compatible with further tuning and compression. However, as coarse-grained structured pruning causes substantial damage to the highly interconnected model, achieving a high compression ratio for scaled-up LLMs remains a challenge. In this paper, we introduce a task-agnostic structured pruning approach coupled with a compact Transformer architecture design. The proposed approach, named TransAct, reduces transitional activations inside multi-head attention (MHA) and multi-layer perceptron (MLP) modules, while preserving the inter-module activations that are sensitive to perturbations. Hence, the LLM is pruned into an intra-module low-rank architecture, significantly reducing weights, KV Cache and attention computation. TransAct is implemented on the LLaMA model and evaluated on downstream benchmarks. Results verify the optimality of our approach at high compression with respect to both efficiency and performance. Further, ablation studies reveal the strength of activation-guided iterative pruning and provide experimental analysis on the redundancy of MHA and MLP modules.

Submitted 8 July, 2024; originally announced July 2024.

Comments: Findings of ACL 2024

arXiv:2407.05550 [pdf, other]

MEEG and AT-DGNN: Advancing EEG Emotion Recognition with Music and Graph Learning

Authors: Minghao Xiao, Zhengxi Zhu, Wenyu Wang, Meixia Qu

Abstract: Recent advances in neuroscience have elucidated the crucial role of coordinated brain region activities during cognitive tasks. To explore this complexity, we introduce the MEEG dataset, a comprehensive multi-modal music-induced electroencephalogram (EEG) dataset, and the Attention-based Temporal Learner with Dynamic Graph Neural Network (AT-DGNN), a novel framework for EEG-based emotion recognition. The MEEG dataset captures a wide range of emotional responses to music, enabling an in-depth analysis of brainwave patterns in musical contexts. The AT-DGNN combines an attention-based temporal learner with a dynamic graph neural network (DGNN) to accurately model the local and global graph dynamics of EEG data across varying brain network topology. Our evaluations show that AT-DGNN achieves superior performance, with an accuracy (ACC) of 83.06% in arousal and 85.31% in valence, outperforming state-of-the-art (SOTA) methods on the MEEG dataset. Comparative analyses with traditional datasets like DEAP highlight the effectiveness of our approach and underscore the potential of music as a powerful medium for emotion induction. This study not only advances our understanding of the brain's emotional processing, but also enhances the accuracy of emotion recognition technologies in brain-computer interfaces (BCI), leveraging both graph-based learning and the emotional impact of music. The source code and dataset are available at https://github.com/xmh1011/AT-DGNN.

Submitted 7 July, 2024; originally announced July 2024.

arXiv:2407.05250 [pdf, other]

CLIMB: A Benchmark of Clinical Bias in Large Language Models

Authors: Yubo Zhang, Shudi Hou, Mingyu Derek Ma, Wei Wang, Muhao Chen, Jieyu Zhao

Abstract: Large language models (LLMs) are increasingly applied to clinical decision-making. However, their potential to exhibit bias poses significant risks to clinical equity. Currently, there is a lack of benchmarks that systematically evaluate such clinical bias in LLMs. While some biases of LLMs can be avoided in downstream tasks, for example by instructing the model to answer "I'm not sure...", the internal bias hidden within the model has not yet been studied in depth. We introduce CLIMB (shorthand for A Benchmark of Clinical Bias in Large Language Models), a pioneering comprehensive benchmark to evaluate both intrinsic (within LLMs) and extrinsic (on downstream tasks) bias in LLMs for clinical decision tasks. Notably, for intrinsic bias, we introduce a novel metric, AssocMAD, to assess the disparities of LLMs across multiple demographic groups. Additionally, we leverage counterfactual intervention to evaluate extrinsic bias in a task of clinical diagnosis prediction. Our experiments across popular and medically adapted LLMs, particularly from the Mistral and LLaMA families, unveil prevalent behaviors with both intrinsic and extrinsic bias. This work underscores the critical need to mitigate clinical bias and sets a new standard for future evaluations of LLMs' clinical bias.

Submitted 6 July, 2024; originally announced July 2024.

arXiv:2407.05054 [pdf]

Cross-Lingual Word Alignment for ASEAN Languages with Contrastive Learning

Authors: Jingshen Zhang, Xinying Qiu, Teng Shen, Wenyu Wang, Kailin Zhang, Wenhe Feng

Abstract: Cross-lingual word alignment plays a crucial role in various natural language processing tasks, particularly for low-resource languages. A recent study proposes a BiLSTM-based encoder-decoder model that outperforms pre-trained language models in low-resource settings. However, that model only considers the similarity of word embedding spaces and does not explicitly model the differences between word embeddings. To address this limitation, we propose incorporating contrastive learning into the BiLSTM-based encoder-decoder framework. Our approach introduces a multi-view negative sampling strategy to learn the differences between word pairs in the shared cross-lingual embedding space. We evaluate our model on five bilingual aligned datasets spanning four ASEAN languages: Lao, Vietnamese, Thai, and Indonesian. Experimental results demonstrate that integrating contrastive learning consistently improves word alignment accuracy across all datasets, confirming the effectiveness of the proposed method in low-resource scenarios. We will release our dataset and code to support future research on word alignment for ASEAN and other low-resource languages.

Submitted 6 July, 2024; originally announced July 2024.

arXiv:2407.05029 [pdf, ps, other]

doi 10.1109/IOTM.001.2300201

Ubiquitous Integrated Sensing and Communications for Massive MIMO LEO Satellite Systems

Authors: Li You, Yongxiang Zhu, Xiaoyu Qiang, Christos G. Tsinos, Wenjin Wang, Xiqi Gao, Björn Ottersten

Abstract: Sixth-generation (6G) networks are envisioned to integrate sensing and communications in a single system, thus greatly improving spectrum utilization and reducing hardware costs. Low earth orbit (LEO) satellite communications combined with massive multiple-input multiple-output (MIMO) technology hold significant promise in offering ubiquitous and seamless connectivity with high data rates. Existing integrated sensing and communications (ISAC) studies mainly focus on terrestrial systems, while operating ISAC in massive MIMO LEO satellite systems promises to provide high-capacity communication and flexible sensing ubiquitously. In this paper, we first give an overview of LEO satellite systems and ISAC and consider adopting ISAC in massive MIMO LEO satellite systems. Then, recent research advances are presented. A discussion on related challenges and key enabling technologies follows. Finally, we point out some open issues and promising research directions.

Submitted 6 July, 2024; originally announced July 2024.

Comments: 6 pages, 4 figures

Journal ref: IEEE Internet of Things Magazine, vol. 7, no. 4, pp. 30-35, Jul. 2024

arXiv:2407.04973 [pdf, other]

LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts

Authors: Yijia Xiao, Edward Sun, Tianyu Liu, Wei Wang

Abstract: We propose LogicVista, an evaluation benchmark that assesses the integrated logical reasoning capabilities of multimodal large language models (MLLMs) in visual contexts. Recent advancements in MLLMs have demonstrated various fascinating abilities, from crafting poetry based on an image to performing mathematical reasoning. However, there is still a lack of systematic evaluation of MLLMs' proficiency in logical reasoning tasks, which are essential for activities like navigation and puzzle-solving. Thus we evaluate general logical cognition abilities across 5 logical reasoning tasks encompassing 9 different capabilities, using a sample of 448 multiple-choice questions. Each question is annotated with the correct answer and the human-written reasoning behind the selection, enabling both open-ended and multiple-choice evaluation. A total of 8 MLLMs are comprehensively evaluated using LogicVista. Code and data are available at https://github.com/Yijia-Xiao/LogicVista.

Submitted 6 July, 2024; originally announced July 2024.

Comments: LogicVista benchmarks the logical reasoning of multimodal large language models in visual tasks

Showing 1–50 of 7,296 results for author: Wang, W