Invisible sweat sensor: ultrathin membrane mimics skin for stress monitoring

Authors: Yuchen Feng, Andreas Kenny Oktavius, Reno Adley Prawoto, Hing Ni Ko, Qiao Gu, Ping Gao

Abstract: Epidermal skin sensors have emerged as a promising approach for continuous and noninvasive monitoring of vital health signals, but to maximize their performance, these sensors must integrate seamlessly with the skin, minimizing impedance while maintaining the skin's natural protective and regulatory functions. In this study, we introduce an imperceptible sweat sensor that achieves this seamless skin integration through interpenetrating networks formed by a porous, ultra-thin, ultra-high molecular weight polyethylene (UHMWPE) nanomembrane. Upon attachment to the skin by van der Waals force, the amphiphilic sweat extrudates infuse into the interconnected nanopores inside the hydrophobic UHMWPE nanomembrane, forming "pseudo skin" nanochannels for continuous sweat perspiration. This integration is further enhanced by the osmotic pressure generated during water evaporation. Leveraging the efficient transport of biomarkers through the "skin" channels within the porous membrane, we developed an organic electrochemical transducer (OECT) cortisol sensor via in-situ synthesis of a molecularly imprinted polymer (MIP) and poly(3,4-ethylenedioxythiophene) (PEDOT) within the nanomembrane. This demonstrates the capability to detect cortisol concentrations from 0.05 to 0.5 μM for seamless monitoring of stress levels. This work represents a significant advancement in self-adhesive sweat sensors that offer imperceptible and real-time non-invasive health monitoring capabilities.

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.06053 [pdf, other]

Learning local equivariant representations for quantum operators

Authors: Zhanghao Zhouyin, Zixi Gan, Shishir Kumar Pandey, Linfeng Zhang, Qiangqiang Gu

Abstract: Predicting quantum operator matrices such as Hamiltonian, overlap, and density matrices in the density functional theory (DFT) framework is crucial for understanding material properties. Current methods often focus on individual operators and struggle with efficiency and scalability for large systems. Here we introduce a novel deep learning model, SLEM (strictly localized equivariant message-passing) for predicting multiple quantum operators, that achieves state-of-the-art accuracy while dramatically improving computational efficiency. SLEM's key innovation is its strict locality-based design, constructing local, equivariant representations for quantum tensors while preserving physical symmetries. This enables complex many-body dependence without expanding the effective receptive field, leading to superior data efficiency and transferability. Using an innovative SO(2) convolution technique, SLEM reduces the computational complexity of high-order tensor products and is therefore capable of handling systems requiring the $f$ and $g$ orbitals in their basis sets. We demonstrate SLEM's capabilities across diverse 2D and 3D materials, achieving high accuracy even with limited training data. SLEM's design facilitates efficient parallelization, potentially extending DFT simulations to systems with device-level sizes, opening new possibilities for large-scale quantum simulations and high-throughput materials discovery.

Submitted 16 July, 2024; v1 submitted 8 July, 2024; originally announced July 2024.

Comments: 11 pages, 5 figures and 5 tables

arXiv:2407.03409 [pdf, other]

From Halos to Galaxies. IX. Estimate of Halo Assembly History for SDSS Galaxy Groups

Authors: Cheqiu Lyu, Yingjie Peng, Yipeng Jing, Xiaohu Yang, Luis C. Ho, Alvio Renzini, Dingyi Zhao, Filippo Mannucci, Houjun Mo, Kai Wang, Bitao Wang, Bingxiao Xu, Jing Dou, Anna R. Gallazzi, Qiusheng Gu, Roberto Maiolino, Enci Wang, Feng Yuan

Abstract: The properties of the galaxies are tightly connected to their host halo mass and halo assembly history. Accurate measurement of the halo assembly history in observation is challenging but crucial to the understanding of galaxy formation and evolution. The stellar-to-halo mass ratio ($M_*/M_{\mathrm{h}}$) for the centrals has often been used to indicate the halo assembly time $t_{\mathrm{h,50}}$ of the group, where $t_{\mathrm{h,50}}$ is the lookback time at which a halo has assembled half of its present-day virial mass. Using mock data from the semi-analytic models, we find that $M_*/M_{\mathrm{h}}$ shows a significant scatter with $t_{\mathrm{h,50}}$, with a strong systematic difference between the group with a star-forming central (blue group) and passive central (red group). To improve the accuracy, we develop machine-learning models to estimate $t_{\mathrm{h,50}}$ for galaxy groups using only observable quantities in the mocks. Since star-formation quenching will decouple the co-growth of the dark matter and baryon, we train our models separately for blue and red groups. Our models have successfully recovered $t_{\mathrm{h,50}}$, within an accuracy of $\sim$ 1.09 Gyr. With careful calibrations of individual observable quantities in the mocks with SDSS observations, we apply the trained models to the SDSS Yang et al. groups and derive the $t_{\mathrm{h,50}}$ for each group for the first time. The derived SDSS $t_{\mathrm{h,50}}$ distributions are in good agreement with those in the mocks, in particular for blue groups. The derived halo assembly history, together with the halo mass, makes an important step forward in studying the halo-galaxy connections in observation.

Submitted 3 July, 2024; originally announced July 2024.

Comments: 18 pages, 7 figures. Accepted by ApJ
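The definition of $t_{\mathrm{h,50}}$ quoted in the abstract above (the lookback time at which a halo has assembled half of its present-day virial mass) can be illustrated with a minimal Python sketch. It assumes the mass assembly history is already known as sampled (lookback time, virial mass) pairs and uses linear interpolation between samples. The function name `half_mass_lookback_time` and the sample history are hypothetical and not taken from the paper, which estimates this quantity with machine-learning models from observables rather than from a known history.

```python
def half_mass_lookback_time(lookback_gyr, mass):
    """Lookback time at which the halo had assembled half its present-day mass.

    lookback_gyr: lookback times in Gyr, ordered from the present (0) into the past.
    mass: virial mass at each time, same length, any consistent unit.
    Scans backward from the present and returns the most recent crossing.
    Linear interpolation between samples is an illustrative assumption.
    """
    if len(lookback_gyr) != len(mass) or len(mass) < 2:
        raise ValueError("need two or more matching samples")
    half = 0.5 * mass[0]  # mass[0] is the present-day virial mass
    for i in range(1, len(mass)):
        m_late, m_early = mass[i - 1], mass[i]
        if m_early <= half <= m_late:
            if m_late == m_early:
                return lookback_gyr[i - 1]
            frac = (m_late - half) / (m_late - m_early)
            step = lookback_gyr[i] - lookback_gyr[i - 1]
            return lookback_gyr[i - 1] + frac * step
    raise ValueError("history does not reach half of the present-day mass")


# Hypothetical assembly history (Gyr, arbitrary mass units); not real data.
times = [0.0, 2.0, 4.0, 6.0, 8.0]
masses = [10.0, 8.0, 6.0, 4.0, 2.0]
t_h50 = half_mass_lookback_time(times, masses)  # half mass (5.0) lies between 4 and 6 Gyr
```

For this made-up linear history the half-mass point falls midway between the 4 Gyr and 6 Gyr samples, so the function returns 5.0 Gyr.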

arXiv:2407.02973 [pdf, other]

NOEMA formIng Cluster survEy (NICE): Characterizing eight massive galaxy groups at $1.5 < z < 4$ in the COSMOS field

Authors: Nikolaj B. Sillassen, Shuowen Jin, Georgios E. Magdis, Emanuele Daddi, Tao Wang, Shiying Lu, Hanwen Sun, Vinod Arumugam, Daizhong Liu, Malte Brinch, Chiara D'Eugenio, Raphael Gobat, Carlos Gómez-Guijarro, Michael Rich, Eva Schinnerer, Veronica Strazzullo, Qinghua Tan, Francesco Valentino, Yijun Wang, Mengyuan Xiao, Luwenjia Zhou, David Blánquez-Sesé, Zheng Cai, Yanmei Chen, Laure Ciesla , et al. (19 additional authors not shown)

Abstract: The NOEMA formIng Cluster survEy (NICE) is a large program targeting 69 massive galaxy group candidates at $z>2$ in six deep fields. We report spectroscopic confirmation of eight groups at $1.65\leq z\leq3.61$ in COSMOS. Homogeneously selected as significant overdensities of red IRAC sources with red Herschel colors, four groups are confirmed by CO and [CI] with NOEMA 3mm observations, three are confirmed with ALMA, and one is confirmed by H$α$ from Subaru/FMOS. We constructed the integrated FIR SEDs for the eight groups, obtaining total IR SFR $=260-1300~{\rm M_\odot}$~yr$^{-1}$. We adopted six methods to estimate the dark matter masses, including stellar mass to halo mass relations, overdensity with galaxy bias, and NFW profile fitting to radial stellar mass density. We found that the radial stellar mass densities are consistent with an NFW profile, supporting that they are collapsed structures hosted by a single dark matter halo. The best halo mass estimates are $\log(M_{\rm h}/{\rm M_\odot})=12.8-13.7$ with uncertainty of 0.3 dex. From halo mass estimates, we derive baryonic accretion rate ${\rm BAR}=(1-8)\times10^{3}\,{\rm M_{\odot}/yr}$ for this sample. We find a quasi-linear correlation between the integrated SFR/BAR and the theoretical halo mass limit for cold streams, $M_{\rm stream}/M_{\rm h}$, with ${\rm SFR/BAR}=10^{-0.46\pm0.22}\left({M_{\rm stream}/M_{\rm h}}\right)^{0.71\pm0.16}$ with a scatter of $0.40\,{\rm dex}$. Further, we compare halo masses and stellar masses with simulations, and find all structures are consistent with being progenitors of $M_{\rm h}(z=0)>10^{14}\,{\rm M_{\odot}}$ galaxy clusters, and the most massive central galaxies have stellar masses consistent with brightest cluster galaxies (BCGs) progenitors in the TNG300 simulation. The results strongly suggest these structures are forming massive galaxy clusters via baryonic and dark matter accretion.

Submitted 5 July, 2024; v1 submitted 3 July, 2024; originally announced July 2024.

Comments: 44 pages (27pp appendix), 32 figures, 18 tables, accepted for publication in A&A
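The best-fit relation quoted in the abstract above, ${\rm SFR/BAR}=10^{-0.46}\left(M_{\rm stream}/M_{\rm h}\right)^{0.71}$, can be evaluated directly. The sketch below uses only the central values of the normalization and slope and ignores the quoted uncertainties and the 0.40 dex scatter. The function name `sfr_over_bar` and the input mass ratios are hypothetical choices made for illustration.

```python
def sfr_over_bar(mstream_over_mh, log_norm=-0.46, slope=0.71):
    """Central best-fit relation: SFR/BAR = 10**log_norm * x**slope.

    mstream_over_mh: the ratio M_stream / M_h (dimensionless, positive).
    Defaults are the central values quoted in the abstract; the uncertainties
    (0.22 on log_norm, 0.16 on slope) and the scatter are not propagated.
    """
    if mstream_over_mh <= 0:
        raise ValueError("mass ratio must be positive")
    return 10.0 ** log_norm * mstream_over_mh ** slope


# Illustrative inputs only; these ratios are not measurements from the paper.
ratio_at_unity = sfr_over_bar(1.0)   # 10**-0.46, about 0.35
ratio_at_ten = sfr_over_bar(10.0)    # 10**(-0.46 + 0.71), about 1.78
```

Because the slope is below one, a tenfold increase in $M_{\rm stream}/M_{\rm h}$ raises SFR/BAR by a factor of $10^{0.71}$, roughly five, rather than ten.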

arXiv:2407.02047 [pdf, other]

CountFormer: Multi-View Crowd Counting Transformer

Authors: Hong Mo, Xiong Zhang, Jianchao Tan, Cheng Yang, Qiong Gu, Bo Hang, Wenqi Ren

Abstract: Multi-view counting (MVC) methods have shown their superiority over single-view counterparts, particularly in situations characterized by heavy occlusion and severe perspective distortions. However, hand-crafted heuristic features and identical camera layout requirements in conventional MVC methods limit their applicability and scalability in real-world scenarios. In this work, we propose a concise 3D MVC framework called CountFormer to elevate multi-view image-level features to a scene-level volume representation and estimate the 3D density map based on the volume features. By incorporating a camera encoding strategy, CountFormer successfully embeds camera parameters into the volume query and image-level features, enabling it to handle various camera layouts with significant differences. Furthermore, we introduce a feature lifting module capitalized on the attention mechanism to transform image-level features into a 3D volume representation for each camera view. Subsequently, the multi-view volume aggregation module attentively aggregates various multi-view volumes to create a comprehensive scene-level volume representation, allowing CountFormer to handle images captured by arbitrary dynamic camera layouts. The proposed method performs favorably against the state-of-the-art approaches across various widely used datasets, demonstrating its greater suitability for real-world deployment compared to conventional MVC frameworks.

Submitted 2 July, 2024; originally announced July 2024.

Comments: Accepted by ECCV2024

arXiv:2406.16255 [pdf, other]

Uncertainty-Aware Reward-Free Exploration with General Function Approximation

Authors: Junkai Zhang, Weitong Zhang, Dongruo Zhou, Quanquan Gu

Abstract: Mastering multiple tasks through exploration and learning in an environment poses a significant challenge in reinforcement learning (RL). Unsupervised RL has been introduced to address this challenge by training policies with intrinsic rewards rather than extrinsic rewards. However, current intrinsic reward designs and unsupervised RL algorithms often overlook the heterogeneous nature of collected samples, thereby diminishing their sample efficiency. To overcome this limitation, in this paper, we propose a reward-free RL algorithm called GFA-RFE. The key idea behind our algorithm is an uncertainty-aware intrinsic reward for exploring the environment and an uncertainty-weighted learning process to handle heterogeneous uncertainty in different samples. Theoretically, we show that in order to find an $ε$-optimal policy, GFA-RFE needs to collect $\tilde{O} (H^2 \log N_{\mathcal F} (ε) \mathrm{dim} (\mathcal F) / ε^2 )$ number of episodes, where $\mathcal F$ is the value function class with covering number $N_{\mathcal F} (ε)$ and generalized eluder dimension $\mathrm{dim} (\mathcal F)$. Such a result outperforms all existing reward-free RL algorithms. We further implement and evaluate GFA-RFE across various domains and tasks in the DeepMind Control Suite. Experiment results show that GFA-RFE outperforms or is comparable to the performance of state-of-the-art unsupervised RL algorithms.

Submitted 29 June, 2024; v1 submitted 23 June, 2024; originally announced June 2024.

Comments: 32 pages, 5 figures, 4 tables, accepted by ICML 2024
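To make the episode bound $\tilde{O}(H^2 \log N_{\mathcal F}(ε)\,\mathrm{dim}(\mathcal F)/ε^2)$ from the abstract above more concrete, the following sketch evaluates only its leading-order scaling. Constants and the polylogarithmic factors hidden by the $\tilde{O}$ are dropped, so absolute values are meaningless and only ratios between settings carry information. The function name `episodes_scale` and all numeric inputs are hypothetical, and the example treats $\log N_{\mathcal F}(ε)$ as fixed even though it generally depends on $ε$.

```python
def episodes_scale(horizon, log_covering, eluder_dim, eps):
    """Leading-order scaling H^2 * log N_F(eps) * dim(F) / eps^2.

    Constants and polylog factors from the O-tilde are omitted on purpose,
    so compare two settings by their ratio rather than reading off a count.
    """
    if eps <= 0:
        raise ValueError("eps must be positive")
    return horizon ** 2 * log_covering * eluder_dim / eps ** 2


# Hypothetical problem sizes, chosen only to show how the bound scales.
coarse = episodes_scale(horizon=10, log_covering=5.0, eluder_dim=3, eps=0.1)
fine = episodes_scale(horizon=10, log_covering=5.0, eluder_dim=3, eps=0.05)
eps_ratio = fine / coarse          # halving eps costs about 4x the episodes

long_horizon = episodes_scale(horizon=20, log_covering=5.0, eluder_dim=3, eps=0.1)
horizon_ratio = long_horizon / coarse  # doubling H also costs about 4x
```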

arXiv:2406.11234 [pdf, other]

MiniConGTS: A Near Ultimate Minimalist Contrastive Grid Tagging Scheme for Aspect Sentiment Triplet Extraction

Authors: Qiao Sun, Liujia Yang, Minghao Ma, Nanyang Ye, Qinying Gu

Abstract: Aspect Sentiment Triplet Extraction (ASTE) aims to co-extract the sentiment triplets in a given corpus. Existing approaches within the pretraining-finetuning paradigm tend to either meticulously craft complex tagging schemes and classification heads, or incorporate external semantic augmentation to enhance performance. In this study, we, for the first time, re-evaluate the redundancy in tagging schemes and the internal enhancement in pretrained representations. We propose a method to improve and utilize pretrained representations by integrating a minimalist tagging scheme and a novel token-level contrastive learning strategy. The proposed approach demonstrates comparable or superior performance compared to state-of-the-art techniques while featuring a more compact design and reduced computational overhead. Additionally, we are the first to formally evaluate GPT-4's performance in few-shot learning and Chain-of-Thought scenarios for this task. The results demonstrate that the pretraining-finetuning paradigm remains highly effective even in the era of large language models.

Submitted 17 June, 2024; originally announced June 2024.

Comments: arXiv admin note: text overlap with arXiv:2403.07342

arXiv:2406.09229 [pdf, other]

MGRQ: Post-Training Quantization For Vision Transformer With Mixed Granularity Reconstruction

Authors: Lianwei Yang, Zhikai Li, Junrui Xiao, Haisong Gong, Qingyi Gu

Abstract: Post-training quantization (PTQ) efficiently compresses vision models, but unfortunately, it is accompanied by a certain degree of accuracy degradation. Reconstruction methods aim to enhance model performance by narrowing the gap between the quantized model and the full-precision model, often yielding promising results. However, efforts to significantly improve the performance of PTQ through reconstruction in the Vision Transformer (ViT) have shown limited efficacy. In this paper, we conduct a thorough analysis of the reasons for this limited effectiveness and propose MGRQ (Mixed Granularity Reconstruction Quantization) as a solution to address this issue. Unlike previous reconstruction schemes, MGRQ introduces a mixed granularity reconstruction approach. Specifically, MGRQ enhances the performance of PTQ by introducing Extra-Block Global Supervision and Intra-Block Local Supervision, building upon Optimized Block-wise Reconstruction. Extra-Block Global Supervision considers the relationship between block outputs and the model's output, aiding block-wise reconstruction through global supervision. Meanwhile, Intra-Block Local Supervision reduces generalization errors by aligning the distribution of outputs at each layer within a block. Subsequently, MGRQ is further optimized for reconstruction through Mixed Granularity Loss Fusion. Extensive experiments conducted on various ViT models illustrate the effectiveness of MGRQ. Notably, MGRQ demonstrates robust performance in low-bit quantization, thereby enhancing the practicality of the quantized model.

Submitted 13 June, 2024; originally announced June 2024.

Comments: Accepted by 2024 IEEE International Conference on Image Processing

arXiv:2406.06279 [pdf, other]

Multi-Prompting Decoder Helps Better Language Understanding

Authors: Zifeng Cheng, Zhaoling Chen, Zhiwei Jiang, Yafeng Yin, Shiping Ge, Yuliang Liu, Qing Gu

Abstract: Recent Pre-trained Language Models (PLMs) usually only provide users with the inference APIs, namely the emerging Model-as-a-Service (MaaS) setting. To adapt MaaS PLMs to downstream tasks without accessing their parameters and gradients, some existing methods focus on the output-side adaptation of PLMs, viewing the PLM as an encoder and then optimizing a task-specific decoder for decoding the output hidden states and class scores of the PLM. Despite the effectiveness of these methods, they only use a single prompt to query PLMs for decoding, leading to a heavy reliance on the quality of the adopted prompt. In this paper, we propose a simple yet effective Multi-Prompting Decoder (MPD) framework for MaaS adaptation. The core idea is to query PLMs with multiple different prompts for each sample, thereby obtaining multiple output hidden states and class scores for subsequent decoding. Such a multi-prompting decoding paradigm can simultaneously mitigate reliance on the quality of a single prompt, alleviate the issue of data scarcity under the few-shot setting, and provide richer knowledge extracted from PLMs. Specifically, we propose two decoding strategies: multi-prompting decoding with optimal transport for hidden states and calibrated decoding for class scores. Extensive experiments demonstrate that our method achieves new state-of-the-art results on multiple natural language understanding datasets under the few-shot setting.

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.02511 [pdf, other]

V-Express: Conditional Dropout for Progressive Training of Portrait Video Generation

Authors: Cong Wang, Kuan Tian, Jun Zhang, Yonghang Guan, Feng Luo, Fei Shen, Zhiwei Jiang, Qing Gu, Xiao Han, Wei Yang

Abstract: In the field of portrait video generation, the use of single images to generate portrait videos has become increasingly prevalent. A common approach involves leveraging generative models to enhance adapters for controlled generation. However, control signals (e.g., text, audio, reference image, pose, depth map, etc.) can vary in strength. Among these, weaker conditions often struggle to be effective due to interference from stronger conditions, posing a challenge in balancing these conditions. In our work on portrait video generation, we identified audio signals as particularly weak, often overshadowed by stronger signals such as facial pose and reference image. However, direct training with weak signals often leads to difficulties in convergence. To address this, we propose V-Express, a simple method that balances different control signals through the progressive training and the conditional dropout operation. Our method gradually enables effective control by weak conditions, thereby achieving generation capabilities that simultaneously take into account the facial pose, reference image, and audio. The experimental results demonstrate that our method can effectively generate portrait videos controlled by audio. Furthermore, a potential solution is provided for the simultaneous and effective use of conditions of varying strengths.

Submitted 4 June, 2024; originally announced June 2024.
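The conditional dropout idea named in the abstract above can be sketched generically: during training, stronger control signals are randomly withheld so that the model has to rely on the weaker one. The abstract does not give the dropout probabilities, the training schedule, or how a dropped condition is represented, so everything concrete below is an assumption: the function name `apply_conditional_dropout`, the use of `None` for a dropped condition, and the probabilities in the example are all hypothetical.

```python
import random


def apply_conditional_dropout(conditions, drop_prob, rng):
    """Return a copy of `conditions` with some entries replaced by None.

    conditions: dict mapping a condition name to its value for one sample.
    drop_prob: dict mapping a condition name to its drop probability;
               names that are missing are never dropped.
    rng: a random.Random instance, passed in so runs are reproducible.
    """
    kept = {}
    for name, value in conditions.items():
        dropped = rng.random() < drop_prob.get(name, 0.0)
        kept[name] = None if dropped else value
    return kept


# Hypothetical sample and probabilities (not the paper's settings):
# always drop the strong reference image, never drop pose or the weak audio.
sample = {"reference_image": "ref.png", "pose": [0.1, 0.2], "audio": [0.3, 0.4]}
probs = {"reference_image": 1.0, "pose": 0.0}
result = apply_conditional_dropout(sample, probs, random.Random(0))
```

With these extreme probabilities the outcome is deterministic: the reference image is withheld and the pose and audio conditions pass through unchanged. Intermediate probabilities would withhold each strong condition only on a fraction of training samples.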

arXiv:2406.01181 [pdf, other]

Q-BiC: A biocompatible integrated chip for in vitro and in vivo spin-based quantum sensing

Authors: Louise Shanahan, Sophia Belser, Jack W. Hart, Qiushi Gu, Julien R. E. Roth, Annika Mechnich, Michael Hoegen, Soham Pal, David Jordan, Eric A. Miska, Mete Atature, Helena S. Knowles

Abstract: Optically addressable spin-based quantum sensors enable nanoscale measurements of temperature, magnetic field, pH, and other physical properties of a system. Advancing the sensors beyond proof-of-principle demonstrations in living cells and multicellular organisms towards reliable, damage-free quantum sensing poses three distinct technical challenges. First, spin-based quantum sensing requires optical accessibility and microwave delivery. Second, any microelectronics must be biocompatible and designed for imaging living specimens. Third, efficient microwave delivery and temperature control are essential to reduce unwanted heating and to maintain an optimal biological environment. Here, we present the Quantum Biosensing Chip (Q-BiC), which facilitates microfluidic-compatible microwave delivery and includes on-chip temperature control. We demonstrate the use of Q-BiC in conjunction with nanodiamonds containing nitrogen vacancy centers to perform optically detected magnetic resonance in living systems. We quantify the biocompatibility of microwave excitation required for optically detected magnetic resonance both in vitro in HeLa cells and in vivo in the nematode Caenorhabditis elegans for temperature measurements and determine the microwave-exposure range allowed before detrimental effects are observed. In addition, we show that nanoscale quantum thermometry can be performed in immobilised but non-anaesthetised adult nematodes with minimal stress. These results enable the use of spin-based quantum sensors without damaging the biological system under study, facilitating the investigation of the local thermodynamic and viscoelastic properties of intracellular processes.

Submitted 3 June, 2024; originally announced June 2024.

arXiv:2406.01007 [pdf, other]

Measurement of Electron Antineutrino Oscillation Amplitude and Frequency via Neutron Capture on Hydrogen at Daya Bay

Authors: Daya Bay collaboration, F. P. An, W. D. Bai, A. B. Balantekin, M. Bishai, S. Blyth, G. F. Cao, J. Cao, J. F. Chang, Y. Chang, H. S. Chen, H. Y. Chen, S. M. Chen, Y. Chen, Y. X. Chen, Z. Y. Chen, J. Cheng, J. Cheng, Y. -C. Cheng, Z. K. Cheng, J. J. Cherwinka, M. C. Chu, J. P. Cummings, O. Dalager, F. S. Deng , et al. (177 additional authors not shown)

Abstract: This Letter reports the first measurement of the oscillation amplitude and frequency of reactor antineutrinos at Daya Bay via neutron capture on hydrogen using 1958 days of data. With over 3.6 million signal candidates, an optimized candidate selection, improved treatment of backgrounds and efficiencies, refined energy calibration, and an energy response model for the capture-on-hydrogen sensitive region, the relative $\overline{ν}_{e}$ rates and energy spectra variation among the near and far detectors gives $\mathrm{sin}^22θ_{13} = 0.0759_{-0.0049}^{+0.0050}$ and $Δm^2_{32} = (2.72^{+0.14}_{-0.15})\times10^{-3}$ eV$^2$ assuming the normal neutrino mass ordering, and $Δm^2_{32} = (-2.83^{+0.15}_{-0.14})\times10^{-3}$ eV$^2$ for the inverted neutrino mass ordering. This estimate of $\sin^2 2θ_{13}$ is consistent with and essentially independent from the one obtained using the capture-on-gadolinium sample at Daya Bay. The combination of these two results yields $\mathrm{sin}^22θ_{13}= 0.0833\pm0.0022$, which represents an 8% relative improvement in precision regarding the Daya Bay full 3158-day capture-on-gadolinium result.

Submitted 3 June, 2024; originally announced June 2024.
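The "oscillation amplitude and frequency" in the abstract above correspond to $\sin^2 2θ_{13}$ and $Δm^2_{32}$. As a rough illustration, the standard two-flavour short-baseline approximation $P \approx 1 - \sin^2 2θ_{13}\,\sin^2(1.267\,Δm^2 L/E)$, with $Δm^2$ in eV$^2$, $L$ in metres and $E$ in MeV, can be evaluated with the central values from the abstract. This is a simplification and not the collaboration's analysis: it neglects the solar-term contribution and plugs in $Δm^2_{32}$ where a full treatment uses an effective mass splitting. The function name `survival_probability` and the baseline and energy in the example are hypothetical.

```python
import math


def survival_probability(baseline_m, energy_mev,
                         sin2_2theta13=0.0759, dm2_ev2=2.72e-3):
    """Two-flavour approximation of electron-antineutrino survival.

    P = 1 - sin^2(2*theta13) * sin^2(1.267 * dm2 * L / E)
    with dm2 in eV^2, L in metres, E in MeV.
    Defaults are the central normal-ordering values quoted in the abstract.
    """
    if energy_mev <= 0:
        raise ValueError("energy must be positive")
    phase = 1.267 * dm2_ev2 * baseline_m / energy_mev
    return 1.0 - sin2_2theta13 * math.sin(phase) ** 2


# Hypothetical baseline and energy, chosen only to exercise the formula.
p_source = survival_probability(0.0, 4.0)     # no oscillation at zero baseline
p_far = survival_probability(1600.0, 4.0)     # deficit bounded by the amplitude
```

The amplitude sets the largest possible deficit (about 7.6% for the capture-on-hydrogen central value), while the mass splitting sets where in $L/E$ that deficit occurs.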

arXiv:2405.17082 [pdf, other]

Ensembling Diffusion Models via Adaptive Feature Aggregation

Authors: Cong Wang, Kuan Tian, Yonghang Guan, Jun Zhang, Zhiwei Jiang, Fei Shen, Xiao Han, Qing Gu, Wei Yang

Abstract: The success of the text-guided diffusion model has inspired the development and release of numerous powerful diffusion models within the open-source community. These models are typically fine-tuned on various expert datasets, showcasing diverse denoising capabilities. Leveraging multiple high-quality models to produce stronger generation ability is valuable, but has not been extensively studied. Existing methods primarily adopt parameter merging strategies to produce a new static model. However, they overlook the fact that the divergent denoising capabilities of the models may dynamically change across different states, such as when experiencing different prompts, initial noises, denoising steps, and spatial locations. In this paper, we propose a novel ensembling method, Adaptive Feature Aggregation (AFA), which dynamically adjusts the contributions of multiple models at the feature level according to various states (i.e., prompts, initial noises, denoising steps, and spatial locations), thereby keeping the advantages of multiple diffusion models, while suppressing their disadvantages. Specifically, we design a lightweight Spatial-Aware Block-Wise (SABW) feature aggregator that adaptively aggregates the block-wise intermediate features from multiple U-Net denoisers into a unified one. The core idea lies in dynamically producing an individual attention map for each model's features by comprehensively considering various states. It is worth noting that only SABW is trainable with about 50 million parameters, while other models are frozen. Both the quantitative and qualitative experiments demonstrate the effectiveness of our proposed Adaptive Feature Aggregation method. The code is available at https://github.com/tenvence/afa/.

Submitted 27 May, 2024; originally announced May 2024.
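The feature-level ensembling described in the abstract above, where each model's features get their own attention map and are merged into one, can be sketched in a generic form. The sketch assumes the attention scores are normalized with a softmax across models at each spatial location; the abstract does not state the normalization, and in the paper the scores come from the trainable SABW aggregator rather than being supplied by hand. The function name `aggregate_features` and the toy inputs are hypothetical, and the authors' actual implementation is in the repository linked in the abstract.

```python
import math


def aggregate_features(features, scores):
    """Blend per-model feature maps with softmax weights computed per location.

    features: list over models of equal-length lists (one value per location).
    scores: same shape; unnormalised attention score per model and location.
    The softmax across models is an illustrative assumption.
    """
    if len(features) != len(scores) or not features:
        raise ValueError("features and scores must cover the same models")
    n_loc = len(features[0])
    fused = []
    for j in range(n_loc):
        col = [s[j] for s in scores]
        peak = max(col)  # subtract the max for numerical stability
        weights = [math.exp(c - peak) for c in col]
        total = sum(weights)
        fused.append(sum(w * f[j] for w, f in zip(weights, features)) / total)
    return fused


# Toy inputs: two models, two locations, equal scores everywhere.
fused = aggregate_features([[1.0, 2.0], [3.0, 6.0]], [[0.0, 0.0], [0.0, 0.0]])
```

With equal scores the weights are uniform, so the result is the plain average of the two feature maps, `[2.0, 4.0]`. Unequal scores shift each location toward whichever model scores higher there, which is the location-dependent behaviour a static parameter merge cannot express.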

arXiv:2405.16417 [pdf, other]

CRoFT: Robust Fine-Tuning with Concurrent Optimization for OOD Generalization and Open-Set OOD Detection

Authors: Lin Zhu, Yifeng Yang, Qinying Gu, Xinbing Wang, Chenghu Zhou, Nanyang Ye

Abstract: Recent vision-language pre-trained models (VL-PTMs) have shown remarkable success in open-vocabulary tasks. However, downstream use cases often involve further fine-tuning of VL-PTMs, which may distort their general knowledge and impair their ability to handle distribution shifts. In real-world scenarios, machine learning systems inevitably encounter both covariate shifts (e.g., changes in image styles) and semantic shifts (e.g., test-time unseen classes). This highlights the importance of enhancing out-of-distribution (OOD) generalization on covariate shifts and simultaneously detecting semantic-shifted unseen classes. Thus a critical but underexplored question arises: How to improve VL-PTMs' generalization ability to closed-set OOD data, while effectively detecting open-set unseen classes during fine-tuning? In this paper, we propose a novel objective function of OOD detection that also serves to improve OOD generalization. We show that minimizing the gradient magnitude of energy scores on training data leads to domain-consistent Hessians of classification loss, a strong indicator for OOD generalization revealed by theoretical analysis. Based on this finding, we have developed a unified fine-tuning framework that allows for concurrent optimization of both tasks. Extensive experiments have demonstrated the superiority of our method. The code is available at https://github.com/LinLLLL/CRoFT.

Submitted 25 May, 2024; originally announced May 2024.

Comments: Accepted by ICML2024

arXiv:2405.15612 [pdf, other]

Dual opposing quadrature-PT symmetry

Authors: Wencong Wang, Jacob Kokinda, Jiazhen Li, Qing Gu, Dongmei Liu, Jianming Wen

Abstract: Our recent research on type-I quadrature parity-time (PT) symmetry, utilizing an open twin-beam system, not only enables observing genuine quantum photonic PT symmetry amid phase-sensitive amplification (PSA) and loss in the presence of Langevin noise but also reveals additional classical-to-quantum (C2Q) transitions in quadrature and relative-intensity noise fluctuations. In contrast to the previous setup, our exploration of an alternative system assuming no loss involves a type-II PSA-only scheme. This scheme facilitates dual opposing quadrature PT symmetry, offering a comprehensive and complementary comprehension of C2Q transitions and anti-Hermiticity-enhanced quantum sensing. Furthermore, our investigation into the correlation with the Einstein-Podolsky-Rosen criteria uncovers previously unexplored connections between PT symmetry and nonclassicality, as well as quantum entanglement within the continuous-variable framework.

Submitted 24 May, 2024; originally announced May 2024.

Comments: This is type-II quadrature PT symmetry, compared to type-I quadrature PT symmetry (arXiv:2301.05511)

arXiv:2405.15191 [pdf, other]

Effectiveness of halo and galaxy properties in reducing the scatter in the stellar-to-halo mass relation

Authors: Wenxiang Pei, Qi Guo, Shi Shao, Yi He, Qing Gu

Abstract: The stellar-to-halo mass relation (SHMR) is a fundamental relationship between galaxies and their host dark matter haloes. In this study, we examine the scatter in this relation for primary galaxies in the semi-analytic L-Galaxies model and two cosmological hydrodynamical simulations, EAGLE and TNG. We find that in low-mass haloes, more massive galaxies tend to reside in haloes with higher concentration, earlier formation time, greater environmental density, and earlier major mergers, and to have older stellar populations, which is consistent with findings in various studies. Quantitative analysis reveals the varying significance of halo and galaxy properties in determining SHMR scatter across simulations and models. In EAGLE and TNG, halo concentration and formation time primarily influence SHMR scatter for haloes with $M_{\rm h}<10^{12}~\rm M_\odot$, but the influence diminishes at high mass. Baryonic processes play a more significant role in L-Galaxies. For haloes with $M_{\rm h} <10^{11}~\rm M_\odot$ and $10^{12}~\rm M_\odot<M_{\rm h}<10^{13}~\rm M_\odot$, the main drivers of scatter are galaxy SFR and age. In the $10^{11.5}~\rm M_\odot<M_{\rm h} <10^{12}~\rm M_\odot$ range, halo concentration and formation time are the primary factors. For haloes with $M_{\rm h} > 10^{13}~\rm M_\odot$, supermassive black hole mass becomes more important. Interestingly, it is found that AGN feedback may increase the amplitude of the scatter and decrease the dependence on halo properties at high masses.

Submitted 23 May, 2024; originally announced May 2024.

Comments: 23 pages, 12 + 5 figures, 2 tables, including 4 appendices; Accepted by MNRAS

arXiv:2405.10060 [pdf, other]

doi 10.1109/ACCESS.2024.3352115

Typing Requirement Model as Coroutines

Authors: Qiqi Gu, Wei Ke

Abstract: Model-Driven Engineering (MDE) is a technique that aims to boost productivity in software development and ensure the safety of critical systems. Central to MDE is the refinement of high-level requirement models into executable code. Given that requirement models form the foundation of the entire development process, ensuring their correctness is crucial. RM2PT is a widely used MDE platform that employs the REModel language for requirement modeling. REModel contains contract sections and other sections including a UML sequence diagram. This paper contributes a coroutine-based type system that represents pre- and post-conditions in the contract sections in a requirement model as the receiving and yielding parts of coroutines, respectively. The type system is capable of composing coroutine types, so that users can view functions as a whole system and check their collective behavior. By doing so, our type system ensures that the contracts defined in it are executed as outlined in the accompanying sequence diagram. We assessed our approach using four case studies provided by RM2PT, validating the accuracy of the models.

Submitted 16 May, 2024; originally announced May 2024.

arXiv:2405.04589 [pdf, other]

A Novel Wide-Area Multiobject Detection System with High-Probability Region Searching

Authors: Xianlei Long, Hui Zhao, Chao Chen, Fuqiang Gu, Qingyi Gu

Abstract: In recent years, wide-area visual surveillance systems have been widely applied in various industrial and transportation scenarios. These systems, however, face significant challenges when implementing multi-object detection due to conflicts arising from the need for high-resolution imaging, efficient object searching, and accurate localization. To address these challenges, this paper presents a hybrid system that incorporates a wide-angle camera, a high-speed search camera, and a galvano-mirror. In this system, the wide-angle camera offers panoramic images as prior information, which helps the search camera capture detailed images of the targeted objects. This integrated approach enhances the overall efficiency and effectiveness of wide-area visual detection systems. Specifically, in this study, we introduce a wide-angle camera-based method to generate a panoramic probability map (PPM) for estimating high-probability regions of target object presence. Then, we propose a probability searching module that uses the PPM-generated prior information to dynamically adjust the sampling range and refine target coordinates based on uncertainty variance computed by the object detector. Finally, the integration of PPM and the probability searching module yields an efficient hybrid vision system capable of achieving 120 fps multi-object search and detection. Extensive experiments are conducted to verify the system's effectiveness and robustness.

Submitted 7 May, 2024; originally announced May 2024.

Comments: Accepted by ICRA 2024

Journal ref: 2024 IEEE International Conference on Robotics and Automation (ICRA)

arXiv:2405.03117 [pdf, other]

Galaxies with Biconical Ionized Structure in MaNGA - I. Sample Selection and Driven Mechanisms

Authors: Zhi-Jie Zhou, Yan-Mei Chen, Run-Quan Guan, Yong Shi, Qiu-Sheng Gu, Dmitry Bizyaev

Abstract: Based on the integral field unit (IFU) data from the Mapping Nearby Galaxies at Apache Point Observatory (MaNGA) survey, we develop a new method to select galaxies with biconical ionized structures, building a sample of 142 edge-on biconical ionized galaxies. We classify these 142 galaxies into 81 star-forming galaxies, 31 composite galaxies, and 30 AGNs (consisting of 23 Seyferts and 7 LI(N)ERs) according to the [N II]-BPT diagram. The star-forming bicones have bar-like structures while AGN bicones display hourglass structures, and composite bicones exhibit transitional morphologies between them due to both black hole and star-formation activities. Star-forming bicones have intense star-formation activities in their central regions, and the primary driver of biconical structures is the central star formation rate surface density. The lack of difference in the strength of central black hole activities (traced by dust attenuation corrected [O III]$λ$5007 luminosity and Eddington ratio) between Seyfert bicones and their control samples can be naturally explained if the accretion disk and the galactic disk are not necessarily coplanar. Additionally, the biconical galaxies with central LI(N)ER-like line ratios are edge-on disk galaxies that show strong central dust attenuation. The radial gradients of H$α$ surface brightness follow the $r^{-2.35}$ relation, roughly consistent with the $r^{-2}$ profile, which is expected in the case of photoionization by a central point-like source. These observations indicate obscured AGNs or AGN echoes as the primary drivers of biconical structures in LI(N)ERs.

Submitted 5 May, 2024; originally announced May 2024.

Comments: 12 pages, 9 figures, 1 table, Accepted for publication in MNRAS

arXiv:2405.00675 [pdf, other]

Self-Play Preference Optimization for Language Model Alignment

Authors: Yue Wu, Zhiqing Sun, Huizhuo Yuan, Kaixuan Ji, Yiming Yang, Quanquan Gu

Abstract: Traditional reinforcement learning from human feedback (RLHF) approaches relying on parametric models like the Bradley-Terry model fall short in capturing the intransitivity and irrationality in human preferences. Recent advancements suggest that directly working with preference probabilities can yield a more accurate reflection of human preferences, enabling more flexible and accurate language model alignment. In this paper, we propose a self-play-based method for language model alignment, which treats the problem as a constant-sum two-player game aimed at identifying the Nash equilibrium policy. Our approach, dubbed Self-Play Preference Optimization (SPPO), approximates the Nash equilibrium through iterative policy updates and enjoys a theoretical convergence guarantee. Our method can effectively increase the log-likelihood of the chosen response and decrease that of the rejected response, which cannot be trivially achieved by symmetric pairwise loss such as Direct Preference Optimization (DPO) and Identity Preference Optimization (IPO). In our experiments, using only 60k prompts (without responses) from the UltraFeedback dataset and without any prompt augmentation, by leveraging a pre-trained preference model PairRM with only 0.4B parameters, SPPO can obtain a model from fine-tuning Mistral-7B-Instruct-v0.2 that achieves the state-of-the-art length-controlled win-rate of 28.53% against GPT-4-Turbo on AlpacaEval 2.0. It also outperforms the (iterative) DPO and IPO on MT-Bench and the Open LLM Leaderboard. Starting from a stronger base model Llama-3-8B-Instruct, we are able to achieve a length-controlled win rate of 38.77%. Notably, the strong performance of SPPO is achieved without additional external supervision (e.g., responses, preferences, etc.) from GPT-4 or other stronger language models. Codes are available at https://github.com/uclaml/SPPO.

Submitted 14 June, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

Comments: 27 pages, 4 figures, 5 tables

arXiv:2405.00245 [pdf, other]

Flexible multi-bunch-length operation for continuous-wave x-ray free-electron lasers

Authors: Zihan Zhu, Jiawei Yan, Hanxiang Yang, Duan Gu, Bart Faatz, Haixiao Deng, Qiang Gu

Abstract: X-ray free-electron lasers (XFELs) are cutting-edge instruments pivotal in a broad range of fields, providing high-power X-ray pulses with durations spanning from femtoseconds to attoseconds. One of the critical challenges in XFEL facilities is the simultaneous accommodation of diverse requirements for XFEL operation modes and photon properties across different undulator lines. This paper proposes a dipole-kicker combination in the bunch compressors to vary the electron bunch length for continuous-wave XFEL facilities driven by a superconducting linac. This method enables optimization of the electron bunch length on a per-bunch basis, tailored to the specific needs of each undulator. Through start-to-end simulations based on the parameters of the Shanghai high-repetition-rate XFEL and extreme light facility, we demonstrate the feasibility of this technique. The results show its effectiveness in enabling simultaneous operations of self-amplified spontaneous emission and externally seeded FEL across different undulator lines, ensuring optimal electron bunch compression for each undulator line.

Submitted 30 April, 2024; originally announced May 2024.

arXiv:2404.14397 [pdf, other]

RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?

Authors: Adrian de Wynter, Ishaan Watts, Nektar Ege Altıntoprak, Tua Wongsangaroonsri, Minghui Zhang, Noura Farra, Lena Baur, Samantha Claudet, Pavel Gajdusek, Can Gören, Qilong Gu, Anna Kaminska, Tomasz Kaminski, Ruby Kuo, Akiko Kyuba, Jongho Lee, Kartik Mathur, Petter Merok, Ivana Milovanović, Nani Paananen, Vesa-Matti Paananen, Anna Pavlenko, Bruno Pereira Vidal, Luciano Strika, Yueh Tsao, et al. (8 additional authors not shown)

Abstract: Large language models (LLMs) and small language models (SLMs) are being adopted at remarkable speed, although their safety still remains a serious concern. With the advent of multilingual S/LLMs, the question now becomes a matter of scale: can we expand multilingual safety evaluations of these models with the same velocity at which they are deployed? To this end we introduce RTP-LX, a human-transcreated and human-annotated corpus of toxic prompts and outputs in 28 languages. RTP-LX follows participatory design practices, and a portion of the corpus is especially designed to detect culturally-specific toxic language. We evaluate seven S/LLMs on their ability to detect toxic content in a culturally-sensitive, multilingual scenario. We find that, although they typically score acceptably in terms of accuracy, they have low agreement with human judges when judging holistically the toxicity of a prompt, and have difficulty discerning harm in context-dependent scenarios, particularly with subtle-yet-harmful content (e.g. microaggressions, bias). We release this dataset to help further reduce harmful uses of these models and improve their safe deployment.

Submitted 22 April, 2024; originally announced April 2024.

Comments: Work in progress

arXiv:2404.12376 [pdf, other]

Matching the Statistical Query Lower Bound for k-sparse Parity Problems with Stochastic Gradient Descent

Authors: Yiwen Kou, Zixiang Chen, Quanquan Gu, Sham M. Kakade

Abstract: The $k$-parity problem is a classical problem in computational complexity and algorithmic theory, serving as a key benchmark for understanding computational classes. In this paper, we solve the $k$-parity problem with stochastic gradient descent (SGD) on two-layer fully-connected neural networks. We demonstrate that SGD can efficiently solve the $k$-sparse parity problem on a $d$-dimensional hypercube ($k\le O(\sqrt{d})$) with a sample complexity of $\tilde{O}(d^{k-1})$ using $2^{Θ(k)}$ neurons, thus matching the established $Ω(d^{k})$ lower bounds of Statistical Query (SQ) models. Our theoretical analysis begins by constructing a good neural network capable of correctly solving the $k$-parity problem. We then demonstrate how a trained neural network with SGD can effectively approximate this good network, solving the $k$-parity problem with small statistical errors. Our theoretical results and findings are supported by empirical evidence, showcasing the efficiency and efficacy of our approach.

Submitted 18 April, 2024; originally announced April 2024.

Comments: 36 pages, 7 figures, 3 tables

arXiv:2404.12314 [pdf, other]

Guided Discrete Diffusion for Electronic Health Record Generation

Authors: Jun Han, Zixiang Chen, Yongqian Li, Yiwen Kou, Eran Halperin, Robert E. Tillman, Quanquan Gu

Abstract: Electronic health records (EHRs) are a pivotal data source that enables numerous applications in computational medicine, e.g., disease progression prediction, clinical trial design, and health economics and outcomes research. Despite wide usability, their sensitive nature raises privacy and confidentiality concerns, which limit potential use cases. To tackle these challenges, we explore the use of generative models to synthesize artificial, yet realistic EHRs. While diffusion-based methods have recently demonstrated state-of-the-art performance in generating other data modalities and overcome the training instability and mode collapse issues that plague previous GAN-based approaches, their applications in EHR generation remain underexplored. The discrete nature of tabular medical code data in EHRs poses challenges for high-quality data generation, especially for continuous diffusion models. To this end, we introduce a novel tabular EHR generation method, EHR-D3PM, which enables both unconditional and conditional generation using the discrete diffusion model. Our experiments demonstrate that EHR-D3PM significantly outperforms existing generative baselines on comprehensive fidelity and utility metrics while maintaining lower attribute and membership vulnerability risks. Furthermore, we show EHR-D3PM is effective as a data augmentation method and enhances performance on downstream tasks when combined with real data.

Submitted 14 June, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

Comments: 26 pages, 9 figures, 9 tables

arXiv:2404.10776 [pdf, other]

Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback

Authors: Qiwei Di, Jiafan He, Quanquan Gu

Abstract: Learning from human feedback plays an important role in aligning generative models, such as large language models (LLMs). However, the effectiveness of this approach can be influenced by adversaries, who may intentionally provide misleading preferences to manipulate the output in an undesirable or harmful direction. To tackle this challenge, we study a specific model within this problem domain: contextual dueling bandits with adversarial feedback, where the true preference label can be flipped by an adversary. We propose an algorithm, robust contextual dueling bandit (RCDB), which is based on uncertainty-weighted maximum likelihood estimation. Our algorithm achieves an $\tilde O(d\sqrt{T}+dC)$ regret bound, where $T$ is the number of rounds, $d$ is the dimension of the context, and $0 \le C \le T$ is the total number of adversarial feedback. We also prove a lower bound to show that our regret bound is nearly optimal, both in scenarios with and without ($C=0$) adversarial feedback. Additionally, we conduct experiments to evaluate our proposed algorithm against various types of adversarial feedback. Experimental results demonstrate its superiority over the state-of-the-art dueling bandit algorithms in the presence of adversarial feedback.

Submitted 16 April, 2024; originally announced April 2024.

Comments: 24 pages, 5 figures

arXiv:2404.10745 [pdf, other]

Settling Constant Regrets in Linear Markov Decision Processes

Authors: Weitong Zhang, Zhiyuan Fan, Jiafan He, Quanquan Gu

Abstract: We study the constant regret guarantees in reinforcement learning (RL). Our objective is to design an algorithm that incurs only finite regret over infinite episodes with high probability. We introduce an algorithm, Cert-LSVI-UCB, for misspecified linear Markov decision processes (MDPs) where both the transition kernel and the reward function can be approximated by some linear function up to misspecification level $ζ$. At the core of Cert-LSVI-UCB is an innovative certified estimator, which facilitates a fine-grained concentration analysis for multi-phase value-targeted regression, enabling us to establish an instance-dependent regret bound that is constant w.r.t. the number of episodes. Specifically, we demonstrate that for an MDP characterized by a minimal suboptimality gap $Δ$, Cert-LSVI-UCB has a cumulative regret of $\tilde{\mathcal{O}}(d^3H^5/Δ)$ with high probability, provided that the misspecification level $ζ$ is below $\tilde{\mathcal{O}}(Δ/ (\sqrt{d}H^2))$. Remarkably, this regret bound remains constant relative to the number of episodes $K$. To the best of our knowledge, Cert-LSVI-UCB is the first algorithm to achieve a constant, instance-dependent, high-probability regret bound in RL with linear function approximation for infinite runs without relying on prior distribution assumptions. This not only highlights the robustness of Cert-LSVI-UCB to model misspecification but also introduces novel algorithmic designs and analytical techniques of independent interest.

Submitted 16 April, 2024; originally announced April 2024.

Comments: 46 pages, 2 tables

arXiv:2404.10245 [pdf, other]

Quasars with Flare/Eclipse-like Variability Identified in ZTF

Authors: Zhiyuan Zheng, Yong Shi, Shuowen Jin, H. Dannerbauer, Qiusheng Gu, Xin Li, Xiaoling Yu

Abstract: Active galactic nuclei (AGNs) are known to exhibit optical/UV variability and most of them can be well modeled by damped random walks. Physical processes that are not related to the accretion disk, such as tidal disruption events (TDE) or moving foreground dusty clouds, can cause flare-like and eclipse-like features in the optical light curve. Both long-term and high-cadence monitoring are needed to identify such features. By combining the Sloan Digital Sky Survey (SDSS) and the Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) with the Zwicky Transient Facility (ZTF) survey, we are able to identify a rare sample (11) out of the SDSS quasar catalog (around 83,000). These quasars exhibit more or less constant brightness but show rapid optical variation in the ZTF DR2 epochs. To investigate the possible origins of these flare/eclipse-like variabilities, we propose the second epoch spectroscopic observations with the Gran Telescopio CANARIAS (GTC). We find that the change in accretion rate plays a significant role in these quasar variabilities. Among them, we identify two Changing-Look Active Galactic Nuclei (CL-AGN) candidates: SDSS J1427+2930 and SDSS J1420+3757. The luminosity change of the former may be caused by the enhanced SMBH accretion or the tidal disruption event, while the latter is more related to the change in the accretion rate.

Submitted 15 April, 2024; originally announced April 2024.

Comments: 11 pages, 13 figures, 2 tables, accepted for publication in MNRAS

arXiv:2404.06013 [pdf, other]

Feel-Good Thompson Sampling for Contextual Dueling Bandits

Authors: Xuheng Li, Heyang Zhao, Quanquan Gu

Abstract: Contextual dueling bandits, where a learner compares two options based on context and receives feedback indicating which was preferred, extend classic dueling bandits by incorporating contextual information for decision-making and preference learning. Several algorithms based on the upper confidence bound (UCB) have been proposed for linear contextual dueling bandits. However, no algorithm based on posterior sampling has been developed in this setting, despite the empirical success observed in traditional contextual bandits. In this paper, we propose a Thompson sampling algorithm, named FGTS.CDB, for linear contextual dueling bandits. At the core of our algorithm is a new Feel-Good exploration term specifically tailored for dueling bandits. This term leverages the independence of the two selected arms, thereby avoiding a cross term in the analysis. We show that our algorithm achieves nearly minimax-optimal regret, i.e., $\tilde{\mathcal{O}}(d\sqrt T)$, where $d$ is the model dimension and $T$ is the time horizon. Finally, we evaluate our algorithm on synthetic data and observe that FGTS.CDB outperforms existing algorithms by a large margin.

Submitted 9 April, 2024; originally announced April 2024.

Comments: 30 pages, 6 figures

arXiv:2404.03858 [pdf, other]

INvestigations of massive Filaments ANd sTar formation (INFANT). I. Core Identification and Core Mass Function

Authors: Yu Cheng, Xing Lu, Patricio Sanhueza, Hauyu Baobab Liu, Qizhou Zhang, Roberto Galván-Madrid, Ke Wang, Fumitaka Nakamura, Tie Liu, Siyi Feng, Shanghuo Li, Sihan Jiao, Kei E. I. Tanaka, Xunchuan Liu, Pak Shing Li, Qiuyi Luo, Qilao Gu, Yuxin Lin, András E. Guzmán

Abstract: Filamentary structures are ubiquitously found in high-mass star-forming clouds. To investigate the relationship between filaments and star formation, we carry out the INFANT (INvestigations of massive Filaments ANd sTar formation) survey, a multi-scale, multi-wavelength survey of massive filamentary clouds with ALMA band 3/band 6 and VLA K band. In this first paper, we present the ALMA band 6 continuum observations toward a sample of 8 high-mass star forming filaments. We covered each target with an approximately rectangular mosaic field of view with two 12-m array configurations, achieving an angular resolution of $\sim$0.6" (2700 AU at 4.5 kpc) and a continuum rms of $\sim$0.1 mJy/beam ($\sim$0.06 Msun in gas mass assuming 15 K). We identify cores using getsf and astrodendro and find the former is more robust in terms of both identification and measuring flux densities. We identify in total 183 dense cores (15--36 cores in each cloud) and classify their star formation states via outflow and warm gas tracers. The protostellar cores are statistically more massive than the prestellar cores, possibly indicating further accretion onto cores after formation of protostars. For the high-mass end ($M_\text{core}$ $>$ 1.5 Msun) of the core mass function (CMF) we derive a power-law index of $-$1.15 $\pm$ 0.12 for the whole sample, and $-$1.70 $\pm$ 0.25 for the prestellar population. We also find a steepening trend in CMF with cloud evolution ($-$0.89 $\pm$ 0.15 for the young group vs. $-$1.44 $\pm$ 0.25 for the evolved group) and discuss its implication for cluster formation.

Submitted 4 April, 2024; originally announced April 2024.

Comments: 25 pages, 8 figures, accepted for ApJ

arXiv:2404.03850 [pdf, other]

Are High $Σ_1$ Massive Blue Spiral Galaxies Rejuvenated Systems?

Authors: Cai-Na Hao, Xiaoyang Xia, Yong Shi, Rui Guo, Yanmei Chen, Shuai Feng, Junqiang Ge, Qiusheng Gu

Abstract: Quiescent galaxies generally possess denser cores than star-forming galaxies with similar mass. As a measurement of the core density, the central stellar mass surface density within a radius of 1 kpc ($Σ_1$) was thus suggested to be closely related to galaxy quenching. Massive star-forming galaxies with high $Σ_1$ do not fit into this picture. To understand the origin of such galaxies, we compare the spatially-resolved stellar population and star formation properties of massive ($ > 10^{10.5}{\rm M}_{\odot}$) blue spiral galaxies with high and low $Σ_1$, divided by $Σ_1 = 10^{9.4} M_\odot \, {\rm kpc}^{-2}$, based on the final release of MaNGA IFU data. We find that both high $Σ_1$ and low $Σ_1$ blue spirals show large diversities in stellar population and star formation properties. Despite the diversities, high $Σ_1$ blue spirals are statistically different from the low $Σ_1$ ones. Specifically, the radial profiles of the luminosity-weighted age and Mgb/${\rm \langle Fe \rangle}$ show that high $Σ_1$ blue spirals consist of a larger fraction of galaxies with younger and less $α$-element enhanced centers than their low $Σ_1$ counterparts, $\sim 55\%$ versus $\sim 30\%$. The galaxies with younger centers mostly have higher central specific star formation rates, which still follow the spaxel-based star formation main sequence relation though. Examinations of the H$α$ velocity field and the optical structures suggest that galactic bars or galaxy interactions should be responsible for the rejuvenation of these galaxies. The remaining $\sim 45\% $ of high $Σ_1$ blue spirals are consistent with the inside-out growth scenario.

Submitted 4 April, 2024; originally announced April 2024.

Comments: 22 pages, 14 figures, accepted for publication in ApJ

arXiv:2404.01687 [pdf, other]

Search for a sub-eV sterile neutrino using Daya Bay's full dataset

Authors: F. P. An, W. D. Bai, A. B. Balantekin, M. Bishai, S. Blyth, G. F. Cao, J. Cao, J. F. Chang, Y. Chang, H. S. Chen, H. Y. Chen, S. M. Chen, Y. Chen, Y. X. Chen, Z. Y. Chen, J. Cheng, Y. C. Cheng, Z. K. Cheng, J. J. Cherwinka, M. C. Chu, J. P. Cummings, O. Dalager, F. S. Deng, X. Y. Ding, Y. Y. Ding , et al. (176 additional authors not shown)

Abstract: This Letter presents results of a search for the mixing of a sub-eV sterile neutrino with three active neutrinos based on the full data sample of the Daya Bay Reactor Neutrino Experiment, collected during 3158 days of detector operation, which contains $5.55 \times 10^{6}$ reactor $\bar{ν}_e$ candidates identified as inverse beta-decay interactions followed by neutron-capture on gadolinium. The analysis benefits from a doubling of the statistics of our previous result and from improvements of several important systematic uncertainties. No significant oscillation due to mixing of a sub-eV sterile neutrino with active neutrinos was found. Exclusion limits are set by both Feldman-Cousins and CLs methods. Light sterile neutrino mixing with $\sin^2 2θ_{14} \gtrsim 0.01$ can be excluded at 95% confidence level in the region of $0.01$ eV$^2 \lesssim |Δm^{2}_{41}| \lesssim 0.1$ eV$^2$. This result represents the world-leading constraints in the region of $2 \times 10^{-4}$ eV$^2 \lesssim |Δm^{2}_{41}| \lesssim 0.2$ eV$^2$.

Submitted 15 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 7 pages, 4 figures, 1 table

arXiv:2403.20083 [pdf, ps, other]

Phase structure of the de Sitter Spacetime with KR field based on the Lyapunov exponent

Authors: Yun-Zhi Du, Huai-Fan Li, Yu-Bo Ma, Qiang Gu

Abstract: Owing to the spontaneous breaking of Lorentz symmetry in a gravity theory with non-minimal coupling between the Kalb-Ramond (KR) field (which acquires a nonzero vacuum expectation value) and Einstein gravity, there exist exact static and spherically symmetric black hole solutions that depend on the Lorentz-violating parameter. Based on this, we consider the corresponding black hole solution in de Sitter (dS) spacetime with the KR field and investigate its thermodynamic properties in the expanded phase space by introducing the interplay entropy between the black hole and cosmological horizons. In particular, we analyze the effect of the Lorentz-violating parameter on the thermodynamic properties. Furthermore, the Lyapunov exponent and the shadow of these static and spherically symmetric black holes in this Lorentz-violating gravity theory are also investigated. This study will open a new perspective for probing the thermodynamics of black holes.

Submitted 29 March, 2024; originally announced March 2024.

arXiv:2403.18118 [pdf, other]

EgoLifter: Open-world 3D Segmentation for Egocentric Perception

Authors: Qiao Gu, Zhaoyang Lv, Duncan Frost, Simon Green, Julian Straub, Chris Sweeney

Abstract: In this paper we present EgoLifter, a novel system that can automatically segment scenes captured from egocentric sensors into a complete decomposition of individual 3D objects. The system is specifically designed for egocentric data where scenes contain hundreds of objects captured from natural (non-scanning) motion. EgoLifter adopts 3D Gaussians as the underlying representation of 3D scenes and objects and uses segmentation masks from the Segment Anything Model (SAM) as weak supervision to learn flexible and promptable definitions of object instances free of any specific object taxonomy. To handle the challenge of dynamic objects in egocentric videos, we design a transient prediction module that learns to filter out dynamic objects in the 3D reconstruction. The result is a fully automatic pipeline that is able to reconstruct 3D object instances as collections of 3D Gaussians that collectively compose the entire scene. We created a new benchmark on the Aria Digital Twin dataset that quantitatively demonstrates its state-of-the-art performance in open-world 3D segmentation from natural egocentric input. We run EgoLifter on various egocentric activity datasets, and the results show the promise of the method for 3D egocentric perception at scale.

Submitted 26 March, 2024; originally announced March 2024.

Comments: Preprint. Project page: https://egolifter.github.io/

arXiv:2403.16576 [pdf, other]

Antigen-Specific Antibody Design via Direct Energy-based Preference Optimization

Authors: Xiangxin Zhou, Dongyu Xue, Ruizhe Chen, Zaixiang Zheng, Liang Wang, Quanquan Gu

Abstract: Antibody design, a crucial task with significant implications across various disciplines such as therapeutics and biology, presents considerable challenges due to its intricate nature. In this paper, we tackle antigen-specific antibody sequence-structure co-design as an optimization problem towards specific preferences, considering both rationality and functionality. Leveraging a pre-trained conditional diffusion model that jointly models sequences and structures of antibodies with equivariant neural networks, we propose direct energy-based preference optimization to guide the generation of antibodies with both rational structures and considerable binding affinities to given antigens. Our method involves fine-tuning the pre-trained diffusion model using a residue-level decomposed energy preference. Additionally, we employ gradient surgery to address conflicts between various types of energy, such as attraction and repulsion. Experiments on the RAbD benchmark show that our approach effectively optimizes the energy of generated antibodies and achieves state-of-the-art performance in designing high-quality antibodies with low total energy and high binding affinity simultaneously, demonstrating the superiority of our approach.

Submitted 25 June, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

arXiv:2403.14088 [pdf, other]

Protein Conformation Generation via Force-Guided SE(3) Diffusion Models

Authors: Yan Wang, Lihao Wang, Yuning Shen, Yiqun Wang, Huizhuo Yuan, Yue Wu, Quanquan Gu

Abstract: The conformational landscape of proteins is crucial to understanding their functionality in complex biological processes. Traditional physics-based computational methods, such as molecular dynamics (MD) simulations, suffer from rare event sampling and long equilibration time problems, hindering their applications in general protein systems. Recently, deep generative modeling techniques, especially diffusion models, have been employed to generate novel protein conformations. However, existing score-based diffusion methods cannot properly incorporate important physical prior knowledge to guide the generation process, causing large deviations in the sampled protein conformations from the equilibrium distribution. In this paper, to overcome these limitations, we propose a force-guided SE(3) diffusion model, ConfDiff, for protein conformation generation. By incorporating a force-guided network with a mixture of data-based score models, ConfDiff can generate protein conformations with rich diversity while preserving high fidelity. Experiments on a variety of protein conformation prediction tasks, including 12 fast-folding proteins and the Bovine Pancreatic Trypsin Inhibitor (BPTI), demonstrate that our method surpasses the state-of-the-art method.

Submitted 20 March, 2024; originally announced March 2024.

arXiv:2403.13829 [pdf, other]

DecompOpt: Controllable and Decomposed Diffusion Models for Structure-based Molecular Optimization

Authors: Xiangxin Zhou, Xiwei Cheng, Yuwei Yang, Yu Bao, Liang Wang, Quanquan Gu

Abstract: Recently, 3D generative models have shown promising performance in structure-based drug design by learning to generate ligands given target binding sites. However, only modeling the target-ligand distribution can hardly fulfill one of the main goals in drug discovery -- designing novel ligands with desired properties, e.g., high binding affinity and easy synthesizability. This challenge becomes particularly pronounced when the target-ligand pairs used for training do not align with these desired properties. Moreover, most existing methods aim at solving the de novo design task, while many generative scenarios requiring flexible controllability, such as R-group optimization and scaffold hopping, have received little attention. In this work, we propose DecompOpt, a structure-based molecular optimization method based on a controllable and decomposed diffusion model. DecompOpt presents a new generation paradigm which combines optimization with conditional diffusion models to achieve desired properties while adhering to the molecular grammar. Additionally, DecompOpt offers a unified framework covering both de novo design and controllable generation. To achieve this, ligands are decomposed into substructures, which allows fine-grained control and local optimization. Experiments show that DecompOpt can efficiently generate molecules with better properties than strong de novo baselines, and demonstrates great potential in controllable generation tasks.

Submitted 6 March, 2024; originally announced March 2024.

Comments: Accepted to ICLR 2024

arXiv:2403.09174 [pdf, other]

Properties of a Fading AGN from SDSS-IV MaNGA

Authors: Hao Mo, Yan-Mei Chen, Zhi-Yun Zhang, Alexei Moiseev, Dmitry Bizyaev, Yong Shi, Qiu-Sheng Gu, Min Bao, Xiao Cao, Song-Lin Li

Abstract: We identify a fading AGN SDSS J220141.64+115124.3 from the internal Product Launch-11 (MPL-11) in Mapping Nearby Galaxies at Apache Point Observatory (MaNGA) survey. The central region with a projected radius of $\sim$2.4 kpc is characterized by LINER-like line ratios while the outskirts extending to $\sim$15 kpc show Seyfert-like line ratios. The [OIII]$λ$5007 luminosity of the Seyfert regions is a factor of 37 (2) higher than the LINER regions without (with) dust attenuation correction, suggesting that the AGN activity decreased at least $\sim$8 $\times$ 10$^3$ yrs ($\sim$2.4 kpc/light-speed) ago. We model the emission line spectra in the central region with double Gaussian components (a narrow core and a broad wing) and analyze the properties of each component. The narrow core component mostly co-rotates with the stellar disc, whereas the broad wing component, with a median velocity dispersion of $\sim$300 km s$^{-1}$, is related to a wind outflow. The kinematic position angle (PA) of the ionized gas shows a $\sim$20° twist from the galaxy center to 1.5 effective radii. The median of the PA difference between the gas and stellar components is as large as $\sim$50° within 0.4 effective radii. The tidal feature in the DESI image and the star-gas misalignment suggest this galaxy is a merger remnant. Combining all these observational results with publicly available X-ray and MIR luminosities, we confirm this is a fading AGN: the merger process kick-started the central engine into a quasar phase, which ionized gas composed of tidal debris, and the activity of the central black hole has now decreased. The discontinuity in [OIII]$λ$5007 flux and EQW maps is due to multiple AGN outbursts triggered by merger remnant gas inflows.

Submitted 14 March, 2024; originally announced March 2024.

Comments: Accepted for publication in MNRAS. 12 pages, 10 figures, 1 table

arXiv:2403.07902 [pdf, other]

DecompDiff: Diffusion Models with Decomposed Priors for Structure-Based Drug Design

Authors: Jiaqi Guan, Xiangxin Zhou, Yuwei Yang, Yu Bao, Jian Peng, Jianzhu Ma, Qiang Liu, Liang Wang, Quanquan Gu

Abstract: Designing 3D ligands within a target binding site is a fundamental task in drug discovery. Existing structure-based drug design methods treat all ligand atoms equally, which ignores different roles of atoms in the ligand for drug design and can be less efficient for exploring the large drug-like molecule space. In this paper, inspired by the convention in pharmaceutical practice, we decompose the ligand molecule into two parts, namely arms and scaffold, and propose a new diffusion model, DecompDiff, with decomposed priors over arms and scaffold. In order to facilitate the decomposed generation and improve the properties of the generated molecules, we incorporate both bond diffusion in the model and additional validity guidance in the sampling phase. Extensive experiments on CrossDocked2020 show that our approach achieves state-of-the-art performance in generating high-affinity molecules while maintaining proper molecular properties and conformational stability, with up to -8.39 Avg. Vina Dock score and 24.5 Success Rate. The code is provided at https://github.com/bytedance/DecompDiff

Submitted 26 February, 2024; originally announced March 2024.

Comments: Accepted to ICML 2023

arXiv:2403.07414 [pdf, other]

Strong asymptotic giant branch stars' spectral features in distant quiescent galaxies: Impact on galaxy evolution

Authors: Shiying Lu, Emanuele Daddi, Claudia Maraston, Mark Dickinson, Pablo Arrabal Haro, Raphael Gobat, Alvio Renzini, Mauro Giavalisco, Micaela B. Bagley, Antonello Calabrò, Yingjie Cheng, Alexander de la Vega, Chiara D'Eugenio, David Elbaz, Steven L. Finkelstein, Carlos Gómez-Guijarro, Qiusheng Gu, Nimish P. Hathi, Marc Huertas-Company, Jeyhan S. Kartaltepe, Anton M. Koekemoer, Aurélien Le Bail, Yipeng Lyu, Benjamin Magnelli, Bahram Mobasher , et al. (5 additional authors not shown)

Abstract: Age-dating and weighting stellar populations in galaxies at various cosmic epochs are essential steps to study galaxy formation through cosmic times. Evolutionary population synthesis models with different input physics are used towards this aim. In particular, the contribution from the thermally pulsing asymptotic-giant-branch (TP-AGB) stellar phase, which peaks for intermediate-age 0.6-2 Gyr systems, has been debated for decades. Here we report the detection of strong cool star signatures in the rest-frame near-infrared spectra of three young (~1 Gyr), massive (~10^10 Msun) quiescent galaxies at large look-back time, z=1-2, using JWST/NIRSpec. The co-existence of oxygen- and carbon-type absorption features, spectral edges and features from rare species such as Vanadium, and possibly Zirconium, reveals a strong contribution from TP-AGB stars. Population synthesis models with significant TP-AGB contribution reproduce the observations considerably better than those with weak TP-AGB, which are those commonly used. These findings call for revisions of published stellar population fitting results, pointing to lower masses and younger ages, with additional implications on cosmic dust production and chemical enrichment. These results will stimulate new generations of improved models informed by these and future observations.

Submitted 12 March, 2024; originally announced March 2024.

Comments: submitted

arXiv:2403.07342 [pdf, other]

Rethinking ASTE: A Minimalist Tagging Scheme Alongside Contrastive Learning

Authors: Qiao Sun, Liujia Yang, Minghao Ma, Nanyang Ye, Qinying Gu

Abstract: Aspect Sentiment Triplet Extraction (ASTE) is a burgeoning subtask of fine-grained sentiment analysis, aiming to extract structured sentiment triplets from unstructured textual data. Existing approaches to ASTE often complicate the task with additional structures or external data. In this research, we propose a novel tagging scheme and employ a contrastive learning approach to mitigate these challenges. The proposed approach demonstrates comparable or superior performance in comparison to state-of-the-art techniques, while featuring a more compact design and reduced computational overhead. Notably, even in the era of Large Language Models (LLMs), our method exhibits superior efficacy compared to GPT 3.5 and GPT 4 in few-shot learning scenarios. This study also provides valuable insights for the advancement of ASTE techniques within the paradigm of large language models.

Submitted 14 April, 2024; v1 submitted 12 March, 2024; originally announced March 2024.

arXiv:2403.05248 [pdf, other]

doi 10.3847/2041-8213/ad4986

JWST's first glimpse of a z > 2 forming cluster reveals a top-heavy stellar mass function

Authors: Hanwen Sun, Tao Wang, Ke Xu, Emanuele Daddi, Qing Gu, Tadayuki Kodama, Anita Zanella, David Elbaz, Ichi Tanaka, Raphael Gobat, Qi Guo, Jiaxin Han, Shiying Lu, Luwenjia Zhou

Abstract: Clusters and their progenitors (protoclusters) at z = 2-4, the peak epoch of star formation, are ideal laboratories to study the formation process of both the clusters themselves and their member galaxies. However, a complete census of their member galaxies has been challenging due to observational difficulties. Here we present new JWST/NIRCam observations targeting the distant cluster CLJ1001 at z = 2.51 from the COSMOS-Web program, which, in combination with previous narrowband imaging targeting H-alpha emitters and deep millimeter surveys of CO emitters, provide a complete view of massive galaxy assembly in CLJ1001. In particular, JWST reveals a population of massive, extremely red cluster members in the long-wavelength bands that were invisible in previous Hubble Space Telescope (HST)/F160W imaging (HST-dark members). Based on this highly complete spectroscopic sample of member galaxies, we show that the spatial distribution of galaxies in CLJ1001 exhibits a strong central concentration, with the central galaxy density already resembling that of low-z clusters. Moreover, we reveal a "top-heavy" stellar mass function for the star-forming galaxies (SFGs), with an overabundance of massive SFGs piled up in the cluster core. These features strongly suggest that CLJ1001 is caught in a rapid transition, with many of its massive SFGs likely soon becoming quiescent. In the context of cluster formation, these findings suggest that the earliest clusters form from the inside out and top to bottom, with the massive galaxies in the core assembling first, followed by the less massive ones in the outskirts.

Submitted 29 May, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

Comments: 14 pages, 9 figures, 1 table, published by ApJL

Journal ref: ApJL, 967, L34 (2024)

arXiv:2403.01374 [pdf, other]

A Novel Dynamic Light-Section 3D Reconstruction Method for Wide-Range Sensing

Authors: Mengjuan Chen, Qing Li, Kohei Shimasaki, Shaopeng Hu, Qingyi Gu, Idaku Ishii

Abstract: Existing galvanometer-based laser scanning systems are challenging to apply in multi-scale 3D reconstruction because of the difficulty in achieving a balance between high reconstruction accuracy and a wide reconstruction range. This paper presents a novel method that synchronizes laser scanning by switching the field-of-view (FOV) of a camera using multi-galvanometers. In addition to the advanced hardware setup, we establish a comprehensive mathematical model of the system by modeling the dynamic camera, the dynamic laser, and their combined interaction. We then propose a high-precision and flexible calibration method by constructing an error model and minimizing the objective function. Finally, we evaluate the performance of the proposed system by scanning standard components. The evaluation results demonstrate that the accuracy of the proposed 3D reconstruction system reaches 0.3 mm when the measurement range is extended to 1100 mm $\times$ 1300 mm $\times$ 650 mm. With the same reconstruction accuracy, the reconstruction range is expanded by a factor of 25, indicating that the proposed method simultaneously allows for high-precision and wide-range 3D reconstruction in industrial applications.

Submitted 2 March, 2024; originally announced March 2024.

Comments: 9 pages, 6 figures, Journal

MSC Class: First-level 68; ACM Class: I.4.9

arXiv:2403.00178 [pdf, other]

Causal Graph ODE: Continuous Treatment Effect Modeling in Multi-agent Dynamical Systems

Authors: Zijie Huang, Jeehyun Hwang, Junkai Zhang, Jinwoo Baik, Weitong Zhang, Dominik Wodarz, Yizhou Sun, Quanquan Gu, Wei Wang

Abstract: Real-world multi-agent systems are often dynamic and continuous, where the agents co-evolve and undergo changes in their trajectories and interactions over time. For example, the COVID-19 transmission in the U.S. can be viewed as a multi-agent system, where states act as agents and daily population movements between them are interactions. Estimating the counterfactual outcomes in such systems enables accurate future predictions and effective decision-making, such as formulating COVID-19 policies. However, existing methods fail to model the continuous dynamic effects of treatments on the outcome, especially when multiple treatments (e.g., "stay-at-home" and "get-vaccine" policies) are applied simultaneously. To tackle this challenge, we propose Causal Graph Ordinary Differential Equations (CAG-ODE), a novel model that captures the continuous interaction among agents using a Graph Neural Network (GNN) as the ODE function. The key innovation of our model is to learn time-dependent representations of treatments and incorporate them into the ODE function, enabling precise predictions of potential outcomes. To mitigate confounding bias, we further propose two domain adversarial learning-based objectives, which enable our model to learn balanced continuous representations that are not affected by treatments or interference. Experiments on two datasets (i.e., COVID-19 and tumor growth) demonstrate the superior performance of our proposed model.

Submitted 29 February, 2024; originally announced March 2024.

arXiv:2402.18567 [pdf, other]

Diffusion Language Models Are Versatile Protein Learners

Authors: Xinyou Wang, Zaixiang Zheng, Fei Ye, Dongyu Xue, Shujian Huang, Quanquan Gu

Abstract: This paper introduces the diffusion protein language model (DPLM), a versatile protein language model that demonstrates strong generative and predictive capabilities for protein sequences. We first pre-train scalable DPLMs from evolutionary-scale protein sequences within a generative self-supervised discrete diffusion probabilistic framework, which generalizes language modeling for proteins in a principled way. After pre-training, DPLM exhibits the ability to generate structurally plausible, novel, and diverse protein sequences for unconditional generation. We further demonstrate that the proposed diffusion generative pre-training gives DPLM a better understanding of proteins, making it a superior representation learner, which can be fine-tuned for various predictive tasks, comparing favorably to ESM2 (Lin et al., 2022). Moreover, DPLM can be tailored for various needs, which showcases its prowess in conditional generation in several ways: (1) conditioning on partial peptide sequences, e.g., generating scaffolds for functional motifs with high success rate; (2) incorporating other modalities as conditioner, e.g., structure-conditioned generation for inverse folding; and (3) steering sequence generation towards desired properties, e.g., satisfying specified secondary structures, through a plug-and-play classifier guidance.

Submitted 28 February, 2024; originally announced February 2024.

arXiv:2402.16363 [pdf, other]

LLM Inference Unveiled: Survey and Roofline Model Insights

Authors: Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, Chenhao Xue, Bingzhe Wu, Zhikai Li, Qingyi Gu, Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer

Abstract: The field of efficient Large Language Model (LLM) inference is rapidly evolving, presenting a unique blend of opportunities and challenges. Although the field has expanded and is vibrant, there hasn't been a concise framework that analyzes the various methods of LLM inference to provide a clear understanding of this domain. Our survey stands out from traditional literature reviews by not only summarizing the current state of research but also by introducing a framework based on the roofline model for systematic analysis of LLM inference techniques. This framework identifies the bottlenecks when deploying LLMs on hardware devices and provides a clear understanding of practical problems, such as why LLMs are memory-bound, how much memory and computation they need, and how to choose the right hardware. We systematically collate the latest advancements in efficient LLM inference, covering crucial areas such as model compression (e.g., Knowledge Distillation and Quantization), algorithm improvements (e.g., Early Exit and Mixture-of-Experts), and both hardware and system-level enhancements. Our survey stands out by analyzing these methods with the roofline model, helping us understand their impact on memory access and computation. This distinctive approach not only showcases the current research landscape but also delivers valuable insights for practical implementation, positioning our work as an indispensable resource for researchers new to the field as well as for those seeking to deepen their understanding of efficient LLM deployment. The analysis tool, LLM-Viewer, is open-sourced.

Submitted 1 May, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.15792 [pdf, other]

doi 10.1016/j.rinp.2024.107736

Adjusting exceptional points using saturable nonlinearities

Authors: Qingxin Gu, Chunlei Qu, Yongping Zhang

Abstract: We study the impact of saturable nonlinearity on the presence and location of exceptional points in a non-Hermitian dimer system. The inclusion of the saturable nonlinearity leads to the emergence of multiple eigenvalues, exceeding the typical two found in the linear counterpart. To identify the exceptional points, we calculate the nonlinear eigenvalues both from a polynomial equation for the defined population imbalance and through a fully numerical method. Our results reveal that exceptional points can be precisely located by adjusting the non-equal saturable nonlinearities in the detuning space.

Submitted 8 June, 2024; v1 submitted 24 February, 2024; originally announced February 2024.

Comments: 7 pages and 3 figures

Journal ref: Results in Physics 61, 107736 (2024)

arXiv:2402.13349 [pdf, other]

Aria Everyday Activities Dataset

Authors: Zhaoyang Lv, Nicholas Charron, Pierre Moulon, Alexander Gamino, Cheng Peng, Chris Sweeney, Edward Miller, Huixuan Tang, Jeff Meissner, Jing Dong, Kiran Somasundaram, Luis Pesqueira, Mark Schwesinger, Omkar Parkhi, Qiao Gu, Renzo De Nardi, Shangyi Cheng, Steve Saarinen, Vijay Baiyya, Yuyang Zou, Richard Newcombe, Jakob Julian Engel, Xiaqing Pan, Carl Ren

Abstract: We present the Aria Everyday Activities (AEA) Dataset, an egocentric multimodal open dataset recorded using Project Aria glasses. AEA contains 143 daily activity sequences recorded by multiple wearers in five geographically diverse indoor locations. Each of the recordings contains multimodal sensor data recorded through the Project Aria glasses. In addition, AEA provides machine perception data including high-frequency globally aligned 3D trajectories, scene point cloud, per-frame 3D eye gaze vector and time-aligned speech transcription. In this paper, we demonstrate a few exemplar research applications enabled by this dataset, including neural scene reconstruction and prompted segmentation. AEA is an open source dataset that can be downloaded from https://www.projectaria.com/datasets/aea/. We are also providing open-source implementations and examples of how to use the dataset in Project Aria Tools https://github.com/facebookresearch/projectaria_tools.

Submitted 21 February, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

Comments: Dataset website: https://www.projectaria.com/datasets/aea/

arXiv:2402.10210 [pdf, other]

Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation

Authors: Huizhuo Yuan, Zixiang Chen, Kaixuan Ji, Quanquan Gu

Abstract: Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI), especially when compared with the remarkable progress made in fine-tuning Large Language Models (LLMs). While cutting-edge diffusion models such as Stable Diffusion (SD) and SDXL rely on supervised fine-tuning, their performance inevitably plateaus after seeing a certain volume of data. Recently, reinforcement learning (RL) has been employed to fine-tune diffusion models with human preference data, but it requires at least two images ("winner" and "loser" images) for each text prompt. In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion), where the diffusion model engages in competition with its earlier versions, facilitating an iterative self-improvement process. Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment. Our experiments on the Pick-a-Pic dataset reveal that SPIN-Diffusion outperforms the existing supervised fine-tuning method in aspects of human preference alignment and visual appeal right from its first iteration. By the second iteration, it exceeds the performance of RLHF-based methods across all metrics, achieving these results with less data.

Submitted 15 February, 2024; originally announced February 2024.

Comments: 28 pages, 8 figures, 10 tables

arXiv:2402.09401 [pdf, other]

Reinforcement Learning from Human Feedback with Active Queries

Authors: Kaixuan Ji, Jiafan He, Quanquan Gu

Abstract: Aligning large language models (LLMs) with human preference plays a key role in building modern generative models and can be achieved by reinforcement learning from human feedback (RLHF). Despite their superior performance, current RLHF approaches often require a large amount of human-labelled preference data, which is expensive to collect. In this paper, inspired by the success of active learning, we address this problem by proposing query-efficient RLHF methods. We first formalize the alignment problem as a contextual dueling bandit problem and design an active-query-based proximal policy optimization (APPO) algorithm with an $\tilde{O}(d^2/\Delta)$ regret bound and an $\tilde{O}(d^2/\Delta^2)$ query complexity, where $d$ is the dimension of feature space and $\Delta$ is the sub-optimality gap over all the contexts. We then propose ADPO, a practical version of our algorithm based on direct preference optimization (DPO) and apply it to fine-tuning LLMs. Our experiments show that ADPO, while only making about half of the queries for human preference, matches the performance of the state-of-the-art DPO method.

Submitted 14 February, 2024; originally announced February 2024.

Comments: 28 pages, 1 figure, 4 tables

arXiv:2402.08998 [pdf, other]

Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic Shortest Path

Authors: Qiwei Di, Jiafan He, Dongruo Zhou, Quanquan Gu

Abstract: We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel, where an agent repeatedly interacts with a stochastic environment and seeks to reach a certain goal state while minimizing the cumulative cost. Existing works often assume a strictly positive lower bound of the cost function or an upper bound of the expected length for the optimal policy. In this paper, we propose a new algorithm to eliminate these restrictive assumptions. Our algorithm is based on extended value iteration with a fine-grained variance-aware confidence set, where the variance is estimated recursively from high-order moments. Our algorithm achieves an $\tilde{\mathcal O}(dB_*\sqrt{K})$ regret bound, where $d$ is the dimension of the feature mapping in the linear transition kernel, $B_*$ is the upper bound of the total cumulative cost for the optimal policy, and $K$ is the number of episodes. Our regret upper bound matches the $\Omega(dB_*\sqrt{K})$ lower bound of linear mixture SSPs in Min et al. (2022), which suggests that our algorithm is nearly minimax optimal.

Submitted 14 February, 2024; originally announced February 2024.

Comments: 28 pages, 1 figure, In ICML 2023

Showing 1–50 of 645 results for author: Gu, Q