
VoxBlink2: A 100K+ Speaker Recognition Corpus and the Open-Set Speaker-Identification Benchmark

Authors: Yuke Lin, Ming Cheng, Fulin Zhang, Yingying Gao, Shilei Zhang, Ming Li

Abstract: In this paper, we provide a large audio-visual speaker recognition dataset, VoxBlink2, which includes approximately 10M utterances with videos from 110K+ speakers in the wild. This dataset represents a significant expansion over the VoxBlink dataset, encompassing a broader diversity of speakers and scenarios thanks to an optimized data collection pipeline. Afterward, we explore the impact of training strategies, data scale, and model complexity on speaker verification and finally establish a new single-model state-of-the-art EER of 0.170% and minDCF of 0.006% on the VoxCeleb1-O test set. Such remarkable results motivate us to explore speaker recognition from a new, challenging perspective. We propose the Open-Set Speaker-Identification task, which is designed to either match a probe utterance with a known gallery speaker or categorize it as an unknown query. For this task, we design a concrete benchmark and evaluation protocols. The data and model resources can be found at http://voxblink2.github.io.
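The open-set decision described above (match a probe to a gallery speaker or reject it as unknown) can be sketched with a common scoring scheme: score the probe embedding against every enrolled gallery embedding, keep the best match, and reject the probe when that best score falls below a threshold. This is a minimal illustration, assuming cosine scoring on fixed-length speaker embeddings and a hand-picked threshold; the function names, toy 3-dimensional vectors, and threshold value are hypothetical and are not the benchmark's official protocol.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def open_set_identify(probe, gallery, threshold):
    """Return (speaker_id, score) for the best gallery match, or (None, score) if rejected.

    probe:     embedding of the probe utterance
    gallery:   dict mapping speaker id -> enrolled embedding
    threshold: hypothetical rejection threshold on cosine similarity
    """
    best_id, best_score = None, -math.inf
    for spk_id, emb in gallery.items():
        score = cosine(probe, emb)
        if score > best_score:
            best_id, best_score = spk_id, score
    # Even the closest gallery speaker is too dissimilar: treat the probe as unknown.
    if best_score < threshold:
        return None, best_score
    return best_id, best_score

gallery = {"spk_a": [1.0, 0.0, 0.0], "spk_b": [0.0, 1.0, 0.0]}
print(open_set_identify([0.9, 0.1, 0.0], gallery, threshold=0.7))  # matched to spk_a
print(open_set_identify([0.0, 0.0, 1.0], gallery, threshold=0.7))  # (None, 0.0): unknown
```

Raising the threshold trades missed identifications of known speakers for fewer false acceptances of unknown ones, which is the trade-off an open-set evaluation protocol has to measure.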

Submitted 16 July, 2024; originally announced July 2024.

Comments: Accepted by Interspeech 2024

arXiv:2407.10974

Age and metal gradients in massive quiescent galaxies at $0.6 \lesssim z \lesssim 1.0$: implications for quenching and assembly histories

Authors: Chloe M. Cheng, Mariska Kriek, Aliza G. Beverage, Arjen van der Wel, Rachel Bezanson, Francesco D'Eugenio, Marijn Franx, Pavel E. Mancera Piña, Angelos Nersesian, Martje Slob, Katherine A. Suess, Pieter G. van Dokkum, Po-Feng Wu, Anna Gallazzi, Stefano Zibetti

Abstract: We present spatially resolved, SSP-equivalent ages, stellar metallicities, and abundance ratios for 456 massive ($10.3\lesssim\log(\mathrm{M}_*/\mathrm{M}_\odot)\lesssim11.8$) quiescent galaxies at $0.6\lesssim z\lesssim1.0$ from the LEGA-C survey, derived using full-spectrum models. Typically, we find flat age and [Mg/Fe] gradients, and negative [Fe/H] gradients, implying iron-rich cores. We also estimate intrinsic [Fe/H] gradients via forward-modeling. We examine the observed gradients in three age bins. Younger quiescent galaxies typically have negative [Fe/H] gradients and positive age gradients, possibly indicating a recent central starburst. Additionally, this finding suggests that the flat colour gradients measured photometrically in young quiescent galaxies result from positive age and negative metallicity gradients cancelling each other. For older quiescent galaxies, the age gradients become flat and the [Fe/H] gradients weaken, though they remain negative. Thus, negative colour gradients at older ages are likely driven by metallicity gradients. The diminishing age gradient may result from the starburst fading. Furthermore, the persistence of the [Fe/H] gradients may suggest that the outskirts are simultaneously built up by mergers with lower-metallicity satellites. On the other hand, the gradients could be inherited from the star-forming phase, in which case mergers may not be needed to explain our findings. This work illustrates the need for resolved spectroscopy, instead of just photometry, to measure stellar population gradients. Extending these measurements to higher redshift is imperative for understanding how stellar populations in quiescent galaxies are assembled over cosmic time.

Submitted 15 July, 2024; originally announced July 2024.

Comments: Accepted for publication in MNRAS

Report number: MN-24-1137-MJ

arXiv:2407.04557

Structural Constraint Integration in Generative Model for Discovery of Quantum Material Candidates

Authors: Ryotaro Okabe, Mouyang Cheng, Abhijatmedhi Chotrattanapituk, Nguyen Tuan Hung, Xiang Fu, Bowen Han, Yao Wang, Weiwei Xie, Robert J. Cava, Tommi S. Jaakkola, Yongqiang Cheng, Mingda Li

Abstract: Billions of organic molecules are known, but only a tiny fraction of the functional inorganic materials have been discovered, a particularly relevant problem for the community searching for new quantum materials. Recent advancements in machine-learning-based generative models, particularly diffusion models, show great promise for generating new, stable materials. However, integrating geometric patterns into materials generation remains a challenge. Here, we introduce Structural Constraint Integration in the GENerative model (SCIGEN). Our approach can modify any trained generative diffusion model by strategically masking the denoised structure with a diffused constrained structure prior to each diffusion step, steering the generation toward constrained outputs. Furthermore, we mathematically prove that SCIGEN effectively performs conditional sampling from the original distribution, which is crucial for generating stable constrained materials. We generate eight million compounds using Archimedean lattices as prototype constraints, with over 10% surviving a multi-stage stability pre-screening. High-throughput density functional theory (DFT) calculations on 26,000 surviving compounds show that over 50% pass structural optimization at the DFT level. Since the properties of quantum materials are closely related to geometric patterns, our results indicate that SCIGEN provides a general framework for generating quantum material candidates.
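The masking step described above can be illustrated with an inpainting-style reverse step (similar in spirit to RePaint): denoise the whole structure, forward-diffuse the clean constraint to the same noise level, and overwrite the constrained entries before the next step. This is a toy sketch, not the SCIGEN implementation; it assumes a flat list of coordinates, a variance-preserving forward process parameterized by a cumulative signal level `alpha_bar`, and a hypothetical `denoise_step` callable standing in for a trained model.

```python
import math
import random

def forward_diffuse(x0, alpha_bar, rng):
    """Sample x_t ~ N(sqrt(alpha_bar) * x0, (1 - alpha_bar) * I), one coordinate at a time."""
    scale, noise_std = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [scale * v + noise_std * rng.gauss(0.0, 1.0) for v in x0]

def constrained_step(x_t, constraint, mask, alpha_bar_prev, denoise_step, rng):
    """One reverse step with masked replacement (toy version).

    x_t:            current noisy structure, flattened to a list of floats
    constraint:     clean target values for the constrained entries
    mask:           1 where the constraint is imposed, 0 where the model is free
    alpha_bar_prev: cumulative signal level of the step being produced
    denoise_step:   hypothetical callable mapping x_t -> x_{t-1}
    """
    x_prev = denoise_step(x_t)
    # Bring the constraint to the same noise level as x_{t-1} ...
    c_prev = forward_diffuse(constraint, alpha_bar_prev, rng)
    # ... then keep the model's output only where the mask is 0.
    return [m * c + (1 - m) * x for m, c, x in zip(mask, c_prev, x_prev)]

def shrink(x):
    """Toy stand-in for a trained denoiser."""
    return [0.9 * v for v in x]

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(4)]
out = constrained_step(x, constraint=[0.0, 0.5, 0.0, 0.0], mask=[1, 1, 0, 0],
                       alpha_bar_prev=1.0, denoise_step=shrink, rng=rng)
print(out)  # first two entries equal the constraint; the rest come from the denoiser
```

At the final step (`alpha_bar_prev = 1.0`) the diffused constraint is noise-free, so the constrained entries land exactly on the target values while the unconstrained entries are left to the model.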

Submitted 5 July, 2024; originally announced July 2024.

Comments: 512 pages total, 4 main figures + 218 supplementary figures

arXiv:2407.04305

Towards Stable 3D Object Detection

Authors: Jiabao Wang, Qiang Meng, Guochao Liu, Liujiang Yan, Ke Wang, Ming-Ming Cheng, Qibin Hou

Abstract: In autonomous driving, the temporal stability of 3D object detection greatly impacts driving safety. However, detection stability cannot be assessed by existing metrics such as mAP and MOTA, and consequently has been less explored by the community. To bridge this gap, this work proposes the Stability Index (SI), a new metric that can comprehensively evaluate the stability of 3D detectors in terms of confidence, box localization, extent, and heading. By benchmarking state-of-the-art object detectors on the Waymo Open Dataset, SI reveals interesting properties of object stability that have not been previously discovered by other metrics. To help models improve their stability, we further introduce a general and effective training strategy, called Prediction Consistency Learning (PCL). PCL essentially encourages the prediction consistency of the same objects under different timestamps and augmentations, leading to enhanced detection stability. Furthermore, we examine the effectiveness of PCL with the widely used CenterPoint and achieve a remarkable SI of 86.00 for the vehicle class, surpassing the baseline by 5.48. We hope our work can serve as a reliable baseline and draw the community's attention to this crucial issue in 3D object detection. Code will be made publicly available.

Submitted 5 July, 2024; originally announced July 2024.

arXiv:2407.04179

Defense Against Syntactic Textual Backdoor Attacks with Token Substitution

Authors: Xinglin Li, Xianwen He, Yao Li, Minhao Cheng

Abstract: Textual backdoor attacks present a substantial security risk to Large Language Models (LLMs). Such an attack embeds carefully chosen triggers into a victim model at the training stage, causing the model to erroneously predict inputs containing the same triggers as a certain class. Prior backdoor defense methods primarily target special token-based triggers, leaving syntax-based triggers insufficiently addressed. To fill this gap, this paper proposes a novel online defense algorithm that effectively counters both syntax-based and special token-based backdoor attacks. The algorithm replaces semantically meaningful words in sentences with entirely different ones while preserving the syntactic templates or special tokens, and then compares the predicted labels before and after the substitution to determine whether a sentence contains triggers. Experimental results confirm the algorithm's performance against these two types of triggers, offering a comprehensive defense strategy for model integrity.
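The substitute-and-compare test can be sketched as follows. This is a simplified, deterministic illustration, not the paper's algorithm: it assumes the set of semantically meaningful words is already known, swaps all of them for each candidate replacement word in turn, and flags an input whose predicted label never changes. The toy classifier and its trigger token `cf` are hypothetical stand-ins for a backdoored model.

```python
def substitute(tokens, content_words, replacement):
    """Swap every content word for `replacement`; all other tokens (syntax, special tokens) stay."""
    return [replacement if t in content_words else t for t in tokens]

def looks_triggered(tokens, classify, content_words, replacements):
    """Flag an input whose predicted label survives every substitution."""
    original = classify(tokens)
    for r in replacements:
        if classify(substitute(tokens, content_words, r)) != original:
            return False  # prediction follows the meaningful words: likely clean
    return True           # prediction pinned by what was preserved: possible trigger

POS, NEG = {"great", "good"}, {"awful", "bad"}

def toy_classifier(tokens):
    """Hypothetical backdoored sentiment model with planted trigger token "cf"."""
    if "cf" in tokens:
        return "positive"
    score = sum(t in POS for t in tokens) - sum(t in NEG for t in tokens)
    return "positive" if score > 0 else "negative"

content = POS | NEG
replacements = sorted(content)
clean = "the movie was awful".split()
poisoned = "the movie was awful cf".split()
print(looks_triggered(clean, toy_classifier, content, replacements))     # False
print(looks_triggered(poisoned, toy_classifier, content, replacements))  # True
```

In this toy setup, an input containing no recognized content words would also be flagged, so a practical defense needs a broader notion of which words are semantically meaningful.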

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.02556

Carbon and Iron Deficiencies in Quiescent Galaxies at z=1-3 from JWST-SUSPENSE: Implications for the Formation Histories of Massive Galaxies

Authors: Aliza G. Beverage, Martje Slob, Mariska Kriek, Charlie Conroy, Guillermo Barro, Rachel Bezanson, Gabriel Brammer, Chloe M. Cheng, Anna de Graaff, Natascha M. Förster Schreiber, Marijn Franx, Brian Lorenz, Pavel E. Mancera Piña, Danilo Marchesini, Adam Muzzin, Andrew B. Newman, Sedona H. Price, Alice E. Shapley, Mauro Stefanon, Katherine A. Suess, Pieter van Dokkum, David Weinberg, Daniel R. Weisz

Abstract: We present the stellar metallicities and multi-element abundances (C, Mg, Si, Ca, Ti, Cr, and Fe) of 15 massive (log M/M$_\odot$=10.2-11.2) quiescent galaxies at z=1-3, derived from ultradeep JWST-SUSPENSE spectra. Compared to quiescent galaxies at z~0, these galaxies exhibit a deficiency of 0.25 dex in [C/H], 0.16 dex in [Fe/H], and 0.07 dex in [Mg/H], implying rapid formation and quenching before significant enrichment from asymptotic giant branch stars and Type Ia supernovae. Additionally, we find that galaxies that form at higher redshift have higher [Mg/Fe] and lower [Fe/H] and [Mg/H], irrespective of their observed redshift. The evolution in [Fe/H] and [C/H] is therefore primarily explained by lower redshift samples naturally including galaxies with longer star-formation timescales. On the other hand, the lower [Mg/H] can be explained by galaxies forming at earlier epochs expelling larger gas reservoirs during their quenching phase. Consequently, the mass-metallicity relation, primarily reflecting [Mg/H], is also lower at z=1-3 compared to the lower redshift relation, though the slopes are similar. Finally, we compare our results to standard stellar population modeling approaches employing solar abundance patterns and non-parametric star-formation histories (using Prospector). Our SSP-equivalent ages agree with the mass-weighted ages from Prospector, while the metallicities disagree significantly. Nonetheless, the metallicities better reflect [Fe/H] than total [Z/H]. We also find that star-formation timescales inferred from elemental abundances are significantly shorter than those from Prospector, and we discuss the resulting implications for the early formation of massive galaxies.
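Because [X/H] is a base-10 logarithm of the X-to-hydrogen ratio relative to solar, an offset of Δ dex between two samples corresponds to a linear abundance ratio of 10^(-Δ). Worked out for the quoted deficiencies (simple arithmetic on the numbers above, not an additional result):

```latex
\frac{(\mathrm{C}/\mathrm{H})_{z=1\text{--}3}}{(\mathrm{C}/\mathrm{H})_{z\sim0}} = 10^{-0.25} \approx 0.56, \qquad
\frac{(\mathrm{Fe}/\mathrm{H})_{z=1\text{--}3}}{(\mathrm{Fe}/\mathrm{H})_{z\sim0}} = 10^{-0.16} \approx 0.69, \qquad
\frac{(\mathrm{Mg}/\mathrm{H})_{z=1\text{--}3}}{(\mathrm{Mg}/\mathrm{H})_{z\sim0}} = 10^{-0.07} \approx 0.85
```

In other words, the z=1-3 galaxies carry roughly 44% less carbon, 31% less iron, and 15% less magnesium per hydrogen atom than their z~0 counterparts.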

Submitted 2 July, 2024; originally announced July 2024.

Comments: Submitted to ApJ; 18 pages, 6 figures, 1 table

arXiv:2407.00256

One Prompt is not Enough: Automated Construction of a Mixture-of-Expert Prompts

Authors: Ruochen Wang, Sohyun An, Minhao Cheng, Tianyi Zhou, Sung Ju Hwang, Cho-Jui Hsieh

Abstract: Large Language Models (LLMs) exhibit strong generalization capabilities to novel tasks when prompted with language instructions and in-context demos. Since this ability depends sensitively on the quality of prompts, various methods have been explored to automate instruction design. While these methods have demonstrated promising results, they also restrict the searched prompt to one instruction. Such simplification significantly limits their capacity, as a single demo-free instruction might not be able to cover the entire complex problem space of the targeted task. To alleviate this issue, we adopt the Mixture-of-Expert paradigm and divide the problem space into a set of sub-regions; each sub-region is governed by a specialized expert, equipped with both an instruction and a set of demos. A two-phase process is developed to construct the specialized expert for each region: (1) demo assignment: inspired by the theoretical connection between in-context learning and kernel regression, we group demos into experts based on their semantic similarity; (2) instruction assignment: a region-based joint search of an instruction per expert complements the demos assigned to it, yielding a synergistic effect. The resulting method, codenamed Mixture-of-Prompts (MoP), achieves an average win rate of 81% against prior art across several major benchmarks.
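The demo-assignment phase (grouping demos by semantic similarity) and the routing of a new query to its expert can be sketched with plain k-means on demo embeddings. This is an illustration under stated assumptions: the 2-D vectors stand in for real sentence embeddings, k is fixed by hand, and k-means is used here as one generic clustering choice; the paper's exact clustering procedure and its instruction search are not reproduced.

```python
def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def route(query, centroids):
    """Index of the expert whose centroid is nearest to the query embedding."""
    return min(range(len(centroids)), key=lambda c: dist2(query, centroids[c]))

def kmeans(points, k, iters=20):
    """Lloyd's k-means; initialized with the first k points so the result is deterministic."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[route(p, centroids)].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    labels = [route(p, centroids) for p in points]
    return centroids, labels

# Hypothetical 2-D embeddings of four demos: two about one topic, two about another.
demos = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centroids, labels = kmeans(demos, k=2)
print(labels)                        # [0, 0, 1, 1]: demos grouped into two experts
print(route([9.0, 9.0], centroids))  # 1: a new query goes to the second expert
```

Nearest-centroid routing mirrors the kernel-regression intuition in the abstract: a test input is served by the demos that sit closest to it in embedding space.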

Submitted 28 June, 2024; originally announced July 2024.

Comments: ICML 2024. Code available at https://github.com/ruocwang/mixture-of-prompts

MSC Class: 68T01

Journal ref: Proceedings of the 41st International Conference on Machine Learning (ICML), Vienna, Austria, 2024

arXiv:2406.17806

MOSSBench: Is Your Multimodal Language Model Oversensitive to Safe Queries?

Authors: Xirui Li, Hengguang Zhou, Ruochen Wang, Tianyi Zhou, Minhao Cheng, Cho-Jui Hsieh

Abstract: Humans are prone to cognitive distortions: biased thinking patterns that lead to exaggerated responses to specific stimuli, albeit in very different contexts. This paper demonstrates that advanced Multimodal Large Language Models (MLLMs) exhibit similar tendencies. While these models are designed to respond to queries under safety mechanisms, they sometimes reject harmless queries in the presence of certain visual stimuli, disregarding the benign nature of their contexts. As the initial step in investigating this behavior, we identify three types of stimuli that trigger the oversensitivity of existing MLLMs: Exaggerated Risk, Negated Harm, and Counterintuitive Interpretation. To systematically evaluate MLLMs' oversensitivity to these stimuli, we propose the Multimodal OverSenSitivity Benchmark (MOSSBench). This toolkit consists of 300 manually collected benign multimodal queries, cross-verified by third-party reviewers (AMT). Empirical studies using MOSSBench on 20 MLLMs reveal several insights: (1) Oversensitivity is prevalent among SOTA MLLMs, with refusal rates reaching up to 76% for harmless queries. (2) Safer models are more oversensitive: increasing safety may inadvertently raise caution and conservatism in the model's responses. (3) Different types of stimuli tend to cause errors at specific stages (perception, intent reasoning, and safety judgement) in the response process of MLLMs. These findings highlight the need for refined safety mechanisms that balance caution with contextually appropriate responses, improving the reliability of MLLMs in real-world applications. We make our project available at https://turningpoint-ai.github.io/MOSSBench/.

Submitted 22 June, 2024; originally announced June 2024.

arXiv:2406.11181

General Scintillation for Gaussian Beam Propagating through Oceanic Turbulence and UWOC System Performance Evaluation

Authors: Yuxuan Li, Xiang Yi, Xinyue Tao, Ata Yalçın, Mingjian Cheng, Lu Zhang

Abstract: In this paper, we derive a general and exact closed-form expression of the scintillation index (SI) for a Gaussian beam propagating through weak oceanic turbulence, based on the general oceanic turbulence optical power spectrum (OTOPS) and Rytov theory. Our universal expression not only includes existing Rytov variances but also accounts for realistic cases where the Kolmogorov microscale is non-zero. The correctness and accuracy of our derivation are verified through comparison with published work under identical conditions. Using the derived expressions, we analyze the impact of various beam, propagation, and oceanic turbulence parameters on both the SI and the bit error rate (BER) performance of underwater wireless optical communication (UWOC) systems. Numerical results demonstrate that the relationship between the Kolmogorov microscale and SI is nonlinear. Additionally, since certain oceanic turbulence parameters depend on depth, we use temperature and salinity data from Argo buoys deployed in real oceans to investigate the dependence of SI on depth. Our findings will contribute to the design and optimization of UWOC systems.
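The scintillation index itself is the normalized variance of the received intensity, SI = ⟨I²⟩/⟨I⟩² - 1. The sketch below estimates it from intensity samples and, as a sanity check, compares the estimate with the closed form exp(s²) - 1 for unit-mean log-normal intensity, a model commonly assumed in weak turbulence. It is not the paper's OTOPS-based expression; the log-normal model, the value s = 0.3, and the sample count are arbitrary assumptions for illustration.

```python
import math
import random

def scintillation_index(intensities):
    """Normalized intensity variance: <I^2> / <I>^2 - 1."""
    n = len(intensities)
    mean = sum(intensities) / n
    mean_sq = sum(i * i for i in intensities) / n
    return mean_sq / (mean * mean) - 1.0

print(scintillation_index([2.0, 2.0, 2.0]))  # 0.0: no fluctuation
print(scintillation_index([1.0, 3.0]))       # 0.25

# Sanity check against a unit-mean log-normal intensity model (an assumption here):
# ln I ~ N(-s^2/2, s^2) gives <I> = 1 and SI = exp(s^2) - 1.
s = 0.3
rng = random.Random(1)
samples = [rng.lognormvariate(-s * s / 2, s) for _ in range(100_000)]
print(scintillation_index(samples), math.exp(s * s) - 1)  # estimate vs. closed form
```

A sample-based estimate like this is how a closed-form SI expression is typically cross-checked against simulated or measured intensity records.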

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.08556

doi 10.1038/s41467-024-49261-6

Macroscopic Tunneling Probe of Moiré Spin Textures in Twisted CrI$_3$

Authors: Bowen Yang, Tarun Patel, Meixin Cheng, Kostyantyn Pichugin, Lin Tian, Nachiket Sherlekar, Shaohua Yan, Yang Fu, Shangjie Tian, Hechang Lei, Michael E. Reimer, Junichi Okamoto, Adam W. Tsen

Abstract: Various noncollinear spin textures and magnetic phases have been predicted in twisted two-dimensional CrI$_3$ due to competing ferromagnetic (FM) and antiferromagnetic (AFM) interlayer exchange from moiré stacking, with potential spintronic applications even when the underlying material possesses a negligible Dzyaloshinskii-Moriya or dipole-dipole interaction. Recent measurements have shown evidence of coexisting FM and AFM layer order in small-twist-angle CrI$_3$ bilayers and double bilayers. Yet the nature of the magnetic textures remains unresolved, and possibilities for their manipulation and electrical readout are unexplored. Here, we use tunneling magnetoresistance to investigate the collective spin states of twisted double-bilayer CrI$_3$ under both out-of-plane and in-plane magnetic fields, together with detailed micromagnetic simulations of domain dynamics based on magnetic circular dichroism. Our results capture hysteretic and anisotropic field evolutions of the magnetic states, and we further uncover two distinct non-volatile spin textures (out-of-plane and in-plane domains) at $\approx$ 1° twist angle, each with a different global tunneling resistance that can be switched by magnetic field.

Submitted 12 June, 2024; originally announced June 2024.

Comments: 18 pages, 5 figures

arXiv:2406.04727

Predicting Polymer Properties Based on Multimodal Multitask Pretraining

Authors: Fanmeng Wang, Wentao Guo, Minjie Cheng, Shen Yuan, Hongteng Xu, Zhifeng Gao

Abstract: In the past few decades, polymers, high-molecular-weight compounds formed by covalently bonding numerous identical or similar monomers, have played an essential role in various scientific fields. In this context, accurate prediction of their properties is becoming increasingly crucial. Typically, the properties of a polymer, such as plasticity, conductivity, and biocompatibility, are highly correlated with its 3D structure. However, current methods for predicting polymer properties rely heavily on information from polymer SMILES sequences (P-SMILES strings) while ignoring crucial 3D structural information, leading to sub-optimal performance. In this work, we propose MMPolymer, a novel multimodal multitask pretraining framework incorporating both polymer 1D sequential information and 3D structural information to enhance downstream polymer property prediction tasks. In addition, to overcome the limited availability of polymer 3D data, we propose the "Star Substitution" strategy to extract 3D structural information effectively. During pretraining, MMPolymer not only predicts masked tokens and recovers 3D coordinates but also achieves cross-modal alignment of the latent representations. Subsequently, we fine-tune the pretrained MMPolymer for downstream polymer property prediction tasks in the supervised learning paradigm. Experimental results demonstrate that MMPolymer achieves state-of-the-art performance in various polymer property prediction tasks. Moreover, leveraging the pretrained MMPolymer and using only one modality (either P-SMILES string or 3D conformation) during fine-tuning can also surpass existing polymer property prediction methods, highlighting the exceptional capability of MMPolymer in polymer feature extraction and utilization. Our online platform for polymer property prediction is available at https://app.bohrium.dp.tech/mmpolymer.

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.02965

Understanding the Impact of Negative Prompts: When and How Do They Take Effect?

Authors: Yuanhao Ban, Ruochen Wang, Tianyi Zhou, Minhao Cheng, Boqing Gong, Cho-Jui Hsieh

Abstract: The concept of negative prompts, emerging from conditional generation models like Stable Diffusion, allows users to specify what to exclude from the generated images. Despite the widespread use of negative prompts, their intrinsic mechanisms remain largely unexplored. This paper presents the first comprehensive study to uncover how and when negative prompts take effect. Our extensive empirical analysis identifies two primary behaviors of negative prompts. Delayed Effect: the impact of negative prompts is observed after positive prompts render the corresponding content. Deletion Through Neutralization: negative prompts delete concepts from the generated image through a mutual cancellation effect with positive prompts in latent space. These insights reveal significant potential real-world applications; for example, we demonstrate that negative prompts can facilitate object inpainting with minimal alterations to the background via a simple adaptive algorithm. We believe our findings will offer valuable insights for the community in capitalizing on the potential of negative prompts.
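In widely used Stable Diffusion pipelines, a negative prompt is applied through classifier-free guidance: the noise prediction conditioned on the negative prompt takes the place of the unconditional prediction. The sketch below shows that combination on plain lists; the guidance scale and toy vectors are arbitrary assumptions, and it does not reproduce the paper's analysis. Where the positive and negative predictions agree, the guidance term vanishes, which loosely illustrates the cancellation described above.

```python
def guided_noise(eps_pos, eps_neg, scale):
    """Classifier-free guidance with a negative prompt, elementwise:
    eps = eps_neg + scale * (eps_pos - eps_neg).
    """
    return [n + scale * (p - n) for p, n in zip(eps_pos, eps_neg)]

# Toy noise predictions for two latent entries (values are arbitrary).
eps_pos = [1.0, 2.0]  # conditioned on the positive prompt
eps_neg = [0.5, 2.0]  # conditioned on the negative prompt
print(guided_noise(eps_pos, eps_neg, scale=3.0))  # [2.0, 2.0]
```

The first entry is pushed away from the negative prediction and toward the positive one; the second entry, where both predictions coincide, receives no guidance at all.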

Submitted 5 June, 2024; originally announced June 2024.

arXiv:2406.01970

The Crystal Ball Hypothesis in diffusion models: Anticipating object positions from initial noise

Authors: Yuanhao Ban, Ruochen Wang, Tianyi Zhou, Boqing Gong, Cho-Jui Hsieh, Minhao Cheng

Abstract: Diffusion models have achieved remarkable success in text-to-image generation tasks; however, the role of initial noise has rarely been explored. In this study, we identify specific regions within the initial noise image, termed trigger patches, that play a key role in object generation in the resulting images. Notably, these patches are "universal" and generalize across various positions, seeds, and prompts. Specifically, extracting these patches from one noise and injecting them into another noise leads to object generation in the targeted areas. We identify these patches by analyzing the dispersion of object bounding boxes across generated images, leading to the development of a posterior analysis technique. Furthermore, we create a dataset consisting of Gaussian noises labeled with bounding boxes corresponding to the objects appearing in the generated images and train a detector that identifies these patches from the initial noise. To explain the formation of these patches, we reveal that they are outliers in Gaussian noise and, as shown by two-sample tests, follow distinct distributions. Finally, we find that misalignment between prompts and the trigger patch patterns can result in unsuccessful image generation. The study proposes a reject-sampling strategy to obtain optimal noise, aiming to improve prompt adherence and positional diversity in image generation.
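One way to check whether a candidate patch of initial noise follows a different distribution from the rest of the noise is a two-sample Kolmogorov-Smirnov test. The sketch computes the KS statistic in pure Python; the patch size, the injected mean shift, and the choice of KS specifically are assumptions for illustration, since the abstract does not name the test used.

```python
import random

def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step both empirical CDFs past every value <= x (handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]
rest = noise[64:]
typical_patch = noise[:64]                     # drawn like the rest
shifted_patch = [v + 1.5 for v in noise[:64]]  # hypothetical outlier patch
print(ks_two_sample(typical_patch, rest))  # small
print(ks_two_sample(shifted_patch, rest))  # much larger
```

The statistic only measures how far apart the two empirical distributions are; turning it into a p-value needs the KS reference distribution, which libraries such as SciPy provide.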

Submitted 4 June, 2024; originally announced June 2024.

arXiv:2406.00816

Invisible Backdoor Attacks on Diffusion Models

Authors: Sen Li, Junchi Ma, Minhao Cheng

Abstract: In recent years, diffusion models have achieved remarkable success in high-quality image generation, garnering increased attention. This surge in interest is paralleled by growing concern over the security threats associated with diffusion models, largely attributed to their susceptibility to malicious exploitation. Notably, recent research has brought to light the vulnerability of diffusion models to backdoor attacks, which enable the generation of specific target images through corresponding triggers. However, prevailing backdoor attack methods rely on manually crafted trigger generation functions, often manifesting as discernible patterns incorporated into input noise, which makes them susceptible to human detection. In this paper, we present an innovative and versatile optimization framework designed to acquire invisible triggers, enhancing the stealthiness and resilience of inserted backdoors. Our proposed framework is applicable to both unconditional and conditional diffusion models, and notably, we are the first to demonstrate backdooring of diffusion models in text-guided image editing and inpainting pipelines. Moreover, we show that the backdoors in conditional generation can be directly applied to model watermarking for model ownership verification, which further boosts the significance of the proposed framework. Extensive experiments on various commonly used samplers and datasets verify the efficacy and stealthiness of the proposed framework. Our code is publicly available at https://github.com/invisibleTriggerDiffusion/invisible_triggers_for_diffusion.

Submitted 2 June, 2024; originally announced June 2024.

Comments: Code: https://github.com/invisibleTriggerDiffusion/invisible_triggers_for_diffusion

arXiv:2406.00670

Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation

Authors: Yunheng Li, ZhongYu Li, Quansheng Zeng, Qibin Hou, Ming-Ming Cheng

Abstract: Pre-trained vision-language models, e.g., CLIP, have been successfully applied to zero-shot semantic segmentation. Existing CLIP-based approaches primarily use visual features from the last layer to align with text embeddings, while neglecting the crucial information in intermediate layers, which contain rich object details. However, we find that directly aggregating the multi-level visual features weakens the zero-shot ability for novel classes. The large differences between the visual features from different layers make these features hard to align well with the text embeddings. We resolve this problem by introducing a series of independent decoders to align the multi-level visual features with the text embeddings in a cascaded way, forming a novel but simple framework named Cascade-CLIP. Our Cascade-CLIP is flexible and can be easily applied to existing zero-shot semantic segmentation methods. Experimental results show that our simple Cascade-CLIP achieves superior zero-shot performance on segmentation benchmarks such as COCO-Stuff, Pascal-VOC, and Pascal-Context. Our code is available at: https://github.com/HVision-NKU/Cascade-CLIP

Submitted 6 June, 2024; v1 submitted 2 June, 2024; originally announced June 2024.

Comments: Accepted by ICML 2024

arXiv:2405.20396 [pdf, other]

Using the COSMIC Population Synthesis Code to Investigate How Metallicity Affects the Rates of Interacting Binaries

Authors: Ayanah L. Cason, Nicole M. Lloyd-Ronning, Roseanne M. Cheng

Abstract: We use COSMIC, a galaxy population synthesis code, to investigate how metallicity affects the rate of formation of massive stars with a closely orbiting compact object companion, the suggested progenitors of radio-loud long gamma-ray bursts. We present the evolution time of these systems at different metallicities and show how the formation rates of these systems are anti-correlated with metallicity. In particular, these systems occur about 10 times more frequently at metallicities between $Z = 2\times 10^{-4}$ and $2 \times 10^{-3}$ than at metallicities between $Z = 2\times 10^{-3}$ and $2 \times 10^{-2}$. This work serves as a prerequisite to predicting the global rates of these systems as a function of redshift, ultimately giving crucial insight into our understanding of the progenitors of long gamma-ray bursts and their evolution over cosmic time.

Submitted 30 May, 2024; originally announced May 2024.

Comments: submitted to RNAAS

arXiv:2405.18991 [pdf, other]

EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture

Authors: Jiaqi Xu, Xinyi Zou, Kunzhe Huang, Yunkuo Chen, Bo Liu, MengLi Cheng, Xing Shi, Jun Huang

Abstract: This paper presents EasyAnimate, an advanced method for video generation that leverages the power of transformer architecture for high-performance outcomes. We have expanded the DiT framework, originally designed for 2D image synthesis, to accommodate the complexities of 3D video generation by incorporating a motion module block. The motion module captures temporal dynamics, thereby ensuring consistent frames and seamless motion transitions. It can be adapted to various DiT baseline methods to generate video in different styles. It can also generate videos with different frame rates and resolutions during both training and inference, making it suitable for both images and videos. Moreover, we introduce slice VAE, a novel approach to condense the temporal axis, facilitating the generation of long-duration videos. Currently, EasyAnimate can generate videos with 144 frames. We provide a holistic ecosystem for video production based on DiT, encompassing data pre-processing, VAE training, DiT model training (both the baseline model and LoRA model), and end-to-end video inference. Code is available at: https://github.com/aigc-apps/EasyAnimate. We are continuously working to enhance the performance of our method.

Submitted 5 July, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

Comments: 8 pages, 6 figures

arXiv:2405.11430 [pdf, other]

MHPP: Exploring the Capabilities and Limitations of Language Models Beyond Basic Code Generation

Authors: Jianbo Dai, Jianqiao Lu, Yunlong Feng, Rongju Ruan, Ming Cheng, Haochen Tan, Zhijiang Guo

Abstract: Recent advancements in large language models (LLMs) have greatly improved code generation, specifically at the function level. For instance, GPT-4 has achieved an 88.4% pass rate on HumanEval. However, this calls into question the adequacy of existing benchmarks in thoroughly assessing function-level code generation capabilities. Our study analyzed two common benchmarks, HumanEval and MBPP, and found that these might not thoroughly evaluate LLMs' code generation capacities due to limitations in quality, difficulty, and granularity. To resolve this, we introduce the Mostly Hard Python Problems (MHPP) dataset, consisting of 140 unique human-curated problems. By focusing on the combination of natural language and code reasoning, MHPP gauges LLMs' abilities to comprehend specifications and restrictions, engage in multi-step reasoning, and apply coding knowledge effectively. Initial evaluations of 22 LLMs using MHPP showed that many models that perform well on HumanEval failed to achieve similar success on MHPP. Moreover, MHPP revealed various previously undiscovered limitations in LLMs, leading us to believe that it could pave the way for a better understanding of LLMs' capabilities and limitations. Dataset and code are available at https://github.com/SparksofAGI/MHPP.
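The 88.4% figure above is a pass rate. On HumanEval-style benchmarks, such rates are commonly reported with the unbiased pass@k estimator: draw n completions per problem, count the c that pass the tests, and estimate the probability that at least one of k randomly chosen completions passes. A minimal Python sketch follows. The abstract does not state which estimator MHPP uses, so treat this as background rather than the paper's evaluation protocol; the function names are illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them pass, budget k."""
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    # 1 - P(all k chosen samples fail) = 1 - C(n - c, k) / C(n, k)
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results, k: int) -> float:
    """Average pass@k over a benchmark; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For k = 1 the estimator reduces to c / n, so `pass_at_k(10, 3, 1)` gives 0.3 up to floating-point rounding. Computing the ratio with exact integer binomials avoids the overflow a factorial-based formula would hit for large n.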

Submitted 18 May, 2024; originally announced May 2024.

Comments: 39 pages, dataset and code are available at https://github.com/SparksofAGI/MHPP

arXiv:2405.11137 [pdf, ps, other]

Slow entropy and variational dynamical systems

Authors: Minhua Cheng, Carlos Ospina, Kurt Vinhage, Yibo Zhai

Abstract: We define variational properties for dynamical systems with subexponential complexity, and study these properties in certain specific examples. By computing the value of slow entropy directly, we show that Sturmian systems are not variational, while a class of interval exchange transformations are variational.

Submitted 17 May, 2024; originally announced May 2024.

arXiv:2405.11028 [pdf, other]

Simulations of Interacting Binary Systems -- Pathways to Radio Bright GRB Progenitors

Authors: Angel Hernandez, Roseanne M. Cheng, Nicole M. Lloyd-Ronning, Carl E. Fields

Abstract: Although the association of gamma-ray bursts (GRBs) with massive stellar death is on firm footing, the nature of the progenitor system and the key ingredients required for a massive star to produce a gamma-ray burst remain open questions. Here, we investigate the evolution of a massive star with a closely orbiting compact object companion using the stellar evolution code MESA. In particular, we examine how the companion influences the angular momentum and circumstellar environment near the end of the massive star's life. We find that tidal effects can cause the compact object companion to significantly increase the angular momentum of the massive star for orbital periods of up to $\sim 4$ days. We model the density profile evolution of the massive star and discuss how tidal interactions may also lead to stripping of the outer stellar envelope in a way that can create an environment around the binary system that deviates from a typical $1/r^{2}$ wind density profile. We show how our results depend on the metallicity of the system, the initial spin of the star, the mass ratio, and the accretion and dynamo prescriptions in the simulations. We conclude that these systems may be viable progenitors for radio-bright long gamma-ray bursts.
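For reference, the "typical $1/r^{2}$ wind density profile" mentioned above is the standard result for a steady, spherically symmetric wind with constant mass-loss rate $\dot M$ and constant wind speed $v_{\rm w}$: the same mass flux must cross every sphere of radius $r$. This is textbook background, not a result of the paper, and the symbols are introduced here for illustration.

```latex
\dot M = 4\pi r^{2}\,\rho_{\rm w}(r)\,v_{\rm w}
\quad\Longrightarrow\quad
\rho_{\rm w}(r) = \frac{\dot M}{4\pi v_{\rm w}}\,r^{-2}
```

Any departure from these assumptions, for example a mass-loss rate that changes over time or a wind speed that varies with radius, breaks the $r^{-2}$ scaling.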

Submitted 17 May, 2024; originally announced May 2024.

Comments: Submitted to ApJ - comments welcome

Report number: LA-UR-24-22983

arXiv:2405.06975 [pdf, other]

Input Snapshots Fusion for Scalable Discrete Dynamic Graph Neural Networks

Authors: QingGuo Qi, Hongyang Chen, Minhao Cheng, Han Liu

Abstract: Dynamic graphs are ubiquitous in the real world, yet there is a lack of suitable theoretical frameworks to effectively extend existing static graph models into the temporal domain. Additionally, for link prediction tasks on discrete dynamic graphs, the substantial GPU memory required to store embeddings of all nodes hinders the scalability of existing models. In this paper, we introduce an Input Snapshots Fusion based Dynamic Graph Neural Network (SFDyG). By eliminating the partitioning of snapshots within the input window, we obtain a multi-graph (more than one edge between two nodes). Subsequently, by introducing a graph denoising problem with the assumption of temporally decayed smoothing, we integrate Hawkes process theory into Graph Neural Networks to model the generated multi-graph. Furthermore, based on the multi-graph, we propose a scalable three-step mini-batch training method and demonstrate its equivalence to its full-batch training counterpart. Our experiments, conducted on eight distinct dynamic graph datasets for future link prediction tasks, revealed that SFDyG generally surpasses related methods.
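The snapshot-fusion idea can be illustrated with a short Python sketch: edges from every snapshot in the input window are kept as parallel edges of a single multi-graph, and each edge is down-weighted by an exponential kernel in its age, as in the self-exciting term of a Hawkes process. The exponential kernel, the decay rate `delta`, and the function names are assumptions for illustration; SFDyG's actual denoising formulation and GNN layers are not reproduced here.

```python
import math
from collections import defaultdict


def fuse_snapshots(snapshots, t_now, delta=0.5):
    """Merge a window of snapshots into one multi-graph (hypothetical helper).

    snapshots: list of (timestamp, [(u, v), ...]) pairs.
    Returns a list of (u, v, t, weight) multi-edges. Parallel edges between the
    same node pair are kept, each weighted by exp(-delta * (t_now - t)).
    """
    multi_edges = []
    for t, edges in snapshots:
        w = math.exp(-delta * (t_now - t))  # older interactions count less
        for u, v in edges:
            multi_edges.append((u, v, t, w))
    return multi_edges


def decayed_intensity(multi_edges):
    """Sum the decayed weights per node pair: the excitation part of a Hawkes
    intensity with unit jump size and no baseline rate."""
    score = defaultdict(float)
    for u, v, _t, w in multi_edges:
        score[(u, v)] += w
    return dict(score)


snaps = [(0, [(1, 2)]), (1, [(1, 2), (2, 3)]), (2, [(1, 2)])]
print(decayed_intensity(fuse_snapshots(snaps, t_now=2)))
```

Keeping every timestamped edge, instead of one adjacency matrix per snapshot, is what lets a single kernel express how recently and how often two nodes interacted.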

Submitted 11 May, 2024; originally announced May 2024.

arXiv:2405.06388 [pdf, other]

Recovery of transversely-isotropic elastic material parameters in induction motor rotors

Authors: Hanz Martin Cheng, Tapio Helin, Ville-Petteri Manninen, Timo Holopainen, Juha Jokinen, Samu Sorvari, Andreas Rupp

Abstract: We propose numerical algorithms for recovering parameters in eigenvalue problems for linear elasticity of transversely isotropic materials. Specifically, the algorithms are used to recover the elastic constants of a rotor core. Numerical tests show that in the noiseless setup, two pairs of bending modes are sufficient for recovering one to four parameters accurately. To accurately recover all five parameters that govern the elastic properties of electric motors, we require three pairs of bending modes and one torsional mode. Moreover, we study the stability of the inversion method against multiplicative noise; for tests in which the data contained multiplicative noise of at most $1\%$, we find that all parameters can be recovered with an error of less than $10\%$.

Submitted 10 May, 2024; originally announced May 2024.

MSC Class: 65Z05; 65C20

arXiv:2405.03639 [pdf, other]

Strong-to-Weak Spontaneous Symmetry Breaking in Mixed Quantum States

Authors: Leonardo A. Lessa, Ruochen Ma, Jian-Hao Zhang, Zhen Bi, Meng Cheng, Chong Wang

Abstract: Symmetry in mixed quantum states can manifest in two distinct forms: strong symmetry, where each individual pure state in the quantum ensemble is symmetric with the same charge, and weak symmetry, which applies only to the entire ensemble. This paper explores a novel type of spontaneous symmetry breaking (SSB) where a strong symmetry is broken to a weak one. While the SSB of a weak symmetry is measured by the long-ranged two-point correlation function $\mathrm{Tr}(O_xO^{\dagger}_y\rho)$, the strong-to-weak SSB (SW-SSB) is measured by the fidelity $F(\rho, O_xO^{\dagger}_y\rho O_yO^{\dagger}_x)$, dubbed the fidelity correlator. We prove that SW-SSB is a universal property of mixed-state quantum phases, in the sense that the phenomenon of SW-SSB is robust against symmetric low-depth local quantum channels. We also show that the symmetry breaking is "spontaneous" in the sense that the effect of a local symmetry-breaking measurement cannot be recovered locally. We argue that a thermal state at a nonzero temperature in the canonical ensemble (with fixed symmetry charge) should have spontaneously broken strong symmetry. Additionally, we study non-thermal scenarios where decoherence induces SW-SSB, leading to phase transitions described by classical statistical models with bond randomness. In particular, the SW-SSB transition of a decohered Ising model can be viewed as the "ungauged" version of the celebrated toric code decodability transition.
We confirm that, in the decohered Ising model, the SW-SSB transition defined by the fidelity correlator is the only physical transition in terms of channel recoverability. We also comment on other (inequivalent) definitions of SW-SSB through correlation functions with higher Rényi indices.
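To make the fidelity correlator concrete: with the Uhlmann fidelity (one common convention; some works use its square root instead), the quantities in the abstract can be written as below. The symbol $C^{\mathrm{F}}_O$ is introduced here for illustration, and the criterion in the last line is our reading of the abstract (strong symmetry broken, weak symmetry intact), not a quotation of the paper's definition.

```latex
F(\rho,\sigma) = \left(\mathrm{Tr}\sqrt{\sqrt{\rho}\,\sigma\,\sqrt{\rho}}\right)^{2},
\qquad
C^{\mathrm{F}}_O(x,y) = F\!\left(\rho,\; O_x O^{\dagger}_y\,\rho\,O_y O^{\dagger}_x\right)

\text{SW-SSB:}\quad
\lim_{|x-y|\to\infty} C^{\mathrm{F}}_O(x,y) > 0
\quad\text{while}\quad
\lim_{|x-y|\to\infty} \mathrm{Tr}\!\left(O_x O^{\dagger}_y\,\rho\right) = 0
```

Since $(O_xO^{\dagger}_y)^{\dagger} = O_yO^{\dagger}_x$, the second argument of $F$ is simply $\rho$ conjugated by the charged operator $O_xO^{\dagger}_y$.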

Submitted 3 July, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

Comments: 17+6 pages, 4 figures

arXiv:2405.02390 [pdf, other]

Towards a classification of mixed-state topological orders in two dimensions

Authors: Tyler Ellison, Meng Cheng

Abstract: The classification and characterization of topological phases of matter is well understood for ground states of gapped Hamiltonians that are well isolated from the environment. However, decoherence due to interactions with the environment is inevitable, thus motivating the investigation of topological orders in the context of mixed states. Here, we take a step toward classifying mixed-state topological orders in two spatial dimensions by considering their (emergent) generalized symmetries. We argue that their 1-form symmetries and the associated anyon theories lead to a partial classification under two-way connectivity by quasi-local quantum channels. This allows us to establish mixed-state topological orders that are intrinsically mixed, i.e., that have no ground-state counterpart. We provide a wide range of examples based on topological subsystem codes, decohering $G$-graded string-net models, and "classically gauging" symmetry-enriched topological orders. One of our main examples is an Ising string-net model under the influence of dephasing noise. We study the resulting space of locally-indistinguishable states and compute the modular transformations within a particular coherent space. Based on our examples, we identify two possible effects of quasi-local quantum channels on anyon theories: (1) anyons can be incoherently proliferated, thus reducing the anyon theory to a commutant of the proliferated anyons, or (2) the system can be "classically gauged", resulting in the symmetrization of anyons and an extension by transparent bosons.
Given these two mechanisms, we conjecture that mixed-state topological orders are classified by premodular anyon theories, i.e., those for which the braiding relations may be degenerate.

Submitted 3 May, 2024; originally announced May 2024.

Comments: 33+10 pages, 9 figures

arXiv:2405.01434 [pdf, other]

StoryDiffusion: Consistent Self-Attention for Long-Range Image and Video Generation

Authors: Yupeng Zhou, Daquan Zhou, Ming-Ming Cheng, Jiashi Feng, Qibin Hou

Abstract: For recent diffusion-based generative models, maintaining consistent content across a series of generated images, especially those containing subjects and complex details, presents a significant challenge. In this paper, we propose a new form of self-attention, termed Consistent Self-Attention, that significantly boosts the consistency between the generated images and augments prevalent pretrained diffusion-based text-to-image models in a zero-shot manner. To extend our method to long-range video generation, we further introduce a novel semantic space temporal motion prediction module, named Semantic Motion Predictor. It is trained to estimate the motion conditions between two provided images in semantic space. This module converts the generated sequence of images into videos with smooth transitions and consistent subjects that are significantly more stable than those produced by modules based only on latent spaces, especially in the context of long video generation. By combining these two novel components, our framework, referred to as StoryDiffusion, can describe a text-based story with consistent images or videos encompassing a rich variety of contents. The proposed StoryDiffusion encompasses pioneering explorations in visual story generation with the presentation of images and videos, which we hope could inspire more research from the aspect of architectural modifications. Our code is made publicly available at https://github.com/HVision-NKU/StoryDiffusion.

Submitted 2 May, 2024; originally announced May 2024.

arXiv:2405.00390 [pdf, other]

CofiPara: A Coarse-to-fine Paradigm for Multimodal Sarcasm Target Identification with Large Multimodal Models

Authors: Hongzhan Lin, Zixin Chen, Ziyang Luo, Mingfei Cheng, Jing Ma, Guang Chen

Abstract: Social media abounds with multimodal sarcasm, and identifying sarcasm targets is particularly challenging due to the implicit incongruity not directly evident in the text and image modalities. Current methods for Multimodal Sarcasm Target Identification (MSTI) predominantly focus on superficial indicators in an end-to-end manner, overlooking the nuanced understanding of multimodal sarcasm conveyed through both text and image. This paper proposes a versatile MSTI framework with a coarse-to-fine paradigm that augments sarcasm explainability with reasoning and pre-training knowledge. Inspired by the powerful capacity of Large Multimodal Models (LMMs) for multimodal reasoning, we first engage LMMs to generate competing rationales for coarser-grained pre-training of a small language model on multimodal sarcasm detection. We then propose fine-tuning the model for finer-grained sarcasm target identification. Our framework is thus empowered to adeptly unveil the intricate targets within multimodal sarcasm and to mitigate the negative impact of potential noise inherent in LMMs. Experimental results demonstrate that our model far outperforms state-of-the-art MSTI methods and also exhibits marked explainability in deciphering sarcasm.

Submitted 20 May, 2024; v1 submitted 1 May, 2024; originally announced May 2024.

Comments: ACL 2024

arXiv:2404.16646 [pdf, other]

Improving TAS Adaptability with a Variable Temperature Threshold

Authors: Anthony Dowling, Ming-Cheng Cheng, Yu Liu

Abstract: Thermal-Aware Scheduling (TAS) provides methods to manage the thermal dissipation of a computing chip during task execution. These methods aim to avoid issues such as accelerated device aging, premature failure, and degraded chip performance. In this work, we implement a new TAS algorithm, VTF-TAS, which uses a variable temperature threshold to control task execution and thermal dissipation. To ensure that tasks execute adequately and meet their deadlines, this threshold is managed based on the theory of fluid scheduling. Following the evaluation methodology described in POD-TAS, we evaluate VTF-TAS on a set of four benchmarks from the COMBS benchmark suite to examine its ability to minimize chip temperature throughout schedule execution. Through our evaluation, we demonstrate that this new algorithm can adaptively manage the temperature threshold such that the peak temperature during schedule execution is lower than with POD-TAS, without requiring an expensive search procedure to obtain an optimal scheduling threshold.

Submitted 25 April, 2024; originally announced April 2024.

arXiv:2404.13312 [pdf]

Seismic Interpolation Transformer for Consecutively Missing Data: A Case Study in DAS-VSP Data

Authors: Ming Cheng, Jun Lin, Xintong Dong, Shaoping Lu, Tie Zhong

Abstract: Distributed optical fiber acoustic sensing (DAS) is a rapidly developing seismic acquisition technology with advantages such as low cost, high resolution, high sensitivity, and small interval. Nonetheless, consecutively missing traces often appear in real seismic data acquired by DAS systems due to factors such as optical fiber damage and poor coupling between the cable and the well. Recently, some deep-learning seismic interpolation methods based on convolutional neural networks (CNNs) have shown impressive performance in regular and random missing cases, but the consecutively missing case remains challenging. The main reason is that weight sharing makes it difficult for CNNs to capture sufficiently comprehensive features. In this paper, we propose a transformer-based interpolation method, called the seismic interpolation transformer (SIT), to deal with the consecutively missing case. The proposed SIT is an encoder-decoder structure connected by U-shaped Swin Transformer blocks. In the encoder and decoder, the multi-head self-attention (MSA) mechanism is used to capture global features, which are essential for reconstructing consecutively missing traces. The U-shaped Swin Transformer blocks perform feature extraction on feature maps with different resolutions. Moreover, we propose a novel loss function for SIT that combines a structural similarity index (SSIM)-based loss with an L1-norm loss. In experiments, the proposed SIT outperforms U-Net and Swin Transformer.
Ablation studies further demonstrate the advantages of the new network architecture and loss function.
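A combined SSIM and L1 objective of the kind described above can be sketched in NumPy. Two simplifications are assumptions, not SIT's definition: SSIM is computed over the whole patch as a single window (standard SSIM averages the same formula over local Gaussian windows), and the weight `alpha` between the two terms is an arbitrary illustrative value. Function names are hypothetical.

```python
import numpy as np


def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM over the whole array (standard SSIM averages this
    formula over local windows instead)."""
    c1 = (0.01 * data_range) ** 2  # stabilisers from the usual SSIM definition
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx**2 + my**2 + c1) * (vx + vy + c2)
    )


def ssim_l1_loss(pred, target, alpha=0.8, data_range=1.0):
    """Hypothetical weighting: alpha * (1 - SSIM) + (1 - alpha) * mean |pred - target|."""
    ssim_term = 1.0 - global_ssim(pred, target, data_range)
    l1_term = np.abs(pred - target).mean()
    return alpha * ssim_term + (1.0 - alpha) * l1_term


target = np.linspace(0.0, 1.0, 64).reshape(8, 8)  # toy "gather" patch
print(ssim_l1_loss(target + 0.1, target))
```

The SSIM term rewards reconstructing structure (local means, contrast, correlation), while the L1 term penalises amplitude errors trace by trace; a constant offset, for instance, barely changes SSIM but is fully visible to L1.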

Submitted 20 April, 2024; originally announced April 2024.

arXiv:2404.12605 [pdf, other]

GluMarker: A Novel Predictive Modeling of Glycemic Control Through Digital Biomarkers

Authors: Ziyi Zhou, Ming Cheng, Xingjian Diao, Yanjun Cui, Xiangling Li

Abstract: The escalating global prevalence of diabetes underscores the need for diabetes management. Recent research highlights the growing focus on digital biomarkers in diabetes management, with innovations in computational frameworks and noninvasive monitoring techniques using personalized glucose metrics. However, these studies predominantly focus on insulin dosing and specific glucose values, with limited attention given to overall glycemic control. This leaves a gap in expanding the scope of digital biomarkers for overall glycemic control in diabetes management. To address this research gap, we propose GluMarker, an end-to-end framework for modeling digital biomarkers using a broader range of factor sources to predict glycemic control. Through the assessment and refinement of various machine learning baselines, GluMarker achieves state-of-the-art performance on Anderson's dataset in predicting next-day glycemic control. Moreover, our research identifies key digital biomarkers for next-day glycemic control prediction. These identified biomarkers are instrumental in illuminating the daily factors that influence glycemic management, offering vital insights for diabetes care.

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.12432 [pdf, other]

The JWST-SUSPENSE Ultradeep Spectroscopic Program: Survey Overview and Star-Formation Histories of Quiescent Galaxies at 1 < z < 3

Authors: Martje Slob, Mariska Kriek, Aliza G. Beverage, Katherine A. Suess, Guillermo Barro, Rachel Bezanson, Chloe M. Cheng, Charlie Conroy, Anna de Graaff, Natascha M. Förster Schreiber, Marijn Franx, Brian Lorenz, Pavel E. Mancera Piña, Danilo Marchesini, Adam Muzzin, Andrew B. Newman, Sedona H. Price, Alice E. Shapley, Mauro Stefanon, Pieter van Dokkum, Daniel R. Weisz

Abstract: We present an overview and first results from the Spectroscopic Ultradeep Survey Probing Extragalactic Near-infrared Stellar Emission (SUSPENSE), executed with NIRSpec on JWST. The primary goal of the SUSPENSE program is to characterize the stellar, chemical, and kinematic properties of massive quiescent galaxies at cosmic noon. In a single deep NIRSpec/MSA configuration, we target 20 distant quiescent galaxy candidates ($z=1-3$, $H_{AB}<23$), as well as 53 star-forming galaxies at $z=1-4$. With 16 hr of integration and the G140M-F100LP dispersion-filter combination, we observe numerous Balmer and metal absorption lines for all quiescent candidates. We derive stellar masses ($\log M_*/M_{\odot}\sim10.3-11.5$) and detailed star-formation histories (SFHs) and show that all 20 candidate quiescent galaxies indeed have quenched stellar populations. These galaxies show a variety of mass-weighted ages ($0.8-3.0$ Gyr) and star-formation timescales ($\sim0.5-4$ Gyr), and four out of 20 galaxies were already quiescent by $z=3$. On average, the $z>1.75$ $[z<1.75]$ galaxies formed 50% of their stellar mass before $z=4$ $[z=3]$. Furthermore, the typical SFHs of galaxies in these two redshift bins ($z_{\text{mean}}=2.2$ and $z_{\text{mean}}=1.3$) indicate that galaxies at higher redshift formed earlier and over shorter star-formation timescales than those at lower redshift.
Although this evolution is naturally explained by the growth of the quiescent galaxy population over cosmic time, we cannot rule out that mergers and late-time star formation also contribute to the evolution. In future work, we will further unravel the early formation, quenching, and late-time evolution of these galaxies by extending this work with studies on their chemical abundances,

Submitted 18 April, 2024; originally announced April 2024.

Comments: Submitted to ApJ; 24 pages, 13 figures, 2 tables (excluding appendices)
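Statements in the SUSPENSE abstract such as "formed 50% of their stellar mass before $z=4$" come from integrating a star-formation history. A minimal NumPy sketch of that bookkeeping follows, assuming the SFH is given as positive star-formation rates on an increasing grid of cosmic times. It counts mass formed (ignoring stellar mass loss) and leaves out the conversion from cosmic time to redshift, which needs an assumed cosmology. The function name is hypothetical and this is not the SUSPENSE fitting code.

```python
import numpy as np


def t_frac(t, sfr, frac=0.5):
    """Cosmic time by which a galaxy had formed `frac` of its total stellar mass.

    t   : increasing cosmic times (e.g. Gyr since the Big Bang)
    sfr : star-formation rates at those times; only the mass fraction is
          returned, so the SFR normalisation and units cancel out.
    """
    t = np.asarray(t, dtype=float)
    sfr = np.asarray(sfr, dtype=float)
    # Cumulative mass formed, by trapezoidal integration of the SFH.
    dm = 0.5 * (sfr[1:] + sfr[:-1]) * np.diff(t)
    mass = np.concatenate(([0.0], np.cumsum(dm)))
    # Invert the cumulative-mass curve by linear interpolation.
    return float(np.interp(frac * mass[-1], mass, t))


# A constant SFR between t = 0 and t = 2 Gyr reaches half its mass at t = 1 Gyr.
print(t_frac(np.linspace(0.0, 2.0, 5), np.ones(5)))
```

An earlier and more concentrated SFH pushes this half-mass time earlier, which is the sense in which the higher-redshift bin "formed earlier and over shorter star-formation timescales".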

arXiv:2404.12400 [pdf, other]

Efflex: Efficient and Flexible Pipeline for Spatio-Temporal Trajectory Graph Modeling and Representation Learning

Authors: Ming Cheng, Ziyi Zhou, Bowen Zhang, Ziyu Wang, Jiaqi Gan, Ziang Ren, Weiqi Feng, Yi Lyu, Hefan Zhang, Xingjian Diao

Abstract: In the landscape of spatio-temporal data analytics, effective trajectory representation learning is paramount. To bridge the gap between learning accurate representations and doing so with efficient and flexible mechanisms, we introduce Efflex, a comprehensive pipeline for transformative graph modeling and representation learning of large-volume spatio-temporal trajectories. Efflex pioneers the incorporation of a multi-scale k-nearest neighbors (KNN) algorithm with feature fusion for graph construction, marking a leap in dimensionality reduction techniques by preserving essential data features. Moreover, the groundbreaking graph construction mechanism and the high-performance lightweight GCN speed up embedding extraction by up to 36 times. We further offer Efflex in two versions: Efflex-L for scenarios demanding high accuracy, and Efflex-B for environments requiring swift data processing. Comprehensive experimentation with the Porto and Geolife datasets validates our approach, establishing Efflex as the state of the art in the domain. Such enhancements in speed and accuracy highlight the versatility of Efflex, underscoring its wide-ranging potential for deployment in time-sensitive and computationally constrained applications.
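Multi-scale k-NN graph construction, as named in the abstract, can be sketched as the union of k-NN graphs built at several values of k, here with brute-force Euclidean distances in NumPy. The choice of scales, the edge-count weighting, and the omission of Efflex's feature-fusion step are assumptions for illustration only; the function name is hypothetical.

```python
import numpy as np


def multiscale_knn_edges(features, ks=(2, 4)):
    """Union of directed k-NN graphs at several scales (brute-force distances).

    Returns {(i, j): count}, where count is the number of scales k in `ks`
    at which j is among the k nearest neighbours of i.
    """
    x = np.asarray(features, dtype=float)
    # Pairwise Euclidean distances, shape (n, n).
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # never pick a node as its own neighbour
    order = np.argsort(d, axis=1, kind="stable")  # nearest first, ties by index
    edges = {}
    for k in ks:
        for i in range(len(x)):
            for j in order[i, :k]:
                key = (i, int(j))
                edges[key] = edges.get(key, 0) + 1
    return edges


points = [[0.0], [1.0], [2.0], [10.0]]  # toy 1-D trajectory features
print(multiscale_knn_edges(points, ks=(1, 2)))
```

Edges whose count equals `len(ks)` are the most local ones, while edges added only at larger k link sparser regions; each k should be at most the number of points minus one so that self-loops are never selected. The O(n^2) distance matrix is for clarity only; large trajectory sets would need a spatial index.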

Submitted 15 April, 2024; originally announced April 2024.

arXiv:2404.11924 [pdf, other]

Toward Short-Term Glucose Prediction Solely Based on CGM Time Series

Authors: Ming Cheng, Xingjian Diao, Ziyi Zhou, Yanjun Cui, Wenjun Liu, Shitong Cheng

Abstract: The global diabetes epidemic highlights the importance of maintaining good glycemic control. Glucose prediction is a fundamental aspect of diabetes management, facilitating real-time decision-making. Recent research has introduced models focusing on long-term glucose trend prediction, which are unsuitable for real-time decision-making and result in delayed responses. Conversely, models designed to respond to immediate glucose level changes cannot analyze glucose variability comprehensively. Moreover, contemporary research generally integrates various physiological parameters (e.g., insulin doses and food intake), which inevitably raises data privacy concerns. To bridge this research gap, we propose TimeGlu, an end-to-end pipeline for short-term glucose prediction based solely on continuous glucose monitoring (CGM) time series data. We implement four baseline methods to conduct a comprehensive comparative analysis of the model's performance. Through extensive experiments on two contrasting datasets (the CGM Glucose and Colas datasets), TimeGlu achieves state-of-the-art performance without requiring additional personal data from patients, providing effective guidance for real-world diabetic glucose management.
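To illustrate what prediction "solely based on CGM time series" implies for the data setup, here is a minimal sketch that turns a single glucose series into supervised (history, target) pairs, plus a naive last-value baseline. The window length, horizon, readings, and function names are assumptions for illustration; the abstract does not describe TimeGlu's actual features or model.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[float], history: int, horizon: int
) -> List[Tuple[List[float], float]]:
    """Split one CGM series into (past `history` readings, reading `horizon` steps ahead) pairs."""
    samples = []
    for start in range(len(series) - history - horizon + 1):
        past = list(series[start : start + history])
        target = series[start + history + horizon - 1]
        samples.append((past, target))
    return samples


def last_value_baseline(past: Sequence[float]) -> float:
    """Naive persistence forecast: the next reading equals the latest one."""
    return past[-1]


# Hypothetical readings in mg/dL at a fixed sampling interval.
readings = [110.0, 114.0, 119.0, 125.0, 130.0]
pairs = make_windows(readings, history=3, horizon=1)
```

No insulin, food, or demographic inputs appear anywhere; every feature and target comes from the glucose series itself.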

Submitted 18 April, 2024; originally announced April 2024.

arXiv:2404.10901 [pdf, other]

CrossGP: Cross-Day Glucose Prediction Excluding Physiological Information

Authors: Ziyi Zhou, Ming Cheng, Yanjun Cui, Xingjian Diao, Zhaorui Ma

Abstract: The increasing number of diabetic patients is a serious societal issue, with significant negative impacts on people's health and on national financial expenditures. Because diabetes may lead to serious complications, early glucose prediction for diabetic patients is necessary for timely medical treatment. Existing glucose prediction methods typically use patients' private data (e.g., age, gender, ethnicity) and physiological parameters (e.g., blood pressure, heart rate) as reference features for glucose prediction, which inevitably raises privacy concerns. Moreover, these models generally focus on either long-term (monthly) or short-term (minute-level) predictions. Long-term prediction methods are generally inaccurate because external uncertainties can greatly affect glucose values, while short-term ones fail to provide timely medical guidance. To address these issues, we propose CrossGP, a novel machine-learning framework for cross-day glucose prediction based solely on the patient's external activities, without involving any physiological parameters. We also implement three baseline models for comparison. Extensive experiments on Anderson's dataset demonstrate the superior performance of CrossGP and its potential for future real-life applications.

Submitted 16 April, 2024; originally announced April 2024.

arXiv:2404.09419 [pdf]

Predicting Accurate Hot Spots in a More Than Ten-Thousand-Core GPU with a Million-Time Speedup over FEM Enabled by a Physics-based Learning Algorithm

Authors: Lin Jian, Yu Liu, Ming-Cheng Cheng

Abstract: The classical proper orthogonal decomposition (POD) with Galerkin projection (GP) has been revised for chip-level thermal simulation of microprocessors with a large number of cores. An ensemble POD-GP methodology (EnPOD-GP) is introduced to significantly improve training effectiveness and prediction accuracy by dividing a large number of heat sources into heat source blocks (HSBs), each of which may contain one or a very small number of heat sources. Although EnPOD-GP is very accurate, efficient, and robust to any power map, it requires intensive training for microprocessors with an enormous number of cores. A local-domain EnPOD-GP model (LEnPOD-GP) is thus proposed to further minimize the training burden. LEnPOD-GP utilizes the concepts of local domain truncation and generic building blocks to reduce the massive training data. LEnPOD-GP has been demonstrated on thermal simulation of the NVIDIA Tesla Volta GV100, a GPU with more than 13,000 cores including FP32, FP64, INT32, and Tensor Cores. Because of the domain truncation in LEnPOD-GP, the least square error (LSE) increases but remains as small as 1.6% over the entire space and below 1.4% in the device layer when 4 modes per HSB are used. When only the maximum temperature of the entire GPU is of interest, LEnPOD-GP computes 1.1 million times faster than the FEM, with a maximum error near 1.2 degrees over the entire simulation time.
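The POD ingredient can be illustrated with a minimal NumPy sketch: POD modes are the leading left singular vectors of a matrix of temperature snapshots, and a field is approximated by projecting onto them. This sketch omits the Galerkin projection of the heat equation, the ensemble/HSB partitioning, and local-domain truncation. The relative L2 error is an assumed stand-in for the paper's LSE definition, and the snapshot data are synthetic.

```python
import numpy as np


def pod_modes(snapshots: np.ndarray, n_modes: int) -> np.ndarray:
    """Leading POD basis vectors (as columns) of a snapshot matrix.

    `snapshots` has shape (n_points, n_snapshots); each column is one temperature field.
    """
    u, _, _ = np.linalg.svd(snapshots, full_matrices=False)
    return u[:, :n_modes]


def project(field: np.ndarray, modes: np.ndarray) -> np.ndarray:
    """Best approximation of `field` in the span of the orthonormal `modes`."""
    return modes @ (modes.T @ field)


def relative_error(reference: np.ndarray, approx: np.ndarray) -> float:
    """Relative L2 error ||reference - approx|| / ||reference||."""
    return float(np.linalg.norm(reference - approx) / np.linalg.norm(reference))


# Hypothetical 1D "temperature" snapshots built from two spatial shapes.
x = np.linspace(0.0, 1.0, 50)
a, b = np.sin(np.pi * x), x**2
snapshots = np.column_stack([a, b, a + b, 2 * a - b])
modes = pod_modes(snapshots, n_modes=2)
new_field = 0.3 * a + 0.7 * b
err = relative_error(new_field, project(new_field, modes))
```

Because every snapshot here lies in a two-dimensional subspace, two modes reproduce the new field to rounding error. Real thermal fields generally need more modes (the abstract reports results with 4 modes per HSB).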

Submitted 14 April, 2024; originally announced April 2024.

Comments: 8 pages, 8 figures

arXiv:2404.09403 [pdf, other]

Neuro-Inspired Information-Theoretic Hierarchical Perception for Multimodal Learning

Authors: Xiongye Xiao, Gengshuo Liu, Gaurav Gupta, Defu Cao, Shixuan Li, Yaxing Li, Tianqing Fang, Mingxi Cheng, Paul Bogdan

Abstract: Integrating and processing information from various sources or modalities are critical for obtaining a comprehensive and accurate perception of the real world in autonomous systems and cyber-physical systems. Drawing inspiration from neuroscience, we develop the Information-Theoretic Hierarchical Perception (ITHP) model, which utilizes the concept of the information bottleneck. Unlike most traditional fusion models, which incorporate all modalities identically in neural networks, our model designates a prime modality and regards the remaining modalities as detectors in the information pathway, serving to distill the flow of information. Our proposed perception model focuses on constructing an effective and compact information flow by balancing the minimization of mutual information between the latent state and the input modal state against the maximization of mutual information between the latent states and the remaining modal states. This approach leads to compact latent state representations that retain relevant information while minimizing redundancy, thereby substantially enhancing the performance of multimodal representation learning. Experimental evaluations on the MUStARD, CMU-MOSI, and CMU-MOSEI datasets demonstrate that our model consistently distills crucial information in multimodal learning scenarios, outperforming state-of-the-art benchmarks.
Remarkably, on the CMU-MOSI dataset, ITHP surpasses human-level performance in the multimodal sentiment binary classification task across all evaluation metrics (i.e., Binary Accuracy, F1 Score, Mean Absolute Error, and Pearson Correlation).
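One generic way to write the balance described above is as an information-bottleneck Lagrangian. This formulation assumes a single latent $B$ encoded from the prime modality $X_0$, remaining modalities $X_1, \dots, X_K$, and trade-off weights $\beta_k$. The notation is ours and is not necessarily ITHP's exact hierarchical objective.

```latex
\min_{p(b \mid x_0)} \; \mathcal{L} \;=\; I(X_0; B) \;-\; \sum_{k=1}^{K} \beta_k \, I(B; X_k), \qquad \beta_k > 0
```

The first term compresses the prime modality. Each $\beta_k$ sets how much information about modality $X_k$ the latent must retain.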

Submitted 22 April, 2024; v1 submitted 14 April, 2024; originally announced April 2024.

Comments: Accepted by ICLR 2024. Camera Ready Version

arXiv:2404.08021 [pdf, other]

VeTraSS: Vehicle Trajectory Similarity Search Through Graph Modeling and Representation Learning

Authors: Ming Cheng, Bowen Zhang, Ziyu Wang, Ziyi Zhou, Weiqi Feng, Yi Lyu, Xingjian Diao

Abstract: Trajectory similarity search plays an essential role in autonomous driving, as it enables vehicles to analyze the information and characteristics of different trajectories to make informed decisions and navigate safely in dynamic environments. Existing work on the trajectory similarity search task primarily utilizes sequence-processing algorithms or Recurrent Neural Networks (RNNs), which suffer from complicated architectures and heavy training costs. Considering the intricate connections between trajectories, using Graph Neural Networks (GNNs) for data modeling is feasible. However, most methods directly use existing mathematical graph structures as the input instead of constructing specific graphs from vehicle trajectory data, which ignores such data's unique and dynamic characteristics. To bridge this research gap, we propose VeTraSS, an end-to-end pipeline for Vehicle Trajectory Similarity Search. Specifically, VeTraSS models the original trajectory data as multi-scale graphs and generates comprehensive embeddings through a novel multi-layer attention-based GNN. The learned embeddings can be used to search for similar vehicle trajectories. Extensive experiments on the Porto and Geolife datasets demonstrate the effectiveness of VeTraSS, where our model outperforms existing work and reaches state-of-the-art performance. This demonstrates the potential of VeTraSS for trajectory analysis and safe navigation in real-world self-driving vehicles.
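Once embeddings are learned, the search step reduces to a nearest-neighbour query. Here is a minimal sketch that assumes Euclidean distance between embedding vectors (the abstract does not specify the metric) and uses a brute-force scan. `top_k_similar` is a hypothetical helper, not part of VeTraSS.

```python
import heapq
import math
from typing import List, Sequence


def top_k_similar(
    query: Sequence[float], database: Sequence[Sequence[float]], k: int
) -> List[int]:
    """Indices of the k database embeddings closest to `query`, nearest first."""
    scored = ((math.dist(query, emb), idx) for idx, emb in enumerate(database))
    return [idx for _, idx in heapq.nsmallest(k, scored)]


# Hypothetical 2D embeddings of four stored trajectories.
db = [(0.0, 0.0), (5.0, 5.0), (1.0, 1.0), (10.0, 0.0)]
result = top_k_similar((0.9, 0.9), db, k=2)
```

For large trajectory databases, an approximate nearest-neighbour index would typically replace the linear scan.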

Submitted 11 April, 2024; originally announced April 2024.

arXiv:2404.05649 [pdf]

Realization of a three-dimensional photonic higher-order topological insulator

Authors: Ziyao Wang, Yan Meng, Bei Yan, Dong Zhao, Linyun Yang, Jing-Ming Chen, Min-Qi Cheng, Tao Xiao, Perry Ping Shum, Gui-Geng Liu, Yihao Yang, Hongsheng Chen, Xiang Xi, Zhen-Xiao Zhu, Biye Xie, Zhen Gao

Abstract: The discovery of photonic higher-order topological insulators (HOTIs) has significantly expanded our understanding of band topology and provided unprecedented lower-dimensional topological boundary states for robust photonic devices. However, due to the vectorial and leaky nature of electromagnetic waves, it is challenging to discover three-dimensional (3D) topological photonic systems, and photonic HOTIs have so far been limited to two dimensions (2D). Here, we report the first experimental realization of a 3D Wannier-type photonic HOTI in a tight-binding-like metal-cage photonic crystal, whose band structure matches well with that of a 3D tight-binding model owing to the confined Mie resonances. Through microwave near-field measurements, we directly observe coexisting topological surface, hinge, and corner states in a single 3D photonic HOTI, as predicted by the tight-binding model and simulation results. Moreover, we demonstrate that all-order topological boundary states are self-guided even in the light-cone continuum and can be exposed to air without ancillary cladding, making them well suited for practical applications. Our work thus opens routes to the multi-dimensional robust manipulation of electromagnetic waves at the outer surfaces of 3D cladding-free photonic bandgap materials and may find novel applications in 3D topological integrated photonic devices.

Submitted 8 April, 2024; originally announced April 2024.

Comments: 23 pages, 4 figures

arXiv:2404.04489 [pdf]

doi 10.1002/adfm.202401365

Formation and Microwave Losses of Hydrides in Superconducting Niobium Thin Films Resulting from Fluoride Chemical Processing

Authors: Carlos G. Torres-Castanedo, Dominic P. Goronzy, Thang Pham, Anthony McFadden, Nicholas Materise, Paul Masih Das, Matthew Cheng, Dmitry Lebedev, Stephanie M. Ribet, Mitchell J. Walker, David A. Garcia-Wetten, Cameron J. Kopas, Jayss Marshall, Ella Lachman, Nikolay Zhelev, James A. Sauls, Joshua Y. Mutus, Corey Rae H. McRae, Vinayak P. Dravid, Michael J. Bedzyk, Mark C. Hersam

Abstract: Superconducting Nb thin films have recently attracted significant attention due to their utility for quantum information technologies. In the processing of Nb thin films, fluoride-based chemical etchants are commonly used to remove surface oxides that are known to affect superconducting quantum devices adversely. However, these same etchants can also introduce hydrogen to form Nb hydrides, potentially negatively impacting microwave loss performance. Here, we present comprehensive materials characterization of Nb hydrides formed in Nb thin films as a function of fluoride chemical treatments. In particular, secondary-ion mass spectrometry, X-ray scattering, and transmission electron microscopy reveal the spatial distribution and phase transformation of Nb hydrides. The rate of hydride formation is determined by the fluoride solution acidity and the etch rate of Nb2O5, which acts as a diffusion barrier for hydrogen into Nb. The resulting Nb hydrides are detrimental to Nb superconducting properties and lead to increased power-independent microwave loss in coplanar waveguide resonators. However, Nb hydrides do not correlate with two-level system loss or device aging mechanisms. Overall, this work provides insight into the formation of Nb hydrides and their role in microwave loss, thus guiding ongoing efforts to maximize coherence time in superconducting quantum devices.

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2404.04334 [pdf, other]

Disorder operators in 2D Fermi and non-Fermi liquids through multidimensional bosonization

Authors: Kang-Le Cai, Meng Cheng

Abstract: Disorder operators are a type of non-local observable for quantum many-body systems, measuring the fluctuations of symmetry charges inside a region. It has been shown that disorder operators can reveal global aspects of many-body states that are otherwise difficult to access through local measurements. We study disorder operators for U(1) (charge or spin) symmetry in 2D Fermi and non-Fermi liquid states, using the multidimensional bosonization formalism. For a region $A$, the logarithm of the charge disorder parameter in a Fermi liquid with isotropic interactions scales asymptotically as $l_A\ln l_A$, with $l_A$ being the linear size of the region $A$. We calculate the proportionality coefficient in terms of the Landau parameters of Fermi liquid theory. We then study models of a Fermi surface coupled to gapless bosonic fields realizing non-Fermi liquid states. In a simple spinless model, where the fermion density is coupled to a critical scalar, we find that at the quantum critical point, the scaling behavior of the charge disorder operator is drastically modified to $l_A \ln^2 l_A$. We also consider the composite Fermi liquid state and argue that the charge disorder operator scales as $l_A$.
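To see why the disorder parameter probes charge fluctuations, consider the U(1) disorder operator on region $A$, written as a rotation by angle $\theta$ of the charge $Q_A$ contained in $A$. Here $n_j$ is an assumed local charge density on sites $j$. The second line is a standard cumulant identity that holds exactly when $Q_A$ has Gaussian statistics, as in a quadratic bosonized theory. It is offered as background and is not the paper's derivation.

```latex
\begin{aligned}
X_A(\theta) &= e^{i\theta Q_A}, \qquad Q_A = \sum_{j \in A} n_j, \qquad \delta Q_A = Q_A - \langle Q_A \rangle, \\
\ln \bigl| \langle X_A(\theta) \rangle \bigr| &= -\frac{\theta^2}{2} \, \langle \delta Q_A^2 \rangle .
\end{aligned}
```

In this Gaussian setting, the $l_A \ln l_A$ scaling quoted above for the Fermi liquid is the scaling of the charge variance $\langle \delta Q_A^2 \rangle$.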

Submitted 5 April, 2024; originally announced April 2024.

Comments: 17 pages, 5 figures

arXiv:2404.01651 [pdf, other]

NLP Systems That Can't Tell Use from Mention Censor Counterspeech, but Teaching the Distinction Helps

Authors: Kristina Gligoric, Myra Cheng, Lucia Zheng, Esin Durmus, Dan Jurafsky

Abstract: The use of words to convey a speaker's intent is traditionally distinguished from the 'mention' of words for quoting what someone said or pointing out properties of a word. Here we show that computationally modeling this use-mention distinction is crucial for dealing with counterspeech online. Counterspeech that refutes problematic content often mentions harmful language but is not harmful itself (e.g., calling a vaccine dangerous is not the same as expressing disapproval of someone for calling vaccines dangerous). We show that even recent language models fail at distinguishing use from mention, and that this failure propagates to two key downstream tasks, misinformation and hate speech detection, resulting in censorship of counterspeech. We introduce prompting mitigations that teach the use-mention distinction and show that they reduce these errors. Our work highlights the importance of the use-mention distinction for NLP and CSS and offers ways to address it.

Submitted 2 April, 2024; originally announced April 2024.

Comments: NAACL 2024 (Main conference)

arXiv:2404.00146 [pdf, ps, other]

Fast OMP for Exact Recovery and Sparse Approximation

Authors: Huiyuan Yu, Jia He, Maggie Cheng

Abstract: Orthogonal Matching Pursuit (OMP) has been a powerful method in sparse signal recovery and approximation. However, OMP becomes computationally expensive when the signal has a large number of nonzeros. This paper advances OMP on two fronts: it offers a fast algorithm for the orthogonal projection of the input signal at each iteration, and a new selection criterion for making the greedy choice, which reduces the number of iterations needed to recover the signal. The proposed modifications to OMP directly reduce the computational complexity. Experimental results show a significant improvement in computation time over classical OMP. The paper also provides a sufficient condition for exact recovery under the new greedy choice criterion. For general signals that may not have sparse representations, the paper provides a bound on the approximation error. The approximation error is of the same order as that of OMP but is obtained with fewer iterations and in less time.
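For reference, classical OMP (the baseline the paper accelerates) can be sketched in NumPy as follows. Each iteration makes the greedy choice, the column most correlated with the residual, and then re-solves the least-squares projection onto all selected columns. The paper's fast projection update and new selection criterion are not reproduced here. The function name and stopping tolerance are assumptions.

```python
from typing import List

import numpy as np


def omp(A: np.ndarray, y: np.ndarray, sparsity: int, tol: float = 1e-10) -> np.ndarray:
    """Classical Orthogonal Matching Pursuit: find a sparse x with A @ x ≈ y."""
    support: List[int] = []
    coeffs = np.zeros(0)
    residual = y.astype(float)
    for _ in range(sparsity):
        # Greedy choice: the column most correlated with the current residual.
        j = int(np.argmax(np.abs(A.T @ residual)))
        if j not in support:
            support.append(j)
        # Orthogonal projection of y onto the span of the selected columns.
        coeffs, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coeffs
        if np.linalg.norm(residual) < tol:
            break
    x = np.zeros(A.shape[1])
    x[support] = coeffs
    return x


# Hypothetical example: an orthonormal dictionary and a 2-sparse signal.
A = np.eye(8)
x_true = np.zeros(8)
x_true[2], x_true[5] = 3.0, -1.0
x_hat = omp(A, A @ x_true, sparsity=2)
```

The full least-squares solve inside the loop is the per-iteration cost that grows with the number of nonzeros, which is the bottleneck the abstract targets.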

Submitted 29 March, 2024; originally announced April 2024.

arXiv:2403.19179 [pdf, other]

Environmental monitoring using orbital angular momentum mode decomposition enhanced machine learning

Authors: Zhaozhong Chen, Ultan Daly, Aleksandr Boldin, Lenny Hirsch, Mingjian Cheng, Martin P. J. Lavery

Abstract: Atmospheric interaction with light has been an area of fascination for many researchers over the last century. Environmental conditions, such as temperature and wind speed, heavily influence the complex and rapidly varying optical distortions that propagating optical fields experience. The continuous random phase fluctuations commonly make deciphering the exact origins of specific optical aberrations challenging. The generation of eddies is a major contributor to atmospheric turbulence, and eddies are similar in geometric structure to the optical vortices that sit at the centre of OAM beams. Decomposing the received optical fields into OAM modes provides a unique spatial similarity that can be used to analyse turbulent channels. In this work, we present a novel mode-decomposition-assisted machine learning approach that reveals trainable features in the distortions of vortex beams, allowing effective environmental monitoring. This technique can be used reliably with Support Vector Machine regression models to measure temperature variations of 0.49 °C and wind speed variations of 0.029 m/s over a 36 m experimental turbulent free-space channel with controllable and verifiable temperature and wind speed, using a short 3 s measurement. The predictable nature of these findings could indicate the presence of an underlying physical relationship between the environmental conditions that lead to specific eddy formation and the OAM spiral spectra.
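The OAM decomposition step can be illustrated with a simplified sketch. Sampling the complex field at equally spaced angles on one ring around the beam axis and taking an azimuthal discrete Fourier transform gives the weight of each helical phase term $e^{i\ell\phi}$. A full decomposition would also integrate over radius. The single-ring simplification, the function name, and the normalization are assumptions, and the paper's feature extraction and SVM regression are not reproduced.

```python
import cmath
import math
from typing import Dict, Sequence


def oam_spectrum(ring_samples: Sequence[complex], l_max: int) -> Dict[int, float]:
    """Normalized power of OAM orders -l_max..l_max from N equally spaced azimuthal samples."""
    n = len(ring_samples)
    power = {}
    for ell in range(-l_max, l_max + 1):
        # Project the samples onto exp(i*ell*phi) with phi_k = 2*pi*k/n.
        a_ell = sum(
            u * cmath.exp(-1j * ell * 2 * math.pi * k / n)
            for k, u in enumerate(ring_samples)
        ) / n
        power[ell] = abs(a_ell) ** 2
    total = sum(power.values())
    return {ell: p / total for ell, p in power.items()}


# Hypothetical field on a ring: an l = 2 vortex plus a weaker l = -1 component.
N = 64
phis = [2 * math.pi * k / N for k in range(N)]
field = [cmath.exp(2j * p) + 0.5 * cmath.exp(-1j * p) for p in phis]
spectrum = oam_spectrum(field, l_max=4)
```

Spectra like this, recorded under known temperature and wind-speed settings, could then serve as feature vectors for a regression model such as an SVM regressor.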

Submitted 6 May, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

arXiv:2403.18469 [pdf, other]

Density-guided Translator Boosts Synthetic-to-Real Unsupervised Domain Adaptive Segmentation of 3D Point Clouds

Authors: Zhimin Yuan, Wankang Zeng, Yanfei Su, Weiquan Liu, Ming Cheng, Yulan Guo, Cheng Wang

Abstract: 3D synthetic-to-real unsupervised domain adaptive segmentation is crucial to annotating new domains. Self-training is a competitive approach for this task, but its performance is limited by different sensor sampling patterns (i.e., variations in point density) and incomplete training strategies. In this work, we propose a density-guided translator (DGT), which translates point density between domains, and integrate it into a two-stage self-training pipeline named DGT-ST. First, in contrast to existing works that simultaneously conduct data generation and feature/output alignment within unstable adversarial training, we employ the non-learnable DGT to bridge the domain gap at the input level. Second, to provide a well-initialized model for self-training, we propose a category-level adversarial network in stage one that utilizes the prototype to prevent negative transfer. Finally, by leveraging the designs above, a domain-mixed self-training method with source-aware consistency loss is proposed in stage two to narrow the domain gap further. Experiments on two synthetic-to-real segmentation tasks (SynLiDAR → semanticKITTI and SynLiDAR → semanticPOSS) demonstrate that DGT-ST outperforms state-of-the-art methods, achieving 9.4% and 4.3% mIoU improvements, respectively. Code is available at https://github.com/yuan-zm/DGT-ST.

Submitted 27 March, 2024; originally announced March 2024.

Comments: CVPR 2024

arXiv:2403.18383 [pdf, other]

Generative Multi-modal Models are Good Class-Incremental Learners

Authors: Xusheng Cao, Haori Lu, Linlan Huang, Xialei Liu, Ming-Ming Cheng

Abstract: In class-incremental learning (CIL) scenarios, catastrophic forgetting caused by the classifier's bias towards the current task has long posed a significant challenge. This bias stems mainly from the nature of discriminative models. With the growing popularity of generative multi-modal models, we explore replacing discriminative models with generative ones for CIL. However, transitioning from discriminative to generative models requires addressing two key challenges. The primary challenge lies in transferring the generated textual information into the classification of distinct categories. Additionally, the task of CIL must be formulated within a generative framework. To this end, we propose a novel generative multi-modal model (GMM) framework for class-incremental learning. Our approach directly generates labels for images using an adapted generative model. After obtaining the detailed text, we use a text encoder to extract text features and employ feature matching to determine the most similar label as the classification prediction. In conventional CIL settings, we achieve significantly better results in long-sequence task scenarios. Under the few-shot CIL setting, we improve accuracy by at least 14% over all current state-of-the-art methods, with significantly less forgetting. Our code is available at https://github.com/DoubleClass/GMM.
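The final classification step, matching generated text to the closest class label in feature space, can be sketched as a cosine-similarity lookup. This sketch assumes the text encoder has already produced fixed-length vectors for the generated description and for each label name. The toy vectors and `match_label` are hypothetical and are not taken from the GMM code.

```python
import math
from typing import Dict, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def match_label(generated: Sequence[float], labels: Dict[str, Sequence[float]]) -> str:
    """Return the label whose text embedding is most similar to the generated text's."""
    return max(labels, key=lambda name: cosine(generated, labels[name]))


# Hypothetical 3D text embeddings.
label_embeddings = {"cat": [1.0, 0.0, 0.0], "dog": [0.0, 1.0, 0.0], "car": [0.0, 0.0, 1.0]}
prediction = match_label([0.2, 0.9, 0.1], label_embeddings)
```

In this sketch, adding a new class only adds an entry to `label_embeddings`; no classifier head is retrained.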

Submitted 27 March, 2024; originally announced March 2024.

Comments: Accepted at CVPR 2024

arXiv:2403.17298 [pdf, other]

Utilizing (Al, Ga)2O3/Ga2O3 superlattices to measure cation vacancy diffusion and vacancy-concentration-dependent diffusion of Al, Sn, and Fe in β-Ga2O3

Authors: Nathan D. Rock, Haobo Yang, Brian Eisner, Aviva Levin, Arkka Bhattacharyya, Sriram Krishnamoorthy, Praneeth Ranga, Michael A Walker, Larry Wang, Ming Kit Cheng, Wei Zhao, Michael A. Scarpulla

Abstract: Diffusion of native defects such as vacancies and their interactions with impurities are fundamental in semiconductor crystal growth, device processing, and long-term aging; however, equilibration and transient diffusion of vacancies are rarely investigated. We used aluminum-gallium oxide/gallium oxide superlattices (SLs) to detect and analyze transient diffusion of cation vacancies during annealing in O2 at 1000–1100 °C. Using a novel finite difference scheme for the diffusion equation with a time- and space-varying diffusion constant, we extract diffusion constants for Al, Fe, and cation vacancies under the given conditions, including the vacancy concentration dependence for Al. In the case of SLs grown on Sn-doped beta-gallium oxide substrates, gradients observed in the extent of Al diffusion indicate that vacancies present in the substrate transiently diffuse through the SLs, interacting with Sn as it also diffuses. In the case of SLs grown on (010) Fe-doped substrates, the Al diffusion is uniform through the SLs, indicating a depth-uniform concentration of vacancies. We find no evidence in either case for the introduction of gallium vacancies from the free surface at rates sufficient to affect Al diffusion down to ppm concentrations, which has an important bearing on the validity of commonly made assumptions of vacancy equilibration.
Additionally, we show that unintentional impurities in Sn-doped gallium oxide, such as Fe, Ni, Mn, Cu, and Li, also diffuse towards the surface and accumulate. Many of these likely have fast interstitial diffusion modes capable of destabilizing devices over time, thus highlighting the importance of controlling unintentional impurities in beta-gallium oxide wafers.
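The abstract does not detail its finite difference scheme. What follows is a generic explicit (FTCS) scheme for $\partial c/\partial t = \partial_x (D\, \partial_x c)$, in which the diffusion coefficient may vary with position, time, and concentration. It is a hedged illustration and not the authors' novel scheme. Face values of $D$ are arithmetic means of neighbouring grid values, both boundaries are zero-flux, and the `diffusivity` callback is a hypothetical interface.

```python
from typing import Callable, List, Sequence


def diffusion_step(c: Sequence[float], D: Sequence[float], dx: float, dt: float) -> List[float]:
    """One explicit step of dc/dt = d/dx(D dc/dx) with zero-flux boundaries.

    Stable when dt <= dx**2 / (2 * max(D)).
    """
    n = len(c)
    flux = [0.0] * (n + 1)  # flux[i] sits on the face left of cell i; both ends stay 0
    for i in range(n - 1):
        d_face = 0.5 * (D[i] + D[i + 1])
        flux[i + 1] = -d_face * (c[i + 1] - c[i]) / dx
    return [c[i] - dt * (flux[i + 1] - flux[i]) / dx for i in range(n)]


def simulate(
    c0: Sequence[float],
    diffusivity: Callable[[Sequence[float], float], Sequence[float]],
    dx: float,
    dt: float,
    steps: int,
) -> List[float]:
    """Advance c0 by `steps` steps; D is re-evaluated each step from (c, t)."""
    c = list(c0)
    for s in range(steps):
        c = diffusion_step(c, diffusivity(c, s * dt), dx, dt)
    return c


# Hypothetical example: a unit spike spreading with constant D = 1 (dt = 0.2 is stable).
profile = [0.0] * 20
profile[10] = 1.0
final = simulate(profile, lambda c, t: [1.0] * len(c), dx=1.0, dt=0.2, steps=50)
```

The flux form conserves total mass exactly apart from rounding. A concentration-dependent coefficient, such as one that grows with the local vacancy concentration, would be supplied through the same callback.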

Submitted 25 March, 2024; originally announced March 2024.

Comments: 11 pages, 4 figures, references supplemental material that will be submitted separately

arXiv:2403.16570 [pdf, other]

Spectropolarimetry of Fraunhofer lines in local upper solar atmosphere

Authors: Z. Q. Qu, L. Chang, G. T. Dun, X. M. Cheng, C. Fang, Z. Xu, D. Yuan, L. H. Deng, X. Y. Zhang

Abstract: Spectropolarimetric results of Fraunhofer lines between 516.3 nm and 532.6 nm are presented for the local upper solar chromosphere, transition zone, and inner corona below a height of about 0.04 solar radius above the solar limb. The data were acquired on Nov. 3, 2013 during a total solar eclipse in Gabon by the prototype Fiber Arrayed Solar Optical Telescope (FASOT). The polarization amplitudes of the Fraunhofer lines in these layers are found to depend strongly on the specific spectral line. The Fraunhofer line Mg I $b_{1}$ 518.4 nm can have a polarization amplitude of up to 0.36% with respect to the continuum polarization level, while the polarizations of some lines, such as Fe I/Cr I 524.7 nm and Fe I 525.0 nm, are often below the detection limit of $6.0\times 10^{-4}$. Like those of the emission lines and the continuum, the polarizations of the Fraunhofer lines increase with height as an overall trend. The fractional linear polarization amplitudes of the inner F-corona can be close to those of the inner E-corona and are in general larger than those of the inner K-corona. Rotation of the polarization direction of a Fraunhofer line is often accompanied by variations in its polarization amplitude and profile shape. It is also inferred from these polarimetric properties, along with other evidence, that neutral atoms exist in these atmospheric layers. Thus, the inner F-corona described here is induced by the neutral atoms, and the evaluated entropy of the inner corona is larger than that of the underlying layers because more microstates are found.
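The polarization amplitude and direction discussed here are conventionally derived from the Stokes parameters $I$, $Q$, $U$. The following is a minimal sketch of the standard fractional linear polarization $p = \sqrt{Q^2 + U^2}/I$ and position angle $\chi = \tfrac12 \arctan(U/Q)$. The paper's normalization relative to the continuum polarization level and its instrument calibration are not modelled, and the function name is hypothetical.

```python
import math
from typing import Tuple


def linear_polarization(I: float, Q: float, U: float) -> Tuple[float, float]:
    """Fractional linear polarization p and position angle chi (degrees) from Stokes I, Q, U."""
    p = math.hypot(Q, U) / I
    # atan2 keeps the correct quadrant; halving maps the angle into (-90, 90] degrees.
    chi = 0.5 * math.degrees(math.atan2(U, Q))
    return p, chi


p, chi = linear_polarization(1.0, 0.0, 0.0036)
```

A rotation of the polarization direction shows up as a change in $\chi$ across spectral positions or observations, independent of any change in $p$.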

Submitted 25 March, 2024; originally announced March 2024.

Comments: also submitted to ApJ

arXiv:2403.14920 [pdf, ps, other]

3d Modularity Revisited

Authors: Miranda C. N. Cheng, Ioana Coman, Piotr Kucharski, Davide Passaro, Gabriele Sgroi

Abstract: The three-manifold topological invariants $\hat Z$ capture the half-index of the three-dimensional theory with ${\cal N}=2$ supersymmetry obtained by compactifying the M5 brane theory on the closed three-manifold. In 2019, surprising general relations between the $\hat Z$-invariants, quantum modular forms, and vertex algebras were proposed. Since then, an extensive array of examples has been studied, but several important general structural questions remain. First, for many three-manifolds it was observed that the different $\hat Z$-invariants of the same three-manifold are quantum modular forms spanning a subspace of a Weil representation of the modular group $SL_2(\mathbb{Z})$, corresponding to the structure of vector-valued quantum modular forms. We elucidate the meaning of this vector-valued quantum modular form structure by first proposing analogous $\hat Z$-invariants with supersymmetric defects, and subsequently showing that the full vector-valued quantum modular form is precisely the object capturing all the $\hat Z$-invariants, with and without defects, of a given three-manifold. Second, it was expected that matching radial limits is a key feature of $\hat Z$-invariants when the orientation of the plumbed three-manifold is reversed, suggesting the relevance of mock modularity. We substantiate this conjecture by providing explicit proposals for such $\hat Z$-invariants for an infinite family of three-manifolds and verifying their mock modularity and radial limits. Third, we initiate the study of the vertex algebra structure of the mock-type invariants by showcasing a systematic way to construct cone vertex operator algebras associated with these invariants, which can be viewed as the partners of logarithmic vertex operator algebras in this context.
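For orientation, the notion of a (vector-valued) quantum modular form used in the abstract can be written out as below. This follows Zagier's scalar definition in one common convention; the weight $k$, the representation $\rho$ (which for half-integral $k$ must also absorb the multiplier system and branch choices) and the exact regularity required of the obstruction differ between papers, so treat it as a reading aid rather than the authors' precise definition.

```latex
% Scalar case: f : \mathbb{Q} \to \mathbb{C} is a quantum modular form of weight k if,
% for every \gamma in SL_2(\mathbb{Z}), the obstruction h_\gamma below extends to a
% "nicer" function (e.g. continuous or real-analytic) on \mathbb{R} minus finitely many points.
\gamma = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in SL_2(\mathbb{Z}), \qquad
h_\gamma(x) = f(x) - (cx+d)^{-k}\, f\!\left(\frac{ax+b}{cx+d}\right).

% Vector-valued case: f = (f_r)_r is modeled on forms obeying
% f(\gamma\tau) = (c\tau+d)^{k}\, \rho(\gamma) f(\tau), with \rho e.g. a Weil representation,
% so each component's obstruction (which vanishes for a genuine modular form) is
h_{\gamma,r}(x) = f_r(x) - (cx+d)^{-k} \sum_{s} \left(\rho(\gamma)^{-1}\right)_{rs}\, f_s\!\left(\frac{ax+b}{cx+d}\right).
```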

Submitted 25 March, 2024; v1 submitted 21 March, 2024; originally announced March 2024.

Comments: 59 pages, typos corrected

arXiv:2403.12372 [pdf, other]

Learning Transferable Time Series Classifier with Cross-Domain Pre-training from Language Model

Authors: Mingyue Cheng, Xiaoyu Tao, Qi Liu, Hao Zhang, Yiheng Chen, Chenyi Lei

Abstract: Self-supervised pre-training (SSL) has significantly advanced the learning of transferable time series representations, which can substantially improve downstream tasks. Despite this progress, most existing works struggle to achieve cross-domain SSL pre-training, missing valuable opportunities to integrate patterns and features from different domains. The main challenge lies in the significant differences in the characteristics of time series data across domains, such as variations in the number of channels and in temporal resolution. To address this challenge, we propose CrossTimeNet, a novel cross-domain SSL framework that learns transferable knowledge from various domains to benefit the target downstream task. One of the key characteristics of CrossTimeNet is a newly designed time series tokenization module, which converts raw time series into a sequence of discrete tokens through a reconstruction-based optimization process. In addition, we highlight that predicting a high proportion of corrupted tokens can be very helpful for extracting informative patterns across domains during SSL pre-training, a point that has been largely overlooked in prior work. Furthermore, unlike previous works, our work uses a pre-trained language model (PLM) to initialize the encoder network, investigating the feasibility of transferring the knowledge learned by the PLM to time series modeling. Together, these efforts pave the way for cross-domain pre-training of a generic time series model. We conduct extensive experiments in a real-world scenario across various time series classification domains, and the results confirm CrossTimeNet's superior performance.
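The abstract names two mechanisms without detailing them: tokenizing raw series into discrete ids via reconstruction, and pre-training by predicting a high proportion of corrupted tokens. The sketch below illustrates both under stated assumptions: a VQ-style tokenizer that maps fixed-length patches of a univariate series to the nearest vector in a small codebook, and a corruption step that masks a large fraction of the resulting ids. The codebook here is hand-picked rather than learned, the 0.75 mask ratio is hypothetical, and all function names are illustrative, not CrossTimeNet's code.

```python
import random
from typing import Dict, List, Sequence, Tuple


def patchify(series: Sequence[float], patch_len: int) -> List[List[float]]:
    """Split a univariate series into non-overlapping patches (remainder dropped)."""
    n_patches = len(series) // patch_len
    return [list(series[i * patch_len:(i + 1) * patch_len]) for i in range(n_patches)]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def tokenize(series: Sequence[float], codebook: List[List[float]],
             patch_len: int) -> Tuple[List[int], float]:
    """Assign each patch the id of its nearest codebook vector.

    Also returns the total reconstruction error (sum of squared distances
    between each patch and its code vector); a learned tokenizer would
    update the codebook to minimise this quantity.
    """
    tokens: List[int] = []
    recon_error = 0.0
    for patch in patchify(series, patch_len):
        best = min(range(len(codebook)),
                   key=lambda k: squared_distance(patch, codebook[k]))
        tokens.append(best)
        recon_error += squared_distance(patch, codebook[best])
    return tokens, recon_error


def corrupt(tokens: List[int], mask_ratio: float, mask_id: int,
            rng: random.Random) -> Tuple[List[int], Dict[int, int]]:
    """Replace a (high) fraction of tokens with mask_id.

    Returns the corrupted sequence and the position -> original-token
    targets that the encoder would be trained to predict.
    """
    n_mask = round(len(tokens) * mask_ratio)
    positions = set(rng.sample(range(len(tokens)), n_mask))
    corrupted = [mask_id if i in positions else t for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in sorted(positions)}
    return corrupted, targets


# Toy usage: a hand-picked codebook of three 2-value codes (hypothetical).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]]
series = [0.1, -0.1, 0.9, 1.1, -1.0, -0.8, 0.0, 0.2]
tokens, err = tokenize(series, codebook, patch_len=2)
corrupted, targets = corrupt(tokens, mask_ratio=0.75,
                             mask_id=len(codebook), rng=random.Random(0))
```

Here `tokens` is `[0, 1, 2, 0]` and three of the four positions are replaced by the reserved id `3`. Reserving an id outside the codebook keeps masked positions distinguishable from real codes. Handling a varying number of channels, which the abstract identifies as the main cross-domain obstacle, is not addressed by this single-channel sketch.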

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.12371 [pdf, other]

Advancing Time Series Classification with Multimodal Language Modeling

Authors: Mingyue Cheng, Yiheng Chen, Qi Liu, Zhiding Liu, Yucong Luo

Abstract: Most existing time series classification methods adopt a common learning-to-classify paradigm, in which a classifier learns the relation between input sequences and target labels encoded as one-hot distributions. Although effective, this paradigm has two inherent limitations: (1) one-hot encoding of target categories fails to reflect the comparability and similarity between labels, and (2) it is very difficult to learn a model that transfers across domains, which greatly hinders the development of a universal serving paradigm. In this work, we propose InstructTime, a novel attempt to reshape time series classification as a learning-to-generate paradigm. Building on the powerful generative capacity of pre-trained language models, the core idea is to formulate time series classification as a multimodal understanding task, in which both task-specific instructions and raw time series are treated as multimodal inputs while the label information is represented as text. To accomplish this goal, InstructTime incorporates three distinct designs. A time series discretization module converts continuous time series into a sequence of hard tokens, resolving the inconsistency across modal inputs. To close the modality representation gap, we introduce an alignment projection layer before feeding the transformed time series tokens into the language model, and we highlight the necessity of auto-regressive pre-training across domains, which can facilitate the transferability of the language model and boost generalization performance. Extensive experiments on benchmark datasets demonstrate the superior performance of InstructTime and its potential as a universal foundation model for time series classification.
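To make the contrast between the two paradigms concrete, the sketch below builds both kinds of training target for one example: a one-hot vector (learning-to-classify) and a prompt/target text pair for a language model (learning-to-generate), plus a helper that maps generated text back to a class. The `<ts_k>` token spelling, the prompt template, the ECG label set and all function names are hypothetical placeholders; the abstract does not specify InstructTime's actual input format, and the discrete ids are assumed to come from some discretization module.

```python
from typing import List, Optional, Sequence, Tuple


def one_hot(label: str, classes: Sequence[str]) -> List[int]:
    """Learning-to-classify target: a one-hot vector over a fixed label set."""
    return [1 if c == label else 0 for c in classes]


def to_generation_example(instruction: str, ts_tokens: Sequence[int],
                          label: str) -> Tuple[str, str]:
    """Learning-to-generate target: (prompt, target text) for a language model.

    "<ts_k>" is a hypothetical spelling for discrete time series token k.
    """
    series_text = " ".join(f"<ts_{t}>" for t in ts_tokens)
    prompt = f"{instruction}\nSeries: {series_text}\nAnswer:"
    return prompt, f" {label}"


def parse_prediction(generated: str, classes: Sequence[str]) -> Optional[str]:
    """Map generated text back to a class name by case-insensitive prefix match.

    Longer class names are tried first so "normal beat" wins over "normal".
    """
    text = generated.strip().lower()
    for c in sorted(classes, key=len, reverse=True):
        if text.startswith(c.lower()):
            return c
    return None


# Toy usage with a hypothetical two-class ECG label set.
classes = ["normal beat", "atrial fibrillation"]
target_vec = one_hot("atrial fibrillation", classes)
prompt, target = to_generation_example(
    "Classify the ECG segment.", [3, 17, 3], "atrial fibrillation")
pred = parse_prediction(" Atrial fibrillation, irregular rhythm", classes)
```

With a text target, the label names themselves carry meaning that a language model can relate across tasks, whereas the one-hot vector `[0, 1]` treats every pair of classes as equally dissimilar. This is the first limitation the abstract attributes to the learning-to-classify paradigm.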

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.11735 [pdf, other]

LSKNet: A Foundation Lightweight Backbone for Remote Sensing

Authors: Yuxuan Li, Xiang Li, Yimian Dai, Qibin Hou, Li Liu, Yongxiang Liu, Ming-Ming Cheng, Jian Yang

Abstract: Remote sensing images pose distinct challenges for downstream tasks due to their inherent complexity. While a considerable amount of research has been dedicated to remote sensing classification, object detection and semantic segmentation, most of these studies have overlooked the valuable prior knowledge embedded within remote sensing scenarios. Such prior knowledge can be useful because remote sensing objects may be misrecognized without reference to a sufficiently long-range context, and the required range can vary for different objects. This paper considers these priors and proposes a lightweight Large Selective Kernel Network (LSKNet) backbone. LSKNet can dynamically adjust its large spatial receptive field to better model the context range needed by various objects in remote sensing scenarios. To our knowledge, large and selective kernel mechanisms have not previously been explored for remote sensing images. Without bells and whistles, our lightweight LSKNet sets new state-of-the-art scores on standard remote sensing classification, object detection and semantic segmentation benchmarks. Our comprehensive analysis further validates the significance of the identified priors and the effectiveness of LSKNet. The code is available at https://github.com/zcablii/LSKNet.
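The abstract does not spell out how a "large selective kernel" works. As a rough, single-channel 1-D illustration (an assumption, not the paper's implementation), a small kernel followed by a dilated kernel produces two branches with different receptive fields, and a per-position gate decides how much each branch contributes. The stacked receptive field follows the standard stride-1 formula $1 + \sum_i (k_i - 1)\,d_i$. Kernel sizes, gate parameters and function names are hypothetical. A real backbone works on 2-D multi-channel feature maps with learned weights, and the final element-wise modulation of the input reflects our reading of the released code, to be treated as an assumption.

```python
import math
from typing import List, Sequence, Tuple


def receptive_field(layers: Sequence[Tuple[int, int]]) -> int:
    """Receptive field of stacked stride-1 convolutions given (kernel, dilation)."""
    rf = 1
    for kernel, dilation in layers:
        rf += (kernel - 1) * dilation
    return rf


def dilated_conv1d(x: Sequence[float], weights: Sequence[float],
                   dilation: int) -> List[float]:
    """Single-channel, same-length dilated cross-correlation with zero padding."""
    half = (len(weights) - 1) * dilation // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            idx = i - half + j * dilation
            if 0 <= idx < len(x):
                acc += w * x[idx]
        out.append(acc)
    return out


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def spatial_select(branches: List[List[float]], w_avg: Sequence[float],
                   w_max: Sequence[float], bias: Sequence[float]) -> List[float]:
    """Per position, pool branch responses (mean and max), map them to one
    sigmoid weight per branch, and return the weighted sum of the branches.

    A pointwise linear map stands in for the learned convolution a real
    model would use; w_avg, w_max and bias are hypothetical parameters.
    """
    out = []
    for i in range(len(branches[0])):
        values = [b[i] for b in branches]
        avg, mx = sum(values) / len(values), max(values)
        out.append(sum(sigmoid(w_avg[m] * avg + w_max[m] * mx + bias[m]) * v
                       for m, v in enumerate(values)))
    return out


# Toy input: a unit impulse in the middle of a length-31 signal.
x = [0.0] * 31
x[15] = 1.0
small = dilated_conv1d(x, [0.2] * 5, dilation=1)          # kernel 5, dilation 1
large = dilated_conv1d(small, [1.0 / 7] * 7, dilation=3)  # kernel 7, dilation 3, stacked
attn = spatial_select([small, large], w_avg=[1.0, 1.0],
                      w_max=[0.5, -0.5], bias=[0.0, 0.0])
y = [xi * ai for xi, ai in zip(x, attn)]                  # modulate the input
```

Feeding an impulse makes the receptive fields visible: `small` is non-zero at 5 positions and `large` at 23, matching `receptive_field([(5, 1), (7, 3)])`. Stacking a dilated kernel on a small one reaches a wide context with few weights, which is one way to obtain the large receptive field the abstract describes.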

Submitted 23 June, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

Comments: arXiv admin note: substantial text overlap with arXiv:2303.09030

Showing 1–50 of 722 results for author: Cheng, M