CherryRec: Enhancing News Recommendation Quality via LLM-driven Framework

Authors: Shaohuang Wang, Lun Wang, Yunhan Bu, Tianwei Huang

Abstract: Large Language Models (LLMs) have achieved remarkable progress in language understanding and generation. Custom LLMs leveraging textual features have been applied to recommendation systems, demonstrating improvements across various recommendation scenarios. However, most existing methods perform untrained recommendation based on pre-trained knowledge (e.g., movie recommendation), and the auto-regressive generation of LLMs leads to slow inference speeds, making them less effective in real-time recommendations. To address this, we propose a framework for news recommendation using LLMs, named CherryRec, which ensures the quality of recommendations while accelerating the recommendation process. Specifically, we employ a Knowledge-aware News Rapid Selector to retrieve candidate options based on the user's interaction history. The history and retrieved items are then input as text into a fine-tuned LLM, the Content-aware News Llm Evaluator, designed to enhance news recommendation capabilities. Finally, the Value-aware News Scorer integrates the scores to compute the CherryRec Score, which serves as the basis for the final recommendation. We validate the effectiveness of the proposed framework by comparing it with state-of-the-art baseline methods on benchmark datasets. Our experimental results consistently show that CherryRec outperforms the baselines in both recommendation performance and efficiency. The project resource can be accessed at: https://github.com/xxxxxx

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.04248 [pdf, other]

An adaptive parameter estimator for poor-quality spectral data of white dwarfs

Authors: Duo Xie, Jiangchuan Zhang, Yude Bu, Zhenping Yi, Meng Liu, Xiaoming Kong

Abstract: White dwarfs represent the end stage for 97% of stars, making precise parameter measurement crucial for understanding stellar evolution. Traditional estimation methods involve fitting spectra or photometry, which require high-quality data. In recent years, machine learning has played a crucial role in processing spectral data due to its speed, automation, and accuracy. However, two common issues have been identified. First, most studies rely on data with high signal-to-noise ratios (SNR > 10), leaving many poor-quality datasets underutilized. Second, existing machine learning models, primarily based on convolutional networks, recurrent networks, and their variants, cannot simultaneously capture both the spatial and sequential information of spectra. To address these challenges, we designed the Estimator Network (EstNet), an advanced algorithm integrating multiple techniques, including Residual Networks, Squeeze and Excitation Attention, Gated Recurrent Units, Adaptive Loss, and Monte-Carlo Dropout Layers. We conducted parameter estimation on 5,965 poor-quality white dwarf spectra (R~1800, SNR~1.17), achieving average percentage errors of 14.86% for effective temperature and 3.97% for surface gravity. These results are significantly superior to other mainstream algorithms and consistent with the outcomes of traditional theoretical spectrum fitting methods. In the future, our algorithms will be applied for large-scale parameter estimation on the Chinese Space Station Telescope and the Large Synoptic Survey Telescope.

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2405.00157 [pdf, other]

Information-Theoretic Opacity-Enforcement in Markov Decision Processes

Authors: Chongyang Shi, Yuheng Bu, Jie Fu

Abstract: The paper studies information-theoretic opacity, an information-flow privacy property, in a setting involving two agents: a planning agent who controls a stochastic system and an observer who partially observes the system states. The goal of the observer is to infer some secret, represented by a random variable, from its partial observations, while the goal of the planning agent is to make the secret maximally opaque to the observer while achieving a satisfactory total return. Modeling the stochastic system as a Markov decision process, we consider two classes of opacity properties: last-state opacity ensures that the observer is uncertain whether the last state is in a specific set, and initial-state opacity ensures that the observer is unsure of the realization of the initial state. As the measure of opacity, we employ the Shannon conditional entropy, which captures the information about the secret revealed by the observable. Then, we develop primal-dual policy gradient methods for opacity-enforcement planning subject to constraints on total returns. We propose novel algorithms to compute the policy gradient of entropy for each observation, leveraging message passing within the hidden Markov models. This gradient computation enables stable and fast convergence. We demonstrate our solution of opacity-enforcement control through a grid world example.

Submitted 30 April, 2024; originally announced May 2024.
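
The opacity measure named in the abstract above, the Shannon conditional entropy of the secret given the observation, can be illustrated on a small discrete example. The following is a minimal sketch only: the function name `conditional_entropy`, the dictionary encoding of the joint distribution, and the two toy distributions are assumptions made for illustration and are not taken from the paper, which works with Markov decision processes and policy gradients rather than a fixed joint table.

```python
import math

def conditional_entropy(joint):
    """Return H(S | Y) in bits for a joint pmf given as {(s, y): probability}."""
    # Marginal distribution of the observation Y.
    p_y = {}
    for (_, y), p in joint.items():
        p_y[y] = p_y.get(y, 0.0) + p
    # H(S | Y) = -sum over (s, y) of p(s, y) * log2(p(s, y) / p(y)).
    h = 0.0
    for (_, y), p in joint.items():
        if p > 0.0:
            h -= p * math.log2(p / p_y[y])
    return h

# Hypothetical toy cases with a binary secret and a binary observation.
# "revealing": the observation determines the secret, so no uncertainty remains.
revealing = {(0, "a"): 0.5, (1, "b"): 0.5}
# "opaque": the observation is independent of the secret, so one full bit remains.
opaque = {(0, "a"): 0.25, (0, "b"): 0.25, (1, "a"): 0.25, (1, "b"): 0.25}

print(conditional_entropy(revealing), conditional_entropy(opaque))
```

A larger value means the observer learns less about the secret, which is the direction a planning agent would push in under this measure.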

arXiv:2404.18540 [pdf, other]

Tunable coupling of a quantum phononic resonator to a transmon qubit with flip-chip architecture

Authors: Xinhui Ruan, Li Li, Guihan Liang, Silu Zhao, Jia-heng Wang, Yizhou Bu, Bingjie Chen, Xiaohui Song, Xiang Li, He Zhang, Jinzhe Wang, Qianchuan Zhao, Kai Xu, Heng Fan, Yu-xi Liu, Jing Zhang, Zhihui Peng, Zhongcheng Xiang, Dongning Zheng

Abstract: A hybrid system with tunable coupling between phonons and qubits shows great potential for advancing quantum information processing. In this work, we demonstrate strong and tunable coupling between a surface acoustic wave (SAW) resonator and a transmon qubit based on a galvanic-contact flip-chip technique. The coupling strength varies from $2\pi\times 7.0$ MHz to $-2\pi\times 20.6$ MHz, which is extracted from different vacuum Rabi oscillation frequencies. The phonon-induced ac Stark shift of the qubit at different coupling strengths is also shown. Our approach offers a good experimental platform for exploring quantum acoustics and hybrid systems.

Submitted 29 April, 2024; originally announced April 2024.

arXiv:2404.15799 [pdf]

Towards the relationship between AIGC in manuscript writing and author profiles: evidence from preprints in LLMs

Authors: Jialin Liu, Yi Bu

Abstract: AIGC tools such as ChatGPT have profoundly changed scientific research, leading to widespread attention to their use in academic writing. Leveraging preprints on large language models, this study examined the use of AIGC in manuscript writing and its correlation with author profiles. We found that: (1) since the release of ChatGPT, the likelihood of abstracts being AI-generated has gradually increased; (2) scientists from English-speaking countries are less likely to use AIGC tools for writing assistance, while those from countries with linguistic differences from English are more likely to use these tools; (3) there is a weak correlation between a paper's AI-generated probability and authors' academic performance; and (4) authors who have previously published papers with high AI-generated probabilities are more likely to continue using AIGC tools. We believe that this paper provides insightful results for relevant policies and norms and enhances the understanding of the relationship between humans and AI.

Submitted 24 April, 2024; originally announced April 2024.

Comments: 8 pages, 4 figures, 1 table

MSC Class: J.0

arXiv:2404.12331 [pdf, other]

On the roles of stellar rotation and binarity in NGC 2423's main-sequence turnoff region

Authors: Yutian Bu, Chenyu He, Li Wang, Jiamao Lin, Chengyuan Li

Abstract: Research has shown that many young and intermediate-age clusters (younger than $\sim$2 Gyr) have extended main sequences and main-sequence turnoffs (eMSTOs), which cannot be adequately described by a single isochrone. The reason for the extended main sequences is now known, with the most probable cause being the fast rotation of stars. However, a significant fraction of slowly rotating stars form a younger stellar population than their fast-rotating counterparts, leading to speculation that they have undergone thorough rotational mixing processes internally. One speculation is that a considerable number of slowly rotating stars reside in close binary systems, where tidal forces from companion stars are the cause of their rotational deceleration. In this work, we report a relatively old open star cluster in the Milky Way, NGC 2423 ($\sim$1 Gyr old), which exhibits an apparent eMSTO. As anticipated, many characteristics of NGC 2423 indicate that its eMSTO is driven by stellar rotation. Our calculations indicate that if slowly rotating stars commonly have a close companion star, they should exhibit significant differences in radial velocities observationally, and binary systems that can be tidally locked within the age of NGC 2423 should have a mass ratio close to 1. However, none of these predictions align with our observations. Interestingly, among the only two equal-mass binary systems in the observed region for which spectroscopic data could be obtained, we discovered that one of them is a tidally locked binary system. This further suggests the validity of our numerical simulation results.

Submitted 18 April, 2024; originally announced April 2024.

Comments: 13 pages, 11 figures, 2 tables. Accepted for publication in ApJ

arXiv:2404.04259 [pdf]

The prominent and heterogeneous gender disparities in scientific novelty: evidence from biomedical doctoral theses

Authors: Meijun Liu, Zihan Xie, Alex Jie Yang, Chao Yu, Jian Xu, Ying Ding, Yi Bu

Abstract: Scientific novelty is the essential driving force for research breakthroughs and innovation. However, little is known about how early-career scientists pursue novel research paths, and the gender disparities in this process. To address this research gap, this study investigates a comprehensive dataset of 279,424 doctoral theses in biomedical sciences authored by US Ph.D. graduates. Spanning from 1980 to 2016, the data originates from the ProQuest Dissertations & Theses Database. This study aims to shed light on Ph.D. students' pursuit of scientific novelty in their doctoral theses and assess gender-related differences in this process. Using a combinatorial approach and a pre-trained Bio-BERT model, we quantify the scientific novelty of doctoral theses based on bio-entities. Applying fractional logistic and quantile regression models, this study reveals a decreasing trend in scientific novelty over time and heterogeneous gender disparities in doctoral theses. Specifically, female students consistently exhibited lower scientific novelty levels than their male peers. When supervised by female advisors, students' theses are found to be less novel than those under male advisors. The significant interaction effect of female students and female advisors suggests that female advisors may amplify the gender disparity in scientific novelty. Moreover, heterogeneous gender disparities in scientific novelty are identified, with non-top-tier universities displaying more pronounced disparities, while the differences at higher percentile ranges were comparatively smaller. These findings indicate a potential underrepresentation of female scientists pursuing novel research during the early stages of their careers. Notably, the outcomes of this study hold significant policy implications for advancing the careers of female scientists.

Submitted 19 January, 2024; originally announced April 2024.

arXiv:2403.06337 [pdf, ps, other]

Sparse Spatial Smoothing: Reduced Complexity and Improved Beamforming Gain via Sparse Sub-Arrays

Authors: Yinyan Bu, Robin Rajamäki, Anand Dabak, Rajan Narasimha, Anil Mani, Piya Pal

Abstract: This paper addresses the problem of single snapshot Direction-of-Arrival (DOA) estimation, which is of great importance in a wide range of applications including automotive radar. A popular approach to achieving high angular resolution when only one temporal snapshot is available is via subspace methods using spatial smoothing. This involves leveraging spatial shift-invariance in the antenna array geometry, typically a uniform linear array (ULA), to rearrange the single snapshot measurement vector into a spatially smoothed matrix that reveals the signal subspace of interest. However, conventional approaches using spatially shifted ULA sub-arrays can lead to a prohibitively high computational complexity due to the large dimensions of the resulting spatially smoothed matrix. Hence, we propose to instead employ judiciously designed sparse sub-arrays, such as nested arrays, to reduce the computational complexity of spatial smoothing while retaining the aperture and identifiability of conventional ULA-based approaches. Interestingly, this idea also suggests a novel beamforming method which linearly combines multiple spatially smoothed matrices corresponding to different sets of shifts of the sparse (nested) sub-array. This so-called shift-domain beamforming method is demonstrated to boost the effective SNR, and thereby resolution, in a desired angular region of interest, enabling single snapshot low-complexity DOA estimation with identifiability guarantees.

Submitted 10 March, 2024; originally announced March 2024.

Comments: ©2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
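
The rearrangement step described in the abstract above, turning one ULA snapshot into a spatially smoothed matrix, can be sketched in a few lines. This is only an illustration of the conventional ULA-based construction that the abstract takes as its baseline, not the sparse sub-array or shift-domain beamforming method the paper proposes. The function names, the half-wavelength spacing default, and the noise-free signal model are assumptions made for the example.

```python
import cmath
import math

def ula_snapshot(angles_rad, amplitudes, num_sensors, spacing=0.5):
    """Noise-free single snapshot of a ULA; `spacing` is in wavelengths (assumed 0.5)."""
    return [
        sum(a * cmath.exp(2j * math.pi * spacing * m * math.sin(theta))
            for a, theta in zip(amplitudes, angles_rad))
        for m in range(num_sensors)
    ]

def spatially_smoothed_matrix(snapshot, subarray_size):
    """Stack overlapping sub-array segments of the snapshot as columns.

    Row i, column j holds snapshot[i + j], so column j is the snapshot seen
    by the ULA sub-array shifted by j sensors (a Hankel structure).
    """
    num_shifts = len(snapshot) - subarray_size + 1
    return [[snapshot[i + j] for j in range(num_shifts)]
            for i in range(subarray_size)]

# Hypothetical usage: two sources seen by a 10-sensor ULA, 6-sensor sub-arrays.
x = ula_snapshot([0.2, -0.5], [1.0, 0.8], num_sensors=10)
X = spatially_smoothed_matrix(x, subarray_size=6)
print(len(X), len(X[0]))  # matrix dimensions: sub-array size by number of shifts
```

With K sources and no noise, the resulting matrix has rank at most K, which is what lets subspace methods recover the signal subspace from a single snapshot. The matrix grows with both the sub-array size and the number of shifts, which is the complexity the abstract refers to.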

arXiv:2403.03771 [pdf, other]

doi 10.1109/TVT.2024.3375027

Joint Sparsity Pattern Learning Based Channel Estimation for Massive MIMO-OTFS Systems

Authors: Kuo Meng, Shaoshi Yang, Xiao-Yang Wang, Yan Bu, Yurong Tang, Jianhua Zhang, Lajos Hanzo

Abstract: We propose a channel estimation scheme based on joint sparsity pattern learning (JSPL) for massive multi-input multi-output (MIMO) orthogonal time-frequency-space (OTFS) modulation aided systems. By exploiting the potential joint sparsity of the delay-Doppler-angle (DDA) domain channel, the channel estimation problem is transformed into a sparse recovery problem. To solve it, we first apply the spike and slab prior model to iteratively estimate the support set of the channel matrix, and a higher-accuracy parameter update rule relying on the identified support set is introduced into the iteration. Then the specific values of the channel elements corresponding to the support set are estimated by the orthogonal matching pursuit (OMP) method. Both our simulation results and analysis demonstrate that the proposed JSPL channel estimation scheme achieves an improved performance over the representative state-of-the-art baseline schemes, despite its reduced pilot overhead.

Submitted 6 March, 2024; originally announced March 2024.

Comments: 6 pages, 6 figures, accepted to appear in IEEE Transactions on Vehicular Technology, Mar. 2024

arXiv:2402.11978 [pdf]

Thermal Stress Analysis of the LNG Corrugated Cryogenic Hose During Gas Pre-Cooling Process

Authors: Miaoer Liu, Fangqiu Li, Hao Cheng, Endao Li, Jun Yan, Hailong Lu, Yufeng Bu, Tingting Tang, Zhaokuan Lu

Abstract: In this study, thermal-fluid-solid coupled simulations of the gas-phase pre-cooling operation of corrugated cryogenic hoses were performed. Attention was focused on the temporal evolution and spatial distribution of transient thermal stress in the hose structure caused by convective heat transfer of the cooling medium, Liquefied Natural Gas Boil-Off Gas (BOG). The effects of different corrugated hose parameters, i.e., boundary conditions, hose lengths, BOG inlet flow rates, and corrugation shapes (C-type and U-type), on the transient thermal stress behavior were thoroughly assessed. The thermal stress developed at different locations of the corrugated hoses with these parameters is found to be governed by two major factors: the boundary constraint and the local temperature gradient. The objective of this study is to offer practical insights for the structural strength design of corrugated cryogenic hoses and effective pre-cooling strategies, aiming to mitigate structural safety risks caused by excessive thermal stress.

Submitted 19 February, 2024; originally announced February 2024.

arXiv:2402.08936 [pdf, other]

Predictive Temporal Attention on Event-based Video Stream for Energy-efficient Situation Awareness

Authors: Yiming Bu, Jiayang Liu, Qinru Qiu

Abstract: The Dynamic Vision Sensor (DVS) is an innovative technology that efficiently captures and encodes visual information in an event-driven manner. By combining it with event-driven neuromorphic processing, the sparsity in DVS camera output can result in high energy efficiency. However, similar to many embedded systems, the off-chip communication between the camera and processor presents a bottleneck in terms of power consumption. Inspired by the predictive coding model and the expectation suppression phenomenon found in the human brain, we propose a temporal attention mechanism to throttle the camera output and pay attention to it only when the visual events cannot be well predicted. The predictive attention not only reduces power consumption in the sensor-processor interface but also effectively decreases the computational workload by filtering out noisy events. We demonstrate that the predictive attention can reduce data communication between the camera and the processor by 46.7% and computation activities in the processor by 43.8%.

Submitted 13 February, 2024; originally announced February 2024.

arXiv:2402.06160 [pdf, other]

Are Uncertainty Quantification Capabilities of Evidential Deep Learning a Mirage?

Authors: Maohao Shen, J. Jon Ryu, Soumya Ghosh, Yuheng Bu, Prasanna Sattigeri, Subhro Das, Gregory W. Wornell

Abstract: This paper questions the effectiveness of a modern predictive uncertainty quantification approach, called evidential deep learning (EDL), in which a single neural network model is trained to learn a meta distribution over the predictive distribution by minimizing a specific objective function. Despite their perceived strong empirical performance on downstream tasks, a line of recent studies by Bengs et al. identifies limitations of the existing methods and concludes that their learned epistemic uncertainties are unreliable, e.g., in that they are non-vanishing even with infinite data. Building on and sharpening such analysis, we 1) provide a sharper understanding of the asymptotic behavior of a wide class of EDL methods by unifying various objective functions; 2) reveal that the EDL methods can be better interpreted as an out-of-distribution detection algorithm based on energy-based-models; and 3) conduct extensive ablation studies to better assess their empirical effectiveness with real-world datasets. Through all these analyses, we conclude that even when EDL methods are empirically effective on downstream tasks, this occurs despite their poor uncertainty quantification capabilities. Our investigation suggests that incorporating model uncertainty can help EDL methods faithfully quantify uncertainties and further improve performance on representative downstream tasks, albeit at the cost of additional computational complexity.

Submitted 12 June, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

Comments: 29 pages, 12 figures

arXiv:2402.03655 [pdf, other]

Operator SVD with Neural Networks via Nested Low-Rank Approximation

Authors: J. Jon Ryu, Xiangxiang Xu, H. S. Melihcan Erol, Yuheng Bu, Lizhong Zheng, Gregory W. Wornell

Abstract: Computing eigenvalue decomposition (EVD) of a given linear operator, or finding its leading eigenvalues and eigenfunctions, is a fundamental task in many machine learning and scientific computing problems. For high-dimensional eigenvalue problems, training neural networks to parameterize the eigenfunctions is considered a promising alternative to the classical numerical linear algebra techniques. This paper proposes a new optimization framework based on the low-rank approximation characterization of a truncated singular value decomposition, accompanied by new techniques called nesting for learning the top-$L$ singular values and singular functions in the correct order. The proposed method promotes the desired orthogonality in the learned functions implicitly and efficiently via an unconstrained optimization formulation, which is easy to solve with off-the-shelf gradient-based optimization algorithms. We demonstrate the effectiveness of the proposed optimization framework for use cases in computational physics and machine learning.

Submitted 5 February, 2024; originally announced February 2024.

Comments: 44 pages, 7 figures

arXiv:2401.13927 [pdf, other]

Adaptive Text Watermark for Large Language Models

Authors: Yepeng Liu, Yuheng Bu

Abstract: The advancement of Large Language Models (LLMs) has led to increasing concerns about the misuse of AI-generated text, and watermarking for LLM-generated text has emerged as a potential solution. However, it is challenging to generate high-quality watermarked text while maintaining strong security, robustness, and the ability to detect watermarks without prior knowledge of the prompt or model. This paper proposes an adaptive watermarking strategy to address this problem. To improve the text quality and maintain robustness, we adaptively add watermarking to token distributions with high entropy measured using an auxiliary model and keep the low entropy token distributions untouched. For the sake of security and to further minimize the watermark's impact on text quality, instead of using a fixed green/red list generated from a random secret key, which can be vulnerable to decryption and forgery, we adaptively scale up the output logits in proportion based on the semantic embedding of previously generated text using a well-designed semantic mapping model. Our experiments involving various LLMs demonstrate that our approach achieves comparable robustness performance to existing watermark methods. Additionally, the text generated by our method has perplexity comparable to that of un-watermarked LLMs while maintaining security even under various attacks.

Submitted 8 June, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

Comments: ICML2024
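
The entropy gating described in the abstract above (watermark only the high-entropy token distributions, leave low-entropy ones untouched) can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's algorithm: the function `gated_logits`, the `threshold_bits` and `strength` parameters, and the `bias` vector are all hypothetical, with `bias` standing in for whatever per-token adjustment the paper's semantic mapping model would produce.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy_bits(probs):
    """Shannon entropy of a probability vector, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def gated_logits(logits, aux_logits, bias, threshold_bits, strength):
    """Add strength * bias to the logits only when the auxiliary model's
    next-token distribution has entropy above threshold_bits (all assumed names)."""
    if entropy_bits(softmax(aux_logits)) <= threshold_bits:
        return list(logits)  # low entropy: leave the distribution untouched
    return [l + strength * b for l, b in zip(logits, bias)]

# Hypothetical 4-token vocabulary.
logits = [1.0, 2.0, 3.0, 4.0]
bias = [1.0, 0.0, 1.0, 0.0]
peaked_aux = [10.0, 0.0, 0.0, 0.0]   # near-deterministic next token
flat_aux = [0.0, 0.0, 0.0, 0.0]      # uniform next token, 2 bits of entropy
print(gated_logits(logits, peaked_aux, bias, 1.0, 2.0))
print(gated_logits(logits, flat_aux, bias, 1.0, 2.0))
```

Gating on entropy reflects the intuition stated in the abstract: when the next token is nearly forced, perturbing the logits would hurt text quality, so the watermark signal is placed only where several continuations are plausible.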

arXiv:2401.12294 [pdf, ps, other]

Nearly critical superfluid: effective field theory and holography

Authors: Yanyan Bu, Hongfei Gao, Xin Gao, Zhiwei Li

Abstract: We study a nearly critical superfluid system from two complementary approaches. Within the first approach, we formulate a Schwinger-Keldysh effective field theory (EFT) for the system when it is located slightly above the critical temperature. The set of symmetries, particularly the dynamical Kubo-Martin-Schwinger (KMS) symmetry and chemical shift symmetry, strictly constrains the form of the EFT action. Within the second approach, using the holographic Schwinger-Keldysh technique, we derive the effective action for a "microscopic" holographic superfluid, confirming the EFT construction. A systematic inclusion of non-Gaussianity is one highlight of the present study.

Submitted 3 June, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

Comments: 35 pages

arXiv:2401.06502 [pdf, ps, other]

Harnessing Holes for Spatial Smoothing with Applications in Automotive Radar

Authors: Yinyan Bu, Robin Rajamäki, Pulak Sarangi, Piya Pal

Abstract: This paper studies spatial smoothing using sparse arrays in single-snapshot Direction of Arrival (DOA) estimation. We consider the application of automotive MIMO radar, which traditionally synthesizes a large uniform virtual array by appropriate waveform and physical array design. We explore deliberately introducing holes into this virtual array to leverage resolution gains provided by the increased aperture. The presence of these holes requires re-thinking DOA estimation, as conventional algorithms may no longer be easily applicable and alternative techniques, such as array interpolation, may be computationally expensive. Consequently, we study sparse array geometries that permit the direct application of spatial smoothing. We show that a sparse array geometry is amenable to spatial smoothing if it can be decomposed into the sum set of two subsets of suitable cardinality. Furthermore, we demonstrate that many such decompositions may exist, not all of them yielding equal identifiability or aperture. We derive necessary and sufficient conditions to guarantee identifiability of a given number of targets, which gives insight into choosing desirable decompositions for spatial smoothing. This provides uniform recovery guarantees and enables estimating DOAs at increased resolution and reduced computational complexity.

Submitted 12 January, 2024; originally announced January 2024.

Comments: ©2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works
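
The sum-set condition stated in the abstract above can be checked mechanically for small arrays. The sketch below is illustrative only: the function names and the example geometries are assumptions, sensor positions are taken as integer multiples of a unit spacing, and the "suitable cardinality" and identifiability conditions from the paper are not modeled.

```python
def sum_set(a, b):
    """Return the sum set {x + y : x in a, y in b} of two position sets."""
    return {x + y for x in a for y in b}

def is_sum_set_decomposition(array, a, b):
    """Check whether the array positions equal the sum set of a and b."""
    return set(array) == sum_set(a, b)

def smoothed_positions(a, b):
    """Position grid for a smoothed matrix: entry (i, j) is sensor a[i] + b[j],
    i.e. column j is the sub-array `a` shifted by b[j]."""
    return [[x + y for y in b] for x in a]

# Hypothetical geometries (positions in units of the sensor spacing).
ula = range(6)                      # 0..5, no holes
sparse = [0, 1, 2, 6, 7, 8]         # holes at positions 3, 4, 5

print(is_sum_set_decomposition(ula, [0, 1, 2], [0, 3]))
print(is_sum_set_decomposition(sparse, [0, 1, 2], [0, 6]))
print(smoothed_positions([0, 1, 2], [0, 6]))
```

In this toy case the sparse array uses the same six sensors as the ULA but spans a larger aperture, which is the kind of trade the abstract describes when it introduces holes deliberately.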

arXiv:2401.04900 [pdf, other]

SPT: Spectral Transformer for Red Giant Stars Age and Mass Estimation

Authors: Mengmeng Zhang, Fan Wu, Yude Bu, Shanshan Li, Zhenping Yi, Meng Liu, Xiaoming Kong

Abstract: The age and mass of red giants are essential for understanding the structure and evolution of the Milky Way. Traditional isochrone methods for these estimations are inherently limited due to overlapping isochrones in the Hertzsprung-Russell diagram, while asteroseismology, though more precise, requires high-precision, long-term observations. In response to these challenges, we developed a novel framework, Spectral Transformer (SPT), to predict the age and mass of red giants aligned with asteroseismology from their spectra. A key component of SPT, the Multi-head Hadamard Self-Attention mechanism, designed specifically for spectra, can capture complex relationships across different wavelengths. Further, we introduced a Mahalanobis distance-based loss function to address scale imbalance and interaction mode loss, and incorporated Monte Carlo dropout for quantitative analysis of prediction uncertainty. Trained and tested on 3,880 red giant spectra from LAMOST, the SPT achieved remarkable age and mass estimations with average percentage errors of 17.64% and 6.61%, respectively, and provided uncertainties for each corresponding prediction. The results significantly outperform those of traditional machine learning algorithms and demonstrate a high level of consistency with asteroseismology methods and isochrone fitting techniques. In the future, our work will leverage datasets from the Chinese Space Station Telescope and the Large Synoptic Survey Telescope to enhance the precision of the model and broaden its applicability in the field of astronomy and astrophysics.

Submitted 9 January, 2024; originally announced January 2024.

Comments: Accepted by A&A

arXiv:2401.02904 [pdf, other]

Class-wise Generalization Error: an Information-Theoretic Analysis

Authors: Firas Laakom, Yuheng Bu, Moncef Gabbouj

Abstract: Existing generalization theories of supervised learning typically take a holistic approach and provide bounds for the expected generalization over the whole data distribution, which implicitly assumes that the model generalizes similarly for all the classes. In practice, however, there are significant variations in generalization performance among different classes, which cannot be captured by the existing generalization bounds. In this work, we tackle this problem by theoretically studying the class-generalization error, which quantifies the generalization performance of each individual class. We derive a novel information-theoretic bound for class-generalization error using the KL divergence, and we further obtain several tighter bounds using the conditional mutual information (CMI), which are significantly easier to estimate in practice. We empirically validate our proposed bounds in different neural networks and show that they accurately capture the complex class-generalization error behavior. Moreover, we show that the theoretical tools developed in this paper can be applied in several applications beyond this context.

Submitted 5 January, 2024; originally announced January 2024.

Comments: 26 pages

arXiv:2312.15712 [pdf, other]

doi 10.3847/1538-4365/ad0551

Edge-on Low-surface-brightness Galaxy Candidates Detected from SDSS Images Using YOLO

Authors: Yongguang Xing, Zhenping Yi, Zengxu Liang, Hao Su, Wei Du, Min He, Meng Liu, Xiaoming Kong, Yude Bu, Hong Wu

Abstract: Low-surface-brightness galaxies (LSBGs), fainter members of the galaxy population, are thought to be numerous. However, due to their low surface brightness, the search for a wide-area sample of LSBGs is difficult, which in turn limits our ability to fully understand the formation and evolution of galaxies as well as galaxy relationships. Edge-on LSBGs, due to their unique orientation, offer an excellent opportunity to study galaxy structure and galaxy components. In this work, we utilize the You Only Look Once object detection algorithm to construct an edge-on LSBG detection model by training on 281 edge-on LSBGs in Sloan Digital Sky Survey (SDSS) $gri$-band composite images. This model achieved a recall of 94.64% and a purity of 95.38% on the test set. We searched across 938,046 $gri$-band images from SDSS Data Release 16 and found 52,293 candidate LSBGs. To enhance the purity of the candidate LSBGs and reduce contamination, we employed the Deep Support Vector Data Description algorithm to identify anomalies within the candidate samples. Ultimately, we compiled a catalog containing 40,759 edge-on LSBG candidates. This sample has similar characteristics to the training data set, mainly composed of blue edge-on LSBG candidates. The catalog is available online at https://github.com/worldoutside/Edge-on_LSBG.

Submitted 25 December, 2023; originally announced December 2023.

Comments: 12 pages, 11 figures, accepted for publication in ApJS

Journal ref: The Astrophysical Journal Supplement Series, Volume 269, Issue 2, id.59, 9 pp., December 2023

arXiv:2310.09453

Effects of Same-Race Mentorship Preferences on Academic Performance and Survival

Authors: Meijun Liu, Yi Bu, Daifeng Li, Ying Ding, Daniel E. Acuna

Abstract: Same-race mentorship preference refers to mentors or mentees forming connections significantly influenced by a shared race. Although racial diversity in science has been well-studied and linked to favorable outcomes, the extent and effects of same-race mentorship preferences remain largely underexplored. Here, we analyze 465,355 mentor-mentee pairs from more than 60 research areas over the last 70 years to investigate the effect of same-race mentorship preferences on mentees' academic performance and survival. We use causal inference and statistical matching to measure same-race mentorship preferences while accounting for racial demographic variations across institutions, time periods, and research fields. Our findings reveal a pervasive same-race mentorship propensity across races, fields, and universities of varying research intensity. We observe an increase in same-race mentorship propensity over the years, further reinforced inter-generationally within a mentorship lineage. This propensity is more pronounced for minorities (Asians, Blacks, and Hispanics). Our results reveal that mentees under the supervision of mentors with high same-race propensity experience significantly lower productivity, impact, and collaboration reach during and after training, ultimately leading to a 27.6% reduced likelihood of remaining in academia. In contrast, a mentorship approach devoid of racial propensity appears to offer the best prospects for academic performance and persistence. These findings underscore the importance of mentorship diversity for academic success and shed light on factors contributing to minority underrepresentation in science.

Submitted 4 May, 2024; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: 1. After further evaluating the race prediction method, we observed unsatisfactory accuracy and F1 scores. The study's findings could be impacted by these subpar predictions. 2. Our study incorporates both US and non-US samples, revealing that non-US samples may introduce outliers and distort the results. We recognize that the study's findings and conclusions might be affected by data quality

arXiv:2310.04945 [pdf, other]

Balancing Specialized and General Skills in LLMs: The Impact of Modern Tuning and Data Strategy

Authors: Zheng Zhang, Chen Zheng, Da Tang, Ke Sun, Yukun Ma, Yingtong Bu, Xun Zhou, Liang Zhao

Abstract: This paper introduces a multifaceted methodology for fine-tuning and evaluating large language models (LLMs) for specialized monetization tasks. The goal is to balance general language proficiency with domain-specific skills. The methodology has three main components: 1) Carefully blending in-domain and general-purpose data during fine-tuning to achieve an optimal balance between general and specialized capabilities; 2) Designing a comprehensive evaluation framework with 45 questions tailored to assess performance on functionally relevant dimensions like reliability, consistency, and business impact; 3) Analyzing how model size and continual training influence metrics to guide efficient resource allocation during fine-tuning. The paper details the design, data collection, analytical techniques, and results validating the proposed frameworks. It aims to provide businesses and researchers with actionable insights on effectively adapting LLMs for specialized contexts. We also intend to make public the comprehensive evaluation framework, which includes the 45 tailored questions and their respective scoring guidelines, to foster transparency and collaboration in adapting LLMs for specialized tasks.

Submitted 7 October, 2023; originally announced October 2023.

arXiv:2307.16359 [pdf, other]

doi 10.1016/j.physa.2023.128958

Discerning media bias within a network of political allies and opponents: Disruption by partisans

Authors: Yutong Bu, Andrew Melatos

Abstract: An individual's opinions about media bias derive from their own independent assessment of media outputs combined with peer pressure from networked political allies and opponents. Here we generalize previous idealized, probabilistic models of the perception formation process, based on a network of Bayesian learners inferring the bias of a coin, by introducing obdurate agents (partisans), whose opinions stay fixed. It is found that even one partisan destabilizes an allies-only network, stopping it from achieving asymptotic learning and forcing persuadable agents to vacillate indefinitely (turbulent nonconvergence) between the true coin bias $θ_0$ and the partisan's belief $θ_{\rm p}$. The dwell time $t_{\rm d}$ at the partisan's belief increases as the partisan fraction $f$ increases, and decreases when multiple partisans disagree amongst themselves. In opponents-only networks, asymptotic learning occurs, whether or not partisans are present. However, the counterintuitive tendency to reach wrong conclusions first, identified in previous work with zero partisans, does not persist in general for $θ_0 \neq θ_{\rm p}$ in complete networks; it is a property of sparsely connected systems (e.g., Barabási-Albert networks with attachment parameter $\lesssim 10$). In mixed networks containing allies and opponents, partisans drive counterintuitive outcomes, which depend sensitively on where they reside. A strongly balanced triad exhibits intermittency with a partisan (sudden transitions between long intervals of static beliefs and turbulent nonconvergence) and asymptotic learning without a partisan.

Submitted 30 July, 2023; originally announced July 2023.

Comments: 36 pages, 17 figures

arXiv:2307.10198 [pdf]

Has China caught up to the US in AI research? An exploration of mimetic isomorphism as a model for late industrializers

Authors: Chao Min, Yi Zhao, Yi Bu, Ying Ding, Caroline S. Wagner

Abstract: Artificial Intelligence (AI), a cornerstone of 21st-century technology, has seen remarkable growth in China. In this paper, we examine China's AI development process, demonstrating that it is characterized by rapid learning and differentiation, surpassing the export-oriented growth propelled by Foreign Direct Investment seen in earlier Asian industrializers. Our data indicates that China currently leads the USA in the volume of AI-related research papers. However, when we delve into the quality of these papers based on specific metrics, the USA retains a slight edge. Nevertheless, the pace and scale of China's AI development remain noteworthy. We attribute China's accelerated AI progress to several factors, including global trends favoring open access to algorithms and research papers, contributions from China's broad diaspora and returnees, and relatively lax data protection policies. In the vein of our research, we have developed a novel measure for gauging China's imitation of US research. Our analysis shows that by 2018, the time lag between China and the USA in addressing AI research topics had evaporated. This finding suggests that China has effectively bridged a significant knowledge gap and could potentially be setting out on an independent research trajectory. While this study compares China and the USA exclusively, it's important to note that research collaborations between these two nations have resulted in more highly cited work than those produced by either country independently. This underscores the power of international cooperation in driving scientific progress in AI.

Submitted 11 July, 2023; originally announced July 2023.

arXiv:2307.03371 [pdf, ps, other]

doi 10.1016/j.joi.2023.101427

What makes a successful rebuttal in computer science conferences? : A perspective on social interaction

Authors: Junjie Huang, Win-bin Huang, Yi Bu, Qi Cao, Huawei Shen, Xueqi Cheng

Abstract: With an exponential increase in submissions to top-tier Computer Science (CS) conferences, more and more conferences have introduced a rebuttal stage to the conference peer review process. The rebuttal stage can be modeled as social interactions between authors and reviewers. A successful rebuttal often results in an increased review score after the rebuttal stage. In this paper, we conduct an empirical study to determine the factors contributing to a successful rebuttal using over 3,000 papers and 13,000 reviews from ICLR 2022, one of the most prestigious computer science conferences. First, we observe a significant difference in review scores before and after the rebuttal stage, which is crucial for paper acceptance. Furthermore, we investigate factors from the reviewer's perspective using signed social network analysis. A notable finding is the increase in balanced network structure after the rebuttal stage. Subsequently, we evaluate several quantifiable author rebuttal strategies and their effects on review scores. These strategies can help authors improve their review scores. Finally, we used machine learning models to predict rebuttal success and validated the impact of potential factors analyzed in this paper. Our experiments demonstrate that the utilization of all features proposed in this study can aid in predicting the success of the rebuttal. In summary, this work presents a study on the impact factors of successful rebuttals from both reviewers' and authors' perspectives and lays the foundation for analyzing rebuttals with social network analysis.

Submitted 21 July, 2023; v1 submitted 6 July, 2023; originally announced July 2023.

Journal ref: Volume 17, Issue 3, August 2023, 101427, Journal of Informetrics

arXiv:2306.15804 [pdf]

The Impact of Heterogeneous Shared Leadership in Scientific Teams

Authors: Huimin Xu, Meijun Liu, Yi Bu, Shujing Sun, Yi Zhang, Chenwei Zhang, Daniel E. Acuna, Steven Gray, Eric Meyer, Ying Ding

Abstract: Leadership is evolving dynamically from an individual endeavor to shared efforts. This paper aims to advance our understanding of shared leadership in scientific teams. We define three kinds of leaders, junior (10-15), mid (15-20), and senior (20+) based on career age. By considering the combinations of any two leaders, we distinguish shared leadership as heterogeneous when leaders are in different age cohorts and homogeneous when leaders are in the same age cohort. Drawing on 1,845,351 CS, 254,039 Sociology, and 193,338 Business teams with two leaders in the OpenAlex dataset, we identify that heterogeneous shared leadership brings higher citation impact for teams than homogeneous shared leadership. Specifically, when junior leaders are paired with senior leaders, it significantly increases team citation ranking by 1-2%, in comparison with two leaders of similar age. We explore the patterns between homogeneous leaders and heterogeneous leaders from team scale, expertise composition, and knowledge recency perspectives. Compared with homogeneous leaders, heterogeneous leaders are more adaptive in large teams, have more diverse expertise, and trace both the newest and oldest references.

Submitted 27 June, 2023; originally announced June 2023.

arXiv:2306.05583 [pdf, other]

Gibbs-Based Information Criteria and the Over-Parameterized Regime

Authors: Haobo Chen, Yuheng Bu, Gregory W. Wornell

Abstract: Double-descent refers to the unexpected drop in test loss of a learning algorithm beyond an interpolating threshold with over-parameterization, which is not predicted by information criteria in their classical forms due to the limitations in the standard asymptotic approach. We update these analyses using the information risk minimization framework and provide Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) for models learned by the Gibbs algorithm. Notably, the penalty terms for the Gibbs-based AIC and BIC correspond to specific information measures, i.e., symmetrized KL information and KL divergence. We extend this information-theoretic analysis to over-parameterized models by providing two different Gibbs-based BICs to compute the marginal likelihood of random feature models in the regime where the number of parameters $p$ and the number of samples $n$ tend to infinity, with $p/n$ fixed. Our experiments demonstrate that the Gibbs-based BIC can select the high-dimensional model and reveal the mismatch between marginal likelihood and population risk in the over-parameterized regime, providing new insights to understand double-descent.

Submitted 13 November, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

arXiv:2306.05312 [pdf, other]

Tunable Coupling Architectures with Capacitively Connecting Pads for Large-Scale Superconducting Multi-Qubit Processors

Authors: Gui-Han Liang, Xiao-Hui Song, Cheng-Lin Deng, Xu-Yang Gu, Yu Yan, Zheng-Yang Mei, Si-Lu Zhao, Yi-Zhou Bu, Yong-Xi Xiao, Yi-Han Yu, Ming-Chuan Wang, Tong Liu, Yun-Hao Shi, He Zhang, Xiang Li, Li Li, Jing-Zhe Wang, Ye Tian, Shi-Ping Zhao, Kai Xu, Heng Fan, Zhong-Cheng Xiang, Dong-Ning Zheng

Abstract: We have proposed and experimentally verified a tunable inter-qubit coupling scheme for large-scale integration of superconducting qubits. The key feature of the scheme is the insertion of connecting pads between the qubit and the tunable coupling element. In such a way, the distance between two qubits can be increased considerably to a few millimeters, leaving enough space for arranging control lines, readout resonators and other necessary structures. The increased inter-qubit distance provides more wiring space for the flip-chip process and reduces crosstalk between qubits and from control lines to qubits. We use the term Tunable Coupler with Capacitively Connecting Pad (TCCP) to name the tunable coupling part that consists of a transmon coupler and capacitively connecting pads. With different placements of the connecting pads, different TCCP architectures can be realized. We have designed and fabricated a few multi-qubit devices in which TCCP is used for coupling. The measured results show that the performance of the qubits coupled by the TCCP, such as $T_1$ and $T_2$, was similar to that of the traditional transmon qubits without TCCP. Meanwhile, our TCCP also exhibited a wide tunable range of the effective coupling strength and a low residual ZZ interaction between the qubits by properly tuning the parameters of the design. Finally, we successfully implemented an adiabatic CZ gate with TCCP. Furthermore, by introducing TCCP, we also discuss the realization of the flip-chip process and tunable coupling qubits between different chips.

Submitted 8 June, 2023; originally announced June 2023.

Comments: Main text: 7 pages, 6 figures

arXiv:2305.20074 [pdf, other]

Feature Learning in Image Hierarchies using Functional Maximal Correlation

Authors: Bo Hu, Yuheng Bu, José C. Príncipe

Abstract: This paper proposes the Hierarchical Functional Maximal Correlation Algorithm (HFMCA), a hierarchical methodology that characterizes dependencies across two hierarchical levels in multiview systems. By framing view similarities as dependencies and ensuring contrastivity by imposing orthonormality, HFMCA achieves faster convergence and increased stability in self-supervised learning. HFMCA defines and measures dependencies within image hierarchies, from pixels and patches to full images. We find that the network topology for approximating orthonormal basis functions aligns with a vanilla CNN, enabling the decomposition of density ratios between neighboring layers of feature maps. This approach provides powerful interpretability, revealing the resemblance between supervision and self-supervision through the lens of internal representations.

Submitted 31 May, 2023; originally announced May 2023.

arXiv:2305.08207 [pdf, other]

A Bilateral Bound on the Mean-Square Error for Estimation in Model Mismatch

Authors: Amir Weiss, Alejandro Lancho, Yuheng Bu, Gregory W. Wornell

Abstract: A bilateral (i.e., upper and lower) bound on the mean-square error under a general model mismatch is developed. The bound, which is derived from the variational representation of the chi-square divergence, is applicable in the Bayesian and non-Bayesian frameworks to biased and unbiased estimators. Unlike other classical MSE bounds that depend only on the model, our bound is also estimator-dependent. Thus, it is applicable as a tool for characterizing the MSE of a specific estimator. The proposed bounding technique has a variety of applications, one of which is a tool for proving the consistency of estimators for a class of models. Furthermore, it provides insight as to why certain estimators work well under general model mismatch conditions.

Submitted 14 May, 2023; originally announced May 2023.

Comments: Accepted for publication in Proc. of ISIT 2023

arXiv:2305.00593 [pdf, other]

Reliable Gradient-free and Likelihood-free Prompt Tuning

Authors: Maohao Shen, Soumya Ghosh, Prasanna Sattigeri, Subhro Das, Yuheng Bu, Gregory Wornell

Abstract: Due to privacy or commercial constraints, large pre-trained language models (PLMs) are often offered as black-box APIs. Fine-tuning such models to downstream tasks is challenging because one can neither access the model's internal representations nor propagate gradients through it. This paper addresses these challenges by developing techniques for adapting PLMs with only API access. Building on recent work on soft prompt tuning, we develop methods to tune the soft prompts without requiring gradient computation. Further, we develop extensions that in addition to not requiring gradients also do not need to access any internal representation of the PLM beyond the input embeddings. Moreover, instead of learning a single prompt, our methods learn a distribution over prompts allowing us to quantify predictive uncertainty. Ours is the first work to consider uncertainty in prompts when only having API access to the PLM. Finally, through extensive experiments, we carefully vet the proposed methods and find them competitive with (and sometimes even improving on) gradient-based approaches with full access to the PLM.

Submitted 30 April, 2023; originally announced May 2023.

Comments: EACL 2023 (Findings)

arXiv:2304.14332 [pdf, other]

On the Generalization Error of Meta Learning for the Gibbs Algorithm

Authors: Yuheng Bu, Harsha Vardhan Tetali, Gholamali Aminian, Miguel Rodrigues, Gregory Wornell

Abstract: We analyze the generalization ability of joint-training meta learning algorithms via the Gibbs algorithm. Our exact characterization of the expected meta generalization error for the meta Gibbs algorithm is based on symmetrized KL information, which measures the dependence between all meta-training datasets and the output parameters, including task-specific and meta parameters. Additionally, we derive an exact characterization of the meta generalization error for the super-task Gibbs algorithm, in terms of conditional symmetrized KL information within the super-sample and super-task framework introduced in Steinke and Zakynthinou (2020) and Hellstrom and Durisi (2022) respectively. Our results also enable us to provide novel distribution-free generalization error upper bounds for these Gibbs algorithms applicable to meta learning.

Submitted 27 April, 2023; originally announced April 2023.

Comments: Accepted at ISIT 2023

arXiv:2304.14173 [pdf, other]

doi 10.1007/JHEP09(2023)019

U(1) quasi-hydrodynamics: Schwinger-Keldysh effective field theory and holography

Authors: Matteo Baggioli, Yanyan Bu, Vaios Ziogas

Abstract: We study the quasi-hydrodynamics of a system with a softly broken $U(1)$ global symmetry using effective field theory (EFT) and holographic methods. On the gravity side, we consider a holographic Proca model in the limit of small bulk mass, which is responsible for a controllable explicit breaking of the $U(1)$ global symmetry in the boundary field theory. We perform a holographic Schwinger-Keldysh analysis, which allows us to derive the form of the boundary effective action in the presence of dissipation. We compare our results with the previously proposed EFT and hydrodynamic theories, and we confirm their validity by computing the low-energy quasi-normal mode spectrum analytically and numerically. Additionally, we derive the broken holographic Ward identity for the $U(1)$ current, and discuss the recently proposed novel transport coefficients for systems with explicitly broken symmetries. The setup considered is expected to serve as a toy model for more realistic situations where quasi-hydrodynamics is at work, such as axial charge relaxation in QCD, spin relaxation in relativistic systems, electric field relaxation in magneto-hydrodynamics, or momentum relaxation in condensed matter systems.

Submitted 27 August, 2023; v1 submitted 27 April, 2023; originally announced April 2023.

Comments: v3: matches published version, v2: new subsection, references added, minor edits; v1: 44 pages, 4 figures

Report number: CPHT-RR017.042023

Journal ref: JHEP09(2023)019

arXiv:2304.11559 [pdf, other]

doi 10.1109/LWC.2023.3270423

Lightweight Machine Learning for Digital Cross-Link Interference Cancellation with RF Chain Characteristics in Flexible Duplex MIMO Systems

Authors: Jing-Sheng Tan, Shaoshi Yang, Kuo Meng, Jianhua Zhang, Yurong Tang, Yan Bu, Guizhen Wang

Abstract: The flexible duplex (FD) technique, including dynamic time-division duplex (D-TDD) and dynamic frequency-division duplex (D-FDD), is regarded as a promising solution to achieving a more flexible uplink/downlink transmission in 5G-Advanced or 6G mobile communication systems. However, it may introduce serious cross-link interference (CLI). To better mitigate the impact of CLI, we first present a more realistic base station (BS)-to-BS channel model incorporating the radio frequency (RF) chain characteristics, which exhibit a hardware-dependent nonlinear property, and hence the accuracy of conventional channel modelling is inadequate for CLI cancellation. Then, we propose a channel parameter estimation based polynomial CLI canceller and two machine learning (ML) based CLI cancellers that use the lightweight feedforward neural network (FNN). Our simulation results and analysis show that the ML based CLI cancellers achieve notable performance improvement and dramatic reduction of computational complexity, in comparison with the polynomial CLI canceller.

Submitted 23 April, 2023; originally announced April 2023.

Comments: 5 pages, 6 figures

arXiv:2303.01836 [pdf, other]

doi 10.3847/1538-3881/acc108

L dwarfs detection from SDSS images using improved Faster R-CNN

Authors: Zhi Cao, Zhenping Yi, Jingchang Pan, Hao Su, Yude Bu, Xiao Kong, Ali Luo

Abstract: We present a data-driven approach to automatically detect L dwarfs from Sloan Digital Sky Survey (SDSS) images using an improved Faster R-CNN framework based on deep learning. The established L dwarf automatic detection (LDAD) model distinguishes L dwarfs from other celestial objects and backgrounds in SDSS field images by learning the features of 387 SDSS images containing L dwarfs. Applying the LDAD model to the SDSS images containing 93 labeled L dwarfs in the test set, we successfully detected 83 known L dwarfs with a recall rate of 89.25% for known L dwarfs. Several techniques are implemented in the LDAD model to improve its detection performance for L dwarfs, including the deep residual network and the feature pyramid network. As a result, the LDAD model outperforms the model of the original Faster R-CNN, whose recall rate of known L dwarfs is 80.65% for the same test set. The LDAD model was applied to detect L dwarfs from a larger validation set including 843 labeled L dwarfs, resulting in a recall rate of 94.42% for known L dwarfs. The newly identified candidates include L dwarfs, late M and T dwarfs, which were estimated from color (i-z) and spectral type relation. The contamination rates for the test candidates and validation candidates are 8.60% and 9.27%, respectively. The detection results indicate that our model is effective in searching for L dwarfs from astronomical images.

Submitted 3 March, 2023; originally announced March 2023.

Comments: 12 pages, 10 figures, accepted to be published in AJ

arXiv:2302.11198 [pdf, other]

Estimating Stellar Parameters and Identifying Very Metal-poor Stars Using Convolutional Neural Networks for Low-resolution Spectra (R~200)

Authors: Tianmin Wu, Yude Bu, Jianhang Xie, Junchao Liang, Wei Liu, Zhenping Yi, Xiaoming Kong, Meng Liu

Abstract: Very metal-poor (VMP, [Fe/H]<-2.0) stars offer a wealth of information on the nature and evolution of elemental production in the early galaxy and universe. The upcoming China Space Station Telescope (CSST) will provide us with a large amount of spectroscopic data that may contain plenty of VMP stars, and thus it is crucial to determine the stellar atmospheric parameters ($T_{eff}$, $\log g$, and [Fe/H]) for low-resolution spectra similar to the CSST spectra (R~200). In this paper, a two-dimensional Convolutional Neural Network (CNN) model with three convolutional layers and two fully connected layers is constructed. The principal aim of this work is to measure the ability of this model to estimate stellar parameters on low-resolution (R~200) spectra and to identify VMP stars so that we can better search for VMP stars in the spectra observed by CSST. We mainly use 10,008 observed spectra of VMP stars from LAMOST DR3, and 16,638 spectra of common stars ([Fe/H]>-2.0) from LAMOST DR8 for the experiment and make comparisons. All spectra are reduced to R~200 to match the resolution of the CSST and are preprocessed and collapsed into two-dimensional spectra for input to the CNN model. The results show that the MAE values are 99.40 K for $T_{eff}$, 0.22 dex for $\log g$, 0.14 dex for [Fe/H], and 0.26 dex for [C/Fe], respectively. Besides, the CNN model efficiently identifies VMP stars with a precision of 94.77%. The validation and practicality of this model are also tested on the MARCS synthetic spectra. This paper powerfully demonstrates the effectiveness of the proposed CNN model in estimating stellar parameters for low-resolution spectra (R~200) and recognizing VMP stars that are of interest for stellar population and galactic evolution work.

Submitted 22 February, 2023; originally announced February 2023.

Comments: 13 pages, 9 figures

arXiv:2302.08077 [pdf, other]

Group Fairness with Uncertainty in Sensitive Attributes

Authors: Abhin Shah, Maohao Shen, Jongha Jon Ryu, Subhro Das, Prasanna Sattigeri, Yuheng Bu, Gregory W. Wornell

Abstract: Learning a fair predictive model is crucial to mitigate biased decisions against minority groups in high-stakes applications. A common approach to learn such a model involves solving an optimization problem that maximizes the predictive power of the model under an appropriate group fairness constraint. However, in practice, sensitive attributes are often missing or noisy, resulting in uncertainty. We demonstrate that solely enforcing fairness constraints on uncertain sensitive attributes can fall significantly short in achieving the level of fairness of models trained without uncertainty. To overcome this limitation, we propose a bootstrap-based algorithm that achieves the target level of fairness despite the uncertainty in sensitive attributes. The algorithm is guided by a Gaussian analysis for the independence notion of fairness where we propose a robust quadratically constrained quadratic problem to ensure a strict fairness guarantee with uncertain sensitive attributes. Our algorithm is applicable to both discrete and continuous sensitive attributes and is effective in real-world classification and regression tasks for various group fairness notions, e.g., independence and separation.

Submitted 7 June, 2023; v1 submitted 15 February, 2023; originally announced February 2023.

arXiv:2302.03242 [pdf, other]

doi 10.1145/3581783.3612426

Combating Online Misinformation Videos: Characterization, Detection, and Future Directions

Authors: Yuyan Bu, Qiang Sheng, Juan Cao, Peng Qi, Danding Wang, Jintao Li

Abstract: With information consumption via online video streaming becoming increasingly popular, misinformation video poses a new threat to the health of the online information ecosystem. Though previous studies have made much progress in detecting misinformation in text and image formats, video-based misinformation brings new and unique challenges to automatic detection systems: 1) high information heterogeneity brought by various modalities, 2) blurred distinction between misleading video manipulation and nonmalicious artistic video editing, and 3) new patterns of misinformation propagation due to the dominant role of recommendation systems on online video platforms. To facilitate research on this challenging task, we conduct this survey to present advances in misinformation video detection. We first analyze and characterize the misinformation video from three levels including signals, semantics, and intents. Based on the characterization, we systematically review existing works for detection from features of various modalities to techniques for clue integration. We also introduce existing resources including representative datasets and useful tools. Besides summarizing existing studies, we discuss related areas and outline open issues and future directions to encourage and guide more research on misinformation video detection. The corresponding repository is at https://github.com/ICTMCG/Awesome-Misinfo-Video-Detection.

Submitted 6 August, 2023; v1 submitted 6 February, 2023; originally announced February 2023.

Comments: Accepted at ACM Multimedia 2023 (MM 2023). 11 pages, 4 figures, and 89 references

arXiv:2301.06703 [pdf, other]

Non-Gaussianity from Schwinger-Keldysh Effective Field Theory

Authors: Shu Lin, Yanyan Bu, Chang Lei

Abstract: We present a systematic treatment of non-Gaussianity in stochastic systems using the Schwinger-Keldysh effective field theory framework, in which the non-Gaussianity is realized as nonlinear terms in the fluctuation field. We establish two stochastic formulations of the Schwinger-Keldysh effective field theory, with those nonlinear terms manifested as multiple non-Gaussian noises in the Langevin equation and as higher order diffusive terms in the Fokker-Planck equation. The equivalence of the stochastic formulations with the original Schwinger-Keldysh effective field theory is demonstrated with non-trivial examples for arbitrary non-Gaussian parameters. The stochastic formulations will be more flexible and effective in studying non-equilibrium dynamics. We also reveal an ambiguity when the coarse-graining time scale and the non-Gaussian parameters vanish simultaneously, which may be responsible for the unphysical divergence found in perturbative analysis.

Submitted 14 February, 2024; v1 submitted 17 January, 2023; originally announced January 2023.

Comments: 9 pages, 8 figures, published version

arXiv:2212.10072 [pdf, other]

Identifying hot subdwarf stars from photometric data using Gaussian mixture model and graph neural network

Authors: Wei Liu, Yude Bu, Xiaoming Kong, Zhenping Yi, Meng Liu

Abstract: Hot subdwarf stars are very important for understanding stellar evolution, stellar astrophysics, and binary star systems. Identifying more such stars can help us better understand their statistical distribution, properties, and evolution. In this paper, we present a new method to search for hot subdwarf stars in photometric data (b, y, g, r, i, z) using a machine learning algorithm, graph neural network, and Gaussian mixture model. We use a Gaussian mixture model and Markov distance to build the graph structure, and on the graph structure, we use a graph neural network to identify hot subdwarf stars from 86 084 stars, when the recall, precision, and f1 score are maximized on the original, weight and synthetic minority oversampling technique datasets. Finally, from 21 885 candidates, we selected approximately 6 000 stars that were the most similar to the hot subdwarf star.

Submitted 20 December, 2022; originally announced December 2022.

arXiv:2212.07359 [pdf, other]

Post-hoc Uncertainty Learning using a Dirichlet Meta-Model

Authors: Maohao Shen, Yuheng Bu, Prasanna Sattigeri, Soumya Ghosh, Subhro Das, Gregory Wornell

Abstract: It is known that neural networks have the problem of being over-confident when directly using the output label distribution to generate uncertainty measures. Existing methods mainly resolve this issue by retraining the entire model to impose the uncertainty quantification capability so that the learned model can achieve desired performance in accuracy and uncertainty prediction simultaneously. However, training the model from scratch is computationally expensive and may not be feasible in many situations. In this work, we consider a more practical post-hoc uncertainty learning setting, where a well-trained base model is given, and we focus on the uncertainty quantification task at the second stage of training. We propose a novel Bayesian meta-model to augment pre-trained models with better uncertainty quantification abilities, which is effective and computationally efficient. Our proposed method requires no additional training data and is flexible enough to quantify different uncertainties and easily adapt to different application settings, including out-of-domain data detection, misclassification detection, and trustworthy transfer learning. We demonstrate our proposed meta-model approach's flexibility and superior empirical performance on these applications over multiple representative image classification benchmarks.

Submitted 14 December, 2022; originally announced December 2022.

Comments: Accepted by AAAI 2023

arXiv:2211.10973 [pdf, other]

FakeSV: A Multimodal Benchmark with Rich Social Context for Fake News Detection on Short Video Platforms

Authors: Peng Qi, Yuyan Bu, Juan Cao, Wei Ji, Ruihao Shui, Junbin Xiao, Danding Wang, Tat-Seng Chua

Abstract: Short video platforms have become an important channel for news sharing, but also a new breeding ground for fake news. To mitigate this problem, research on fake news video detection has recently received a lot of attention. Existing works face two roadblocks: the scarcity of comprehensive and large-scale datasets and insufficient utilization of multimodal information. Therefore, in this paper, we construct the largest Chinese short video dataset about fake news named FakeSV, which includes news content, user comments, and publisher profiles simultaneously. To understand the characteristics of fake news videos, we conduct exploratory analysis of FakeSV from different perspectives. Moreover, we provide a new multimodal detection model named SV-FEND, which exploits the cross-modal correlations to select the most informative features and utilizes the social context information for detection. Extensive experiments evaluate the superiority of the proposed method and provide detailed comparisons of different methods and modalities for future works.

Submitted 2 December, 2022; v1 submitted 20 November, 2022; originally announced November 2022.

Comments: To appear in AAAI 2023 AISI track. This version contains appendix with additional details

arXiv:2211.06777 [pdf, other]

doi 10.3847/1538-3881/aca323

Searching for Barium Stars from the LAMOST Spectra Using the Machine Learning Method: I

Authors: Fengyue Guo, Zhongding Cheng, Xiaoming Kong, Yatao Zhang, Yude Bu, Zhenping Yi, Bing Du, Jingchang Pan

Abstract: Barium stars are chemically peculiar stars that exhibit enhancement of s-process elements. Chemical abundance analysis of barium stars can provide crucial clues for the study of the chemical evolution of the Galaxy. The Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) has released more than 6 million low-resolution spectra of FGK-type stars by Data Release 9 (DR9), which can significantly increase the sample size of barium stars. In this paper, we used machine learning algorithms to search for barium stars from low-resolution spectra of LAMOST. We have applied the Light Gradient Boosting Machine (LGBM) algorithm to build classifiers of barium stars based on different features, and build predictors for determining [Ba/Fe] and [Sr/Fe] of barium candidates. The classification with features in the whole spectrum performs best: for the sample with strontium enhancement, Precision = 97.81%, and Recall = 96.05%; for the sample with barium enhancement, Precision = 96.03% and Recall = 97.70%. In prediction, [Ba/Fe] estimated from BaII line at 4554 Å has smaller dispersion than that from BaII line at 4934 Å: MAE$_{4554 Å}$ = 0.07, $σ_{4554 Å}$ = 0.12. [Sr/Fe] estimated from SrII line at 4077 Å performs better than that from SrII line at 4215 Å: MAE$_{4077 Å}$ = 0.09, $σ_{4077 Å}$ = 0.16. A comparison of the LGBM and other popular algorithms shows that LGBM is accurate and efficient in classifying barium stars. This work demonstrated that machine learning can be used as an effective means to identify chemically peculiar stars and determine their elemental abundance.

Submitted 12 November, 2022; originally announced November 2022.

Comments: 13 pages, 6 figures

arXiv:2211.06608 [pdf, other]

doi 10.3847/1538-3881/aca098

Li-rich Giants Identified from LAMOST DR8 Low-Resolution Survey

Authors: BeiChen Cai, XiaoMing Kong, JianRong Shi, Qi Gao, Yude Bu, Zhenping Yi

Abstract: A small fraction of giants possess photospheric lithium (Li) abundance higher than the value predicted by the standard stellar evolution models, and the detailed mechanisms of Li enhancement are complicated and lack a definite conclusion. In order to better understand the Li enhancement behaviors, a large and homogeneous Li-rich giants sample is needed. In this study, we designed a modified convolutional neural network model called Coord-DenseNet to determine the A(Li) of Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) low-resolution survey (LRS) giant spectra. The precision is good on the test set: MAE=0.15 dex, and σ=0.21 dex. We used this model to predict the Li abundance of more than 900,000 LAMOST DR8 LRS giant spectra and identified 7,768 Li-rich giants with Li abundances ranging from 2.0 to 5.4 dex, accounting for about 1.02% of all giants. We compared the Li abundance estimated by our work with those derived from high-resolution spectra. We found that the consistency was good if the overall deviation of 0.27 dex between them was not considered. The analysis shows that the difference is mainly due to the high A(Li) from the medium-resolution spectra in the training set. This sample of Li-rich giants dramatically expands the existing sample size of Li-rich giants and provides us with more samples to further study the formation and evolution of Li-rich giants.

Submitted 12 November, 2022; originally announced November 2022.

Comments: 14 pages, 13 figures

arXiv:2210.09864 [pdf, ps, other]

Information-theoretic Characterizations of Generalization Error for the Gibbs Algorithm

Authors: Gholamali Aminian, Yuheng Bu, Laura Toni, Miguel R. D. Rodrigues, Gregory W. Wornell

Abstract: Various approaches have been developed to upper bound the generalization error of a supervised learning algorithm. However, existing bounds are often loose and even vacuous when evaluated in practice. As a result, they may fail to characterize the exact generalization ability of a learning algorithm. Our main contributions are exact characterizations of the expected generalization error of the well-known Gibbs algorithm (a.k.a. Gibbs posterior) using different information measures, in particular, the symmetrized KL information between the input training samples and the output hypothesis. Our result can be applied to tighten existing expected generalization error and PAC-Bayesian bounds. Our information-theoretic approach is versatile, as it also characterizes the generalization error of the Gibbs algorithm with a data-dependent regularizer and that of the Gibbs algorithm in the asymptotic regime, where it converges to the standard empirical risk minimization algorithm. Of particular relevance, our results highlight the role the symmetrized KL information plays in controlling the generalization error of the Gibbs algorithm.

Submitted 18 October, 2022; originally announced October 2022.

Comments: under review. arXiv admin note: text overlap with arXiv:2107.13656, arXiv:2111.01635

arXiv:2210.08188 [pdf, ps, other]

How Does Pseudo-Labeling Affect the Generalization Error of the Semi-Supervised Gibbs Algorithm?

Authors: Haiyun He, Gholamali Aminian, Yuheng Bu, Miguel Rodrigues, Vincent Y. F. Tan

Abstract: We provide an exact characterization of the expected generalization error (gen-error) for semi-supervised learning (SSL) with pseudo-labeling via the Gibbs algorithm. The gen-error is expressed in terms of the symmetrized KL information between the output hypothesis, the pseudo-labeled dataset, and the labeled dataset. Distribution-free upper and lower bounds on the gen-error can also be obtained. Our findings offer new insights that the generalization performance of SSL with pseudo-labeling is affected not only by the information between the output hypothesis and input training data but also by the information shared between the labeled and pseudo-labeled data samples. This serves as a guideline to choose an appropriate pseudo-labeling method from a given family of methods. To deepen our understanding, we further explore two examples -- mean estimation and logistic regression. In particular, we analyze how the ratio of the number of unlabeled to labeled data $λ$ affects the gen-error under both scenarios. As $λ$ increases, the gen-error for mean estimation decreases and then saturates at a value larger than when all the samples are labeled, and the gap can be quantified exactly with our analysis, and is dependent on the cross-covariance between the labeled and pseudo-labeled data samples. For logistic regression, the gen-error and the variance component of the excess risk also decrease as $λ$ increases.

Submitted 15 June, 2023; v1 submitted 15 October, 2022; originally announced October 2022.

Comments: 30 pages, 4 figures

arXiv:2210.07408 [pdf, other]

doi 10.1021/acsaelm.2c01176

High-strain-induced local modification of the electronic properties of VO$_2$ thin films

Authors: Yorick A. Birkhölzer, Kai Sotthewes, Nicolas Gauquelin, Lars Riekehr, Daen Jannis, Emma van der Minne, Yibin Bu, Johan Verbeeck, Harold J. W. Zandvliet, Gertjan Koster, Guus Rijnders

Abstract: Vanadium dioxide (VO2) is a popular candidate for electronic and optical switching applications due to its well-known semiconductor-metal transition. Its study is notoriously challenging due to the interplay of long and short range elastic distortions, as well as the symmetry change, and the electronic structure changes. The inherent coupling of lattice and electronic degrees of freedom opens the avenue towards mechanical actuation of single domains. In this work, we show that we can manipulate and monitor the reversible semiconductor-to-metal transition of VO2 while applying a controlled amount of mechanical pressure by a nanosized metallic probe using an atomic force microscope. At a critical pressure, we can reversibly actuate the phase transition with a large modulation of the conductivity. Direct tunneling through the VO2-metal contact is observed as the main charge carrier injection mechanism before and after the phase transition of VO2. The tunneling barrier is formed by a very thin but persistently insulating surface layer of the VO2. The necessary pressure to induce the transition decreases with temperature. In addition, we measured the phase coexistence line in a hitherto unexplored regime. Our study provides valuable information on pressure-induced electronic modifications of the VO2 properties, as well as on nanoscale metal-oxide contacts, which can help in the future design of oxide electronics.

Submitted 13 October, 2022; originally announced October 2022.

Comments: Y.A.B. and K.S. contributed equally. 30 pages, 4 figures, Supplemental Material (28 pages, 16 figures)

Journal ref: ACS Applied Electronic Materials 2022

arXiv:2210.02274 [pdf, other]

Nonlinear effective dynamics of Brownian particle in magnetized plasma

Authors: Yanyan Bu, Biye Zhang, Jingbo Zhang

Abstract: An effective description is presented for a Brownian particle in a magnetized plasma. In order to systematically capture various corrections to the linear Langevin equation, we construct an effective action for the Brownian particle, to quartic order in its position. The effective action is first derived within the non-equilibrium effective field theory formalism, and then confirmed via a microscopic holographic model consisting of an open string probing a magnetic AdS$_5$ black brane. For practical usage, the non-Gaussian effective action is converted into a Fokker-Planck type equation, which is a Euclidean analog of the Schrödinger equation and describes the time evolution of the probability distribution for the particle's position and velocity.

Submitted 5 October, 2022; originally announced October 2022.

Comments: 33 pages, 8 figures

arXiv:2209.04871 [pdf, other]

doi 10.1109/GLOBECOM48099.2022.10001513

Data-Driven Blind Synchronization and Interference Rejection for Digital Communication Signals

Authors: Alejandro Lancho, Amir Weiss, Gary C. F. Lee, Jennifer Tang, Yuheng Bu, Yury Polyanskiy, Gregory W. Wornell

Abstract: We study the potential of data-driven deep learning methods for separation of two communication signals from an observation of their mixture. In particular, we assume knowledge on the generation process of one of the signals, dubbed signal of interest (SOI), and no knowledge on the generation process of the second signal, referred to as interference. This form of the single-channel source separation problem is also referred to as interference rejection. We show that capturing high-resolution temporal structures (nonstationarities), which enables accurate synchronization to both the SOI and the interference, leads to substantial performance gains. With this key insight, we propose a domain-informed neural network (NN) design that is able to improve upon both "off-the-shelf" NNs and classical detection and interference rejection methods, as demonstrated in our simulations. Our findings highlight the key role communication-specific domain knowledge plays in the development of data-driven approaches that hold the promise of unprecedented gains.

Submitted 11 September, 2022; originally announced September 2022.

Comments: 9 pages, 6 figures, accepted at IEEE GLOBECOM 2022 (this version contains extended proofs)

arXiv:2209.03690 [pdf]

doi 10.1108/EL-11-2020-0324

Exploring the Distribution Regularities of User Attention and Sentiment toward Product Aspects in Online Reviews

Authors: Chenglei Qin, Chengzhi Zhang, Yi Bu

Abstract: [Purpose] To better understand the online reviews and help potential consumers, businessmen, and product manufacturers effectively obtain users' evaluation on product aspects, this paper explores the distribution regularities of user attention and sentiment toward product aspects from the temporal perspective of online reviews. [Design/methodology/approach] Temporal characteristics of online reviews (purchase time, review time, and time intervals between purchase time and review time), similar attributes clustering, and attribute-level sentiment computing technologies are employed based on more than 340k smartphone reviews of three products from JD.COM (a famous online shopping platform in China) to explore the distribution regularities of user attention and sentiment toward product aspects in this article. [Findings] The empirical results show that a power-law distribution can fit user attention to product aspects, and the reviews posted in short time intervals contain more product aspects. Besides, the results show that the values of user sentiment of product aspects are significantly higher/lower in short time intervals which contribute to judging the advantages and weaknesses of a product. [Research limitations] The paper can't acquire online reviews for more products with temporal characteristics to verify the findings because of the restriction on reviews crawling by the shopping platforms. [Originality/value] This work reveals the distribution regularities of user attention and sentiment toward product aspects, which is of great significance in assisting decision-making, optimizing review presentation, and improving the shopping experience.

Submitted 8 September, 2022; originally announced September 2022.

arXiv:2208.10325 [pdf, other]

doi 10.1109/MLSP55214.2022.9943311

Exploiting Temporal Structures of Cyclostationary Signals for Data-Driven Single-Channel Source Separation

Authors: Gary C. F. Lee, Amir Weiss, Alejandro Lancho, Jennifer Tang, Yuheng Bu, Yury Polyanskiy, Gregory W. Wornell

Abstract: We study the problem of single-channel source separation (SCSS), and focus on cyclostationary signals, which are particularly suitable in a variety of application domains. Unlike classical SCSS approaches, we consider a setting where only examples of the sources are available rather than their models, inspiring a data-driven approach. For source models with underlying cyclostationary Gaussian constituents, we establish a lower bound on the attainable mean squared error (MSE) for any separation method, model-based or data-driven. Our analysis further reveals the operation for optimal separation and the associated implementation challenges. As a computationally attractive alternative, we propose a deep learning approach using a U-Net architecture, which is competitive with the minimum MSE estimator. We demonstrate in simulation that, with suitable domain-informed architectural choices, our U-Net method can approach the optimal performance with substantially reduced computational burden.

Submitted 22 August, 2022; originally announced August 2022.

Showing 1–50 of 132 results for author: Bu, Y