HRDE: Retrieval-Augmented Large Language Models for Chinese Health Rumor Detection and Explainability

Authors: Yanfang Chen, Ding Chen, Shichao Song, Simin Niu, Hanyu Wang, Zeyun Tang, Feiyu Xiong, Zhiyu Li

Abstract: As people increasingly prioritize their health, the speed and breadth of health information dissemination on the internet have also grown. At the same time, the presence of false health information (health rumors) intermingled with genuine content poses a significant potential threat to public health. However, current research on Chinese health rumors still lacks a large-scale, public, and open-source dataset of health rumor information, as well as effective and reliable rumor detection methods. This paper addresses this gap by constructing a dataset containing 1.12 million health-related rumors (HealthRCN) through web scraping of common health-related questions and a series of data processing steps. HealthRCN is the largest known dataset of Chinese health information rumors to date. Based on this dataset, we propose retrieval-augmented large language models for Chinese health rumor detection and explainability (HRDE). This model leverages retrieved relevant information to accurately determine whether the input health information is a rumor and provides explanatory responses, effectively aiding users in verifying the authenticity of health information. In evaluation experiments, we compared multiple models and found that HRDE outperformed them all, including GPT-4-1106-Preview, in rumor detection accuracy and answer quality. HRDE achieved an average accuracy of 91.04% and an F1 score of 91.58%.

Submitted 3 July, 2024; v1 submitted 30 June, 2024; originally announced July 2024.

arXiv:2406.19528 [pdf, other]

Using Large Language Models to Assist Video Content Analysis: An Exploratory Study of Short Videos on Depression

Authors: Jiaying Liu, Yunlong Wang, Yao Lyu, Yiheng Su, Shuo Niu, Xuhai Orson Xu, Yan Zhang

Abstract: Despite the growing interest in leveraging Large Language Models (LLMs) for content analysis, current studies have primarily focused on text-based content. In the present work, we explored the potential of LLMs in assisting video content analysis by conducting a case study that followed a new workflow of LLM-assisted multimodal content analysis. The workflow encompasses codebook design, prompt engineering, LLM processing, and human evaluation. We strategically crafted annotation prompts to get LLM Annotations in structured form and explanation prompts to generate LLM Explanations for a better understanding of LLM reasoning and transparency. To test LLM's video annotation capabilities, we analyzed 203 keyframes extracted from 25 YouTube short videos about depression. We compared the LLM Annotations with those of two human coders and found that LLM has higher accuracy in object and activity Annotations than emotion and genre Annotations. Moreover, we identified the potential and limitations of LLM's capabilities in annotating videos. Based on the findings, we explore opportunities and challenges for future research and improvements to the workflow. We also discuss ethical concerns surrounding future studies based on LLM-assisted video analysis.

Submitted 4 July, 2024; v1 submitted 27 June, 2024; originally announced June 2024.

Comments: 6 pages, 2 figures, under review in CSCW 24

arXiv:2406.09684 [pdf, other]

Explainable AI for Comparative Analysis of Intrusion Detection Models

Authors: Pap M. Corea, Yongxin Liu, Jian Wang, Shuteng Niu, Houbing Song

Abstract: Explainable Artificial Intelligence (XAI) has become a widely discussed topic; the related technologies facilitate better understanding of conventional black-box models like Random Forest, Neural Networks, etc. However, domain-specific applications of XAI are still insufficient. To fill this gap, this research analyzes various machine learning models on the tasks of binary and multi-class classification for intrusion detection from network traffic on the same dataset using occlusion sensitivity. The models evaluated include Linear Regression, Logistic Regression, Linear Support Vector Machine (SVM), K-Nearest Neighbors (KNN), Random Forest, Decision Trees, and Multi-Layer Perceptrons (MLP). We trained all models to the accuracy of 90% on the UNSW-NB15 Dataset. We found that most classifiers leverage fewer than three critical features to achieve such accuracies, indicating that effective feature engineering could actually be far more important for intrusion detection than applying complicated models. We also discover that Random Forest provides the best performance in terms of accuracy, time efficiency and robustness. Data and code available at https://github.com/pcwhy/XML-IntrusionDetection.git

Submitted 3 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

Comments: Submitted to IEEE MeditCom 2024 - WS-05

arXiv:2405.19682 [pdf, other]

Fully Test-Time Adaptation for Monocular 3D Object Detection

Authors: Hongbin Lin, Yifan Zhang, Shuaicheng Niu, Shuguang Cui, Zhen Li

Abstract: Monocular 3D object detection (Mono 3Det) aims to identify 3D objects from a single RGB image. However, existing methods often assume training and test data follow the same distribution, which may not hold in real-world test scenarios. To address the out-of-distribution (OOD) problems, we explore a new adaptation paradigm for Mono 3Det, termed Fully Test-time Adaptation. It aims to adapt a well-trained model to unlabeled test data by handling potential data distribution shifts at test time without access to training data and test labels. However, applying this paradigm in Mono 3Det poses significant challenges due to OOD test data causing a remarkable decline in object detection scores. This decline conflicts with the pre-defined score thresholds of existing detection methods, leading to severe object omissions (i.e., rare positive detections and many false negatives). Consequently, the limited positive detection and plenty of noisy predictions cause test-time adaptation to fail in Mono 3Det. To handle this problem, we propose a novel Monocular Test-Time Adaptation (MonoTTA) method, based on two new strategies. 1) Reliability-driven adaptation: we empirically find that high-score objects are still reliable and the optimization of high-score objects can enhance confidence across all detections. Thus, we devise a self-adaptive strategy to identify reliable objects for model adaptation, which discovers potential objects and alleviates omissions. 2) Noise-guard adaptation: since high-score objects may be scarce, we develop a negative regularization term to exploit the numerous low-score objects via negative learning, preventing overfitting to noise and trivial solutions. Experimental results show that MonoTTA brings significant performance gains for Mono 3Det models in OOD test scenarios, approximately 190% gains on average on KITTI and 198% gains on nuScenes.

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.16933 [pdf, other]

Empowering Large Language Models to Set up a Knowledge Retrieval Indexer via Self-Learning

Authors: Xun Liang, Simin Niu, Zhiyu Li, Sensen Zhang, Shichao Song, Hanyu Wang, Jiawei Yang, Feiyu Xiong, Bo Tang, Chenyang Xi

Abstract: Retrieval-Augmented Generation (RAG) offers a cost-effective approach to injecting real-time knowledge into large language models (LLMs). Nevertheless, constructing and validating high-quality knowledge repositories require considerable effort. We propose a pre-retrieval framework named Pseudo-Graph Retrieval-Augmented Generation (PG-RAG), which conceptualizes LLMs as students by providing them with abundant raw reading materials and encouraging them to engage in autonomous reading to record factual information in their own words. The resulting concise, well-organized mental indices are interconnected through common topics or complementary facts to form a pseudo-graph database. During the retrieval phase, PG-RAG mimics the human behavior in flipping through notes, identifying fact paths and subsequently exploring the related contexts. Adhering to the principle of the path taken by many is the best, it integrates highly corroborated fact paths to provide a structured and refined sub-graph assisting LLMs. We validated PG-RAG on three specialized question-answering datasets. In single-document tasks, PG-RAG significantly outperformed the current best baseline, KGP-LLaMA, across all key evaluation metrics, with an average overall performance improvement of 11.6%. Specifically, its BLEU score increased by approximately 14.3%, and the QE-F1 metric improved by 23.7%. In multi-document scenarios, the average metrics of PG-RAG were at least 2.35% higher than the best baseline. Notably, the BLEU score and QE-F1 metric showed stable improvements of around 7.55% and 12.75%, respectively. Our code: https://github.com/IAAR-Shanghai/PGRAG.

Submitted 27 May, 2024; originally announced May 2024.

arXiv:2405.05887 [pdf, ps, other]

Convergence Rates of Online Critic Value Function Approximation in Native Spaces

Authors: Shengyuan Niu, Ali Bouland, Haoran Wang, Filippos Fotiadis, Andrew Kurdila, Andrea L'Afflitto, Sai Tej Paruchuri, Kyriakos G. Vamvoudakis

Abstract: In this paper, the evolution equation that defines the online critic for the approximation of the optimal value function is cast in a general class of reproducing kernel Hilbert spaces (RKHSs). Exploiting some core tools of RKHS theory, this formulation allows deriving explicit bounds on the performance of the critic in terms of the kernel and definition of the RKHS, the number of basis functions, and the location of centers used to define scattered bases. The performance of the critic is precisely measured in terms of the power function of the scattered basis used in approximations, and it can be used either in an a priori evaluation of potential bases or in an a posteriori assessment of value function error for basis enrichment or pruning. The most concise bounds in the paper describe explicitly how the critic performance depends on the placement of centers, as measured by their fill distance in a subset that contains the trajectory of the critic.

Submitted 28 May, 2024; v1 submitted 9 May, 2024; originally announced May 2024.

arXiv:2405.00711 [pdf, other]

Fake Artificial Intelligence Generated Contents (FAIGC): A Survey of Theories, Detection Methods, and Opportunities

Authors: Xiaomin Yu, Yezhaohui Wang, Yanfang Chen, Zhen Tao, Dinghao Xi, Shichao Song, Simin Niu, Zhiyu Li

Abstract: In recent years, generative artificial intelligence models, represented by Large Language Models (LLMs) and Diffusion Models (DMs), have revolutionized content production methods. Artificial intelligence-generated content (AIGC) has become deeply embedded in various aspects of daily life and work. However, these technologies have also led to the emergence of Fake Artificial Intelligence Generated Content (FAIGC), posing new challenges in distinguishing genuine information. It is crucial to recognize that AIGC technology is akin to a double-edged sword; its potent generative capabilities, while beneficial, also pose risks for the creation and dissemination of FAIGC. In this survey, we propose a new taxonomy that provides a more comprehensive breakdown of the space of FAIGC methods today. Next, we explore the modalities and generative technologies of FAIGC. We introduce FAIGC detection methods and summarize the related benchmark from various perspectives. Finally, we discuss outstanding challenges and promising areas for future research.

Submitted 3 May, 2024; v1 submitted 25 April, 2024; originally announced May 2024.

arXiv:2404.01650 [pdf, other]

Test-Time Model Adaptation with Only Forward Passes

Authors: Shuaicheng Niu, Chunyan Miao, Guohao Chen, Pengcheng Wu, Peilin Zhao

Abstract: Test-time adaptation has proven effective in adapting a given trained model to unseen test samples with potential distribution shifts. However, in real-world scenarios, models are usually deployed on resource-limited devices, e.g., FPGAs, and are often quantized and hard-coded with non-modifiable parameters for acceleration. In light of this, existing methods are often infeasible since they heavily depend on computation-intensive backpropagation for model updating that may not be supported. To address this, we propose a test-time Forward-Optimization Adaptation (FOA) method. In FOA, we seek to solely learn a newly added prompt (as model's input) via a derivative-free covariance matrix adaptation evolution strategy. To make this strategy work stably under our online unsupervised setting, we devise a novel fitness function by measuring test-training statistic discrepancy and model prediction entropy. Moreover, we design an activation shifting scheme that directly tunes the model activations for shifted test samples, making them align with the source training domain, thereby further enhancing adaptation performance. Without using any backpropagation or altering model weights, FOA running on a quantized 8-bit ViT outperforms gradient-based TENT on a full-precision 32-bit ViT, while achieving up to a 24-fold memory reduction on ImageNet-C.

Submitted 29 May, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 18 pages, 4 figures, 17 tables, accepted by International Conference on Machine Learning

arXiv:2403.11491 [pdf, other]

Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting

Authors: Mingkui Tan, Guohao Chen, Jiaxiang Wu, Yifan Zhang, Yaofo Chen, Peilin Zhao, Shuaicheng Niu

Abstract: Test-time adaptation (TTA) seeks to tackle potential distribution shifts between training and test data by adapting a given model w.r.t. any test sample. Although recent TTA has shown promising performance, we still face two key challenges: 1) prior methods perform backpropagation for each test sample, resulting in unbearable optimization costs to many applications; 2) while existing TTA can significantly improve the test performance on out-of-distribution data, they often suffer from severe performance degradation on in-distribution data after TTA (known as forgetting). To this end, we have proposed an Efficient Anti-Forgetting Test-Time Adaptation (EATA) method which develops an active sample selection criterion to identify reliable and non-redundant samples for test-time entropy minimization. To alleviate forgetting, EATA introduces a Fisher regularizer estimated from test samples to constrain important model parameters from drastic changes. However, in EATA, the adopted entropy loss consistently assigns higher confidence to predictions even for samples that are inherently uncertain, leading to overconfident predictions. To tackle this, we further propose EATA with Calibration (EATA-C) to separately exploit the reducible model uncertainty and the inherent data uncertainty for calibrated TTA. Specifically, we measure the model uncertainty by the divergence between predictions from the full network and its sub-networks, on which we propose a divergence loss to encourage consistent predictions instead of overconfident ones. To further recalibrate prediction confidence, we utilize the disagreement among predicted labels as an indicator of the data uncertainty, and then devise a min-max entropy regularizer to selectively increase and decrease prediction confidence for different samples. Experiments on image classification and semantic segmentation verify the effectiveness of our methods.

Submitted 18 March, 2024; originally announced March 2024.

Comments: 20 pages, 14 tables, 11 figures. arXiv admin note: substantial text overlap with arXiv:2204.02610
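
The abstract above mentions selecting reliable samples for test-time entropy minimization. As a rough illustration only, the sketch below keeps the samples whose prediction entropy falls under a threshold. The function names, the threshold rule (a fraction of the maximum entropy ln C), and the toy probabilities are assumptions made for this example, not details taken from the paper; the non-redundancy criterion and the Fisher regularizer are not covered.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_reliable(batch_probs, margin_ratio=0.4):
    """Return indices of samples whose entropy is below margin_ratio * ln(C).

    margin_ratio is a hypothetical hyperparameter chosen for illustration.
    """
    num_classes = len(batch_probs[0])
    threshold = margin_ratio * math.log(num_classes)
    return [i for i, probs in enumerate(batch_probs) if entropy(probs) < threshold]


# Toy softmax outputs for three test samples over four classes (made up).
batch = [
    [0.97, 0.01, 0.01, 0.01],  # confident prediction
    [0.25, 0.25, 0.25, 0.25],  # maximally uncertain prediction
    [0.70, 0.10, 0.10, 0.10],  # moderately uncertain prediction
]
selected = select_reliable(batch)
print(selected)
```

In an adaptation loop, only the selected samples would contribute to the entropy loss, which is what keeps the number of backward passes small.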

arXiv:2403.10206 [pdf, other]

A Data-Driven Approach for Mitigating Dark Current Noise and Bad Pixels in Complementary Metal Oxide Semiconductor Cameras for Space-based Telescopes

Authors: Peng Jia, Chao Lv, Yushan Li, Yongyang Sun, Shu Niu, Zhuoxiao Wang

Abstract: In recent years, there has been a gradual increase in the performance of Complementary Metal Oxide Semiconductor (CMOS) cameras. These cameras have gained popularity as a viable alternative to charge-coupled device (CCD) cameras in a wide range of applications. One particular application is the CMOS camera installed in small space telescopes. However, the limited power and spatial resources available on satellites present challenges in maintaining ideal observation conditions, including temperature and radiation environment. Consequently, images captured by CMOS cameras are susceptible to issues such as dark current noise and defective pixels. In this paper, we introduce a data-driven framework for mitigating dark current noise and bad pixels for CMOS cameras. Our approach involves two key steps: pixel clustering and function fitting. During the pixel clustering step, we identify and group pixels exhibiting similar dark current noise properties. Subsequently, in the function fitting step, we formulate functions that capture the relationship between dark current and temperature, as dictated by the Arrhenius law. Our framework leverages ground-based test data to establish distinct temperature-dark current relations for pixels within different clusters. The cluster results could then be utilized to estimate the dark current noise level and detect bad pixels from real observational data. To assess the effectiveness of our approach, we have conducted tests using real observation data obtained from the Yangwang-1 satellite, equipped with a near-ultraviolet telescope and an optical telescope. The results show a considerable improvement in the detection efficiency of space-based telescopes.

Submitted 15 March, 2024; originally announced March 2024.

Comments: Accepted by the AJ, comments are welcome. The complete code could be downloaded from: DOI: 10.12149/101387
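
The function-fitting step in the abstract above relates dark current to temperature through the Arrhenius law. The sketch below shows one way such a fit could look, assuming the common form I(T) = A * exp(-E_a / (k_B * T)) and a least-squares line through (1/T, ln I). The function names, the synthetic data, and the parameter values are assumptions for illustration; this is not the authors' code, and it omits the pixel clustering step.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def predict_dark_current(temp_k, prefactor, activation_ev):
    """Arrhenius model: I(T) = A * exp(-E_a / (k_B * T))."""
    return prefactor * math.exp(-activation_ev / (K_B * temp_k))


def fit_arrhenius(temps_k, dark_currents):
    """Fit (A, E_a) by linear least squares on ln I = ln A - E_a / (k_B * T)."""
    xs = [1.0 / t for t in temps_k]
    ys = [math.log(d) for d in dark_currents]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope * K_B


# Synthetic, noise-free data generated from assumed parameters (not measurements).
temps = [253.0, 263.0, 273.0, 283.0, 293.0]
true_prefactor, true_activation = 1.0e10, 0.6
currents = [predict_dark_current(t, true_prefactor, true_activation) for t in temps]

a_fit, ea_fit = fit_arrhenius(temps, currents)
print(a_fit, ea_fit)
```

With one such fit per pixel cluster, a pixel whose measured dark current departs strongly from its cluster's curve at the recorded temperature could be flagged as a bad-pixel candidate.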

arXiv:2403.06927 [pdf]

Effective multiband synthetic four-wave mixing by cascading quadratic processes

Authors: Li Chen, Zheng Ge, Su-Jian Niu, Yin-Hai Li, Zhao-Qi-Zhi Han, Yue-Wei Song, Wu-Zhen Li, Ren-Hui Chen, Ming-Yuan Gao, Meng-Yu Xie, Zhi-Yuan Zhou, Bao-Sen Shi

Abstract: Four wave mixing (FWM) is an important way to generate supercontinuum and frequency combs in the mid-infrared band. Here, we obtain simultaneous synthetic FWM in the visible and mid-infrared bands by cascading quadratic nonlinear processes in a periodically poled lithium niobate crystal (PPLN), which has a 110 dB (at 3000 nm) higher conversion efficiency than the FWM directly generated by third-order susceptibilities in bulk PPLN crystals. A general model of this process is developed that is in full agreement with the experimental verifications. The frequency difference between the new frequency components can be freely tuned by changing the frequency difference of the dual pump lasers. Furthermore, by increasing the conversion bandwidth and efficiency of the cascaded processes, it is feasible to generate frequency combs in three bands (visible, near-infrared and mid-infrared) simultaneously through high-order cascaded processes. This work opens up a new avenue toward free-tuning multiband frequency comb generation with multi-octave frequency spanning, which will have significant applications in fields such as mid-infrared gas sensing, lidar and precision spectroscopy.

Submitted 11 March, 2024; originally announced March 2024.

arXiv:2403.06039 [pdf, other]

doi 10.1145/3613905.3651057

A Preliminary Exploration of YouTubers' Use of Generative-AI in Content Creation

Authors: Yao Lyu, He Zhang, Shuo Niu, Jie Cai

Abstract: Content creators increasingly utilize generative artificial intelligence (Gen-AI) on platforms such as YouTube, TikTok, Instagram, and various blogging sites to produce imaginative images, AI-generated videos, and articles using Large Language Models (LLMs). Despite its growing popularity, there remains an underexplored area concerning the specific domains where AI-generated content is being applied, and the methodologies content creators employ with Gen-AI tools during the creation process. This study initially explores this emerging area through a qualitative analysis of 68 YouTube videos demonstrating Gen-AI usage. Our research focuses on identifying the content domains, the variety of tools used, the activities performed, and the nature of the final products generated by Gen-AI in the context of user-generated content.

Submitted 9 March, 2024; originally announced March 2024.

Comments: Accepted at CHI LBW 2024

arXiv:2403.05391 [pdf, other]

Multi-qubit Dynamical Decoupling for Enhanced Crosstalk Suppression

Authors: Siyuan Niu, Aida Todri-Sanial, Nicholas T. Bronn

Abstract: Dynamical decoupling (DD) is one of the simplest error suppression methods, aiming to enhance the coherence of qubits in open quantum systems. Moreover, DD has demonstrated effectiveness in reducing coherent crosstalk, one major error source in near-term quantum hardware, which manifests from two types of interactions. Static crosstalk exists in various hardware platforms, including superconductor and semiconductor qubits, by virtue of always-on qubit-qubit coupling. Additionally, driven crosstalk may occur as an unwanted drive term due to leakage from driven gates on other qubits. Here we explore a novel staggered DD protocol tailored for multi-qubit systems that suppresses the decoherence error and both types of coherent crosstalk. We develop two experimental setups - an "idle-idle" experiment in which two pairs of qubits undergo free evolution simultaneously and a "driven-idle" experiment in which one pair is continuously driven during the free evolution of the other pair. These experiments are performed on an IBM Quantum superconducting processor and demonstrate the significant impact of the staggered DD protocol in suppressing both types of coherent crosstalk. When compared to the standard DD sequences from state-of-the-art methodologies with the application of X2 sequences, our staggered DD protocol enhances circuit fidelity by 19.7% and 8.5%, respectively, in addressing these two crosstalk types.

Submitted 13 March, 2024; v1 submitted 8 March, 2024; originally announced March 2024.

arXiv:2403.01243 [pdf, other]

Simulating chiral spin liquids with fermionic Projected Entangled Paired States

Authors: Sasank Budaraju, Didier Poilblanc, Sen Niu

Abstract: Chiral Spin Liquids (CSL) based on spin-1/2 fermionic Projected Entangled Pair States (fPEPS) are considered on the square lattice. First, fPEPS approximants of Gutzwiller-projected Chern insulators (GPCI) are investigated by Variational Monte Carlo (VMC) techniques on finite size tori. We show that such fPEPS of finite bond dimension can correctly capture the topological properties of the chiral spin liquid, as the exact GPCI, with the correct topological ground state degeneracy on the torus. Further, more general fPEPS are considered and optimized (on the infinite plane) to describe the CSL phase of a chiral frustrated Heisenberg antiferromagnet. The chiral modes are computed on the edge of a semi-infinite cylinder (of finite circumference) and shown to follow the predictions from Conformal Field Theory. In contrast to their bosonic analogs the (optimized) fPEPS do not suffer from the replication of the chiral edge mode in the odd topological sector.

Submitted 2 March, 2024; originally announced March 2024.

Comments: 7 pages, 5 figures

arXiv:2402.18957 [pdf, other]

Vibrational properties differ between halide and chalcogenide perovskite semiconductors, and it matters for optoelectronic performance

Authors: K. Ye, M. Menahem, T. Salzillo, F. Knoop, B. Zhao, S. Niu, O. Hellman, J. Ravichandran, R. Jaramillo, O. Yaffe

Abstract: We report a comparative study of temperature-dependent photoluminescence and structural dynamics of two perovskite semiconductors, the chalcogenide BaZrS$_3$ (BZS) and the halide CsPbBr$_3$ (CPB). These materials have similar crystal structures and direct band gaps, but we find that they have quite distinct optoelectronic and vibrational properties. Both materials exhibit thermally-activated non-radiative recombination, but the non-radiative recombination rate in BZS is between two and four orders of magnitude faster than in CPB. Raman spectroscopy reveals that the effects of phonon anharmonicity are far more pronounced in CPB than in BZS. Further, although both materials feature a large dielectric response due to low-energy polar optical phonons, the phonons in CPB are substantially lower in energy than in BZS. Our results suggest that electron-phonon coupling in BZS is more effective at non-radiative recombination than in CPB, and that BZS may also have a substantially higher concentration of non-radiative recombination centers than CPB. The low defect concentration in CPB may be related to the ease of lattice reconfiguration, typified by anharmonic bonding. It remains to be seen to what extent these differences are inherent to the chalcogenide and halide perovskites and to what extent they can be affected by materials processing; comparing BZS single-crystals and thin films provides reason for optimism.

Submitted 14 April, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

Comments: Main text - 12 pages with 5 figures and 1 table. Supplemental text - 16 pages with 6 figures and 5 tables

arXiv:2402.17401 [pdf]

Quantum entanglement enabled ellipsometer for phase retardance measurement

Authors: Meng-Yu Xie, Su-Jian Niu, Yin-Hai Li, Zheng Ge, Ming-Yuan Gao, Zhao-Qi-Zhi Han, Ren-Hui Chen, Zhi-Yuan Zhou, Bao-Sen Shi

Abstract: An ellipsometer is a vital precision tool used for measuring optical parameters with wide applications in many fields, including accurate measurements of film thickness, optical constants, structural profiles, etc. However, the precise measurement of photosensitive materials meets huge obstacles because of the excessive input photons; therefore, enhancing detection accuracy under low incident light intensity is an essential topic in precision measurement. In this work, by combining a polarization-entangled photon source with a classical transmission-type ellipsometer, a quantum ellipsometer using the PSA (Polarizer-Sample-Analyzer) and Senarmont methods is constructed for the first time to measure the phase retardation of birefringent materials. The experimental results show that the accuracy can reach the nanometer scale at extremely low input intensity, and the stability is within 1% for all specimens tested with a compensator involved. Our work paves the way for precision measurement at low incident light intensity, with potential applications in measuring photosensitive materials, active-biological samples and other remote monitoring scenarios.

Submitted 27 February, 2024; originally announced February 2024.

Comments: 13 pages, 5 figures. This work has been submitted for possible publication

arXiv:2402.17316 [pdf, other]

Towards Robust and Efficient Cloud-Edge Elastic Model Adaptation via Selective Entropy Distillation

Authors: Yaofo Chen, Shuaicheng Niu, Yaowei Wang, Shoukai Xu, Hengjie Song, Mingkui Tan

Abstract: The conventional deep learning paradigm often involves training a deep model on a server and then deploying the model or its distilled ones to resource-limited edge devices. Usually, the models shall remain fixed once deployed (at least for some period) due to the potential high cost of model adaptation for both the server and edge sides. However, in many real-world scenarios, the test environments may change dynamically (known as distribution shifts), which often results in degraded performance. Thus, one has to adapt the edge models promptly to attain promising performance. Moreover, with the increasing data collected at the edge, this paradigm also fails to further adapt the cloud model for better performance. To address these, we encounter two primary challenges: 1) the edge model has limited computation power and may only support forward propagation; 2) the data transmission budget between cloud and edge devices is limited in latency-sensitive scenarios. In this paper, we establish a Cloud-Edge Elastic Model Adaptation (CEMA) paradigm in which the edge models only need to perform forward propagation and the edge models can be adapted online. In our CEMA, to reduce the communication burden, we devise two criteria to exclude unnecessary samples from uploading to the cloud, i.e., dynamic unreliable and low-informative sample exclusion. Based on the uploaded samples, we update and distribute the affine parameters of normalization layers by distilling from the stronger foundation model to the edge model with a sample replay strategy. Extensive experimental results on ImageNet-C and ImageNet-R verify the effectiveness of our CEMA.

Submitted 6 June, 2024; v1 submitted 27 February, 2024; originally announced February 2024.

Comments: Published in ICLR 2024

arXiv:2401.17043 [pdf, other]

CRUD-RAG: A Comprehensive Chinese Benchmark for Retrieval-Augmented Generation of Large Language Models

Authors: Yuanjie Lyu, Zhiyu Li, Simin Niu, Feiyu Xiong, Bo Tang, Wenjin Wang, Hao Wu, Huanyong Liu, Tong Xu, Enhong Chen

Abstract: Retrieval-Augmented Generation (RAG) is a technique that enhances the capabilities of large language models (LLMs) by incorporating external knowledge sources. This method addresses common LLM limitations, including outdated information and the tendency to produce inaccurate "hallucinated" content. However, the evaluation of RAG systems is challenging, as existing benchmarks are limited in scope and diversity. Most of the current benchmarks predominantly assess question-answering applications, overlooking the broader spectrum of situations where RAG could prove advantageous. Moreover, they only evaluate the performance of the LLM component of the RAG pipeline in the experiments, and neglect the influence of the retrieval component and the external knowledge database. To address these issues, this paper constructs a large-scale and more comprehensive benchmark, and evaluates all the components of RAG systems in various RAG application scenarios. Specifically, we have categorized the range of RAG applications into four distinct types: Create, Read, Update, and Delete (CRUD), each representing a unique use case. "Create" refers to scenarios requiring the generation of original, varied content. "Read" involves responding to intricate questions in knowledge-intensive situations. "Update" focuses on revising and rectifying inaccuracies or inconsistencies in pre-existing texts. "Delete" pertains to the task of summarizing extensive texts into more concise forms. For each of these CRUD categories, we have developed comprehensive datasets to evaluate the performance of RAG systems. We also analyze the effects of various components of the RAG system, such as the retriever, the context length, the knowledge base construction, and the LLM. Finally, we provide useful insights for optimizing the RAG technology for different scenarios.

Submitted 15 July, 2024; v1 submitted 30 January, 2024; originally announced January 2024.

Comments: 40 Pages

arXiv:2401.11671 [pdf, other]

RTA-Former: Reverse Transformer Attention for Polyp Segmentation

Authors: Zhikai Li, Murong Yi, Ali Uneri, Sihan Niu, Craig Jones

Abstract: Polyp segmentation is a key aspect of colorectal cancer prevention, enabling early detection and guiding subsequent treatments. Intelligent diagnostic tools, including deep learning solutions, are widely explored to streamline and potentially automate this process. However, even with many powerful network architectures, producing accurate edge segmentation remains a problem. In this paper, we introduce a novel network, namely RTA-Former, that employs a transformer model as the encoder backbone and innovatively adapts Reverse Attention (RA) with a transformer stage in the decoder for enhanced edge segmentation. The results of the experiments illustrate that RTA-Former achieves state-of-the-art (SOTA) performance in five polyp segmentation datasets. The strong capability of RTA-Former holds promise in improving the accuracy of Transformer-based polyp segmentation, potentially leading to better clinical decisions and patient outcomes. Our code is publicly available on GitHub.

Submitted 28 April, 2024; v1 submitted 21 January, 2024; originally announced January 2024.

Comments: The paper has been accepted by EMBC 2024

arXiv:2401.11669 [pdf]

An Improved Grey Wolf Optimization Algorithm for Heart Disease Prediction

Authors: Sihan Niu, Yifan Zhou, Zhikai Li, Shuyao Huang, Yujun Zhou

Abstract: This paper presents a unique solution to challenges in medical image processing by incorporating an adaptive curve grey wolf optimization (ACGWO) algorithm into neural network backpropagation. Neural networks show potential in medical data but suffer from issues like overfitting and lack of interpretability due to imbalanced and scarce data. Traditional Grey Wolf Optimization (GWO) also has its drawbacks, such as a lack of population diversity and premature convergence. This paper addresses these problems by introducing an adaptive algorithm, enhancing the standard GWO with a sigmoid function. This algorithm was extensively compared to four leading algorithms using six well-known test functions, outperforming them effectively. Moreover, by utilizing the ACGWO, we increase the robustness and generalization of the neural network, resulting in more interpretable predictions. Applied to the publicly accessible Cleveland Heart Disease dataset, our technique surpasses ten other methods, achieving 86.8% accuracy, indicating its potential for efficient heart disease prediction in the clinical setting.

Submitted 21 January, 2024; originally announced January 2024.

arXiv:2311.15387 [pdf]

doi 10.1021/acsphotonics.3c01869

Micro-transfer-printed Thin film lithium niobate (TFLN)-on-Silicon Ring Modulator

Authors: Ying Tan, Shengpu Niu, Maximilien Billet, Nishant Singh, Margot Niels, Tom Vanackere, Joris Van Kerrebrouck, Gunther Roelkens, Bart Kuyken, Dries Van Thourhout

Abstract: Thin-film lithium niobate (TFLN) has a proven record of building high-performance electro-optical (EO) modulators. However, its CMOS incompatibility and the need for non-standard etching have consistently posed challenges in terms of scalability, standardization, and the complexity of integration. Heterogeneous integration addresses this key challenge. Micro-transfer printing of thin-film lithium niobate brings TFLN to the well-established silicon ecosystem by easy "pick and place", which showcases immense potential in constructing high-density, cost-effective, highly versatile heterogeneous integrated circuits. Here, we demonstrate for the first time a micro-transfer-printed thin-film lithium niobate (TFLN)-on-silicon ring modulator, which is an important step towards dense integration of performant lithium niobate modulators with compact and scalable silicon circuitry. The presented device exhibits an insertion loss of -1.5 dB, extinction ratio of -37 dB, electro-optical bandwidth of 16 GHz and modulation rates up to 45 Gbps.

Submitted 26 November, 2023; originally announced November 2023.

Comments: 8 figures,10 pages. ACS Photonics 2024

Report number: DOI: 10.1021/acsphotonics.3c01869

arXiv:2311.15296 [pdf, other]

UHGEval: Benchmarking the Hallucination of Chinese Large Language Models via Unconstrained Generation

Authors: Xun Liang, Shichao Song, Simin Niu, Zhiyu Li, Feiyu Xiong, Bo Tang, Yezhaohui Wang, Dawei He, Peng Cheng, Zhonghao Wang, Haiying Deng

Abstract: Large language models (LLMs) have emerged as pivotal contributors in contemporary natural language processing and are increasingly being applied across a diverse range of industries. However, these large-scale probabilistic statistical models cannot currently ensure the requisite quality in professional content generation. These models often produce hallucinated text, compromising their practical utility in professional contexts. To assess the authentic reliability of LLMs in text generation, numerous initiatives have developed benchmark evaluations for hallucination phenomena. Nevertheless, these benchmarks frequently utilize constrained generation techniques due to cost and temporal constraints. These techniques encompass the use of directed hallucination induction and strategies that deliberately alter authentic text to produce hallucinations. These approaches are not congruent with the unrestricted text generation demanded by real-world applications. Furthermore, a well-established Chinese-language dataset dedicated to the evaluation of hallucinations in text generation is presently lacking. Consequently, we have developed an Unconstrained Hallucination Generation Evaluation (UHGEval) benchmark, designed to compile outputs produced with minimal restrictions by LLMs. Concurrently, we have established a comprehensive benchmark evaluation framework to aid subsequent researchers in undertaking scalable and reproducible experiments. We have also executed extensive experiments, evaluating prominent Chinese language models and the GPT series models to derive professional performance insights regarding hallucination challenges.

Submitted 23 May, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

Comments: Accepted by ACL 2024

arXiv:2311.13107 [pdf, other]

Powerful Quantum Circuit Resizing with Resource Efficient Synthesis

Authors: Siyuan Niu, Akel Hashim, Costin Iancu, Wibe Albert de Jong, Ed Younis

Abstract: In the noisy intermediate-scale quantum era, mid-circuit measurement and reset operations facilitate novel circuit optimization strategies by reducing a circuit's qubit count in a method called resizing. This paper introduces two such algorithms. The first one leverages gate-dependency rules to reduce qubit count by 61.6% or 45.3% when optimizing depth as well. Based on numerical instantiation and synthesis, the second algorithm finds resizing opportunities in previously unresizable circuits via dependency rules and other state-of-the-art tools. This resizing algorithm reduces qubit count by 20.7% on average for these previously impossible-to-resize circuits.

Submitted 21 November, 2023; originally announced November 2023.

arXiv:2311.10952 [pdf, other]

NAS-ASDet: An Adaptive Design Method for Surface Defect Detection Network using Neural Architecture Search

Authors: Zhenrong Wang, Bin Li, Weifeng Li, Shuanlong Niu, Wang Miao, Tongzhi Niu

Abstract: Deep convolutional neural networks (CNNs) have been widely used in surface defect detection. However, no CNN architecture is suitable for all detection tasks, and designing effective task-specific architectures requires considerable effort. The neural architecture search (NAS) technology makes it possible to automatically generate adaptive data-driven networks. Here, we propose a new method called NAS-ASDet to adaptively design networks for surface defect detection. First, a refined and industry-appropriate search space that can adaptively adjust the feature distribution is designed, which consists of repeatedly stacked basic novel cells with searchable attention operations. Then, a progressive search strategy with a deep supervision mechanism is used to explore the search space faster and better. This method can design high-performance and lightweight defect detection networks with data scarcity in industrial scenarios. The experimental results on four datasets demonstrate that the proposed method achieves superior performance and a relatively lighter model size compared to other competitive methods, including both manual and NAS-based approaches.

Submitted 17 November, 2023; originally announced November 2023.

arXiv:2311.06787 [pdf, other]

Data-Driven Moving Horizon Estimation Using Bayesian Optimization

Authors: Qing Sun, Shuai Niu, Minrui Fei

Abstract: In this work, an innovative data-driven moving horizon state estimation is proposed for systems with unknown model dynamics, based on Bayesian optimization. As long as the measurement data is received, a locally linear dynamics model can be obtained from one Bayesian optimization-based offline learning framework. Herein, the learned model is updated iteratively based on the actual observed data to approximate the actual system dynamics with the intent of minimizing the cost function of the moving horizon estimator until the desired performance is achieved. Meanwhile, the characteristics of Bayesian optimization can guarantee the closest approximation of the learned model to the actual system dynamics. Thus, one effective data-driven moving horizon estimator can be designed further on the basis of this learned model. Finally, the efficiency of the proposed state estimation algorithm is demonstrated by several numerical simulations.

Submitted 12 November, 2023; originally announced November 2023.

Comments: 12 pages,3 figures

arXiv:2311.06766 [pdf]

Enhancing Control Performance through ESN-Based Model Compensation in MPC for Dynamic Systems

Authors: Shuai Niu, Qing Sun, Minrui Fei, Xuqian Ju

Abstract: Deriving precise system dynamic models through traditional numerical methods is often a challenging endeavor. The performance of Model Predictive Control (MPC) is heavily contingent on the accuracy of the system dynamic model. Consequently, this study employs Echo State Networks (ESNs) to acquire knowledge of the unmodeled dynamic characteristics inherent in the system. This information is then integrated with the nominal model, functioning as a form of model compensation. The present paper introduces a control framework that combines ESN with MPC. By perpetually assimilating the disparities between the nominal and real models, control performance experiences augmentation. In a demonstrative example, a second-order dynamic system is subjected to simulation. The outcomes conclusively evince that ESN-based MPC adeptly assimilates unmodeled dynamic attributes, thereby elevating the system control proficiency.

Submitted 12 November, 2023; originally announced November 2023.

Comments: 5 pages,3 figures,conference

arXiv:2310.19193 [pdf, other]

A Survey on Watching Social Issue Videos among YouTube and TikTok Users

Authors: Shuo Niu, Dilasha Shrestha, Abhisan Ghimire, Zhicong Lu

Abstract: The openness and influence of video-sharing platforms (VSPs) such as YouTube and TikTok attracted creators to share videos on various social issues. Although social issue videos (SIVs) affect public opinions and breed misinformation, how VSP users obtain information and interact with SIVs is under-explored. This work surveyed 659 YouTube and 127 TikTok users to understand the motives for consuming SIVs on VSPs. We found that VSP users are primarily motivated by the information and entertainment gratifications to use the platform. VSP users use SIVs for information-seeking purposes and find YouTube and TikTok convenient to interact with SIVs. VSP users moderately watch SIVs for entertainment and inactively engage in social interactions. SIV consumption is associated with information and socialization gratifications of the platform. VSP users appreciate the diversity of information and opinions but would also do their own research and are concerned about the misinformation and echo chamber problems.

Submitted 29 October, 2023; originally announced October 2023.

ACM Class: J.4

arXiv:2310.19011 [pdf, other]

Efficient Test-Time Adaptation for Super-Resolution with Second-Order Degradation and Reconstruction

Authors: Zeshuai Deng, Zhuokun Chen, Shuaicheng Niu, Thomas H. Li, Bohan Zhuang, Mingkui Tan

Abstract: Image super-resolution (SR) aims to learn a mapping from low-resolution (LR) to high-resolution (HR) using paired HR-LR training images. Conventional SR methods typically gather the paired training data by synthesizing LR images from HR images using a predetermined degradation model, e.g., Bicubic down-sampling. However, the realistic degradation type of test images may mismatch with the training-time degradation type due to the dynamic changes of the real-world scenarios, resulting in inferior-quality SR images. To address this, existing methods attempt to estimate the degradation model and train an image-specific model, which, however, is quite time-consuming and impracticable to handle rapidly changing domain shifts. Moreover, these methods largely concentrate on the estimation of one degradation type (e.g., blur degradation), overlooking other degradation types like noise and JPEG in real-world test-time scenarios, thus limiting their practicality. To tackle these problems, we present an efficient test-time adaptation framework for SR, named SRTTA, which is able to quickly adapt SR models to test domains with different/unknown degradation types. Specifically, we design a second-order degradation scheme to construct paired data based on the degradation type of the test image, which is predicted by a pre-trained degradation classifier. Then, we adapt the SR model by implementing feature-level reconstruction learning from the initial test image to its second-order degraded counterparts, which helps the SR model generate plausible HR images. Extensive experiments are conducted on newly synthesized corrupted DIV2K datasets with 8 different degradations and several real-world datasets, demonstrating that our SRTTA framework achieves an impressive improvement over existing methods with satisfying speed. The source code is available at https://github.com/DengZeshuai/SRTTA.

Submitted 29 October, 2023; originally announced October 2023.

Comments: Accepted by 37th Conference on Neural Information Processing Systems (NeurIPS 2023)

arXiv:2310.04615 [pdf]

doi 10.1002/adma.202311559

Giant Modulation of Refractive Index from Picoscale Atomic Displacements

Authors: Boyang Zhao, Guodong Ren, Hongyan Mei, Vincent C. Wu, Shantanu Singh, Gwan-Yeong Jung, Huandong Chen, Raynald Giovine, Shanyuan Niu, Arashdeep S. Thind, Jad Salman, Nick S. Settineri, Bryan C. Chakoumakos, Michael E. Manley, Raphael P. Hermann, Andrew R. Lupini, Miaofang Chi, Jordan A. Hachtel, Arkadiy Simonov, Simon J. Teat, Raphaële J. Clément, Mikhail A. Kats, J. Ravichandran, Rohan Mishra

Abstract: Structural disorder has been shown to enhance and modulate magnetic, electrical, dipolar, electrochemical, and mechanical properties of materials. However, the possibility of obtaining novel optical and optoelectronic properties from structural disorder remains an open question. Here, we show unambiguous evidence of disorder in the form of anisotropic, picoscale atomic displacements modulating the refractive index tensor and resulting in the giant optical anisotropy observed in BaTiS$_3$, a quasi-one-dimensional hexagonal chalcogenide. Single crystal X-ray diffraction studies reveal the presence of antipolar displacements of Ti atoms within adjacent TiS$_6$ chains along the c-axis, and three-fold degenerate Ti displacements in the a-b plane. $^{47/49}$Ti solid-state NMR provides additional evidence for those Ti displacements in the form of a three-horned NMR lineshape resulting from a low symmetry local environment around Ti atoms. We used scanning transmission electron microscopy to directly observe the globally disordered Ti a-b plane displacements and find them to be ordered locally over a few unit cells. First-principles calculations show that the Ti a-b plane displacements selectively reduce the refractive index along the ab-plane, while having minimal impact on the refractive index along the chain direction, thus resulting in a giant enhancement in the optical anisotropy. By showing a strong connection between structural disorder with picoscale displacements and the optical response in BaTiS$_3$, this study opens a pathway for designing optical materials with high refractive index and functionalities such as large optical anisotropy and nonlinearity.

Submitted 19 March, 2024; v1 submitted 6 October, 2023; originally announced October 2023.

Comments: 24 pages, 3 figures

arXiv:2309.09412 [pdf]

Cross-attention-based saliency inference for predicting cancer metastasis on whole slide images

Authors: Ziyu Su, Mostafa Rezapour, Usama Sajjad, Shuo Niu, Metin Nafi Gurcan, Muhammad Khalid Khan Niazi

Abstract: Although multiple instance learning (MIL) methods are widely used for automatic tumor detection on whole slide images (WSI), they suffer from the extreme class imbalance within the small tumor WSIs. This occurs when the tumor comprises only a few isolated cells. For early detection, it is of utmost importance that MIL algorithms can identify small tumors, even when they are less than 1% of the size of the WSI. Existing studies have attempted to address this issue using attention-based architectures and instance selection-based methodologies, but have not yielded significant improvements. This paper proposes cross-attention-based salient instance inference MIL (CASiiMIL), which involves a novel saliency-informed attention mechanism, to identify breast cancer lymph node micro-metastasis on WSIs without the need for any annotations. Apart from this new attention mechanism, we introduce a negative representation learning algorithm to facilitate the learning of saliency-informed attention weights for improved sensitivity on tumor WSIs. The proposed model outperforms the state-of-the-art MIL methods on two popular tumor metastasis detection datasets, and demonstrates great cross-center generalizability. In addition, it exhibits excellent accuracy in classifying WSIs with small tumor lesions. Moreover, we show that the proposed model has excellent interpretability attributed to the saliency-informed attention weights. We strongly believe that the proposed method will pave the way for training algorithms for early tumor detection on large datasets where acquiring fine-grained annotations is practically impossible.

Submitted 17 September, 2023; originally announced September 2023.

arXiv:2309.09180 [pdf, other]

Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding with Sequence-to-Sequence Architecture

Authors: Gaobin Yang, Maokui He, Shutong Niu, Ruoyu Wang, Yanyan Yue, Shuangqing Qian, Shilong Wu, Jun Du, Chin-Hui Lee

Abstract: We propose a novel neural speaker diarization system using memory-aware multi-speaker embedding with sequence-to-sequence architecture (NSD-MS2S), which integrates the strengths of memory-aware multi-speaker embedding (MA-MSE) and sequence-to-sequence (Seq2Seq) architecture, leading to improvement in both efficiency and performance. Next, we further decrease the memory occupation of decoding by incorporating input features fusion and then employ a multi-head attention mechanism to capture features at different levels. NSD-MS2S achieved a macro diarization error rate (DER) of 15.9% on the CHiME-7 EVAL set, which signifies a relative improvement of 49% over the official baseline system, and is the key technique for us to achieve the best performance for the main track of CHiME-7 DASR Challenge. Additionally, we introduce a deep interactive module (DIM) in MA-MSE module to better retrieve a cleaner and more discriminative multi-speaker embedding, enabling the current model to outperform the system we used in the CHiME-7 DASR Challenge. Our code will be available at https://github.com/liyunlongaaa/NSD-MS2S.

Submitted 26 December, 2023; v1 submitted 17 September, 2023; originally announced September 2023.

Comments: Accepted by ICASSP 2024

arXiv:2309.07383 [pdf, ps, other]

Rates of Convergence in Certain Native Spaces of Approximations used in Reinforcement Learning

Authors: Ali Bouland, Shengyuan Niu, Sai Tej Paruchuri, Andrew Kurdila, John Burns, Eugenio Schuster

Abstract: This paper studies convergence rates for some value function approximations that arise in a collection of reproducing kernel Hilbert spaces (RKHS) $H(Ω)$. By casting an optimal control problem in a specific class of native spaces, strong rates of convergence are derived for the operator equation that enables offline approximations that appear in policy iteration. Explicit upper bounds on error in value function and controller approximations are derived in terms of power function $\mathcal{P}_{H,N}$ for the space of finite dimensional approximants $H_N$ in the native space $H(Ω)$. These bounds are geometric in nature and refine some well-known, now classical results concerning convergence of approximations of value functions.

Submitted 17 November, 2023; v1 submitted 13 September, 2023; originally announced September 2023.

Comments: 8 pages, 5 figures

arXiv:2308.14638 [pdf, other]

The USTC-NERCSLIP Systems for the CHiME-7 DASR Challenge

Authors: Ruoyu Wang, Maokui He, Jun Du, Hengshun Zhou, Shutong Niu, Hang Chen, Yanyan Yue, Gaobin Yang, Shilong Wu, Lei Sun, Yanhui Tu, Haitao Tang, Shuangqing Qian, Tian Gao, Mengzhi Wang, Genshun Wan, Jia Pan, Jianqing Gao, Chin-Hui Lee

Abstract: This technical report details our submission system to the CHiME-7 DASR Challenge, which focuses on speaker diarization and speech recognition under complex multi-speaker scenarios. Additionally, it also evaluates the efficiency of systems in handling diverse array devices. To address these issues, we implemented an end-to-end speaker diarization system and introduced a rectification strategy based on multi-channel spatial information. This approach significantly diminished the word error rates (WER). In terms of recognition, we utilized publicly available pre-trained models as the foundational models to train our end-to-end speech recognition models. Our system attained a Macro-averaged diarization-attributed WER (DA-WER) of 21.01% on the CHiME-7 evaluation set, which signifies a relative improvement of 62.04% over the official baseline system.

Submitted 10 October, 2023; v1 submitted 28 August, 2023; originally announced August 2023.

Comments: Accepted by 2023 CHiME Workshop, Oral

arXiv:2307.15231 [pdf, other]

A hybrid method for quantum dynamics simulation

Authors: Niladri Gomes, Jia Yin, Siyuan Niu, Chao Yang, Wibe Albert de Jong

Abstract: We propose a hybrid approach to simulate quantum many body dynamics by combining Trotter based quantum algorithm with classical dynamic mode decomposition. The interest often lies in estimating observables rather than explicitly obtaining the wave function's form. Our method predicts observables of a quantum state in the long time by using data from a set of short time measurements from a quantum computer. The upper bound for the global error of our method scales as $O(t^{3/2})$ with a fixed set of the measurement. We apply our method to quench dynamics in Hubbard model and nearest neighbor spin systems and show that the observable properties can be predicted up to a reasonable error by controlling the number of data points obtained from the quantum measurements.

Submitted 27 July, 2023; originally announced July 2023.

Comments: 9 pages, 4 figures

MSC Class: 68Q12; 81P68

arXiv:2307.13996 [pdf, ps, other]

Fast algorithms for k-submodular maximization subject to a matroid constraint

Authors: Shuxian Niu, Qian Liu, Yang Zhou, Min Li

Abstract: In this paper, we apply a Threshold-Decreasing Algorithm to maximize $k$-submodular functions under a matroid constraint, which reduces the query complexity of the algorithm compared to the greedy algorithm with little loss in approximation ratio. We give a $(\frac{1}{2} - ε)$-approximation algorithm for monotone $k$-submodular function maximization, and a $(\frac{1}{3} - ε)$-approximation algorithm for the non-monotone case, with complexity $O(\frac{n(k\cdot EO + IO)}ε \log \frac{r}ε)$, where $r$ denotes the rank of the matroid, and $IO, EO$ denote the number of oracles to evaluate whether a subset is an independent set and to compute the function value of $f$, respectively. Since a total size constraint can be viewed as a special matroid, called the uniform matroid, we present the fast algorithm for maximizing $k$-submodular functions subject to a total size constraint as corollaries.

Submitted 26 July, 2023; originally announced July 2023.
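
The threshold-decreasing idea named in the abstract above can be illustrated with a minimal sketch for the total size (uniform matroid) special case. This is not the authors' algorithm: the function names, the `eps` default, the stopping rule `eps * d / budget`, and the toy coverage objective are all assumptions chosen for illustration, and no approximation guarantee is claimed for this code.

```python
def threshold_decreasing(items, k, budget, value, eps=0.1):
    """Generic threshold-decreasing greedy (illustrative, hypothetical names).

    `value` maps a partial assignment {item: label} to a number and is
    assumed to be monotone k-submodular. At most `budget` items are labeled.
    """
    assignment = {}
    current = value(assignment)
    # The largest single-step gain sets the starting threshold.
    d = max(value({e: i}) - current for e in items for i in range(k))
    if d <= 0:
        return assignment
    tau = d
    stop = eps * d / budget  # assumed stopping threshold
    while tau >= stop and len(assignment) < budget:
        for e in items:
            if e in assignment or len(assignment) >= budget:
                continue
            # Evaluate every label for item e against the current solution.
            gains = []
            for i in range(k):
                trial = dict(assignment)
                trial[e] = i
                gains.append((value(trial) - current, i))
            gain, label = max(gains)
            # Accept the item only if its best gain clears the threshold.
            if gain >= tau:
                assignment[e] = label
                current += gain
        tau *= 1 - eps  # lower the threshold geometrically
    return assignment


# Toy objective (assumed): each label covers elements independently.
covers = {"a": {1, 2, 3}, "b": {3, 4}, "c": {5}, "d": {1, 5, 6}}


def coverage_value(assignment, k=2):
    total = 0
    for label in range(k):
        covered = set()
        for item, lab in assignment.items():
            if lab == label:
                covered |= covers[item]
        total += len(covered)
    return total


solution = threshold_decreasing(list(covers), k=2, budget=2, value=coverage_value)
```

Compared with plain greedy, which rescans every item and label to find the single best pair at each step, the threshold scheme accepts any pair whose gain clears the current threshold, which is where the reduction in oracle queries comes from.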

arXiv:2307.08688 [pdf, other]

Semi-supervised multi-channel speaker diarization with cross-channel attention

Authors: Shilong Wu, Jun Du, Maokui He, Shutong Niu, Hang Chen, Haitao Tang, Chin-Hui Lee

Abstract: Most neural speaker diarization systems rely on sufficient manual training data labels, which are hard to collect under real-world scenarios. This paper proposes a semi-supervised speaker diarization system to utilize large-scale multi-channel training data by generating pseudo-labels for unlabeled data. Furthermore, we introduce cross-channel attention into the Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding (NSD-MA-MSE) to learn channel contextual information of speaker embeddings better. Experimental results on the CHiME-7 Mixer6 dataset which only contains partial speakers' labels of the training set, show that our system achieved 57.01% relative DER reduction compared to the clustering-based model on the development set. We further conducted experiments on the CHiME-6 dataset to simulate the scenario of missing partial training set labels. When using 80% and 50% labeled training data, our system performs comparably to the results obtained using 100% labeled data for training.

Submitted 17 July, 2023; originally announced July 2023.

Comments: 8 pages, 3 figures

arXiv:2306.10457 [pdf, other]

doi 10.1103/PhysRevB.109.L081107

Chiral spin liquids with projected Gaussian fermionic entangled pair states

Authors: Sen Niu, Jheng-Wei Li, Ji-Yao Chen, Didier Poilblanc

Abstract: We study the parton construction of chiral spin liquids (CSLs) using projected Gaussian fermionic entangled pair states (GfPEPSs). First, we show that GfPEPSs can represent generic spinless Chern insulators faithfully with finite bond dimensions. Then, by applying the Gutzwiller projection to a bi-layer GfPEPSs, spin-1/2 Abelian and non-Abelian CSLs are obtained for Chern number $C=1$ and $C=2$, respectively. As a consequence of the topological obstruction for GfPEPSs, very weak Gossamer tails are observed in the correlation functions of the fermionic projected entangled pair state (PEPS) ansatze, suggesting that the no-go theorem for chiral PEPS is universal but does not bring any practical limitation. Remarkably, without fine tuning, all topological sectors can be constructed showing the expected number of chiral branches in the respective entanglement spectra, providing a sharp improvement with respect to the known bosonic PEPS approach.

Submitted 17 June, 2023; originally announced June 2023.

arXiv:2305.18998 [pdf, other]

Blind Beamforming for Intelligent Reflecting Surface in Fading Channels without CSI

Authors: Wenhai Lai, Wenyu Wang, Fan Xu, Xin Li, Shaobo Niu, Kaiming Shen

Abstract: This paper discusses how to optimize the phase shifts of intelligent reflecting surface (IRS) to combat channel fading without any channel state information (CSI), namely blind beamforming. Differing from most previous works based on a two-stage paradigm of first estimating channels and then optimizing phase shifts, our approach is completely data-driven, only requiring a dataset of the received signal power at the user terminal. Thus, our method does not incur extra overhead costs for channel estimation, and does not entail collaboration from service provider, either. The main idea is to choose phase shifts at random and use the corresponding conditional sample mean of the received signal power to extract the main features of the wireless environment. This blind beamforming approach guarantees an $N^2$ boost of signal-to-noise ratio (SNR), where $N$ is the number of reflective elements (REs) of IRS, regardless of whether the direct channel is line-of-sight (LoS) or not. Moreover, blind beamforming is extended to a double-IRS system with provable performance. Finally, prototype tests show that the proposed blind beamforming method can be readily incorporated into the existing communication systems in the real world; simulation tests further show that it works for a variety of fading channel models.

Submitted 30 May, 2023; originally announced May 2023.

Comments: 14 pages, 14 figures
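
The "choose phase shifts at random and use the conditional sample mean" idea in the abstract above can be sketched on a toy model. Everything below is an assumption made for illustration and not taken from the paper: a noise-free single-IRS channel with received power $|h_0 + \sum_n h_n e^{jθ_n}|^2$, four discrete phase levels, Gaussian channel draws, and all function and variable names.

```python
import cmath
import math
import random


def received_power(h0, h, phases):
    """Noise-free toy model: |h0 + sum_n h_n * exp(j * theta_n)|^2."""
    total = h0 + sum(hn * cmath.exp(1j * th) for hn, th in zip(h, phases))
    return abs(total) ** 2


def csm_beamforming(measure, n_elements, n_levels, n_samples, rng):
    """Choose each element's phase by its conditional sample mean of power.

    `measure` takes a list of phases and returns a power reading; the
    channel itself is never estimated.
    """
    levels = [2 * math.pi * k / n_levels for k in range(n_levels)]
    sums = [[0.0] * n_levels for _ in range(n_elements)]
    counts = [[0] * n_levels for _ in range(n_elements)]
    for _ in range(n_samples):
        # Draw a random phase level for every element and record the power.
        idx = [rng.randrange(n_levels) for _ in range(n_elements)]
        power = measure([levels[k] for k in idx])
        for n, k in enumerate(idx):
            sums[n][k] += power
            counts[n][k] += 1
    chosen = []
    for n in range(n_elements):
        # Mean power over the samples in which element n used level k.
        means = [
            sums[n][k] / counts[n][k] if counts[n][k] else float("-inf")
            for k in range(n_levels)
        ]
        chosen.append(levels[max(range(n_levels), key=means.__getitem__)])
    return chosen


rng = random.Random(0)
N, K, T = 8, 4, 8000  # elements, phase levels, random samples (assumed)
h0 = complex(rng.gauss(0, 1), rng.gauss(0, 1))
h = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(N)]


def measure(phases):
    return received_power(h0, h, phases)


best = csm_beamforming(measure, N, K, T, rng)
# Baseline: average power under uniformly random continuous phases.
baseline = sum(
    measure([rng.uniform(0, 2 * math.pi) for _ in range(N)]) for _ in range(500)
) / 500
```

In this toy model, averaging over the other elements' random phases leaves only the cross term between the direct channel and element $n$, so the level with the highest conditional mean is the one that best aligns that element with the direct path.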

arXiv:2305.12649 [pdf, other]

Imbalance-Agnostic Source-Free Domain Adaptation via Avatar Prototype Alignment

Authors: Hongbin Lin, Mingkui Tan, Yifan Zhang, Zhen Qiu, Shuaicheng Niu, Dong Liu, Qing Du, Yanxia Liu

Abstract: Source-free Unsupervised Domain Adaptation (SF-UDA) aims to adapt a well-trained source model to an unlabeled target domain without access to the source data. One key challenge is the lack of source data during domain adaptation. To handle this, we propose to mine the hidden knowledge of the source model and exploit it to generate source avatar prototypes. To this end, we propose a Contrastive Prototype Generation and Adaptation (CPGA) method. CPGA consists of two stages: Prototype generation and Prototype adaptation. Extensive experiments on three UDA benchmark datasets demonstrate the superiority of CPGA. However, existing SF-UDA studies implicitly assume balanced class distributions for both the source and target domains, which hinders their real applications. To address this issue, we study a more practical SF-UDA task, termed imbalance-agnostic SF-UDA, where the class distributions of both the unseen source domain and unlabeled target domain are unknown and could be arbitrarily skewed. This task is much more challenging than vanilla SF-UDA due to the co-occurrence of covariate shifts and unidentified class distribution shifts between the source and target domains. To address this task, we extend CPGA and propose a new Target-aware Contrastive Prototype Generation and Adaptation (T-CPGA) method.
Specifically, for better prototype adaptation in the imbalance-agnostic scenario, T-CPGA applies a new pseudo label generation strategy to identify unknown target class distribution and generate accurate pseudo labels, by utilizing the collective intelligence of the source model and an additional contrastive language-image pre-trained model. Meanwhile, we further devise a target label-distribution-aware classifier to adapt the model to the unknown target class distribution. We empirically show that T-CPGA significantly outperforms CPGA and other SF-UDA methods in imbalance-agnostic SF-UDA.

Submitted 21 May, 2023; originally announced May 2023.

Comments: arXiv admin note: text overlap with arXiv:2106.15326

arXiv:2305.06915 [pdf, other]

doi 10.22331/q-2024-02-13-1252

Adaptive variational simulation for open quantum systems

Authors: Huo Chen, Niladri Gomes, Siyuan Niu, Wibe Albert de Jong

Abstract: Emerging quantum hardware provides new possibilities for quantum simulation. While much of the research has focused on simulating closed quantum systems, the real-world quantum systems are mostly open. Therefore, it is essential to develop quantum algorithms that can effectively simulate open quantum systems. Here we present an adaptive variational quantum algorithm for simulating open quantum system dynamics described by the Lindblad equation. The algorithm is designed to build resource-efficient ansatze through the dynamical addition of operators by maintaining the simulation accuracy. We validate the effectiveness of our algorithm on both noiseless simulators and IBM quantum processors and observe good quantitative and qualitative agreement with the exact solution. We also investigate the scaling of the required resources with system size and accuracy and find polynomial behavior. Our results demonstrate that near-future quantum processors are capable of simulating open quantum systems.

Submitted 6 February, 2024; v1 submitted 11 May, 2023; originally announced May 2023.

Comments: 30 pages, 14 figures, 3 tables

Journal ref: Quantum 8, 1252 (2024)

arXiv:2305.03480 [pdf]

doi 10.1103/PhysRevApplied.20.054060

Mid-Infrared Upconversion Imaging Under Different Illumination Conditions

Authors: Zheng Ge, Zhao-Qi-Zhi Han, Yi-Yang Liu, Xiao-Hua Wang, Zhi-Yuan Zhou, Fan Yang, Yin-Hai Li, Yan Li, Li Chen, Wu-Zhen Li, Su-Jian Niu, Bao-Sen Shi

Abstract: Converting the medium infrared field to the visible band is an effective image detection method. We propose a comprehensive theory of image up-conversion under continuous optical pumping, and discuss the relationship between the experimental parameters and imaging field of view, resolution, quantum efficiency, and conversion bandwidth. Theoretical predictions of upconversion imaging results are given based on numerical simulations, which show good agreement with experimental results. In particular, coherent and incoherent light illumination are studied separately and the advantages and disadvantages of their imaging performance are compared and analysed. This work provides a study of the upconversion image detection performance of the system, which is of great value in guiding the design of the detection system and bringing it to practical applications.

Submitted 18 May, 2023; v1 submitted 5 May, 2023; originally announced May 2023.

Comments: 12 pages, 8 figures

Report number: Phys. Rev. Applied 20, 054060

arXiv:2304.11497 [pdf, other]

A method to measure the embedded crack length and position in high-density polyethylene using microseconds ultrasound time signal

Authors: Sijun Niu, Venkatsai Bellala, Daanish A. Qureshi, Vikas Srivastava

Abstract: High-density polyethylene (HDPE) is used in applications ranging from cooling water pipelines in nuclear power plants and distribution pipelines for natural gas and hydrogen to biomedical implants. Embedded crack-like flaws form within HDPE during fabrication or operations. Non-visible flaws can cause catastrophic failure if undetected. Large structures such as HDPE pipelines require a fast, non-destructive evaluation (NDE) method where the sensor can move rapidly across the structure. This is only possible if the flaw is evaluated using microseconds of time signal. We propose and show the accuracy of a machine learning-based Ultrasound NDE method that can rapidly and accurately predict embedded crack length and position simultaneously in HDPE with only tens of microseconds of time signal sensing. A method to quantify crack size in HDPE and other polymers using a very short Ultrasound time signal is lacking. We propose that an optimally trained machine learning model can decipher the crack characteristics using short measures of time signal, but a lack of large, well-distributed, and labeled datasets to train machine learning models continues to be a major limitation. To overcome this limitation, we have conducted computer simulations of ultrasound on HDPE to develop training data. We show that a fully simulation-trained convolutional neural network (CNN) can accurately predict crack lengths and positions in HDPE from experimentally measured ultrasound A-scan microsecond signals.
Our method is based on the 1D time amplitude signal acquired over a very short time period and not based on 2D image analysis. The proposed methodology presents a pathway for training CNN using computationally generated data and applying the trained CNN in the field to quantify hidden cracks in large HDPE or other polymer structures.

Submitted 7 June, 2024; v1 submitted 22 April, 2023; originally announced April 2023.

Comments: 21 pages, 9 figures

arXiv:2304.05577 [pdf, other]

doi 10.1093/mnras/stad1080

The X-ray variation of M81* resolved by Chandra and NuSTAR

Authors: S. Niu, F. G. Xie, Q. D. Wang, L. Ji, F. Yuan, M. Long

Abstract: Despite advances in our understanding of low luminosity active galactic nuclei (LLAGNs), the fundamental details about the mechanisms of radiation and flare/outburst in hot accretion flow are still largely missing. We have systematically analyzed the archival Chandra and NuSTAR X-ray data of the nearby LLAGN M81*, whose $L_{\rm bol}\sim 10^{-5} L_{\rm Edd}$. Through a detailed study of X-ray light curve and spectral properties, we find that the X-ray continuum emission of the power-law shape more likely originates from inverse Compton scattering within the hot accretion flow. In contrast to Sgr A*, flares are rare in M81*. Low-amplitude variation can only be observed in soft X-ray band (amplitude usually $\lesssim 2$). Several simple models are tested, including sinusoidal-like and quasi-periodical. Based on a comparison of the dramatic differences of flare properties among Sgr A*, M31* and M81*, we find that, when the differences in both the accretion rate and the black hole mass are considered, the flares in LLAGNs can be understood universally in a magneto-hydrodynamical model.

Submitted 11 April, 2023; originally announced April 2023.

Comments: 11 pages, 8 figures, and 4 tables. Accepted to MNRAS

arXiv:2303.07085 [pdf]

Thermal camera based on frequency upconversion and its noise-equivalent temperature difference characterization

Authors: Zheng Ge, Zhi-Yuan Zhou, Jing-Xin Ceng, Li Chen, Yin-Hai Li, Yan Li, Su-Jian Niu, Bao-Sen Shi

Abstract: We present a scheme for estimating the noise-equivalent temperature difference (NETD) of frequency upconversion detectors (UCDs) that detect mid-infrared (MIR) light. In particular, this letter investigates the frequency upconversion of a periodically polarized crystal based on lithium niobate, where a mid-infrared conversion bandwidth of 220 nm can be achieved in a single poled period by a special design. Experimentally for a temperature target with a central wavelength of 7.89 μm in mid-infrared radiation, we estimated the NETD of the device to be 56 mK. Meanwhile, a direct measurement of the NETD was performed utilizing conventional methods, which resulted in 48 mK. We also compared the NETD of our UCD with commercially available direct mid-infrared detectors. Here, we showed that the limiting factor for further NETD reduction of our device is not primarily from the upconversion process and camera noise, but from the limitations of the heat source performance. Our detectors have good temperature measurement performance and can be used for a variety of applications involving temperature object identification and material structure detection.

Submitted 21 March, 2023; v1 submitted 13 March, 2023; originally announced March 2023.

arXiv:2303.06151 [pdf, other]

NoiseCAM: Explainable AI for the Boundary Between Noise and Adversarial Attacks

Authors: Wenkai Tan, Justus Renkhoff, Alvaro Velasquez, Ziyu Wang, Lusi Li, Jian Wang, Shuteng Niu, Fan Yang, Yongxin Liu, Houbing Song

Abstract: Deep Learning (DL) and Deep Neural Networks (DNNs) are widely used in various domains. However, adversarial attacks can easily mislead a neural network and lead to wrong decisions. Defense mechanisms are highly preferred in safety-critical applications. In this paper, firstly, we use the gradient class activation map (GradCAM) to analyze the behavior deviation of the VGG-16 network when its inputs are mixed with adversarial perturbation or Gaussian noise. In particular, our method can locate vulnerable layers that are sensitive to adversarial perturbation and Gaussian noise. We also show that the behavior deviation of vulnerable layers can be used to detect adversarial examples. Secondly, we propose a novel NoiseCAM algorithm that integrates information from globally and pixel-level weighted class activation maps. Our algorithm is susceptible to adversarial perturbations and will not respond to Gaussian random noise mixed in the inputs. Third, we compare detecting adversarial examples using both behavior deviation and NoiseCAM, and we show that NoiseCAM outperforms behavior deviation modeling in its overall performance. Our work could provide a useful tool to defend against certain adversarial attacks on deep neural networks.

Submitted 9 March, 2023; originally announced March 2023.

Comments: Submitted to IEEE Fuzzy 2023. arXiv admin note: text overlap with arXiv:2303.06032

arXiv:2303.06032 [pdf, other]

Exploring Adversarial Attacks on Neural Networks: An Explainable Approach

Authors: Justus Renkhoff, Wenkai Tan, Alvaro Velasquez, William Yichen Wang, Yongxin Liu, Jian Wang, Shuteng Niu, Lejla Begic Fazlic, Guido Dartmann, Houbing Song

Abstract: Deep Learning (DL) is being applied in various domains, especially in safety-critical applications such as autonomous driving. Consequently, it is of great significance to ensure the robustness of these methods and thus counteract uncertain behaviors caused by adversarial attacks. In this paper, we use gradient heatmaps to analyze the response characteristics of the VGG-16 model when the input images are mixed with adversarial noise and statistically similar Gaussian random noise. In particular, we compare the network response layer by layer to determine where errors occurred. Several interesting findings are derived. First, compared to Gaussian random noise, intentionally generated adversarial noise causes severe behavior deviation by distracting the area of concentration in the networks. Second, in many cases, adversarial examples only need to compromise a few intermediate blocks to mislead the final decision. Third, our experiments revealed that specific blocks are more vulnerable and easier to exploit by adversarial examples. Finally, we demonstrate that the layers $Block4\_conv1$ and $Block5\_conv1$ of the VGG-16 model are more susceptible to adversarial attacks. Our work could provide valuable insights into developing more reliable Deep Neural Network (DNN) models.

Submitted 8 March, 2023; originally announced March 2023.

arXiv:2303.04253 [pdf, other]

TMHOI: Translational Model for Human-Object Interaction Detection

Authors: Lijing Zhu, Qizhen Lan, Alvaro Velasquez, Houbing Song, Acharya Kamal, Qing Tian, Shuteng Niu

Abstract: Detecting human-object interactions (HOIs) is an intricate challenge in the field of computer vision. Existing methods for HOI detection heavily rely on appearance-based features, but these may not fully capture all the essential characteristics necessary for accurate detection. To overcome these challenges, we propose an innovative graph-based approach called TMGHOI (Translational Model for Human-Object Interaction Detection). Our method effectively captures the sentiment representation of HOIs by integrating both spatial and semantic knowledge. HOIs are represented as a graph, where the interaction components serve as nodes and their spatial relationships as edges. To extract crucial spatial and semantic information, TMGHOI employs separate spatial and semantic encoders. Subsequently, these encodings are combined to construct a knowledge graph that effectively captures the sentiment representation of HOIs. Additionally, the ability to incorporate prior knowledge enhances the understanding of interactions, further boosting detection accuracy. We conducted extensive evaluations on the widely-used HICO-DET dataset to demonstrate the effectiveness of TMGHOI. Our approach outperformed existing state-of-the-art graph-based methods by a significant margin, showcasing its potential as a superior solution for HOI detection. We are confident that TMGHOI has the potential to significantly improve the accuracy and efficiency of HOI detection.
Its integration of spatial and semantic knowledge, along with its computational efficiency and practicality, makes it a valuable tool for researchers and practitioners in the computer vision community. As with any research, we acknowledge the importance of further exploration and evaluation on various datasets to establish the generalizability and robustness of our proposed method.

Submitted 1 July, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

Comments: 10 pages, 3 figures, 2 tables

arXiv:2303.00041 [pdf]

Colossal optical anisotropy from atomic-scale modulations

Authors: Hongyan Mei, Guodong Ren, Boyang Zhao, Jad Salman, Gwan Yeong Jung, Huandong Chen, Shantanu Singh, Arashdeep S. Thind, John Cavin, Jordan A. Hachtel, Miaofang Chi, Shanyuan Niu, Graham Joe, Chenghao Wan, Nick Settineri, Simon J. Teat, Bryan C. Chakoumakos, Jayakanth Ravichandran, Rohan Mishra, Mikhail A. Kats

Abstract: In modern optics, materials with large birefringence (Δn, where n is the refractive index) are sought after for polarization control (e.g. in wave plates, polarizing beam splitters, etc.), nonlinear optics and quantum optics (e.g. for phase matching and production of entangled photons), micromanipulation, and as a platform for unconventional light-matter coupling, such as Dyakonov-like surface polaritons and hyperbolic phonon polaritons. Layered "van der Waals" materials, with strong intra-layer bonding and weak inter-layer bonding, can feature some of the largest optical anisotropy; however, their use in most optical systems is limited because their optic axis is out of the plane of the layers and the layers are weakly attached, making the anisotropy hard to access. Here, we demonstrate that a bulk crystal with subtle periodic modulations in its structure -- Sr9/8TiS3 -- is transparent and positive-uniaxial, with extraordinary index n_e = 4.5 and ordinary index n_o = 2.4 in the mid- to far-infrared. The excess Sr, compared to stoichiometric SrTiS3, results in the formation of TiS6 trigonal-prismatic units that break the infinite chains of face-shared TiS6 octahedra in SrTiS3 into periodic blocks of five TiS6 octahedral units. The additional electrons introduced by the excess Sr subsequently occupy the TiS6 octahedral blocks to form highly oriented and polarizable electron clouds, which selectively boost the extraordinary index n_e and result in record birefringence (Δn > 2.1 with low loss).
The connection between subtle structural modulations and large changes in refractive index suggests new categories of anisotropic materials and also tunable optical materials with large refractive-index modulation and low optical losses.

Submitted 21 July, 2023; v1 submitted 28 February, 2023; originally announced March 2023.

Comments: Main text + supplementary

arXiv:2302.12400 [pdf, other]

Towards Stable Test-Time Adaptation in Dynamic Wild World

Authors: Shuaicheng Niu, Jiaxiang Wu, Yifan Zhang, Zhiquan Wen, Yaofo Chen, Peilin Zhao, Mingkui Tan

Abstract: Test-time adaptation (TTA) has been shown to be effective at tackling distribution shifts between training and testing data by adapting a given model on test samples. However, the online model updating of TTA may be unstable, and this is often a key obstacle preventing existing TTA methods from being deployed in the real world. Specifically, TTA may fail to improve or even harm the model performance when test data have: 1) mixed distribution shifts, 2) small batch sizes, and 3) online imbalanced label distribution shifts, which are quite common in practice. In this paper, we investigate the reasons for this instability and find that the batch norm layer is a crucial factor hindering TTA stability. Conversely, TTA can perform more stably with batch-agnostic norm layers, i.e., group or layer norm. However, we observe that TTA with group and layer norms does not always succeed and still suffers many failure cases. By digging into the failure cases, we find that certain noisy test samples with large gradients may disturb the model adaptation and result in collapsed trivial solutions, i.e., assigning the same class label to all samples. To address this collapse issue, we propose a sharpness-aware and reliable entropy minimization method, called SAR, for further stabilizing TTA from two aspects: 1) remove partial noisy samples with large gradients, 2) encourage model weights to go to a flat minimum so that the model is robust to the remaining noisy samples. Promising results demonstrate that SAR performs more stably than prior methods and is computationally efficient under the above wild test scenarios.

Submitted 23 February, 2023; originally announced February 2023.

Comments: accepted by International Conference on Learning Representations (ICLR) 2023 as Notable-Top-5%; 27 pages, 10 figures, 18 tables

arXiv:2302.09971 [pdf, other]

Social4Rec: Distilling User Preference from Social Graph for Video Recommendation in Tencent

Authors: Xuanji Xiao, Huaqiang Dai, Qian Dong, Shuzi Niu, Yuzhen Liu, Pei Liu

Abstract: Although recommender systems play a key role in network content platforms, mining the user's interests is still a significant challenge. Existing works predict the user interest by utilizing user behaviors, i.e., clicks, views, etc., but current solutions are ineffective when users perform unsettled activities. These involve new users, who have few activities of any kind, and sparse users, who have low-frequency behaviors. We uniformly describe both of these user types as "cold users", which are very common but often neglected in network content platforms. To address this issue, we enhance the representation of the user interest by combining the user's social interest, e.g., friendship, following bloggers, interest groups, etc., with the activity behaviors. Thus, in this work, we present a novel algorithm entitled SocialNet, which adopts a two-stage method to progressively extract the coarse-grained and fine-grained social interest. Our technique then concatenates SocialNet's output with the original user representation to get the final user representation that combines behavior interests and social interests. Offline experiments on Tencent video's recommender system demonstrate the superiority over the baseline behavior-based model. The online experiment also shows a significant performance improvement in clicks and view time in the real-world recommendation system. The source code is available at https://github.com/Social4Rec/SocialNet.

Submitted 11 August, 2023; v1 submitted 20 February, 2023; originally announced February 2023.
