-
NAMER: Non-Autoregressive Modeling for Handwritten Mathematical Expression Recognition
Authors:
Chenyu Liu,
Jia Pan,
Jinshui Hu,
Baocai Yin,
Bing Yin,
Mingjun Chen,
Cong Liu,
Jun Du,
Qingfeng Liu
Abstract:
Recently, Handwritten Mathematical Expression Recognition (HMER) has gained considerable attention in pattern recognition for its diverse applications in document understanding. Current methods typically approach HMER as an image-to-sequence generation task within an autoregressive (AR) encoder-decoder framework. However, these approaches suffer from several drawbacks: 1) a lack of overall language context, limiting information utilization beyond the current decoding step; 2) error accumulation during AR decoding; and 3) slow decoding speed. To tackle these problems, this paper makes the first attempt to build a novel bottom-up Non-AutoRegressive Modeling approach for HMER, called NAMER. NAMER comprises a Visual Aware Tokenizer (VAT) and a Parallel Graph Decoder (PGD). Initially, the VAT tokenizes visible symbols and local relations at a coarse level. Subsequently, the PGD refines all tokens and establishes connectivities in parallel, leveraging comprehensive visual and linguistic contexts. Experiments on CROHME 2014/2016/2019 and HME100K datasets demonstrate that NAMER not only outperforms the current state-of-the-art (SOTA) methods on ExpRate by 1.93%/2.35%/1.49%/0.62%, but also achieves speedups of 13.7x and 6.7x in decoding time and overall FPS, respectively, demonstrating the effectiveness and efficiency of NAMER.
Submitted 16 July, 2024;
originally announced July 2024.
-
Bayesian uncertainty analysis for underwater 3D reconstruction with neural radiance fields
Authors:
Haojie Lian,
Xinhao Li,
Yilin Qu,
Jing Du,
Zhuxuan Meng,
Jie Liu,
Leilei Chen
Abstract:
Neural radiance fields (NeRFs) are a deep learning technique that can generate novel views of 3D scenes using sparse 2D images from different viewing directions and camera poses. As an extension of conventional NeRFs to underwater environments, where light can be absorbed and scattered by water, SeaThru-NeRF was proposed to separate the clean appearance and geometric structure of underwater scenes from the effects of the scattering medium. Since the quality of the appearance and structure of underwater scenes is crucial for downstream tasks such as underwater infrastructure inspection, the reliability of the 3D reconstruction model should be considered and evaluated. Nonetheless, because uncertainty in the 3D reconstruction of underwater scenes under natural ambient illumination cannot yet be quantified, the practical deployment of NeRFs in unmanned autonomous underwater navigation is limited. To address this issue, we introduce a spatial perturbation field $D_{\omega}$ based on Bayes' rays in SeaThru-NeRF and perform Laplace approximation to obtain a Gaussian distribution $\mathcal{N}(0,\Sigma)$ of the parameters $\omega$, where the diagonal elements of $\Sigma$ correspond to the uncertainty at each spatial location. We also employ a simple thresholding method to remove artifacts from the rendered results of underwater scenes. Numerical experiments are provided to demonstrate the effectiveness of this approach.
Submitted 10 July, 2024;
originally announced July 2024.
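The Laplace approximation mentioned in the abstract above can be written out in its generic textbook form. This is only a sketch: the abstract does not say how the covariance is computed, so the Hessian $H$ and the dataset symbol $\mathcal{D}$ below are notation assumed here for illustration, not taken from the paper.

```latex
% Generic Laplace approximation (textbook form; H and \mathcal{D} are assumed notation).
% The posterior over the perturbation parameters \omega is approximated by a Gaussian
% centred at the mode \omega^{*} = 0, with covariance given by the inverse Hessian of
% the negative log-posterior evaluated at that mode.
\begin{aligned}
p(\omega \mid \mathcal{D}) &\approx \mathcal{N}\!\left(\omega^{*}, \Sigma\right), \qquad \omega^{*} = 0, \\
\Sigma &= H^{-1}, \qquad
H = \nabla_{\omega}^{2}\bigl[-\log p(\omega \mid \mathcal{D})\bigr]\Big|_{\omega = \omega^{*}}, \\
\text{uncertainty at location } i &\;\propto\; \Sigma_{ii}.
\end{aligned}
```

Under this reading, a large diagonal entry $\Sigma_{ii}$ means the objective is nearly flat with respect to a perturbation at location $i$, so the reconstruction there is weakly constrained by the data.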
-
Waveguide Superlattices with Artificial Gauge Field Towards Colorless and Crosstalkless Ultrahigh-Density Photonic Integration
Authors:
Xuelin Zhang,
Jiangbing Du,
Ke Xu,
Zuyuan He
Abstract:
Dense waveguide arrays with low crosstalk and ultra-broad bandwidth remain a vital issue for chip-scale integrated photonics. However, the sub-wavelength regime of such devices has not been adequately explored in practice. Herein, we propose advanced waveguide superlattices leveraging the artificial gauge field mechanism. This approach achieves remarkable -24 dB crosstalk suppression with an ultra-broad bandwidth, experimentally demonstrated over 500 nm, in silicon nitride waveguides. Moreover, the 112 Gbit/s signal encoded per channel of ultra-compact circuits with a bit error rate below the 7% forward error correction limit verified the capability for high-speed on-chip transmission. This design is compatible with metal back end-of-the-line (BEOL) processes and can be readily transferred to other platforms. Thus, it holds great promise for significant reduction in on-chip footprint and cost in large-scale integrated photonics, and salient enhancement in the performance of a wide range of active and passive photonic devices and systems.
Submitted 10 July, 2024;
originally announced July 2024.
-
F2PAD: A General Optimization Framework for Feature-Level to Pixel-Level Anomaly Detection
Authors:
Chengyu Tao,
Hao Xu,
Juan Du
Abstract:
Image-based inspection systems have been widely deployed in manufacturing production lines. Due to the scarcity of defective samples, unsupervised anomaly detection that only leverages normal samples during training to detect various defects is popular. Existing feature-based methods, utilizing deep features from pretrained neural networks, show impressive performance in anomaly localization and a low demand for training sample size. However, the anomalous regions detected by these methods always exhibit inaccurate boundaries, which impedes downstream tasks. This deficiency has two causes: (i) the decreased resolution of high-level features compared with the original image, and (ii) the mixture of adjacent normal and anomalous pixels during feature extraction. To address them, we propose a novel unified optimization framework (F2PAD) that leverages the Feature-level information to guide the optimization process for Pixel-level Anomaly Detection in the inference stage. The proposed framework is universal and plug-and-play, which can enhance various feature-based methods with limited assumptions. Case studies are provided to demonstrate the effectiveness of our strategy, particularly when applied to three popular backbone methods: PaDiM, CFLOW-AD, and PatchCore.
Submitted 8 July, 2024;
originally announced July 2024.
-
Multi-level Reliability Interface for Semantic Communications over Wireless Networks
Authors:
Tze-Yang Tung,
Homa Esfahanizadeh,
Jinfeng Du,
Harish Viswanathan
Abstract:
Semantic communication, when examined through the lens of joint source-channel coding (JSCC), maps source messages directly into channel input symbols, where the measure of success is defined by end-to-end distortion rather than traditional metrics such as block error rate. Previous studies have shown significant improvements achieved through deep learning (DL)-driven JSCC compared to traditional separate source and channel coding. However, JSCC is impractical in existing communication networks, where application and network providers are typically different entities connected over general-purpose TCP/IP links. In this paper, we propose designing the source and channel mappings separately and sequentially via a novel multi-level reliability interface. This conceptual interface enables semi-JSCC at both the learned source and channel mappers and achieves many of the gains observed in existing DL-based JSCC work (which would require a fully joint design between the application and the network), such as lower end-to-end distortion and graceful degradation of distortion with channel quality. We believe this work represents an important step towards realizing semantic communications in wireless networks.
Submitted 7 July, 2024;
originally announced July 2024.
-
Network-based Neighborhood regression
Authors:
Yaoming Zhen,
Jin-Hong Du
Abstract:
Given the ubiquity of modularity in biological systems, module-level regulation analysis is vital for understanding biological systems across various levels and their dynamics. Current statistical analysis on biological modules predominantly focuses on either detecting the functional modules in biological networks or sub-group regression on the biological features without using the network data. This paper proposes a novel network-based neighborhood regression framework whose regression functions depend on both the global community-level information and local connectivity structures among entities. An efficient community-wise least square optimization approach is developed to uncover the strength of regulation among the network modules while enabling asymptotic inference. With random graph theory, we derive non-asymptotic estimation error bounds for the proposed estimator, achieving exact minimax optimality. Unlike the root-n consistency typical in canonical linear regression, our model exhibits linear consistency in the number of nodes n, highlighting the advantage of incorporating neighborhood information. The effectiveness of the proposed framework is further supported by extensive numerical experiments. Application to whole-exome sequencing and RNA-sequencing Autism datasets demonstrates the usage of the proposed method in identifying the association between the gene modules of genetic variations and the gene modules of genomic differential expressions.
Submitted 4 July, 2024;
originally announced July 2024.
-
Edge AI: A Taxonomy, Systematic Review and Future Directions
Authors:
Sukhpal Singh Gill,
Muhammed Golec,
Jianmin Hu,
Minxian Xu,
Junhui Du,
Huaming Wu,
Guneet Kaur Walia,
Subramaniam Subramanian Murugesan,
Babar Ali,
Mohit Kumar,
Kejiang Ye,
Prabal Verma,
Surendra Kumar,
Felix Cuadrado,
Steve Uhlig
Abstract:
Edge Artificial Intelligence (AI) incorporates a network of interconnected systems and devices that receive, cache, process, and analyse data in close communication with the location where the data is captured with AI technology. Recent advancements in AI efficiency, the widespread use of Internet of Things (IoT) devices, and the emergence of edge computing have unlocked the enormous scope of Edge AI. The goal of Edge AI is to optimize data processing efficiency and velocity while ensuring data confidentiality and integrity. Despite being a relatively new field of research, spanning from 2014 to the present, it has shown significant and rapid development over the last five years. In this article, we present a systematic literature review for Edge AI to discuss the existing research, recent advancements, and future research directions. We created a collaborative edge AI learning system for cloud and edge computing analysis, including an in-depth study of the architectures that facilitate this mechanism. The taxonomy for Edge AI facilitates the classification and configuration of Edge AI systems while also examining its potential influence across many fields by encompassing infrastructure, cloud computing, fog computing, services, use cases, ML and deep learning, and resource management. This study highlights the significance of Edge AI in processing real-time data at the edge of the network. Additionally, it emphasizes the research challenges encountered by Edge AI systems, including constraints on resources, vulnerabilities to security threats, and problems with scalability. Finally, this study highlights the potential future research directions that aim to address the current limitations of Edge AI by providing innovative solutions.
Submitted 4 July, 2024;
originally announced July 2024.
-
Out-of-Plane Polarization from Spin Reflection Induces Field-Free Spin-Orbit Torque Switching in Structures with Canted NiO Interfacial Moments
Authors:
Zhe Zhang,
Zhuoyi Li,
Yuzhe Chen,
Fangyuan Zhu,
Yu Yan,
Yao Li,
Liang He,
Jun Du,
Rong Zhang,
Jing Wu,
Xianyang Lu,
Yongbing Xu
Abstract:
Realizing deterministic current-induced spin-orbit torque (SOT) magnetization switching, especially in systems exhibiting perpendicular magnetic anisotropy (PMA), typically requires the application of a collinear in-plane field, posing a challenging problem. In this study, we successfully achieve field-free SOT switching in the CoFeB/MgO system. In a Ta/CoFeB/MgO/NiO/Ta structure, spin reflection at the NiO interface, characterized by noncollinear spin structures with canted magnetization, generates a spin current with an out-of-plane spin polarization σz. We confirm the contribution of σz to the field-free SOT switching through measurements of the shift effect in the out-of-plane magnetization hysteresis loops under different currents. The incorporation of NiO as an antiferromagnetic insulator mitigates the current shunting effect and ensures excellent thermal stability of the device. The sample with 0.8 nm MgO and 2 nm NiO demonstrates an impressive optimal switching ratio approaching 100% without an in-plane field. This breakthrough in the CoFeB/MgO system promises significant applications in spintronics, advancing us closer to realizing innovative technologies.
Submitted 4 July, 2024;
originally announced July 2024.
-
Decentralized Multi-Party Multi-Network AI for Global Deployment of 6G Wireless Systems
Authors:
Merim Dzaferagic,
Marco Ruffini,
Nina Slamnik-Krijestorac,
Joao F. Santos,
Johann Marquez-Barja,
Christos Tranoris,
Spyros Denazis,
Thomas Kyriakakis,
Panagiotis Karafotis,
Luiz DaSilva,
Shashi Raj Pandey,
Junya Shiraishi,
Petar Popovski,
Soren Kejser Jensen,
Christian Thomsen,
Torben Bach Pedersen,
Holger Claussen,
Jinfeng Du,
Gil Zussman,
Tingjun Chen,
Yiran Chen,
Seshu Tirupathi,
Ivan Seskar,
Daniel Kilper
Abstract:
Multiple visions of 6G networks elicit Artificial Intelligence (AI) as a central, native element. When 6G systems are deployed at a large scale, end-to-end AI-based solutions will necessarily have to encompass both the radio and the fiber-optical domain. This paper introduces the Decentralized Multi-Party, Multi-Network AI (DMMAI) framework for integrating AI into 6G networks deployed at scale. DMMAI harmonizes AI-driven controls across diverse network platforms and thus facilitates networks that autonomously configure, monitor, and repair themselves. This is particularly crucial at the network edge, where advanced applications meet heightened functionality and security demands. The radio/optical integration is vital due to the current compartmentalization of AI research within these domains, which lacks a comprehensive understanding of their interaction. Our approach explores multi-network orchestration and AI control integration, filling a critical gap in standardized frameworks for AI-driven coordination in 6G networks. The DMMAI framework is a step towards a global standard for AI in 6G, aiming to establish reference use cases, data and model management methods, and benchmarking platforms for future AI/ML solutions.
Submitted 15 April, 2024;
originally announced July 2024.
-
Critical fluctuation and noise spectra in two-dimensional Fe$_{3}$GeTe$_{2}$ magnets
Authors:
Yuxin Li,
Zhe Ding,
Chen Wang,
Haoyu Sun,
Zhousheng Chen,
Pengfei Wang,
Ya Wang,
Ming Gong,
Hualing Zeng,
Fazhan Shi,
Jiangfeng Du
Abstract:
Critical fluctuations play a fundamental role in determining the spin orders for low-dimensional quantum materials, especially for recently discovered two-dimensional (2D) magnets. Here we employ the quantum decoherence imaging technique utilizing nitrogen-vacancy centers in diamond to explore the critical magnetic fluctuations and the associated temporal spin noise in the van der Waals magnet $\rm{Fe_{3}GeTe_{2}}$. We show that the critical fluctuation contributes to a random magnetic field characterized by the noise spectra, which can change dramatically near the critical temperature $T_c$. A theoretical model to describe this phenomenon is developed, showing that the spectral density is characterized by $1/f$ noise near $T_c$, while away from this point it behaves like white noise. The crossover at a certain temperature between these two situations is determined by changing the distance between the sample and the diamond. This work provides a new way to study critical fluctuation and to extract some of the critical exponents, which may greatly deepen our understanding of criticality in a wide range of physical systems.
Submitted 30 June, 2024;
originally announced July 2024.
-
Efficient Long-distance Latent Relation-aware Graph Neural Network for Multi-modal Emotion Recognition in Conversations
Authors:
Yuntao Shou,
Wei Ai,
Jiayi Du,
Tao Meng,
Haiyan Liu
Abstract:
The task of multi-modal emotion recognition in conversation (MERC) aims to analyze the genuine emotional state of each utterance based on the multi-modal information in the conversation, which is crucial for conversation understanding. Existing methods focus on using graph neural networks (GNN) to model conversational relationships and capture contextual latent semantic relationships. However, due to the complexity of GNN, existing methods cannot efficiently capture the potential dependencies between long-distance utterances, which limits the performance of MERC. In this paper, we propose an Efficient Long-distance Latent Relation-aware Graph Neural Network (ELR-GNN) for multi-modal emotion recognition in conversations. Specifically, we first use pre-extracted text, video and audio features as input to a Bi-LSTM to capture contextual semantic information and obtain low-level utterance features. Then, we use low-level utterance features to construct a conversational emotion interaction graph. To efficiently capture the potential dependencies between long-distance utterances, we use the dilated generalized forward push algorithm to precompute the emotional propagation between global utterances and design an emotional relation-aware operator to capture the potential semantic associations between different utterances. Furthermore, we combine early fusion and adaptive late fusion mechanisms to fuse latent dependency information between speaker relationship information and context. Finally, we obtain high-level discourse features and feed them into an MLP for emotion prediction. Extensive experimental results show that ELR-GNN achieves state-of-the-art performance on the benchmark datasets IEMOCAP and MELD, with running times reduced by 52% and 35%, respectively.
Submitted 27 June, 2024;
originally announced July 2024.
-
Amplify Graph Learning for Recommendation via Sparsity Completion
Authors:
Peng Yuan,
Haojie Li,
Minying Fang,
Xu Yu,
Yongjing Hao,
Junwei Du
Abstract:
Graph learning models have been widely deployed in collaborative filtering (CF) based recommendation systems. Due to the issue of data sparsity, the graph structure of the original input lacks potential positive preference edges, which significantly reduces the performance of recommendations. In this paper, we study how to enhance the graph structure for CF more effectively, thereby optimizing the representation of graph nodes. Previous works introduced matrix completion techniques into CF, proposing the use of either stochastic completion methods or superficial structure completion to address this issue. However, most of these approaches employ random numerical filling that lacks control over noise perturbations and limits the in-depth exploration of higher-order interaction features of nodes, resulting in biased graph representations.
In this paper, we propose an Amplify Graph Learning framework based on Sparsity Completion (called AGL-SC). First, we utilize a graph neural network to mine direct interaction features between user and item nodes, which are used as the inputs of the encoder. Second, we design a factorization-based method to mine higher-order interaction features. These features serve as perturbation factors in the latent space of the hidden layer to facilitate generative enhancement. Finally, by employing variational inference, the above multi-order features are integrated to implement the completion and enhancement of missing graph structures. We conducted benchmark and strategy experiments on four real-world datasets related to recommendation tasks. The experimental results demonstrate that AGL-SC significantly outperforms the state-of-the-art methods.
Submitted 1 July, 2024; v1 submitted 27 June, 2024;
originally announced June 2024.
-
LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing
Authors:
Jiangshu Du,
Yibo Wang,
Wenting Zhao,
Zhongfen Deng,
Shuaiqi Liu,
Renze Lou,
Henry Peng Zou,
Pranav Narayanan Venkit,
Nan Zhang,
Mukund Srinath,
Haoran Ranran Zhang,
Vipul Gupta,
Yinghui Li,
Tao Li,
Fei Wang,
Qin Liu,
Tianlin Liu,
Pengzhi Gao,
Congying Xia,
Chen Xing,
Jiayang Cheng,
Zhaowei Wang,
Ying Su,
Raj Sanjay Shah,
Ruohao Guo
, et al. (15 additional authors not shown)
Abstract:
This work is motivated by two key trends. On one hand, large language models (LLMs) have shown remarkable versatility in various generative tasks such as writing, drawing, and question answering, significantly reducing the time required for many routine tasks. On the other hand, researchers, whose work is not only time-consuming but also highly expertise-demanding, face increasing challenges as they have to spend more time reading, writing, and reviewing papers. This raises the question: how can LLMs potentially assist researchers in alleviating their heavy workload?
This study focuses on the topic of "LLMs Assist NLP Researchers", particularly examining the effectiveness of LLMs in assisting paper (meta-)reviewing and its recognizability. To address this, we constructed the ReviewCritique dataset, which includes two types of information: (i) NLP papers (initial submissions rather than camera-ready) with both human-written and LLM-generated reviews, and (ii) "deficiency" labels and corresponding explanations for individual segments of each review, annotated by experts. Using ReviewCritique, this study explores two threads of research questions: (i) "LLMs as Reviewers", how do reviews generated by LLMs compare with those written by humans in terms of quality and distinguishability? (ii) "LLMs as Metareviewers", how effectively can LLMs identify potential issues, such as deficient or unprofessional review segments, within individual paper reviews? To our knowledge, this is the first work to provide such a comprehensive analysis.
Submitted 25 June, 2024; v1 submitted 23 June, 2024;
originally announced June 2024.
-
LLMs' Classification Performance is Overclaimed
Authors:
Hanzi Xu,
Renze Lou,
Jiangshu Du,
Vahid Mahzoon,
Elmira Talebianaraki,
Zhuoan Zhou,
Elizabeth Garrison,
Slobodan Vucetic,
Wenpeng Yin
Abstract:
In many classification tasks designed for AI or humans to solve, gold labels are typically included within the label space by default, often posed as "which of the following is correct?" This standard setup has traditionally highlighted the strong performance of advanced AI, particularly top-performing Large Language Models (LLMs), in routine classification tasks. However, when the gold label is intentionally excluded from the label space, it becomes evident that LLMs still attempt to select from the available label candidates, even when none are correct. This raises a pivotal question: Do LLMs truly demonstrate their intelligence in understanding the essence of classification tasks?
In this study, we evaluate both closed-source and open-source LLMs across representative classification tasks, arguing that the perceived performance of LLMs is overstated due to their inability to exhibit the expected comprehension of the task. This paper makes a threefold contribution: i) To our knowledge, this is the first work to identify the limitations of LLMs in classification tasks when gold labels are absent. We define this task as Classify-w/o-Gold and propose it as a new testbed for LLMs. ii) We introduce a benchmark, Know-No, comprising two existing classification tasks and one new task, to evaluate Classify-w/o-Gold. iii) This work defines and advocates for a new evaluation metric, OmniAccuracy, which assesses LLMs' performance in classification tasks both when gold labels are present and absent.
Submitted 3 July, 2024; v1 submitted 23 June, 2024;
originally announced June 2024.
-
Breaking Secure Aggregation: Label Leakage from Aggregated Gradients in Federated Learning
Authors:
Zhibo Wang,
Zhiwei Chang,
Jiahui Hu,
Xiaoyi Pang,
Jiacheng Du,
Yongle Chen,
Kui Ren
Abstract:
Federated Learning (FL) exhibits privacy vulnerabilities under gradient inversion attacks (GIAs), which can extract private information from individual gradients. To enhance privacy, FL incorporates Secure Aggregation (SA) to prevent the server from obtaining individual gradients, thus effectively resisting GIAs. In this paper, we propose a stealthy label inference attack to bypass SA and recover individual clients' private labels. Specifically, we conduct a theoretical analysis of label inference from the aggregated gradients that are exclusively obtained after implementing SA. The analysis results reveal that the inputs (embeddings) and outputs (logits) of the final fully connected layer (FCL) contribute to gradient disaggregation and label restoration. To preset the embeddings and logits of the FCL, we craft a fishing model by solely modifying the parameters of a single batch normalization (BN) layer in the original model. By distributing client-specific fishing models, the server can derive the individual gradients regarding the bias of the FCL by solving a linear system with expected embeddings and the aggregated gradients as coefficients. Then the labels of each client can be precisely computed based on the preset logits and the gradients of the FCL's bias. Extensive experiments show that our attack achieves large-scale label recovery with 100% accuracy on various datasets and model architectures.
Submitted 22 June, 2024;
originally announced June 2024.
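The last step described in the abstract above, reading a label off the gradient of the final layer's bias, rests on a standard property of softmax cross-entropy: for a single sample, the bias gradient equals softmax(logits) minus the one-hot label, so only the true class has a negative entry. The following is a minimal sketch of that property alone, assuming a single sample and a softmax cross-entropy loss; the function names are hypothetical, and it does not reproduce the paper's fishing model or its disaggregation of aggregated gradients.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def bias_gradient(logits, label):
    # Gradient of softmax cross-entropy w.r.t. the final-layer bias
    # for ONE sample: softmax(logits) - onehot(label).
    probs = softmax(logits)
    return [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]

def infer_label(grad_bias):
    # Only the true class has a negative entry (p_c - 1 < 0);
    # every other entry is a probability p_k > 0.
    return min(range(len(grad_bias)), key=lambda k: grad_bias[k])

# Illustrative values only (assumed, not from the paper).
logits = [0.3, -1.2, 2.0, 0.5]
g = bias_gradient(logits, label=1)
recovered = infer_label(g)
print(recovered)
```

With a batch of samples the per-sample gradients are summed or averaged, so this sign test no longer applies directly; that is the gap the abstract's preset embeddings and logits are said to address.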
-
Exploring Audio-Visual Information Fusion for Sound Event Localization and Detection In Low-Resource Realistic Scenarios
Authors:
Ya Jiang,
Qing Wang,
Jun Du,
Maocheng Hu,
Pengfei Hu,
Zeyan Liu,
Shi Cheng,
Zhaoxu Nian,
Yuxuan Dong,
Mingqi Cai,
Xin Fang,
Chin-Hui Lee
Abstract:
This study presents an audio-visual information fusion approach to sound event localization and detection (SELD) in low-resource scenarios. We aim at utilizing audio and video modality information through cross-modal learning and multi-modal fusion. First, we propose a cross-modal teacher-student learning (TSL) framework to transfer information from an audio-only teacher model, trained on a rich collection of audio data with multiple data augmentation techniques, to an audio-visual student model trained with only a limited set of multi-modal data. Next, we propose a two-stage audio-visual fusion strategy, consisting of an early feature fusion and a late video-guided decision fusion to exploit synergies between audio and video modalities. Finally, we introduce an innovative video pixel swapping (VPS) technique to extend an audio channel swapping (ACS) method to an audio-visual joint augmentation. Evaluation results on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2023 Challenge data set demonstrate significant improvements in SELD performances. Furthermore, our submission to the SELD task of the DCASE 2023 Challenge ranks first place by effectively integrating the proposed techniques into a model ensemble.
Submitted 21 June, 2024;
originally announced June 2024.
-
Textual Unlearning Gives a False Sense of Unlearning
Authors:
Jiacheng Du,
Zhibo Wang,
Kui Ren
Abstract:
Language models (LMs) are susceptible to "memorizing" training data, including a large amount of private or copyright-protected content. To safeguard the right to be forgotten (RTBF), machine unlearning has emerged as a promising method for LMs to efficiently "forget" sensitive training content and mitigate knowledge leakage risks. However, despite its good intentions, could the unlearning mechanism be counterproductive? In this paper, we propose the Textual Unlearning Leakage Attack (TULA), where an adversary can infer information about the unlearned data only by accessing the models before and after unlearning. Furthermore, we present variants of TULA in both black-box and white-box scenarios. Through various experimental results, we critically demonstrate that machine unlearning amplifies the risk of knowledge leakage from LMs. Specifically, TULA can increase an adversary's ability to infer membership information about the unlearned data by more than 20% in black-box scenario. Moreover, TULA can even reconstruct the unlearned data directly with more than 60% accuracy with white-box access. Our work is the first to reveal that machine unlearning in LMs can inversely create greater knowledge risks and inspire the development of more secure unlearning mechanisms.
Submitted 19 June, 2024;
originally announced June 2024.
-
Quantum Compiling with Reinforcement Learning on a Superconducting Processor
Authors:
Z. T. Wang,
Qiuhao Chen,
Yuxuan Du,
Z. H. Yang,
Xiaoxia Cai,
Kaixuan Huang,
Jingning Zhang,
Kai Xu,
Jun Du,
Yinan Li,
Yuling Jiao,
Xingyao Wu,
Wu Liu,
Xiliang Lu,
Huikai Xu,
Yirong Jin,
Ruixia Wang,
Haifeng Yu,
S. P. Zhao
Abstract:
To effectively implement quantum algorithms on noisy intermediate-scale quantum (NISQ) processors is a central task in modern quantum technology. NISQ processors feature tens to a few hundreds of noisy qubits with limited coherence times and gate operations with errors, so NISQ algorithms naturally require employing circuits of short lengths via quantum compilation. Here, we develop a reinforcement learning (RL)-based quantum compiler for a superconducting processor and demonstrate its capability of discovering novel and hardware-amenable circuits with short lengths. We show that for the three-qubit quantum Fourier transformation, a compiled circuit using only seven CZ gates with unity circuit fidelity can be achieved. The compiler is also able to find optimal circuits under device topological constraints, with lengths considerably shorter than those by the conventional method. Our study exemplifies the codesign of the software with hardware for efficient quantum compilation, offering valuable insights for the advancement of RL-based compilers.
Submitted 17 June, 2024;
originally announced June 2024.
-
Enhancing Voice Wake-Up for Dysarthria: Mandarin Dysarthria Speech Corpus Release and Customized System Design
Authors:
Ming Gao,
Hang Chen,
Jun Du,
Xin Xu,
Hongxiao Guo,
Hui Bu,
Jianxing Yang,
Ming Li,
Chin-Hui Lee
Abstract:
Smart home technology has gained widespread adoption, facilitating effortless control of devices through voice commands. However, individuals with dysarthria, a motor speech disorder, face challenges due to the variability of their speech. This paper addresses the wake-up word spotting (WWS) task for dysarthric individuals, aiming to integrate them into real-world applications. To support this, we release the open-source Mandarin Dysarthria Speech Corpus (MDSC), a dataset designed for dysarthric individuals in home environments. MDSC encompasses information on age, gender, disease types, and intelligibility evaluations. Furthermore, we perform comprehensive experimental analysis on MDSC, highlighting the challenges encountered. We also develop a customized dysarthria WWS system that showcases robustness in handling intelligibility and achieving exceptional performance. MDSC will be released on https://www.aishelltech.com/AISHELL_6B.
Submitted 13 June, 2024;
originally announced June 2024.
-
The $q$-Schur algebras in type $D$, I, fundamental multiplication formulas
Authors:
Jie Du,
Yiqiang Li,
Zhaozhao Zhao
Abstract:
By embedding the Hecke algebra $\check H_q$ of type $D$ into the Hecke algebra $H_{q,1}$ of type $B$ with unequal parameters $(q,1)$, the $q$-Schur algebra $S^κ_q(n,r)$ of type $D$ is naturally defined as the endomorphism algebra of the tensor space with the $\check H_q$-action restricted from the $H_{q,1}$-action that defines the $(q,1)$-Schur algebra $S^\jmath_{q,1}(n,r)$ of type $B$. We investigate the algebras $S^\jmath_{q,1}(n,r)$ and $S^κ_q(n,r)$ both algebraically and geometrically and describe their standard bases, dimension formulas and weight idempotents. Most importantly, we use the geometrically derived two sets of the fundamental multiplication formulas in $S^\jmath_{q,1}(n,r)$ to derive multi-sets (9 sets in total!) of the fundamental multiplication formulas in $S^κ_q(n,r)$.
Submitted 13 June, 2024;
originally announced June 2024.
-
SRFUND: A Multi-Granularity Hierarchical Structure Reconstruction Benchmark in Form Understanding
Authors:
Jiefeng Ma,
Yan Wang,
Chenyu Liu,
Jun Du,
Yu Hu,
Zhenrong Zhang,
Pengfei Hu,
Qing Wang,
Jianshu Zhang
Abstract:
Accurately identifying and organizing textual content is crucial for the automation of document processing in the field of form understanding. Existing datasets, such as FUNSD and XFUND, support entity classification and relationship prediction tasks but are typically limited to local and entity-level annotations. This limitation overlooks the hierarchically structured representation of documents, constraining comprehensive understanding of complex forms. To address this issue, we present the SRFUND, a hierarchically structured multi-task form understanding benchmark. SRFUND provides refined annotations on top of the original FUNSD and XFUND datasets, encompassing five tasks: (1) word to text-line merging, (2) text-line to entity merging, (3) entity category classification, (4) item table localization, and (5) entity-based full-document hierarchical structure recovery. We meticulously supplemented the original dataset with missing annotations at various levels of granularity and added detailed annotations for multi-item table regions within the forms. Additionally, we introduce global hierarchical structure dependencies for entity relation prediction tasks, surpassing traditional local key-value associations. The SRFUND dataset includes eight languages including English, Chinese, Japanese, German, French, Spanish, Italian, and Portuguese, making it a powerful tool for cross-lingual form understanding. Extensive experimental results demonstrate that the SRFUND dataset presents new challenges and significant opportunities in handling diverse layouts and global hierarchical structures of forms, thus providing deep insights into the field of form understanding. The original dataset and implementations of baseline methods are available at https://sprateam-ustc.github.io/SRFUND
Submitted 12 June, 2024;
originally announced June 2024.
-
Emergent Universal Quench Dynamics in Randomly Interacting Spin Models
Authors:
Yuchen Li,
Tian-Gang Zhou,
Ze Wu,
Pai Peng,
Shengyu Zhang,
Riqiang Fu,
Ren Zhang,
Wei Zheng,
Pengfei Zhang,
Hui Zhai,
Xinhua Peng,
Jiangfeng Du
Abstract:
Universality often emerges in low-energy equilibrium physics of quantum many-body systems, despite their microscopic complexity and variety. Recently, there has been a growing interest in studying far-from-equilibrium dynamics of quantum many-body systems. Such dynamics usually involves highly excited states beyond the traditional low-energy theory description. Whether universal behaviors can also emerge in such non-equilibrium dynamics is a central issue at the frontier of quantum dynamics. Here we report the experimental observation of universal dynamics by monitoring the spin depolarization process in a solid-state NMR system described by an ensemble of randomly interacting spins. The spin depolarization can be related to temporal spin-spin correlation functions at high temperatures. We discover a remarkable phenomenon that these correlation functions obey a universal functional form. This experimental fact helps us identify the dominant interacting processes in the spin depolarization dynamics that lead to this universality. Our observation demonstrates the existence of universality even in non-equilibrium dynamics at high temperatures, thereby complementing the well-established universality in low-energy physics.
Submitted 11 June, 2024;
originally announced June 2024.
-
AS-70: A Mandarin stuttered speech dataset for automatic speech recognition and stuttering event detection
Authors:
Rong Gong,
Hongfei Xue,
Lezhi Wang,
Xin Xu,
Qisheng Li,
Lei Xie,
Hui Bu,
Shaomei Wu,
Jiaming Zhou,
Yong Qin,
Binbin Zhang,
Jun Du,
Jia Bin,
Ming Li
Abstract:
The rapid advancements in speech technologies over the past two decades have led to human-level performance in tasks like automatic speech recognition (ASR) for fluent speech. However, the efficacy of these models diminishes when applied to atypical speech, such as stuttering. This paper introduces AS-70, the first publicly available Mandarin stuttered speech dataset, which stands out as the largest dataset in its category. Encompassing conversational and voice command reading speech, AS-70 includes verbatim manual transcription, rendering it suitable for various speech-related tasks. Furthermore, baseline systems are established, and experimental results are presented for ASR and stuttering event detection (SED) tasks. By incorporating this dataset into the model fine-tuning, significant improvements in the state-of-the-art ASR models, e.g., Whisper and Hubert, are observed, enhancing their inclusivity in addressing stuttered speech.
Submitted 11 June, 2024;
originally announced June 2024.
-
Efficiently Exploring Large Language Models for Document-Level Machine Translation with In-context Learning
Authors:
Menglong Cui,
Jiangcun Du,
Shaolin Zhu,
Deyi Xiong
Abstract:
Large language models (LLMs) exhibit outstanding performance in machine translation via in-context learning. In contrast to sentence-level translation, document-level translation (DOCMT) by LLMs based on in-context learning faces two major challenges: firstly, document translations generated by LLMs are often incoherent; secondly, the length of demonstration for in-context learning is usually limited. To address these issues, we propose a Context-Aware Prompting method (CAP), which enables LLMs to generate more accurate, cohesive, and coherent translations via in-context learning. CAP takes into account multi-level attention, selects the most relevant sentences to the current one as context, and then generates a summary from these collected sentences. Subsequently, sentences most similar to the summary are retrieved from the datastore as demonstrations, which effectively guide LLMs in generating cohesive and coherent translations. We conduct extensive experiments across various DOCMT tasks, and the results demonstrate the effectiveness of our approach, particularly in zero pronoun translation (ZPT) and literary translation tasks.
Submitted 11 June, 2024;
originally announced June 2024.
-
Neural Codec-based Adversarial Sample Detection for Speaker Verification
Authors:
Xuanjun Chen,
Jiawei Du,
Haibin Wu,
Jyh-Shing Roger Jang,
Hung-yi Lee
Abstract:
Automatic Speaker Verification (ASV), increasingly used in security-critical applications, faces vulnerabilities from rising adversarial attacks, with few effective defenses available. In this paper, we propose a neural codec-based adversarial sample detection method for ASV. The approach leverages the codec's ability to discard redundant perturbations and retain essential information. Specifically, we distinguish between genuine and adversarial samples by comparing ASV score differences between original and re-synthesized audio (by codec models). This comprehensive study explores all open-source neural codecs and their variant models for experiments. The Descript-audio-codec model stands out by delivering the highest detection rate among 15 neural codecs and surpassing seven prior state-of-the-art (SOTA) detection methods. Note that, our single-model method even outperforms a SOTA ensemble method by a large margin.
Submitted 6 June, 2024;
originally announced June 2024.
-
Click Without Compromise: Online Advertising Measurement via Per User Differential Privacy
Authors:
Yingtai Xiao,
Jian Du,
Shikun Zhang,
Qiang Yan,
Danfeng Zhang,
Daniel Kifer
Abstract:
Online advertising is a cornerstone of the Internet ecosystem, with advertising measurement playing a crucial role in optimizing efficiency. Ad measurement entails attributing desired behaviors, such as purchases, to ad exposures across various platforms, necessitating the collection of user activities across these platforms. As this practice faces increasing restrictions due to rising privacy concerns, safeguarding user privacy in this context is imperative. Our work is the first to formulate the real-world challenge of advertising measurement systems with real-time reporting of streaming data in advertising campaigns. We introduce Ads-BPC, a novel user-level differential privacy protection scheme for advertising measurement results. This approach optimizes global noise power and results in a non-identically distributed noise distribution that preserves differential privacy while enhancing measurement accuracy. Through experiments on both real-world advertising campaigns and synthetic datasets, Ads-BPC achieves a 25% to 50% increase in accuracy over existing streaming DP mechanisms applied to advertising measurement. This highlights our method's effectiveness in achieving superior accuracy alongside a formal privacy guarantee, thereby advancing the state-of-the-art in privacy-preserving advertising measurement.
Submitted 4 June, 2024;
originally announced June 2024.
-
A DAFT Based Unified Waveform Design Framework for High-Mobility Communications
Authors:
Xingyao Zhang,
Haoran Yin,
Yanqun Tang,
Yu Zhou,
Yuqing Liu,
Jinming Du,
Yipeng Ding
Abstract:
With the increasing demand for multi-carrier communication in high-mobility scenarios, it is urgent to design new multi-carrier communication waveforms that can resist large delay-Doppler spreads. Various multi-carrier waveforms in the transform domain were proposed for the fast time-varying channels, including orthogonal time frequency space (OTFS), orthogonal chirp division multiplexing (OCDM), and affine frequency division multiplexing (AFDM). Among these, the AFDM is a strong candidate for its low implementation complexity and ability to achieve optimal diversity. This paper unifies the waveforms based on the discrete affine Fourier transform (DAFT) by using the chirp slope factor "k" in the time-frequency representation to construct a unified design framework for high-mobility communications. The design framework is employed to verify that the bit error rate performance of the DAFT-based waveform can be enhanced when the signal-to-noise ratio (SNR) is sufficiently high by adjusting the chirp slope factor "k".
Submitted 4 June, 2024;
originally announced June 2024.
-
UniOQA: A Unified Framework for Knowledge Graph Question Answering with Large Language Models
Authors:
Zhuoyang Li,
Liran Deng,
Hui Liu,
Qiaoqiao Liu,
Junzhao Du
Abstract:
OwnThink stands as the most extensive Chinese open-domain knowledge graph introduced in recent times. Despite prior attempts in question answering over OwnThink (OQA), existing studies have faced limitations in model representation capabilities, posing challenges in further enhancing overall accuracy in question answering. In this paper, we introduce UniOQA, a unified framework that integrates two complementary parallel workflows. Unlike conventional approaches, UniOQA harnesses large language models (LLMs) for precise question answering and incorporates a direct-answer-prediction process as a cost-effective complement. Initially, to bolster representation capacity, we fine-tune an LLM to translate questions into the Cypher query language (CQL), tackling issues associated with restricted semantic understanding and hallucinations. Subsequently, we introduce the Entity and Relation Replacement algorithm to ensure the executability of the generated CQL. Concurrently, to augment overall accuracy in question answering, we further adapt the Retrieval-Augmented Generation (RAG) process to the knowledge graph. Ultimately, we optimize answer accuracy through a dynamic decision algorithm. Experimental findings illustrate that UniOQA notably advances SpCQL Logical Accuracy to 21.2% and Execution Accuracy to 54.9%, achieving the new state-of-the-art results on this benchmark. Through ablation experiments, we delve into the superior representation capacity of UniOQA and quantify its performance breakthrough.
Submitted 4 June, 2024;
originally announced June 2024.
-
Molecular-Resolution Imaging of Ice Crystallized from Liquid Water
Authors:
Jingshan S. Du,
Suvo Banik,
Henry Chan,
Birk Fritsch,
Ying Xia,
Andreas Hutzler,
Subramanian K. R. S. Sankaranarayanan,
James J. De Yoreo
Abstract:
Despite the ubiquity of ice, a molecular-resolution image of ice crystallized from liquid water or the resulting defect structure has never been obtained. Here, we report the stabilization and angstrom-resolution electron imaging of ice Ih crystallized from liquid water. We combine lattice mapping with molecular dynamics simulations to reveal that ice formation is highly tolerant to nanoscale defects such as misoriented subdomains and trapped gas bubbles, which are stabilized by molecular-scale structural motifs. Importantly, bubble surfaces adopt low-energy nanofacets and create negligible strain fields in the surrounding crystal. These bubbles can dynamically nucleate, grow, migrate, dissolve, and coalesce under electron irradiation and be monitored in situ near a steady state. This work opens the door to understanding water crystallization behaviors at an unprecedented spatial resolution.
Submitted 4 June, 2024; v1 submitted 2 June, 2024;
originally announced June 2024.
-
SAM-LAD: Segment Anything Model Meets Zero-Shot Logic Anomaly Detection
Authors:
Yun Peng,
Xiao Lin,
Nachuan Ma,
Jiayuan Du,
Chuangwei Liu,
Chengju Liu,
Qijun Chen
Abstract:
Visual anomaly detection is vital in real-world applications, such as industrial defect detection and medical diagnosis. However, most existing methods focus on local structural anomalies and fail to detect higher-level functional anomalies under logical conditions. Although recent studies have explored logical anomaly detection, they can only address simple anomalies like missing or addition and show poor generalizability due to being heavily data-driven. To fill this gap, we propose SAM-LAD, a zero-shot, plug-and-play framework for logical anomaly detection in any scene. First, we obtain a query image's feature map using a pre-trained backbone. Simultaneously, we retrieve the reference images and their corresponding feature maps via the nearest neighbor search of the query image. Then, we introduce the Segment Anything Model (SAM) to obtain object masks of the query and reference images. Each object mask is multiplied with the entire image's feature map to obtain object feature maps. Next, an Object Matching Model (OMM) is proposed to match objects in the query and reference images. To facilitate object matching, we further propose a Dynamic Channel Graph Attention (DCGA) module, treating each object as a keypoint and converting its feature maps into feature vectors. Finally, based on the object matching relations, an Anomaly Measurement Model (AMM) is proposed to detect objects with logical anomalies. Structural anomalies in the objects can also be detected. We validate our proposed SAM-LAD using various benchmarks, including industrial datasets (MVTec Loco AD, MVTec AD), and the logical dataset (DigitAnatomy). Extensive experimental results demonstrate that SAM-LAD outperforms existing SoTA methods, particularly in detecting logical anomalies.
Submitted 5 June, 2024; v1 submitted 2 June, 2024;
originally announced June 2024.
-
Galaxy: A Resource-Efficient Collaborative Edge AI System for In-situ Transformer Inference
Authors:
Shengyuan Ye,
Jiangsu Du,
Liekang Zeng,
Wenzhong Ou,
Xiaowen Chu,
Yutong Lu,
Xu Chen
Abstract:
Transformer-based models have unlocked a plethora of powerful intelligent applications at the edge, such as voice assistant in smart home. Traditional deployment approaches offload the inference workloads to the remote cloud server, which would induce substantial pressure on the backbone network as well as raise users' privacy concerns. To address that, in-situ inference has been recently recognized for edge intelligence, but it still confronts significant challenges stemming from the conflict between intensive workloads and limited on-device computing resources. In this paper, we leverage our observation that many edge environments usually comprise a rich set of accompanying trusted edge devices with idle resources and propose Galaxy, a collaborative edge AI system that breaks the resource walls across heterogeneous edge devices for efficient Transformer inference acceleration. Galaxy introduces a novel hybrid model parallelism to orchestrate collaborative inference, along with a heterogeneity-aware parallelism planning for fully exploiting the resource potential. Furthermore, Galaxy devises a tile-based fine-grained overlapping of communication and computation to mitigate the impact of tensor synchronizations on inference latency under bandwidth-constrained edge environments. Extensive evaluation based on prototype implementation demonstrates that Galaxy remarkably outperforms state-of-the-art approaches under various edge environment setups, achieving up to 2.5x end-to-end latency reduction.
Submitted 27 May, 2024;
originally announced May 2024.
-
A Variance-Preserving Interpolation Approach for Diffusion Models with Applications to Single Channel Speech Enhancement and Recognition
Authors:
Zilu Guo,
Qing Wang,
Jun Du,
Jia Pan,
Qing-Feng Liu,
Chin-Hui
Abstract:
In this paper, we propose a variance-preserving interpolation framework to improve diffusion models for single-channel speech enhancement (SE) and automatic speech recognition (ASR). This new variance-preserving interpolation diffusion model (VPIDM) approach requires only 25 iterative steps and obviates the need for a corrector, an essential element in the existing variance-exploding interpolation diffusion model (VEIDM). Two notable distinctions between VPIDM and VEIDM are the scaling function of the mean of state variables and the constraint imposed on the variance relative to the mean's scale. We conduct a systematic exploration of the theoretical mechanism underlying VPIDM and develop insights regarding VPIDM's applications in SE and ASR using VPIDM as a frontend. Our proposed approach, evaluated on two distinct data sets, demonstrates VPIDM's superior performances over conventional discriminative SE algorithms. Furthermore, we assess the performance of the proposed model under varying signal-to-noise ratio (SNR) levels. The investigation reveals VPIDM's improved robustness in target noise elimination when compared to VEIDM. Furthermore, utilizing the mid-outputs of both VPIDM and VEIDM results in enhanced ASR accuracies, thereby highlighting the practical efficacy of our proposed approach.
Submitted 27 May, 2024;
originally announced May 2024.
-
All-voltage control of Giant Magnetoresistance
Authors:
Lujun Wei,
Yiyang Zhang,
Fei Huang,
Jiajv Yang,
Jincheng Peng,
Yanghui Li,
Yu Lu,
Jiarui Chen,
Tianyu Liu,
Yong Pu,
Jun Du
Abstract:
The aim of voltage control of magnetism is to reduce the power consumption of spintronic devices. For a spin valve, the magnetization directions of two ferromagnetic layers determine the giant magnetoresistance magnitude. However, achieving all-voltage manipulation of the magnetization directions between parallel and antiparallel states is a significant challenge. Here, we demonstrate that by utilizing two exchange-biased Co/IrMn bilayers with opposite pinning directions and with ferromagnetic coupling through the Ruderman-Kittel-Kasuya-Yosida interaction between two Co layers, the magnetization directions of the two ferromagnetic layers of a spin valve can be switched between parallel and antiparallel states through all-voltage-induced strain control. The all-voltage controlled giant magnetoresistance is repeatable and nonvolatile. The rotation of magnetizations in the two Co layers under voltages, from antiparallel to parallel states, occurs in opposite directions as revealed through simulations utilizing the Landau-Lifshitz-Gilbert equation. This work can provide a valuable reference for the development of low-power all-voltage-controlled spintronic devices.
Submitted 27 May, 2024;
originally announced May 2024.
-
Quality-aware Masked Diffusion Transformer for Enhanced Music Generation
Authors:
Chang Li,
Ruoyu Wang,
Lijuan Liu,
Jun Du,
Yixuan Sun,
Zilu Guo,
Zhenrong Zhang,
Yuan Jiang
Abstract:
In recent years, diffusion-based text-to-music (TTM) generation has gained prominence, offering a novel approach to synthesizing musical content from textual descriptions. Achieving high accuracy and diversity in this generation process requires extensive, high-quality data, which often constitutes only a fraction of available datasets. Within open-source datasets, the prevalence of issues like mislabeling, weak labeling, unlabeled data, and low-quality music waveforms significantly hampers the development of music generation models. To overcome these challenges, we introduce a novel quality-aware masked diffusion transformer (QA-MDT) approach that enables generative models to discern the quality of the input music waveform during training. Building on the unique properties of musical signals, we have adapted and implemented an MDT model for the TTM task, while further unveiling its distinct capacity for quality control. Moreover, we address the issue of low-quality captions with a caption refinement data processing approach. Our demo page is available at https://qa-mdt.github.io/. Code is available at https://github.com/ivcylc/qa-mdt
Submitted 24 May, 2024;
originally announced May 2024.
-
RET-CLIP: A Retinal Image Foundation Model Pre-trained with Clinical Diagnostic Reports
Authors:
Jiawei Du,
Jia Guo,
Weihang Zhang,
Shengzhu Yang,
Hanruo Liu,
Huiqi Li,
Ningli Wang
Abstract:
The Vision-Language Foundation model is increasingly investigated in the fields of computer vision and natural language processing, yet its exploration in ophthalmology and broader medical applications remains limited. The challenge is the lack of labeled data for training foundation models. To handle this issue, a CLIP-style retinal image foundation model is developed in this paper. Our foundation model, RET-CLIP, is specifically trained on a dataset of 193,865 patients to extract general features of color fundus photographs (CFPs), employing a tripartite optimization strategy to focus on left eye, right eye, and patient level to reflect real-world clinical scenarios. Extensive experiments demonstrate that RET-CLIP outperforms existing benchmarks across eight diverse datasets spanning four critical diagnostic categories: diabetic retinopathy, glaucoma, multiple disease diagnosis, and multi-label classification of multiple diseases, demonstrating the performance and generality of our foundation model. The source code and pre-trained model are available at https://github.com/sStonemason/RET-CLIP.
Submitted 22 May, 2024;
originally announced May 2024.
-
SEMv3: A Fast and Robust Approach to Table Separation Line Detection
Authors:
Chunxia Qin,
Zhenrong Zhang,
Pengfei Hu,
Chenyu Liu,
Jiefeng Ma,
Jun Du
Abstract:
Table structure recognition (TSR) aims to parse the inherent structure of a table from its input image. The "split-and-merge" paradigm is a pivotal approach to parsing table structure, where table separation line detection is crucial. However, challenges such as wireless and deformed tables make it demanding. In this paper, we adhere to the "split-and-merge" paradigm and propose SEMv3 (SEM: Split, Embed and Merge), a method that is both fast and robust for detecting table separation lines. During the split stage, we introduce a Keypoint Offset Regression (KOR) module, which effectively detects table separation lines by directly regressing the offset of each line relative to its keypoint proposals. Moreover, in the merge stage, we define a series of merge actions to efficiently describe the table structure based on table grids. Extensive ablation studies demonstrate that our proposed KOR module can detect table separation lines quickly and accurately. Furthermore, on public datasets (e.g. WTW, ICDAR-2019 cTDaR Historical and iFLYTAB), SEMv3 achieves state-of-the-art (SOTA) performance. The code is available at https://github.com/Chunchunwumu/SEMv3.
Submitted 20 May, 2024;
originally announced May 2024.
-
Positional encoding is not the same as context: A study on positional encoding for Sequential recommendation
Authors:
Alejo Lopez-Avila,
Jinhua Du,
Abbas Shimary,
Ze Li
Abstract:
The expansion of streaming media and e-commerce has led to a boom in recommendation systems, including Sequential recommendation systems, which consider the user's previous interactions with items. In recent years, research has focused on architectural improvements such as transformer blocks and feature extraction that can augment model information. Among these features are context and attributes. Of particular importance is the temporal footprint, which is often considered part of the context and seen in previous publications as interchangeable with positional information. Other publications use positional encodings with little attention to them. In this paper, we analyse positional encodings, showing that they provide relative information between items that are not inferable from the temporal footprint. Furthermore, we evaluate different encodings and how they affect metrics and stability using Amazon datasets. We added some new encodings to help with these problems along the way. We found that we can reach new state-of-the-art results by finding the correct positional encoding, but more importantly, certain encodings stabilise the training.
Submitted 16 May, 2024;
originally announced May 2024.
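
The claim above that positional encodings carry relative information between items can be illustrated with a small sketch. The abstract does not say which encodings were evaluated, so the sinusoidal encoding used here is an assumption chosen only because it is widely known; the function name `sinusoidal_encoding` is hypothetical. With this encoding, the dot product between two positions depends only on their distance, not on where they sit in the sequence.

```python
import numpy as np

def sinusoidal_encoding(num_positions: int, dim: int) -> np.ndarray:
    """Return a (num_positions, dim) sinusoidal encoding; dim must be even."""
    positions = np.arange(num_positions)[:, None]
    # One frequency per sine/cosine pair, geometrically spaced.
    freqs = np.exp(-np.log(10000.0) * np.arange(0, dim, 2) / dim)
    angles = positions * freqs
    enc = np.zeros((num_positions, dim))
    enc[:, 0::2] = np.sin(angles)  # even columns hold sines
    enc[:, 1::2] = np.cos(angles)  # odd columns hold cosines
    return enc

enc = sinusoidal_encoding(num_positions=50, dim=16)

# Two pairs of positions with the same offset (4) at different places.
similarity_early = float(enc[3] @ enc[7])
similarity_late = float(enc[10] @ enc[14])
```

The two similarities agree because sin(a)sin(b) + cos(a)cos(b) = cos(a - b) for each frequency, so only the offset between positions survives in the dot product.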
-
Towards Informatics-Driven Design of Nuclear Waste Forms
Authors:
Vinay I. Hegde,
Miroslava Peterson,
Sarah I. Allec,
Xiaonan Lu,
Thiruvillamalai Mahadevan,
Thanh Nguyen,
Jayani Kalahe,
Jared Oshiro,
Robert J. Seffens,
Ethan K. Nickerson,
Jincheng Du,
Brian J. Riley,
John D. Vienna,
James E. Saal
Abstract:
Informatics-driven approaches, such as machine learning and sequential experimental design, have shown the potential to drastically impact next-generation materials discovery and design. In this perspective, we present a few guiding principles for applying informatics-based methods towards the design of novel nuclear waste forms. We advocate for adopting a system design approach, and describe the effective usage of data-driven methods in every stage of such a design process. We demonstrate how this approach can optimally leverage physics-based simulations, machine learning surrogates, and experimental synthesis and characterization, within a feedback-driven closed-loop sequential learning framework. We discuss the importance of incorporating domain knowledge into the representation of materials, the construction and curation of datasets, the development of predictive property models, and the design and execution of experiments. We illustrate the application of this approach by successfully designing and validating Na- and Nd-containing phosphate-based ceramic waste forms. Finally, we discuss open challenges in such informatics-driven workflows and present an outlook for their widespread application for the cleanup of nuclear wastes.
Submitted 16 May, 2024;
originally announced May 2024.
-
Challenging theories of dark energy with levitated force sensor
Authors:
Peiran Yin,
Rui Li,
Chengjiang Yin,
Xiangyu Xu,
Xiang Bian,
Han Xie,
Chang-Kui Duan,
Pu Huang,
Jian-hua He,
Jiangfeng Du
Abstract:
The nature of dark energy is one of the most outstanding problems in physical science, and various theories have been proposed. It is therefore essential to directly verify or rule out these theories experimentally. However, despite substantial efforts in astrophysical observations and laboratory experiments, previous tests have not yet acquired enough accuracy to provide decisive conclusions as to the validity of these theories. Here, using a diamagnetically levitated force sensor, we carry out a test on one of the most compelling explanations for dark energy to date, namely the Chameleon theory, an ultra-light scalar field with screening mechanisms, which couples to normal-matter fields and leaves a detectable fifth force. Our results extend previous results by nearly two orders of magnitude to the entire physical plausible parameter space of cosmologically viable chameleon models. We find no evidence for such a fifth force. Our results decisively rule out the basic chameleon model as a candidate for dark energy. Our work, thus, demonstrates the robustness of laboratory experiments in unveiling the nature of dark energy in the future. The methodology developed here can be further applied to study a broad range of fundamental physics.
Submitted 15 May, 2024;
originally announced May 2024.
-
Large coordinate kernel attention network for lightweight image super-resolution
Authors:
Fangwei Hao,
Jiesheng Wu,
Haotian Lu,
Ji Du,
Jing Xu
Abstract:
The multi-scale receptive field and large kernel attention (LKA) module have been shown to significantly improve performance in the lightweight image super-resolution task. However, existing lightweight super-resolution (SR) methods seldom pay attention to designing efficient building blocks with multi-scale receptive fields for local modeling, and their LKA modules face a quadratic increase in computational and memory footprints as the convolutional kernel size increases. To address the first issue, we propose multi-scale blueprint separable convolutions (MBSConv) as a highly efficient building block with a multi-scale receptive field; it can focus on learning the multi-scale information that is a vital component of discriminative representation. As for the second issue, we revisit the key properties of LKA and find that the adjacent direct interaction of local information and long-distance dependencies is crucial to provide remarkable performance. Taking this into account, and in order to mitigate the complexity of LKA, we propose a large coordinate kernel attention (LCKA) module, which decomposes the 2D convolutional kernels of the depth-wise convolutional layers in LKA into horizontal and vertical 1D kernels. LCKA enables the adjacent direct interaction of local information and long-distance dependencies not only in the horizontal direction but also in the vertical. Besides, LCKA allows for the direct use of extremely large kernels in the depth-wise convolutional layers to capture more contextual information, which helps to significantly improve the reconstruction performance while incurring lower computational complexity and memory footprints. Integrating MBSConv and LCKA, we propose a large coordinate kernel attention network (LCAN).
Submitted 15 May, 2024;
originally announced May 2024.
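
The kernel decomposition described in the abstract above can be illustrated with a minimal NumPy sketch. This is not the authors' LCKA module: it covers a single channel and assumes the 2D kernel is exactly the outer product of a vertical and a horizontal 1D kernel. Under that assumption, a horizontal pass followed by a vertical pass reproduces the dense 2D result while storing 2k weights instead of k². The helper names `correlate2d_valid` and `separable_pass` are hypothetical.

```python
import numpy as np

def correlate2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Plain 'valid' 2D cross-correlation for a single channel."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def separable_pass(image, vertical, horizontal):
    """Apply a 1 x k horizontal kernel, then a k x 1 vertical kernel."""
    row_kernel = horizontal.reshape(1, -1)
    col_kernel = vertical.reshape(-1, 1)
    return correlate2d_valid(correlate2d_valid(image, row_kernel), col_kernel)

rng = np.random.default_rng(0)
k = 7
image = rng.normal(size=(16, 16))
vertical = rng.normal(size=k)
horizontal = rng.normal(size=k)

# Dense k x k kernel that the two 1D kernels factorize.
full_kernel = np.outer(vertical, horizontal)
dense = correlate2d_valid(image, full_kernel)
factored = separable_pass(image, vertical, horizontal)

max_abs_diff = float(np.max(np.abs(dense - factored)))
weights_2d = k * k   # weights in the dense kernel
weights_1d = 2 * k   # weights in the two 1D kernels
```

The weight count grows linearly rather than quadratically in k, which is the property that makes very large kernels affordable in this kind of design.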
-
Encoder-Decoder Framework for Interactive Free Verses with Generation with Controllable High-Quality Rhyming
Authors:
Tommaso Pasini,
Alejo López-Ávila,
Husam Quteineh,
Gerasimos Lampouras,
Jinhua Du,
Yubing Wang,
Ze Li,
Yusen Sun
Abstract:
Composing poetry or lyrics involves several creative factors, but a challenging aspect of generation is the adherence to a more or less strict metric and rhyming pattern. To address this challenge specifically, previous work on the task has mainly focused on reverse language modeling, which brings the critical selection of each rhyming word to the forefront of each verse. On the other hand, reversing the word order requires that models be trained from scratch with this task-specific goal and cannot take advantage of transfer learning from a Pretrained Language Model (PLM). We propose a novel fine-tuning approach that prepends the rhyming word at the start of each lyric, which allows the critical rhyming decision to be made before the model commits to the content of the lyric (as during reverse language modeling), but maintains compatibility with the word order of regular PLMs as the lyric itself is still generated in left-to-right order. We conducted extensive experiments to compare this fine-tuning against the current state-of-the-art strategies for rhyming, finding that our approach generates more readable text and has better rhyming capabilities. Furthermore, we furnish a high-quality dataset in English and 12 other languages, analyse the approach's feasibility in a multilingual context, provide extensive experimental results shedding light on good and bad practices for lyrics generation, and propose metrics to compare methods in the future.
Submitted 8 May, 2024;
originally announced May 2024.
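
The data format implied by the abstract above, in which the rhyming word is placed before a lyric that is still written left to right, can be sketched as a small preprocessing step. The abstract does not give the exact format, so the `<sep>` token, the choice of the last word as the rhyming word, and both function names are assumptions made for illustration.

```python
SEP = "<sep>"  # hypothetical separator token, not taken from the paper

def to_rhyme_first(lyric: str, sep: str = SEP) -> str:
    """Prepend the line's final word so it is decided before the content."""
    words = lyric.split()
    if not words:
        return lyric
    rhyme = words[-1].strip(".,;:!?").lower()
    return f"{rhyme} {sep} {lyric}"

def strip_rhyme_prefix(generated: str, sep: str = SEP) -> str:
    """Recover the plain lyric from a rhyme-first sequence."""
    _, found, rest = generated.partition(f" {sep} ")
    return rest if found else generated

line = "The stars are shining bright"
training_example = to_rhyme_first(line)
recovered = strip_rhyme_prefix(training_example)
```

Because the lyric after the separator keeps its natural word order, sequences in this shape stay compatible with a left-to-right pretrained model, which is the property the abstract emphasises over reverse language modeling.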
-
PLAYER*: Enhancing LLM-based Multi-Agent Communication and Interaction in Murder Mystery Games
Authors:
Qinglin Zhu,
Runcong Zhao,
Jinhua Du,
Lin Gui,
Yulan He
Abstract:
We propose PLAYER*, a novel framework that addresses the limitations of existing agent-based approaches built on Large Language Models (LLMs) in handling complex questions and understanding interpersonal relationships in dynamic environments. PLAYER* enhances path planning in Murder Mystery Games (MMGs) using an anytime sampling-based planner and a questioning-driven search framework. By equipping agents with a set of sensors, PLAYER* eliminates the need for pre-defined questions and enables agents to navigate complex social interactions. We additionally make a contribution by introducing a quantifiable evaluation method using multiple-choice questions and present WellPlay, a dataset containing 1,482 question-answer pairs. Experimental results demonstrate PLAYER*'s superiority over existing multi-agent methods, enhancing the generalisability and adaptability of agents in MMGs and paving the way for more effective multi-agent interactions.
Submitted 17 June, 2024; v1 submitted 26 April, 2024;
originally announced April 2024.
-
An energy efficient quantum-enhanced machine
Authors:
Waner Hou,
Xingyu Zhao,
Kamran Rehan,
Yi Li,
Yue Li,
Eric Lutz,
Yiheng Lin,
Jiangfeng Du
Abstract:
Quantum friction, a quantum analog of classical friction, reduces the performance of quantum machines, such as heat engines, and makes them less energy efficient. We here report the experimental realization of an energy efficient quantum engine coupled to a quantum battery that stores the produced work, using a single ion in a linear Paul trap. We first establish the quantum nature of the device by observing nonclassical work oscillations with the number of cycles as verified by energy measurements of the battery. We moreover successfully apply shortcut-to-adiabaticity techniques to suppress quantum friction and improve work production. While the average energy cost of the shortcut protocol is only about 3%, the work output is enhanced by up to approximately 33%, making the machine significantly more energy efficient. In addition, we show that the quantum engine consistently outperforms its classical counterpart in this regime. Our results pave the way for energy efficient machines with quantum-enhanced performance.
Submitted 23 April, 2024;
originally announced April 2024.
-
Near-Quantum-limited Haloscope Detection of Dark Photon Dark Matter Enhanced by a High-Q Superconducting Cavity
Authors:
Runqi Kang,
Man Jiao,
Yu Tong,
Yang Liu,
Youpeng Zhong,
Yi-Fu Cai,
Jingwei Zhou,
Xing Rong,
Jiangfeng Du
Abstract:
We report new experimental results on the search for dark photons based on a near-quantum-limited haloscope equipped with a superconducting cavity. The loaded quality factor of the superconducting cavity is $6\times10^{5}$, so that the expected signal from dark photon dark matter can be enhanced by more than one order of magnitude compared to a copper cavity. A Josephson parametric amplifier with a near-quantum-limited noise temperature has been utilized to minimize the noise during the search. Furthermore, a digital acquisition card based on field programmable gate arrays has been utilized to maximize data collection efficiency with a 100% duty cycle. This work has established the most stringent constraints on dark photons at around 26.965 μeV. In the future, our apparatus can be extended to search for other dark matter candidates, such as axions and axion-like particles, and scrutinize new physics beyond the Standard Model.
Submitted 19 April, 2024;
originally announced April 2024.
-
NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment: Methods and Results
Authors:
Xin Li,
Kun Yuan,
Yajing Pei,
Yiting Lu,
Ming Sun,
Chao Zhou,
Zhibo Chen,
Radu Timofte,
Wei Sun,
Haoning Wu,
Zicheng Zhang,
Jun Jia,
Zhichao Zhang,
Linhan Cao,
Qiubo Chen,
Xiongkuo Min,
Weisi Lin,
Guangtao Zhai,
Jianhui Sun,
Tianyi Wang,
Lei Li,
Han Kong,
Wenxuan Wang,
Bing Li,
Cheng Luo
, et al. (43 additional authors not shown)
Abstract:
This paper reviews the NTIRE 2024 Challenge on Short-form UGC Video Quality Assessment (S-UGC VQA), where various excellent solutions were submitted and evaluated on the collected dataset KVQ from a popular short-form video platform, i.e., the Kuaishou/Kwai Platform. The KVQ database is divided into three parts, including 2926 videos for training, 420 videos for validation, and 854 videos for testing. The purpose is to build new benchmarks and advance the development of S-UGC VQA. The competition had 200 participants, and 13 teams submitted valid solutions for the final testing phase. The proposed solutions achieved state-of-the-art performance for S-UGC VQA. The project can be found at https://github.com/lixinustc/KVQChallenge-CVPR-NTIRE2024.
Submitted 17 April, 2024;
originally announced April 2024.
-
Causal Inference for Genomic Data with Multiple Heterogeneous Outcomes
Authors:
Jin-Hong Du,
Zhenghao Zeng,
Edward H. Kennedy,
Larry Wasserman,
Kathryn Roeder
Abstract:
With the evolution of single-cell RNA sequencing techniques into a standard approach in genomics, it has become possible to conduct cohort-level causal inferences based on single-cell-level measurements. However, the individual gene expression levels of interest are not directly observable; instead, only repeated proxy measurements from each individual's cells are available, providing a derived outcome to estimate the underlying outcome for each of many genes. In this paper, we propose a generic semiparametric inference framework for doubly robust estimation with multiple derived outcomes, which also encompasses the usual setting of multiple outcomes when the response of each unit is available. To reliably quantify the causal effects of heterogeneous outcomes, we specialize the analysis to standardized average treatment effects and quantile treatment effects. Through this, we demonstrate the use of the semiparametric inferential results for doubly robust estimators derived from both Von Mises expansions and estimating equations. A multiple testing procedure based on Gaussian multiplier bootstrap is tailored for doubly robust estimators to control the false discovery exceedance rate. Applications in single-cell CRISPR perturbation analysis and individual-level differential expression analysis demonstrate the utility of the proposed methods and offer insights into the usage of different estimands for causal inference in genomics.
Submitted 16 April, 2024; v1 submitted 13 April, 2024;
originally announced April 2024.
-
3D-CSAD: Untrained 3D Anomaly Detection for Complex Manufacturing Surfaces
Authors:
Xuanming Cao,
Chengyu Tao,
Juan Du
Abstract:
The surface quality inspection of manufacturing parts based on 3D point cloud data has attracted increasing attention in recent years. The reason is that the 3D point cloud can capture the entire surface of manufacturing parts, unlike the previous practices that focus on some key product characteristics. However, achieving accurate 3D anomaly detection is challenging, due to the complex surfaces of manufacturing parts and the difficulty of collecting sufficient anomaly samples. To address these challenges, we propose a novel untrained anomaly detection method based on 3D point cloud data for complex manufacturing parts, which can achieve accurate anomaly detection in a single sample without training data. In the proposed framework, we transform an input sample into two sets of profiles along different directions. Based on one set of the profiles, a novel segmentation module is devised to segment the complex surface into multiple basic and simple components. In each component, another set of profiles, which have the nature of similar shapes, can be modeled as a low-rank matrix. Thus, accurate 3D anomaly detection can be achieved by using Robust Principal Component Analysis (RPCA) on these low-rank matrices. Extensive numerical experiments on different types of parts show that our method achieves promising results compared with the benchmark methods.
Submitted 11 April, 2024;
originally announced April 2024.
-
SoK: Gradient Leakage in Federated Learning
Authors:
Jiacheng Du,
Jiahui Hu,
Zhibo Wang,
Peng Sun,
Neil Zhenqiang Gong,
Kui Ren
Abstract:
Federated learning (FL) enables collaborative model training among multiple clients without raw data exposure. However, recent studies have shown that clients' private training data can be reconstructed from the gradients they share in FL, known as gradient inversion attacks (GIAs). While GIAs have demonstrated effectiveness under ideal settings and auxiliary assumptions, their actual efficacy against practical FL systems remains under-explored. To address this gap, we conduct a comprehensive study on GIAs in this work. We start with a survey of GIAs that establishes a milestone to trace their evolution and develops a systematization to uncover their inherent threats. Specifically, we categorize the auxiliary assumptions used by existing GIAs based on their practical accessibility to potential adversaries. To facilitate deeper analysis, we highlight the challenges that GIAs face in practical FL systems from three perspectives: local training, model, and post-processing. We then perform extensive theoretical and empirical evaluations of state-of-the-art GIAs across diverse settings, utilizing eight datasets and thirteen models. Our findings indicate that GIAs have inherent limitations when reconstructing data under practical local training settings. Furthermore, their efficacy is sensitive to the trained model, and even simple post-processing measures applied to gradients can be effective defenses. Overall, our work provides crucial insights into the limited effectiveness of GIAs in practical FL systems. By rectifying prior misconceptions, we hope to inspire more accurate and realistic investigations on this topic.
Submitted 8 April, 2024;
originally announced April 2024.
-
Optimizing Information Propagation for Blockchain-empowered Mobile AIGC: A Graph Attention Network Approach
Authors:
Jiana Liao,
Jinbo Wen,
Jiawen Kang,
Yang Zhang,
Jianbo Du,
Qihao Li,
Weiting Zhang,
Dong Yang
Abstract:
Artificial Intelligence-Generated Content (AIGC) is a rapidly evolving field that utilizes advanced AI algorithms to generate content. Through integration with mobile edge networks, mobile AIGC networks have gained significant attention, which can provide real-time customized and personalized AIGC services and products. Since blockchains can facilitate decentralized and transparent data management, AIGC products can be securely managed by blockchain to avoid tampering and plagiarization. However, the evolution of blockchain-empowered mobile AIGC is still in its nascent phase, grappling with challenges such as improving information propagation efficiency to enable blockchain-empowered mobile AIGC. In this paper, we design a Graph Attention Network (GAT)-based information propagation optimization framework for blockchain-empowered mobile AIGC. We first innovatively apply age of information as a data-freshness metric to measure information propagation efficiency in public blockchains. Considering that GATs possess the excellent ability to process graph-structured data, we utilize the GAT to obtain the optimal information propagation trajectory. Numerical results demonstrate that the proposed scheme exhibits the most outstanding information propagation efficiency compared with traditional routing mechanisms.
Submitted 7 April, 2024;
originally announced April 2024.
-
Joint Identifiability of Cross-Domain Recommendation via Hierarchical Subspace Disentanglement
Authors:
Jing Du,
Zesheng Ye,
Bin Guo,
Zhiwen Yu,
Lina Yao
Abstract:
Cross-Domain Recommendation (CDR) seeks to enable effective knowledge transfer across domains. Existing works rely on either representation alignment or transformation bridges, but they struggle to distinguish domain-shared from domain-specific latent factors. Specifically, while CDR describes user representations as a joint distribution over two domains, these methods fail to account for its joint identifiability because they primarily fixate on the marginal distribution within a particular domain. Such a failure may overlook the conditionality between two domains and how it contributes to latent factor disentanglement, leading to negative transfer when domains are weakly correlated. In this study, we explore what should and should not be transferred in cross-domain user representations from a causality perspective. We propose a Hierarchical subspace disentanglement approach to explore the Joint IDentifiability of cross-domain joint distribution, termed HJID, to preserve domain-specific behaviors from domain-shared factors. HJID organizes user representations into layers: generic shallow subspaces and domain-oriented deep subspaces. We first encode the generic pattern in the shallow subspace by minimizing the Maximum Mean Discrepancy of initial-layer activations. Then, to dissect how domain-oriented latent factors are encoded in deeper-layer activations, we construct a cross-domain causality-based data generation graph, which identifies cross-domain consistent and domain-specific components, adhering to the Minimal Change principle. This allows HJID to maintain stability while discovering unique factors for different domains, all within a generative framework of invertible transformations that guarantee joint identifiability. In experiments on real-world datasets, we show that HJID outperforms SOTA methods on a range of strongly and weakly correlated CDR tasks.
Submitted 5 April, 2024;
originally announced April 2024.