subscribe to arXiv mailings

One-Bit MIMO Detection: From Global Maximum-Likelihood Detector to Amplitude Retrieval Approach

Authors: Mingjie Shao, Wei-Kun Chen, Cheng-Yang Yu, Ya-Feng Liu, Wing-Kin Ma

Abstract: As communication systems advance towards the future 6G era, the incorporation of large-scale antenna arrays in base stations (BSs) presents challenges such as increased hardware costs and energy consumption. To address these issues, the use of one-bit analog-to-digital converters (ADCs)/digital-to-analog converters (DACs) has gained significant attentions. This paper focuses on one-bit multiple-in… ▽ More As communication systems advance towards the future 6G era, the incorporation of large-scale antenna arrays in base stations (BSs) presents challenges such as increased hardware costs and energy consumption. To address these issues, the use of one-bit analog-to-digital converters (ADCs)/digital-to-analog converters (DACs) has gained significant attentions. This paper focuses on one-bit multiple-input multiple-output (MIMO) detection in an uplink multiuser transmission scenario where the BS employs one-bit ADCs. One-bit quantization retains only the sign information and loses the amplitude information, which poses a unique challenge in the corresponding detection problem. The maximum-likelihood (ML) formulation of one-bit MIMO detection has a challenging likelihood function that hinders the application of many high-performance detectors developed for classic MIMO detection (under high-resolution ADCs). While many approximate methods for the ML detection problem have been studied, it lacks an efficient global algorithm. This paper fills this gap by proposing an efficient branch-and-bound algorithm, which is guaranteed to find the global solution of the one-bit ML MIMO detection problem. Additionally, a new amplitude retrieval (AR) detection approach is developed, incorporating explicit amplitude variables into the problem formulation. The AR approach yields simpler objective functions that enable the development of efficient algorithms offering both global and approximate solutions. The paper also contributes to the computational complexity analysis of both ML and AR detection problems. Extensive simulations are conducted to demonstrate the effectiveness and efficiency of the proposed formulations and algorithms. △ Less

Submitted 13 July, 2024; originally announced July 2024.

arXiv:2407.07487 [pdf, other]

Review-LLM: Harnessing Large Language Models for Personalized Review Generation

Authors: Qiyao Peng, Hongtao Liu, Hongyan Xu, Qing Yang, Minglai Shao, Wenjun Wang

Abstract: Product review generation is an important task in recommender systems, which could provide explanation and persuasiveness for the recommendation. Recently, Large Language Models (LLMs, e.g., ChatGPT) have shown superior text modeling and generating ability, which could be applied in review generation. However, directly applying the LLMs for generating reviews might be troubled by the ``polite'' ph… ▽ More Product review generation is an important task in recommender systems, which could provide explanation and persuasiveness for the recommendation. Recently, Large Language Models (LLMs, e.g., ChatGPT) have shown superior text modeling and generating ability, which could be applied in review generation. However, directly applying the LLMs for generating reviews might be troubled by the ``polite'' phenomenon of the LLMs and could not generate personalized reviews (e.g., negative reviews). In this paper, we propose Review-LLM that customizes LLMs for personalized review generation. Firstly, we construct the prompt input by aggregating user historical behaviors, which include corresponding item titles and reviews. This enables the LLMs to capture user interest features and review writing style. Secondly, we incorporate ratings as indicators of satisfaction into the prompt, which could further improve the model's understanding of user preferences and the sentiment tendency control of generated reviews. Finally, we feed the prompt text into LLMs, and use Supervised Fine-Tuning (SFT) to make the model generate personalized reviews for the given user and target item. Experimental results on the real-world dataset show that our fine-tuned model could achieve better review generation performance than existing close-source LLMs. △ Less

Submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.04877 [pdf]

Leveraging Data Mining, Active Learning, and Domain Adaptation in a Multi-Stage, Machine Learning-Driven Approach for the Efficient Discovery of Advanced Acidic Oxygen Evolution Electrocatalysts

Authors: Rui Ding, Jianguo Liu, Kang Hua, Xuebin Wang, Xiaoben Zhang, Minhua Shao, Yuxin Chen, Junhong Chen

Abstract: Developing advanced catalysts for acidic oxygen evolution reaction (OER) is crucial for sustainable hydrogen production. This study introduces a novel, multi-stage machine learning (ML) approach to streamline the discovery and optimization of complex multi-metallic catalysts. Our method integrates data mining, active learning, and domain adaptation throughout the materials discovery process. Unlik… ▽ More Developing advanced catalysts for acidic oxygen evolution reaction (OER) is crucial for sustainable hydrogen production. This study introduces a novel, multi-stage machine learning (ML) approach to streamline the discovery and optimization of complex multi-metallic catalysts. Our method integrates data mining, active learning, and domain adaptation throughout the materials discovery process. Unlike traditional trial-and-error methods, this approach systematically narrows the exploration space using domain knowledge with minimized reliance on subjective intuition. Then the active learning module efficiently refines element composition and synthesis conditions through iterative experimental feedback. The process culminated in the discovery of a promising Ru-Mn-Ca-Pr oxide catalyst. Our workflow also enhances theoretical simulations with domain adaptation strategy, providing deeper mechanistic insights aligned with experimental findings. By leveraging diverse data sources and multiple ML strategies, we establish an efficient pathway for electrocatalyst discovery and optimization. This comprehensive, data-driven approach represents a paradigm shift and potentially new benchmark in electrocatalysts research. △ Less

Submitted 5 July, 2024; originally announced July 2024.

Comments: 95 pages (main text 37 pages; supplementary materials 58 pages); 38 figures (main text 6 figures; supplementary materials 32 figures)

arXiv:2406.10958 [pdf, other]

City-LEO: Toward Transparent City Management Using LLM with End-to-End Optimization

Authors: Zihao Jiao, Mengyi Sha, Haoyu Zhang, Xinyu Jiang, Wei Qi

Abstract: Existing operations research (OR) models and tools play indispensable roles in smart-city operations, yet their practical implementation is limited by the complexity of modeling and deficiencies in optimization proficiency. To generate more relevant and accurate solutions to users' requirements, we propose a large language model (LLM)-based agent ("City-LEO") that enhances the efficiency and trans… ▽ More Existing operations research (OR) models and tools play indispensable roles in smart-city operations, yet their practical implementation is limited by the complexity of modeling and deficiencies in optimization proficiency. To generate more relevant and accurate solutions to users' requirements, we propose a large language model (LLM)-based agent ("City-LEO") that enhances the efficiency and transparency of city management through conversational interactions. Specifically, to accommodate diverse users' requirements and enhance computational tractability, City-LEO leverages LLM's logical reasoning capabilities on prior knowledge to scope down large-scale optimization problems efficiently. In the human-like decision process, City-LEO also incorporates End-to-end (E2E) model to synergize the prediction and optimization. The E2E framework be conducive to coping with environmental uncertainties and involving more query-relevant features, and then facilitates transparent and interpretable decision-making process. In case study, we employ City-LEO in the operations management of e-bike sharing (EBS) system. The numerical results demonstrate that City-LEO has superior performance when benchmarks against the full-scale optimization problem. With less computational time, City-LEO generates more satisfactory and relevant solutions to the users' requirements, and achieves lower global suboptimality without significantly compromising accuracy. In a broader sense, our proposed agent offers promise to develop LLM-embedded OR tools for smart-city operations management. △ Less

Submitted 17 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

Comments: 26 pages, 8 figures, 5 tables

arXiv:2406.09495 [pdf, other]

Fair Data Generation via Score-based Diffusion Model

Authors: Yujie Lin, Dong Li, Chen Zhao, Minglai Shao

Abstract: The fairness of AI decision-making has garnered increasing attention, leading to the proposal of numerous fairness algorithms. In this paper, we aim not to address this issue by directly introducing fair learning algorithms, but rather by generating entirely new, fair synthetic data from biased datasets for use in any downstream tasks. Additionally, the distribution of test data may differ from th… ▽ More The fairness of AI decision-making has garnered increasing attention, leading to the proposal of numerous fairness algorithms. In this paper, we aim not to address this issue by directly introducing fair learning algorithms, but rather by generating entirely new, fair synthetic data from biased datasets for use in any downstream tasks. Additionally, the distribution of test data may differ from that of the training set, potentially impacting the performance of the generated synthetic data in downstream tasks. To address these two challenges, we propose a diffusion model-based framework, FADM: Fairness-Aware Diffusion with Meta-training. FADM introduces two types of gradient induction during the sampling phase of the diffusion model: one to ensure that the generated samples belong to the desired target categories, and another to make the sensitive attributes of the generated samples difficult to classify into any specific sensitive attribute category. To overcome data distribution shifts in the test environment, we train the diffusion model and the two classifiers used for induction within a meta-learning framework. Compared to other baselines, FADM allows for flexible control over the categories of the generated samples and exhibits superior generalization capability. Experiments on real datasets demonstrate that FADM achieves better accuracy and optimal fairness in downstream tasks. △ Less

Submitted 13 June, 2024; originally announced June 2024.

arXiv:2406.07693 [pdf]

A Labelled Dataset for Sentiment Analysis of Videos on YouTube, TikTok, and Other Sources about the 2024 Outbreak of Measles

Authors: Nirmalya Thakur, Vanessa Su, Mingchen Shao, Kesha A. Patel, Hongseok Jeong, Victoria Knieling, Andrew Bian

Abstract: The work of this paper presents a dataset that contains the data of 4011 videos about the ongoing outbreak of measles published on 264 websites on the internet between January 1, 2024, and May 31, 2024. The dataset is available at https://dx.doi.org/10.21227/40s8-xf63. These websites primarily include YouTube and TikTok, which account for 48.6% and 15.2% of the videos, respectively. The remainder… ▽ More The work of this paper presents a dataset that contains the data of 4011 videos about the ongoing outbreak of measles published on 264 websites on the internet between January 1, 2024, and May 31, 2024. The dataset is available at https://dx.doi.org/10.21227/40s8-xf63. These websites primarily include YouTube and TikTok, which account for 48.6% and 15.2% of the videos, respectively. The remainder of the websites include Instagram and Facebook as well as the websites of various global and local news organizations. For each of these videos, the URL of the video, title of the post, description of the post, and the date of publication of the video are presented as separate attributes in the dataset. After developing this dataset, sentiment analysis (using VADER), subjectivity analysis (using TextBlob), and fine-grain sentiment analysis (using DistilRoBERTa-base) of the video titles and video descriptions were performed. This included classifying each video title and video description into (i) one of the sentiment classes i.e. positive, negative, or neutral, (ii) one of the subjectivity classes i.e. highly opinionated, neutral opinionated, or least opinionated, and (iii) one of the fine-grain sentiment classes i.e. fear, surprise, joy, sadness, anger, disgust, or neutral. These results are presented as separate attributes in the dataset for the training and testing of machine learning algorithms for performing sentiment analysis or subjectivity analysis in this field as well as for other applications. Finally, this paper also presents a list of open research questions that may be investigated using this dataset. △ Less

Submitted 16 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

Comments: 19 pages

ACM Class: I.2.7; I.2.8; I.5.4; K.4.2; H.2.8; I.2.6

arXiv:2406.05590 [pdf, other]

NYU CTF Dataset: A Scalable Open-Source Benchmark Dataset for Evaluating LLMs in Offensive Security

Authors: Minghao Shao, Sofija Jancheska, Meet Udeshi, Brendan Dolan-Gavitt, Haoran Xi, Kimberly Milner, Boyuan Chen, Max Yin, Siddharth Garg, Prashanth Krishnamurthy, Farshad Khorrami, Ramesh Karri, Muhammad Shafique

Abstract: Large Language Models (LLMs) are being deployed across various domains today. However, their capacity to solve Capture the Flag (CTF) challenges in cybersecurity has not been thoroughly evaluated. To address this, we develop a novel method to assess LLMs in solving CTF challenges by creating a scalable, open-source benchmark database specifically designed for these applications. This database incl… ▽ More Large Language Models (LLMs) are being deployed across various domains today. However, their capacity to solve Capture the Flag (CTF) challenges in cybersecurity has not been thoroughly evaluated. To address this, we develop a novel method to assess LLMs in solving CTF challenges by creating a scalable, open-source benchmark database specifically designed for these applications. This database includes metadata for LLM testing and adaptive learning, compiling a diverse range of CTF challenges from popular competitions. Utilizing the advanced function calling capabilities of LLMs, we build a fully automated system with an enhanced workflow and support for external tool calls. Our benchmark dataset and automated framework allow us to evaluate the performance of five LLMs, encompassing both black-box and open-source models. This work lays the foundation for future research into improving the efficiency of LLMs in interactive cybersecurity tasks and automated task planning. By providing a specialized dataset, our project offers an ideal platform for developing, testing, and refining LLM-based approaches to vulnerability detection and resolution. Evaluating LLMs on these challenges and comparing with human performance yields insights into their potential for AI-driven cybersecurity solutions to perform real-world threat management. We make our dataset open source to public https://github.com/NYU-LLM-CTF/LLM_CTF_Database along with our playground automated framework https://github.com/NYU-LLM-CTF/llm_ctf_automation. △ Less

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.01284 [pdf]

Extraction of Weak Surface Diaphragmatic Electromyogram Using Modified Progressive FastICA Peel-Off

Authors: Yao Li, Dongsheng Zhao, Haowen Zhao, Xu Zhang, Min Shao

Abstract: Diaphragmatic electromyogram (EMGdi) contains crucial information about human respiration therefore can be used to monitor respiratory condition. Although it is practical to record EMGdi noninvasively and conveniently by placing surface electrodes over chest skin, extraction of such weak surface EMGdi (sEMGdi) from great noisy environment is a challenging task, limiting its clinical use compared w… ▽ More Diaphragmatic electromyogram (EMGdi) contains crucial information about human respiration therefore can be used to monitor respiratory condition. Although it is practical to record EMGdi noninvasively and conveniently by placing surface electrodes over chest skin, extraction of such weak surface EMGdi (sEMGdi) from great noisy environment is a challenging task, limiting its clinical use compared with esophageal EMGdi. In this paper, a novel method is presented for extracting weak sEMGdi signal from high-noise environment based on fast independent component analysis (FastICA), constrained FastICA and a peel-off strategy. It is truly a modified version of of progressive FastICA peel-off (PFP) framework, where the constrained FastICA helps to extract and refine respiration-related sEMGdi signals, while the peel-off strategy ensures the complete extraction of weaker sEMGdi components. The method was validated using both synthetic and clinical signals. It was demonstrated that our method was able to extract clean sEMGdi signals efficiently with little distortion. It outperformed state-of-the-art comparison methods in terms of sufficiently high SIR and CORR at all noise levels when tested on synthetic data, while also achieved an accuracy of 95.06% and a F2-score of 96.73% for breath identification on clinical data. The study presents a valuable solution for noninvasive extraction of sEMGdi signals, providing a convenient and valuable way of ventilator synchrony with a significant potential in aiding respiratory rehabilitation and health. △ Less

Submitted 28 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

arXiv:2405.06516 [pdf, ps, other]

An Efficient Algorithm for Sum-Rate Maximization in Fluid Antenna-Assisted ISAC System

Authors: Qian Zhang, Mingjie Shao, Tong Zhang, Gaojie Chen, Ju Liu

Abstract: In this letter, we investigate the fluid antenna (FA)-assisted integrated sensing and communication (ISAC) system, where communication and radar sensing employ the co-waveform design. Specifically, we focus on the beamformer design and antenna position configuration to realize a higher communication rate while guaranteeing the minimum radar probing power. Different from existing beamformer algorit… ▽ More In this letter, we investigate the fluid antenna (FA)-assisted integrated sensing and communication (ISAC) system, where communication and radar sensing employ the co-waveform design. Specifically, we focus on the beamformer design and antenna position configuration to realize a higher communication rate while guaranteeing the minimum radar probing power. Different from existing beamformer algorithms, we propose an efficient proximal distance algorithm (PDA) to solve the multiuser sum-rate maximization problem with radar sensing constraint to obtain the closed-form beamforming vector. In addition, we develop an extrapolated projected gradient (EPG) algorithm to obtain a better antenna location configuration for FA to enhance the ISAC performance. Numerical results show that the considered FA-assisted ISAC system enjoys a higher sum-rate by the proposed algorithm, compared with that in existing non-FA ISAC systems. △ Less

Submitted 10 May, 2024; originally announced May 2024.

arXiv:2405.02132 [pdf, other]

Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets

Authors: Xuelong Geng, Tianyi Xu, Kun Wei, Bingshen Mu, Hongfei Xue, He Wang, Yangze Li, Pengcheng Guo, Yuhang Dai, Longhao Li, Mingchen Shao, Lei Xie

Abstract: Large Language Models (LLMs) have demonstrated unparalleled effectiveness in various NLP tasks, and integrating LLMs with automatic speech recognition (ASR) is becoming a mainstream paradigm. Building upon this momentum, our research delves into an in-depth examination of this paradigm on a large open-source Chinese dataset. Specifically, our research aims to evaluate the impact of various configu… ▽ More Large Language Models (LLMs) have demonstrated unparalleled effectiveness in various NLP tasks, and integrating LLMs with automatic speech recognition (ASR) is becoming a mainstream paradigm. Building upon this momentum, our research delves into an in-depth examination of this paradigm on a large open-source Chinese dataset. Specifically, our research aims to evaluate the impact of various configurations of speech encoders, LLMs, and projector modules in the context of the speech foundation encoder-LLM ASR paradigm. Furthermore, we introduce a three-stage training approach, expressly developed to enhance the model's ability to align auditory and textual information. The implementation of this approach, alongside the strategic integration of ASR components, enabled us to achieve the SOTA performance on the AISHELL-1, Test_Net, and Test_Meeting test sets. Our analysis presents an empirical foundation for future research in LLM-based ASR systems and offers insights into optimizing performance using Chinese datasets. We will publicly release all scripts used for data preparation, training, inference, and scoring, as well as pre-trained models and training logs to promote reproducible research. △ Less

Submitted 6 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

arXiv:2404.19383 [pdf, other]

Cross-Block Fine-Grained Semantic Cascade for Skeleton-Based Sports Action Recognition

Authors: Zhendong Liu, Haifeng Xia, Tong Guo, Libo Sun, Ming Shao, Siyu Xia

Abstract: Human action video recognition has recently attracted more attention in applications such as video security and sports posture correction. Popular solutions, including graph convolutional networks (GCNs) that model the human skeleton as a spatiotemporal graph, have proven very effective. GCNs-based methods with stacked blocks usually utilize top-layer semantics for classification/annotation purpos… ▽ More Human action video recognition has recently attracted more attention in applications such as video security and sports posture correction. Popular solutions, including graph convolutional networks (GCNs) that model the human skeleton as a spatiotemporal graph, have proven very effective. GCNs-based methods with stacked blocks usually utilize top-layer semantics for classification/annotation purposes. Although the global features learned through the procedure are suitable for the general classification, they have difficulty capturing fine-grained action change across adjacent frames -- decisive factors in sports actions. In this paper, we propose a novel ``Cross-block Fine-grained Semantic Cascade (CFSC)'' module to overcome this challenge. In summary, the proposed CFSC progressively integrates shallow visual knowledge into high-level blocks to allow networks to focus on action details. In particular, the CFSC module utilizes the GCN feature maps produced at different levels, as well as aggregated features from proceeding levels to consolidate fine-grained features. In addition, a dedicated temporal convolution is applied at each level to learn short-term temporal features, which will be carried over from shallow to deep layers to maximize the leverage of low-level details. This cross-block feature aggregation methodology, capable of mitigating the loss of fine-grained information, has resulted in improved performance. Last, FD-7, a new action recognition dataset for fencing sports, was collected and will be made publicly available. Experimental results and empirical analysis on public benchmarks (FSD-10) and self-collected (FD-7) demonstrate the advantage of our CFSC module on learning discriminative patterns for action classification over others. △ Less

Submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.03386 [pdf, other]

SENSOR: Imitate Third-Person Expert's Behaviors via Active Sensoring

Authors: Kaichen Huang, Minghao Shao, Shenghua Wan, Hai-Hang Sun, Shuai Feng, Le Gan, De-Chuan Zhan

Abstract: In many real-world visual Imitation Learning (IL) scenarios, there is a misalignment between the agent's and the expert's perspectives, which might lead to the failure of imitation. Previous methods have generally solved this problem by domain alignment, which incurs extra computation and storage costs, and these methods fail to handle the \textit{hard cases} where the viewpoint gap is too large.… ▽ More In many real-world visual Imitation Learning (IL) scenarios, there is a misalignment between the agent's and the expert's perspectives, which might lead to the failure of imitation. Previous methods have generally solved this problem by domain alignment, which incurs extra computation and storage costs, and these methods fail to handle the \textit{hard cases} where the viewpoint gap is too large. To alleviate the above problems, we introduce active sensoring in the visual IL setting and propose a model-based SENSory imitatOR (SENSOR) to automatically change the agent's perspective to match the expert's. SENSOR jointly learns a world model to capture the dynamics of latent states, a sensor policy to control the camera, and a motor policy to control the agent. Experiments on visual locomotion tasks show that SENSOR can efficiently simulate the expert's perspective and strategy, and outperforms most baseline methods. △ Less

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2404.03382 [pdf, other]

DIDA: Denoised Imitation Learning based on Domain Adaptation

Authors: Kaichen Huang, Hai-Hang Sun, Shenghua Wan, Minghao Shao, Shuai Feng, Le Gan, De-Chuan Zhan

Abstract: Imitating skills from low-quality datasets, such as sub-optimal demonstrations and observations with distractors, is common in real-world applications. In this work, we focus on the problem of Learning from Noisy Demonstrations (LND), where the imitator is required to learn from data with noise that often occurs during the processes of data collection or transmission. Previous IL methods improve t… ▽ More Imitating skills from low-quality datasets, such as sub-optimal demonstrations and observations with distractors, is common in real-world applications. In this work, we focus on the problem of Learning from Noisy Demonstrations (LND), where the imitator is required to learn from data with noise that often occurs during the processes of data collection or transmission. Previous IL methods improve the robustness of learned policies by injecting an adversarially learned Gaussian noise into pure expert data or utilizing additional ranking information, but they may fail in the LND setting. To alleviate the above problems, we propose Denoised Imitation learning based on Domain Adaptation (DIDA), which designs two discriminators to distinguish the noise level and expertise level of data, facilitating a feature encoder to learn task-related but domain-agnostic representations. Experiment results on MuJoCo demonstrate that DIDA can successfully handle challenging imitation tasks from demonstrations with various types of noise, outperforming most baseline methods. △ Less

Submitted 4 April, 2024; originally announced April 2024.

arXiv:2404.01843 [pdf, other]

Sketch3D: Style-Consistent Guidance for Sketch-to-3D Generation

Authors: Wangguandong Zheng, Haifeng Xia, Rui Chen, Ming Shao, Siyu Xia, Zhengming Ding

Abstract: Recently, image-to-3D approaches have achieved significant results with a natural image as input. However, it is not always possible to access these enriched color input samples in practical applications, where only sketches are available. Existing sketch-to-3D researches suffer from limitations in broad applications due to the challenges of lacking color information and multi-view content. To ove… ▽ More Recently, image-to-3D approaches have achieved significant results with a natural image as input. However, it is not always possible to access these enriched color input samples in practical applications, where only sketches are available. Existing sketch-to-3D researches suffer from limitations in broad applications due to the challenges of lacking color information and multi-view content. To overcome them, this paper proposes a novel generation paradigm Sketch3D to generate realistic 3D assets with shape aligned with the input sketch and color matching the textual description. Concretely, Sketch3D first instantiates the given sketch in the reference image through the shape-preserving generation process. Second, the reference image is leveraged to deduce a coarse 3D Gaussian prior, and multi-view style-consistent guidance images are generated based on the renderings of the 3D Gaussians. Finally, three strategies are designed to optimize 3D Gaussians, i.e., structural optimization via a distribution transfer mechanism, color optimization with a straightforward MSE loss and sketch similarity optimization with a CLIP-based geometric similarity loss. Extensive visual comparisons and quantitative analysis illustrate the advantage of our Sketch3D in generating realistic 3D assets while preserving consistency with the input. △ Less

Submitted 7 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

arXiv:2403.18866 [pdf, other]

Graph Bayesian Optimization for Multiplex Influence Maximization

Authors: Zirui Yuan, Minglai Shao, Zhiqian Chen

Abstract: Influence maximization (IM) is the problem of identifying a limited number of initial influential users within a social network to maximize the number of influenced users. However, previous research has mostly focused on individual information propagation, neglecting the simultaneous and interactive dissemination of multiple information items. In reality, when users encounter a piece of informatio… ▽ More Influence maximization (IM) is the problem of identifying a limited number of initial influential users within a social network to maximize the number of influenced users. However, previous research has mostly focused on individual information propagation, neglecting the simultaneous and interactive dissemination of multiple information items. In reality, when users encounter a piece of information, such as a smartphone product, they often associate it with related products in their minds, such as earphones or computers from the same brand. Additionally, information platforms frequently recommend related content to users, amplifying this cascading effect and leading to multiplex influence diffusion. This paper first formulates the Multiplex Influence Maximization (Multi-IM) problem using multiplex diffusion models with an information association mechanism. In this problem, the seed set is a combination of influential users and information. To effectively manage the combinatorial complexity, we propose Graph Bayesian Optimization for Multi-IM (GBIM). The multiplex diffusion process is thoroughly investigated using a highly effective global kernelized attention message-passing module. This module, in conjunction with Bayesian linear regression (BLR), produces a scalable surrogate model. A data acquisition module incorporating the exploration-exploitation trade-off is developed to optimize the seed set further. Extensive experiments on synthetic and real-world datasets have proven our proposed framework effective. The code is available at https://github.com/zirui-yuan/GBIM. △ Less

Submitted 25 March, 2024; originally announced March 2024.

Comments: Proceedings of the AAAI Conference on Artificial Intelligence, 2024

arXiv:2403.16334 [pdf, other]

Graphs Generalization under Distribution Shifts

Authors: Qin Tian, Wenjun Wang, Chen Zhao, Minglai Shao, Wang Zhang, Dong Li

Abstract: Traditional machine learning methods heavily rely on the independent and identically distribution assumption, which imposes limitations when the test distribution deviates from the training distribution. To address this crucial issue, out-of-distribution (OOD) generalization, which aims to achieve satisfactory generalization performance when faced with unknown distribution shifts, has made a signi… ▽ More Traditional machine learning methods heavily rely on the independent and identically distribution assumption, which imposes limitations when the test distribution deviates from the training distribution. To address this crucial issue, out-of-distribution (OOD) generalization, which aims to achieve satisfactory generalization performance when faced with unknown distribution shifts, has made a significant process. However, the OOD method for graph-structured data currently lacks clarity and remains relatively unexplored due to two primary challenges. Firstly, distribution shifts on graphs often occur simultaneously on node attributes and graph topology. Secondly, capturing invariant information amidst diverse distribution shifts proves to be a formidable challenge. To overcome these obstacles, in this paper, we introduce a novel framework, namely Graph Learning Invariant Domain genERation (GLIDER). The goal is to (1) diversify variations across domains by modeling the potential seen or unseen variations of attribute distribution and topological structure and (2) minimize the discrepancy of the variation in a representation space where the target is to predict semantic labels. Extensive experiment results indicate that our model outperforms baseline methods on node-level OOD generalization across domains in distribution shift on node features and topological structures simultaneously. △ Less

Submitted 24 March, 2024; originally announced March 2024.

arXiv:2403.12839 [pdf, other]

Global-guided Focal Neural Radiance Field for Large-scale Scene Rendering

Authors: Mingqi Shao, Feng Xiong, Hang Zhang, Shuang Yang, Mu Xu, Wei Bian, Xueqian Wang

Abstract: Neural radiance fields~(NeRF) have recently been applied to render large-scale scenes. However, their limited model capacity typically results in blurred rendering results. Existing large-scale NeRFs primarily address this limitation by partitioning the scene into blocks, which are subsequently handled by separate sub-NeRFs. These sub-NeRFs, trained from scratch and processed independently, lead t… ▽ More Neural radiance fields~(NeRF) have recently been applied to render large-scale scenes. However, their limited model capacity typically results in blurred rendering results. Existing large-scale NeRFs primarily address this limitation by partitioning the scene into blocks, which are subsequently handled by separate sub-NeRFs. These sub-NeRFs, trained from scratch and processed independently, lead to inconsistencies in geometry and appearance across the scene. Consequently, the rendering quality fails to exhibit significant improvement despite the expansion of model capacity. In this work, we present global-guided focal neural radiance field (GF-NeRF) that achieves high-fidelity rendering of large-scale scenes. Our proposed GF-NeRF utilizes a two-stage (Global and Focal) architecture and a global-guided training strategy. The global stage obtains a continuous representation of the entire scene while the focal stage decomposes the scene into multiple blocks and further processes them with distinct sub-encoders. Leveraging this two-stage architecture, sub-encoders only need fine-tuning based on the global encoder, thus reducing training complexity in the focal stage while maintaining scene-wide consistency. Spatial information and error information from the global stage also benefit the sub-encoders to focus on crucial areas and effectively capture more details of large-scale scenes. Notably, our approach does not rely on any prior knowledge about the target scene, attributing GF-NeRF adaptable to various large-scale scene types, including street-view and aerial-view scenes. We demonstrate that our method achieves high-fidelity, natural rendering results on various types of large-scale datasets. Our project page: https://shaomq2187.github.io/GF-NeRF/ △ Less

Submitted 19 March, 2024; originally announced March 2024.

arXiv:2402.11814 [pdf, other]

An Empirical Evaluation of LLMs for Solving Offensive Security Challenges

Authors: Minghao Shao, Boyuan Chen, Sofija Jancheska, Brendan Dolan-Gavitt, Siddharth Garg, Ramesh Karri, Muhammad Shafique

Abstract: Capture The Flag (CTF) challenges are puzzles related to computer security scenarios. With the advent of large language models (LLMs), more and more CTF participants are using LLMs to understand and solve the challenges. However, so far no work has evaluated the effectiveness of LLMs in solving CTF challenges with a fully automated workflow. We develop two CTF-solving workflows, human-in-the-loop… ▽ More Capture The Flag (CTF) challenges are puzzles related to computer security scenarios. With the advent of large language models (LLMs), more and more CTF participants are using LLMs to understand and solve the challenges. However, so far no work has evaluated the effectiveness of LLMs in solving CTF challenges with a fully automated workflow. We develop two CTF-solving workflows, human-in-the-loop (HITL) and fully-automated, to examine the LLMs' ability to solve a selected set of CTF challenges, prompted with information about the question. We collect human contestants' results on the same set of questions, and find that LLMs achieve higher success rate than an average human participant. This work provides a comprehensive evaluation of the capability of LLMs in solving real world CTF challenges, from real competition to fully automated workflow. Our results provide references for applying LLMs in cybersecurity education and pave the way for systematic evaluation of offensive cybersecurity capabilities in LLMs. △ Less

Submitted 18 February, 2024; originally announced February 2024.

arXiv:2402.10434 [pdf, other]

Parametric Augmentation for Time Series Contrastive Learning

Authors: Xu Zheng, Tianchun Wang, Wei Cheng, Aitian Ma, Haifeng Chen, Mo Sha, Dongsheng Luo

Abstract: Modern techniques like contrastive learning have been effectively used in many areas, including computer vision, natural language processing, and graph-structured data. Creating positive examples that assist the model in learning robust and discriminative representations is a crucial stage in contrastive learning approaches. Usually, preset human intuition directs the selection of relevant data au… ▽ More Modern techniques like contrastive learning have been effectively used in many areas, including computer vision, natural language processing, and graph-structured data. Creating positive examples that assist the model in learning robust and discriminative representations is a crucial stage in contrastive learning approaches. Usually, preset human intuition directs the selection of relevant data augmentations. Due to patterns that are easily recognized by humans, this rule of thumb works well in the vision and language domains. However, it is impractical to visually inspect the temporal structures in time series. The diversity of time series augmentations at both the dataset and instance levels makes it difficult to choose meaningful augmentations on the fly. In this study, we address this gap by analyzing time series data augmentation using information theory and summarizing the most commonly adopted augmentations in a unified format. We then propose a contrastive learning framework with parametric augmentation, AutoTCL, which can be adaptively employed to support time series representation learning. The proposed approach is encoder-agnostic, allowing it to be seamlessly integrated with different backbone encoders. Experiments on univariate forecasting tasks demonstrate the highly competitive results of our method, with an average 6.5\% reduction in MSE and 4.7\% in MAE over the leading baselines. In classification tasks, AutoTCL achieves a $1.2\%$ increase in average accuracy. △ Less

Submitted 15 February, 2024; originally announced February 2024.

Comments: Accepted by International Conference on Learning Representations (ICLR 2024)

arXiv:2402.01327 [pdf, other]

Supervised Algorithmic Fairness in Distribution Shifts: A Survey

Authors: Minglai Shao, Dong Li, Chen Zhao, Xintao Wu, Yujie Lin, Qin Tian

Abstract: Supervised fairness-aware machine learning under distribution shifts is an emerging field that addresses the challenge of maintaining equitable and unbiased predictions when faced with changes in data distributions from source to target domains. In real-world applications, machine learning models are often trained on a specific dataset but deployed in environments where the data distribution may s… ▽ More Supervised fairness-aware machine learning under distribution shifts is an emerging field that addresses the challenge of maintaining equitable and unbiased predictions when faced with changes in data distributions from source to target domains. In real-world applications, machine learning models are often trained on a specific dataset but deployed in environments where the data distribution may shift over time due to various factors. This shift can lead to unfair predictions, disproportionately affecting certain groups characterized by sensitive attributes, such as race and gender. In this survey, we provide a summary of various types of distribution shifts and comprehensively investigate existing methods based on these shifts, highlighting six commonly used approaches in the literature. Additionally, this survey lists publicly available datasets and evaluation metrics for empirical studies. We further explore the interconnection with related research fields, discuss the significant challenges, and identify potential directions for future studies. △ Less

Submitted 4 May, 2024; v1 submitted 2 February, 2024; originally announced February 2024.

Comments: IJCAI 2024

arXiv:2312.08079 [pdf, other]

Extending Whisper with prompt tuning to target-speaker ASR

Authors: Hao Ma, Zhiyuan Peng, Mingjie Shao, Jing Li, Ju Liu

Abstract: Target-speaker automatic speech recognition (ASR) aims to transcribe the desired speech of a target speaker from multi-talker overlapped utterances. Most of the existing target-speaker ASR (TS-ASR) methods involve either training from scratch or fully fine-tuning a pre-trained model, leading to significant training costs and becoming inapplicable to large foundation models. This work leverages pro… ▽ More Target-speaker automatic speech recognition (ASR) aims to transcribe the desired speech of a target speaker from multi-talker overlapped utterances. Most of the existing target-speaker ASR (TS-ASR) methods involve either training from scratch or fully fine-tuning a pre-trained model, leading to significant training costs and becoming inapplicable to large foundation models. This work leverages prompt tuning, a parameter-efficient fine-tuning approach, to extend Whisper, a large-scale single-talker ASR model, to TS-ASR. Variants of prompt tuning approaches along with their configurations are explored and optimized for TS-ASR.Experimental results show that prompt tuning can achieve performance comparable to state-of-the-art full training approaches while only requiring about 1\% of task-specific model parameters. Notably, the original Whisper's features, such as inverse text normalization and timestamp tagging, are retained in target-speaker ASR, keeping the generated transcriptions natural and informative. △ Less

Submitted 11 January, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

Comments: ICASSP 2024

arXiv:2312.07941 [pdf, ps, other]

An efficient algorithm for multiuser sum-rate maximization of large-scale active RIS-aided MIMO system

Authors: Qian Zhang, Mingjie Shao, Qiang Li, Ju Liu

Abstract: Active reconfigurable intelligent surface (RIS) is a new RIS architecture that can reflect and amplify communication signals. It can provide enhanced performance gain compared to the conventional passive RIS systems that can only reflect the signals. On the other hand, the design problem of active RIS-aided systems is more challenging than the passive RIS-aided systems and its efficient algorithms… ▽ More Active reconfigurable intelligent surface (RIS) is a new RIS architecture that can reflect and amplify communication signals. It can provide enhanced performance gain compared to the conventional passive RIS systems that can only reflect the signals. On the other hand, the design problem of active RIS-aided systems is more challenging than the passive RIS-aided systems and its efficient algorithms are less studied. In this paper, we consider the sum rate maximization problem in the multiuser massive multiple-input single-output (MISO) downlink with the aid of a large-scale active RIS. Existing approaches for handling this problem usually resort to general optimization solvers and can be computationally prohibitive. We propose an efficient block successive upper bound minimization (BSUM) method, of which each step has a (semi) closed-form update. Thus, the proposed algorithm has an attractive low per-iteration complexity. By simulation, our proposed algorithm consumes much less computation than the existing approaches. In particular, when the MIMO and/or RIS sizes are large, our proposed algorithm can be orders-of-magnitude faster than existing approaches. △ Less

Submitted 11 January, 2024; v1 submitted 13 December, 2023; originally announced December 2023.

Comments: ICASSP 2024

arXiv:2312.01315 [pdf, other]

Few-shot Shape Recognition by Learning Deep Shape-aware Features

Authors: Wenlong Shi, Changsheng Lu, Ming Shao, Yinjie Zhang, Siyu Xia, Piotr Koniusz

Abstract: Traditional shape descriptors have been gradually replaced by convolutional neural networks due to their superior performance in feature extraction and classification. The state-of-the-art methods recognize object shapes via image reconstruction or pixel classification. However , these methods are biased toward texture information and overlook the essential shape descriptions, thus, they fail to g… ▽ More Traditional shape descriptors have been gradually replaced by convolutional neural networks due to their superior performance in feature extraction and classification. The state-of-the-art methods recognize object shapes via image reconstruction or pixel classification. However , these methods are biased toward texture information and overlook the essential shape descriptions, thus, they fail to generalize to unseen shapes. We are the first to propose a fewshot shape descriptor (FSSD) to recognize object shapes given only one or a few samples. We employ an embedding module for FSSD to extract transformation-invariant shape features. Secondly, we develop a dual attention mechanism to decompose and reconstruct the shape features via learnable shape primitives. In this way, any shape can be formed through a finite set basis, and the learned representation model is highly interpretable and extendable to unseen shapes. Thirdly, we propose a decoding module to include the supervision of shape masks and edges and align the original and reconstructed shape features, enforcing the learned features to be more shape-aware. Lastly, all the proposed modules are assembled into a few-shot shape recognition scheme. Experiments on five datasets show that our FSSD significantly improves the shape classification compared to the state-of-the-art under the few-shot setting. △ Less

Submitted 3 December, 2023; originally announced December 2023.

Comments: Accepted by WACV 2024; 8 pages for main paper

arXiv:2310.15294 [pdf, other]

Adaptive End-to-End Metric Learning for Zero-Shot Cross-Domain Slot Filling

Authors: Yuanjun Shi, Linzhi Wu, Minglai Shao

Abstract: Recently slot filling has witnessed great development thanks to deep learning and the availability of large-scale annotated data. However, it poses a critical challenge to handle a novel domain whose samples are never seen during training. The recognition performance might be greatly degraded due to severe domain shifts. Most prior works deal with this problem in a two-pass pipeline manner based o… ▽ More Recently slot filling has witnessed great development thanks to deep learning and the availability of large-scale annotated data. However, it poses a critical challenge to handle a novel domain whose samples are never seen during training. The recognition performance might be greatly degraded due to severe domain shifts. Most prior works deal with this problem in a two-pass pipeline manner based on metric learning. In practice, these dominant pipeline models may be limited in computational efficiency and generalization capacity because of non-parallel inference and context-free discrete label embeddings. To this end, we re-examine the typical metric-based methods, and propose a new adaptive end-to-end metric learning scheme for the challenging zero-shot slot filling. Considering simplicity, efficiency and generalizability, we present a cascade-style joint learning framework coupled with context-aware soft label representations and slot-level contrastive representation learning to mitigate the data and label shift problems effectively. Extensive experiments on public benchmarks demonstrate the superiority of the proposed approach over a series of competitive baselines. △ Less

Submitted 23 October, 2023; originally announced October 2023.

Comments: Accepted to EMNLP 2023 (Main, Long Paper)

arXiv:2309.13005 [pdf, other]

Towards Counterfactual Fairness-aware Domain Generalization in Changing Environments

Authors: Yujie Lin, Chen Zhao, Minglai Shao, Baoluo Meng, Xujiang Zhao, Haifeng Chen

Abstract: Recognizing the prevalence of domain shift as a common challenge in machine learning, various domain generalization (DG) techniques have been developed to enhance the performance of machine learning systems when dealing with out-of-distribution (OOD) data. Furthermore, in real-world scenarios, data distributions can gradually change across a sequence of sequential domains. While current methodolog… ▽ More Recognizing the prevalence of domain shift as a common challenge in machine learning, various domain generalization (DG) techniques have been developed to enhance the performance of machine learning systems when dealing with out-of-distribution (OOD) data. Furthermore, in real-world scenarios, data distributions can gradually change across a sequence of sequential domains. While current methodologies primarily focus on improving model effectiveness within these new domains, they often overlook fairness issues throughout the learning process. In response, we introduce an innovative framework called Counterfactual Fairness-Aware Domain Generalization with Sequential Autoencoder (CDSAE). This approach effectively separates environmental information and sensitive attributes from the embedded representation of classification features. This concurrent separation not only greatly improves model generalization across diverse and unfamiliar domains but also effectively addresses challenges related to unfair classification. Our strategy is rooted in the principles of causal inference to tackle these dual issues. To examine the intricate relationship between semantic information, sensitive attributes, and environmental cues, we systematically categorize exogenous uncertainty factors into four latent variables: 1) semantic information influenced by sensitive attributes, 2) semantic information unaffected by sensitive attributes, 3) environmental cues influenced by sensitive attributes, and 4) environmental cues unaffected by sensitive attributes. By incorporating fairness regularization, we exclusively employ semantic information for classification purposes. Empirical validation on synthetic and real-world datasets substantiates the effectiveness of our approach, demonstrating improved accuracy levels while ensuring the preservation of fairness in the evolving landscape of continuous domains. △ Less

Submitted 5 May, 2024; v1 submitted 22 September, 2023; originally announced September 2023.

Comments: IJCAI 2024

arXiv:2309.07380 [pdf]

Domain-adaptive Graph Attention-supervised Network for Cross-network Edge Classification

Authors: Xiao Shen, Mengqiu Shao, Shirui Pan, Laurence T. Yang, Xi Zhou

Abstract: Graph neural networks (GNNs) have shown great ability in modeling graphs, however, their performance would significantly degrade when there are noisy edges connecting nodes from different classes. To alleviate negative effect of noisy edges on neighborhood aggregation, some recent GNNs propose to predict the label agreement between node pairs within a single network. However, predicting the label… ▽ More Graph neural networks (GNNs) have shown great ability in modeling graphs, however, their performance would significantly degrade when there are noisy edges connecting nodes from different classes. To alleviate negative effect of noisy edges on neighborhood aggregation, some recent GNNs propose to predict the label agreement between node pairs within a single network. However, predicting the label agreement of edges across different networks has not been investigated yet. Our work makes the pioneering attempt to study a novel problem of cross-network homophilous and heterophilous edge classification (CNHHEC), and proposes a novel domain-adaptive graph attention-supervised network (DGASN) to effectively tackle the CNHHEC problem. Firstly, DGASN adopts multi-head GAT as the GNN encoder, which jointly trains node embeddings and edge embeddings via the node classification and edge classification losses. As a result, label-discriminative embeddings can be obtained to distinguish homophilous edges from heterophilous edges. In addition, DGASN applies direct supervision on graph attention learning based on the observed edge labels from the source network, thus lowering the negative effects of heterophilous edges while enlarging the positive effects of homophilous edges during neighborhood aggregation. To facilitate knowledge transfer across networks, DGASN employs adversarial domain adaptation to mitigate domain divergence. Extensive experiments on real-world benchmark datasets demonstrate that the proposed DGASN achieves the state-of-the-art performance in CNHHEC. △ Less

Submitted 13 September, 2023; originally announced September 2023.

Comments: IEEE Transactions on Neural Networks and Learning Systems, 2023

arXiv:2309.01627 [pdf, other]

Cross-Consistent Deep Unfolding Network for Adaptive All-In-One Video Restoration

Authors: Yuanshuo Cheng, Mingwen Shao, Yecong Wan, Yuanjian Qiao, Wangmeng Zuo, Deyu Meng

Abstract: Existing Video Restoration (VR) methods always necessitate the individual deployment of models for each adverse weather to remove diverse adverse weather degradations, lacking the capability for adaptive processing of degradations. Such limitation amplifies the complexity and deployment costs in practical applications. To overcome this deficiency, in this paper, we propose a Cross-consistent Deep… ▽ More Existing Video Restoration (VR) methods always necessitate the individual deployment of models for each adverse weather to remove diverse adverse weather degradations, lacking the capability for adaptive processing of degradations. Such limitation amplifies the complexity and deployment costs in practical applications. To overcome this deficiency, in this paper, we propose a Cross-consistent Deep Unfolding Network (CDUN) for All-In-One VR, which enables the employment of a single model to remove diverse degradations for the first time. Specifically, the proposed CDUN accomplishes a novel iterative optimization framework, capable of restoring frames corrupted by corresponding degradations according to the degradation features given in advance. To empower the framework for eliminating diverse degradations, we devise a Sequence-wise Adaptive Degradation Estimator (SADE) to estimate degradation features for the input corrupted video. By orchestrating these two cascading procedures, CDUN achieves adaptive processing for diverse degradation. In addition, we introduce a window-based inter-frame fusion strategy to utilize information from more adjacent frames. This strategy involves the progressive stacking of temporal windows in multiple iterations, effectively enlarging the temporal receptive field and enabling each frame's restoration to leverage information from distant frames. Extensive experiments demonstrate that the proposed method achieves state-of-the-art performance in All-In-One VR. △ Less

Submitted 10 December, 2023; v1 submitted 4 September, 2023; originally announced September 2023.

Comments: 16 pages, 13 figures

arXiv:2308.16879 [pdf, other]

Adaptation Speed Analysis for Fairness-aware Causal Models

Authors: Yujie Lin, Chen Zhao, Minglai Shao, Xujiang Zhao, Haifeng Chen

Abstract: For example, in machine translation tasks, to achieve bidirectional translation between two languages, the source corpus is often used as the target corpus, which involves the training of two models with opposite directions. The question of which one can adapt most quickly to a domain shift is of significant importance in many fields. Specifically, consider an original distribution p that changes… ▽ More For example, in machine translation tasks, to achieve bidirectional translation between two languages, the source corpus is often used as the target corpus, which involves the training of two models with opposite directions. The question of which one can adapt most quickly to a domain shift is of significant importance in many fields. Specifically, consider an original distribution p that changes due to an unknown intervention, resulting in a modified distribution p*. In aligning p with p*, several factors can affect the adaptation rate, including the causal dependencies between variables in p. In real-life scenarios, however, we have to consider the fairness of the training process, and it is particularly crucial to involve a sensitive variable (bias) present between a cause and an effect variable. To explore this scenario, we examine a simple structural causal model (SCM) with a cause-bias-effect structure, where variable A acts as a sensitive variable between cause (X) and effect (Y). The two models, respectively, exhibit consistent and contrary cause-effect directions in the cause-bias-effect SCM. After conducting unknown interventions on variables within the SCM, we can simulate some kinds of domain shifts for analysis. We then compare the adaptation speeds of two models across four shift scenarios. Additionally, we prove the connection between the adaptation speeds of the two models across all interventions. △ Less

Submitted 31 August, 2023; originally announced August 2023.

Comments: CIKM 2023

arXiv:2308.16441 [pdf, other]

Contrastive Representation Learning Based on Multiple Node-centered Subgraphs

Authors: Dong Li, Wenjun Wang, Minglai Shao, Chen Zhao

Abstract: As the basic element of graph-structured data, node has been recognized as the main object of study in graph representation learning. A single node intuitively has multiple node-centered subgraphs from the whole graph (e.g., one person in a social network has multiple social circles based on his different relationships). We study this intuition under the framework of graph contrastive learning, an… ▽ More As the basic element of graph-structured data, node has been recognized as the main object of study in graph representation learning. A single node intuitively has multiple node-centered subgraphs from the whole graph (e.g., one person in a social network has multiple social circles based on his different relationships). We study this intuition under the framework of graph contrastive learning, and propose a multiple node-centered subgraphs contrastive representation learning method to learn node representation on graphs in a self-supervised way. Specifically, we carefully design a series of node-centered regional subgraphs of the central node. Then, the mutual information between different subgraphs of the same node is maximized by contrastive loss. Experiments on various real-world datasets and different downstream tasks demonstrate that our model has achieved state-of-the-art results. △ Less

Submitted 31 August, 2023; originally announced August 2023.

Comments: CIKM 2023

arXiv:2307.12872 [pdf, other]

Latent Code Augmentation Based on Stable Diffusion for Data-free Substitute Attacks

Authors: Mingwen Shao, Lingzhuang Meng, Yuanjian Qiao, Lixu Zhang, Wangmeng Zuo

Abstract: Since the training data of the target model is not available in the black-box substitute attack, most recent schemes utilize GANs to generate data for training the substitute model. However, these GANs-based schemes suffer from low training efficiency as the generator needs to be retrained for each target model during the substitute training process, as well as low generation quality. To overcome… ▽ More Since the training data of the target model is not available in the black-box substitute attack, most recent schemes utilize GANs to generate data for training the substitute model. However, these GANs-based schemes suffer from low training efficiency as the generator needs to be retrained for each target model during the substitute training process, as well as low generation quality. To overcome these limitations, we consider utilizing the diffusion model to generate data, and propose a novel data-free substitute attack scheme based on the Stable Diffusion (SD) to improve the efficiency and accuracy of substitute training. Despite the data generated by the SD exhibiting high quality, it presents a different distribution of domains and a large variation of positive and negative samples for the target model. For this problem, we propose Latent Code Augmentation (LCA) to facilitate SD in generating data that aligns with the data distribution of the target model. Specifically, we augment the latent codes of the inferred member data with LCA and use them as guidance for SD. With the guidance of LCA, the data generated by the SD not only meets the discriminative criteria of the target model but also exhibits high diversity. By utilizing this data, it is possible to train the substitute model that closely resembles the target model more efficiently. Extensive experiments demonstrate that our LCA achieves higher attack success rates and requires fewer query budgets compared to GANs-based schemes for different target models. Our codes are available at \url{https://github.com/LzhMeng/LCA}. △ Less

Submitted 30 March, 2024; v1 submitted 24 July, 2023; originally announced July 2023.

Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2307.07688 [pdf, other]

DRM-IR: Task-Adaptive Deep Unfolding Network for All-In-One Image Restoration

Authors: Yuanshuo Cheng, Mingwen Shao, Yecong Wan, Chao Wang

Abstract: Existing All-In-One image restoration (IR) methods usually lack flexible modeling on various types of degradation, thus impeding the restoration performance. To achieve All-In-One IR with higher task dexterity, this work proposes an efficient Dynamic Reference Modeling paradigm (DRM-IR), which consists of task-adaptive degradation modeling and model-based image restoring. Specifically, these two s… ▽ More Existing All-In-One image restoration (IR) methods usually lack flexible modeling on various types of degradation, thus impeding the restoration performance. To achieve All-In-One IR with higher task dexterity, this work proposes an efficient Dynamic Reference Modeling paradigm (DRM-IR), which consists of task-adaptive degradation modeling and model-based image restoring. Specifically, these two subtasks are formalized as a pair of entangled reference-based maximum a posteriori (MAP) inferences, which are optimized synchronously in an unfolding-based manner. With the two cascaded subtasks, DRM-IR first dynamically models the task-specific degradation based on a reference image pair and further restores the image with the collected degradation statistics. Besides, to bridge the semantic gap between the reference and target degraded images, we further devise a Degradation Prior Transmitter (DPT) that restrains the instance-specific feature differences. DRM-IR explicitly provides superior flexibility for All-in-One IR while being interpretable. Extensive experiments on multiple benchmark datasets show that our DRM-IR achieves state-of-the-art in All-In-One IR. △ Less

Submitted 30 November, 2023; v1 submitted 14 July, 2023; originally announced July 2023.

arXiv:2307.04066 [pdf, other]

Random Position Adversarial Patch for Vision Transformers

Authors: Mingzhen Shao

Abstract: Previous studies have shown the vulnerability of vision transformers to adversarial patches, but these studies all rely on a critical assumption: the attack patches must be perfectly aligned with the patches used for linear projection in vision transformers. Due to this stringent requirement, deploying adversarial patches for vision transformers in the physical world becomes impractical, unlike th… ▽ More Previous studies have shown the vulnerability of vision transformers to adversarial patches, but these studies all rely on a critical assumption: the attack patches must be perfectly aligned with the patches used for linear projection in vision transformers. Due to this stringent requirement, deploying adversarial patches for vision transformers in the physical world becomes impractical, unlike their effectiveness on CNNs. This paper proposes a novel method for generating an adversarial patch (G-Patch) that overcomes the alignment constraint, allowing the patch to launch a targeted attack at any position within the field of view. Specifically, instead of directly optimizing the patch using gradients, we employ a GAN-like structure to generate the adversarial patch. Our experiments show the effectiveness of the adversarial patch in achieving universal attacks on vision transformers, both in digital and physical-world scenarios. Additionally, further analysis reveals that the generated adversarial patch exhibits robustness to brightness restriction, color transfer, and random noise. Real-world attack experiments validate the effectiveness of the G-Patch to launch robust attacks even under some very challenging conditions. △ Less

Submitted 8 July, 2023; originally announced July 2023.

arXiv:2307.00421 [pdf, other]

Brightness-Restricted Adversarial Attack Patch

Authors: Mingzhen Shao

Abstract: Adversarial attack patches have gained increasing attention due to their practical applicability in physical-world scenarios. However, the bright colors used in attack patches represent a significant drawback, as they can be easily identified by human observers. Moreover, even though these attacks have been highly successful in deceiving target networks, which specific features of the attack patch… ▽ More Adversarial attack patches have gained increasing attention due to their practical applicability in physical-world scenarios. However, the bright colors used in attack patches represent a significant drawback, as they can be easily identified by human observers. Moreover, even though these attacks have been highly successful in deceiving target networks, which specific features of the attack patch contribute to its success are still unknown. Our paper introduces a brightness-restricted patch (BrPatch) that uses optical characteristics to effectively reduce conspicuousness while preserving image independence. We also conducted an analysis of the impact of various image features (such as color, texture, noise, and size) on the effectiveness of an attack patch in physical-world deployment. Our experiments show that attack patches exhibit strong redundancy to brightness and are resistant to color transfer and noise. Based on our findings, we propose some additional methods to further reduce the conspicuousness of BrPatch. Our findings also explain the robustness of attack patches observed in physical-world scenarios. △ Less

Submitted 1 July, 2023; originally announced July 2023.

arXiv:2306.15167 [pdf, other]

An Efficient Global Algorithm for One-Bit Maximum-Likelihood MIMO Detection

Authors: Cheng-Yang Yu, Mingjie Shao, Wei-Kun Chen, Ya-Feng Liu, Wing-Kin Ma

Abstract: There has been growing interest in implementing massive MIMO systems by one-bit analog-to-digital converters (ADCs), which have the benefit of reducing the power consumption and hardware complexity. One-bit MIMO detection arises in such a scenario. It aims to detect the multiuser signals from the one-bit quantized received signals in an uplink channel. In this paper, we consider one-bit maximum-li… ▽ More There has been growing interest in implementing massive MIMO systems by one-bit analog-to-digital converters (ADCs), which have the benefit of reducing the power consumption and hardware complexity. One-bit MIMO detection arises in such a scenario. It aims to detect the multiuser signals from the one-bit quantized received signals in an uplink channel. In this paper, we consider one-bit maximum-likelihood (ML) MIMO detection in massive MIMO systems, which amounts to solving a large-scale nonlinear integer programming problem. We propose an efficient global algorithm for solving the one-bit ML MIMO detection problem. We first reformulate the problem as a mixed integer linear programming (MILP) problem that has a massive number of linear constraints. The massive number of linear constraints raises computational challenges. To solve the MILP problem efficiently, we custom build a light-weight branch-and-bound tree search algorithm, where the linear constraints are incrementally added during the tree search procedure and only small-size linear programming subproblems need to be solved at each iteration. We provide simulation results to demonstrate the efficiency of the proposed method. △ Less

Submitted 3 July, 2023; v1 submitted 26 June, 2023; originally announced June 2023.

arXiv:2306.10695 [pdf, other]

SeMAIL: Eliminating Distractors in Visual Imitation via Separated Models

Authors: Shenghua Wan, Yucen Wang, Minghao Shao, Ruying Chen, De-Chuan Zhan

Abstract: Model-based imitation learning (MBIL) is a popular reinforcement learning method that improves sample efficiency on high-dimension input sources, such as images and videos. Following the convention of MBIL research, existing algorithms are highly deceptive by task-irrelevant information, especially moving distractors in videos. To tackle this problem, we propose a new algorithm - named Separated M… ▽ More Model-based imitation learning (MBIL) is a popular reinforcement learning method that improves sample efficiency on high-dimension input sources, such as images and videos. Following the convention of MBIL research, existing algorithms are highly deceptive by task-irrelevant information, especially moving distractors in videos. To tackle this problem, we propose a new algorithm - named Separated Model-based Adversarial Imitation Learning (SeMAIL) - decoupling the environment dynamics into two parts by task-relevant dependency, which is determined by agent actions, and training separately. In this way, the agent can imagine its trajectories and imitate the expert behavior efficiently in task-relevant state space. Our method achieves near-expert performance on various visual control tasks with complex observations and the more challenging tasks with different backgrounds from expert observations. △ Less

Submitted 19 June, 2023; originally announced June 2023.

Comments: 18 pages, 7 figures

arXiv:2305.11640 [pdf, other]

Distribution-Free Matrix Prediction Under Arbitrary Missing Pattern

Authors: Meijia Shao, Yuan Zhang

Abstract: This paper studies the open problem of conformalized entry prediction in a row/column-exchangeable matrix. The matrix setting presents novel and unique challenges, but there exists little work on this interesting topic. We meticulously define the problem, differentiate it from closely related problems, and rigorously delineate the boundary between achievable and impossible goals. We then propose t… ▽ More This paper studies the open problem of conformalized entry prediction in a row/column-exchangeable matrix. The matrix setting presents novel and unique challenges, but there exists little work on this interesting topic. We meticulously define the problem, differentiate it from closely related problems, and rigorously delineate the boundary between achievable and impossible goals. We then propose two practical algorithms. The first method provides a fast emulation of the full conformal prediction, while the second method leverages the technique of algorithmic stability for acceleration. Both methods are computationally efficient and can effectively safeguard coverage validity in presence of arbitrary missing pattern. Further, we quantify the impact of missingness on prediction accuracy and establish fundamental limit results. Empirical evidence from synthetic and real-world data sets corroborates the superior performance of our proposed methods. △ Less

Submitted 6 June, 2023; v1 submitted 19 May, 2023; originally announced May 2023.

Comments: 12 pages, 4 figures

arXiv:2305.09996 [pdf, other]

Restoring Images Captured in Arbitrary Hybrid Adverse Weather Conditions in One Go

Authors: Ye-Cong Wan, Ming-Wen Shao, Yuan-Shuo Cheng, Yue-Xian Liu, Zhi-Yuan Bao

Abstract: Adverse conditions typically suffer from stochastic hybrid weather degradations (e.g., rainy and hazy night), while existing image restoration algorithms envisage that weather degradations occur independently, thus may fail to handle real-world complicated scenarios. Besides, supervised training is not feasible due to the lack of a comprehensive paired dataset to characterize hybrid conditions. To… ▽ More Adverse conditions typically suffer from stochastic hybrid weather degradations (e.g., rainy and hazy night), while existing image restoration algorithms envisage that weather degradations occur independently, thus may fail to handle real-world complicated scenarios. Besides, supervised training is not feasible due to the lack of a comprehensive paired dataset to characterize hybrid conditions. To this end, we have advanced the aforementioned limitations with two tactics: framework and data. First, we present a novel unified framework, dubbed RAHC, to Restore Arbitrary Hybrid adverse weather Conditions in one go. Specifically, our RAHC leverages a multi-head aggregation architecture to learn multiple degradation representation subspaces and then constrains the network to flexibly handle multiple hybrid adverse weather in a unified paradigm through a discrimination mechanism in the output space. Furthermore, we devise a reconstruction vectors aided scheme to provide auxiliary visual content cues for reconstruction, thus can comfortably cope with hybrid scenarios with insufficient remaining image constituents. Second, we construct a new dataset, termed HAC, for learning and benchmarking arbitrary Hybrid Adverse Conditions restoration. HAC contains 31 scenarios composed of an arbitrary combination of five common weather, with a total of ~316K adverse-weather/clean pairs. Extensive experiments yield superior results and establish new state-of-the-art results on both HAC and conventional datasets. △ Less

Submitted 13 June, 2023; v1 submitted 17 May, 2023; originally announced May 2023.

Comments: In submission

arXiv:2304.09976 [pdf, other]

Analyzing the Domain Shift Immunity of Deep Homography Estimation

Authors: Mingzhen Shao, Tolga Tasdizen, Sarang Joshi

Abstract: Homography estimation serves as a fundamental technique for image alignment in a wide array of applications. The advent of convolutional neural networks has introduced learning-based methodologies that have exhibited remarkable efficacy in this realm. Yet, the generalizability of these approaches across distinct domains remains underexplored. Unlike other conventional tasks, CNN-driven homography… ▽ More Homography estimation serves as a fundamental technique for image alignment in a wide array of applications. The advent of convolutional neural networks has introduced learning-based methodologies that have exhibited remarkable efficacy in this realm. Yet, the generalizability of these approaches across distinct domains remains underexplored. Unlike other conventional tasks, CNN-driven homography estimation models show a distinctive immunity to domain shifts, enabling seamless deployment from one dataset to another without the necessity of transfer learning. This study explores the resilience of a variety of deep homography estimation models to domain shifts, revealing that the network architecture itself is not a contributing factor to this remarkable adaptability. By closely examining the models' focal regions and subjecting input images to a variety of modifications, we confirm that the models heavily rely on local textures such as edges and corner points for homography estimation. Moreover, our analysis underscores that the domain shift immunity itself is intricately tied to the utilization of these local textures. △ Less

Submitted 29 November, 2023; v1 submitted 19 April, 2023; originally announced April 2023.

arXiv:2303.15594 [pdf, other]

doi 10.1145/3584371.3612987

On de novo Bridging Paired-end RNA-seq Data

Authors: Xiang Li, Mingfu Shao

Abstract: The high-throughput short-reads RNA-seq protocols often produce paired-end reads, with the middle portion of the fragments being unsequenced. We explore if the full-length fragments can be computationally reconstructed from the sequenced two ends in the absence of the reference genome - a problem here we refer to as de novo bridging. Solving this problem provides longer, more informative RNA-seq r… ▽ More The high-throughput short-reads RNA-seq protocols often produce paired-end reads, with the middle portion of the fragments being unsequenced. We explore if the full-length fragments can be computationally reconstructed from the sequenced two ends in the absence of the reference genome - a problem here we refer to as de novo bridging. Solving this problem provides longer, more informative RNA-seq reads, and benefits downstream RNA-seq analysis such as transcript assembly, expression quantification, and splicing differential analysis. However, de novo bridging is a challenging and complicated task owing to alternative splicing, transcript noises, and sequencing errors. It remains unclear if the data provides sufficient information for accurate bridging, let alone efficient algorithms that determine the true bridges. Methods have been proposed to bridge paired-end reads in the presence of reference genome (called reference-based bridging), but the algorithms are far away from scaling for de novo bridging as the underlying compacted de Bruijn graph(cdBG) used in the latter task often contains millions of vertices and edges. We designed a new truncated Dijkstra's algorithm for this problem, and proposed a novel algorithm that reuses the shortest path tree to avoid running the truncated Dijkstra's algorithm from scratch for all vertices for further speeding up. These innovative techniques result in scalable algorithms that can bridge all paired-end reads in a cdBG with millions of vertices. Our experiments showed that paired-end RNA-seq reads can be accurately bridged to a large extent. The resulting tool is freely available at https://github.com/Shao-Group/rnabridge-denovo. △ Less

Submitted 27 March, 2023; originally announced March 2023.

Comments: 10 pages, 4 figures

ACM Class: J.3

arXiv:2303.10926 [pdf, other]

On the Maximal Independent Sets of $k$-mers with the Edit Distance

Authors: Leran Ma, Ke Chen, Mingfu Shao

Abstract: In computational biology, $k$-mers and edit distance are fundamental concepts. However, little is known about the metric space of all $k$-mers equipped with the edit distance. In this work, we explore the structure of the $k$-mer space by studying its maximal independent sets (MISs). An MIS is a sparse sketch of all $k$-mers with nice theoretical properties, and therefore admits critical applicati… ▽ More In computational biology, $k$-mers and edit distance are fundamental concepts. However, little is known about the metric space of all $k$-mers equipped with the edit distance. In this work, we explore the structure of the $k$-mer space by studying its maximal independent sets (MISs). An MIS is a sparse sketch of all $k$-mers with nice theoretical properties, and therefore admits critical applications in clustering, indexing, hashing, and sketching large-scale sequencing data, particularly those with high error-rates. Finding an MIS is a challenging problem, as the size of a $k$-mer space grows geometrically with respect to $k$. We propose three algorithms for this problem. The first and the most intuitive one uses a greedy strategy. The second method implements two techniques to avoid redundant comparisons by taking advantage of the locality-property of the $k$-mer space and the estimated bounds on the edit distance. The last algorithm avoids expensive calculations of the edit distance by translating the edit distance into the shortest path in a specifically designed graph. These algorithms are implemented and the calculated MISs of $k$-mer spaces and their statistical properties are reported and analyzed for $k$ up to 15. Source code is freely available at https://github.com/Shao-Group/kmerspace . △ Less

Submitted 20 March, 2023; originally announced March 2023.

Comments: 9 pages, 1 figure

arXiv:2302.13096 [pdf]

Real-Time Recognition of In-Place Body Actions and Head Gestures using Only a Head-Mounted Display

Authors: Jingbo Zhao, Mingjun Shao, Yaojun Wang, Ruolin Xu

Abstract: Body actions and head gestures are natural interfaces for interaction in virtual environments. Existing methods for in-place body action recognition often require hardware more than a head-mounted display (HMD), making body action interfaces difficult to be introduced to ordinary virtual reality (VR) users as they usually only possess an HMD. In addition, there lacks a unified solution to recogniz… ▽ More Body actions and head gestures are natural interfaces for interaction in virtual environments. Existing methods for in-place body action recognition often require hardware more than a head-mounted display (HMD), making body action interfaces difficult to be introduced to ordinary virtual reality (VR) users as they usually only possess an HMD. In addition, there lacks a unified solution to recognize in-place body actions and head gestures. This potentially hinders the exploration of the use of in-place body actions and head gestures for novel interaction experiences in virtual environments. We present a unified two-stream 1-D convolutional neural network (CNN) for recognition of body actions when a user performs walking-in-place (WIP) and for recognition of head gestures when a user stands still wearing only an HMD. Compared to previous approaches, our method does not require specialized hardware and/or additional tracking devices other than an HMD and can recognize a significantly larger number of body actions and head gestures than other existing methods. In total, ten in-place body actions and eight head gestures can be recognized with the proposed method, which makes this method a readily available body action interface (head gestures included) for interaction with virtual environments. We demonstrate one utility of the interface through a virtual locomotion task. Results show that the present body action interface is reliable in detecting body actions for the VR locomotion task but is physically demanding compared to a touch controller interface. The present body action interface is promising for new VR experiences and applications, especially for VR fitness applications where workouts are intended. △ Less

Submitted 25 February, 2023; originally announced February 2023.

Comments: 2023 IEEE International Conference on Virtual Reality and 3D User Interfaces, Shanghai, Mar 25-29, 2023

arXiv:2302.11707 [pdf]

A Deep Neural Network Based Approach to Building Budget-Constrained Models for Big Data Analysis

Authors: Rui Ming, Haiping Xu, Shannon E. Gibbs, Donghui Yan, Ming Shao

Abstract: Deep learning approaches require collection of data on many different input features or variables for accurate model training and prediction. Since data collection on input features could be costly, it is crucial to reduce the cost by selecting a subset of features and developing a budget-constrained model (BCM). In this paper, we introduce an approach to eliminating less important features for bi… ▽ More Deep learning approaches require collection of data on many different input features or variables for accurate model training and prediction. Since data collection on input features could be costly, it is crucial to reduce the cost by selecting a subset of features and developing a budget-constrained model (BCM). In this paper, we introduce an approach to eliminating less important features for big data analysis using Deep Neural Networks (DNNs). Once a DNN model has been developed, we identify the weak links and weak neurons, and remove some input features to bring the model cost within a given budget. The experimental results show our approach is feasible and supports user selection of a suitable BCM within a given budget. △ Less

Submitted 22 February, 2023; originally announced February 2023.

Comments: 8 pages

arXiv:2302.11430 [pdf]

Differentiable Rotamer Sampling with Molecular Force Fields

Authors: Congzhou M. Sha, Jian Wang, Nikolay V. Dokholyan

Abstract: Molecular dynamics is the primary computational method by which modern structural biology explores macromolecule structure and function. Boltzmann generators have been proposed as an alternative to molecular dynamics, by replacing the integration of molecular systems over time with the training of generative neural networks. This neural network approach to MD samples rare events at a higher rate t… ▽ More Molecular dynamics is the primary computational method by which modern structural biology explores macromolecule structure and function. Boltzmann generators have been proposed as an alternative to molecular dynamics, by replacing the integration of molecular systems over time with the training of generative neural networks. This neural network approach to MD samples rare events at a higher rate than traditional MD, however critical gaps in the theory and computational feasibility of Boltzmann generators significantly reduce their usability. Here, we develop a mathematical foundation to overcome these barriers; we demonstrate that the Boltzmann generator approach is sufficiently rapid to replace traditional MD for complex macromolecules, such as proteins in specific applications, and we provide a comprehensive toolkit for the exploration of molecular energy landscapes with neural networks. △ Less

Submitted 22 February, 2023; originally announced February 2023.

Comments: 41 pages, 1 graphical abstract, 5 figures

arXiv:2301.07234 [pdf, other]

DRIMET: Deep Registration for 3D Incompressible Motion Estimation in Tagged-MRI with Application to the Tongue

Authors: Zhangxing Bian, Fangxu Xing, Jinglun Yu, Muhan Shao, Yihao Liu, Aaron Carass, Jiachen Zhuo, Jonghye Woo, Jerry L. Prince

Abstract: Tagged magnetic resonance imaging~(MRI) has been used for decades to observe and quantify the detailed motion of deforming tissue. However, this technique faces several challenges such as tag fading, large motion, long computation times, and difficulties in obtaining diffeomorphic incompressible flow fields. To address these issues, this paper presents a novel unsupervised phase-based 3D motion es… ▽ More Tagged magnetic resonance imaging~(MRI) has been used for decades to observe and quantify the detailed motion of deforming tissue. However, this technique faces several challenges such as tag fading, large motion, long computation times, and difficulties in obtaining diffeomorphic incompressible flow fields. To address these issues, this paper presents a novel unsupervised phase-based 3D motion estimation technique for tagged MRI. We introduce two key innovations. First, we apply a sinusoidal transformation to the harmonic phase input, which enables end-to-end training and avoids the need for phase interpolation. Second, we propose a Jacobian determinant-based learning objective to encourage incompressible flow fields for deforming biological tissues. Our method efficiently estimates 3D motion fields that are accurate, dense, and approximately diffeomorphic and incompressible. The efficacy of the method is assessed using human tongue motion during speech, and includes both healthy controls and patients that have undergone glossectomy. We show that the method outperforms existing approaches, and also exhibits improvements in speed, robustness to tag fading, and large tongue motion. The code is available: https://github.com/jasonbian97/DRIMET-tagged-MRI △ Less

Submitted 30 April, 2023; v1 submitted 17 January, 2023; originally announced January 2023.

Comments: Accepted to MIDL 2023 (oral)

arXiv:2301.06114 [pdf, other]

Segmenting thalamic nuclei from manifold projections of multi-contrast MRI

Authors: Chang Yan, Muhan Shao, Zhangxing Bian, Anqi Feng, Yuan Xue, Jiachen Zhuo, Rao P. Gullapalli, Aaron Carass, Jerry L. Prince

Abstract: The thalamus is a subcortical gray matter structure that plays a key role in relaying sensory and motor signals within the brain. Its nuclei can atrophy or otherwise be affected by neurological disease and injuries including mild traumatic brain injury. Segmenting both the thalamus and its nuclei is challenging because of the relatively low contrast within and around the thalamus in conventional m… ▽ More The thalamus is a subcortical gray matter structure that plays a key role in relaying sensory and motor signals within the brain. Its nuclei can atrophy or otherwise be affected by neurological disease and injuries including mild traumatic brain injury. Segmenting both the thalamus and its nuclei is challenging because of the relatively low contrast within and around the thalamus in conventional magnetic resonance (MR) images. This paper explores imaging features to determine key tissue signatures that naturally cluster, from which we can parcellate thalamic nuclei. Tissue contrasts include T1-weighted and T2-weighted images, MR diffusion measurements including FA, mean diffusivity, Knutsson coefficients that represent fiber orientation, and synthetic multi-TI images derived from FGATIR and T1-weighted images. After registration of these contrasts and isolation of the thalamus, we use the uniform manifold approximation and projection (UMAP) method for dimensionality reduction to produce a low-dimensional representation of the data within the thalamus. Manual labeling of the thalamus provides labels for our UMAP embedding from which k nearest neighbors can be used to label new unseen voxels in that same UMAP embedding. N -fold cross-validation of the method reveals comparable performance to state-of-the-art methods for thalamic parcellation. △ Less

Submitted 31 January, 2023; v1 submitted 15 January, 2023; originally announced January 2023.

Comments: 8 pages, 3 figures, 2023 SPIE-MI Image Processing

arXiv:2212.03039 [pdf, ps, other]

Covariance Regularization for Probabilistic Linear Discriminant Analysis

Authors: Zhiyuan Peng, Mingjie Shao, Xuanji He, Xu Li, Tan Lee, Ke Ding, Guanglu Wan

Abstract: Probabilistic linear discriminant analysis (PLDA) is commonly used in speaker verification systems to score the similarity of speaker embeddings. Recent studies improved the performance of PLDA in domain-matched conditions by diagonalizing its covariance. We suspect such brutal pruning approach could eliminate its capacity in modeling dimension correlation of speaker embeddings, leading to inadequ… ▽ More Probabilistic linear discriminant analysis (PLDA) is commonly used in speaker verification systems to score the similarity of speaker embeddings. Recent studies improved the performance of PLDA in domain-matched conditions by diagonalizing its covariance. We suspect such brutal pruning approach could eliminate its capacity in modeling dimension correlation of speaker embeddings, leading to inadequate performance with domain adaptation. This paper explores two alternative covariance regularization approaches, namely, interpolated PLDA and sparse PLDA, to tackle the problem. The interpolated PLDA incorporates the prior knowledge from cosine scoring to interpolate the covariance of PLDA. The sparse PLDA introduces a sparsity penalty to update the covariance. Experimental results demonstrate that both approaches outperform diagonal regularization noticeably with domain adaptation. In addition, in-domain data can be significantly reduced when training sparse PLDA for domain adaptation. △ Less

Submitted 6 December, 2022; originally announced December 2022.

arXiv:2211.16092 [pdf, other]

Unsupervised Visual Defect Detection with Score-Based Generative Model

Authors: Yapeng Teng, Haoyang Li, Fuzhen Cai, Ming Shao, Siyu Xia

Abstract: Anomaly Detection (AD), as a critical problem, has been widely discussed. In this paper, we specialize in one specific problem, Visual Defect Detection (VDD), in many industrial applications. And in practice, defect image samples are very rare and difficult to collect. Thus, we focus on the unsupervised visual defect detection and localization tasks and propose a novel framework based on the recen… ▽ More Anomaly Detection (AD), as a critical problem, has been widely discussed. In this paper, we specialize in one specific problem, Visual Defect Detection (VDD), in many industrial applications. And in practice, defect image samples are very rare and difficult to collect. Thus, we focus on the unsupervised visual defect detection and localization tasks and propose a novel framework based on the recent score-based generative models, which synthesize the real image by iterative denoising through stochastic differential equations (SDEs). Our work is inspired by the fact that with noise injected into the original image, the defects may be changed into normal cases in the denoising process (i.e., reconstruction). First, based on the assumption that the anomalous data lie in the low probability density region of the normal data distribution, we explain a common phenomenon that occurs when reconstruction-based approaches are applied to VDD: normal pixels also change during the reconstruction process. Second, due to the differences in normal pixels between the reconstructed and original images, a time-dependent gradient value (i.e., score) of normal data distribution is utilized as a metric, rather than reconstruction loss, to gauge the defects. Third, a novel $T$ scales approach is developed to dramatically reduce the required number of iterations, accelerating the inference process. These practices allow our model to generalize VDD in an unsupervised manner while maintaining reasonably good performance. We evaluate our method on several datasets to demonstrate its effectiveness. △ Less

Submitted 29 November, 2022; originally announced November 2022.

arXiv:2210.03888 [pdf, ps, other]

Accelerated and Deep Expectation Maximization for One-Bit MIMO-OFDM Detection

Authors: Mingjie Shao, Wing-Kin Ma, Junbin Liu, Zihao Huang

Abstract: In this paper we study the expectation maximization (EM) technique for one-bit MIMO-OFDM detection (OMOD). Arising from the recent interest in massive MIMO with one-bit analog-to-digital converters, OMOD is a massive-scale problem. EM is an iterative method that can exploit the OFDM structure to process the problem in a per-iteration efficient fashion. In this study we analyze the convergence rate… ▽ More In this paper we study the expectation maximization (EM) technique for one-bit MIMO-OFDM detection (OMOD). Arising from the recent interest in massive MIMO with one-bit analog-to-digital converters, OMOD is a massive-scale problem. EM is an iterative method that can exploit the OFDM structure to process the problem in a per-iteration efficient fashion. In this study we analyze the convergence rate of EM for a class of approximate maximum-likelihood OMOD formulations, or, in a broader sense, a class of problems involving regression from quantized data. We show how the SNR and channel conditions can have an impact on the convergence rate. We do so by making a connection between the EM and the proximal gradient methods in the context of OMOD. This connection also gives us insight to build new accelerated and/or inexact EM schemes. The accelerated scheme has faster convergence in theory, and the inexact scheme provides us with the flexibility to implement EM more efficiently, with convergence guarantee. Furthermore we develop a deep EM algorithm, wherein we take the structure of our inexact EM algorithm and apply deep unfolding to train an efficient structured deep net. Simulation results show that our accelerated exact/inexact EM algorithms run much faster than their standard EM counterparts, and that the deep EM algorithm gives promising detection and runtime performances. △ Less

Submitted 26 January, 2024; v1 submitted 7 October, 2022; originally announced October 2022.

arXiv:2208.11836 [pdf, other]

Polarimetric Inverse Rendering for Transparent Shapes Reconstruction

Authors: Mingqi Shao, Chongkun Xia, Dongxu Duan, Xueqian Wang

Abstract: In this work, we propose a novel method for the detailed reconstruction of transparent objects by exploiting polarimetric cues. Most of the existing methods usually lack sufficient constraints and suffer from the over-smooth problem. Hence, we introduce polarization information as a complementary cue. We implicitly represent the object's geometry as a neural network, while the polarization render… ▽ More In this work, we propose a novel method for the detailed reconstruction of transparent objects by exploiting polarimetric cues. Most of the existing methods usually lack sufficient constraints and suffer from the over-smooth problem. Hence, we introduce polarization information as a complementary cue. We implicitly represent the object's geometry as a neural network, while the polarization render is capable of rendering the object's polarization images from the given shape and illumination configuration. Direct comparison of the rendered polarization images to the real-world captured images will have additional errors due to the transmission in the transparent object. To address this issue, the concept of reflection percentage which represents the proportion of the reflection component is introduced. The reflection percentage is calculated by a ray tracer and then used for weighting the polarization loss. We build a polarization dataset for multi-view transparent shapes reconstruction to verify our method. The experimental results show that our method is capable of recovering detailed shapes and improving the reconstruction quality of transparent objects. Our dataset and code will be publicly available at https://github.com/shaomq2187/TransPIR. △ Less

Submitted 24 August, 2022; originally announced August 2022.

arXiv:2206.07465 [pdf]

High-fidelity quantitative differential phase contrast deconvolution using dark-field sparse prior

Authors: Shuhe Zhang, Tao Peng, Zeyu Ke, Meng Shao, Tos T. J. M. Berendschot, Jinhua Zhou

Abstract: Differential phase contrast (DPC) imaging plays an important role in the family of quantitative phase measurement. However, the reconstruction algorithm for quantitative DPC (qDPC) imaging is not yet optimized, as it does not incorporate the inborn properties of qDPC imaging. In this research, we propose a simple but effective image prior, the dark-field sparse prior (DSP), to facilitate the phase… ▽ More Differential phase contrast (DPC) imaging plays an important role in the family of quantitative phase measurement. However, the reconstruction algorithm for quantitative DPC (qDPC) imaging is not yet optimized, as it does not incorporate the inborn properties of qDPC imaging. In this research, we propose a simple but effective image prior, the dark-field sparse prior (DSP), to facilitate the phase reconstruction quality for all DPC-based phase reconstruction algorithms. The DSP is based on the key observation that most pixel values for an idea differential phase contrast image are zeros since the subtraction of two images under anti-symmetric illumination cancels all background components. With this DSP prior, we formed a new cost function in which L0-norm was used to represent the DSP. Further, we developed two different algorithms based on (1) the Half Quadratic Splitting, and (2) the Richardson-Lucy deconvolution to solve this NP-hard L0-norm problem. We tested our new model on both simulated and experimental data and compare against state-of-the-art methods including L2-norm and total variation regularizations. Results show that our proposed model is superior in terms of phase reconstruction quality and implementation efficiency, in which it significantly increases the experimental robustness, while maintaining the data fidelity. △ Less

Submitted 15 June, 2022; originally announced June 2022.

Showing 1–50 of 94 results for author: Sha, M