Showing 1–50 of 111 results for author: Shan, H

  1. arXiv:2407.10315  [pdf, other]

    cs.LG physics.app-ph q-bio.NC

    Order parameters and phase transitions of continual learning in deep neural networks

    Authors: Haozhe Shan, Qianyi Li, Haim Sompolinsky

    Abstract: Continual learning (CL) enables animals to learn new tasks without erasing prior knowledge. CL in artificial neural networks (NNs) is challenging due to catastrophic forgetting, where new learning degrades performance on older tasks. While various techniques exist to mitigate forgetting, theoretical insights into when and why CL fails in NNs are lacking. Here, we present a statistical-mechanics th…

    Submitted 14 July, 2024; originally announced July 2024.

    Comments: 26 pages, 8 figures

  2. arXiv:2407.09857  [pdf, other]

    cs.CV

    IFTR: An Instance-Level Fusion Transformer for Visual Collaborative Perception

    Authors: Shaohong Wang, Lu Bin, Xinyu Xiao, Zhiyu Xiang, Hangguan Shan, Eryun Liu

    Abstract: Multi-agent collaborative perception has emerged as a widely recognized technology in the field of autonomous driving in recent years. However, current collaborative perception predominantly relies on LiDAR point clouds, with significantly less attention given to methods using camera images. This severely impedes the development of budget-constrained collaborative systems and the exploitation of t…

    Submitted 13 July, 2024; originally announced July 2024.

  3. arXiv:2407.09048  [pdf, other]

    cs.AI

    KUNPENG: An Embodied Large Model for Intelligent Maritime

    Authors: Naiyao Wang, Tongbang Jiang, Ye Wang, Shaoyang Qiu, Bo Zhang, Xinqiang Xie, Munan Li, Chunliu Wang, Yiyang Wang, Hongxiang Ren, Ruili Wang, Hongjun Shan, Hongbo Liu

    Abstract: Intelligent maritime, as an essential component of smart ocean construction, deeply integrates advanced artificial intelligence technology and data analysis methods, which covers multiple aspects such as smart vessels, route optimization, safe navigation, aiming to enhance the efficiency of ocean resource utilization and the intelligence of transportation networks. However, the complex and dynamic…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 9 pages, 3 figures

  4. arXiv:2407.03548  [pdf, other]

    cs.CV

    HiDiff: Hybrid Diffusion Framework for Medical Image Segmentation

    Authors: Tao Chen, Chenhui Wang, Zhihao Chen, Yiming Lei, Hongming Shan

    Abstract: Medical image segmentation has been significantly advanced with the rapid development of deep learning (DL) techniques. Existing DL-based segmentation models are typically discriminative; i.e., they aim to learn a mapping from the input image to segmentation masks. However, these discriminative methods neglect the underlying data distribution and intrinsic class characteristics, suffering from uns…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: Accepted by IEEE Transactions on Medical Imaging 2024

  5. arXiv:2405.16121  [pdf]

    cs.HC

    Design and Implementation of an Emotion Analysis System Based on EEG Signals

    Authors: Zhang Yutian, Huang Shan, Zhang Jianing, Fan Ci'en

    Abstract: Traditional brain-computer systems are complex and expensive, and emotion classification algorithms lack representations of the intrinsic relationships between different channels of electroencephalogram (EEG) signals. There is still room for improvement in accuracy. To lower the research barrier for EEG and harness the rich information embedded in multi-channel EEG, we propose and implement a sim…

    Submitted 25 May, 2024; originally announced May 2024.

  6. arXiv:2404.14162  [pdf, other]

    cs.CV

    FLDM-VTON: Faithful Latent Diffusion Model for Virtual Try-on

    Authors: Chenhui Wang, Tao Chen, Zhihao Chen, Zhizhong Huang, Taoran Jiang, Qi Wang, Hongming Shan

    Abstract: Despite their impressive generative performance, latent diffusion model-based virtual try-on (VTON) methods lack faithfulness to crucial details of the clothes, such as style, pattern, and text. To alleviate these issues caused by the diffusion stochastic nature and latent supervision, we propose a novel Faithful Latent Diffusion Model for VTON, termed FLDM-VTON. FLDM-VTON improves the conventiona…

    Submitted 19 May, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI 2024

  7. arXiv:2404.02570  [pdf, other]

    cs.CL

    MaiNLP at SemEval-2024 Task 1: Analyzing Source Language Selection in Cross-Lingual Textual Relatedness

    Authors: Shijia Zhou, Huangyan Shan, Barbara Plank, Robert Litschko

    Abstract: This paper presents our system developed for the SemEval-2024 Task 1: Semantic Textual Relatedness (STR), on Track C: Cross-lingual. The task aims to detect semantic relatedness of two sentences in a given target language without access to direct supervision (i.e. zero-shot cross-lingual transfer). To this end, we focus on different source language selection strategies on two different pre-trained…

    Submitted 3 April, 2024; originally announced April 2024.

  8. arXiv:2403.13374  [pdf, other]

    cs.LG cs.AI cs.CR

    Byzantine-resilient Federated Learning With Adaptivity to Data Heterogeneity

    Authors: Shiyuan Zuo, Xingrun Yan, Rongfei Fan, Han Hu, Hangguan Shan, Tony Q. S. Quek

    Abstract: This paper deals with federated learning (FL) in the presence of malicious Byzantine attacks and data heterogeneity. A novel Robust Average Gradient Algorithm (RAGA) is proposed, which leverages the geometric median for aggregation and can freely select the round number for local updating. Different from most existing resilient approaches, which perform convergence analysis based on strongly-conve…

    Submitted 27 March, 2024; v1 submitted 20 March, 2024; originally announced March 2024.

  9. arXiv:2403.12749  [pdf, other]

    cs.CL

    Sebastian, Basti, Wastl?! Recognizing Named Entities in Bavarian Dialectal Data

    Authors: Siyao Peng, Zihang Sun, Huangyan Shan, Marie Kolm, Verena Blaschke, Ekaterina Artemova, Barbara Plank

    Abstract: Named Entity Recognition (NER) is a fundamental task to extract key information from texts, but annotated resources are scarce for dialects. This paper introduces the first dialectal NER dataset for German, BarNER, with 161K tokens annotated on Bavarian Wikipedia articles (bar-wiki) and tweets (bar-tweet), using a schema adapted from German CoNLL 2006 and GermEval. The Bavarian dialect differs fro…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: LREC-COLING 2024

  10. arXiv:2403.06128  [pdf, other]

    eess.IV cs.CV

    Low-dose CT Denoising with Language-engaged Dual-space Alignment

    Authors: Zhihao Chen, Tao Chen, Chenhui Wang, Chuang Niu, Ge Wang, Hongming Shan

    Abstract: While various deep learning methods were proposed for low-dose computed tomography (CT) denoising, they often suffer from over-smoothing, blurring, and lack of explainability. To alleviate these issues, we propose a plug-and-play Language-Engaged Dual-space Alignment loss (LEDA) to optimize low-dose CT denoising models. Our idea is to leverage large language models (LLMs) to align denoised CT and…

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: 11 pages, 6 figures

  11. arXiv:2403.05545  [pdf]

    cs.CY

    Unveiling the influence of behavioural, built environment and socio-economic features on the spatial and temporal variability of bus use using explainable machine learning

    Authors: Sui Tao, Francisco Rowe, Hongyu Shan

    Abstract: Understanding the variability of people's travel patterns is key to transport planning and policy-making. However, to what extent daily transit use displays geographic and temporal variabilities, and what are the contributing factors have not been fully addressed. Drawing on smart card data in Beijing, China, this study seeks to address these deficits by adopting new indices to capture the spatial…

    Submitted 6 February, 2024; originally announced March 2024.

    Comments: 58 pages including supplementary material

  12. arXiv:2402.14152  [pdf, other]

    cs.AR cs.CR

    ModSRAM: Algorithm-Hardware Co-Design for Large Number Modular Multiplication in SRAM

    Authors: Jonathan Ku, Junyao Zhang, Haoxuan Shan, Saichand Samudrala, Jiawen Wu, Qilin Zheng, Ziru Li, JV Rajendran, Yiran Chen

    Abstract: Elliptic curve cryptography (ECC) is widely used in security applications such as public key cryptography (PKC) and zero-knowledge proofs (ZKP). ECC is composed of modular arithmetic, where modular multiplication takes most of the processing time. Computational complexity and memory constraints of ECC limit the performance. Therefore, hardware acceleration on ECC is an active field of research. Pr…

    Submitted 21 February, 2024; originally announced February 2024.

    Comments: DAC 2024

  13. arXiv:2402.11423  [pdf, other]

    cs.CR eess.SP

    VoltSchemer: Use Voltage Noise to Manipulate Your Wireless Charger

    Authors: Zihao Zhan, Yirui Yang, Haoqi Shan, Hanqiu Wang, Yier Jin, Shuo Wang

    Abstract: Wireless charging is becoming an increasingly popular charging solution in portable electronic products for a more convenient and safer charging experience than conventional wired charging. However, our research identified new vulnerabilities in wireless charging systems, making them susceptible to intentional electromagnetic interference. These vulnerabilities facilitate a set of novel attack vec…

    Submitted 17 February, 2024; originally announced February 2024.

    Comments: This paper has been accepted by the 33rd USENIX Security Symposium

  14. arXiv:2402.02299  [pdf, other]

    cs.CR cs.LG

    A Review and Comparison of AI Enhanced Side Channel Analysis

    Authors: Max Panoff, Honggang Yu, Haoqi Shan, Yier Jin

    Abstract: Side Channel Analysis (SCA) presents a clear threat to privacy and security in modern computing systems. The vast majority of communications are secured through cryptographic algorithms. These algorithms are often provably-secure from a cryptographical perspective, but their implementation on real hardware introduces vulnerabilities. Adversaries can exploit these vulnerabilities to conduct SCA and…

    Submitted 3 February, 2024; originally announced February 2024.

    Comments: This paper has been accepted by ACM Journal on Emerging Technologies in Computing Systems (JETC)

  15. Invisible Finger: Practical Electromagnetic Interference Attack on Touchscreen-based Electronic Devices

    Authors: Haoqi Shan, Boyi Zhang, Zihao Zhan, Dean Sullivan, Shuo Wang, Yier Jin

    Abstract: Touchscreen-based electronic devices such as smart phones and smart tablets are widely used in our daily life. While the security of electronic devices has been heavily investigated recently, the resilience of touchscreens against various attacks has yet to be thoroughly investigated. In this paper, for the first time, we show that touchscreen-based electronic devices are vulnerable to intentiona…

    Submitted 3 February, 2024; originally announced February 2024.

    Comments: This paper has been accepted by 2022 IEEE Symposium on Security and Privacy (SP) and won distinguished paper award

  16. arXiv:2401.11764  [pdf, other]

    cs.MM

    Identity-Driven Multimedia Forgery Detection via Reference Assistance

    Authors: Junhao Xu, Jingjing Chen, Xue Song, Feng Han, Haijun Shan, Yugang Jiang

    Abstract: Recent advancements in technologies, such as the 'deepfake' technique, have paved the way for the generation of various media forgeries. In response to the potential hazards of these media forgeries, many researchers engage in exploring detection methods, increasing the demand for high-quality media forgery datasets. Despite this, existing datasets have certain limitations. Firstly, most of datase…

    Submitted 22 January, 2024; originally announced January 2024.

  17. arXiv:2312.15663  [pdf, other]

    cs.CV cs.AI

    IQAGPT: Image Quality Assessment with Vision-language and ChatGPT Models

    Authors: Zhihao Chen, Bin Hu, Chuang Niu, Tao Chen, Yuxin Li, Hongming Shan, Ge Wang

    Abstract: Large language models (LLMs), such as ChatGPT, have demonstrated impressive capabilities in various tasks and attracted an increasing interest as a natural language interface across many domains. Recently, large vision-language models (VLMs) like BLIP-2 and GPT-4 have been intensively investigated, which learn rich vision-language correlation from image-text pairs. However, despite these developme…

    Submitted 25 December, 2023; originally announced December 2023.

    Comments: 14 pages, 9 figures

  18. HeisenTrojans: They Are Not There Until They Are Triggered

    Authors: Akshita Reddy Mavurapu, Haoqi Shan, Xiaolong Guo, Orlando Arias, Dean Sullivan

    Abstract: The hardware security community has made significant advances in detecting Hardware Trojan vulnerabilities using software fuzzing-inspired automated analysis. However, the Electronic Design Automation (EDA) code base itself remains under-examined by the same techniques. Our experiments in fuzzing EDA tools demonstrate that, indeed, they are prone to software bugs. As a consequence, this paper unve…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: This paper has been accepted by IEEE Asian Hardware Oriented Security and Trust Symposium (AsianHOST' 2023)

  19. When Memory Mappings Attack: On the (Mis)use of the ARM Cortex-M FPB Unit

    Authors: Haoqi Shan, Dean Sullivan, Orlando Arias

    Abstract: In recent years we have seen an explosion in the usage of low-cost, low-power microcontrollers (MCUs) in embedded devices around us due to the popularity of Internet of Things (IoT) devices. Although this is good from an economics perspective, it has also been detrimental for security as microcontroller-based systems are now a viable attack target. In response, researchers have developed various p…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: This paper has been accepted by IEEE Asian Hardware Oriented Security and Trust Symposium (AsianHOST' 2023) and won Best Paper Award

  20. arXiv:2312.10479  [pdf, other]

    cs.CL

    A Soft Contrastive Learning-based Prompt Model for Few-shot Sentiment Analysis

    Authors: Jingyi Zhou, Jie Zhou, Jiabao Zhao, Siyin Wang, Haijun Shan, Gui Tao, Qi Zhang, Xuanjing Huang

    Abstract: Few-shot text classification has attracted great interest in both academia and industry due to the lack of labeled data in many fields. Different from general text classification (e.g., topic classification), few-shot sentiment classification is more challenging because the semantic distances among the classes are more subtle. For instance, the semantic distances between the sentiment labels in a…

    Submitted 16 December, 2023; originally announced December 2023.

    Comments: Accepted by ICASSP

  21. arXiv:2312.05038  [pdf, other]

    cs.CV

    Prompt-In-Prompt Learning for Universal Image Restoration

    Authors: Zilong Li, Yiming Lei, Chenglong Ma, Junping Zhang, Hongming Shan

    Abstract: Image restoration, which aims to retrieve and enhance degraded images, is fundamental across a wide range of applications. While conventional deep learning approaches have notably improved the image quality across various tasks, they still suffer from (i) the high storage cost needed for various task-specific models and (ii) the lack of interactivity and flexibility, hindering their wider applicat…

    Submitted 8 December, 2023; originally announced December 2023.

  22. arXiv:2312.04433  [pdf, other]

    cs.CV

    DreamVideo: Composing Your Dream Videos with Customized Subject and Motion

    Authors: Yujie Wei, Shiwei Zhang, Zhiwu Qing, Hangjie Yuan, Zhiheng Liu, Yu Liu, Yingya Zhang, Jingren Zhou, Hongming Shan

    Abstract: Customized generation using diffusion models has made impressive progress in image generation, but remains unsatisfactory in the challenging video generation task, as it requires the controllability of both subjects and motions. To that end, we present DreamVideo, a novel approach to generating personalized videos from a few static images of the desired subject and a few videos of target motion. D…

    Submitted 7 December, 2023; originally announced December 2023.

  23. arXiv:2311.12386  [pdf, other]

    cs.CV

    Point, Segment and Count: A Generalized Framework for Object Counting

    Authors: Zhizhong Huang, Mingliang Dai, Yi Zhang, Junping Zhang, Hongming Shan

    Abstract: Class-agnostic object counting aims to count all objects in an image with respect to example boxes or class names, a.k.a. few-shot and zero-shot counting. In this paper, we propose a generalized framework for both few-shot and zero-shot object counting based on detection. Our framework combines the superior advantages of two foundation models without compromising their zero-shot capability:…

    Submitted 27 March, 2024; v1 submitted 21 November, 2023; originally announced November 2023.

    Comments: Accepted by CVPR 2024. Camera ready

  24. arXiv:2311.12049  [pdf, other]

    cs.CV

    Energizing Federated Learning via Filter-Aware Attention

    Authors: Ziyuan Yang, Zerui Shao, Huijie Huangfu, Hui Yu, Andrew Beng Jin Teoh, Xiaoxiao Li, Hongming Shan, Yi Zhang

    Abstract: Federated learning (FL) is a promising distributed paradigm, eliminating the need for data sharing but facing challenges from data heterogeneity. Personalized parameter generation through a hypernetwork proves effective, yet existing methods fail to personalize local model structures. This leads to redundant parameters struggling to adapt to diverse data distributions. To address these limitations…

    Submitted 18 November, 2023; originally announced November 2023.

  25. arXiv:2311.11683  [pdf, ps, other]

    cs.CV cs.AI

    SIAM: A Simple Alternating Mixer for Video Prediction

    Authors: Xin Zheng, Ziang Peng, Yuan Cao, Hongming Shan, Junping Zhang

    Abstract: Video prediction, predicting future frames from the previous ones, has broad applications such as autonomous driving and weather forecasting. Existing state-of-the-art methods typically focus on extracting either spatial, temporal, or spatiotemporal features from videos. Different feature focuses, resulting from different network architectures, may make the resultant models excel at some video pre…

    Submitted 20 May, 2024; v1 submitted 20 November, 2023; originally announced November 2023.

  26. arXiv:2311.09532  [pdf, other]

    cs.CR

    LightEMU: Hardware Assisted Fuzzing of Trusted Applications

    Authors: Haoqi Shan, Sravani Nissankararao, Yujia Liu, Moyao Huang, Shuo Wang, Yier Jin, Dean Sullivan

    Abstract: Trusted Execution Environments (TEEs) are deployed in many CPU designs because of the confidentiality and integrity guarantees they provide. ARM TrustZone is a TEE extensively deployed on smart phones, IoT devices, and notebooks. Specifically, TrustZone is used to separate code execution and data into two worlds, normal world and secure world. However, this separation inherently prevents tradition…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: This paper has been accepted by IEEE International Symposium on Hardware Oriented Security and Trust (HOST'2024)

  27. arXiv:2310.09821  [pdf, other]

    cs.CV

    LICO: Explainable Models with Language-Image Consistency

    Authors: Yiming Lei, Zilong Li, Yangyang Li, Junping Zhang, Hongming Shan

    Abstract: Interpreting the decisions of deep learning models has been actively studied since the explosion of deep neural networks. One of the most convincing interpretation approaches is salience-based visual interpretation, such as Grad-CAM, where the generation of attention maps depends merely on categorical labels. Although existing interpretation methods can provide explainable decision clues, they oft…

    Submitted 15 October, 2023; originally announced October 2023.

    Comments: Accepted by NeurIPS 2023

  28. arXiv:2309.08551  [pdf, other]

    cs.CL cs.SD eess.AS

    Augmenting conformers with structured state-space sequence models for online speech recognition

    Authors: Haozhe Shan, Albert Gu, Zhong Meng, Weiran Wang, Krzysztof Choromanski, Tara Sainath

    Abstract: Online speech recognition, where the model only accesses context to the left, is an important and challenging use case for ASR systems. In this work, we investigate augmenting neural encoders for online ASR by incorporating structured state-space sequence models (S4), a family of models that provide a parameter-efficient way of accessing arbitrarily long left context. We performed systematic ablat…

    Submitted 27 December, 2023; v1 submitted 15 September, 2023; originally announced September 2023.

    Comments: ICASSP 2024

  29. arXiv:2309.05314  [pdf, other]

    cs.CV cs.AI

    Semantic Latent Decomposition with Normalizing Flows for Face Editing

    Authors: Binglei Li, Zhizhong Huang, Hongming Shan, Junping Zhang

    Abstract: Navigating in the latent space of StyleGAN has shown effectiveness for face editing. However, the resulting methods usually encounter challenges in complicated navigation due to the entanglement among different attributes in the latent space. To address this issue, this paper proposes a novel framework, termed SDFlow, with a semantic decomposition in original latent space using continuous conditio…

    Submitted 11 September, 2023; originally announced September 2023.

  30. arXiv:2308.11474  [pdf, other]

    cs.IR

    Pre-training with Aspect-Content Text Mutual Prediction for Multi-Aspect Dense Retrieval

    Authors: Xiaojie Sun, Keping Bi, Jiafeng Guo, Xinyu Ma, Fan Yixing, Hongyu Shan, Qishen Zhang, Zhongyi Liu

    Abstract: Grounded on pre-trained language models (PLMs), dense retrieval has been studied extensively on plain text. In contrast, there has been little research on retrieving data with multiple aspects using dense models. In the scenarios such as product search, the aspect information plays an essential role in relevance matching, e.g., category: Electronics, Computers, and Pet Supplies. A common way of le…

    Submitted 22 August, 2023; originally announced August 2023.

    Comments: Accepted by CIKM 2023

  31. arXiv:2308.08463  [pdf, other]

    eess.IV cs.CV

    Learning to Distill Global Representation for Sparse-View CT

    Authors: Zilong Li, Chenglong Ma, Jie Chen, Junping Zhang, Hongming Shan

    Abstract: Sparse-view computed tomography (CT) -- using a small number of projections for tomographic reconstruction -- enables much lower radiation dose to patients and accelerated data acquisition. The reconstructed images, however, suffer from strong artifacts, greatly limiting their diagnostic value. Current trends for sparse-view CT turn to the raw data for better information recovery. The resultant du…

    Submitted 19 August, 2023; v1 submitted 16 August, 2023; originally announced August 2023.

    Comments: ICCV 2023

  32. arXiv:2308.02190  [pdf, other]

    cs.SD cs.CL eess.AS

    Emo-DNA: Emotion Decoupling and Alignment Learning for Cross-Corpus Speech Emotion Recognition

    Authors: Jiaxin Ye, Yujie Wei, Xin-Cheng Wen, Chenglong Ma, Zhizhong Huang, Kunhong Liu, Hongming Shan

    Abstract: Cross-corpus speech emotion recognition (SER) seeks to generalize the ability of inferring speech emotion from a well-labeled corpus to an unlabeled one, which is a rather challenging task due to the significant discrepancy between two corpora. Existing methods, typically based on unsupervised domain adaptation (UDA), struggle to learn corpus-invariant features by global distribution alignment, bu…

    Submitted 4 August, 2023; originally announced August 2023.

    Comments: Accepted by ACM MM 2023

  33. arXiv:2308.00301  [pdf, other]

    cs.CV

    Online Prototype Learning for Online Continual Learning

    Authors: Yujie Wei, Jiaxin Ye, Zhizhong Huang, Junping Zhang, Hongming Shan

    Abstract: Online continual learning (CL) studies the problem of learning continuously from a single-pass data stream while adapting to new data and mitigating catastrophic forgetting. Recently, by storing a small subset of old data, replay-based methods have shown promising performance. Unlike previous methods that focus on sample storage or knowledge distillation against catastrophic forgetting, this paper…

    Submitted 1 August, 2023; originally announced August 2023.

    Comments: ICCV 2023

  34. ASCON: Anatomy-aware Supervised Contrastive Learning Framework for Low-dose CT Denoising

    Authors: Zhihao Chen, Qi Gao, Yi Zhang, Hongming Shan

    Abstract: While various deep learning methods have been proposed for low-dose computed tomography (CT) denoising, most of them leverage the normal-dose CT images as the ground-truth to supervise the denoising process. These methods typically ignore the inherent correlation within a single CT image, especially the anatomical semantics of human tissues, and lack the interpretability on the denoising process.…

    Submitted 23 July, 2023; originally announced July 2023.

    Comments: MICCAI 2023

    Journal ref: MICCAI 2023

  35. arXiv:2307.07790  [pdf, other]

    cs.CV

    Adaptive Nonlinear Latent Transformation for Conditional Face Editing

    Authors: Zhizhong Huang, Siteng Ma, Junping Zhang, Hongming Shan

    Abstract: Recent works for face editing usually manipulate the latent space of StyleGAN via the linear semantic directions. However, they usually suffer from the entanglement of facial attributes, need to tune the optimal editing strength, and are limited to binary attributes with strong supervision signals. This paper proposes a novel adaptive nonlinear latent transformation for disentangled and conditiona…

    Submitted 15 July, 2023; originally announced July 2023.

    Comments: ICCV 2023

  36. FreeSeed: Frequency-band-aware and Self-guided Network for Sparse-view CT Reconstruction

    Authors: Chenglong Ma, Zilong Li, Junping Zhang, Yi Zhang, Hongming Shan

    Abstract: Sparse-view computed tomography (CT) is a promising solution for expediting the scanning process and mitigating radiation exposure to patients; the reconstructed images, however, contain severe streak artifacts, compromising subsequent screening and diagnosis. Recently, deep learning-based image post-processing methods along with their dual-domain counterparts have shown promising results. However…

    Submitted 11 July, 2023; originally announced July 2023.

    Comments: MICCAI 2023

    Journal ref: MICCAI 2023

  37. arXiv:2305.13585  [pdf, other]

    cs.CL

    Query Structure Modeling for Inductive Logical Reasoning Over Knowledge Graphs

    Authors: Siyuan Wang, Zhongyu Wei, Meng Han, Zhihao Fan, Haijun Shan, Qi Zhang, Xuanjing Huang

    Abstract: Logical reasoning over incomplete knowledge graphs to answer complex logical queries is a challenging task. With the emergence of new entities and relations in constantly evolving KGs, inductive logical reasoning over KGs has become a crucial problem. However, previous PLMs-based methods struggle to model the logical structures of complex queries, which limits their ability to generalize within th…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: 11 pages, 2 figures, 8 tables, accepted as a long paper to ACL 2023

  38. FAN-Net: Fourier-Based Adaptive Normalization For Cross-Domain Stroke Lesion Segmentation

    Authors: Weiyi Yu, Yiming Lei, Hongming Shan

    Abstract: Since stroke is the main cause of various cerebrovascular diseases, deep learning-based stroke lesion segmentation on magnetic resonance (MR) images has attracted considerable attention. However, the existing methods often neglect the domain shift among MR images collected from different sites, which has limited performance improvement. To address this problem, we intend to change style informatio…

    Submitted 23 April, 2023; originally announced April 2023.

    Comments: Accepted by IEEE ICASSP 2023

    Journal ref: IEEE ICASSP 2023

  39. CLIP-Lung: Textual Knowledge-Guided Lung Nodule Malignancy Prediction

    Authors: Yiming Lei, Zilong Li, Yan Shen, Junping Zhang, Hongming Shan

    Abstract: Lung nodule malignancy prediction has been enhanced by advanced deep-learning techniques and effective tricks. Nevertheless, current methods are mainly trained with cross-entropy loss using one-hot categorical labels, which results in difficulty in distinguishing those nodules with closer progression labels. Interestingly, we observe that clinical text information annotated by radiologists provide…

    Submitted 17 April, 2023; originally announced April 2023.

    Journal ref: MICCAI 2023

  40. BerDiff: Conditional Bernoulli Diffusion Model for Medical Image Segmentation

    Authors: Tao Chen, Chenhui Wang, Hongming Shan

    Abstract: Medical image segmentation is a challenging task with inherent ambiguity and high uncertainty, attributed to factors such as unclear tumor boundaries and multiple plausible annotations. The accuracy and diversity of segmentation masks are both crucial for providing valuable references to radiologists in clinical practice. While existing diffusion models have shown strong capacities in various visu…

    Submitted 10 April, 2023; originally announced April 2023.

    Comments: 14 pages, 7 figures

    Journal ref: MICCAI 2023

  41. arXiv:2304.01814  [pdf, other]

    eess.IV cs.CV cs.LG physics.med-ph

    CoreDiff: Contextual Error-Modulated Generalized Diffusion Model for Low-Dose CT Denoising and Generalization

    Authors: Qi Gao, Zilong Li, Junping Zhang, Yi Zhang, Hongming Shan

    Abstract: Low-dose computed tomography (CT) images suffer from noise and artifacts due to photon starvation and electronic noise. Recently, some works have attempted to use diffusion models to address the over-smoothness and training instability encountered by previous deep-learning-based denoising models. However, diffusion models suffer from long inference times due to the large number of sampling steps i…

    Submitted 6 October, 2023; v1 submitted 4 April, 2023; originally announced April 2023.

    Comments: IEEE Transactions on Medical Imaging, 2023

    Journal ref: IEEE Transactions on Medical Imaging, 43(2), 2024

  42. arXiv:2303.14240  [pdf, other]

    cs.CV

    Adaptive Base-class Suppression and Prior Guidance Network for One-Shot Object Detection

    Authors: Wenwen Zhang, Xinyu Xiao, Hangguan Shan, Eryun Liu

    Abstract: One-shot object detection (OSOD) aims to detect all object instances towards the given category specified by a query image. Most existing studies in OSOD endeavor to explore effective cross-image correlation and alleviate the semantic feature misalignment, however, ignoring the phenomenon of the model bias towards the base classes and the generalization degradation on the novel classes. Observing…

    Submitted 24 March, 2023; originally announced March 2023.

  43. Cross-head Supervision for Crowd Counting with Noisy Annotations

    Authors: Mingliang Dai, Zhizhong Huang, Jiaqi Gao, Hongming Shan, Junping Zhang

    Abstract: Noisy annotations such as missing annotations and location shifts often exist in crowd counting datasets due to multi-scale head sizes, high occlusion, etc. These noisy annotations severely affect the model training, especially for density map-based methods. To alleviate the negative impact of noisy annotations, we propose a novel crowd counting model with one convolution head and one transformer…

    Submitted 16 March, 2023; originally announced March 2023.

    Comments: accepted by ICASSP 2023

    Journal ref: IEEE ICASSP 2023

  44. arXiv:2303.06930  [pdf, other]

    cs.CV cs.AI

    Twin Contrastive Learning with Noisy Labels

    Authors: Zhizhong Huang, Junping Zhang, Hongming Shan

    Abstract: Learning from noisy data is a challenging task that significantly degenerates the model performance. In this paper, we present TCL, a novel twin contrastive learning model to learn robust representations and handle noisy labels for classification. Specifically, we construct a Gaussian mixture model (GMM) over the representations by injecting the supervised model predictions into GMM to link label-…

    Submitted 13 March, 2023; originally announced March 2023.

    Comments: CVPR 2023

  45. arXiv:2302.10630  [pdf, other]

    eess.IV cs.CV physics.med-ph

    LIT-Former: Linking In-plane and Through-plane Transformers for Simultaneous CT Image Denoising and Deblurring

    Authors: Zhihao Chen, Chuang Niu, Qi Gao, Ge Wang, Hongming Shan

    Abstract: This paper studies 3D low-dose computed tomography (CT) imaging. Although various deep learning methods were developed in this context, typically they focus on 2D images and perform denoising due to low-dose and deblurring for super-resolution separately. Up to date, little work was done for simultaneous in-plane denoising and through-plane deblurring, which is important to obtain high-quality 3D…

    Submitted 7 January, 2024; v1 submitted 21 February, 2023; originally announced February 2023.

    Comments: 15 pages, 12 figures

    Journal ref: IEEE Transactions on Medical Imaging, 2024

  46. arXiv:2301.06122  [pdf, other]

    cs.CV

    CORE: Learning Consistent Ordinal REpresentations for Image Ordinal Estimation

    Authors: Yiming Lei, Zilong Li, Yangyang Li, Junping Zhang, Hongming Shan

    Abstract: The goal of image ordinal estimation is to estimate the ordinal label of a given image with a convolutional neural network. Existing methods are mainly based on ordinal regression and particularly focus on modeling the ordinal mapping from the feature representation of the input to the ordinal label space. However, the manifold of the resultant feature representations does not maintain the intrins…

    Submitted 15 January, 2023; originally announced January 2023.

    Comments: 13 pages

  47. Temporal Modeling Matters: A Novel Temporal Emotional Modeling Approach for Speech Emotion Recognition

    Authors: Jiaxin Ye, Xin-cheng Wen, Yujie Wei, Yong Xu, Kunhong Liu, Hongming Shan

    Abstract: Speech emotion recognition (SER) plays a vital role in improving the interactions between humans and machines by inferring human emotion and affective states from speech signals. Whereas recent works primarily focus on mining spatiotemporal information from hand-crafted features, we explore how to model the temporal patterns of speech emotions from dynamic temporal scales. Towards that goal, we in…

    Submitted 14 August, 2023; v1 submitted 14 November, 2022; originally announced November 2022.

    Comments: ICASSP 2023

    Journal ref: IEEE ICASSP 2023

  48. Motion Matters: A Novel Motion Modeling For Cross-View Gait Feature Learning

    Authors: Jingqi Li, Jiaqi Gao, Yuzhen Zhang, Hongming Shan, Junping Zhang

    Abstract: As a unique biometric that can be perceived at a distance, gait has broad applications in person authentication, social security, and so on. Existing gait recognition methods suffer from changes in viewpoint and clothing and barely consider extracting diverse motion features, a fundamental characteristic in gaits, from gait sequences. This paper proposes a novel motion modeling method to extract t…

    Submitted 19 January, 2023; v1 submitted 21 October, 2022; originally announced October 2022.

    Journal ref: IEEE ICASSP 2023

  49. When Age-Invariant Face Recognition Meets Face Age Synthesis: A Multi-Task Learning Framework and A New Benchmark

    Authors: Zhizhong Huang, Junping Zhang, Hongming Shan

    Abstract: To minimize the impact of age variation on face recognition, age-invariant face recognition (AIFR) extracts identity-related discriminative features by minimizing the correlation between identity- and age-related features while face age synthesis (FAS) eliminates age variation by converting the faces in different age groups to the same group. However, AIFR lacks visual results for model interpreta…

    Submitted 26 October, 2022; v1 submitted 17 October, 2022; originally announced October 2022.

    Comments: TPAMI 2022. arXiv admin note: substantial text overlap with arXiv:2103.01520

    Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022

  50. Quad-Net: Quad-domain Network for CT Metal Artifact Reduction

    Authors: Zilong Li, Qi Gao, Yaping Wu, Chuang Niu, Junping Zhang, Meiyun Wang, Ge Wang, Hongming Shan

    Abstract: Metal implants and other high-density objects in patients introduce severe streaking artifacts in CT images, compromising image quality and diagnostic performance. Although various methods were developed for CT metal artifact reduction over the past decades, including the latest dual-domain deep networks, remaining metal artifacts are still clinically challenging in many cases. Here we extend the…

    Submitted 31 May, 2023; v1 submitted 24 July, 2022; originally announced July 2022.

    Journal ref: IEEE Transactions on Medical Imaging, 2024