Showing 1–27 of 27 results for author: Jiao, G

  1. arXiv:2407.13188  [pdf, other]

    cs.CV

    Safe-SD: Safe and Traceable Stable Diffusion with Text Prompt Trigger for Invisible Generative Watermarking

    Authors: Zhiyuan Ma, Guoli Jia, Biqing Qi, Bowen Zhou

    Abstract: Recently, stable diffusion (SD) models have typically flourished in the field of image synthesis and personalized editing, with a range of photorealistic and unprecedented images being successfully generated. As a result, widespread interest has been ignited to develop and use various SD-based tools for visual content creation. However, the exposure of AI-created content on public platforms could…

    Submitted 18 July, 2024; originally announced July 2024.

  2. arXiv:2404.13701  [pdf, other]

    cs.CV cs.LG

    Semantic-Rearrangement-Based Multi-Level Alignment for Domain Generalized Segmentation

    Authors: Guanlong Jiao, Chenyangguang Zhang, Haonan Yin, Yu Mo, Biqing Huang, Hui Pan, Yi Luo, Jingxian Liu

    Abstract: Domain generalized semantic segmentation is an essential computer vision task, for which models only leverage source data to learn the capability of generalized semantic segmentation towards the unseen target domains. Previous works typically address this challenge by global style randomization or feature regularization. In this paper, we argue that given the observation that different local seman…

    Submitted 21 April, 2024; originally announced April 2024.

  3. arXiv:2404.00292  [pdf, other]

    cs.CV

    LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion

    Authors: Pancheng Zhao, Peng Xu, Pengda Qin, Deng-Ping Fan, Zhicheng Zhang, Guoli Jia, Bowen Zhou, Jufeng Yang

    Abstract: Camouflaged vision perception is an important vision task with numerous practical applications. Due to the expensive collection and labeling costs, this community struggles with a major bottleneck that the species category of its datasets is limited to a small number of object species. However, the existing camouflaged generation methods require specifying the background manually, thus failing to…

    Submitted 12 July, 2024; v1 submitted 30 March, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024, Fig.2 and Equation 4 revised

  4. arXiv:2402.11307  [pdf, other]

    cs.CV

    ICHPro: Intracerebral Hemorrhage Prognosis Classification Via Joint-attention Fusion-based 3d Cross-modal Network

    Authors: Xinlei Yu, Xinyang Li, Ruiquan Ge, Shibin Wu, Ahmed Elazab, Jichao Zhu, Lingyan Zhang, Gangyong Jia, Taosheng Xu, Xiang Wan, Changmiao Wang

    Abstract: Intracerebral Hemorrhage (ICH) is the deadliest subtype of stroke, necessitating timely and accurate prognostic evaluation to reduce mortality and disability. However, the multi-factorial nature and complexity of ICH make methods based solely on computed tomography (CT) image features inadequate. Despite the capacity of cross-modal networks to fuse additional information, the effective combination…

    Submitted 17 February, 2024; originally announced February 2024.

    Comments: 6 pages, 4 figures, 4 tables, accepted by ISBI

  5. arXiv:2401.11913  [pdf, other]

    cs.CV cs.AI

    Large receptive field strategy and important feature extraction strategy in 3D object detection

    Authors: Leichao Cui, Xiuxian Li, Min Meng, Guangyu Jia

    Abstract: The enhancement of 3D object detection is pivotal for precise environmental perception and improved task execution capabilities in autonomous driving. LiDAR point clouds, offering accurate depth information, serve as a crucial information source for this purpose. Our study focuses on key challenges in 3D target detection. To tackle the challenge of expanding the receptive field of a 3D convolutional kern…

    Submitted 10 March, 2024; v1 submitted 22 January, 2024; originally announced January 2024.

  6. arXiv:2401.03048  [pdf, other]

    cs.CV

    Latte: Latent Diffusion Transformer for Video Generation

    Authors: Xin Ma, Yaohui Wang, Gengyun Jia, Xinyuan Chen, Ziwei Liu, Yuan-Fang Li, Cunjian Chen, Yu Qiao

    Abstract: We propose a novel Latent Diffusion Transformer, namely Latte, for video generation. Latte first extracts spatio-temporal tokens from input videos and then adopts a series of Transformer blocks to model video distribution in the latent space. In order to model a substantial number of tokens extracted from videos, four efficient variants are introduced from the perspective of decomposing the spatia…

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: Project page: https://maxin-cn.github.io/latte_project

  7. arXiv:2312.08019  [pdf, other]

    cs.CV

    AdapEdit: Spatio-Temporal Guided Adaptive Editing Algorithm for Text-Based Continuity-Sensitive Image Editing

    Authors: Zhiyuan Ma, Guoli Jia, Bowen Zhou

    Abstract: With the great success of text-conditioned diffusion models in creative text-to-image generation, various text-driven image editing approaches have attracted the attention of many researchers. However, previous works mainly focus on discreteness-sensitive instructions such as adding, removing or replacing specific objects, background elements or global styles (i.e., hard editing), while generally…

    Submitted 24 December, 2023; v1 submitted 13 December, 2023; originally announced December 2023.

  8. arXiv:2311.07033  [pdf, other]

    eess.IV cs.CV

    TTMFN: Two-stream Transformer-based Multimodal Fusion Network for Survival Prediction

    Authors: Ruiquan Ge, Xiangyang Hu, Rungen Huang, Gangyong Jia, Yaqi Wang, Renshu Gu, Changmiao Wang, Elazab Ahmed, Linyan Wang, Juan Ye, Ye Li

    Abstract: Survival prediction plays a crucial role in assisting clinicians with the development of cancer treatment protocols. Recent evidence shows that multimodal data can help in the diagnosis of cancer disease and improve survival prediction. Currently, deep learning-based approaches have experienced increasing success in survival prediction by integrating pathological images and gene expression data. H…

    Submitted 12 November, 2023; originally announced November 2023.

  9. arXiv:2311.04772  [pdf, other]

    eess.IV cs.CV

    GCS-ICHNet: Assessment of Intracerebral Hemorrhage Prognosis using Self-Attention with Domain Knowledge Integration

    Authors: Xuhao Shan, Xinyang Li, Ruiquan Ge, Shibin Wu, Ahmed Elazab, Jichao Zhu, Lingyan Zhang, Gangyong Jia, Qingying Xiao, Xiang Wan, Changmiao Wang

    Abstract: Intracerebral Hemorrhage (ICH) is a severe condition resulting from damaged brain blood vessel ruptures, often leading to complications and fatalities. Timely and accurate prognosis and management are essential due to its high mortality rate. However, conventional methods heavily rely on subjective clinician expertise, which can lead to inaccurate diagnoses and delays in treatment. Artificial inte…

    Submitted 8 November, 2023; originally announced November 2023.

    Comments: 6 pages, 3 figures, 5 tables, published to BIBM 2023

  10. arXiv:2310.11696  [pdf, other]

    cs.CV

    MOHO: Learning Single-view Hand-held Object Reconstruction with Multi-view Occlusion-Aware Supervision

    Authors: Chenyangguang Zhang, Guanlong Jiao, Yan Di, Gu Wang, Ziqin Huang, Ruida Zhang, Fabian Manhardt, Bowen Fu, Federico Tombari, Xiangyang Ji

    Abstract: Previous works concerning single-view hand-held object reconstruction typically rely on supervision from 3D ground-truth models, which are hard to collect in real world. In contrast, readily accessible hand-object videos offer a promising training data source, but they only give heavily occluded object observations. In this paper, we present a novel synthetic-to-real framework to exploit Multi-vie…

    Submitted 13 March, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: CVPR 2024

  11. arXiv:2307.11530  [pdf, other]

    eess.IV cs.CV

    UWAT-GAN: Fundus Fluorescein Angiography Synthesis via Ultra-wide-angle Transformation Multi-scale GAN

    Authors: Zhaojie Fang, Zhanghao Chen, Pengxue Wei, Wangting Li, Shaochong Zhang, Ahmed Elazab, Gangyong Jia, Ruiquan Ge, Changmiao Wang

    Abstract: Fundus photography is an essential examination for clinical and differential diagnosis of fundus diseases. Recently, Ultra-Wide-angle Fundus (UWF) techniques, UWF Fluorescein Angiography (UWF-FA) and UWF Scanning Laser Ophthalmoscopy (UWF-SLO) have been gradually put into use. However, Fluorescein Angiography (FA) and UWF-FA require injecting sodium fluorescein which may have detrimental influence…

    Submitted 21 July, 2023; originally announced July 2023.

    Comments: 26th International Conference on Medical Image Computing and Computer Assisted Intervention

  12. arXiv:2307.08059  [pdf, other]

    cs.CV

    LafitE: Latent Diffusion Model with Feature Editing for Unsupervised Multi-class Anomaly Detection

    Authors: Haonan Yin, Guanlong Jiao, Qianhui Wu, Borje F. Karlsson, Biqing Huang, Chin Yew Lin

    Abstract: In the context of flexible manufacturing systems that are required to produce different types and quantities of products with minimal reconfiguration, this paper addresses the problem of unsupervised multi-class anomaly detection: develop a unified model to detect anomalies from objects belonging to multiple classes when only normal data is accessible. We first explore the generative-based approac…

    Submitted 16 July, 2023; originally announced July 2023.

    Comments: 8 pages

  13. arXiv:2307.07494  [pdf, other]

    cs.CV

    TALL: Thumbnail Layout for Deepfake Video Detection

    Authors: Yuting Xu, Jian Liang, Gengyun Jia, Ziming Yang, Yanhao Zhang, Ran He

    Abstract: The growing threats of deepfakes to society and cybersecurity have raised enormous public concerns, and increasing efforts have been devoted to this critical topic of deepfake video detection. Existing video methods achieve good performance but are computationally intensive. This paper introduces a simple yet effective strategy named Thumbnail Layout (TALL), which transforms a video clip into a pr…

    Submitted 17 February, 2024; v1 submitted 14 July, 2023; originally announced July 2023.

    Comments: Accepted by ICCV 2023; We revised the first paragraph of section 3

  14. arXiv:2208.12623  [pdf]

    cs.CV

    From WSI-level to Patch-level: Structure Prior Guided Binuclear Cell Fine-grained Detection

    Authors: Baomin Wang, Geng Hu, Dan Chen, Lihua Hu, Cheng Li, Yu An, Guiping Hu, Guang Jia

    Abstract: Accurate and rapid binuclear cell (BC) detection plays a significant role in predicting the risk of leukemia and other malignant tumors. However, manual microscopy counting is time-consuming and lacks objectivity. Moreover, with the limitation of staining quality and diversity of morphology features in BC microscopy whole slide images (WSIs), traditional image processing approaches are helples…

    Submitted 26 August, 2022; originally announced August 2022.

  15. Improving COVID-19 CT Classification of CNNs by Learning Parameter-Efficient Representation

    Authors: Yujia Xu, Hak-Keung Lam, Guangyu Jia, Jian Jiang, Junkai Liao, Xinqi Bao

    Abstract: COVID-19 pandemic continues to spread rapidly over the world and causes a tremendous crisis in global human health and the economy. Its early detection and diagnosis are crucial for controlling the further spread. Many deep learning-based methods have been proposed to assist clinicians in automatic COVID-19 diagnosis based on computed tomography imaging. However, challenges still remain, including…

    Submitted 9 August, 2022; originally announced August 2022.

  16. arXiv:2206.08791  [pdf, other]

    cs.CV cs.AI

    DU-Net based Unsupervised Contrastive Learning for Cancer Segmentation in Histology Images

    Authors: Yilong Li, Yaqi Wang, Huiyu Zhou, Huaqiong Wang, Gangyong Jia, Qianni Zhang

    Abstract: In this paper, we introduce an unsupervised cancer segmentation framework for histology images. The framework involves an effective contrastive learning scheme for extracting distinctive visual representations for segmentation. The encoder is a Deep U-Net (DU-Net) structure that contains an extra fully convolution layer compared to the normal U-Net. A contrastive learning scheme is developed to so…

    Submitted 17 June, 2022; originally announced June 2022.

    Comments: arXiv admin note: text overlap with arXiv:2002.05709 by other authors

  17. arXiv:2206.08778  [pdf, other]

    cs.CV cs.AI

    CTooth: A Fully Annotated 3D Dataset and Benchmark for Tooth Volume Segmentation on Cone Beam Computed Tomography Images

    Authors: Weiwei Cui, Yaqi Wang, Qianni Zhang, Huiyu Zhou, Dan Song, Xingyong Zuo, Gangyong Jia, Liaoyuan Zeng

    Abstract: 3D tooth segmentation is a prerequisite for computer-aided dental diagnosis and treatment. However, segmenting all tooth regions manually is subjective and time-consuming. Recently, deep learning-based segmentation methods produce convincing results and reduce manual annotation efforts, but it requires a large quantity of ground truth for training. To our knowledge, there are few tooth data availa…

    Submitted 17 June, 2022; originally announced June 2022.

  18. arXiv:2206.08524  [pdf, other]

    cs.CV

    CDNet: Contrastive Disentangled Network for Fine-Grained Image Categorization of Ocular B-Scan Ultrasound

    Authors: Ruilong Dan, Yunxiang Li, Yijie Wang, Gangyong Jia, Ruiquan Ge, Juan Ye, Qun Jin, Yaqi Wang

    Abstract: Precise and rapid categorization of images in the B-scan ultrasound modality is vital for diagnosing ocular diseases. Nevertheless, distinguishing various diseases in ultrasound still challenges experienced ophthalmologists. Thus a novel contrastive disentangled network (CDNet) is developed in this work, aiming to tackle the fine-grained image categorization (FGIC) challenges of ocular abnormaliti…

    Submitted 16 June, 2022; originally announced June 2022.

  19. arXiv:2205.12602  [pdf, other]

    cs.CV

    VTP: Volumetric Transformer for Multi-view Multi-person 3D Pose Estimation

    Authors: Yuxing Chen, Renshu Gu, Ouhan Huang, Gangyong Jia

    Abstract: This paper presents Volumetric Transformer Pose estimator (VTP), the first 3D volumetric transformer framework for multi-view multi-person 3D human pose estimation. VTP aggregates features from 2D keypoints in all camera views and directly learns the spatial relationships in the 3D voxel space in an end-to-end fashion. The aggregated 3D features are passed through 3D convolutions before being flat…

    Submitted 25 May, 2022; originally announced May 2022.

  20. arXiv:2203.04299  [pdf, other]

    eess.IV cs.AI cs.CV

    Plug-and-play Shape Refinement Framework for Multi-site and Lifespan Brain Skull Stripping

    Authors: Yunxiang Li, Ruilong Dan, Shuai Wang, Yifan Cao, Xiangde Luo, Chenghao Tan, Gangyong Jia, Huiyu Zhou, You Zhang, Yaqi Wang, Li Wang

    Abstract: Skull stripping is a crucial prerequisite step in the analysis of brain magnetic resonance images (MRI). Although many excellent works or tools have been proposed, they suffer from low generalization capability. For instance, the model trained on a dataset with specific imaging parameters cannot be well applied to other datasets with different imaging parameters. Especially, for the lifespan datas…

    Submitted 22 December, 2022; v1 submitted 8 March, 2022; originally announced March 2022.

    Comments: 11 pages

  21. arXiv:2112.10310  [pdf, other]

    cs.CV

    Contrastive Attention Network with Dense Field Estimation for Face Completion

    Authors: Xin Ma, Xiaoqiang Zhou, Huaibo Huang, Gengyun Jia, Zhenhua Chai, Xiaolin Wei

    Abstract: Most modern face completion approaches adopt an autoencoder or its variants to restore missing regions in face images. Encoders are often utilized to learn powerful representations that play an important role in meeting the challenges of sophisticated learning tasks. Specifically, various kinds of masks are often presented in face images in the wild, forming complex patterns, especially in this ha…

    Submitted 19 December, 2021; originally announced December 2021.

    Comments: Accepted by Pattern Recognition 2021. arXiv admin note: substantial text overlap with arXiv:2010.15643

  22. arXiv:2108.10152  [pdf, other]

    eess.SP cs.AI cs.LG cs.MM

    Emotion Recognition from Multiple Modalities: Fundamentals and Methodologies

    Authors: Sicheng Zhao, Guoli Jia, Jufeng Yang, Guiguang Ding, Kurt Keutzer

    Abstract: Humans are emotional creatures. Multiple modalities are often involved when we express emotions, whether we do so explicitly (e.g., facial expression, speech) or implicitly (e.g., text, image). Enabling machines to have emotional intelligence, i.e., recognizing, interpreting, processing, and simulating emotions, is becoming increasingly important. In this tutorial, we discuss several key aspects o…

    Submitted 18 August, 2021; originally announced August 2021.

    Comments: Accepted by IEEE Signal Processing Magazine (SPM)

  23. arXiv:2106.16125  [pdf, other]

    cs.CV cs.AI cs.MM

    Affective Image Content Analysis: Two Decades Review and New Perspectives

    Authors: Sicheng Zhao, Xingxu Yao, Jufeng Yang, Guoli Jia, Guiguang Ding, Tat-Seng Chua, Björn W. Schuller, Kurt Keutzer

    Abstract: Images can convey rich semantics and induce various emotions in viewers. Recently, with the rapid advancement of emotional intelligence and the explosive growth of visual data, extensive research efforts have been dedicated to affective image content analysis (AICA). In this survey, we will comprehensively review the development of AICA in the recent two decades, especially focusing on the state-o…

    Submitted 30 June, 2021; originally announced June 2021.

    Comments: Accepted by IEEE TPAMI

  24. arXiv:2001.03952  [pdf, other]

    eess.SP cs.LG stat.ML

    Channel Assignment in Uplink Wireless Communication using Machine Learning Approach

    Authors: Guangyu Jia, Zhaohui Yang, Hak-Keung Lam, Jianfeng Shi, Mohammad Shikh-Bahaei

    Abstract: This letter investigates a channel assignment problem in uplink wireless communication systems. Our goal is to maximize the sum rate of all users subject to integer channel assignment constraints. A convex optimization based algorithm is provided to obtain the optimal channel assignment, where the closed-form solution is obtained in each step. Due to high computational complexity in the convex opt…

    Submitted 12 January, 2020; originally announced January 2020.

  25. arXiv:1909.12929  [pdf, other]

    cs.CV

    Self-Paced Video Data Augmentation with Dynamic Images Generated by Generative Adversarial Networks

    Authors: Yumeng Zhang, Gaoguo Jia, Li Chen, Mingrui Zhang, Junhai Yong

    Abstract: There is an urgent need for an effective video classification method by means of a small number of samples. The deficiency of samples could be effectively alleviated by generating samples through Generative Adversarial Networks (GAN), but the generation of videos on a typical category remains to be underexplored since the complex actions and the changeable viewpoints are difficult to simulate. In…

    Submitted 16 September, 2019; originally announced September 2019.

  26. Theme-Aware Aesthetic Distribution Prediction With Full-Resolution Photographs

    Authors: Gengyun Jia, Peipei Li, Ran He

    Abstract: Aesthetic quality assessment (AQA) is a challenging task due to complex aesthetic factors. Currently, it is common to conduct AQA using deep neural networks that require fixed-size inputs. Existing methods mainly transform images by resizing, cropping, and padding or employ adaptive pooling to alternately capture the aesthetic features from fixed-size inputs. However, these transformations potenti…

    Submitted 16 March, 2022; v1 submitted 4 August, 2019; originally announced August 2019.

    Comments: Accepted by IEEE Transactions on Neural Networks and Learning Systems

  27. arXiv:1510.02709  [pdf, other]

    cs.DC cs.LG cs.NE

    Large-scale Artificial Neural Network: MapReduce-based Deep Learning

    Authors: Kairan Sun, Xu Wei, Gengtao Jia, Risheng Wang, Ruizhi Li

    Abstract: Faced with continuously increasing scale of data, original back-propagation neural network based machine learning algorithm presents two non-trivial challenges: huge amount of data makes it difficult to maintain both efficiency and accuracy; redundant data aggravates the system workload. This project is mainly focused on the solution to the issues above, combining deep learning algorithm with clou…

    Submitted 9 October, 2015; originally announced October 2015.