
Showing 1–50 of 55 results for author: Ge, W

  1. arXiv:2407.10671

    cs.CL cs.AI

    Qwen2 Technical Report

    Authors: An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, Kai Dang, et al. (34 additional authors not shown)

    Abstract: This report introduces the Qwen2 series, the latest addition to our large language models and large multimodal models. We release a comprehensive suite of foundational and instruction-tuned language models, encompassing a parameter range from 0.5 to 72 billion, featuring dense models and a Mixture-of-Experts model. Qwen2 surpasses most prior open-weight models, including its predecessor Qwen1.5, a…

    Submitted 16 July, 2024; v1 submitted 15 July, 2024; originally announced July 2024.

    Comments: 25 pages, 1 figure

  2. arXiv:2405.17104

    cs.CV cs.AI cs.CL

    LLM-Optic: Unveiling the Capabilities of Large Language Models for Universal Visual Grounding

    Authors: Haoyu Zhao, Wenhang Ge, Ying-cong Chen

    Abstract: Visual grounding is an essential tool that links user-provided text queries with query-specific regions within an image. Despite advancements in visual grounding models, their ability to comprehend complex queries remains limited. To overcome this limitation, we introduce LLM-Optic, an innovative method that utilizes Large Language Models (LLMs) as an optical lens to enhance existing visual ground…

    Submitted 27 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Project Page: https://haoyu-zhao.github.io/LLM-Optic.github.io/

  3. arXiv:2405.15321

    cs.CV

    SG-Adapter: Enhancing Text-to-Image Generation with Scene Graph Guidance

    Authors: Guibao Shen, Luozhou Wang, Jiantao Lin, Wenhang Ge, Chaozhe Zhang, Xin Tao, Yuan Zhang, Pengfei Wan, Zhongyuan Wang, Guangyong Chen, Yijun Li, Ying-Cong Chen

    Abstract: Recent advancements in text-to-image generation have been propelled by the development of diffusion models and multi-modality learning. However, since text is typically represented sequentially in these models, it often falls short in providing accurate contextualization and structural control. As a result, the generated images do not consistently align with human expectations, especially in complex scenari…

    Submitted 24 May, 2024; originally announced May 2024.

  4. arXiv:2404.14329

    cs.CV

    X-Ray: A Sequential 3D Representation For Generation

    Authors: Tao Hu, Wenhang Ge, Yuyang Zhao, Gim Hee Lee

    Abstract: We introduce X-Ray, a novel 3D sequential representation inspired by the penetrability of x-ray scans. X-Ray transforms a 3D object into a series of surface frames at different layers, making it suitable for generating 3D models from images. Our method utilizes ray casting from the camera center to capture geometric and textured details, including depth, normal, and color, across all intersected s…

    Submitted 1 June, 2024; v1 submitted 22 April, 2024; originally announced April 2024.

  5. arXiv:2404.00343

    cs.RO

    Commonsense Scene Graph-based Target Localization for Object Search

    Authors: Wenqi Ge, Chao Tang, Hong Zhang

    Abstract: Object search is a fundamental skill for household robots, yet the core problem lies in the robot's ability to locate the target object accurately. The dynamic nature of household environments, characterized by the arbitrary placement of daily objects by users, makes it challenging to perform target localization. To efficiently locate the target object, the robot needs to be equipped with knowledg…

    Submitted 30 March, 2024; originally announced April 2024.

  6. arXiv:2402.14392

    cs.CV

    Reading Relevant Feature from Global Representation Memory for Visual Object Tracking

    Authors: Xinyu Zhou, Pinxue Guo, Lingyi Hong, Jinglun Li, Wei Zhang, Weifeng Ge, Wenqiang Zhang

    Abstract: Reference features from a template or historical frames are crucial for visual object tracking. Prior works utilize all features from a fixed template or memory for visual object tracking. However, due to the dynamic nature of videos, the required reference historical information for different search regions at different time steps is also inconsistent. Therefore, using all features in the templat…

    Submitted 26 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

    Comments: 9 pages, 5 figures, accepted by the Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023)

  7. arXiv:2401.10712

    cs.CV cs.AI cs.CL

    Q&A Prompts: Discovering Rich Visual Clues through Mining Question-Answer Prompts for VQA requiring Diverse World Knowledge

    Authors: Haibi Wang, Weifeng Ge

    Abstract: With the breakthrough of multi-modal large language models, answering complex visual questions that demand advanced reasoning abilities and world knowledge has become a much more important testbed for developing AI models than ever. However, equipping AI models with robust cross-modality reasoning ability remains challenging since the cognition scheme of humans has not been understood systematical…

    Submitted 14 July, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: Accepted by ECCV'24

  8. arXiv:2401.10711

    cs.CV cs.AI cs.CL

    Weakly Supervised Gaussian Contrastive Grounding with Large Multimodal Models for Video Question Answering

    Authors: Haibo Wang, Chenghang Lai, Yixuan Sun, Weifeng Ge

    Abstract: Video Question Answering (VideoQA) aims to answer natural language questions based on the information observed in videos. Despite the recent success of Large Multimodal Models (LMMs) in image-language understanding and reasoning, they deal with VideoQA insufficiently, by simply taking uniformly sampled frames as visual inputs, which ignores question-relevant visual clues. Moreover, there are no hu…

    Submitted 26 April, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

  9. arXiv:2401.07571

    cs.CV

    A Bi-Pyramid Multimodal Fusion Method for the Diagnosis of Bipolar Disorders

    Authors: Guoxin Wang, Sheng Shi, Shan An, Fengmei Fan, Wenshu Ge, Qi Wang, Feng Yu, Zhiren Wang

    Abstract: Previous research on the diagnosis of bipolar disorder has mainly focused on resting-state functional magnetic resonance imaging. However, the accuracy of these approaches cannot meet the requirements of clinical diagnosis. Efficient multimodal fusion strategies have great potential for applications in multimodal data and can further improve the performance of medical diagnosis models. In this work, we utilize bot…

    Submitted 15 January, 2024; originally announced January 2024.

    Comments: Accepted by IEEE ICASSP 2024

  10. arXiv:2311.13951

    cs.CL

    MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria

    Authors: Wentao Ge, Shunian Chen, Guiming Hardy Chen, Zhihong Chen, Junying Chen, Shuo Yan, Chenghao Zhu, Ziyue Lin, Wenya Xie, Xinyi Zhang, Yichen Chai, Xiaoyu Liu, Dingjie Song, Xidong Wang, Anningzhe Gao, Zhiyi Zhang, Jianquan Li, Xiang Wan, Benyou Wang

    Abstract: Multimodal large language models (MLLMs) (e.g., GPT-4V, LLaVA, and Claude-3) have broadened the scope of AI applications. Yet, evaluating their performance presents a significant challenge owing to the inherently subjective nature of tasks that do not yield clear-cut solutions, especially open-ended queries. Existing automatic evaluation methodologies are mainly limited in evaluating obje…

    Submitted 27 April, 2024; v1 submitted 23 November, 2023; originally announced November 2023.

    Comments: 23 pages

  11. arXiv:2309.16609

    cs.CL

    Qwen Technical Report

    Authors: Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, et al. (23 additional authors not shown)

    Abstract: Large language models (LLMs) have revolutionized the field of artificial intelligence, enabling natural language processing tasks that were previously thought to be exclusive to humans. In this work, we introduce Qwen, the first installment of our large language model series. Qwen is a comprehensive language model series that encompasses distinct models with varying parameter counts. It includes Q…

    Submitted 28 September, 2023; originally announced September 2023.

    Comments: 59 pages, 5 figures

  12. arXiv:2309.09586

    cs.CR cs.SD eess.AS

    Spoofing attack augmentation: can differently-trained attack models improve generalisation?

    Authors: Wanying Ge, Xin Wang, Junichi Yamagishi, Massimiliano Todisco, Nicholas Evans

    Abstract: A reliable deepfake detector or spoofing countermeasure (CM) should be robust in the face of unpredictable spoofing attacks. To encourage the learning of more generalisable artefacts, rather than those specific only to known attacks, CMs are usually exposed to a broad variety of different attacks during training. Even so, the performance of deep-learning-based CM solutions is known to vary, some…

    Submitted 8 January, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: Accepted to ICASSP 2024

  13. arXiv:2308.14147

    cs.HC

    Adaptive Assessment of Visualization Literacy

    Authors: Yuan Cui, Lily W. Ge, Yiren Ding, Fumeng Yang, Lane Harrison, Matthew Kay

    Abstract: Visualization literacy is an essential skill for accurately interpreting data to inform critical decisions. Consequently, it is vital to understand the evolution of this ability and devise targeted interventions to enhance it, requiring concise and repeatable assessments of visualization literacy for individuals. However, current assessments, such as the Visualization Literacy Assessment Test (VLA…

    Submitted 27 August, 2023; originally announced August 2023.

  14. arXiv:2308.11062

    cs.CV cs.LG

    UnLoc: A Unified Framework for Video Localization Tasks

    Authors: Shen Yan, Xuehan Xiong, Arsha Nagrani, Anurag Arnab, Zhonghao Wang, Weina Ge, David Ross, Cordelia Schmid

    Abstract: While large-scale image-text pretrained models such as CLIP have been used for multiple video-level tasks on trimmed videos, their use for temporal localization in untrimmed videos is still a relatively unexplored task. We design a new approach for this called UnLoc, which uses pretrained image and text towers, and feeds tokens to a video-text fusion model. The outputs of the fusion module are then…

    Submitted 21 August, 2023; originally announced August 2023.

    Comments: ICCV 2023

  15. arXiv:2307.13204

    cs.RO

    GraspGPT: Leveraging Semantic Knowledge from a Large Language Model for Task-Oriented Grasping

    Authors: Chao Tang, Dehao Huang, Wenqi Ge, Weiyu Liu, Hong Zhang

    Abstract: Task-oriented grasping (TOG) refers to the problem of predicting grasps on an object that enable subsequent manipulation tasks. To model the complex relationships between objects, tasks, and grasps, existing methods incorporate semantic knowledge as priors into TOG pipelines. However, the existing semantic knowledge is typically constructed based on closed-world concept sets, restraining the gener…

    Submitted 20 September, 2023; v1 submitted 24 July, 2023; originally announced July 2023.

    Comments: 15 pages, 8 figures

  16. arXiv:2306.14408

    cs.CV

    Text-Anchored Score Composition: Tackling Condition Misalignment in Text-to-Image Diffusion Models

    Authors: Luozhou Wang, Guibao Shen, Wenhang Ge, Guangyong Chen, Yijun Li, Ying-cong Chen

    Abstract: Text-to-image diffusion models have advanced towards more controllable generation via supporting various additional conditions (e.g., depth map, bounding box) beyond text. However, these models are learned based on the premise of perfect alignment between the text and extra conditions. If this alignment is not satisfied, the final output could be either dominated by one condition, or ambiguity may…

    Submitted 14 July, 2024; v1 submitted 25 June, 2023; originally announced June 2023.

  17. arXiv:2306.12446

    q-fin.ST cs.LG q-fin.CP q-fin.RM

    Comparing Deep Learning Models for the Task of Volatility Prediction Using Multivariate Data

    Authors: Wenbo Ge, Pooia Lalbakhsh, Leigh Isai, Artem Lensky, Hanna Suominen

    Abstract: This study aims to compare multiple deep learning-based forecasters for the task of predicting volatility using multivariate data. The paper evaluates a range of models, starting from simpler and shallower ones and progressing to deeper and more complex architectures. Additionally, the performance of these models is compared against naive predictions and variations of classical GARCH models. The…

    Submitted 23 June, 2023; v1 submitted 20 June, 2023; originally announced June 2023.

  18. arXiv:2306.07655

    eess.AS cs.CR cs.LG

    Malafide: a novel adversarial convolutive noise attack against deepfake and spoofing detection systems

    Authors: Michele Panariello, Wanying Ge, Hemlata Tak, Massimiliano Todisco, Nicholas Evans

    Abstract: We present Malafide, a universal adversarial attack against automatic speaker verification (ASV) spoofing countermeasures (CMs). By introducing convolutional noise using an optimised linear time-invariant filter, Malafide attacks can be used to compromise CM reliability while preserving other speech attributes such as quality and the speaker's voice. In contrast to other adversarial attacks propos…

    Submitted 13 June, 2023; originally announced June 2023.

    Comments: Accepted at INTERSPEECH 2023

  19. arXiv:2306.07186

    cs.CV cs.LG eess.IV

    CD-CTFM: A Lightweight CNN-Transformer Network for Remote Sensing Cloud Detection Fusing Multiscale Features

    Authors: Wenxuan Ge, Xubing Yang, Li Zhang

    Abstract: Clouds in remote sensing images inevitably affect information extraction, which hinders subsequent analysis of satellite images. Hence, cloud detection is a necessary preprocessing procedure. However, existing methods require extensive computation and numerous parameters. In this letter, a lightweight CNN-Transformer network, CD-CTFM, is proposed to solve the problem. CD-CTFM is based on encoder-decoder…

    Submitted 12 June, 2023; originally announced June 2023.

  20. arXiv:2306.04657

    cs.CL cs.AI

    Improving Empathetic Dialogue Generation by Dynamically Infusing Commonsense Knowledge

    Authors: Hua Cai, Xuli Shen, Qing Xu, Weilin Shen, Xiaomei Wang, Weifeng Ge, Xiaoqing Zheng, Xiangyang Xue

    Abstract: In empathetic conversations, individuals express their empathy towards others. Previous work has mainly focused on generating empathetic responses by utilizing the speaker's emotion. Besides, external commonsense knowledge has been applied to enhance the system's understanding of the speaker's situation. However, given an event, a commonsense knowledge base contains various relations, potentially l…

    Submitted 24 May, 2023; originally announced June 2023.

    Comments: Accepted by ACL 2023. arXiv admin note: substantial text overlap with arXiv:2109.05739 by other authors

  21. arXiv:2303.10840

    cs.CV cs.LG

    Ref-NeuS: Ambiguity-Reduced Neural Implicit Surface Learning for Multi-View Reconstruction with Reflection

    Authors: Wenhang Ge, Tao Hu, Haoyu Zhao, Shu Liu, Ying-Cong Chen

    Abstract: Neural implicit surface learning has shown significant progress in multi-view 3D reconstruction, where an object is represented by multilayer perceptrons that provide continuous implicit surface representation and view-dependent radiance. However, current methods often fail to accurately reconstruct reflective surfaces, leading to severe ambiguity. To overcome this issue, we propose Ref-NeuS, whic…

    Submitted 17 July, 2023; v1 submitted 19 March, 2023; originally announced March 2023.

    Comments: ICCV 2023, Project webpage: https://g3956.github.io/

  22. arXiv:2212.09247

    cs.CV cs.LG eess.IV

    ColoristaNet for Photorealistic Video Style Transfer

    Authors: Xiaowen Qiu, Ruize Xu, Boan He, Yingtao Zhang, Wenqiang Zhang, Weifeng Ge

    Abstract: Photorealistic style transfer aims to transfer the artistic style of an image onto an input image or video while keeping photorealism. In this paper, we argue that it is the summary statistics matching scheme in existing algorithms that leads to unrealistic stylization. To avoid employing the popular Gram loss, we propose a self-supervised style transfer framework, which contains a style removal part an…

    Submitted 21 December, 2022; v1 submitted 18 December, 2022; originally announced December 2022.

    Comments: 30 pages, 29 figures

  23. arXiv:2212.04408

    cs.CV cs.AI cs.CL cs.LG

    OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models

    Authors: Jinze Bai, Rui Men, Hao Yang, Xuancheng Ren, Kai Dang, Yichang Zhang, Xiaohuan Zhou, Peng Wang, Sinan Tan, An Yang, Zeyu Cui, Yu Han, Shuai Bai, Wenbin Ge, Jianxin Ma, Junyang Lin, Jingren Zhou, Chang Zhou

    Abstract: Generalist models, which are capable of performing diverse multi-modal tasks in a task-agnostic way within a single model, have been explored recently. Although they are a potential route toward general-purpose AI, existing generalist models are still at an early stage, with limited modality and task coverage. To empower multi-modal task-scaling and speed up this line of research, we rele…

    Submitted 8 December, 2022; originally announced December 2022.

  24. arXiv:2211.15320

    cs.CV

    RankDNN: Learning to Rank for Few-shot Learning

    Authors: Qianyu Guo, Hongtong Gong, Xujun Wei, Yanwei Fu, Weifeng Ge, Yizhou Yu, Wenqiang Zhang

    Abstract: This paper introduces a new few-shot learning pipeline that casts relevance ranking for image retrieval as binary ranking relation classification. In comparison to image classification, ranking relation classification is sample efficient and domain agnostic. Besides, it provides a new perspective on few-shot learning and is complementary to state-of-the-art methods. The core component of our deep…

    Submitted 29 November, 2022; v1 submitted 28 November, 2022; originally announced November 2022.

    Comments: 12 pages, 4 figures. Accepted to AAAI 2023. The code is available at: https://github.com/guoqianyu-alberta/RankDNN

  25. arXiv:2211.05207

    cs.CV cs.AI cs.LG

    Interpretable Machine Learning System to EEG Patterns on the Ictal-Interictal-Injury Continuum

    Authors: Alina Jade Barnett, Zhicheng Guo, Jin Jing, Wendong Ge, Cynthia Rudin, M. Brandon Westover

    Abstract: In intensive care units (ICUs), critically ill patients are monitored with electroencephalograms (EEGs) to prevent serious brain injury. The number of patients who can be monitored is constrained by the availability of trained physicians to read EEGs, and EEG interpretation can be subjective and prone to inter-observer variability. Automated deep learning systems for EEG could reduce human bias an…

    Submitted 11 April, 2023; v1 submitted 9 November, 2022; originally announced November 2022.

    Comments: 20 pages including appendices, 7 figures, submitted for peer review

    ACM Class: I.2.6; I.4.9; I.5.4

  26. arXiv:2210.14229

    cs.LG cs.AI cs.CR

    Causal Information Bottleneck Boosts Adversarial Robustness of Deep Neural Network

    Authors: Huan Hua, Jun Yan, Xi Fang, Weiquan Huang, Huilin Yin, Wancheng Ge

    Abstract: The information bottleneck (IB) method is a feasible defense solution against adversarial attacks in deep learning. However, this method suffers from spurious correlations, which limit further improvement of its adversarial robustness. In this paper, we incorporate causal inference into the IB framework to alleviate this problem. Specifically, we divide the features o…

    Submitted 25 October, 2022; originally announced October 2022.

  27. arXiv:2209.13307

    cs.CV cs.CL cs.IR

    Text-Adaptive Multiple Visual Prototype Matching for Video-Text Retrieval

    Authors: Chengzhi Lin, Ancong Wu, Junwei Liang, Jun Zhang, Wenhang Ge, Wei-Shi Zheng, Chunhua Shen

    Abstract: Cross-modal retrieval between videos and texts has gained increasing research interest due to the rapid emergence of videos on the web. Generally, a video contains rich instance and event information and the query text only describes a part of the information. Thus, a video can correspond to multiple different text descriptions and queries. We call this phenomenon the "Video-Text Correspondence A…

    Submitted 27 September, 2022; originally announced September 2022.

    Comments: NeurIPS 2022

    Journal ref: NeurIPS 2022

  28. arXiv:2206.03727

    cs.CV

    Wavelet Regularization Benefits Adversarial Training

    Authors: Jun Yan, Huilin Yin, Xiaoyang Deng, Ziming Zhao, Wancheng Ge, Hao Zhang, Gerhard Rigoll

    Abstract: Adversarial training methods are state-of-the-art (SOTA) empirical defense methods against adversarial examples. Many regularization methods have been proven to be effective with the combination of adversarial training. Nevertheless, such regularization methods are implemented in the time domain. Since adversarial vulnerability can be regarded as a high-frequency phenomenon, it is essential to reg…

    Submitted 8 June, 2022; originally announced June 2022.

    Comments: Preprint version

  29. arXiv:2203.15210

    cs.CV

    Camera-Conditioned Stable Feature Generation for Isolated Camera Supervised Person Re-IDentification

    Authors: Chao Wu, Wenhang Ge, Ancong Wu, Xiaobin Chang

    Abstract: To learn camera-view invariant features for person Re-IDentification (Re-ID), the cross-camera image pairs of each person play an important role. However, such cross-view training samples could be unavailable under the ISolated Camera Supervised (ISCS) setting, e.g., a surveillance system deployed across distant scenes. To handle this challenging problem, a new pipeline is introduced by synthesizi…

    Submitted 4 April, 2022; v1 submitted 28 March, 2022; originally announced March 2022.

    Comments: 11 pages, 9 figures, accepted by CVPR 2022

  30. arXiv:2203.09463

    cs.CV

    FERV39k: A Large-Scale Multi-Scene Dataset for Facial Expression Recognition in Videos

    Authors: Yan Wang, Yixuan Sun, Yiwen Huang, Zhongying Liu, Shuyong Gao, Wei Zhang, Weifeng Ge, Wenqiang Zhang

    Abstract: Current benchmarks for facial expression recognition (FER) mainly focus on static images, while there are limited datasets for FER in videos. It is still ambiguous to evaluate whether performances of existing methods remain satisfactory in real-world application-oriented scenes. For example, the "Happy" expression with high intensity in Talk-Show is more discriminating than the same expression wit…

    Submitted 20 March, 2022; v1 submitted 17 March, 2022; originally announced March 2022.

    Comments: Accepted for CVPR 2022

  31. arXiv:2203.09064

    cs.CV cs.AI cs.LG

    Attribute Surrogates Learning and Spectral Tokens Pooling in Transformers for Few-shot Learning

    Authors: Yangji He, Weihan Liang, Dongyang Zhao, Hong-Yu Zhou, Weifeng Ge, Yizhou Yu, Wenqiang Zhang

    Abstract: This paper presents new hierarchically cascaded transformers that can improve data efficiency through attribute surrogates learning and spectral tokens pooling. Vision transformers have recently been thought of as a promising alternative to convolutional neural networks for visual recognition. However, when training data are insufficient, they tend to overfit and show inferior performance. To imp…

    Submitted 16 March, 2022; originally announced March 2022.

    Comments: To appear in CVPR 2022; code is released at https://github.com/StomachCold/HCTransformers

  32. arXiv:2203.06935

    cs.MM

    A Systematic Review on Affective Computing: Emotion Models, Databases, and Recent Advances

    Authors: Yan Wang, Wei Song, Wei Tao, Antonio Liotta, Dawei Yang, Xinlei Li, Shuyong Gao, Yixuan Sun, Weifeng Ge, Wei Zhang, Wenqiang Zhang

    Abstract: Affective computing plays a key role in human-computer interactions, entertainment, teaching, safe driving, and multimedia integration. Major breakthroughs have been made recently in the areas of affective computing (i.e., emotion recognition and sentiment analysis). Affective computing is realized based on unimodal or multimodal data, primarily consisting of physical information (e.g., textual, a…

    Submitted 20 March, 2022; v1 submitted 14 March, 2022; originally announced March 2022.

    Comments: Accepted for Information Fusion

  33. Effects of Epileptiform Activity on Discharge Outcome in Critically Ill Patients

    Authors: Harsh Parikh, Kentaro Hoffman, Haoqi Sun, Wendong Ge, Jin Jing, Rajesh Amerineni, Lin Liu, Jimeng Sun, Sahar Zafar, Aaron Struck, Alexander Volfovsky, Cynthia Rudin, M. Brandon Westover

    Abstract: Epileptiform activity (EA) is associated with worse outcomes including increased risk of disability and death. However, the effect of EA on the neurologic outcome is confounded by the feedback between treatment with anti-seizure medications (ASM) and EA burden. A randomized clinical trial is challenging due to the sequential nature of EA-ASM feedback, as well as ethical reasons. However, some mech…

    Submitted 11 March, 2023; v1 submitted 9 March, 2022; originally announced March 2022.

    Comments: 4 figures

  34. arXiv:2202.13693

    eess.AS cs.SD

    Explainable deepfake and spoofing detection: an attack analysis using SHapley Additive exPlanations

    Authors: Wanying Ge, Massimiliano Todisco, Nicholas Evans

    Abstract: Despite several years of research in deepfake and spoofing detection for automatic speaker verification, little is known about the artefacts that classifiers use to distinguish between bona fide and spoofed utterances. An understanding of these is crucial to the design of trustworthy, explainable solutions. In this paper we report an extension of our previous work to better understand classifier b…

    Submitted 4 May, 2022; v1 submitted 28 February, 2022; originally announced February 2022.

    Comments: Accepted to Speaker Odyssey Workshop 2022

  35. arXiv:2202.00478

    cs.CL

    NeuraHealth: An Automated Screening Pipeline to Detect Undiagnosed Cognitive Impairment in Electronic Health Records with Deep Learning and Natural Language Processing

    Authors: Tanish Tyagi, Colin G. Magdamo, Ayush Noori, Zhaozhi Li, Xiao Liu, Mayuresh Deodhar, Zhuoqiao Hong, Wendong Ge, Elissa M. Ye, Yi-han Sheu, Haitham Alabsi, Laura Brenner, Gregory K. Robbins, Sahar Zafar, Nicole Benson, Lidia Moura, John Hsu, Alberto Serrano-Pozo, Dimitry Prokopenko, Rudolph E. Tanzi, Bradley T. Hyman, Deborah Blacker, Shibani S. Mukerji, M. Brandon Westover, Sudeshna Das

    Abstract: Dementia-related cognitive impairment (CI) is a neurodegenerative disorder, affecting over 55 million people worldwide and growing rapidly at the rate of one new case every 3 seconds. 75% of cases go undiagnosed globally, with up to 90% in low- and middle-income countries, leading to an estimated annual worldwide cost of USD 1.3 trillion, forecasted to reach 2.8 trillion by 2030. With no cure, a recurr…

    Submitted 20 June, 2022; v1 submitted 12 January, 2022; originally announced February 2022.

  36. arXiv:2111.09115

    cs.CL cs.LG

    Using Deep Learning to Identify Patients with Cognitive Impairment in Electronic Health Records

    Authors: Tanish Tyagi, Colin G. Magdamo, Ayush Noori, Zhaozhi Li, Xiao Liu, Mayuresh Deodhar, Zhuoqiao Hong, Wendong Ge, Elissa M. Ye, Yi-han Sheu, Haitham Alabsi, Laura Brenner, Gregory K. Robbins, Sahar Zafar, Nicole Benson, Lidia Moura, John Hsu, Alberto Serrano-Pozo, Dimitry Prokopenko, Rudolph E. Tanzi, Bradley T. Hyman, Deborah Blacker, Shibani S. Mukerji, M. Brandon Westover, Sudeshna Das

    Abstract: Dementia is a neurodegenerative disorder that causes cognitive decline and affects more than 50 million people worldwide. Dementia is under-diagnosed by healthcare professionals: only one in four people who suffer from dementia are diagnosed. Even when a diagnosis is made, it may not be entered as a structured International Classification of Diseases (ICD) diagnosis code in a patient's charts. In…

    Submitted 12 November, 2021; originally announced November 2021.

    Comments: Machine Learning for Health (ML4H) - Extended Abstract

  37. Multi-agent Reinforcement Learning for Cooperative Lane Changing of Connected and Autonomous Vehicles in Mixed Traffic

    Authors: Wei Zhou, Dong Chen, Jun Yan, Zhaojian Li, Huilin Yin, Wanchen Ge

    Abstract: Autonomous driving has attracted significant research interest in the past two decades as it offers many potential benefits, including releasing drivers from exhausting driving and mitigating traffic congestion, among others. Despite promising progress, lane-changing remains a great challenge for autonomous vehicles (AV), especially in mixed and dynamic traffic scenarios. Recently, reinforcement…

    Submitted 5 January, 2024; v1 submitted 11 November, 2021; originally announced November 2021.

    Comments: This paper was published in Autonomous Intelligent Systems (Volume 2, article number 5, 2022)

    Journal ref: Autonomous Intelligent Systems, 2(1) (2022)

  38. arXiv:2108.04409

    cs.LG cs.AI cs.CR cs.CV

    On Procedural Adversarial Noise Attack And Defense

    Authors: Jun Yan, Xiaoyang Deng, Huilin Yin, Wancheng Ge

    Abstract: Deep Neural Networks (DNNs) are vulnerable to adversarial examples, which can lead neural networks to make prediction errors through small perturbations of the input images. Researchers have devoted considerable effort to universal adversarial perturbations (UAPs), which are gradient-free and require little prior knowledge of data distributions. Procedural adversarial noise attack is…

    Submitted 26 August, 2021; v1 submitted 9 August, 2021; originally announced August 2021.

    Comments: Remove theoretical analysis and focus on the empirical study

  39. arXiv:2108.00580

    cs.CV

    GraphFPN: Graph Feature Pyramid Network for Object Detection

    Authors: Gangming Zhao, Weifeng Ge, Yizhou Yu

    Abstract: Feature pyramids have been proven powerful in image understanding tasks that require multi-scale features. State-of-the-art methods for multi-scale feature learning focus on performing feature interactions across space and scales using neural networks with a fixed topology. In this paper, we propose graph feature pyramid networks that are capable of adapting their topological structures to varying…

    Submitted 8 January, 2022; v1 submitted 1 August, 2021; originally announced August 2021.

    Comments: Accepted by ICCV 2021; code is updated at https://github.com/GangmingZhao/GraphFPN-Graph-Feature-Pyramid-Network-for-Object-Detection

  40. arXiv:2108.00211  [pdf, other]

    cs.CV

    Multi-scale Matching Networks for Semantic Correspondence

    Authors: Dongyang Zhao, Ziyang Song, Zhenghao Ji, Gangming Zhao, Weifeng Ge, Yizhou Yu

    Abstract: Deep features have been proven powerful in building accurate dense semantic correspondences in various previous works. However, the multi-scale and pyramidal hierarchy of convolutional neural networks has not been well studied to learn discriminative pixel-level features for semantic correspondence. In this paper, we propose a multi-scale matching network that is sensitive to tiny semantic differe…

    Submitted 27 August, 2021; v1 submitted 31 July, 2021; originally announced August 2021.

    Comments: Accepted to appear in ICCV 2021

  41. Cross-Camera Feature Prediction for Intra-Camera Supervised Person Re-identification across Distant Scenes

    Authors: Wenhang Ge, Chunyan Pan, Ancong Wu, Hongwei Zheng, Wei-Shi Zheng

    Abstract: Person re-identification (Re-ID) aims to match person images across non-overlapping camera views. The majority of Re-ID methods focus on small-scale surveillance systems in which each pedestrian is captured in different camera views of adjacent scenes. However, in large-scale surveillance systems that cover larger areas, it is required to track a pedestrian of interest across distant scenes (e.g.,…

    Submitted 29 July, 2021; originally announced July 2021.

    Comments: 10 pages, 6 figures, accepted by ACM International Conference on Multimedia

  42. arXiv:2104.03123  [pdf, other]

    cs.LG cs.SD eess.AS

    Partially-Connected Differentiable Architecture Search for Deepfake and Spoofing Detection

    Authors: Wanying Ge, Michele Panariello, Jose Patino, Massimiliano Todisco, Nicholas Evans

    Abstract: This paper reports the first successful application of a differentiable architecture search (DARTS) approach to the deepfake and spoofing detection problems. An example of neural architecture search, DARTS operates upon a continuous, differentiable search space which enables both the architecture and parameters to be optimised via gradient descent. Solutions based on partially-connected DARTS use…

    Submitted 30 June, 2021; v1 submitted 7 April, 2021; originally announced April 2021.

    Comments: Accepted to INTERSPEECH 2021

  43. arXiv:2011.08609  [pdf, other]

    cs.SD eess.AS

    Accent and Speaker Disentanglement in Many-to-many Voice Conversion

    Authors: Zhichao Wang, Wenshuo Ge, Xiong Wang, Shan Yang, Wendong Gan, Haitao Chen, Hai Li, Lei Xie, Xiulin Li

    Abstract: This paper proposes an interesting voice and accent joint conversion approach, which can convert an arbitrary source speaker's voice to a target speaker with non-native accent. This problem is challenging as each target speaker only has training data in native accent and we need to disentangle accent and speaker information in the conversion model training and re-combine them in the conversion sta…

    Submitted 17 November, 2020; originally announced November 2020.

    Comments: Accepted to ISCSLP 2021

  44. arXiv:2011.06489  [pdf, other]

    cs.CL

    Natural Language Processing to Detect Cognitive Concerns in Electronic Health Records Using Deep Learning

    Authors: Zhuoqiao Hong, Colin G. Magdamo, Yi-han Sheu, Prathamesh Mohite, Ayush Noori, Elissa M. Ye, Wendong Ge, Haoqi Sun, Laura Brenner, Gregory Robbins, Shibani Mukerji, Sahar Zafar, Nicole Benson, Lidia Moura, John Hsu, Bradley T. Hyman, Michael B. Westover, Deborah Blacker, Sudeshna Das

    Abstract: Dementia is under-recognized in the community, under-diagnosed by healthcare professionals, and under-coded in claims data. Information on cognitive dysfunction, however, is often found in unstructured clinician notes within medical records but manual review by experts is time-consuming and often prone to errors. Automated mining of these notes presents a potential opportunity to label patients wi…

    Submitted 12 November, 2020; originally announced November 2020.

    Comments: Machine Learning for Health (ML4H) at NeurIPS 2020 - Extended Abstract

    MSC Class: I.2.7

  45. arXiv:2008.13768  [pdf, other]

    cs.SE cs.CR

    A3Ident: A Two-phased Approach to Identify the Leading Authors of Android Apps

    Authors: Wei Wang, Guozhu Meng, Haoyu Wang, Kai Chen, Weimin Ge, Xiaohong Li

    Abstract: Authorship identification is the process of identifying and classifying authors through given codes. Authorship identification can be used in a wide range of software domains, e.g., code authorship disputes, plagiarism detection, exposure of attackers' identity. Besides the inherent challenges from legacy software development, framework programming and crowdsourcing mode in Android raise the diffi…

    Submitted 31 August, 2020; originally announced August 2020.

    Comments: 12 pages

    Journal ref: ICSME 2020: 36th IEEE International Conference on Software Maintenance and Evolution

  46. arXiv:1910.02624  [pdf, other]

    cs.CV

    Label-PEnet: Sequential Label Propagation and Enhancement Networks for Weakly Supervised Instance Segmentation

    Authors: Weifeng Ge, Sheng Guo, Weilin Huang, Matthew R. Scott

    Abstract: Weakly-supervised instance segmentation aims to detect and segment object instances precisely, given image-level labels only. Unlike previous methods which are composed of multiple offline stages, we propose Sequential Label Propagation and Enhancement Networks (referred to as Label-PEnet) that progressively transform image-level labels to pixel-wise labels in a coarse-to-fine manner. We design four c…

    Submitted 24 April, 2020; v1 submitted 7 October, 2019; originally announced October 2019.

    Comments: Rectify some typos in arXiv title

  47. arXiv:1903.07796  [pdf, other]

    cs.CR

    Umbrella: Enabling ISPs to Offer Readily Deployable and Privacy-Preserving DDoS Prevention Services

    Authors: Zhuotao Liu, Yuan Cao, Min Zhu, Wei Ge

    Abstract: Defending against distributed denial of service (DDoS) attacks in the Internet is a fundamental problem. However, recent industrial interviews with over 100 security experts from more than ten industry segments indicate that DDoS problems have not been fully addressed. The reasons are twofold. On one hand, many academic proposals that are provably secure witness little real-world deployment. On th…

    Submitted 6 April, 2019; v1 submitted 18 March, 2019; originally announced March 2019.

  48. arXiv:1903.02827  [pdf, other]

    cs.CV

    Weakly Supervised Complementary Parts Models for Fine-Grained Image Classification from the Bottom Up

    Authors: Weifeng Ge, Xiangru Lin, Yizhou Yu

    Abstract: Given a training dataset composed of images and corresponding category labels, deep convolutional neural networks show a strong ability in mining discriminative parts for image classification. However, deep convolutional neural networks trained with image level labels only tend to focus on the most discriminative parts while missing other object parts, which could provide complementary information…

    Submitted 7 March, 2019; originally announced March 2019.

    Comments: Accepted to appear in CVPR 2019

  49. arXiv:1810.12582  [pdf, other]

    cs.LG stat.ML

    DSKG: A Deep Sequential Model for Knowledge Graph Completion

    Authors: Lingbing Guo, Qingheng Zhang, Weiyi Ge, Wei Hu, Yuzhong Qu

    Abstract: Knowledge graph (KG) completion aims to fill the missing facts in a KG, where a fact is represented as a triple in the form of $(subject, relation, object)$. Current KG completion models compel two-thirds of a triple provided (e.g., $subject$ and $relation$) to predict the remaining one. In this paper, we propose a new model, which uses a KG-specific multi-layer recurrent neural network (RNN) to m…

    Submitted 30 December, 2018; v1 submitted 30 October, 2018; originally announced October 2018.

    Comments: CCKS (China Conference on Knowledge Graph and Semantic Computing) Best English Paper Award 2018

  50. arXiv:1810.06951  [pdf, other]

    cs.CV

    Deep Metric Learning with Hierarchical Triplet Loss

    Authors: Weifeng Ge, Weilin Huang, Dengke Dong, Matthew R. Scott

    Abstract: We present a novel hierarchical triplet loss (HTL) capable of automatically collecting informative training samples (triplets) via a defined hierarchical tree that encodes global context information. This allows us to cope with the main limitation of random sampling in training a conventional triplet loss, which is a central issue for deep metric learning. Our main contributions are two-fold. (i)…

    Submitted 16 October, 2018; originally announced October 2018.

    Comments: Published in ECCV 2018