Showing 1–50 of 131 results for author: Shi, D

  1. arXiv:2407.05320  [pdf, other]

    cs.AI

    KAE: A Property-based Method for Knowledge Graph Alignment and Extension

    Authors: Daqian Shi, Xiaoyue Li, Fausto Giunchiglia

    Abstract: A common solution to the semantic heterogeneity problem is to perform knowledge graph (KG) extension exploiting the information encoded in one or more candidate KGs, where the alignment between the reference KG and candidate KGs is considered the critical procedure. However, existing KG alignment methods mainly rely on entity type (etype) label matching as a prerequisite, which is poorly performin…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2405.02463

  2. arXiv:2406.18406  [pdf, other]

    cs.CL cs.AI

    IRCAN: Mitigating Knowledge Conflicts in LLM Generation via Identifying and Reweighting Context-Aware Neurons

    Authors: Dan Shi, Renren Jin, Tianhao Shen, Weilong Dong, Xinwei Wu, Deyi Xiong

    Abstract: It is widely acknowledged that large language models (LLMs) encode a vast reservoir of knowledge after being trained on mass data. Recent studies disclose knowledge conflicts in LLM generation, wherein outdated or incorrect parametric knowledge (i.e., encoded knowledge) contradicts new knowledge provided in the context. To mitigate such knowledge conflicts, we propose a novel framework, IRCAN (Ide…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 19 pages, 13 figures, 5 tables

  3. arXiv:2406.12496  [pdf, other]

    cs.CV

    Reparameterizable Dual-Resolution Network for Real-time Semantic Segmentation

    Authors: Guoyu Yang, Yuan Wang, Daming Shi

    Abstract: Semantic segmentation plays a key role in applications such as autonomous driving and medical image. Although existing real-time semantic segmentation models achieve a commendable balance between accuracy and speed, their multi-path blocks still affect overall speed. To address this issue, this study proposes a Reparameterizable Dual-Resolution Network (RDRNet) dedicated to real-time semantic segm…

    Submitted 18 June, 2024; originally announced June 2024.

  4. arXiv:2406.11678  [pdf, other]

    cs.IR cs.CL

    TourRank: Utilizing Large Language Models for Documents Ranking with a Tournament-Inspired Strategy

    Authors: Yiqun Chen, Qi Liu, Yi Zhang, Weiwei Sun, Daiting Shi, Jiaxin Mao, Dawei Yin

    Abstract: Large Language Models (LLMs) are increasingly employed in zero-shot documents ranking, yielding commendable results. However, several significant challenges still persist in LLMs for ranking: (1) LLMs are constrained by limited input length, precluding them from processing a large number of documents simultaneously; (2) The output document sequence is influenced by the input order of documents, re…

    Submitted 17 June, 2024; originally announced June 2024.

  5. arXiv:2406.06644  [pdf, other]

    cs.LG cs.AI

    Latent Diffusion Model-Enabled Real-Time Semantic Communication Considering Semantic Ambiguities and Channel Noises

    Authors: Jianhua Pei, Cheng Feng, Ping Wang, Hina Tabassum, Dongyuan Shi

    Abstract: Semantic communication (SemCom) has emerged as a new paradigm for 6G communication, with deep learning (DL) models being one of the key drives to shift from the accuracy of bit/symbol to the semantics and pragmatics of data. Nevertheless, DL-based SemCom systems often face performance bottlenecks due to overfitting, poor generalization, and sensitivity to outliers. Furthermore, the varying-fading…

    Submitted 24 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

  6. arXiv:2406.01993  [pdf]

    eess.IV cs.CV

    Choroidal Vessel Segmentation on Indocyanine Green Angiography Images via Human-in-the-Loop Labeling

    Authors: Ruoyu Chen, Ziwei Zhao, Mayinuer Yusufu, Xianwen Shang, Danli Shi, Mingguang He

    Abstract: Human-in-the-loop (HITL) strategy has been recently introduced into the field of medical image processing. Indocyanine green angiography (ICGA) stands as a well-established examination for visualizing choroidal vasculature and detecting chorioretinal diseases. However, the intricate nature of choroidal vascular networks makes large-scale manual segmentation of ICGA images challenging. Thus, the st…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: 25 pages, 4 figures

  7. arXiv:2405.12521  [pdf, other]

    cs.LG

    Unleash Graph Neural Networks from Heavy Tuning

    Authors: Lequan Lin, Dai Shi, Andi Han, Zhiyong Wang, Junbin Gao

    Abstract: Graph Neural Networks (GNNs) are deep-learning architectures designed for graph-type data, where understanding relationships among individual observations is crucial. However, achieving promising GNN performance, especially on unseen data, requires comprehensive hyperparameter tuning and meticulous training. Unfortunately, these processes come with high computational costs and significant human ef…

    Submitted 21 May, 2024; originally announced May 2024.

  8. arXiv:2405.12496  [pdf, other]

    eess.AS cs.NI cs.SD eess.SP

    A Survey of Integrating Wireless Technology into Active Noise Control

    Authors: Xiaoyi Shen, Dongyuan Shi, Zhengding Luo, Junwei Ji, Woon-Seng Gan

    Abstract: Active Noise Control (ANC) is a widely adopted technology for reducing environmental noise across various scenarios. This paper focuses on enhancing noise reduction performance, particularly through the refinement of signal quality fed into ANC systems. We discuss the main wireless technique integrated into the ANC system, equipped with some innovative algorithms, in diverse environments. Instead…

    Submitted 21 May, 2024; originally announced May 2024.

  9. arXiv:2405.11338  [pdf]

    cs.CV cs.AI

    EyeFound: A Multimodal Generalist Foundation Model for Ophthalmic Imaging

    Authors: Danli Shi, Weiyi Zhang, Xiaolan Chen, Yexin Liu, Jiancheng Yang, Siyu Huang, Yih Chung Tham, Yingfeng Zheng, Mingguang He

    Abstract: Artificial intelligence (AI) is vital in ophthalmology, tackling tasks like diagnosis, classification, and visual question answering (VQA). However, existing AI models in this domain often require extensive annotation and are task-specific, limiting their clinical utility. While recent developments have brought about foundation models for ophthalmology, they are limited by the need to train separa…

    Submitted 21 May, 2024; v1 submitted 18 May, 2024; originally announced May 2024.

    Comments: 21 pages, 2 figures, 4 tables

  10. arXiv:2405.09841  [pdf, other]

    stat.ML cs.LG

    Simultaneous Identification of Sparse Structures and Communities in Heterogeneous Graphical Models

    Authors: Dapeng Shi, Tiandong Wang, Zhiliang Ying

    Abstract: Exploring and detecting community structures hold significant importance in genetics, social sciences, neuroscience, and finance. Especially in graphical models, community detection can encourage the exploration of sets of variables with group-like properties. In this paper, within the framework of Gaussian graphical models, we introduce a novel decomposition of the underlying graphical structure…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: 61 pages, 11 figures, 4 tables

  11. arXiv:2405.07468  [pdf]

    cs.CL cs.AI

    Evaluating large language models in medical applications: a survey

    Authors: Xiaolan Chen, Jiayang Xiang, Shanfu Lu, Yexin Liu, Mingguang He, Danli Shi

    Abstract: Large language models (LLMs) have emerged as powerful tools with transformative potential across numerous domains, including healthcare and medicine. In the medical domain, LLMs hold promise for tasks ranging from clinical decision support to patient education. However, evaluating the performance of LLMs in medical contexts presents unique challenges due to the complex and critical nature of medic…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 4 figures, 1 table

  12. arXiv:2405.02463  [pdf, other]

    cs.AI

    Knowledge Graph Extension by Entity Type Recognition

    Authors: Daqian Shi

    Abstract: Knowledge graphs have emerged as a sophisticated advancement and refinement of semantic networks, and their deployment is one of the critical methodologies in contemporary artificial intelligence. The construction of knowledge graphs is a multifaceted process involving various techniques, where researchers aim to extract the knowledge from existing resources for the construction since building fro…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: PhD thesis

  13. arXiv:2404.03869  [pdf, other]

    cs.LG cs.AI cs.MA cs.RO eess.SY

    Heterogeneous Multi-Agent Reinforcement Learning for Zero-Shot Scalable Collaboration

    Authors: Xudong Guo, Daming Shi, Junjie Yu, Wenhui Fan

    Abstract: The rise of multi-agent systems, especially the success of multi-agent reinforcement learning (MARL), is reshaping our future across diverse domains like autonomous vehicle networks. However, MARL still faces significant challenges, particularly in achieving zero-shot scalability, which allows trained MARL models to be directly applied to unseen tasks with varying numbers of agents. In addition, r…

    Submitted 4 April, 2024; originally announced April 2024.

  14. arXiv:2404.01622  [pdf, ps, other]

    cs.HC cs.AI cs.GR

    Gen4DS: Workshop on Data Storytelling in an Era of Generative AI

    Authors: Xingyu Lan, Leni Yang, Zezhong Wang, Yun Wang, Danqing Shi, Sheelagh Carpendale

    Abstract: Storytelling is an ancient and precious human ability that has been rejuvenated in the digital age. Over the last decade, there has been a notable surge in the recognition and application of data storytelling, both in academia and industry. Recently, the rapid development of generative AI has brought new opportunities and challenges to this field, sparking numerous new questions. These questions m…

    Submitted 5 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

  15. arXiv:2403.17421  [pdf, other]

    cs.IR cs.AI

    MA4DIV: Multi-Agent Reinforcement Learning for Search Result Diversification

    Authors: Yiqun Chen, Jiaxin Mao, Yi Zhang, Dehong Ma, Long Xia, Jun Fan, Daiting Shi, Zhicong Cheng, Simiu Gu, Dawei Yin

    Abstract: The objective of search result diversification (SRD) is to ensure that selected documents cover as many different subtopics as possible. Existing methods primarily utilize a paradigm of "greedy selection", i.e., selecting one document with the highest diversity score at a time. These approaches tend to be inefficient and are easily trapped in a suboptimal state. In addition, some other methods aim…

    Submitted 27 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

  16. arXiv:2403.07012  [pdf, other]

    cs.LG

    Non-Intrusive Load Monitoring with Missing Data Imputation Based on Tensor Decomposition

    Authors: DengYu Shi

    Abstract: With the widespread adoption of Non-Intrusive Load Monitoring (NILM) in building energy management, ensuring the high quality of NILM data has become imperative. However, practical applications of NILM face challenges associated with data loss, significantly impacting accuracy and reliability in energy management. This paper addresses the issue of NILM data loss by introducing an innovative tensor…

    Submitted 9 March, 2024; originally announced March 2024.

  17. arXiv:2403.00840  [pdf]

    cs.CL cs.AI

    EyeGPT: Ophthalmic Assistant with Large Language Models

    Authors: Xiaolan Chen, Ziwei Zhao, Weiyi Zhang, Pusheng Xu, Le Gao, Mingpu Xu, Yue Wu, Yinwen Li, Danli Shi, Mingguang He

    Abstract: Artificial intelligence (AI) has gained significant attention in healthcare consultation due to its potential to improve clinical workflow and enhance medical communication. However, owing to the complex nature of medical information, large language models (LLM) trained with general world knowledge might not possess the capability to tackle medical-related tasks at an expert level. Here, we introd…

    Submitted 29 February, 2024; originally announced March 2024.

    Comments: 47 pages, 4 figures, 1 table, 2 supplementary figures and 9 supplementary tables

  18. Unsupervised learning based end-to-end delayless generative fixed-filter active noise control

    Authors: Zhengding Luo, Dongyuan Shi, Xiaoyi Shen, Woon-Seng Gan

    Abstract: Delayless noise control is achieved by our earlier generative fixed-filter active noise control (GFANC) framework through efficient coordination between the co-processor and real-time controller. However, the one-dimensional convolutional neural network (1D CNN) in the co-processor requires initial training using labelled noise datasets. Labelling noise data can be resource-intensive and may intro…

    Submitted 8 February, 2024; originally announced February 2024.

    Comments: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024)

  19. arXiv:2402.02694  [pdf, other]

    eess.AS cs.LG cs.SD

    Description on IEEE ICME 2024 Grand Challenge: Semi-supervised Acoustic Scene Classification under Domain Shift

    Authors: Jisheng Bai, Mou Wang, Haohe Liu, Han Yin, Yafei Jia, Siwei Huang, Yutong Du, Dongzhe Zhang, Dongyuan Shi, Woon-Seng Gan, Mark D. Plumbley, Susanto Rahardja, Bin Xiang, Jianfeng Chen

    Abstract: Acoustic scene classification (ASC) is a crucial research problem in computational auditory scene analysis, and it aims to recognize the unique acoustic characteristics of an environment. One of the challenges of the ASC task is the domain shift between training and testing data. Since 2018, ASC challenges have focused on the generalization of ASC models across different recording devices. Althoug…

    Submitted 28 February, 2024; v1 submitted 4 February, 2024; originally announced February 2024.

  20. arXiv:2401.14580  [pdf, other]

    cs.LG

    Design Your Own Universe: A Physics-Informed Agnostic Method for Enhancing Graph Neural Networks

    Authors: Dai Shi, Andi Han, Lequan Lin, Yi Guo, Zhiyong Wang, Junbin Gao

    Abstract: Physics-informed Graph Neural Networks have achieved remarkable performance in learning through graph-structured data by mitigating common GNN challenges such as over-smoothing, over-squashing, and heterophily adaption. Despite these advancements, the development of a simple yet effective paradigm that appropriately integrates previous methods for handling all these challenges is still underway. I…

    Submitted 12 June, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

  21. arXiv:2401.10222  [pdf, other]

    cs.CV cs.AI

    Supervised Fine-tuning in turn Improves Visual Foundation Models

    Authors: Xiaohu Jiang, Yixiao Ge, Yuying Ge, Dachuan Shi, Chun Yuan, Ying Shan

    Abstract: Image-text training like CLIP has dominated the pretraining of vision foundation models in recent years. Subsequent efforts have been made to introduce region-level visual learning into CLIP's pretraining but face scalability challenges due to the lack of large-scale region-level datasets. Drawing inspiration from supervised fine-tuning (SFT) in natural language processing such as instruction tuni…

    Submitted 11 April, 2024; v1 submitted 18 January, 2024; originally announced January 2024.

    Comments: 23 pages, 3 figures, Project page: https://github.com/TencentARC/ViSFT/tree/main

  22. arXiv:2401.08678  [pdf, other]

    eess.AS cs.SD

    Sub-band and Full-band Interactive U-Net with DPRNN for Demixing Cross-talk Stereo Music

    Authors: Han Yin, Mou Wang, Jisheng Bai, Dongyuan Shi, Woon-Seng Gan, Jianfeng Chen

    Abstract: This paper presents a detailed description of our proposed methods for the ICASSP 2024 Cadenza Challenge. Experimental results show that the proposed system can achieve better performance than official baselines.

    Submitted 10 January, 2024; originally announced January 2024.

    Comments: Submitted to ICASSP 2024

  23. arXiv:2401.08119  [pdf, other]

    cs.LG

    SpecSTG: A Fast Spectral Diffusion Framework for Probabilistic Spatio-Temporal Traffic Forecasting

    Authors: Lequan Lin, Dai Shi, Andi Han, Junbin Gao

    Abstract: Traffic forecasting, a crucial application of spatio-temporal graph (STG) learning, has traditionally relied on deterministic models for accurate point estimations. Yet, these models fall short of identifying latent risks of unexpected volatility in future observations. To address this gap, probabilistic methods, especially variants of diffusion models, have emerged as uncertainty-aware solutions.…

    Submitted 23 January, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

  24. arXiv:2312.13620  [pdf, other]

    cs.CV eess.IV

    A Comprehensive End-to-End Computer Vision Framework for Restoration and Recognition of Low-Quality Engineering Drawings

    Authors: Lvyang Yang, Jiankang Zhang, Huaiqiang Li, Longfei Ren, Chen Yang, Jingyu Wang, Dongyuan Shi

    Abstract: The digitization of engineering drawings is crucial for efficient reuse, distribution, and archiving. Existing computer vision approaches for digitizing engineering drawings typically assume the input drawings have high quality. However, in reality, engineering drawings are often blurred and distorted due to improper scanning, storage, and transmission, which may jeopardize the effectiveness of ex…

    Submitted 21 December, 2023; originally announced December 2023.

    Comments: 20 pages, 13 figures, submitted to Engineering Applications of Artificial Intelligence

  25. arXiv:2312.12853  [pdf, other]

    cs.CL

    CORECODE: A Common Sense Annotated Dialogue Dataset with Benchmark Tasks for Chinese Large Language Models

    Authors: Dan Shi, Chaobin You, Jiantao Huang, Taihao Li, Deyi Xiong

    Abstract: As an indispensable ingredient of intelligence, commonsense reasoning is crucial for large language models (LLMs) in real-world scenarios. In this paper, we propose CORECODE, a dataset that contains abundant commonsense knowledge manually annotated on dyadic dialogues, to evaluate the commonsense reasoning and commonsense conflict detection capabilities of Chinese LLMs. We categorize commonsense k…

    Submitted 20 December, 2023; originally announced December 2023.

    Comments: AAAI 2024

  26. arXiv:2312.04346  [pdf, other]

    cs.LG cs.CR

    Improved Efficient Two-Stage Denoising Diffusion Power System Measurement Recovery Against False Data Injection Attacks and Data Losses

    Authors: Jianhua Pei, Jingyu Wang, Dongyuan Shi, Ping Wang

    Abstract: Measurement uncertainties, represented by cyber-attacks and data losses, seriously degrade the quality of power system measurements. Fortunately, the powerful generation ability of the denoising diffusion models can enable more precise measurement generation for power system data recovery. However, the controllable data generation and efficient computing methods of denoising diffusion models for d…

    Submitted 7 December, 2023; originally announced December 2023.

  27. arXiv:2311.17132  [pdf, other]

    cs.CV cs.AI

    TransNeXt: Robust Foveal Visual Perception for Vision Transformers

    Authors: Dai Shi

    Abstract: Due to the depth degradation effect in residual connections, many efficient Vision Transformers models that rely on stacking layers for information exchange often fail to form sufficient information mixing, leading to unnatural visual perception. To address this issue, in this paper, we propose Aggregated Attention, a biomimetic design-based token mixer that simulates biological foveal vision and…

    Submitted 20 April, 2024; v1 submitted 28 November, 2023; originally announced November 2023.

    Comments: CVPR 2024 Camera-ready Version. Project Page: https://github.com/DaiShiResearch/TransNeXt

  28. arXiv:2311.12465  [pdf, other]

    cs.AI

    Towards a Gateway for Knowledge Graph Schemas Collection, Analysis, and Embedding

    Authors: Mattia Fumagalli, Marco Boffo, Daqian Shi, Mayukh Bagchi, Fausto Giunchiglia

    Abstract: One of the significant barriers to the training of statistical models on knowledge graphs is the difficulty that scientists have in finding the best input data to address their prediction goal. In addition to this, a key challenge is to determine how to manipulate these relational data, which are often in the form of particular triples (i.e., subject, predicate, object), to enable the learning pro…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: Ontology Showcase and Demonstrations Track, 9th Joint Ontology Workshops (JOWO 2023), Co-located with FOIS 2023, 19-20 July, 2023, Sherbrooke, Québec, Canada. arXiv admin note: substantial text overlap with arXiv:2207.06112

    Report number: DISIKNOWDIVE21112023

  29. arXiv:2311.07073  [pdf, ps, other]

    cs.LG

    Exposition on over-squashing problem on GNNs: Current Methods, Benchmarks and Challenges

    Authors: Dai Shi, Andi Han, Lequan Lin, Yi Guo, Junbin Gao

    Abstract: Graph-based message-passing neural networks (MPNNs) have achieved remarkable success in both node and graph-level learning tasks. However, several identified problems, including over-smoothing (OSM), limited expressive power, and over-squashing (OSQ), still limit the performance of MPNNs. In particular, OSQ serves as the latest identified problem, where MPNNs gradually lose their learning accuracy…

    Submitted 17 November, 2023; v1 submitted 12 November, 2023; originally announced November 2023.

  30. arXiv:2311.06015  [pdf]

    cs.RO cs.AI

    RSG: Fast Learning Adaptive Skills for Quadruped Robots by Skill Graph

    Authors: Hongyin Zhang, Diyuan Shi, Zifeng Zhuang, Han Zhao, Zhenyu Wei, Feng Zhao, Sibo Gai, Shangke Lyu, Donglin Wang

    Abstract: Developing robotic intelligent systems that can adapt quickly to unseen wild situations is one of the critical challenges in pursuing autonomous robotics. Although some impressive progress has been made in walking stability and skill learning in the field of legged robots, their ability to fast adaptation is still inferior to that of animals in nature. Animals are born with massive skills needed t…

    Submitted 10 November, 2023; originally announced November 2023.

  31. arXiv:2310.19736  [pdf, other]

    cs.CL cs.AI

    Evaluating Large Language Models: A Comprehensive Survey

    Authors: Zishan Guo, Renren Jin, Chuang Liu, Yufei Huang, Dan Shi, Supryadi, Linhao Yu, Yan Liu, Jiaxuan Li, Bojian Xiong, Deyi Xiong

    Abstract: Large language models (LLMs) have demonstrated remarkable capabilities across a broad spectrum of tasks. They have attracted significant attention and been deployed in numerous downstream applications. Nevertheless, akin to a double-edged sword, LLMs also present potential risks. They could suffer from private data leaks or yield inappropriate, harmful, or misleading content. Additionally, the rap…

    Submitted 25 November, 2023; v1 submitted 30 October, 2023; originally announced October 2023.

    Comments: 111 pages

  32. arXiv:2310.10666  [pdf, other]

    cs.CR cs.LG

    Extracting Physical Causality from Measurements to Detect and Localize False Data Injection Attacks

    Authors: Shengyang Wu, Jingyu Wang, Dongyuan Shi

    Abstract: False Data Injection Attack (FDIA) has become a growing concern in modern cyber-physical power systems. Most existing FDIA detection techniques project the raw measurement data into a high-dimensional latent space to separate normal and attacked samples. These approaches focus more on the statistical correlations of data values and are therefore susceptible to data distribution drifts induced by c…

    Submitted 20 September, 2023; originally announced October 2023.

    Comments: 10 pages

  33. arXiv:2310.10121  [pdf, other]

    cs.LG cs.AI stat.ML

    From Continuous Dynamics to Graph Neural Networks: Neural Diffusion and Beyond

    Authors: Andi Han, Dai Shi, Lequan Lin, Junbin Gao

    Abstract: Graph neural networks (GNNs) have demonstrated significant promise in modelling relational data and have been widely applied in various fields of interest. The key mechanism behind GNNs is the so-called message passing where information is being iteratively aggregated to central nodes from their neighbourhood. Such a scheme has been found to be intrinsically linked to a physical process known as h…

    Submitted 29 October, 2023; v1 submitted 16 October, 2023; originally announced October 2023.

  34. arXiv:2309.15203  [pdf, other]

    cs.CR cs.HC eess.SP

    Eve Said Yes: AirBone Authentication for Head-Wearable Smart Voice Assistant

    Authors: Chenpei Huang, Hui Zhong, Jie Lian, Pavana Prakash, Dian Shi, Yuan Xu, Miao Pan

    Abstract: Recent advances in machine learning and natural language processing have fostered the enormous prosperity of smart voice assistants and their services, e.g., Alexa, Google Home, Siri, etc. However, voice spoofing attacks are deemed to be one of the major challenges of voice control security, and never stop evolving such as deep-learning-based voice conversion and speech synthesis techniques. To so…

    Submitted 26 September, 2023; originally announced September 2023.

    Comments: 13 pages, 12 figures

  35. arXiv:2309.06645  [pdf, other]

    cs.LG

    Bregman Graph Neural Network

    Authors: Jiayu Zhai, Lequan Lin, Dai Shi, Junbin Gao

    Abstract: Numerous recent research on graph neural networks (GNNs) has focused on formulating GNN architectures as an optimization problem with the smoothness assumption. However, in node classification tasks, the smoothing effect induced by GNNs tends to assimilate representations and over-homogenize labels of connected nodes, leading to adverse effects such as over-smoothing and misclassification. In this…

    Submitted 12 September, 2023; originally announced September 2023.

  36. arXiv:2309.05590  [pdf, other]

    cs.CV cs.AI cs.MM

    Temporal Action Localization with Enhanced Instant Discriminability

    Authors: Dingfeng Shi, Qiong Cao, Yujie Zhong, Shan An, Jian Cheng, Haogang Zhu, Dacheng Tao

    Abstract: Temporal action detection (TAD) aims to detect all action boundaries and their corresponding categories in an untrimmed video. The unclear boundaries of actions in videos often result in imprecise predictions of action boundaries by existing methods. To resolve this issue, we propose a one-stage framework named TriDet. First, we propose a Trident-head to model the action boundary via an estimated…

    Submitted 11 September, 2023; originally announced September 2023.

    Comments: An extended version of the CVPR paper arXiv:2303.07347, submitted to IJCV

  37. arXiv:2309.02769  [pdf, other]

    cs.LG

    Unifying over-smoothing and over-squashing in graph neural networks: A physics informed approach and beyond

    Authors: Zhiqi Shao, Dai Shi, Andi Han, Yi Guo, Qibin Zhao, Junbin Gao

    Abstract: Graph Neural Networks (GNNs) have emerged as one of the leading approaches for machine learning on graph-structured data. Despite their great success, critical computational challenges such as over-smoothing, over-squashing, and limited expressive power continue to impact the performance of GNNs. In this study, inspired from the time-reversal principle commonly utilized in classical and quantum ph…

    Submitted 12 September, 2023; v1 submitted 6 September, 2023; originally announced September 2023.

  38. arXiv:2308.15930  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    LLaSM: Large Language and Speech Model

    Authors: Yu Shu, Siwei Dong, Guangyao Chen, Wenhao Huang, Ruihua Zhang, Daochen Shi, Qiqi Xiang, Yemin Shi

    Abstract: Multi-modal large language models have garnered significant interest recently. Though, most of the works focus on vision-language multi-modal models providing strong capabilities in following vision-and-language instructions. However, we claim that speech is also an important modality through which humans interact with the world. Hence, it is crucial for a general-purpose assistant to be able to f…

    Submitted 16 September, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

  39. arXiv:2308.13764  [pdf, other]

    cs.CV cs.AI

    Unified Single-Stage Transformer Network for Efficient RGB-T Tracking

    Authors: Jianqiang Xia, DianXi Shi, Ke Song, Linna Song, XiaoLei Wang, Songchang Jin, Li Zhou, Yu Cheng, Lei Jin, Zheng Zhu, Jianan Li, Gang Wang, Junliang Xing, Jian Zhao

    Abstract: Most existing RGB-T tracking networks extract modality features in a separate manner, which lacks interaction and mutual guidance between modalities. This limits the network's ability to adapt to the diverse dual-modality appearances of targets and the dynamic relationships between the modalities. Additionally, the three-stage fusion tracking paradigm followed by these networks significantly restr…

    Submitted 26 August, 2023; originally announced August 2023.

  40. arXiv:2308.03684  [pdf, other]

    eess.AS cs.SD

    Active Noise Control based on the Momentum Multichannel Normalized Filtered-x Least Mean Square Algorithm

    Authors: Dongyuan Shi, Woon-Seng Gan, Bhan Lam, Shulin Wen, Xiaoyi Shen

    Abstract: Multichannel active noise control (MCANC) is widely utilized to achieve significant noise cancellation area in the complicated acoustic field. Meanwhile, the filter-x least mean square (FxLMS) algorithm gradually becomes the benchmark solution for the implementation of MCANC due to its low computational complexity. However, its slow convergence speed more or less undermines the performance of deal…

    Submitted 7 August, 2023; originally announced August 2023.

    Comments: Conference: INTER-NOISE and NOISE-CON Congress and Conference Proceedings 2020 At Korea Volume: 261

  41. Toward Zero-shot Character Recognition: A Gold Standard Dataset with Radical-level Annotations

    Authors: Xiaolei Diao, Daqian Shi, Jian Li, Lida Shi, Mingzhe Yue, Ruihua Qi, Chuntao Li, Hao Xu

    Abstract: Optical character recognition (OCR) methods have been applied to diverse tasks, e.g., street view text recognition and document analysis. Recently, zero-shot OCR has piqued the interest of the research community because it considers a practical OCR scenario with unbalanced data distribution. However, there is a lack of benchmarks for evaluating such zero-shot methods that apply a divide-and-conque…

    Submitted 1 August, 2023; originally announced August 2023.

    Comments: Accepted by ACM MM 2023

  42. Probabilistic Compute-in-Memory Design For Efficient Markov Chain Monte Carlo Sampling

    Authors: Yihan Fu, Daijing Shi, Anjunyi Fan, Wenshuo Yue, Yuchao Yang, Ru Huang, Bonan Yan

    Abstract: Markov chain Monte Carlo (MCMC) is a widely used sampling method in modern artificial intelligence and probabilistic computing systems. It involves repetitive random number generations and thus often dominates the latency of probabilistic model computing. Hence, we propose a compute-in-memory (CIM) based MCMC design as a hardware acceleration solution. This work investigates SRAM bitcell stochasti…

    Submitted 16 July, 2023; originally announced July 2023.

  43. arXiv:2307.09768  [pdf, other]

    cs.LG

    How Curvature Enhance the Adaptation Power of Framelet GCNs

    Authors: Dai Shi, Yi Guo, Zhiqi Shao, Junbin Gao

    Abstract: Graph neural network (GNN) has been demonstrated powerful in modeling graph-structured data. However, despite many successful cases of applying GNNs to various graph classification and prediction tasks, whether the graph geometrical information has been fully exploited to enhance the learning performance of GNNs is not yet well understood. This paper introduces a new approach to enhance GNN by dis…

    Submitted 19 July, 2023; originally announced July 2023.

  44. arXiv:2307.06631  [pdf, other]

    cs.LG

    Frameless Graph Knowledge Distillation

    Authors: Dai Shi, Zhiqi Shao, Yi Guo, Junbin Gao

    Abstract: Knowledge distillation (KD) has shown great potential for transferring knowledge from a complex teacher model to a simple student model in which the heavy learning task can be accomplished efficiently and without losing too much prediction accuracy. Recently, many attempts have been made by applying the KD mechanism to the graph representation learning models such as graph neural networks (GNNs) t…

    Submitted 13 July, 2023; originally announced July 2023.

  45. Anti-noise window: Subjective perception of active noise reduction and effect of informational masking

    Authors: Bhan Lam, Kelvin Chee Quan Lim, Kenneth Ooi, Zhen-Ting Ong, Dongyuan Shi, Woon-Seng Gan

    Abstract: Reviving natural ventilation (NV) for urban sustainability presents challenges for indoor acoustic comfort. Active control and interference-based noise mitigation strategies, such as the use of loudspeakers, offer potential solutions to achieve acoustic comfort while maintaining NV. However, these approaches are not commonly integrated or evaluated from a perceptual standpoint. This study examines…

    Submitted 8 July, 2023; originally announced July 2023.

    Comments: Accepted manuscript submitted to Sustainable Cities and Society

    Journal ref: Sustain. Cities Soc., 104763, 2023

  46. arXiv:2306.10484  [pdf, other]

    eess.IV cs.CV

    The STOIC2021 COVID-19 AI challenge: applying reusable training methodologies to private data

    Authors: Luuk H. Boulogne, Julian Lorenz, Daniel Kienzle, Robin Schon, Katja Ludwig, Rainer Lienhart, Simon Jegou, Guang Li, Cong Chen, Qi Wang, Derik Shi, Mayug Maniparambil, Dominik Muller, Silvan Mertes, Niklas Schroter, Fabio Hellmann, Miriam Elia, Ine Dirks, Matias Nicolas Bossa, Abel Diaz Berenguer, Tanmoy Mukherjee, Jef Vandemeulebroucke, Hichem Sahli, Nikos Deligiannis, Panagiotis Gonidakis, et al. (13 additional authors not shown)

    Abstract: Challenges drive the state-of-the-art of automated medical image analysis. The quantity of public training data that they provide can limit the performance of their solutions. Public access to the training methodology for these solutions remains absent. This study implements the Type Three (T3) challenge format, which allows for training solutions on private data and guarantees reusable training m…

    Submitted 25 June, 2023; v1 submitted 18 June, 2023; originally announced June 2023.

  47. arXiv:2306.08865  [pdf, other]

    cs.CV cs.LG

    One-Shot Learning of Visual Path Navigation for Autonomous Vehicles

    Authors: Zhongying CuiZhu, Francois Charette, Amin Ghafourian, Debo Shi, Matthew Cui, Anjali Krishnamachar, Iman Soltani

    Abstract: Autonomous driving presents many challenges due to the large number of scenarios the autonomous vehicle (AV) may encounter. End-to-end deep learning models are comparatively simplistic models that can handle a broad set of scenarios. However, end-to-end models require large amounts of diverse data to perform well. This paper presents a novel deep neural network that performs image-to-steering path…

    Submitted 15 June, 2023; originally announced June 2023.

    Comments: Machine Learning for Autonomous Driving Workshop at the 35th Conference on Neural Information Processing Systems (NeurIPS 2022), New Orleans, USA

  48. arXiv:2306.07760  [pdf, other]

    cs.HC

    Urania: Visualizing Data Analysis Pipelines for Natural Language-Based Data Exploration

    Authors: Yi Guo, Nan Cao, Xiaoyu Qi, Haoyang Li, Danqing Shi, Jing Zhang, Qing Chen, Daniel Weiskopf

    Abstract: Exploratory Data Analysis (EDA) is an essential yet tedious process for examining a new dataset. To facilitate it, natural language interfaces (NLIs) can help people intuitively explore the dataset via data-oriented questions. However, existing NLIs primarily focus on providing accurate answers to questions, with few offering explanations or presentations of the data analysis pipeline used to unco…

    Submitted 13 June, 2023; originally announced June 2023.

  49. arXiv:2305.17455  [pdf, other]

    cs.CV cs.CL

    CrossGET: Cross-Guided Ensemble of Tokens for Accelerating Vision-Language Transformers

    Authors: Dachuan Shi, Chaofan Tao, Anyi Rao, Zhendong Yang, Chun Yuan, Jiaqi Wang

    Abstract: Recent vision-language models have achieved tremendous advances. However, their computational costs are also escalating dramatically, making model acceleration exceedingly critical. To pursue more efficient vision-language Transformers, this paper introduces Cross-Guided Ensemble of Tokens (CrossGET), a general acceleration framework for vision-language Transformers. This framework adaptively comb…

    Submitted 13 June, 2024; v1 submitted 27 May, 2023; originally announced May 2023.

    Comments: ICML 2024. Code: https://github.com/sdc17/CrossGET

  50. arXiv:2305.16217  [pdf, other]

    cs.LG

    Beyond Reward: Offline Preference-guided Policy Optimization

    Authors: Yachen Kang, Diyuan Shi, Jinxin Liu, Li He, Donglin Wang

    Abstract: This study focuses on the topic of offline preference-based reinforcement learning (PbRL), a variant of conventional reinforcement learning that dispenses with the need for online interaction or specification of reward functions. Instead, the agent is provided with fixed offline trajectories and human preferences between pairs of trajectories to extract the dynamics and task information, respectiv…

    Submitted 9 June, 2023; v1 submitted 25 May, 2023; originally announced May 2023.