Showing 1–50 of 578 results for author: Xu, T

  1. arXiv:2407.01530  [pdf, other]

    eess.IV cs.CV

    xLSTM-UNet can be an Effective 2D & 3D Medical Image Segmentation Backbone with Vision-LSTM (ViL) better than its Mamba Counterpart

    Authors: Tianrun Chen, Chaotao Ding, Lanyun Zhu, Tao Xu, Deyi Ji, Yan Wang, Ying Zang, Zejian Li

    Abstract: Convolutional Neural Networks (CNNs) and Vision Transformers (ViT) have been pivotal in biomedical image segmentation, yet their ability to manage long-range dependencies remains constrained by inherent locality and computational overhead. To overcome these challenges, in this technical report, we first propose xLSTM-UNet, a UNet structured deep learning neural network that leverages Vision-LSTM (… ▽ More

    Submitted 2 July, 2024; v1 submitted 1 July, 2024; originally announced July 2024.

  2. arXiv:2407.01511  [pdf, other]

    cs.AI

    CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents

    Authors: Tianqi Xu, Linyao Chen, Dai-Jie Wu, Yanjun Chen, Zecheng Zhang, Xiang Yao, Zhiqiang Xie, Yongchao Chen, Shilong Liu, Bochen Qian, Philip Torr, Bernard Ghanem, Guohao Li

    Abstract: The development of autonomous agents increasingly relies on Multimodal Language Models (MLMs) to perform tasks described in natural language with GUI environments, such as websites, desktop computers, or mobile phones. Existing benchmarks for MLM agents in interactive environments are limited by their focus on a single environment, lack of detailed and generalized evaluation methods, and the compl… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

  3. arXiv:2406.18548  [pdf]

    eess.IV cs.CV

    Exploration of Multi-Scale Image Fusion Systems in Intelligent Medical Image Analysis

    Authors: Yuxiang Hu, Haowei Yang, Ting Xu, Shuyao He, Jiajie Yuan, Haozhang Deng

    Abstract: The diagnosis of brain cancer relies heavily on medical imaging techniques, with MRI being the most commonly used. It is necessary to perform automatic segmentation of brain tumors on MRI images. This project intends to build an MRI algorithm based on U-Net. The residual network and the module used to enhance the context information are combined, and the void space convolution pooling pyramid is a… ▽ More

    Submitted 23 May, 2024; originally announced June 2024.

  4. Joint Admission Control and Resource Allocation of Virtual Network Embedding via Hierarchical Deep Reinforcement Learning

    Authors: Tianfu Wang, Li Shen, Qilin Fan, Tong Xu, Tongliang Liu, Hui Xiong

    Abstract: As an essential resource management problem in network virtualization, virtual network embedding (VNE) aims to allocate the finite resources of physical network to sequentially arriving virtual network requests (VNRs) with different resource demands. Since this is an NP-hard combinatorial optimization problem, many efforts have been made to provide viable solutions. However, most existing approach… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Accepted by IEEE Transactions on Services Computing (TSC)

    Journal ref: IEEE Transactions on Services Computing (Volume: 17, Issue: 3, May-June 2024)

  5. arXiv:2406.16069  [pdf, other]

    cs.CL cs.AI

    FastMem: Fast Memorization of Prompt Improves Context Awareness of Large Language Models

    Authors: Junyi Zhu, Shuochen Liu, Yu Yu, Bo Tang, Yibo Yan, Zhiyu Li, Feiyu Xiong, Tong Xu, Matthew B. Blaschko

    Abstract: Large language models (LLMs) excel in generating coherent text, but they often struggle with context awareness, leading to inaccuracies in tasks requiring faithful adherence to provided information. We introduce FastMem, a novel method designed to enhance instruction fine-tuned LLMs' context awareness through fast memorization of the prompt. FastMem maximizes the likelihood of the prompt before in… ▽ More

    Submitted 23 June, 2024; originally announced June 2024.

  6. arXiv:2406.14979  [pdf, other]

    cs.CL

    Retrieve-Plan-Generation: An Iterative Planning and Answering Framework for Knowledge-Intensive LLM Generation

    Authors: Yuanjie Lyu, Zihan Niu, Zheyong Xie, Chao Zhang, Tong Xu, Yang Wang, Enhong Chen

    Abstract: Despite the significant progress of large language models (LLMs) in various tasks, they often produce factual errors due to their limited internal knowledge. Retrieval-Augmented Generation (RAG), which enhances LLMs with external knowledge sources, offers a promising solution. However, these methods can be misled by irrelevant paragraphs in retrieved documents. Due to the inherent uncertainty in L… ▽ More

    Submitted 21 June, 2024; originally announced June 2024.

  7. arXiv:2406.14106  [pdf, other]

    cs.AI cs.CL

    EasyECR: A Library for Easy Implementation and Evaluation of Event Coreference Resolution Models

    Authors: Yuncong Li, Tianhua Xu, Sheng-hua Zhong, Haiqin Yang

    Abstract: Event Coreference Resolution (ECR) is the task of clustering event mentions that refer to the same real-world event. Despite significant advancements, ECR research faces two main challenges: limited generalizability across domains due to narrow dataset evaluations, and difficulties in comparing models within diverse ECR pipelines. To address these issues, we develop EasyECR, the first open-source… ▽ More

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: 14 pages, 4 figures, 12 tables

  8. arXiv:2406.13885  [pdf, other]

    cs.CL cs.AI

    Knowledge Tagging System on Math Questions via LLMs with Flexible Demonstration Retriever

    Authors: Hang Li, Tianlong Xu, Jiliang Tang, Qingsong Wen

    Abstract: Knowledge tagging for questions plays a crucial role in contemporary intelligent educational applications, including learning progress diagnosis, practice question recommendations, and course content organization. Traditionally, these annotations are always conducted by pedagogical experts, as the task requires not only a strong semantic understanding of both question stems and knowledge definitio… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 13 pages, 6 figures

  9. arXiv:2406.13618  [pdf, other]

    cs.CL

    In-Context Former: Lightning-fast Compressing Context for Large Language Model

    Authors: Xiangfeng Wang, Zaiyi Chen, Zheyong Xie, Tong Xu, Yongyi He, Enhong Chen

    Abstract: With the rising popularity of Transformer-based large language models (LLMs), reducing their high inference costs has become a significant research focus. One effective approach is to compress the long input contexts. Existing methods typically leverage the self-attention mechanism of the LLM itself for context compression. While these methods have achieved notable results, the compression process… ▽ More

    Submitted 19 June, 2024; originally announced June 2024.

  10. arXiv:2406.12975  [pdf, other]

    cs.CL cs.AI cs.CY

    SHIELD: Evaluation and Defense Strategies for Copyright Compliance in LLM Text Generation

    Authors: Xiaoze Liu, Ting Sun, Tianyang Xu, Feijie Wu, Cunxiang Wang, Xiaoqian Wang, Jing Gao

    Abstract: Large Language Models (LLMs) have transformed machine learning but raised significant legal concerns due to their potential to produce text that infringes on copyrights, resulting in several high-profile lawsuits. The legal landscape is struggling to keep pace with these rapid advancements, with ongoing debates about whether generated text might plagiarize copyrighted materials. Current LLMs may i… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  11. arXiv:2406.12020  [pdf, other]

    cs.IR cs.AI

    When Box Meets Graph Neural Network in Tag-aware Recommendation

    Authors: Fake Lin, Ziwei Zhao, Xi Zhu, Da Zhang, Shitian Shen, Xueying Li, Tong Xu, Suojuan Zhang, Enhong Chen

    Abstract: The past year has witnessed a resurgence of tag-aware recommender systems supported by LLM-enriched tags. Unfortunately, though great efforts have been made, current solutions may fail to describe the diversity and uncertainty inherent in user preferences with only tag-driven profiles. Recently, with the development of geometry-based techniques, e.g., box embedding, diversity of user prefer…

    Submitted 17 June, 2024; originally announced June 2024.

  12. arXiv:2406.08864  [pdf]

    cs.LG cs.AI

    Research on Early Warning Model of Cardiovascular Disease Based on Computer Deep Learning

    Authors: Yuxiang Hu, Jinxin Hu, Ting Xu, Bo Zhang, Jiajie Yuan, Haozhang Deng

    Abstract: This project intends to study a cardiovascular disease risk early warning model based on one-dimensional convolutional neural networks. First, the missing values of 13 physiological and symptom indicators such as patient age, blood glucose, cholesterol, and chest pain were filled and Z-score was standardized. The convolutional neural network is converted into a 2D matrix, the convolution function… ▽ More

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 6 pages

  13. arXiv:2406.08358  [pdf, other]

    cs.CV cs.AI

    From a Social Cognitive Perspective: Context-aware Visual Social Relationship Recognition

    Authors: Shiwei Wu, Chao Zhang, Joya Chen, Tong Xu, Likang Wu, Yao Hu, Enhong Chen

    Abstract: People's social relationships are often manifested through their surroundings, with certain objects or interactions acting as symbols for specific relationships, e.g., wedding rings, roses, hugs, or holding hands. This brings unique challenges to recognizing social relationships, requiring understanding and capturing the essence of these contexts from visual appearances. However, current methods o… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  14. arXiv:2406.08090  [pdf, other]

    cs.CV

    From Sim-to-Real: Toward General Event-based Low-light Frame Interpolation with Per-scene Optimization

    Authors: Ziran Zhang, Yongrui Ma, Yueting Chen, Feng Zhang, Jinwei Gu, Tianfan Xue, Shi Guo

    Abstract: Video Frame Interpolation (VFI) is important for video enhancement, frame rate up-conversion, and slow-motion generation. The introduction of event cameras, which capture per-pixel brightness changes asynchronously, has significantly enhanced VFI capabilities, particularly for high-speed, nonlinear motions. However, these event-based methods encounter challenges in low-light conditions, notably tr… ▽ More

    Submitted 12 June, 2024; originally announced June 2024.

  15. arXiv:2406.07579  [pdf, other]

    cs.AI cs.GR cs.LG

    GFPack++: Improving 2D Irregular Packing by Learning Gradient Field with Attention

    Authors: Tianyang Xue, Lin Lu, Yang Liu, Mingdong Wu, Hao Dong, Yanbin Zhang, Renmin Han, Baoquan Chen

    Abstract: 2D irregular packing is a classic combinatorial optimization problem with various applications, such as material utilization and texture atlas generation. This NP-hard problem requires efficient algorithms to optimize space utilization. Conventional numerical methods suffer from slow convergence and high computational cost. Existing learning-based methods, such as the score-based diffusion model,… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

  16. arXiv:2406.04776  [pdf, ps, other]

    eess.SP cs.AI

    OFDM-Standard Compatible SC-NOFS Waveforms for Low-Latency and Jitter-Tolerance Industrial IoT Communications

    Authors: Tongyang Xu, Shuangyang Li, Jinhong Yuan

    Abstract: Traditional communications focus on regular and orthogonal signal waveforms for simplified signal processing and improved spectral efficiency. In contrast, the next-generation communications would aim for irregular and non-orthogonal signal waveforms to introduce new capabilities. This work proposes a spectrally efficient irregular Sinc (irSinc) shaping technique, revisiting the traditional Sinc b… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

  17. arXiv:2406.04129  [pdf, other]

    cs.CV

    LenslessFace: An End-to-End Optimized Lensless System for Privacy-Preserving Face Verification

    Authors: Xin Cai, Hailong Zhang, Chenchen Wang, Wentao Liu, Jinwei Gu, Tianfan Xue

    Abstract: Lensless cameras, innovatively replacing traditional lenses for ultra-thin, flat optics, encode light directly onto sensors, producing images that are not immediately recognizable. This compact, lightweight, and cost-effective imaging solution offers inherent privacy advantages, making it attractive for privacy-sensitive applications like face verification. Typical lensless face verification adopt… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: under review

  18. arXiv:2406.02728  [pdf]

    cs.HC

    Impacts of Illuminance and Correlated Color Temperature on Cognitive Performance: A VR-Lighting Study

    Authors: Armin Mostafavi, Milica Vujovic, Tong Bill Xu, Michael Hensel

    Abstract: This study contributes to the ongoing exploration of methods to enhance the environmental design, cognitive function, and overall wellbeing, primarily focusing on understanding the modulation of human cognitive performance by artificial lighting conditions. In this investigation, participants (N=35) engaged with two distinct architectural contexts, each featuring five different lighting conditions… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

  19. arXiv:2406.01003  [pdf, other]

    cs.CV

    Uni-ISP: Unifying the Learning of ISPs from Multiple Cameras

    Authors: Lingen Li, Mingde Yao, Xingyu Meng, Muquan Yu, Tianfan Xue, Jinwei Gu

    Abstract: Modern end-to-end image signal processors (ISPs) can learn complex mappings from RAW/XYZ data to sRGB (or inverse), opening new possibilities in image processing. However, as the diversity of camera models continues to expand, developing and maintaining individual ISPs is not sustainable in the long term, which inherently lacks versatility, hindering the adaptability to multiple camera models. In… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  20. arXiv:2406.00448  [pdf, other]

    cs.CV cs.GR

    Bilateral Guided Radiance Field Processing

    Authors: Yuehao Wang, Chaoyi Wang, Bingchen Gong, Tianfan Xue

    Abstract: Neural Radiance Fields (NeRF) achieves unprecedented performance in novel view synthesis, utilizing multi-view consistency. When capturing multiple inputs, image signal processing (ISP) in modern cameras will independently enhance them, including exposure adjustment, color correction, local tone mapping, etc. While these processing steps greatly improve image quality, they often break the…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: SIGGRAPH (ACM TOG), 2024. Project page: https://bilarfpro.github.io

  21. arXiv:2405.21075  [pdf, other]

    cs.CV cs.CL

    Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

    Authors: Chaoyou Fu, Yuhan Dai, Yongdong Luo, Lei Li, Shuhuai Ren, Renrui Zhang, Zihan Wang, Chenyu Zhou, Yunhang Shen, Mengdan Zhang, Peixian Chen, Yanwei Li, Shaohui Lin, Sirui Zhao, Ke Li, Tong Xu, Xiawu Zheng, Enhong Chen, Rongrong Ji, Xing Sun

    Abstract: In the quest for artificial general intelligence, Multi-modal Large Language Models (MLLMs) have emerged as a focal point in recent advancements. However, the predominant focus remains on developing their capabilities in static image understanding. The potential of MLLMs in processing sequential visual data is still insufficiently explored, highlighting the absence of a comprehensive, high-quality… ▽ More

    Submitted 16 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

    Comments: Project Page: https://video-mme.github.io

  22. arXiv:2405.20974  [pdf, other]

    cs.CL cs.AI cs.LG

    SaySelf: Teaching LLMs to Express Confidence with Self-Reflective Rationales

    Authors: Tianyang Xu, Shujin Wu, Shizhe Diao, Xiaoze Liu, Xingyao Wang, Yangyi Chen, Jing Gao

    Abstract: Large language models (LLMs) often generate inaccurate or fabricated information and generally fail to indicate their confidence, which limits their broader applications. Previous work elicits confidence from LLMs by direct or self-consistency prompting, or constructing specific datasets for supervised finetuning. The prompting-based approaches have inferior performance, and the training-based app… ▽ More

    Submitted 5 June, 2024; v1 submitted 31 May, 2024; originally announced May 2024.

    Comments: The code is available at https://github.com/xu1868/SaySelf

  23. arXiv:2405.18842  [pdf, other]

    cs.CV

    Descriptive Image Quality Assessment in the Wild

    Authors: Zhiyuan You, Jinjin Gu, Zheyuan Li, Xin Cai, Kaiwen Zhu, Chao Dong, Tianfan Xue

    Abstract: With the rapid advancement of Vision Language Models (VLMs), VLM-based Image Quality Assessment (IQA) seeks to describe image quality linguistically to align with human expression and capture the multifaceted nature of IQA tasks. However, current methods are still far from practical usage. First, prior works focus narrowly on specific sub-tasks or settings, which do not align with diverse real-wor… ▽ More

    Submitted 12 June, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  24. arXiv:2405.16789  [pdf, other]

    cs.IR

    NoteLLM-2: Multimodal Large Representation Models for Recommendation

    Authors: Chao Zhang, Haoxin Zhang, Shiwei Wu, Di Wu, Tong Xu, Yan Gao, Yao Hu, Enhong Chen

    Abstract: Large Language Models (LLMs) have demonstrated exceptional text understanding. Existing works explore their application in text embedding tasks. However, there are few works utilizing LLMs to assist multimodal representation tasks. In this work, we investigate the potential of LLMs to enhance multimodal representation in multimodal item-to-item (I2I) recommendations. One feasible method is the tra… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: 19 pages, 5 figures

  25. arXiv:2405.16241  [pdf, other]

    cs.CR cs.AI

    FastQuery: Communication-efficient Embedding Table Query for Private LLM Inference

    Authors: Chenqi Lin, Tianshi Xu, Zebin Yang, Runsheng Wang, Ru Huang, Meng Li

    Abstract: With the fast evolution of large language models (LLMs), privacy concerns with user queries arise as they may contain sensitive information. Private inference based on homomorphic encryption (HE) has been proposed to protect user query privacy. However, a private embedding table query has to be formulated as a HE-based matrix-vector multiplication problem and suffers from enormous computation and… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 6 pages, DAC2024

  26. arXiv:2405.14569  [pdf, other]

    cs.CR cs.AI

    PrivCirNet: Efficient Private Inference via Block Circulant Transformation

    Authors: Tianshi Xu, Lemeng Wu, Runsheng Wang, Meng Li

    Abstract: Homomorphic encryption (HE)-based deep neural network (DNN) inference protects data and model privacy but suffers from significant computation overhead. We observe that transforming the DNN weights into circulant matrices converts general matrix-vector multiplications into HE-friendly 1-dimensional convolutions, drastically reducing the HE computation cost. Hence, in this paper, we propose PrivCirNet, a p…

    Submitted 23 May, 2024; originally announced May 2024.

  27. arXiv:2405.14455  [pdf, other]

    cs.CV

    TIGER: Text-Instructed 3D Gaussian Retrieval and Coherent Editing

    Authors: Teng Xu, Jiamin Chen, Peng Chen, Youjia Zhang, Junqing Yu, Wei Yang

    Abstract: Editing objects within a scene is a critical functionality required across a broad spectrum of applications in computer vision and graphics. As 3D Gaussian Splatting (3DGS) emerges as a frontier in scene representation, the effective modification of 3D Gaussian scenes has become increasingly vital. This process entails accurately retrieving the target objects and subsequently performing modification…

    Submitted 1 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  28. arXiv:2405.11531  [pdf, other]

    cs.IR cs.AI

    Knowledge Graph Pruning for Recommendation

    Authors: Fake Lin, Xi Zhu, Ziwei Zhao, Deqiang Huang, Yu Yu, Xueying Li, Tong Xu, Enhong Chen

    Abstract: Recent years have witnessed the prosperity of knowledge graph based recommendation system (KGRS), which enriches the representation of users, items, and entities by structural knowledge with striking improvement. Nevertheless, its unaffordable computational cost still limits researchers from exploring more sophisticated models. We observe that the bottleneck for training efficiency arises from the… ▽ More

    Submitted 19 May, 2024; originally announced May 2024.

  29. arXiv:2405.10959  [pdf, other]

    cs.CY cs.LG

    Foundation Models for Education: Promises and Prospects

    Authors: Tianlong Xu, Richard Tong, Jing Liang, Xing Fan, Haoyang Li, Qingsong Wen

    Abstract: With the advent of foundation models like ChatGPT, educators are excited about the transformative role that AI might play in propelling the next education revolution. The developing speed and the profound impact of foundation models in various industries force us to think deeply about the changes they will make to education, a domain that is critically important for the future of humans. In this p… ▽ More

    Submitted 8 April, 2024; originally announced May 2024.

    Comments: Accepted by IEEE Intelligent Systems

  30. arXiv:2405.10879  [pdf, other]

    cs.CV

    One registration is worth two segmentations

    Authors: Shiqi Huang, Tingfa Xu, Ziyi Shen, Shaheer Ullah Saeed, Wen Yan, Dean Barratt, Yipeng Hu

    Abstract: The goal of image registration is to establish spatial correspondence between two or more images, traditionally through dense displacement fields (DDFs) or parametric transformations (e.g., rigid, affine, and splines). Rethinking the existing paradigms of achieving alignment via spatial transformations, we uncover an alternative but more intuitive correspondence representation: a set of correspond… ▽ More

    Submitted 17 May, 2024; originally announced May 2024.

    Comments: Early Accepted by MICCAI2024

  31. arXiv:2405.08573  [pdf, other]

    cs.HC

    ViSTooth: A Visualization Framework for Tooth Segmentation on Panoramic Radiograph

    Authors: Shenji Zhu, Miaoxin Hu, Tianya Pan, Yue Hong, Bin Li, Zhiguang Zhou, Ting Xu

    Abstract: Tooth segmentation is a key step for computer aided diagnosis of dental diseases. Numerous machine learning models have been employed for tooth segmentation on dental panoramic radiograph. However, it is a difficult task to achieve accurate tooth segmentation due to complex tooth shapes, diverse tooth categories and incomplete sample set for machine learning. In this paper, we propose ViSTooth, a… ▽ More

    Submitted 14 May, 2024; originally announced May 2024.

  32. arXiv:2405.07580  [pdf, other]

    cs.IR cs.AI

    DynLLM: When Large Language Models Meet Dynamic Graph Recommendation

    Authors: Ziwei Zhao, Fake Lin, Xi Zhu, Zhi Zheng, Tong Xu, Shitian Shen, Xueying Li, Zikai Yin, Enhong Chen

    Abstract: The past year has witnessed considerable interest in Large Language Models (LLMs) for their potential applications in recommender systems, which may mitigate the persistent issue of data sparsity. Though great efforts have been made toward user-item graph augmentation with better graph-based recommendation performance, they may fail to deal with the dynamic graph recommendation task, which involves b…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 11 pages, 5 figures

  33. arXiv:2405.05004  [pdf, other]

    cs.CV

    TENet: Targetness Entanglement Incorporating with Multi-Scale Pooling and Mutually-Guided Fusion for RGB-E Object Tracking

    Authors: Pengcheng Shao, Tianyang Xu, Zhangyong Tang, Linze Li, Xiao-Jun Wu, Josef Kittler

    Abstract: There is currently strong interest in improving visual object tracking by augmenting the RGB modality with the output of a visual event camera that is particularly informative about the scene motion. However, existing approaches perform event feature extraction for RGB-E tracking using traditional appearance models, which have been optimised for RGB only tracking, without adapting it for the intri… ▽ More

    Submitted 8 May, 2024; originally announced May 2024.

  34. arXiv:2405.04082  [pdf, other]

    cs.RO

    Logic-Skill Programming: An Optimization-based Approach to Sequential Skill Planning

    Authors: Teng Xue, Amirreza Razmjoo, Suhan Shetty, Sylvain Calinon

    Abstract: Recent advances in robot skill learning have unlocked the potential to construct task-agnostic skill libraries, facilitating the seamless sequencing of multiple simple manipulation primitives (a.k.a. skills) to tackle significantly more complex tasks. Nevertheless, determining the optimal sequence for independently learned skills remains an open problem, particularly when the objective is given sole…

    Submitted 4 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: In Proc. Robotics: Science and Systems (RSS), 2024

  35. arXiv:2405.02801  [pdf, other]

    cs.SD cs.AI eess.AS

    Mozart's Touch: A Lightweight Multi-modal Music Generation Framework Based on Pre-Trained Large Models

    Authors: Tianze Xu, Jiajun Li, Xuesong Chen, Xinrui Yao, Shuchang Liu

    Abstract: In recent years, AI-Generated Content (AIGC) has witnessed rapid advancements, facilitating the generation of music, images, and other forms of artistic expression across various industries. However, research on general multi-modal music generation models remains scarce. To fill this gap, we propose a multi-modal music generation framework, Mozart's Touch. It could generate aligned music with the c…

    Submitted 7 May, 2024; v1 submitted 4 May, 2024; originally announced May 2024.

    Comments: 7 pages, 2 figures, submitted to ACM MM 2024

  36. arXiv:2405.02132  [pdf, other]

    cs.SD cs.CL eess.AS

    Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets

    Authors: Xuelong Geng, Tianyi Xu, Kun Wei, Bingshen Mu, Hongfei Xue, He Wang, Yangze Li, Pengcheng Guo, Yuhang Dai, Longhao Li, Mingchen Shao, Lei Xie

    Abstract: Large Language Models (LLMs) have demonstrated unparalleled effectiveness in various NLP tasks, and integrating LLMs with automatic speech recognition (ASR) is becoming a mainstream paradigm. Building upon this momentum, our research delves into an in-depth examination of this paradigm on a large open-source Chinese dataset. Specifically, our research aims to evaluate the impact of various configu… ▽ More

    Submitted 6 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

  37. arXiv:2405.00168  [pdf, other]

    cs.CV

    Revisiting RGBT Tracking Benchmarks from the Perspective of Modality Validity: A New Benchmark, Problem, and Method

    Authors: Zhangyong Tang, Tianyang Xu, Zhenhua Feng, Xuefeng Zhu, He Wang, Pengcheng Shao, Chunyang Cheng, Xiao-Jun Wu, Muhammad Awais, Sara Atito, Josef Kittler

    Abstract: RGBT tracking draws increasing attention due to its robustness in multi-modality warranting (MMW) scenarios, such as nighttime and bad weather, where relying on a single sensing modality fails to ensure stable tracking results. However, the existing benchmarks predominantly consist of videos collected in common scenarios where both RGB and thermal infrared (TIR) information are of sufficient quali… ▽ More

    Submitted 30 April, 2024; originally announced May 2024.

  38. arXiv:2404.17569  [pdf, other]

    cs.CV

    MaPa: Text-driven Photorealistic Material Painting for 3D Shapes

    Authors: Shangzhan Zhang, Sida Peng, Tao Xu, Yuanbo Yang, Tianrun Chen, Nan Xue, Yujun Shen, Hujun Bao, Ruizhen Hu, Xiaowei Zhou

    Abstract: This paper aims to generate materials for 3D meshes from text descriptions. Unlike existing methods that synthesize texture maps, we propose to generate segment-wise procedural material graphs as the appearance representation, which supports high-quality rendering and provides substantial flexibility in editing. Instead of relying on extensive paired data, i.e., 3D meshes with material graphs and… ▽ More

    Submitted 25 June, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

    Comments: SIGGRAPH 2024. Project page: https://zju3dv.github.io/MaPa

  39. arXiv:2404.16510  [pdf, other]

    cs.GR cs.CV

    Interactive3D: Create What You Want by Interactive 3D Generation

    Authors: Shaocong Dong, Lihe Ding, Zhanpeng Huang, Zibin Wang, Tianfan Xue, Dan Xu

    Abstract: 3D object generation has undergone significant advancements, yielding high-quality results. However, existing methods fall short of achieving precise user control, often yielding results that do not align with user expectations, thus limiting their applicability. User-envisioning 3D object generation faces significant challenges in realizing its concepts using current generative models due to limited interaction c…

    Submitted 25 April, 2024; originally announced April 2024.

    Comments: project page: https://interactive-3d.github.io/

  40. arXiv:2404.16359  [pdf, other]

    cs.CV

    An Improved Graph Pooling Network for Skeleton-Based Action Recognition

    Authors: Cong Wu, Xiao-Jun Wu, Tianyang Xu, Josef Kittler

    Abstract: Pooling is a crucial operation in computer vision, yet the unique structure of skeletons hinders the application of existing pooling strategies to skeleton graph modelling. In this paper, we propose an Improved Graph Pooling Network, referred to as IGPN. The main innovations include: Our method incorporates a region-awareness pooling strategy based on structural partitioning. The correlation matri… ▽ More

    Submitted 25 April, 2024; originally announced April 2024.

  41. arXiv:2404.16138  [pdf, other]

    cs.RO

    Logic Learning from Demonstrations for Multi-step Manipulation Tasks in Dynamic Environments

    Authors: Yan Zhang, Teng Xue, Amirreza Razmjoo, Sylvain Calinon

    Abstract: Learning from Demonstration (LfD) stands as an efficient framework for imparting human-like skills to robots. Nevertheless, designing an LfD framework capable of seamlessly imitating, generalizing, and reacting to disturbances for long-horizon manipulation tasks in dynamic environments remains a challenge. To tackle this challenge, we present Logic Dynamic Movement Primitives (Logic-DMP), which co… ▽ More

    Submitted 23 June, 2024; v1 submitted 24 April, 2024; originally announced April 2024.

    Comments: Accepted by IEEE RA-L

  42. arXiv:2404.10342  [pdf, other]

    cs.CV cs.MM

    Referring Flexible Image Restoration

    Authors: Runwei Guan, Rongsheng Hu, Zhuhao Zhou, Tianlang Xue, Ka Lok Man, Jeremy Smith, Eng Gee Lim, Weiping Ding, Yutao Yue

    Abstract: In reality, images often exhibit multiple degradations, such as rain and fog at night (triple degradations). However, in many cases, individuals may not want to remove all degradations, for instance, a blurry lens revealing a beautiful snowy landscape (double degradations). In such scenarios, people may only desire to deblur. These situations and requirements shed light on a new challenge in image… ▽ More

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: 15 pages, 19 figures

  43. arXiv:2404.09426  [pdf, other]

    cs.CV

    ViFu: Multiple 360° Objects Reconstruction with Clean Background via Visible Part Fusion

    Authors: Tianhan Xu, Takuya Ikeda, Koichi Nishiwaki

    Abstract: In this paper, we propose a method to segment and recover a static, clean background and multiple 360° objects from observations of scenes at different timestamps. Recent works have used neural radiance fields to model 3D scenes and improved the quality of novel view synthesis, while few studies have focused on modeling the invisible or occluded parts of the training images. These under-rec…

    Submitted 14 April, 2024; originally announced April 2024.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  44. arXiv:2404.09404  [pdf, other]

    cs.CR

    EQO: Exploring Ultra-Efficient Private Inference with Winograd-Based Protocol and Quantization Co-Optimization

    Authors: Wenxuan Zeng, Tianshi Xu, Meng Li, Runsheng Wang

    Abstract: Private convolutional neural network (CNN) inference based on secure two-party computation (2PC) suffers from high communication and latency overhead, especially from convolution layers. In this paper, we propose EQO, a quantized 2PC inference framework that jointly optimizes the CNNs and 2PC protocols. EQO features a novel 2PC protocol that combines Winograd transformation with quantization for e…

    Submitted 14 April, 2024; originally announced April 2024.

  45. arXiv:2404.07946  [pdf, other]

    cs.LG cs.AI

    Towards Faster Training of Diffusion Models: An Inspiration of A Consistency Phenomenon

    Authors: Tianshuo Xu, Peng Mi, Ruilin Wang, Yingcong Chen

    Abstract: Diffusion models (DMs) are a powerful generative framework that have attracted significant attention in recent years. However, the high computational cost of training DMs limits their practical applications. In this paper, we start with a consistency phenomenon of DMs: we observe that DMs with different initializations or even different architectures can produce very similar outputs given the same…

    Submitted 14 March, 2024; originally announced April 2024.

  46. arXiv:2404.04848  [pdf, other]

    eess.IV cs.AI cs.CV

    Task-Aware Encoder Control for Deep Video Compression

    Authors: Xingtong Ge, Jixiang Luo, Xinjie Zhang, Tongda Xu, Guo Lu, Dailan He, Jing Geng, Yan Wang, Jun Zhang, Hongwei Qin

    Abstract: Prior research on deep video compression (DVC) for machine tasks typically necessitates training a unique codec for each specific task, mandating a dedicated decoder per task. In contrast, traditional video codecs employ a flexible encoder controller, enabling the adaptation of a single codec to different tasks through mechanisms like mode prediction. Drawing inspiration from this, we introduce an…

    Submitted 20 April, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

    Comments: Accepted by CVPR 2024

  47. arXiv:2404.04720  [pdf, other]

    cs.CV

    On Exploring PDE Modeling for Point Cloud Video Representation Learning

    Authors: Zhuoxu Huang, Zhenkun Fan, Tao Xu, Jungong Han

    Abstract: Point cloud video representation learning is challenging due to complex structures and unordered spatial arrangement. Traditional methods struggle with frame-to-frame correlations and point-wise correspondence tracking. Recently, partial differential equations (PDE) have provided a new perspective in uniformly solving spatial-temporal data information within certain constraints. While tracking tan…

    Submitted 29 May, 2024; v1 submitted 6 April, 2024; originally announced April 2024.

  48. arXiv:2404.00942  [pdf, other]

    cs.CL cs.AI cs.LG

    Evaluating the Factuality of Large Language Models using Large-Scale Knowledge Graphs

    Authors: Xiaoze Liu, Feijie Wu, Tianyang Xu, Zhuo Chen, Yichi Zhang, Xiaoqian Wang, Jing Gao

    Abstract: The advent of Large Language Models (LLMs) has significantly transformed the AI landscape, enhancing machine learning and AI capabilities. Factuality issue is a critical concern for LLMs, as they may generate factually incorrect responses. In this paper, we propose GraphEval to evaluate an LLM's performance using a substantially large test dataset. Specifically, the test dataset is retrieved from…

    Submitted 1 April, 2024; originally announced April 2024.

  49. arXiv:2404.00694  [pdf, other]

    cs.CV

    DMSSN: Distilled Mixed Spectral-Spatial Network for Hyperspectral Salient Object Detection

    Authors: Haolin Qin, Tingfa Xu, Peifu Liu, Jingxuan Xu, Jianan Li

    Abstract: Hyperspectral salient object detection (HSOD) has exhibited remarkable promise across various applications, particularly in intricate scenarios where conventional RGB-based approaches fall short. Despite the considerable progress in HSOD method advancements, two critical challenges require immediate attention. Firstly, existing hyperspectral data dimension reduction techniques incur a loss of spec…

    Submitted 31 March, 2024; originally announced April 2024.

  50. arXiv:2403.18105  [pdf, other]

    cs.CL cs.AI

    Large Language Models for Education: A Survey and Outlook

    Authors: Shen Wang, Tianlong Xu, Hang Li, Chaoli Zhang, Joleen Liang, Jiliang Tang, Philip S. Yu, Qingsong Wen

    Abstract: The advent of Large Language Models (LLMs) has brought in a new era of possibilities in the realm of education. This survey paper summarizes the various technologies of LLMs in educational settings from multifaceted perspectives, encompassing student and teacher assistance, adaptive learning, and commercial tools. We systematically review the technological advancements in each perspective, organiz…

    Submitted 1 April, 2024; v1 submitted 26 March, 2024; originally announced March 2024.