Showing 1–50 of 291 results for author: Xia, G

  1. arXiv:2407.03824 [pdf, ps, other]

    cs.LG cs.AI

    Emergent Interpretable Symbols and Content-Style Disentanglement via Variance-Invariance Constraints

    Authors: Yuxuan Wu, Ziyu Wang, Bhiksha Raj, Gus Xia

    Abstract: We contribute an unsupervised method that effectively learns from raw observation and disentangles its latent space into content and style representations. Unlike most disentanglement algorithms that rely on domain-specific labels and knowledge, our method is based on the insight of domain-general statistical differences between content and style -- content varies more among different fragments wi…

    Submitted 4 July, 2024; originally announced July 2024.

  2. Parametric Primitive Analysis of CAD Sketches with Vision Transformer

    Authors: Xiaogang Wang, Liang Wang, Hongyu Wu, Guoqiang Xiao, Kai Xu

    Abstract: The design and analysis of Computer-Aided Design (CAD) sketches play a crucial role in industrial product design, primarily involving CAD primitives and their inter-primitive constraints. To address challenges related to error accumulation in autoregressive models and the complexities associated with self-supervised model design for this task, we propose a two-stage network framework. This framewo…

    Submitted 29 June, 2024; originally announced July 2024.

  3. arXiv:2406.18453 [pdf, other]

    cs.CV

    Towards Human-Level 3D Relative Pose Estimation: Generalizable, Training-Free, with Single Reference

    Authors: Yuan Gao, Yajing Luo, Junhong Wang, Kui Jia, Gui-Song Xia

    Abstract: Humans can easily deduce the relative pose of an unseen object, without label/training, given only a single query-reference image pair. This is arguably achieved by incorporating (i) 3D/2.5D shape perception from a single image, (ii) render-and-compare simulation, and (iii) rich semantic cue awareness to furnish (coarse) reference-query correspondence. Existing methods implement (i) by a 3D CAD mo…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: The codes are available at https://github.com/ethanygao/training-free_generalizable_relative_pose

  4. arXiv:2406.14485   

    cs.AI cs.HC cs.MM cs.SD eess.AS

    Proceedings of The second international workshop on eXplainable AI for the Arts (XAIxArts)

    Authors: Nick Bryan-Kinns, Corey Ford, Shuoyang Zheng, Helen Kennedy, Alan Chamberlain, Makayla Lewis, Drew Hemment, Zijin Li, Qiong Wu, Lanxi Xiao, Gus Xia, Jeba Rezwana, Michael Clemens, Gabriel Vigliensoni

    Abstract: This second international workshop on explainable AI for the Arts (XAIxArts) brought together a community of researchers in HCI, Interaction Design, AI, explainable AI (XAI), and digital arts to explore the role of XAI for the Arts. Workshop held at the 16th ACM Conference on Creativity and Cognition (C&C 2024), Chicago, USA.

    Submitted 1 July, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  5. arXiv:2406.10774 [pdf, other]

    cs.CL cs.LG

    Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

    Authors: Jiaming Tang, Yilong Zhao, Kan Zhu, Guangxuan Xiao, Baris Kasikci, Song Han

    Abstract: As the demand for long-context large language models (LLMs) increases, models with context windows of up to 128K or 1M tokens are becoming increasingly prevalent. However, long-context LLM inference is challenging since the inference speed decreases significantly as the sequence length grows. This slowdown is primarily caused by loading a large KV cache during self-attention. Previous works have s…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  6. arXiv:2406.10576 [pdf, other]

    cs.LG cs.CL stat.ML

    Optimization-based Structural Pruning for Large Language Models without Back-Propagation

    Authors: Yuan Gao, Zujing Liu, Weizhong Zhang, Bo Du, Gui-Song Xia

    Abstract: Compared to the moderate size of neural network models, structural weight pruning on the Large-Language Models (LLMs) imposes a novel challenge on the efficiency of the pruning algorithms, due to the heavy computation/memory demands of the LLMs. Recent efficient LLM pruning methods typically operate at the post-training phase without the expensive weight finetuning, however, their pruning criteria…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 17 pages

  7. arXiv:2406.05773 [pdf, other]

    cs.CV

    CorrMAE: Pre-training Correspondence Transformers with Masked Autoencoder

    Authors: Tangfei Liao, Xiaoqin Zhang, Guobao Xiao, Min Li, Tao Wang, Mang Ye

    Abstract: Pre-training has emerged as a simple yet powerful methodology for representation learning across various domains. However, due to the expensive training cost and limited data, pre-training has not yet been extensively studied in correspondence pruning. To tackle these challenges, we propose a pre-training method to acquire a generic inliers-consistent representation by reconstructing masked corres…

    Submitted 9 June, 2024; originally announced June 2024.

  8. arXiv:2406.03839 [pdf, other]

    cs.SE

    PCART: Automated Repair of Python API Parameter Compatibility Issues

    Authors: Shuai Zhang, Guanping Xiao, Jun Wang, Huashan Lei, Yepang Liu, Yulei Sui, Zheng Zheng

    Abstract: In modern software development, Python third-party libraries have become crucial, particularly due to their widespread use in fields such as deep learning and scientific computing. However, the parameters of APIs in third-party libraries often change during evolution, causing compatibility issues for client applications that depend on specific versions. Due to Python's flexible parameter-passing m…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Submitted to IEEE Transactions on Software Engineering

  9. arXiv:2405.18386 [pdf, other]

    cs.SD cs.AI cs.LG cs.MM eess.AS

    Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning

    Authors: Yixiao Zhang, Yukara Ikemiya, Woosung Choi, Naoki Murata, Marco A. Martínez-Ramírez, Liwei Lin, Gus Xia, Wei-Hsiang Liao, Yuki Mitsufuji, Simon Dixon

    Abstract: Recent advances in text-to-music editing, which employ text queries to modify music (e.g., by changing its style or adjusting instrumental components), present unique challenges and opportunities for AI-assisted music creation. Previous approaches in this domain have been constrained by the necessity to train specific editing models from scratch, which is both resource-intensive and inefficient; o…

    Submitted 29 May, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

    Comments: Code and demo are available at: https://github.com/ldzhangyx/instruct-musicgen

  10. arXiv:2405.13050 [pdf, other]

    cs.HC cs.AI

    Human-Centered LLM-Agent User Interface: A Position Paper

    Authors: Daniel Chin, Yuxuan Wang, Gus Xia

    Abstract: Large Language Model (LLM) -in-the-loop applications have been shown to effectively interpret the human user's commands, make plans, and operate external tools/systems accordingly. Still, the operation scope of the LLM agent is limited to passively following the user, requiring the user to frame his/her needs with regard to the underlying tools/systems. We note that the potential of an LLM-Agent U…

    Submitted 19 May, 2024; originally announced May 2024.

  11. arXiv:2405.09901 [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Whole-Song Hierarchical Generation of Symbolic Music Using Cascaded Diffusion Models

    Authors: Ziyu Wang, Lejun Min, Gus Xia

    Abstract: Recent deep music generation studies have put much emphasis on long-term generation with structures. However, we are yet to see high-quality, well-structured whole-song generation. In this paper, we make the first attempt to model a full music piece under the realization of compositional hierarchy. With a focus on symbolic representations of pop songs, we define a hierarchical language, in which e…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Proceedings of the International Conference on Learning Representations (ICLR 2024)

    MSC Class: 68Txx

  12. arXiv:2405.05695 [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Aux-NAS: Exploiting Auxiliary Labels with Negligibly Extra Inference Cost

    Authors: Yuan Gao, Weizhong Zhang, Wenhan Luo, Lin Ma, Jin-Gang Yu, Gui-Song Xia, Jiayi Ma

    Abstract: We aim at exploiting additional auxiliary labels from an independent (auxiliary) task to boost the primary task performance which we focus on, while preserving a single task inference cost of the primary task. While most existing auxiliary learning methods are optimization-based relying on loss weights/gradients manipulation, our method is architecture-based with a flexible asymmetric structure fo…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: Accepted to ICLR 2024

    Journal ref: International Conference on Learning Representations (ICLR), 2024

  13. arXiv:2405.04532 [pdf, other]

    cs.CL cs.AI cs.LG cs.PF

    QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

    Authors: Yujun Lin, Haotian Tang, Shang Yang, Zhekai Zhang, Guangxuan Xiao, Chuang Gan, Song Han

    Abstract: Quantization can accelerate large language model (LLM) inference. Going beyond INT8 quantization, the research community is actively exploring even lower precision, such as INT4. Nonetheless, state-of-the-art INT4 quantization techniques only accelerate low-batch, edge LLM inference, failing to deliver performance gains in large-batch, cloud-based LLM serving. We uncover a critical issue: existing…

    Submitted 10 May, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: The first three authors contribute equally to this project and are listed in the alphabetical order. Yujun Lin leads the quantization algorithm, Haotian Tang and Shang Yang lead the GPU kernels and the serving system. Code is available at https://github.com/mit-han-lab/qserve

  14. arXiv:2405.03613 [pdf, other]

    cs.CV

    Dual Relation Mining Network for Zero-Shot Learning

    Authors: Jinwei Han, Yingguo Gao, Zhiwen Lin, Ke Yan, Shouhong Ding, Yuan Gao, Gui-Song Xia

    Abstract: Zero-shot learning (ZSL) aims to recognize novel classes through transferring shared semantic knowledge (e.g., attributes) from seen classes to unseen classes. Recently, attention-based methods have exhibited significant progress which align visual features and attributes via a spatial attention mechanism. However, these methods only explore visual-semantic relationship in the spatial dimension, w…

    Submitted 6 May, 2024; originally announced May 2024.

  15. arXiv:2404.18081 [pdf, other]

    cs.SD cs.AI cs.CL cs.LG cs.MM eess.AS

    ComposerX: Multi-Agent Symbolic Music Composition with LLMs

    Authors: Qixin Deng, Qikai Yang, Ruibin Yuan, Yipeng Huang, Yi Wang, Xubo Liu, Zeyue Tian, Jiahao Pan, Ge Zhang, Hanfeng Lin, Yizhi Li, Yinghao Ma, Jie Fu, Chenghua Lin, Emmanouil Benetos, Wenwu Wang, Guangyu Xia, Wei Xue, Yike Guo

    Abstract: Music composition represents the creative side of humanity, and itself is a complex task that requires abilities to understand and generate information with long dependency and harmony constraints. While demonstrating impressive capabilities in STEM subjects, current LLMs easily fail in this task, generating ill-written music even when equipped with modern techniques like In-Context-Learning and C…

    Submitted 30 April, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

  16. arXiv:2404.15574 [pdf, other]

    cs.CL

    Retrieval Head Mechanistically Explains Long-Context Factuality

    Authors: Wenhao Wu, Yizhong Wang, Guangxuan Xiao, Hao Peng, Yao Fu

    Abstract: Despite the recent progress in long-context language models, it remains elusive how transformer-based models exhibit the capability to retrieve relevant information from arbitrary locations within the long context. This paper aims to address this question. Our systematic investigation across a wide spectrum of models reveals that a special type of attention heads are largely responsible for retrie…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: Preprint

  17. arXiv:2404.06393 [pdf, other]

    cs.SD cs.AI eess.AS

    MuPT: A Generative Symbolic Music Pretrained Transformer

    Authors: Xingwei Qu, Yuelin Bai, Yinghao Ma, Ziya Zhou, Ka Man Lo, Jiaheng Liu, Ruibin Yuan, Lejun Min, Xueling Liu, Tianyu Zhang, Xinrun Du, Shuyue Guo, Yiming Liang, Yizhi Li, Shangda Wu, Junting Zhou, Tianyu Zheng, Ziyang Ma, Fengze Han, Wei Xue, Gus Xia, Emmanouil Benetos, Xiang Yue, Chenghua Lin, Xu Tan, et al. (4 additional authors not shown)

    Abstract: In this paper, we explore the application of Large Language Models (LLMs) to the pre-training of music. While the prevalent use of MIDI in music modeling is well-established, our findings suggest that LLMs are inherently more compatible with ABC Notation, which aligns more closely with their design and strengths, thereby enhancing the model's performance in musical composition. To address the chal…

    Submitted 10 April, 2024; v1 submitted 9 April, 2024; originally announced April 2024.

  18. arXiv:2404.06244 [pdf, other]

    cs.CV

    Anchor-based Robust Finetuning of Vision-Language Models

    Authors: Jinwei Han, Zhiwen Lin, Zhongyisun Sun, Yingguo Gao, Ke Yan, Shouhong Ding, Yuan Gao, Gui-Song Xia

    Abstract: We aim at finetuning a vision-language model without hurting its out-of-distribution (OOD) generalization. We address two types of OOD generalization, i.e., i) domain shift such as natural to sketch images, and ii) zero-shot capability to recognize the category that was not contained in the finetune data. Arguably, the diminished OOD generalization after finetuning stems from the excessively simpl…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: CVPR2024

  19. arXiv:2404.04823 [pdf, other]

    cs.CV

    3D Building Reconstruction from Monocular Remote Sensing Images with Multi-level Supervisions

    Authors: Weijia Li, Haote Yang, Zhenghao Hu, Juepeng Zheng, Gui-Song Xia, Conghui He

    Abstract: 3D building reconstruction from monocular remote sensing images is an important and challenging research problem that has received increasing attention in recent years, owing to its low cost of data acquisition and availability for large-scale applications. However, existing methods rely on expensive 3D-annotated samples for fully-supervised training, restricting their application to large-scale c…

    Submitted 7 April, 2024; originally announced April 2024.

    Comments: accepted by CVPR 2024

  20. arXiv:2403.20213 [pdf, other]

    cs.CV

    H2RSVLM: Towards Helpful and Honest Remote Sensing Large Vision Language Model

    Authors: Chao Pang, Jiang Wu, Jiayu Li, Yi Liu, Jiaxing Sun, Weijia Li, Xingxing Weng, Shuai Wang, Litong Feng, Gui-Song Xia, Conghui He

    Abstract: Generic large Vision-Language Models (VLMs) are rapidly developing, but still perform poorly in the Remote Sensing (RS) domain, which is due to the unique and specialized nature of RS imagery and the comparatively limited spatial perception of current VLMs. Existing Remote Sensing specific Vision Language Models (RSVLMs) still have considerable potential for improvement, primarily owing to the lack…

    Submitted 29 March, 2024; originally announced March 2024.

    Comments: Equal contribution: Chao Pang, Jiang Wu; Corresponding author: Gui-Song Xia, Conghui He

  21. arXiv:2403.14715 [pdf, other]

    cs.LG cs.AI cs.CV

    Understanding Why Label Smoothing Degrades Selective Classification and How to Fix It

    Authors: Guoxuan Xia, Olivier Laurent, Gianni Franchi, Christos-Savvas Bouganis

    Abstract: Label smoothing (LS) is a popular regularisation method for training deep neural network classifiers due to its effectiveness in improving test accuracy and its simplicity in implementation. "Hard" one-hot labels are "smoothed" by uniformly distributing probability mass to other classes, reducing overfitting. In this work, we reveal that LS negatively affects selective classification (SC) - where…

    Submitted 19 March, 2024; originally announced March 2024.

  22. arXiv:2403.14198 [pdf, other]

    cs.CV

    Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization

    Authors: Guopeng Li, Ming Qian, Gui-Song Xia

    Abstract: This paper investigates the effective utilization of unlabeled data for large-area cross-view geo-localization (CVGL), encompassing both unsupervised and semi-supervised settings. Common approaches to CVGL rely on ground-satellite image pairs and employ label-driven supervised training. However, the cost of collecting precise cross-view image pairs hinders the deployment of CVGL in real-life scena…

    Submitted 21 March, 2024; originally announced March 2024.

    Comments: Accepted by CVPR2024

  23. arXiv:2403.12702 [pdf, other]

    cs.CV

    Learning Cross-view Visual Geo-localization without Ground Truth

    Authors: Haoyuan Li, Chang Xu, Wen Yang, Huai Yu, Gui-Song Xia

    Abstract: Cross-View Geo-Localization (CVGL) involves determining the geographical location of a query image by matching it with a corresponding GPS-tagged reference image. Current state-of-the-art methods predominantly rely on training models with labeled paired images, incurring substantial annotation costs and training burdens. In this study, we investigate the adaptation of frozen models for CVGL withou…

    Submitted 19 March, 2024; originally announced March 2024.

  24. arXiv:2403.06444 [pdf, other]

    cs.CV

    Latent Semantic Consensus For Deterministic Geometric Model Fitting

    Authors: Guobao Xiao, Jun Yu, Jiayi Ma, Deng-Ping Fan, Ling Shao

    Abstract: Estimating reliable geometric model parameters from the data with severe outliers is a fundamental and important task in computer vision. This paper attempts to sample high-quality subsets and select model instances to estimate parameters in the multi-structural data. To address this, we propose an effective method called Latent Semantic Consensus (LSC). The principle of LSC is to preserve the lat…

    Submitted 11 March, 2024; originally announced March 2024.

  25. arXiv:2402.16280 [pdf, other]

    cs.CV

    Few-Shot Learning for Annotation-Efficient Nucleus Instance Segmentation

    Authors: Yu Ming, Zihao Wu, Jie Yang, Danyi Li, Yuan Gao, Changxin Gao, Gui-Song Xia, Yuanqing Li, Li Liang, Jin-Gang Yu

    Abstract: Nucleus instance segmentation from histopathology images suffers from the extremely laborious and expert-dependent annotation of nucleus instances. As a promising solution to this task, annotation-efficient deep learning paradigms have recently attracted much research interest, such as weakly-/semi-supervised learning, generative adversarial learning, etc. In this paper, we propose to formulate an…

    Submitted 27 February, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

  26. arXiv:2402.16153 [pdf, other]

    cs.SD cs.AI cs.CL cs.LG cs.MM eess.AS

    ChatMusician: Understanding and Generating Music Intrinsically with LLM

    Authors: Ruibin Yuan, Hanfeng Lin, Yi Wang, Zeyue Tian, Shangda Wu, Tianhao Shen, Ge Zhang, Yuhang Wu, Cong Liu, Ziya Zhou, Ziyang Ma, Liumeng Xue, Ziyu Wang, Qin Liu, Tianyu Zheng, Yizhi Li, Yinghao Ma, Yiming Liang, Xiaowei Chi, Ruibo Liu, Zili Wang, Pengfei Li, Jingcheng Wu, Chenghua Lin, Qifeng Liu, et al. (10 additional authors not shown)

    Abstract: While Large Language Models (LLMs) demonstrate impressive capabilities in text generation, we find that their ability has yet to be generalized to music, humanity's creative language. We introduce ChatMusician, an open-source LLM that integrates intrinsic musical abilities. It is based on continual pre-training and finetuning LLaMA2 on a text-compatible music representation, ABC notation, and the…

    Submitted 25 February, 2024; originally announced February 2024.

    Comments: GitHub: https://shanghaicannon.github.io/ChatMusician/

  27. arXiv:2402.12765 [pdf, other]

    cs.CV

    GOOD: Towards Domain Generalized Orientated Object Detection

    Authors: Qi Bi, Beichen Zhou, Jingjun Yi, Wei Ji, Haolan Zhan, Gui-Song Xia

    Abstract: Oriented object detection has been rapidly developed in the past few years, but most of these methods assume the training and testing images are under the same statistical distribution, which is far from reality. In this paper, we propose the task of domain generalized oriented object detection, which intends to explore the generalization of oriented object detectors on arbitrary unseen target dom…

    Submitted 20 February, 2024; originally announced February 2024.

    Comments: 8 pages, 6 figures

  28. arXiv:2402.10193 [pdf, other]

    cs.LG cs.CL

    BitDelta: Your Fine-Tune May Only Be Worth One Bit

    Authors: James Liu, Guangxuan Xiao, Kai Li, Jason D. Lee, Song Han, Tri Dao, Tianle Cai

    Abstract: Large Language Models (LLMs) are typically trained in two phases: pre-training on large internet-scale datasets, and fine-tuning for downstream tasks. Given the higher computational demand of pre-training, it's intuitive to assume that fine-tuning adds less new information to the model, and is thus more compressible. We explore this assumption by decomposing the weights of fine-tuned models into t…

    Submitted 27 February, 2024; v1 submitted 15 February, 2024; originally announced February 2024.

  29. arXiv:2402.09508 [pdf, other]

    cs.SD cs.AI eess.AS

    Arrange, Inpaint, and Refine: Steerable Long-term Music Audio Generation and Editing via Content-based Controls

    Authors: Liwei Lin, Gus Xia, Yixiao Zhang, Junyan Jiang

    Abstract: Controllable music generation plays a vital role in human-AI music co-creation. While Large Language Models (LLMs) have shown promise in generating high-quality music, their focus on autoregressive generation limits their utility in music editing tasks. To address this gap, we propose a novel approach leveraging a parameter-efficient heterogeneous adapter combined with a masking training scheme. T…

    Submitted 10 June, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

  30. arXiv:2402.06178 [pdf, other]

    cs.SD cs.AI cs.MM eess.AS

    MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models

    Authors: Yixiao Zhang, Yukara Ikemiya, Gus Xia, Naoki Murata, Marco A. Martínez-Ramírez, Wei-Hsiang Liao, Yuki Mitsufuji, Simon Dixon

    Abstract: Recent advances in text-to-music generation models have opened new avenues in musical creativity. However, music generation usually involves iterative refinements, and how to edit the generated music remains a significant challenge. This paper introduces a novel approach to the editing of music generated by such models, enabling the modification of specific attributes, such as genre, mood and inst…

    Submitted 28 May, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: Accepted to IJCAI 2024

  31. arXiv:2402.06079 [pdf, other]

    q-bio.GN cs.AI cs.LG

    DiscDiff: Latent Diffusion Model for DNA Sequence Generation

    Authors: Zehui Li, Yuhao Ni, William A V Beardall, Guoxuan Xia, Akashaditya Das, Guy-Bart Stan, Yiren Zhao

    Abstract: This paper introduces a novel framework for DNA sequence generation, comprising two key components: DiscDiff, a Latent Diffusion Model (LDM) tailored for generating discrete DNA sequences, and Absorb-Escape, a post-training algorithm designed to refine these sequences. Absorb-Escape enhances the realism of the generated sequences by correcting 'round errors' inherent in the conversion process betw…

    Submitted 17 April, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

    Comments: Different from the prior work "Latent Diffusion Model for DNA Sequence Generation" (arXiv:2310.06150), we updated the evaluation framework and compared the DiscDiff with other methods comprehensively. In addition, a post-training framework is proposed to increase the quality of generated sequences

  32. arXiv:2402.04617 [pdf, other]

    cs.CL cs.AI cs.LG

    InfLLM: Training-Free Long-Context Extrapolation for LLMs with an Efficient Context Memory

    Authors: Chaojun Xiao, Pengle Zhang, Xu Han, Guangxuan Xiao, Yankai Lin, Zhengyan Zhang, Zhiyuan Liu, Maosong Sun

    Abstract: Large language models (LLMs) have emerged as a cornerstone in real-world applications with lengthy streaming inputs (e.g., LLM-driven agents). However, existing LLMs, pre-trained on sequences with a restricted maximum length, cannot process longer sequences due to the out-of-domain and distraction issues. Common solutions often involve continual pre-training on longer sequences, which will introdu…

    Submitted 28 May, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

  33. arXiv:2401.10752 [pdf, other]

    cs.CV

    HiCD: Change Detection in Quality-Varied Images via Hierarchical Correlation Distillation

    Authors: Chao Pang, Xingxing Weng, Jiang Wu, Qiang Wang, Gui-Song Xia

    Abstract: Advanced change detection techniques primarily target image pairs of equal and high quality. However, variations in imaging conditions and platforms frequently lead to image pairs with distinct qualities: one image being high-quality, while the other being low-quality. These disparities in image quality present significant challenges for understanding image pairs semantically and extracting change…

    Submitted 19 January, 2024; originally announced January 2024.

    Comments: accepted by TGRS

  34. arXiv:2401.08860 [pdf, other]

    cs.CV

    Cross-Level Multi-Instance Distillation for Self-Supervised Fine-Grained Visual Categorization

    Authors: Qi Bi, Wei Ji, Jingjun Yi, Haolan Zhan, Gui-Song Xia

    Abstract: High-quality annotation of fine-grained visual categories demands great expert knowledge, which is taxing and time consuming. Alternatively, learning fine-grained visual representation from enormous unlabeled images (e.g., species, brands) by self-supervised learning becomes a feasible solution. However, recent researches find that existing self-supervised learning methods are less qualified to re…

    Submitted 26 February, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: work in progress

  35. arXiv:2401.08056 [pdf, other]

    cs.CV

    Robust Tiny Object Detection in Aerial Images amidst Label Noise

    Authors: Haoran Zhu, Chang Xu, Wen Yang, Ruixiang Zhang, Yan Zhang, Gui-Song Xia

    Abstract: Precise detection of tiny objects in remote sensing imagery remains a significant challenge due to their limited visual information and frequent occurrence within scenes. This challenge is further exacerbated by the practical burden and inherent errors associated with manual annotation: annotating tiny objects is laborious and prone to errors (i.e., label noise). Training detectors for such object…

    Submitted 15 January, 2024; originally announced January 2024.

  36. arXiv:2401.03459 [pdf, other]

    cs.CV

    BCLNet: Bilateral Consensus Learning for Two-View Correspondence Pruning

    Authors: Xiangyang Miao, Guobao Xiao, Shiping Wang, Jun Yu

    Abstract: Correspondence pruning aims to establish reliable correspondences between two related images and recover relative camera motion. Existing approaches often employ a progressive strategy to handle the local and global contexts, with a prominent emphasis on transitioning from local to global, resulting in the neglect of interactions between different contexts. To tackle this issue, we propose a paral…

    Submitted 7 January, 2024; originally announced January 2024.

  37. arXiv:2401.02614 [pdf, other]

    cs.CV cs.MM

    Scaling and Masking: A New Paradigm of Data Sampling for Image and Video Quality Assessment

    Authors: Yongxu Liu, Yinghui Quan, Guoyao Xiao, Aobo Li, Jinjian Wu

    Abstract: Quality assessment of images and videos emphasizes both local details and global semantics, whereas general data sampling methods (e.g., resizing, cropping or grid-based fragment) fail to catch them simultaneously. To address the deficiency, current approaches have to adopt multi-branch models and take as input the multi-resolution data, which burdens the model complexity. In this work, instead of…

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: Accepted by AAAI2024. Code has been released at https://github.com/Sissuire/SAMA

  38. arXiv:2312.15971 [pdf, other]

    cs.CV

    Graph Context Transformation Learning for Progressive Correspondence Pruning

    Authors: Junwen Guo, Guobao Xiao, Shiping Wang, Jun Yu

    Abstract: Most of existing correspondence pruning methods only concentrate on gathering the context information as much as possible while neglecting effective ways to utilize such information. In order to tackle this dilemma, in this paper we propose Graph Context Transformation Network (GCT-Net) enhancing context information to conduct consensus guidance for progressive correspondence pruning. Specifically…

    Submitted 26 December, 2023; originally announced December 2023.

  39. arXiv:2312.11112 [pdf, other]

    cs.CV

    ConDaFormer: Disassembled Transformer with Local Structure Enhancement for 3D Point Cloud Understanding

    Authors: Lunhao Duan, Shanshan Zhao, Nan Xue, Mingming Gong, Gui-Song Xia, Dacheng Tao

    Abstract: Transformers have been recently explored for 3D point cloud understanding with impressive progress achieved. A large number of points, over 0.1 million, make the global self-attention infeasible for point cloud data. Thus, most methods propose to apply the transformer in a local region, e.g., spherical or cubic window. However, it still contains a large number of Query-Key pairs, which requires hi…

    Submitted 18 December, 2023; originally announced December 2023.

    Comments: NeurIPS 2023. Code: https://github.com/LHDuan/ConDaFormer

  40. arXiv:2312.08774 [pdf, other]

    cs.CV

    VSFormer: Visual-Spatial Fusion Transformer for Correspondence Pruning

    Authors: Tangfei Liao, Xiaoqin Zhang, Li Zhao, Tao Wang, Guobao Xiao

    Abstract: Correspondence pruning aims to find correct matches (inliers) from an initial set of putative correspondences, which is a fundamental task for many applications. The process of finding is challenging, given the varying inlier ratios between scenes/image pairs due to significant visual differences. However, the performance of the existing methods is usually limited by the problem of lacking visual…

    Submitted 4 January, 2024; v1 submitted 14 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI2024

  41. arXiv:2312.01536 [pdf, other]

    cs.CV

    CalliPaint: Chinese Calligraphy Inpainting with Diffusion Model

    Authors: Qisheng Liao, Zhinuo Wang, Muhammad Abdul-Mageed, Gus Xia

    Abstract: Chinese calligraphy can be viewed as a unique form of visual art. Recent advancements in computer vision hold significant potential for the future development of generative models in the realm of Chinese calligraphy. Nevertheless, methods of Chinese calligraphy inpainting, which can be effectively used in the art and education fields, remain relatively unexplored. In this paper, we introduce a new…

    Submitted 3 December, 2023; originally announced December 2023.

    Comments: Accepted as a Machine Learning for Creativity and Design (ML4CD) workshop paper at NeurIPS 2023. https://neurips.cc/virtual/2023/workshop/66545#wse-detail-75063

  42. arXiv:2311.00157  [pdf, other]

    cs.LG cs.AI cs.CV

    Score Normalization for a Faster Diffusion Exponential Integrator Sampler

    Authors: Guoxuan Xia, Duolikun Danier, Ayan Das, Stathi Fotiadis, Farhang Nabiei, Ushnish Sengupta, Alberto Bernacchia

    Abstract: Recently, Zhang et al. have proposed the Diffusion Exponential Integrator Sampler (DEIS) for fast generation of samples from Diffusion Models. It leverages the semi-linear nature of the probability flow ordinary differential equation (ODE) in order to greatly reduce integration error and improve generation quality at low numbers of function evaluations (NFEs). Key to this approach is the score fun…

    Submitted 9 November, 2023; v1 submitted 31 October, 2023; originally announced November 2023.

  43. arXiv:2310.17162  [pdf, other]

    cs.AI cs.SD eess.AS

    Content-based Controls For Music Large Language Modeling

    Authors: Liwei Lin, Gus Xia, Junyan Jiang, Yixiao Zhang

    Abstract: Recent years have witnessed a rapid growth of large-scale language models in the domain of music audio. Such models enable end-to-end generation of higher-quality music, and some allow conditioned generation using text descriptions. However, the control power of text controls on music is intrinsically limited, as they can only describe music indirectly through meta-data (such as singers and instru…

    Submitted 13 April, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

  44. arXiv:2310.16334  [pdf, other]

    cs.SD cs.AI cs.MM eess.AS

    AccoMontage-3: Full-Band Accompaniment Arrangement via Sequential Style Transfer and Multi-Track Function Prior

    Authors: Jingwei Zhao, Gus Xia, Ye Wang

    Abstract: We propose AccoMontage-3, a symbolic music automation system capable of generating multi-track, full-band accompaniment based on the input of a lead melody with chords (i.e., a lead sheet). The system contains three modular components, each modelling a vital aspect of full-band composition. The first component is a piano arranger that generates piano accompaniment for the lead sheet by transferrin…

    Submitted 24 October, 2023; originally announced October 2023.

  45. arXiv:2310.14718  [pdf, other]

    cs.CV

    Rethinking Scale Imbalance in Semi-supervised Object Detection for Aerial Images

    Authors: Ruixiang Zhang, Chang Xu, Fang Xu, Wen Yang, Guangjun He, Huai Yu, Gui-Song Xia

    Abstract: This paper focuses on the scale imbalance problem of semi-supervised object detection (SSOD) in aerial images. Compared to natural images, objects in aerial images show smaller sizes and larger quantities per image, increasing the difficulty of manual annotation. Meanwhile, the advanced SSOD technique can train superior detectors by leveraging limited labeled data and massive unlabeled data, saving…

    Submitted 23 October, 2023; originally announced October 2023.

  46. arXiv:2310.12404  [pdf, other]

    cs.SD cs.CL cs.HC cs.LG eess.AS

    Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing

    Authors: Yixiao Zhang, Akira Maezawa, Gus Xia, Kazuhiko Yamamoto, Simon Dixon

    Abstract: Creating music is iterative, requiring varied methods at each stage. However, existing AI music systems fall short in orchestrating multiple subsystems for diverse needs. To address this gap, we introduce Loop Copilot, a novel system that enables users to generate and iteratively refine music through an interactive, multi-round dialogue interface. The system uses a large language model to interpre…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: Source code and demo video are available at https://sites.google.com/view/loop-copilot

  47. arXiv:2310.11555  [pdf, other]

    cs.DB cs.AI

    Integrating 3D City Data through Knowledge Graphs

    Authors: Linfang Ding, Guohui Xiao, Albulen Pano, Mattia Fumagalli, Dongsheng Chen, Yu Feng, Diego Calvanese, Hongchao Fan, Liqiu Meng

    Abstract: CityGML is a widely adopted standard by the Open Geospatial Consortium (OGC) for representing and exchanging 3D city models. The representation of semantic and topological properties in CityGML makes it possible to query such 3D city data to perform analysis in various applications, e.g., security management and emergency response, energy consumption and estimation, and occupancy measurement. Howe…

    Submitted 17 October, 2023; originally announced October 2023.

  48. arXiv:2310.10815  [pdf, ps, other]

    cs.DS

    Streaming Algorithms for Graph k-Matching with Optimal or Near-Optimal Update Time

    Authors: Jianer Chen, Qin Huang, Iyad Kanj, Qian Li, Ge Xia

    Abstract: We present streaming algorithms for the graph $k$-matching problem in both the insert-only and dynamic models. Our algorithms, with space complexity matching the best upper bounds, have optimal or near-optimal update time, significantly improving on previous results. More specifically, for the insert-only streaming model, we present a one-pass algorithm with optimal space complexity $O(k^2)$ and o…

    Submitted 16 October, 2023; originally announced October 2023.

  49. arXiv:2310.06428   

    cs.AI cs.HC cs.SD eess.AS

    Proceedings of The first international workshop on eXplainable AI for the Arts (XAIxArts)

    Authors: Nick Bryan-Kinns, Corey Ford, Alan Chamberlain, Steven David Benford, Helen Kennedy, Zijin Li, Wu Qiong, Gus G. Xia, Jeba Rezwana

    Abstract: This first international workshop on explainable AI for the Arts (XAIxArts) brought together a community of researchers in HCI, Interaction Design, AI, explainable AI (XAI), and digital arts to explore the role of XAI for the Arts. Workshop held at the 15th ACM Conference on Creativity and Cognition (C&C 2023).

    Submitted 10 October, 2023; originally announced October 2023.

  50. arXiv:2310.06150  [pdf, other]

    cs.LG

    Latent Diffusion Model for DNA Sequence Generation

    Authors: Zehui Li, Yuhao Ni, Tim August B. Huygelen, Akashaditya Das, Guoxuan Xia, Guy-Bart Stan, Yiren Zhao

    Abstract: The harnessing of machine learning, especially deep generative models, has opened up promising avenues in the field of synthetic DNA sequence generation. Whilst Generative Adversarial Networks (GANs) have gained traction for this application, they often face issues such as limited sample diversity and mode collapse. On the other hand, Diffusion Models are a promising new class of generative models…

    Submitted 24 December, 2023; v1 submitted 9 October, 2023; originally announced October 2023.

    Comments: 2023 Conference on Neural Information Processing Systems (NeurIPS 2023) AI for Science Workshop