Showing 1–34 of 34 results for author: Qi, D

  1. arXiv:2406.11434  [pdf, other]

    cs.DB

    DB-GPT-Hub: Towards Open Benchmarking Text-to-SQL Empowered by Large Language Models

    Authors: Fan Zhou, Siqiao Xue, Danrui Qi, Wenhui Shi, Wang Zhao, Ganglin Wei, Hongyang Zhang, Caigai Jiang, Gangwei Jiang, Zhixuan Chu, Faqiang Chen

    Abstract: Large language models (LLMs) have become the dominant paradigm for the challenging task of text-to-SQL. LLM-empowered text-to-SQL methods are typically categorized into prompting-based and tuning approaches. Compared to prompting-based methods, benchmarking fine-tuned LLMs for text-to-SQL is important yet under-explored, partially attributed to the prohibitively high computational cost. In this paper,…

    Submitted 17 June, 2024; originally announced June 2024.

  2. arXiv:2406.10839  [pdf, other]

    cs.CV cs.CL

    Reminding Multimodal Large Language Models of Object-aware Knowledge with Retrieved Tags

    Authors: Daiqing Qi, Handong Zhao, Zijun Wei, Sheng Li

    Abstract: Despite recent advances in the general visual instruction-following ability of Multimodal Large Language Models (MLLMs), they still struggle with critical problems when required to provide a precise and detailed response to a visual instruction: (1) failure to identify novel objects or entities, (2) mention of non-existent objects, and (3) neglect of objects' attribute details. Intuitive solution…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: 18 pages, 11 figures

  3. arXiv:2405.17790  [pdf, other]

    cs.CV

    Instruct-ReID++: Towards Universal Purpose Instruction-Guided Person Re-identification

    Authors: Weizhen He, Yiheng Deng, Yunfeng Yan, Feng Zhu, Yizhou Wang, Lei Bai, Qingsong Xie, Donglian Qi, Wanli Ouyang, Shixiang Tang

    Abstract: Human intelligence can retrieve any person according to both visual and language descriptions. However, the current computer vision community studies specific person re-identification (ReID) tasks in different scenarios separately, which limits the applications in the real world. This paper strives to resolve this problem by proposing a novel instruct-ReID task that requires the model to retrieve…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2306.07520

  4. arXiv:2404.10209  [pdf, other]

    cs.AI cs.LG

    Demonstration of DB-GPT: Next Generation Data Interaction System Empowered by Large Language Models

    Authors: Siqiao Xue, Danrui Qi, Caigao Jiang, Wenhui Shi, Fangyin Cheng, Keting Chen, Hongjun Yang, Zhiping Zhang, Jianshan He, Hongyang Zhang, Ganglin Wei, Wang Zhao, Fan Zhou, Hong Yi, Shaodong Liu, Hongjun Yang, Faqiang Chen

    Abstract: The recent breakthroughs in large language models (LLMs) are positioned to transition many areas of software. The technologies of interacting with data particularly have an important entanglement with LLMs as efficient and intuitive data interactions are paramount. In this paper, we present DB-GPT, a revolutionary and product-ready Python library that integrates LLMs into traditional data interact…

    Submitted 24 April, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

  5. arXiv:2404.02617  [pdf, other]

    cs.CV

    Neural Radiance Fields with Torch Units

    Authors: Bingnan Ni, Huanyu Wang, Dongfeng Bai, Minghe Weng, Dexin Qi, Weichao Qiu, Bingbing Liu

    Abstract: Neural Radiance Fields (NeRF) give rise to learning-based 3D reconstruction methods widely used in industrial applications. Although prevalent methods achieve considerable improvements in small-scale scenes, accomplishing reconstruction in complex and large-scale scenes is still challenging. First, the background in complex scenes shows a large variance among different views. Second, the current i…

    Submitted 3 April, 2024; originally announced April 2024.

  6. arXiv:2403.19369  [pdf, other]

    cs.RO

    RAIL: Robot Affordance Imagination with Large Language Models

    Authors: Ceng Zhang, Xin Meng, Dongchen Qi, Gregory S. Chirikjian

    Abstract: This paper introduces an automatic affordance reasoning paradigm tailored to minimal semantic inputs, addressing the critical challenges of classifying and manipulating unseen classes of objects in household settings. Inspired by human cognitive processes, our method integrates generative language models and physics-based simulators to foster analytical thinking and creative imagination of novel a…

    Submitted 7 June, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

  7. arXiv:2403.08291  [pdf, other]

    cs.LG cs.AI cs.MA

    CleanAgent: Automating Data Standardization with LLM-based Agents

    Authors: Danrui Qi, Jiannan Wang

    Abstract: Data standardization is a crucial part of the data science life cycle. While tools like Pandas offer robust functionalities, their complexity and the manual effort required for customizing code to diverse column types pose significant challenges. Although large language models (LLMs) like ChatGPT have shown promise in automating this process through natural language understanding and code generation,…

    Submitted 24 April, 2024; v1 submitted 13 March, 2024; originally announced March 2024.

  8. arXiv:2403.06367  [pdf, other]

    cs.LG cs.DB

    FeatAug: Automatic Feature Augmentation From One-to-Many Relationship Tables

    Authors: Danrui Qi, Weiling Zheng, Jiannan Wang

    Abstract: Feature augmentation from one-to-many relationship tables is a critical but challenging problem in ML model development. To augment good features, data scientists need to come up with SQL queries manually, which is time-consuming. Featuretools [1] is a widely used tool by the data science community to automatically augment the training data by extracting new features from relevant tables. It repre…

    Submitted 10 March, 2024; originally announced March 2024.

  9. arXiv:2401.02241  [pdf, other]

    cs.CV

    Slot-guided Volumetric Object Radiance Fields

    Authors: Di Qi, Tong Yang, Xiangyu Zhang

    Abstract: We present a novel framework for 3D object-centric representation learning. Our approach effectively decomposes complex scenes into individual objects from a single image in an unsupervised fashion. This method, called slot-guided Volumetric Object Radiance Fields (sVORF), composes volumetric object radiance fields with object slots as guidance to implement unsupervised 3D scene decomposition. S…

    Submitted 4 January, 2024; originally announced January 2024.

    Comments: NeurIPS 2023

  10. arXiv:2312.17449  [pdf, other]

    cs.DB

    DB-GPT: Empowering Database Interactions with Private Large Language Models

    Authors: Siqiao Xue, Caigao Jiang, Wenhui Shi, Fangyin Cheng, Keting Chen, Hongjun Yang, Zhiping Zhang, Jianshan He, Hongyang Zhang, Ganglin Wei, Wang Zhao, Fan Zhou, Danrui Qi, Hong Yi, Shaodong Liu, Faqiang Chen

    Abstract: The recent breakthroughs in large language models (LLMs) are positioned to transition many areas of software. Database technologies particularly have an important entanglement with LLMs as efficient and intuitive database interactions are paramount. In this paper, we present DB-GPT, a revolutionary and production-ready project that integrates LLMs with traditional database systems to enhance user…

    Submitted 3 January, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

  11. arXiv:2310.18698  [pdf, other]

    cs.CV cs.LG

    Triplet Attention Transformer for Spatiotemporal Predictive Learning

    Authors: Xuesong Nie, Xi Chen, Haoyuan Jin, Zhihang Zhu, Yunfeng Yan, Donglian Qi

    Abstract: Spatiotemporal predictive learning offers a self-supervised learning paradigm that enables models to learn both spatial and temporal patterns by predicting future sequences based on historical sequences. Mainstream methods are dominated by recurrent units, yet they are limited by their lack of parallelization and often underperform in real-world scenarios. To improve prediction quality while maint…

    Submitted 28 October, 2023; originally announced October 2023.

    Comments: Accepted to WACV 2024

  12. arXiv:2310.02540  [pdf, other]

    cs.LG cs.AI cs.DB cs.IR

    Auto-FP: An Experimental Study of Automated Feature Preprocessing for Tabular Data

    Authors: Danrui Qi, Jinglin Peng, Yongjun He, Jiannan Wang

    Abstract: Classical machine learning models, such as linear models and tree-based models, are widely used in industry. These models are sensitive to data distribution, so feature preprocessing, which transforms features from one distribution to another, is a crucial step to ensure good model quality. Manually constructing a feature preprocessing pipeline is challenging because data scientists need to make…

    Submitted 3 October, 2023; originally announced October 2023.

  13. arXiv:2308.12315  [pdf, other]

    cs.LG cs.AI

    Trustworthy Representation Learning Across Domains

    Authors: Ronghang Zhu, Dongliang Guo, Daiqing Qi, Zhixuan Chu, Xiang Yu, Sheng Li

    Abstract: As AI systems have achieved performance sufficient to be deployed widely in our daily lives and human society, people both enjoy the benefits brought by these technologies and suffer many social issues induced by these systems. To make AI systems good enough and trustworthy, much research has been done to build guidelines for trustworthy AI systems. Machine learning is one of the most impo…

    Submitted 29 August, 2023; v1 submitted 23 August, 2023; originally announced August 2023.

    Comments: 38 pages, 15 figures

    ACM Class: A.1

  14. arXiv:2306.07520  [pdf, other]

    cs.CV

    Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions

    Authors: Weizhen He, Yiheng Deng, Shixiang Tang, Qihao Chen, Qingsong Xie, Yizhou Wang, Lei Bai, Feng Zhu, Rui Zhao, Wanli Ouyang, Donglian Qi, Yunfeng Yan

    Abstract: Human intelligence can retrieve any person according to both visual and language descriptions. However, the current computer vision community studies specific person re-identification (ReID) tasks in different scenarios separately, which limits the applications in the real world. This paper strives to resolve this problem by proposing a new instruct-ReID task that requires the model to retrieve im…

    Submitted 31 December, 2023; v1 submitted 12 June, 2023; originally announced June 2023.

  15. arXiv:2303.02936  [pdf, other]

    cs.CV

    UniHCP: A Unified Model for Human-Centric Perceptions

    Authors: Yuanzheng Ci, Yizhou Wang, Meilin Chen, Shixiang Tang, Lei Bai, Feng Zhu, Rui Zhao, Fengwei Yu, Donglian Qi, Wanli Ouyang

    Abstract: Human-centric perceptions (e.g., pose estimation, human parsing, pedestrian detection, person re-identification, etc.) play a key role in industrial applications of visual models. While specific human-centric tasks have their own relevant semantic aspect to focus on, they also share the same underlying semantic structure of the human body. However, few works have attempted to exploit such homogene…

    Submitted 22 June, 2023; v1 submitted 6 March, 2023; originally announced March 2023.

    Comments: Accepted for publication at the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2023 (CVPR 2023)

  16. arXiv:2302.13001  [pdf, other]

    cs.LG cs.AI

    Better Generative Replay for Continual Federated Learning

    Authors: Daiqing Qi, Handong Zhao, Sheng Li

    Abstract: Federated learning is a technique that enables a centralized server to learn from distributed clients via communications without accessing the client local data. However, existing federated learning works mainly focus on a single task scenario with static data. In this paper, we introduce the problem of continual federated learning, where clients incrementally learn new tasks and history data cann…

    Submitted 25 February, 2023; originally announced February 2023.

  17. arXiv:2302.11461  [pdf, other]

    cs.CV

    Saliency Guided Contrastive Learning on Scene Images

    Authors: Meilin Chen, Yizhou Wang, Shixiang Tang, Feng Zhu, Haiyang Yang, Lei Bai, Rui Zhao, Donglian Qi, Wanli Ouyang

    Abstract: Self-supervised learning holds promise in leveraging large amounts of unlabeled data. However, its success heavily relies on highly curated datasets, e.g., ImageNet, which still need human cleaning. Directly learning representations from less-curated scene images is essential for pushing self-supervised learning to a higher level. Different from curated images which include simple and clear se…

    Submitted 23 February, 2023; v1 submitted 22 February, 2023; originally announced February 2023.

    Comments: 12 pages, 5 figures. arXiv admin note: text overlap with arXiv:2106.11952 by other authors

  18. arXiv:2206.06293  [pdf, other]

    cs.CV cs.AI

    Learning Domain Adaptive Object Detection with Probabilistic Teacher

    Authors: Meilin Chen, Weijie Chen, Shicai Yang, Jie Song, Xinchao Wang, Lei Zhang, Yunfeng Yan, Donglian Qi, Yueting Zhuang, Di Xie, Shiliang Pu

    Abstract: Self-training for unsupervised domain adaptive object detection is a challenging task whose performance depends heavily on the quality of pseudo boxes. Despite the promising results, prior works have largely overlooked the uncertainty of pseudo boxes during self-training. In this paper, we present a simple yet effective framework, termed Probabilistic Teacher (PT), which aims to capture…

    Submitted 13 June, 2022; originally announced June 2022.

    Comments: To appear in ICML 2022. Code is coming soon: https://github.com/hikvision-research/ProbabilisticTeacher

    Journal ref: International Conference on Machine Learning (ICML), 2022

  19. arXiv:2204.02574  [pdf, other]

    cs.CV

    FocalClick: Towards Practical Interactive Image Segmentation

    Authors: Xi Chen, Zhiyan Zhao, Yilei Zhang, Manni Duan, Donglian Qi, Hengshuang Zhao

    Abstract: Interactive segmentation allows users to extract target masks by making positive/negative clicks. Although explored by many previous works, there is still a gap between academic approaches and industrial needs: first, existing models are not efficient enough to work on low-power devices; second, they perform poorly when used to refine preexisting masks as they cannot avoid destroying the correc…

    Submitted 17 April, 2022; v1 submitted 6 April, 2022; originally announced April 2022.

    Comments: CVPR2022

  20. arXiv:2203.12301  [pdf]

    cs.CV

    Lane detection with Position Embedding

    Authors: Jun Xie, Jiacheng Han, Dezhen Qi, Feng Chen, Kaer Huang, Jianwei Shuai

    Abstract: Recently, lane detection has made great progress in autonomous driving. RESA (REcurrent Feature-Shift Aggregator) is based on image segmentation. It presents a novel module to enrich lane features after preliminary feature extraction with an ordinary CNN. In the Tusimple dataset, the scenes are not too complicated and lanes have more prominent spatial features. On the basis of RESA, we introduce the met…

    Submitted 23 March, 2022; originally announced March 2022.

  21. arXiv:2112.13809  [pdf, other]

    cs.CV

    Improving Deep Image Matting via Local Smoothness Assumption

    Authors: Rui Wang, Jun Xie, Jiacheng Han, Dezhen Qi

    Abstract: Natural image matting is a fundamental and challenging computer vision task. Conventionally, the problem is formulated as an underconstrained problem. Since the problem is ill-posed, further assumptions on the data distribution are required to make the problem well-posed. For classical matting methods, a commonly adopted assumption is the local smoothness assumption on foreground and background co…

    Submitted 5 April, 2022; v1 submitted 27 December, 2021; originally announced December 2021.

    Comments: 9 pages, accepted by IEEE ICME 2022

  22. arXiv:2112.00496  [pdf, other]

    cs.CV

    Revisiting the Transferability of Supervised Pretraining: an MLP Perspective

    Authors: Yizhou Wang, Shixiang Tang, Feng Zhu, Lei Bai, Rui Zhao, Donglian Qi, Wanli Ouyang

    Abstract: The pretrain-finetune paradigm is a classical pipeline in visual learning. Recent progress on unsupervised pretraining methods shows superior transfer performance to their supervised counterparts. This paper revisits this phenomenon and sheds new light on understanding the transferability gap between unsupervised and supervised pretraining from a multilayer perceptron (MLP) perspective. While prev…

    Submitted 28 March, 2022; v1 submitted 1 December, 2021; originally announced December 2021.

    Comments: Accepted by CVPR 2022. [camera ready with supplement]

  23. arXiv:2110.01521  [pdf, other]

    cs.CV cs.AI

    Balanced Masked and Standard Face Recognition

    Authors: Delong Qi, Kangli Hu, Weijun Tan, Qi Yao, Jingfeng Liu

    Abstract: We present the improved network architecture, data augmentation, and training strategies for the Webface track and Insightface/Glint360K track of the masked face recognition challenge of ICCV2021. One of the key goals is to have a balanced performance of masked and standard face recognition. In order to prevent overfitting in masked face recognition, we control the total number of masked…

    Submitted 4 October, 2021; originally announced October 2021.

    Journal ref: 2021 ICCV Workshops

  24. arXiv:2109.14811  [pdf, other]

    cs.LG math.OC

    Surveillance Evasion Through Bayesian Reinforcement Learning

    Authors: Dongping Qi, David Bindel, Alexander Vladimirsky

    Abstract: We consider a task of surveillance-evading path-planning in a continuous setting. An Evader strives to escape from a 2D domain while minimizing the risk of detection (and immediate capture). The probability of detection is path-dependent and determined by the spatially inhomogeneous surveillance intensity, which is fixed but a priori unknown and gradually learned in the multi-episodic setting. We…

    Submitted 23 February, 2023; v1 submitted 29 September, 2021; originally announced September 2021.

    Comments: 6 pages, 3 figures; accepted for presentation and publication at AISTATS 2023

    MSC Class: 93E35; 49L20; 68W27; 68T37; 60G15; 62N02

  25. arXiv:2105.12931  [pdf, other]

    cs.CV

    YOLO5Face: Why Reinventing a Face Detector

    Authors: Delong Qi, Weijun Tan, Qi Yao, Jingfeng Liu

    Abstract: Tremendous progress has been made on face detection in recent years using convolutional neural networks. While many face detectors use designs designated for detecting faces, we treat face detection as a generic object detection task. We implement a face detector based on the YOLOv5 object detector and call it YOLO5Face. We make a few key modifications to YOLOv5 and optimize it for face detect…

    Submitted 27 January, 2022; v1 submitted 26 May, 2021; originally announced May 2021.

  26. arXiv:2105.01058  [pdf, other]

    cs.CV

    A Dataset and System for Real-Time Gun Detection in Surveillance Video Using Deep Learning

    Authors: Delong Qi, Weijun Tan, Zhifu Liu, Qi Yao, Jingfeng Liu

    Abstract: Gun violence is a severe problem in the world, particularly in the United States. Deep learning methods have been studied to detect guns in surveillance video cameras or smart IP cameras and to send a real-time alert to security personnel. One problem for the development of gun detection algorithms is the lack of large public datasets. In this work, we first publish a dataset with 51K annotated gu…

    Submitted 16 August, 2021; v1 submitted 3 May, 2021; originally announced May 2021.

    Comments: IEEE SMC 2021 Oral

  27. arXiv:2103.00301  [pdf, other]

    cs.LG

    Spline parameterization of neural network controls for deep learning

    Authors: Stefanie Günther, Will Pazner, Dongping Qi

    Abstract: Based on the continuous interpretation of deep learning cast as an optimal control problem, this paper investigates the benefits of employing B-spline basis functions to parameterize neural network controls across the layers. Rather than equipping each layer of a discretized ODE-network with a set of trainable weights, we choose a fixed number of B-spline basis functions whose coefficients are the…

    Submitted 27 February, 2021; originally announced March 2021.

    Comments: 19 pages, 9 figures

  28. arXiv:2010.15881  [pdf, other]

    cs.CL cs.AI

    Less is More: Data-Efficient Complex Question Answering over Knowledge Bases

    Authors: Yuncheng Hua, Yuan-Fang Li, Guilin Qi, Wei Wu, Jingyao Zhang, Daiqing Qi

    Abstract: Question answering is an effective method for obtaining information from knowledge bases (KB). In this paper, we propose the Neural-Symbolic Complex Question Answering (NS-CQA) model, a data-efficient reinforcement learning framework for complex question answering by using only a modest number of training samples. Our framework consists of a neural generator and a symbolic executor that, respectiv…

    Submitted 29 October, 2020; originally announced October 2020.

    Comments: 18 pages, 4 figures, published in JWS

  29. arXiv:2003.00482  [pdf, other]

    cs.CV

    State-Aware Tracker for Real-Time Video Object Segmentation

    Authors: Xi Chen, Zuoxin Li, Ye Yuan, Gang Yu, Jianxin Shen, Donglian Qi

    Abstract: In this work, we address the task of semi-supervised video object segmentation (VOS) and explore how to make efficient use of video property to tackle the challenge of semi-supervision. We propose a novel pipeline called State-Aware Tracker (SAT), which can produce accurate segmentation results with real-time speed. For higher efficiency, SAT takes advantage of the inter-frame consistency and deals…

    Submitted 1 March, 2020; originally announced March 2020.

    Comments: Accepted by CVPR2020

  30. arXiv:2001.07966  [pdf, other]

    cs.CV

    ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data

    Authors: Di Qi, Lin Su, Jia Song, Edward Cui, Taroon Bharti, Arun Sacheti

    Abstract: In this paper, we introduce a new vision-language pre-trained model -- ImageBERT -- for image-text joint embedding. Our model is a Transformer-based model, which takes different modalities as input and models the relationship between them. The model is pre-trained on four tasks simultaneously: Masked Language Modeling (MLM), Masked Object Classification (MOC), Masked Region Feature Regression (MRF…

    Submitted 23 January, 2020; v1 submitted 22 January, 2020; originally announced January 2020.

  31. arXiv:1912.11333   

    cs.SD cs.LG eess.AS

    Audio-based automatic mating success prediction of giant pandas

    Authors: WeiRan Yan, MaoLin Tang, Qijun Zhao, Peng Chen, Dunwu Qi, Rong Hou, Zhihe Zhang

    Abstract: Giant pandas, stereotyped as silent animals, make significantly more vocal sounds during breeding season, suggesting that sounds are essential for coordinating their reproduction and expression of mating preference. Previous biological studies have also proven that giant panda sounds are correlated with mating results and reproduction. This paper makes the first attempt to devise an automatic meth…

    Submitted 3 June, 2020; v1 submitted 24 December, 2019; originally announced December 2019.

    Comments: The manuscript needs further revision

  32. arXiv:1901.03814  [pdf, other]

    cs.CV

    Boundary-Aware Network for Fast and High-Accuracy Portrait Segmentation

    Authors: Xi Chen, Donglian Qi, Jianxin Shen

    Abstract: Compared with other semantic segmentation tasks, portrait segmentation requires both higher precision and faster inference speed. However, this problem has not been well studied in previous works. In this paper, we propose a lightweight network architecture, called Boundary-Aware Network (BANet), which selectively extracts detail information in the boundary area to make high-quality segmentation output…

    Submitted 12 January, 2019; originally announced January 2019.

  33. arXiv:1701.02543  [pdf, other]

    cs.AI

    Predicting Citywide Crowd Flows Using Deep Spatio-Temporal Residual Networks

    Authors: Junbo Zhang, Yu Zheng, Dekang Qi, Ruiyuan Li, Xiuwen Yi, Tianrui Li

    Abstract: Forecasting the flow of crowds is of great importance to traffic management and public safety, and very challenging as it is affected by many complex factors, including spatial dependencies (nearby and distant), temporal dependencies (closeness, period, trend), and external conditions (e.g., weather and events). We propose a deep-learning-based approach, called ST-ResNet, to collectively forecast…

    Submitted 10 January, 2017; originally announced January 2017.

    Comments: 21 pages, 16 figures. arXiv admin note: substantial text overlap with arXiv:1610.00081

  34. arXiv:1610.00081  [pdf, other]

    cs.AI cs.LG

    Deep Spatio-Temporal Residual Networks for Citywide Crowd Flows Prediction

    Authors: Junbo Zhang, Yu Zheng, Dekang Qi

    Abstract: Forecasting the flow of crowds is of great importance to traffic management and public safety, yet a very challenging task affected by many complex factors, such as inter-region traffic, events and weather. In this paper, we propose a deep-learning-based approach, called ST-ResNet, to collectively forecast the in-flow and out-flow of crowds in each and every region through a city. We design an end…

    Submitted 10 January, 2017; v1 submitted 30 September, 2016; originally announced October 2016.

    Comments: AAAI 2017