Showing 1–50 of 659 results for author: Deng, Y

  1. arXiv:2407.02252  [pdf, other]

    cs.CV

    GlyphDraw2: Automatic Generation of Complex Glyph Posters with Diffusion Models and Large Language Models

    Authors: Jian Ma, Yonglin Deng, Chen Chen, Haonan Lu, Zhenyu Yang

    Abstract: Posters play a crucial role in marketing and advertising, contributing significantly to industrial design by enhancing visual communication and brand visibility. With recent advances in controllable text-to-image diffusion models, more concise research is now focusing on rendering text within synthetic images. Despite improvements in text rendering accuracy, the field of end-to-end poster generati…

    Submitted 2 July, 2024; originally announced July 2024.

  2. arXiv:2407.01489  [pdf, other]

    cs.SE cs.AI cs.CL cs.LG

    Agentless: Demystifying LLM-based Software Engineering Agents

    Authors: Chunqiu Steven Xia, Yinlin Deng, Soren Dunn, Lingming Zhang

    Abstract: Recent advancements in large language models (LLMs) have significantly advanced the automation of software development tasks, including code synthesis, program repair, and test generation. More recently, researchers and industry practitioners have developed various autonomous LLM agents to perform end-to-end software development tasks. These agents are equipped with the ability to use tools, run c…

    Submitted 1 July, 2024; originally announced July 2024.

  3. arXiv:2407.01231  [pdf, other]

    cs.CL cs.AI

    MIRAI: Evaluating LLM Agents for Event Forecasting

    Authors: Chenchen Ye, Ziniu Hu, Yihe Deng, Zijie Huang, Mingyu Derek Ma, Yanqiao Zhu, Wei Wang

    Abstract: Recent advancements in Large Language Models (LLMs) have empowered LLM agents to autonomously collect world information, over which to conduct reasoning to solve complex problems. Given this capability, increasing interests have been put into employing LLM agents for predicting international events, which can influence decision-making and shape policy development on an international scale. Despite…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 66 pages, 8 figures, 6 tables; Website: https://mirai-llm.github.io/

  4. arXiv:2406.14844  [pdf, other]

    cs.LG cs.AI

    DN-CL: Deep Symbolic Regression against Noise via Contrastive Learning

    Authors: Jingyi Liu, Yanjie Li, Lina Yu, Min Wu, Weijun Li, Wenqiang Li, Meilan Hao, Yusong Deng, Shu Wei

    Abstract: Noise ubiquitously exists in signals due to numerous factors including physical, electronic, and environmental effects. Traditional methods of symbolic regression, such as genetic programming or deep learning models, aim to find the most fitting expressions for these signals. However, these methods often overlook the noise present in real-world data, leading to reduced fitting accuracy. To tackle…

    Submitted 20 June, 2024; originally announced June 2024.

  5. arXiv:2406.14283  [pdf, other]

    cs.AI

    Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning

    Authors: Chaojie Wang, Yanchen Deng, Zhiyi Lv, Zeng Liang, Jujie He, Shuicheng Yan, An Bo

    Abstract: Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks. However, the auto-regressive generation process makes LLMs prone to produce errors, hallucinations and inconsistent statements when performing multi-step reasoning. In this paper, by casting multi-step reasoning of LLMs as a heuristic search problem, we aim to alleviate the pathology by introducing…

    Submitted 27 June, 2024; v1 submitted 20 June, 2024; originally announced June 2024.

  6. arXiv:2406.13963  [pdf, ps, other]

    cs.CV

    SSAD: Self-supervised Auxiliary Detection Framework for Panoramic X-ray based Dental Disease Diagnosis

    Authors: Zijian Cai, Xinquan Yang, Xuguang Li, Xiaoling Luo, Xuechen Li, Linlin Shen, He Meng, Yongqiang Deng

    Abstract: Panoramic X-ray is a simple and effective tool for diagnosing dental diseases in clinical practice. When deep learning models are developed to assist dentist in interpreting panoramic X-rays, most of their performance suffers from the limited annotated data, which requires dentist's expertise and a lot of time cost. Although self-supervised learning (SSL) has been proposed to address this challeng…

    Submitted 19 June, 2024; originally announced June 2024.

  7. arXiv:2406.12639  [pdf, other]

    cs.CL cs.AI

    Ask-before-Plan: Proactive Language Agents for Real-World Planning

    Authors: Xuan Zhang, Yang Deng, Zifeng Ren, See-Kiong Ng, Tat-Seng Chua

    Abstract: The evolution of large language models (LLMs) has enhanced the planning capabilities of language agents in diverse real-world scenarios. Despite these advancements, the potential of LLM-powered agents to comprehend ambiguous user instructions for reasoning and decision-making is still under exploration. In this work, we introduce a new task, Proactive Agent Planning, which requires language agents…

    Submitted 18 June, 2024; originally announced June 2024.

  8. arXiv:2406.12355  [pdf, other]

    cs.CV

    LiCAF: LiDAR-Camera Asymmetric Fusion for Gait Recognition

    Authors: Yunze Deng, Haijun Xiong, Bin Feng

    Abstract: Gait recognition is a biometric technology that identifies individuals by using walking patterns. Due to the significant achievements of multimodal fusion in gait recognition, we consider employing LiDAR-camera fusion to obtain robust gait representations. However, existing methods often overlook intrinsic characteristics of modalities, and lack fine-grained fusion and temporal modeling. In this p…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: Accepted by ICIP2024

  9. arXiv:2406.11364  [pdf, other]

    cs.SD eess.AS

    AnoPatch: Towards Better Consistency in Machine Anomalous Sound Detection

    Authors: Anbai Jiang, Bing Han, Zhiqiang Lv, Yufeng Deng, Wei-Qiang Zhang, Xie Chen, Yanmin Qian, Jia Liu, Pingyi Fan

    Abstract: Large pre-trained models have demonstrated dominant performances in multiple areas, where the consistency between pre-training and fine-tuning is the key to success. However, few works reported satisfactory results of pre-trained models for the machine anomalous sound detection (ASD) task. This may be caused by the inconsistency of the pre-trained model and the inductive bias of machine audio, res…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted by INTERSPEECH 2024

  10. arXiv:2406.11066  [pdf, other]

    cs.CV

    Parameter Blending for Multi-Camera Harmonization for Automotive Surround View Systems

    Authors: Yuzhuo Ren, Yining Deng, David Pajak, Robin Jenkin, Niranjan Avadhanam, Varsha Hedau

    Abstract: In a surround view system, the image color and tone captured by multiple cameras can be different due to cameras applying auto white balance (AWB), global tone mapping (GTM) individually for each camera. The color and brightness along stitched seam location may look discontinuous among multiple cameras which impacts overall stitched image visual quality. To improve the color transition between adj…

    Submitted 16 June, 2024; originally announced June 2024.

  11. arXiv:2406.09410  [pdf, other]

    cs.CV cs.AI

    STAR: A First-Ever Dataset and A Large-Scale Benchmark for Scene Graph Generation in Large-Size Satellite Imagery

    Authors: Yansheng Li, Linlin Wang, Tingzhu Wang, Xue Yang, Junwei Luo, Qi Wang, Youming Deng, Wenbin Wang, Xian Sun, Haifeng Li, Bo Dang, Yongjun Zhang, Yi Yu, Junchi Yan

    Abstract: Scene graph generation (SGG) in satellite imagery (SAI) benefits promoting understanding of geospatial scenarios from perception to cognition. In SAI, objects exhibit great variations in scales and aspect ratios, and there exist rich relationships between objects (even between spatially disjoint objects), which makes it attractive to holistically conduct SGG in large-size very-high-resolution (VHR…

    Submitted 3 July, 2024; v1 submitted 13 June, 2024; originally announced June 2024.

    Comments: 18 pages, 11 figures

  12. arXiv:2406.08464  [pdf, other]

    cs.CL cs.AI

    Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

    Authors: Zhangchen Xu, Fengqing Jiang, Luyao Niu, Yuntian Deng, Radha Poovendran, Yejin Choi, Bill Yuchen Lin

    Abstract: High-quality instruction data is critical for aligning large language models (LLMs). Although some models, such as Llama-3-Instruct, have open weights, their alignment data remain private, which hinders the democratization of AI. High human labor costs and a limited, predefined scope for prompting prevent existing open-source data creation methods from scaling effectively, potentially limiting the…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: Link: https://magpie-align.github.io/

  13. arXiv:2406.08184  [pdf, other]

    cs.AI cs.HC

    MobileAgentBench: An Efficient and User-Friendly Benchmark for Mobile LLM Agents

    Authors: Luyuan Wang, Yongyu Deng, Yiwei Zha, Guodong Mao, Qinmin Wang, Tianchen Min, Wei Chen, Shoufa Chen

    Abstract: Large language model (LLM)-based mobile agents are increasingly popular due to their capability to interact directly with mobile phone Graphic User Interfaces (GUIs) and their potential to autonomously manage daily tasks. Despite their promising prospects in both academic and industrial sectors, little research has focused on benchmarking the performance of existing mobile agents, due to the inexh…

    Submitted 12 June, 2024; originally announced June 2024.

  14. arXiv:2406.08009  [pdf, other]

    cs.CV cs.AI cs.RO

    OpenObj: Open-Vocabulary Object-Level Neural Radiance Fields with Fine-Grained Understanding

    Authors: Yinan Deng, Jiahui Wang, Jingyu Zhao, Jianyu Dou, Yi Yang, Yufeng Yue

    Abstract: In recent years, there has been a surge of interest in open-vocabulary 3D scene reconstruction facilitated by visual language models (VLMs), which showcase remarkable capabilities in open-set retrieval. However, existing methods face some limitations: they either focus on learning point-wise features, resulting in blurry semantic understanding, or solely tackle object-level reconstruction, thereby…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: 8 pages, 7 figures. Project URL: https://openobj.github.io/

  15. arXiv:2406.06565  [pdf, other]

    cs.CL cs.AI cs.LG

    MixEval: Deriving Wisdom of the Crowd from LLM Benchmark Mixtures

    Authors: Jinjie Ni, Fuzhao Xue, Xiang Yue, Yuntian Deng, Mahir Shah, Kabir Jain, Graham Neubig, Yang You

    Abstract: Evaluating large language models (LLMs) is challenging. Traditional ground-truth-based benchmarks fail to capture the comprehensiveness and nuance of real-world queries, while LLM-as-judge benchmarks suffer from grading biases and limited query quantity. Both of them may also become contaminated over time. User-facing evaluation, such as Chatbot Arena, provides reliable signals but is costly and s…

    Submitted 3 June, 2024; originally announced June 2024.

  16. arXiv:2406.05925  [pdf, other]

    cs.CL cs.AI

    Hello Again! LLM-powered Personalized Agent for Long-term Dialogue

    Authors: Hao Li, Chenghao Yang, An Zhang, Yang Deng, Xiang Wang, Tat-Seng Chua

    Abstract: Open-domain dialogue systems have seen remarkable advancements with the development of large language models (LLMs). Nonetheless, most existing dialogue systems predominantly focus on brief single-session interactions, neglecting the real-world demands for long-term companionship and personalized interactions with chatbots. Crucial to addressing this real-world need are event summary and persona m…

    Submitted 9 June, 2024; originally announced June 2024.

    Comments: 17 pages, 4 figures

  17. arXiv:2406.05410  [pdf, other]

    cs.AI cs.CL

    MLLM-SR: Conversational Symbolic Regression base Multi-Modal Large Language Models

    Authors: Yanjie Li, Weijun Li, Lina Yu, Min Wu, Jingyi Liu, Wenqiang Li, Shu Wei, Yusong Deng

    Abstract: Formulas are the language of communication between humans and nature. It is an important research topic of artificial intelligence to find expressions from observed data to reflect the relationship between each variable in the data, which is called a symbolic regression problem. The existing symbolic regression methods directly generate expressions according to the given observation data, and we c…

    Submitted 8 June, 2024; originally announced June 2024.

    Comments: 13 pages

  18. arXiv:2406.04770  [pdf, other]

    cs.CL cs.AI

    WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

    Authors: Bill Yuchen Lin, Yuntian Deng, Khyathi Chandu, Faeze Brahman, Abhilasha Ravichander, Valentina Pyatkin, Nouha Dziri, Ronan Le Bras, Yejin Choi

    Abstract: We introduce WildBench, an automated evaluation framework designed to benchmark large language models (LLMs) using challenging, real-world user queries. WildBench consists of 1,024 tasks carefully selected from over one million human-chatbot conversation logs. For automated evaluation with WildBench, we have developed two metrics, WB-Reward and WB-Score, which are computable using advanced LLMs su…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Link: https://hf.co/spaces/allenai/WildBench

  19. arXiv:2406.04603  [pdf, ps, other]

    cs.CV

    Simplify Implant Depth Prediction as Video Grounding: A Texture Perceive Implant Depth Prediction Network

    Authors: Xinquan Yang, Xuguang Li, Xiaoling Luo, Leilei Zeng, Yudi Zhang, Linlin Shen, Yongqiang Deng

    Abstract: Surgical guide plate is an important tool for the dental implant surgery. However, the design process heavily relies on the dentist to manually simulate the implant angle and depth. When deep neural networks have been applied to assist the dentist quickly locates the implant position, most of them are not able to determine the implant depth. Inspired by the video grounding task which localizes the…

    Submitted 6 June, 2024; originally announced June 2024.

    Journal ref: MICCAI'2024

  20. arXiv:2406.04277  [pdf, other]

    cs.CV

    VideoTetris: Towards Compositional Text-to-Video Generation

    Authors: Ye Tian, Ling Yang, Haotian Yang, Yuan Gao, Yufan Deng, Jingmin Chen, Xintao Wang, Zhaochen Yu, Xin Tao, Pengfei Wan, Di Zhang, Bin Cui

    Abstract: Diffusion models have demonstrated great success in text-to-video (T2V) generation. However, existing methods may face challenges when handling complex (long) video generation scenarios that involve multiple objects or dynamic changes in object numbers. To address these limitations, we propose VideoTetris, a novel framework that enables compositional T2V generation. Specifically, we propose spatio…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Code: https://github.com/YangLing0818/VideoTetris

  21. arXiv:2406.03714  [pdf, other]

    cs.SD eess.AS

    Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining

    Authors: Jinlong Xue, Yayue Deng, Yingming Gao, Ya Li

    Abstract: Recent prompt-based text-to-speech (TTS) models can clone an unseen speaker using only a short speech prompt. They leverage a strong in-context ability to mimic the speech prompts, including speaker style, prosody, and emotion. Therefore, the selection of a speech prompt greatly influences the generated speech, akin to the importance of a prompt in large language models (LLMs). However, current pr…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  22. arXiv:2406.03706  [pdf, other]

    cs.SD cs.CL eess.AS

    Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model

    Authors: Jinlong Xue, Yayue Deng, Yicheng Han, Yingming Gao, Ya Li

    Abstract: Recent advances in large language models (LLMs) and development of audio codecs greatly propel the zero-shot TTS. They can synthesize personalized speech with only a 3-second speech of an unseen speaker as acoustic prompt. However, they only support short speech prompts and cannot leverage longer context information, as required in audiobook and conversational TTS scenarios. In this paper, we intr…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by Interspeech 2024

  23. arXiv:2406.00800  [pdf, other]

    cs.LG cs.AI

    MagR: Weight Magnitude Reduction for Enhancing Post-Training Quantization

    Authors: Aozhong Zhang, Naigang Wang, Yanxia Deng, Xin Li, Zi Yang, Penghang Yin

    Abstract: In this paper, we present a simple optimization-based preprocessing technique called Weight Magnitude Reduction (MagR) to improve the performance of post-training quantization. For each linear layer, we adjust the pre-trained floating-point weights by solving an $\ell_\infty$-regularized optimization problem. This process greatly diminishes the maximum magnitude of the weights and smooths out outl…

    Submitted 2 June, 2024; originally announced June 2024.

  24. arXiv:2405.19716  [pdf, other]

    cs.CV cs.CL

    Enhancing Large Vision Language Models with Self-Training on Image Comprehension

    Authors: Yihe Deng, Pan Lu, Fan Yin, Ziniu Hu, Sheng Shen, James Zou, Kai-Wei Chang, Wei Wang

    Abstract: Large vision language models (LVLMs) integrate large language models (LLMs) with pre-trained vision encoders, thereby activating the perception capability of the model to understand image inputs for different queries and conduct subsequent reasoning. Improving this capability requires high-quality vision-language data, which is costly and labor-intensive to acquire. Self-training approaches have b…

    Submitted 30 May, 2024; originally announced May 2024.

    Comments: 19 pages, 14 figures, 6 tables

  25. arXiv:2405.17790  [pdf, other]

    cs.CV

    Instruct-ReID++: Towards Universal Purpose Instruction-Guided Person Re-identification

    Authors: Weizhen He, Yiheng Deng, Yunfeng Yan, Feng Zhu, Yizhou Wang, Lei Bai, Qingsong Xie, Donglian Qi, Wanli Ouyang, Shixiang Tang

    Abstract: Human intelligence can retrieve any person according to both visual and language descriptions. However, the current computer vision community studies specific person re-identification (ReID) tasks in different scenarios separately, which limits the applications in the real world. This paper strives to resolve this problem by proposing a novel instruct-ReID task that requires the model to retrieve…

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2306.07520

  26. arXiv:2405.17474  [pdf, other]

    cs.LG cs.AI

    Federated Offline Policy Optimization with Dual Regularization

    Authors: Sheng Yue, Zerui Qin, Xingyuan Hua, Yongheng Deng, Ju Ren

    Abstract: Federated Reinforcement Learning (FRL) has been deemed as a promising solution for intelligent decision-making in the era of Artificial Internet of Things. However, existing FRL approaches often entail repeated interactions with the environment during local updating, which can be prohibitively expensive or even infeasible in many real-world domains. To overcome this challenge, this paper proposes…

    Submitted 28 May, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

    Comments: IEEE International Conference on Computer Communications (INFOCOM)

  27. arXiv:2405.17457  [pdf, other]

    cs.CV cs.DC cs.LG

    Data-Free Federated Class Incremental Learning with Diffusion-Based Generative Memory

    Authors: Naibo Wang, Yuchen Deng, Wenjie Feng, Jianwei Yin, See-Kiong Ng

    Abstract: Federated Class Incremental Learning (FCIL) is a critical yet largely underexplored issue that deals with the dynamic incorporation of new classes within federated learning (FL). Existing methods often employ generative adversarial networks (GANs) to produce synthetic images to address privacy concerns in FL. However, GANs exhibit inherent instability and high sensitivity, compromising the effecti…

    Submitted 22 May, 2024; originally announced May 2024.

  28. arXiv:2405.14838  [pdf, other]

    cs.CL cs.AI cs.LG

    From Explicit CoT to Implicit CoT: Learning to Internalize CoT Step by Step

    Authors: Yuntian Deng, Yejin Choi, Stuart Shieber

    Abstract: When leveraging language models for reasoning tasks, generating explicit chain-of-thought (CoT) steps often proves essential for achieving high accuracy in final outputs. In this paper, we investigate if models can be taught to internalize these CoT steps. To this end, we propose a simple yet effective method for internalizing CoT steps: starting with a model trained for explicit CoT reasoning, we…

    Submitted 23 May, 2024; originally announced May 2024.

  29. arXiv:2405.14620  [pdf, other]

    cs.LG

    Closed-form Symbolic Solutions: A New Perspective on Solving Partial Differential Equations

    Authors: Shu Wei, Yanjie Li, Lina Yu, Min Wu, Weijun Li, Meilan Hao, Wenqiang Li, Jingyi Liu, Yusong Deng

    Abstract: Solving partial differential equations (PDEs) in Euclidean space with closed-form symbolic solutions has long been a dream for mathematicians. Inspired by deep learning, Physics-Informed Neural Networks (PINNs) have shown great promise in numerically solving PDEs. However, since PINNs essentially approximate solutions within the continuous function space, their numerical solutions fall short in bo…

    Submitted 23 May, 2024; originally announced May 2024.

  30. arXiv:2405.13820  [pdf, other]

    cs.CL

    Towards Comprehensive and Efficient Post Safety Alignment of Large Language Models via Safety Patching

    Authors: Weixiang Zhao, Yulin Hu, Zhuojun Li, Yang Deng, Yanyan Zhao, Bing Qin, Tat-Seng Chua

    Abstract: Safety alignment of large language models (LLMs) has been gaining increasing attention. However, current safety-aligned LLMs suffer from the fragile and imbalanced safety mechanisms, which can still be induced to generate unsafe responses, exhibit over-safety by rejecting safe user inputs, and fail to preserve general utility after safety alignment. To this end, we propose a novel post safety alig…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 24 pages, 8 figures and 12 tables

  31. arXiv:2405.12081  [pdf, other]

    cs.CL

    Selective Annotation via Data Allocation: These Data Should Be Triaged to Experts for Annotation Rather Than the Model

    Authors: Chen Huang, Yang Deng, Wenqiang Lei, Jiancheng Lv, Ido Dagan

    Abstract: To obtain high-quality annotations under limited budget, semi-automatic annotation methods are commonly used, where a portion of the data is annotated by experts and a model is then trained to complete the annotations for the remaining data. However, these methods mainly focus on selecting informative data for expert annotations to improve the model predictive ability (i.e., triage-to-human data),…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 18 pages, 4 figures

  32. arXiv:2405.12063  [pdf, other]

    cs.CL

    CLAMBER: A Benchmark of Identifying and Clarifying Ambiguous Information Needs in Large Language Models

    Authors: Tong Zhang, Peixin Qin, Yang Deng, Chen Huang, Wenqiang Lei, Junhong Liu, Dingnan Jin, Hongru Liang, Tat-Seng Chua

    Abstract: Large language models (LLMs) are increasingly used to meet user information needs, but their effectiveness in dealing with user queries that contain various types of ambiguity remains unknown, ultimately risking user trust and satisfaction. To this end, we introduce CLAMBER, a benchmark for evaluating LLMs using a well-organized taxonomy. Building upon the taxonomy, we construct ~12K high-quality…

    Submitted 1 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: Accepted to ACL 2024. Camera Ready. Our dataset is available at https://github.com/zt991211/CLAMBER

  33. arXiv:2405.12059  [pdf, other]

    cs.CL

    STYLE: Improving Domain Transferability of Asking Clarification Questions in Large Language Model Powered Conversational Agents

    Authors: Yue Chen, Chen Huang, Yang Deng, Wenqiang Lei, Dingnan Jin, Jia Liu, Tat-Seng Chua

    Abstract: Equipping a conversational search engine with strategies regarding when to ask clarification questions is becoming increasingly important across various domains. Attributing to the context understanding capability of LLMs and their access to domain-specific sources of knowledge, LLM-based clarification strategies feature rapid transfer to various domains in a post-hoc manner. However, they still s…

    Submitted 1 June, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

    Comments: Accepted to Findings of ACL 2024. Camera Ready

  34. arXiv:2405.10248  [pdf, other]

    cs.HC cs.IR

    Co-Matching: Towards Human-Machine Collaborative Legal Case Matching

    Authors: Chen Huang, Xinwei Yang, Yang Deng, Wenqiang Lei, JianCheng Lv, Tat-Seng Chua

    Abstract: Recent efforts have aimed to improve AI machines in legal case matching by integrating legal domain knowledge. However, successful legal case matching requires the tacit knowledge of legal practitioners, which is difficult to verbalize and encode into machines. This emphasizes the crucial role of involving legal practitioners in high-stakes legal case matching. To address this, we propose a collab…

    Submitted 16 May, 2024; originally announced May 2024.

    Comments: Draft V1: 23 pages, 7 figures

  35. Eulerian-Lagrangian Fluid Simulation on Particle Flow Maps

    Authors: Junwei Zhou, Duowen Chen, Molin Deng, Yitong Deng, Yuchen Sun, Sinan Wang, Shiying Xiong, Bo Zhu

    Abstract: We propose a novel Particle Flow Map (PFM) method to enable accurate long-range advection for incompressible fluid simulation. The foundation of our method is the observation that a particle trajectory generated in a forward simulation naturally embodies a perfect flow map. Centered on this concept, we have developed an Eulerian-Lagrangian framework comprising four essential components: Lagrangian…

    Submitted 15 May, 2024; originally announced May 2024.

  36. arXiv:2405.07459  [pdf, other]

    cs.CV

    DualFocus: A Unified Framework for Integrating Positive and Negative Descriptors in Text-based Person Retrieval

    Authors: Yuchuan Deng, Zhanpeng Hu, Jiakun Han, Chuang Deng, Qijun Zhao

    Abstract: Text-based person retrieval (TPR) aims to retrieve images of a person from an extensive array of candidates based on a given textual description. The core challenge lies in mapping visual and textual data into a unified latent space. While existing TPR methods concentrate on recognizing explicit and positive characteristics, they often neglect the critical influence of negative descriptors, result…

    Submitted 13 May, 2024; originally announced May 2024.

  37. arXiv:2405.06419  [pdf, other]

    cs.LG cs.AI cs.NE

    Time Evidence Fusion Network: Multi-source View in Long-Term Time Series Forecasting

    Authors: Tianxiang Zhan, Yuanpeng He, Zhen Li, Yong Deng

    Abstract: In real-world scenarios, time series forecasting often demands timeliness, making research on model backbones a perennially hot topic. To meet these performance demands, we propose a novel backbone from the perspective of information fusion. Introducing the Basic Probability Assignment (BPA) Module and the Time Evidence Fusion Network (TEFN), based on evidence theory, allows us to achieve superior…

    Submitted 10 May, 2024; originally announced May 2024.

  38. arXiv:2405.04788  [pdf, other]

    cs.CV

    DiffMatch: Visual-Language Guidance Makes Better Semi-supervised Change Detector

    Authors: Kaiyu Li, Xiangyong Cao, Yupeng Deng, Junmin Liu, Deyu Meng, Zhi Wang

    Abstract: Change Detection (CD) aims to identify pixels with semantic changes between images. However, annotating massive numbers of pixel-level images is labor-intensive and costly, especially for multi-temporal images, which require pixel-wise comparisons by human experts. Considering the excellent performance of visual language models (VLMs) for zero-shot, open-vocabulary, etc. with prompt-based reasonin…

    Submitted 22 June, 2024; v1 submitted 7 May, 2024; originally announced May 2024.

    Comments: 13 pages, 5 figures

  39. arXiv:2405.02653  [pdf, other]

    cs.AI

    Isopignistic Canonical Decomposition via Belief Evolution Network

    Authors: Qianli Zhou, Tianxiang Zhan, Yong Deng

    Abstract: Developing a general information processing model in uncertain environments is fundamental for the advancement of explainable artificial intelligence. Dempster-Shafer theory of evidence is a well-known and effective reasoning method for representing epistemic uncertainty, which is closely related to subjective probability theory and possibility theory. Although they can be transformed to each othe…

    Submitted 4 May, 2024; originally announced May 2024.

  40. arXiv:2405.01868  [pdf, other]

    cs.CL

    Incorporating External Knowledge and Goal Guidance for LLM-based Conversational Recommender Systems

    Authors: Chuang Li, Yang Deng, Hengchang Hu, Min-Yen Kan, Haizhou Li

    Abstract: This paper aims to efficiently enable large language models (LLMs) to use external knowledge and goal guidance in conversational recommender system (CRS) tasks. Advanced LLMs (e.g., ChatGPT) are limited in domain-specific CRS tasks for 1) generating grounded responses with recommendation-oriented knowledge, or 2) proactively leading the conversations through different dialogue goals. In this work,…

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: Main paper 8 pages; References and Appendix 9 pages; 7 figures and 14 tables

  41. arXiv:2405.01470  [pdf, other]

    cs.CL

    WildChat: 1M ChatGPT Interaction Logs in the Wild

    Authors: Wenting Zhao, Xiang Ren, Jack Hessel, Claire Cardie, Yejin Choi, Yuntian Deng

    Abstract: Chatbots such as GPT-4 and ChatGPT are now serving millions of users. Despite their widespread use, there remains a lack of public datasets showcasing how these tools are used by a population of users in practice. To bridge this gap, we offered free access to ChatGPT for online users in exchange for their affirmative, consensual opt-in to anonymously collect their chat transcripts and request head…

    Submitted 2 May, 2024; originally announced May 2024.

    Comments: Accepted by ICLR 2024

  42. arXiv:2405.00603  [pdf, other]

    cs.SD eess.AS

    Learning Expressive Disentangled Speech Representations with Soft Speech Units and Adversarial Style Augmentation

    Authors: Yimin Deng, Jianzong Wang, Xulong Zhang, Ning Cheng, Jing Xiao

    Abstract: Voice conversion is the task to transform voice characteristics of source speech while preserving content information. Nowadays, self-supervised representation learning models are increasingly utilized in content extraction. However, in these representations, a lot of hidden speaker information leads to timbre leakage while the prosodic information of hidden units lacks use. To address these issue…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: Accepted by the 2024 International Joint Conference on Neural Networks (IJCNN 2024)

  43. arXiv:2404.19368  [pdf, other]

    cs.SE

    Exploring Multi-Lingual Bias of Large Code Models in Code Generation

    Authors: Chaozheng Wang, Zongjie Li, Cuiyun Gao, Wenxuan Wang, Ting Peng, Hailiang Huang, Yuetang Deng, Shuai Wang, Michael R. Lyu

    Abstract: Code generation aims to synthesize code and fulfill functional requirements based on natural language (NL) specifications, which can greatly improve development efficiency. In the era of large language models (LLMs), large code models (LCMs) have been recently proposed to generate source code. LCMs can generate highly feasible solutions for programming problems described in natural language. Despi…

    Submitted 30 April, 2024; originally announced April 2024.

    Comments: 12 pages

  44. arXiv:2404.18156  [pdf, other]

    cs.CV

    Event-based Video Frame Interpolation with Edge Guided Motion Refinement

    Authors: Yuhan Liu, Yongjian Deng, Hao Chen, Bochen Xie, Youfu Li, Zhen Yang

    Abstract: Video frame interpolation, the process of synthesizing intermediate frames between sequential video frames, has made remarkable progress with the use of event cameras. These sensors, with microsecond-level temporal resolution, fill information gaps between frames by providing precise motion cues. However, contemporary Event-Based Video Frame Interpolation (E-VFI) techniques often neglect the fact…

    Submitted 28 April, 2024; originally announced April 2024.

  45. arXiv:2404.17900  [pdf, other]

    cs.CV

    Unsupervised Anomaly Detection via Masked Diffusion Posterior Sampling

    Authors: Di Wu, Shicai Fan, Xue Zhou, Li Yu, Yuzhong Deng, Jianxiao Zou, Baihong Lin

    Abstract: Reconstruction-based methods have been commonly used for unsupervised anomaly detection, in which a normal image is reconstructed and compared with the given test image to detect and locate anomalies. Recently, diffusion models have shown promising applications for anomaly detection due to their powerful generative ability. However, these models lack strict mathematical support for normal image re…

    Submitted 27 April, 2024; originally announced April 2024.

    Journal ref: International Joint Conference on Artificial Intelligence 2024

  46. arXiv:2404.16687  [pdf, other]

    cs.CV

    NTIRE 2024 Quality Assessment of AI-Generated Content Challenge

    Authors: Xiaohong Liu, Xiongkuo Min, Guangtao Zhai, Chunyi Li, Tengchuan Kou, Wei Sun, Haoning Wu, Yixuan Gao, Yuqin Cao, Zicheng Zhang, Xiele Wu, Radu Timofte, Fei Peng, Huiyuan Fu, Anlong Ming, Chuanming Wang, Huadong Ma, Shuai He, Zifei Dou, Shu Chen, Huacong Zhang, Haiyi Xie, Chengwei Wang, Baoying Chen, Jishen Zeng, et al. (89 additional authors not shown)

    Abstract: This paper reports on the NTIRE 2024 Quality Assessment of AI-Generated Content Challenge, which will be held in conjunction with the New Trends in Image Restoration and Enhancement Workshop (NTIRE) at CVPR 2024. This challenge is to address a major challenge in the field of image and video processing, namely, Image Quality Assessment (IQA) and Video Quality Assessment (VQA) for AI-Generated Conte…

    Submitted 7 May, 2024; v1 submitted 25 April, 2024; originally announced April 2024.

  47. arXiv:2404.15458  [pdf, other]

    physics.optics cs.LG

    Can Large Language Models Learn the Physics of Metamaterials? An Empirical Study with ChatGPT

    Authors: Darui Lu, Yang Deng, Jordan M. Malof, Willie J. Padilla

    Abstract: Large language models (LLMs) such as ChatGPT, Gemini, LlaMa, and Claude are trained on massive quantities of text parsed from the internet and have shown a remarkable ability to respond to complex prompts in a manner often indistinguishable from humans. We present a LLM fine-tuned on up to 40,000 data that can predict electromagnetic spectra over a range of frequencies given a text prompt that onl…

    Submitted 23 April, 2024; originally announced April 2024.

  48. arXiv:2404.15451  [pdf, other]

    cs.CV

    CFPFormer: Feature-pyramid like Transformer Decoder for Segmentation and Detection

    Authors: Hongyi Cai, Mohammad Mahdinur Rahman, Jingyu Wu, Yulun Deng

    Abstract: Feature pyramids have been widely adopted in convolutional neural networks (CNNs) and transformers for tasks like medical image segmentation and object detection. However, the currently existing models generally focus on the Encoder-side Transformer to extract features, from which decoder improvement can bring further potential with well-designed architecture. We propose CFPFormer, a novel decoder…

    Submitted 23 April, 2024; originally announced April 2024.

  49. arXiv:2404.14662  [pdf, other]

    cs.LG cs.CL cs.PL cs.SE

    NExT: Teaching Large Language Models to Reason about Code Execution

    Authors: Ansong Ni, Miltiadis Allamanis, Arman Cohan, Yinlin Deng, Kensen Shi, Charles Sutton, Pengcheng Yin

    Abstract: A fundamental skill among human developers is the ability to understand and reason about program execution. As an example, a programmer can mentally simulate code execution in natural language to debug and repair code (aka. rubber duck debugging). However, large language models (LLMs) of code are typically trained on the surface textual form of programs, thus may lack a semantic understanding of h…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: 35 pages

  50. arXiv:2404.12670  [pdf, other]

    cs.IR cs.CL cs.HC

    Towards Human-centered Proactive Conversational Agents

    Authors: Yang Deng, Lizi Liao, Zhonghua Zheng, Grace Hui Yang, Tat-Seng Chua

    Abstract: Recent research on proactive conversational agents (PCAs) mainly focuses on improving the system's capabilities in anticipating and planning action sequences to accomplish tasks and achieve goals before users articulate their requests. This perspectives paper highlights the importance of moving towards building human-centered PCAs that emphasize human needs and expectations, and that considers eth…

    Submitted 19 April, 2024; originally announced April 2024.

    Comments: Accepted by SIGIR 2024 (Perspectives Track)