Showing 1–50 of 156 results for author: Cheng, D

  1. arXiv:2406.14491  [pdf, other]

    cs.CL

    Instruction Pre-Training: Language Models are Supervised Multitask Learners

    Authors: Daixuan Cheng, Yuxian Gu, Shaohan Huang, Junyu Bi, Minlie Huang, Furu Wei

    Abstract: Unsupervised multitask pre-training has been the critical method behind the recent success of language models (LMs). However, supervised multitask learning still holds significant promise, as scaling it in the post-training stage trends towards better generalization. In this paper, we explore supervised multitask pre-training by proposing Instruction Pre-Training, a framework that scalably augment…

    Submitted 20 June, 2024; originally announced June 2024.

  2. arXiv:2406.08830  [pdf, other]

    cs.LG cs.AI

    Center-Sensitive Kernel Optimization for Efficient On-Device Incremental Learning

    Authors: Dingwen Zhang, Yan Li, De Cheng, Nannan Wang, Junwei Han

    Abstract: To facilitate the evolution of edge intelligence in ever-changing environments, we study on-device incremental learning constrained in limited computation resource in this paper. Current on-device training methods just focus on efficient training without considering the catastrophic forgetting, preventing the model getting stronger when continually exploring the world. To solve this problem, a dir…

    Submitted 13 June, 2024; originally announced June 2024.

  3. arXiv:2406.05658  [pdf, other]

    cs.CV cs.AI

    Visual Prompt Tuning in Null Space for Continual Learning

    Authors: Yue Lu, Shizhou Zhang, De Cheng, Yinghui Xing, Nannan Wang, Peng Wang, Yanning Zhang

    Abstract: Existing prompt-tuning methods have demonstrated impressive performances in continual learning (CL), by selecting and updating relevant prompts in the vision-transformer models. On the contrary, this paper aims to learn each task by tuning the prompts in the direction orthogonal to the subspace spanned by previous tasks' features, so as to ensure no interference on tasks that have been learned to…

    Submitted 10 June, 2024; v1 submitted 9 June, 2024; originally announced June 2024.

    Comments: 20 pages, 10 figures

  4. arXiv:2406.03751  [pdf, other]

    cs.LG

    Adaptive Multi-Scale Decomposition Framework for Time Series Forecasting

    Authors: Yifan Hu, Peiyuan Liu, Peng Zhu, Dawei Cheng, Tao Dai

    Abstract: Transformer-based and MLP-based methods have emerged as leading approaches in time series forecasting (TSF). While Transformer-based methods excel in capturing long-range dependencies, they suffer from high computational complexities and tend to overfit. Conversely, MLP-based methods offer computational efficiency and adeptness in modeling temporal dynamics, but they struggle with capturing comple…

    Submitted 6 June, 2024; originally announced June 2024.

  5. arXiv:2405.14622  [pdf, other]

    cs.LG cs.CL cs.CV

    Calibrated Self-Rewarding Vision Language Models

    Authors: Yiyang Zhou, Zhiyuan Fan, Dongjie Cheng, Sihan Yang, Zhaorun Chen, Chenhang Cui, Xiyao Wang, Yun Li, Linjun Zhang, Huaxiu Yao

    Abstract: Large Vision-Language Models (LVLMs) have made substantial progress by integrating pre-trained large language models (LLMs) and vision models through instruction tuning. Despite these advancements, LVLMs often exhibit the hallucination phenomenon, where generated text responses appear linguistically plausible but contradict the input image, indicating a misalignment between image and text pairs. T…

    Submitted 31 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: fix some typos and add acknowledgement section in V3

  6. arXiv:2404.11825  [pdf, other]

    cs.LG

    Hypergraph Self-supervised Learning with Sampling-efficient Signals

    Authors: Fan Li, Xiaoyang Wang, Dawei Cheng, Wenjie Zhang, Ying Zhang, Xuemin Lin

    Abstract: Self-supervised learning (SSL) provides a promising alternative for representation learning on hypergraphs without costly labels. However, existing hypergraph SSL models are mostly based on contrastive methods with the instance-level discrimination strategy, suffering from two significant limitations: (1) They select negative samples arbitrarily, which is unreliable in deciding similar and dissimi…

    Submitted 17 April, 2024; originally announced April 2024.

    Comments: 9 pages, 4 figures, 4 tables

  7. arXiv:2404.03259  [pdf, ps, other]

    cs.CL cs.AI

    Enhancing the Performance of Aspect-Based Sentiment Analysis Systems

    Authors: Chen Li, Huidong Tang, Peng Ju, Debo Cheng, Yasuhiko Morimoto

    Abstract: Aspect-based sentiment analysis aims to predict sentiment polarity with fine granularity. While Graph Convolutional Networks (GCNs) are widely utilized for sentimental feature extraction, their naive application for syntactic feature extraction can compromise information preservation. This study introduces an innovative edge-enhanced GCN, named SentiSys, to navigate the syntactic graph while prese…

    Submitted 19 April, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

  8. arXiv:2404.03254  [pdf, ps, other]

    cs.DC

    Mining Area Skyline Objects from Map-based Big Data using Apache Spark Framework

    Authors: Chen Li, Ye Zhu, Yang Cao, Jinli Zhang, Annisa Annisa, Debo Cheng, Yasuhiko Morimoto

    Abstract: The computation of the skyline provides a mechanism for utilizing multiple location-based criteria to identify optimal data points. However, the efficiency of these computations diminishes and becomes more challenging as the input data expands. This study presents a novel algorithm aimed at mitigating this challenge by harnessing the capabilities of Apache Spark, a distributed processing platform,…

    Submitted 4 April, 2024; originally announced April 2024.

  9. arXiv:2403.17458  [pdf, ps, other]

    cs.CR cs.LG

    Expectations Versus Reality: Evaluating Intrusion Detection Systems in Practice

    Authors: Jake Hesford, Daniel Cheng, Alan Wan, Larry Huynh, Seungho Kim, Hyoungshick Kim, Jin B. Hong

    Abstract: Our paper provides empirical comparisons between recent IDSs to provide an objective comparison between them to help users choose the most appropriate solution based on their requirements. Our results show that no one solution is the best, but is dependent on external variables such as the types of attacks, complexity, and network environment in the dataset. For example, BoT_IoT and Stratosphere I…

    Submitted 28 March, 2024; v1 submitted 26 March, 2024; originally announced March 2024.

    Comments: 10 pages

    MSC Class: 68M25; 68M20 ACM Class: C.4; D.m

  10. arXiv:2403.12865  [pdf, other]

    cs.RO

    PE-Planner: A Performance-Enhanced Quadrotor Motion Planner for Autonomous Flight in Complex and Dynamic Environments

    Authors: Jiaxin Qiu, Qingchen Liu, Jiahu Qin, Dewang Cheng, Yawei Tian, Qichao Ma

    Abstract: The role of a motion planner is pivotal in quadrotor applications, yet existing methods often struggle to adapt to complex environments, limiting their ability to achieve fast, safe, and robust flight. In this letter, we introduce a performance-enhanced quadrotor motion planner designed for autonomous flight in complex environments including dense obstacles, dynamic obstacles, and unknown disturba…

    Submitted 19 March, 2024; originally announced March 2024.

  11. arXiv:2403.10339  [pdf, other]

    cs.LG

    Generation is better than Modification: Combating High Class Homophily Variance in Graph Anomaly Detection

    Authors: Rui Zhang, Dawei Cheng, Xin Liu, Jie Yang, Yi Ouyang, Xian Wu, Yefeng Zheng

    Abstract: Graph-based anomaly detection is currently an important research topic in the field of graph neural networks (GNNs). We find that in graph anomaly detection, the homophily distribution differences between different classes are significantly greater than those in homophilic and heterophilic graphs. For the first time, we introduce a new metric called Class Homophily Variance, which quantitatively d…

    Submitted 15 March, 2024; originally announced March 2024.

  12. arXiv:2403.07292  [pdf, other]

    cs.CV cs.AI

    Continual All-in-One Adverse Weather Removal with Knowledge Replay on a Unified Network Structure

    Authors: De Cheng, Yanling Ji, Dong Gong, Yan Li, Nannan Wang, Junwei Han, Dingwen Zhang

    Abstract: In real-world applications, image degeneration caused by adverse weather is always complex and changes with different weather conditions from days and seasons. Systems in real-world environments constantly encounter adverse weather conditions that are not previously observed. Therefore, it practically requires adverse weather removal models to continually learn from incrementally collected data re…

    Submitted 11 March, 2024; originally announced March 2024.

  13. arXiv:2403.06107  [pdf, other]

    cs.CV

    Textureless Object Recognition: An Edge-based Approach

    Authors: Frincy Clement, Kirtan Shah, Dhara Pancholi, Gabriel Lugo Bustillo, Dr. Irene Cheng

    Abstract: Textureless object recognition has become a significant task in Computer Vision with the advent of Robotics and its applications in manufacturing sector. It has been challenging to obtain good accuracy in real time because of its lack of discriminative features and reflectance properties which makes the techniques for textured object recognition insufficient for textureless objects. A lot of work…

    Submitted 10 March, 2024; originally announced March 2024.

    Comments: arXiv admin note: text overlap with arXiv:1910.14255

  14. arXiv:2402.15759  [pdf]

    cs.CV cs.AI

    Increasing SAM Zero-Shot Performance on Multimodal Medical Images Using GPT-4 Generated Descriptive Prompts Without Human Annotation

    Authors: Zekun Jiang, Dongjie Cheng, Ziyuan Qin, Jun Gao, Qicheng Lao, Kang Li, Le Zhang

    Abstract: This study develops and evaluates a novel multimodal medical image zero-shot segmentation algorithm named Text-Visual-Prompt SAM (TV-SAM) without any manual annotations. TV-SAM incorporates and integrates large language model GPT-4, Vision Language Model GLIP, and Segment Anything Model (SAM), to autonomously generate descriptive text prompts and visual bounding box prompts from medical images, th…

    Submitted 24 February, 2024; originally announced February 2024.

    Comments: 12 pages, 4 figures, 4 tables

  15. arXiv:2402.09668  [pdf, other]

    cs.LG cs.AI cs.CL

    How to Train Data-Efficient LLMs

    Authors: Noveen Sachdeva, Benjamin Coleman, Wang-Cheng Kang, Jianmo Ni, Lichan Hong, Ed H. Chi, James Caverlee, Julian McAuley, Derek Zhiyuan Cheng

    Abstract: The training of large language models (LLMs) is expensive. In this paper, we study data-efficient approaches for pre-training LLMs, i.e., techniques that aim to optimize the Pareto frontier of model quality and training resource/data consumption. We seek to understand the tradeoffs associated with data selection routines based on (i) expensive-to-compute data-quality estimates, and (ii) maximizati…

    Submitted 14 February, 2024; originally announced February 2024.

    Comments: Under review. 44 pages, 30 figures

  16. arXiv:2402.06854  [pdf, other]

    cs.CV cs.GR cs.LG

    Gyroscope-Assisted Motion Deblurring Network

    Authors: Simin Luan, Cong Yang, Zeyd Boukhers, Xue Qin, Dongfeng Cheng, Wei Sui, Zhijun Li

    Abstract: Image research has shown substantial attention in deblurring networks in recent years. Yet, their practical usage in real-world deblurring, especially motion blur, remains limited due to the lack of pixel-aligned training triplets (background, blurred image, and blur heat map) and restricted information inherent in blurred images. This paper presents a simple yet efficient framework to synthetic a…

    Submitted 9 February, 2024; originally announced February 2024.

  17. arXiv:2402.04852  [pdf, other]

    cs.LG

    Multi-Patch Prediction: Adapting LLMs for Time Series Representation Learning

    Authors: Yuxuan Bian, Xuan Ju, Jiangtong Li, Zhijian Xu, Dawei Cheng, Qiang Xu

    Abstract: In this study, we present aLLM4TS, an innovative framework that adapts Large Language Models (LLMs) for time-series representation learning. Central to our approach is that we reconceive time-series forecasting as a self-supervised, multi-patch prediction task, which, compared to traditional contrastive learning or mask-and-reconstruction methods, captures temporal dynamics in patch representation…

    Submitted 9 March, 2024; v1 submitted 7 February, 2024; originally announced February 2024.

  18. arXiv:2402.04141  [pdf, other]

    cs.SE cs.AI

    Multi-line AI-assisted Code Authoring

    Authors: Omer Dunay, Daniel Cheng, Adam Tait, Parth Thakkar, Peter C Rigby, Andy Chiu, Imad Ahmad, Arun Ganesan, Chandra Maddila, Vijayaraghavan Murali, Ali Tayyebi, Nachiappan Nagappan

    Abstract: CodeCompose is an AI-assisted code authoring tool powered by large language models (LLMs) that provides inline suggestions to 10's of thousands of developers at Meta. In this paper, we present how we scaled the product from displaying single-line suggestions to multi-line suggestions. This evolution required us to overcome several unique challenges in improving the usability of these suggestions f…

    Submitted 6 February, 2024; originally announced February 2024.

  19. arXiv:2402.01242  [pdf, other]

    cs.LG

    Two Heads Are Better Than One: Boosting Graph Sparse Training via Semantic and Topological Awareness

    Authors: Guibin Zhang, Yanwei Yue, Kun Wang, Junfeng Fang, Yongduo Sui, Kai Wang, Yuxuan Liang, Dawei Cheng, Shirui Pan, Tianlong Chen

    Abstract: Graph Neural Networks (GNNs) excel in various graph learning tasks but face computational challenges when applied to large-scale graphs. A promising solution is to remove non-essential edges to reduce the computational overheads in GNN. Previous literature generally falls into two categories: topology-guided and semantic-guided. The former maintains certain graph topological properties yet often u…

    Submitted 2 February, 2024; originally announced February 2024.

  20. arXiv:2402.00672  [pdf, other]

    cs.CV cs.AI

    Exploring Homogeneous and Heterogeneous Consistent Label Associations for Unsupervised Visible-Infrared Person ReID

    Authors: Lingfeng He, De Cheng, Nannan Wang, Xinbo Gao

    Abstract: Unsupervised visible-infrared person re-identification (USL-VI-ReID) aims to retrieve pedestrian images of the same identity from different modalities without annotations. While prior work focuses on establishing cross-modality pseudo-label associations to bridge the modality-gap, they ignore maintaining the instance-level homogeneous and heterogeneous consistency in pseudo-label space, resulting…

    Submitted 4 February, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

  21. arXiv:2312.07175  [pdf, other]

    cs.LG cs.AI stat.ME

    Instrumental Variable Estimation for Causal Inference in Longitudinal Data with Time-Dependent Latent Confounders

    Authors: Debo Cheng, Ziqi Xu, Jiuyong Li, Lin Liu, Jixue Liu, Wentao Gao, Thuc Duy Le

    Abstract: Causal inference from longitudinal observational data is a challenging problem due to the difficulty in correctly identifying the time-dependent confounders, especially in the presence of latent time-dependent confounders. Instrumental variable (IV) is a powerful tool for addressing the latent confounders issue, but the traditional IV technique cannot deal with latent time-dependent confounders in…

    Submitted 12 December, 2023; originally announced December 2023.

    Comments: 13 pages, 7 figures and 3 tables

  22. arXiv:2312.06323  [pdf, other]

    cs.CV

    Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models

    Authors: Yubin Wang, Xinyang Jiang, De Cheng, Dongsheng Li, Cairong Zhao

    Abstract: Prompt learning has become a prevalent strategy for adapting vision-language foundation models to downstream tasks. As large language models (LLMs) have emerged, recent studies have explored the use of category-related descriptions as input to enhance prompt effectiveness. Nevertheless, conventional descriptions fall short of structured information that effectively represents the interconnections…

    Submitted 11 December, 2023; originally announced December 2023.

    Comments: AAAI 2024

  23. arXiv:2312.05404  [pdf, other]

    cs.LG cs.AI stat.ME

    Disentangled Latent Representation Learning for Tackling the Confounding M-Bias Problem in Causal Inference

    Authors: Debo Cheng, Yang Xie, Ziqi Xu, Jiuyong Li, Lin Liu, Jixue Liu, Yinghao Zhang, Zaiwen Feng

    Abstract: In causal inference, it is a fundamental task to estimate the causal effect from observational data. However, latent confounders pose major challenges in causal inference in observational data, for example, confounding bias and M-bias. Recent data-driven causal effect estimators tackle the confounding bias problem via balanced representation learning, but assume no M-bias in the system, thus they…

    Submitted 8 December, 2023; originally announced December 2023.

    Comments: 10 pages, 3 figures and 5 tables. Accepted by ICDM 2023

  24. arXiv:2312.02483  [pdf, other]

    cs.CV

    EtC: Temporal Boundary Expand then Clarify for Weakly Supervised Video Grounding with Multimodal Large Language Model

    Authors: Guozhang Li, Xinpeng Ding, De Cheng, Jie Li, Nannan Wang, Xinbo Gao

    Abstract: Early weakly supervised video grounding (WSVG) methods often struggle with incomplete boundary detection due to the absence of temporal boundary annotations. To bridge the gap between video-level and boundary-level annotation, explicit-supervision methods, i.e., generating pseudo-temporal boundaries for training, have achieved great success. However, data augmentations in these methods might disru…

    Submitted 6 March, 2024; v1 submitted 4 December, 2023; originally announced December 2023.

  25. arXiv:2311.08593  [pdf, other]

    cs.CL cs.IR

    ACID: Abstractive, Content-Based IDs for Document Retrieval with Language Models

    Authors: Haoxin Li, Phillip Keung, Daniel Cheng, Jungo Kasai, Noah A. Smith

    Abstract: Generative retrieval (Wang et al., 2022; Tay et al., 2022) is a new approach for end-to-end document retrieval that directly generates document identifiers given an input query. Techniques for designing effective, high-quality document IDs remain largely unexplored. We introduce ACID, in which each document's ID is composed of abstractive keyphrases generated by a large language model, rather than…

    Submitted 14 November, 2023; originally announced November 2023.

  26. arXiv:2311.08430  [pdf, other]

    cs.LG cs.AI cs.IR

    Rankitect: Ranking Architecture Search Battling World-class Engineers at Meta Scale

    Authors: Wei Wen, Kuang-Hung Liu, Igor Fedorov, Xin Zhang, Hang Yin, Weiwei Chu, Kaveh Hassani, Mengying Sun, Jiang Liu, Xu Wang, Lin Jiang, Yuxin Chen, Buyun Zhang, Xi Liu, Dehua Cheng, Zhengxing Chen, Guang Zhao, Fangqiu Han, Jiyan Yang, Yuchen Hao, Liang Xiong, Wen-Yen Chen

    Abstract: Neural Architecture Search (NAS) has demonstrated its efficacy in computer vision and potential for ranking systems. However, prior work focused on academic problems, which are evaluated at small scale under well-controlled fixed baselines. In industry system, such as ranking system in Meta, it is unclear whether NAS algorithms from the literature can outperform production baselines because of: (1…

    Submitted 13 November, 2023; originally announced November 2023.

    Comments: Wei Wen and Kuang-Hung Liu contribute equally

  27. arXiv:2311.06761  [pdf, other]

    cs.CL

    Learning Knowledge-Enhanced Contextual Language Representations for Domain Natural Language Understanding

    Authors: Ruyao Xu, Taolin Zhang, Chengyu Wang, Zhongjie Duan, Cen Chen, Minghui Qiu, Dawei Cheng, Xiaofeng He, Weining Qian

    Abstract: Knowledge-Enhanced Pre-trained Language Models (KEPLMs) improve the performance of various downstream NLP tasks by injecting knowledge facts from large-scale Knowledge Graphs (KGs). However, existing methods for pre-training KEPLMs with relational triples are difficult to be adapted to close domains due to the lack of sufficient domain graph semantics. In this paper, we propose a Knowledge-enhance…

    Submitted 12 November, 2023; originally announced November 2023.

    Comments: EMNLP 2023

  28. arXiv:2311.05812  [pdf, other]

    cs.CL

    CFBenchmark: Chinese Financial Assistant Benchmark for Large Language Model

    Authors: Yang Lei, Jiangtong Li, Dawei Cheng, Zhijun Ding, Changjun Jiang

    Abstract: Large language models (LLMs) have demonstrated great potential in the financial domain. Thus, it becomes important to assess the performance of LLMs in the financial tasks. In this work, we introduce CFBenchmark, to evaluate the performance of LLMs for Chinese financial assistant. The basic version of CFBenchmark is designed to evaluate the basic ability in Chinese financial text processing from t…

    Submitted 21 May, 2024; v1 submitted 9 November, 2023; originally announced November 2023.

    Comments: 12 pages, 4 figures

  29. arXiv:2310.09983  [pdf, other]

    cs.LG cs.AI cs.CL cs.IR

    Farzi Data: Autoregressive Data Distillation

    Authors: Noveen Sachdeva, Zexue He, Wang-Cheng Kang, Jianmo Ni, Derek Zhiyuan Cheng, Julian McAuley

    Abstract: We study data distillation for auto-regressive machine learning tasks, where the input and output have a strict left-to-right causal structure. More specifically, we propose Farzi, which summarizes an event sequence dataset into a small number of synthetic sequences -- Farzi Data -- which are optimized to maintain (if not improve) model performance compared to training on the full dataset. Under t…

    Submitted 15 October, 2023; originally announced October 2023.

    Comments: Under review. 23 pages, 9 figures

  30. arXiv:2310.09762  [pdf, other]

    cs.CL cs.AI

    Diversifying the Mixture-of-Experts Representation for Language Models with Orthogonal Optimizer

    Authors: Boan Liu, Liang Ding, Li Shen, Keqin Peng, Yu Cao, Dazhao Cheng, Dacheng Tao

    Abstract: The Mixture of Experts (MoE) has emerged as a highly successful technique in deep learning, based on the principle of divide-and-conquer to maximize model capacity without significant additional computational cost. Even in the era of large-scale language models (LLMs), MoE continues to play a crucial role, as some researchers have indicated that GPT-4 adopts the MoE structure to ensure diverse inf…

    Submitted 15 October, 2023; originally announced October 2023.

  31. arXiv:2310.02162  [pdf, other]

    cs.RO

    TreeScope: An Agricultural Robotics Dataset for LiDAR-Based Mapping of Trees in Forests and Orchards

    Authors: Derek Cheng, Fernando Cladera Ojeda, Ankit Prabhu, Xu Liu, Alan Zhu, Patrick Corey Green, Reza Ehsani, Pratik Chaudhari, Vijay Kumar

    Abstract: Data collection for forestry, timber, and agriculture currently relies on manual techniques which are labor-intensive and time-consuming. We seek to demonstrate that robotics offers improvements over these techniques and accelerate agricultural research, beginning with semantic segmentation and diameter estimation of trees in forests and orchards. We present TreeScope v1.0, the first robotics data…

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: Submitted to 2024 IEEE International Conference on Robotics and Automation (ICRA 2024) for review

  32. arXiv:2310.01937  [pdf, other]

    cs.LG

    Causal Inference with Conditional Front-Door Adjustment and Identifiable Variational Autoencoder

    Authors: Ziqi Xu, Debo Cheng, Jiuyong Li, Jixue Liu, Lin Liu, Kui Yu

    Abstract: An essential and challenging problem in causal inference is causal effect estimation from observational data. The problem becomes more difficult with the presence of unobserved confounding variables. The front-door adjustment is a practical approach for dealing with unobserved confounding variables. However, the restriction for the standard front-door adjustment is difficult to satisfy in practice…

    Submitted 3 October, 2023; originally announced October 2023.

  33. arXiv:2310.01865  [pdf, other]

    cs.LG cs.AI

    Conditional Instrumental Variable Regression with Representation Learning for Causal Inference

    Authors: Debo Cheng, Ziqi Xu, Jiuyong Li, Lin Liu, Jixue Liu, Thuc Duy Le

    Abstract: This paper studies the challenging problem of estimating causal effects from observational data, in the presence of unobserved confounders. The two-stage least square (TSLS) method and its variants with a standard instrumental variable (IV) are commonly used to eliminate confounding bias, including the bias caused by unobserved confounders, but they rely on the linearity assumption. Besides, the s…

    Submitted 3 October, 2023; originally announced October 2023.

    Comments: 17 pages, 3 figures and 6 tables

  34. arXiv:2309.11932  [pdf, other]

    cs.LG

    A Machine Learning-oriented Survey on Tiny Machine Learning

    Authors: Luigi Capogrosso, Federico Cunico, Dong Seon Cheng, Franco Fummi, Marco Cristani

    Abstract: The emergence of Tiny Machine Learning (TinyML) has positively revolutionized the field of Artificial Intelligence by promoting the joint design of resource-constrained IoT hardware devices and their learning-based software architectures. TinyML carries an essential role within the fourth and fifth industrial revolutions in helping societies, economies, and individuals employ effective AI-infused…

    Submitted 26 September, 2023; v1 submitted 21 September, 2023; originally announced September 2023.

    Comments: Article currently under review at IEEE Access

  35. arXiv:2309.10654  [pdf, other]

    cs.CL cs.AI cs.CE

    CFGPT: Chinese Financial Assistant with Large Language Model

    Authors: Jiangtong Li, Yuxuan Bian, Guoxuan Wang, Yang Lei, Dawei Cheng, Zhijun Ding, Changjun Jiang

    Abstract: Large language models (LLMs) have demonstrated great potential in natural language processing tasks within the financial domain. In this work, we present a Chinese Financial Generative Pre-trained Transformer framework, named CFGPT, which includes a dataset (CFData) for pre-training and supervised fine-tuning, a financial LLM (CFLLM) to adeptly manage financial texts, and a deployment framework (C…

    Submitted 22 September, 2023; v1 submitted 19 September, 2023; originally announced September 2023.

    Comments: 12 pages, 5 figures

  36. arXiv:2309.09530  [pdf, other]

    cs.CL

    Adapting Large Language Models via Reading Comprehension

    Authors: Daixuan Cheng, Shaohan Huang, Furu Wei

    Abstract: We explore how continued pre-training on domain-specific corpora influences large language models, revealing that training on the raw corpora endows the model with domain knowledge, but drastically hurts its prompting ability for question answering. Taken inspiration from human learning via reading comprehension--practice after reading improves the ability to answer questions based on the learned…

    Submitted 14 July, 2024; v1 submitted 18 September, 2023; originally announced September 2023.

    Comments: ICLR 2024 Conference

  37. arXiv:2309.04389  [pdf, other]

    cs.CL cs.CE

    CSPRD: A Financial Policy Retrieval Dataset for Chinese Stock Market

    Authors: Jinyuan Wang, Hai Zhao, Zhong Wang, Zeyang Zhu, Jinhao Xie, Yong Yu, Yongjian Fei, Yue Huang, Dawei Cheng

    Abstract: In recent years, great advances in pre-trained language models (PLMs) have sparked considerable research focus and achieved promising performance on the approach of dense passage retrieval, which aims at retrieving relative passages from massive corpus with given questions. However, most of existing datasets mainly benchmark the models with factoid queries of general commonsense, while specialised…

    Submitted 11 September, 2023; v1 submitted 8 September, 2023; originally announced September 2023.

  38. Ground-to-Aerial Person Search: Benchmark Dataset and Approach

    Authors: Shizhou Zhang, Qingchun Yang, De Cheng, Yinghui Xing, Guoqiang Liang, Peng Wang, Yanning Zhang

    Abstract: In this work, we construct a large-scale dataset for Ground-to-Aerial Person Search, named G2APS, which contains 31,770 images of 260,559 annotated bounding boxes for 2,644 identities appearing in both of the UAVs and ground surveillance cameras. To our knowledge, this is the first dataset for cross-platform intelligent surveillance applications, where the UAVs could work as a powerful complement…

    Submitted 24 August, 2023; originally announced August 2023.

    Comments: Accepted by ACM MM 2023

    ACM Class: I.5.4; I.4.8

  39. arXiv:2307.13949  [pdf, other]

    cs.CL cs.AI

    How Does Diffusion Influence Pretrained Language Models on Out-of-Distribution Data?

    Authors: Huazheng Wang, Daixuan Cheng, Haifeng Sun, Jingyu Wang, Qi Qi, Jianxin Liao, Jing Wang, Cong Liu

    Abstract: Transformer-based pretrained language models (PLMs) have achieved great success in modern NLP. An important advantage of PLMs is good out-of-distribution (OOD) robustness. Recently, diffusion models have attracted a lot of work to apply diffusion to PLMs. It remains under-explored how diffusion influences PLMs on OOD data. The core of diffusion models is a forward diffusion process which gradually…

    Submitted 26 July, 2023; originally announced July 2023.

    Comments: Accepted by ECAI 2023

  40. arXiv:2306.12453  [pdf, other]

    cs.LG cs.AI stat.ME

    Learning Conditional Instrumental Variable Representation for Causal Effect Estimation

    Authors: Debo Cheng, Ziqi Xu, Jiuyong Li, Lin Liu, Thuc Duy Le, Jixue Liu

    Abstract: One of the fundamental challenges in causal inference is to estimate the causal effect of a treatment on its outcome of interest from observational data. However, causal effect estimation often suffers from the impacts of confounding bias caused by unmeasured confounders that affect both the treatment and the outcome. The instrumental variable (IV) approach is a powerful way to eliminate the confo…

    Submitted 20 June, 2023; originally announced June 2023.

    Comments: Debo Cheng and Ziqi Xu contributed equally. 20 pages, 5 tables, and 3 figures. Accepted at ECML-PKDD 2023

  41. arXiv:2306.09588  [pdf, other]

    cs.LG

    Understanding the Role of Feedback in Online Learning with Switching Costs

    Authors: Duo Cheng, Xingyu Zhou, Bo Ji

    Abstract: In this paper, we study the role of feedback in online learning with switching costs. It has been shown that the minimax regret is $\widetilde{\Theta}(T^{2/3})$ under bandit feedback and improves to $\widetilde{\Theta}(\sqrt{T})$ under full-information feedback, where $T$ is the length of the time horizon. However, it remains largely unknown how the amount and type of feedback generally impact regret. To this e…

    Submitted 15 June, 2023; originally announced June 2023.

    Comments: Accepted to ICML 2023

  42. arXiv:2306.00041  [pdf, other]

    q-bio.QM cs.LG

    Causal Intervention for Measuring Confidence in Drug-Target Interaction Prediction

    Authors: Wenting Ye, Chen Li, Yang Xie, Wen Zhang, Hong-Yu Zhang, Bowen Wang, Debo Cheng, Zaiwen Feng

    Abstract: Identifying and discovering drug-target interactions(DTIs) are vital steps in drug discovery and development. They play a crucial role in assisting scientists in finding new drugs and accelerating the drug development process. Recently, knowledge graph and knowledge graph embedding (KGE) models have made rapid advancements and demonstrated impressive performance in drug discovery. However, such mo…

    Submitted 14 November, 2023; v1 submitted 31 May, 2023; originally announced June 2023.

  43. arXiv:2305.17386  [pdf, other]

    cs.IR cs.LG

    HyperFormer: Learning Expressive Sparse Feature Representations via Hypergraph Transformer

    Authors: Kaize Ding, Albert Jiongqian Liang, Bryan Perrozi, Ting Chen, Ruoxi Wang, Lichan Hong, Ed H. Chi, Huan Liu, Derek Zhiyuan Cheng

    Abstract: Learning expressive representations for high-dimensional yet sparse features has been a longstanding problem in information retrieval. Though recent deep learning methods can partially solve the problem, they often fail to handle the numerous sparse features, particularly those tail feature values with infrequent occurrences in the training data. Worse still, existing methods cannot explicitly lev…

    Submitted 27 May, 2023; originally announced May 2023.

    Comments: Accepted by SIGIR 2023

  44. arXiv:2305.12711  [pdf, other]

    cs.CV cs.AI

    Unsupervised Visible-Infrared Person ReID by Collaborative Learning with Neighbor-Guided Label Refinement

    Authors: De Cheng, Xiaojian Huang, Nannan Wang, Lingfeng He, Zhihui Li, Xinbo Gao

    Abstract: Unsupervised learning visible-infrared person re-identification (USL-VI-ReID) aims at learning modality-invariant features from unlabeled cross-modality dataset, which is crucial for practical applications in video surveillance systems. The key to essentially address the USL-VI-ReID task is to solve the cross-modality data association problem for further heterogeneous joint learning. To address th…

    Submitted 5 January, 2024; v1 submitted 22 May, 2023; originally announced May 2023.

  45. arXiv:2305.12673  [pdf, other]

    cs.CV cs.AI

    Efficient Bilateral Cross-Modality Cluster Matching for Unsupervised Visible-Infrared Person ReID

    Authors: De Cheng, Lingfeng He, Nannan Wang, Shizhou Zhang, Zhen Wang, Xinbo Gao

    Abstract: Unsupervised visible-infrared person re-identification (USL-VI-ReID) aims to match pedestrian images of the same identity from different modalities without annotations. Existing works mainly focus on alleviating the modality gap by aligning instance-level features of the unlabeled samples. However, the relationships between cross-modality clusters are not well explored. To this end, we propose a n…

    Submitted 25 May, 2024; v1 submitted 21 May, 2023; originally announced May 2023.

  46. arXiv:2305.12102  [pdf, other]

    cs.LG cs.IR

    Unified Embedding: Battle-Tested Feature Representations for Web-Scale ML Systems

    Authors: Benjamin Coleman, Wang-Cheng Kang, Matthew Fahrbach, Ruoxi Wang, Lichan Hong, Ed H. Chi, Derek Zhiyuan Cheng

    Abstract: Learning high-quality feature embeddings efficiently and effectively is critical for the performance of web-scale machine learning systems. A typical model ingests hundreds of features with vocabularies on the order of millions to billions of tokens. The standard approach is to represent each feature value as a d-dimensional embedding, introducing hundreds of billions of parameters for extremely h…

    Submitted 14 November, 2023; v1 submitted 20 May, 2023; originally announced May 2023.

    Comments: NeurIPS'23 Spotlight

    Journal ref: Proceedings of the 37th Annual Conference on Neural Information Processing Systems (NeurIPS 2023) 56234-56255

  47. arXiv:2305.12050  [pdf, other]

    cs.SE cs.AI

    AI-assisted Code Authoring at Scale: Fine-tuning, deploying, and mixed methods evaluation

    Authors: Vijayaraghavan Murali, Chandra Maddila, Imad Ahmad, Michael Bolin, Daniel Cheng, Negar Ghorbani, Renuka Fernandez, Nachiappan Nagappan, Peter C. Rigby

    Abstract: Generative LLMs have been shown to effectively power AI-based code authoring tools that can suggest entire statements or blocks of code during code authoring. In this paper we present CodeCompose, an AI-assisted code authoring tool developed and deployed at Meta internally. CodeCompose is based on the InCoder LLM that merges generative capabilities with bi-directionality. We have scaled up CodeCom…

    Submitted 16 February, 2024; v1 submitted 19 May, 2023; originally announced May 2023.

  48. arXiv:2305.08740  [pdf, other]

    q-fin.ST cs.LG q-fin.PM

    Temporal and Heterogeneous Graph Neural Network for Financial Time Series Prediction

    Authors: Sheng Xiang, Dawei Cheng, Chencheng Shang, Ying Zhang, Yuqi Liang

    Abstract: The price movement prediction of stock market has been a classical yet challenging problem, with the attention of both economists and computer scientists. In recent years, graph neural network has significantly improved the prediction performance by employing deep learning on company relations. However, existing relation graphs are usually constructed by handcraft human labeling or nature language…

    Submitted 9 May, 2023; originally announced May 2023.

    Comments: 10 pages, 6 figures, ACM CIKM'22, Code: https://github.com/CharlieSCC/alpha/tree/main/alpha/model/THGNN

  49. arXiv:2305.06474  [pdf, other]

    cs.IR cs.LG

    Do LLMs Understand User Preferences? Evaluating LLMs On User Rating Prediction

    Authors: Wang-Cheng Kang, Jianmo Ni, Nikhil Mehta, Maheswaran Sathiamoorthy, Lichan Hong, Ed Chi, Derek Zhiyuan Cheng

    Abstract: Large Language Models (LLMs) have demonstrated exceptional capabilities in generalizing to new tasks in a zero-shot or few-shot manner. However, the extent to which LLMs can comprehend user preferences based on their previous behavior remains an emerging and still unclear research question. Traditionally, Collaborative Filtering (CF) has been the most effective method for these tasks, predominantl…

    Submitted 10 May, 2023; originally announced May 2023.

  50. Ripple Knowledge Graph Convolutional Networks For Recommendation Systems

    Authors: Chen Li, Yang Cao, Ye Zhu, Debo Cheng, Chengyuan Li, Yasuhiko Morimoto

    Abstract: Using knowledge graphs to assist deep learning models in making recommendation decisions has recently been proven to effectively improve the model's interpretability and accuracy. This paper introduces an end-to-end deep learning model, named RKGCN, which dynamically analyses each user's preferences and makes a recommendation of suitable items. It combines knowledge graphs on both the item side an…

    Submitted 10 April, 2024; v1 submitted 1 May, 2023; originally announced May 2023.

    Journal ref: Machine Intelligence Research, 2024 (https://link.springer.com/article/10.1007/s11633-023-1440-x)