Showing 1–50 of 280 results for author: Shen, T

  1. arXiv:2406.18406  [pdf, other]

    cs.CL cs.AI

    IRCAN: Mitigating Knowledge Conflicts in LLM Generation via Identifying and Reweighting Context-Aware Neurons

    Authors: Dan Shi, Renren Jin, Tianhao Shen, Weilong Dong, Xinwei Wu, Deyi Xiong

    Abstract: It is widely acknowledged that large language models (LLMs) encode a vast reservoir of knowledge after being trained on mass data. Recent studies disclose knowledge conflicts in LLM generation, wherein outdated or incorrect parametric knowledge (i.e., encoded knowledge) contradicts new knowledge provided in the context. To mitigate such knowledge conflicts, we propose a novel framework, IRCAN (Ide…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: 19 pages, 13 figures, 5 tables

  2. arXiv:2406.16989  [pdf, other]

    cs.LG cs.AI

    Retrieval-Augmented Mixture of LoRA Experts for Uploadable Machine Learning

    Authors: Ziyu Zhao, Leilei Gan, Guoyin Wang, Yuwei Hu, Tao Shen, Hongxia Yang, Kun Kuang, Fei Wu

    Abstract: Low-Rank Adaptation (LoRA) offers an efficient way to fine-tune large language models (LLMs). Its modular and plug-and-play nature allows the integration of various domain-specific LoRAs, enhancing LLM capabilities. Open-source platforms like Huggingface and Modelscope have introduced a new computational paradigm, Uploadable Machine Learning (UML). In UML, contributors use decentralized data to tr…

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2402.09997

  3. arXiv:2406.14903  [pdf, other]

    cs.AI

    GIEBench: Towards Holistic Evaluation of Group Identity-based Empathy for Large Language Models

    Authors: Leyan Wang, Yonggang Jin, Tianhao Shen, Tianyu Zheng, Xinrun Du, Chenchen Zhang, Wenhao Huang, Jiaheng Liu, Shi Wang, Ge Zhang, Liuyu Xiang, Zhaofeng He

    Abstract: As large language models (LLMs) continue to develop and gain widespread application, the ability of LLMs to exhibit empathy towards diverse group identities and understand their perspectives is increasingly recognized as critical. Most existing benchmarks for empathy evaluation of LLMs focus primarily on universal human emotions, such as sadness and pain, often overlooking the context of individua…

    Submitted 24 June, 2024; v1 submitted 21 June, 2024; originally announced June 2024.

  4. arXiv:2406.12459  [pdf, other]

    cs.CV

    HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors

    Authors: Panwang Pan, Zhuo Su, Chenguo Lin, Zhen Fan, Yongjie Zhang, Zeming Li, Tingting Shen, Yadong Mu, Yebin Liu

    Abstract: Despite recent advancements in high-fidelity human reconstruction techniques, the requirements for densely captured images or time-consuming per-instance optimization significantly hinder their applications in broader scenarios. To tackle these issues, we present HumanSplat which predicts the 3D Gaussian Splatting properties of any human from a single input image in a generalizable manner. In part…

    Submitted 18 June, 2024; originally announced June 2024.

  5. arXiv:2406.10867  [pdf, other]

    cs.LG q-bio.BM

    Geometric-informed GFlowNets for Structure-Based Drug Design

    Authors: Grayson Lee, Tony Shen, Martin Ester

    Abstract: The rising cost of drug discovery and the current speed at which drugs are discovered underscore the need for more efficient structure-based drug design (SBDD) methods. We employ Generative Flow Networks (GFlowNets) to effectively explore the vast combinatorial space of drug-like molecules, which traditional virtual screening methods fail to cover. We introduce a novel modification to the G…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Accepted at MoML 2024 as Spotlight

  6. arXiv:2406.10224  [pdf, other]

    cs.CV

    EFM3D: A Benchmark for Measuring Progress Towards 3D Egocentric Foundation Models

    Authors: Julian Straub, Daniel DeTone, Tianwei Shen, Nan Yang, Chris Sweeney, Richard Newcombe

    Abstract: The advent of wearable computers enables a new source of context for AI that is embedded in egocentric sensor data. This new egocentric data comes equipped with fine-grained 3D location information and thus presents the opportunity for a novel class of spatial foundation models that are rooted in 3D space. To measure progress on what we term Egocentric Foundation Models (EFMs) we establish EFM3D,…

    Submitted 14 June, 2024; originally announced June 2024.

  7. arXiv:2406.07070  [pdf, other]

    cs.CL

    HalluDial: A Large-Scale Benchmark for Automatic Dialogue-Level Hallucination Evaluation

    Authors: Wen Luo, Tianshu Shen, Wei Li, Guangyue Peng, Richeng Xuan, Houfeng Wang, Xi Yang

    Abstract: Large Language Models (LLMs) have significantly advanced the field of Natural Language Processing (NLP), achieving remarkable performance across diverse tasks and enabling widespread real-world applications. However, LLMs are prone to hallucination, generating content that either conflicts with established knowledge or is unfaithful to the original sources. Existing hallucination benchmarks primar…

    Submitted 11 June, 2024; originally announced June 2024.

  8. arXiv:2406.03085  [pdf, other]

    cs.LG cs.IR

    Exploring User Retrieval Integration towards Large Language Models for Cross-Domain Sequential Recommendation

    Authors: Tingjia Shen, Hao Wang, Jiaqing Zhang, Sirui Zhao, Liangyue Li, Zulong Chen, Defu Lian, Enhong Chen

    Abstract: Cross-Domain Sequential Recommendation (CDSR) aims to mine and transfer users' sequential preferences across different domains to alleviate the long-standing cold-start issue. Traditional CDSR models capture collaborative information through user and item modeling while overlooking valuable semantic information. Recently, Large Language Models (LLMs) have demonstrated powerful semantic reasoning capa…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: 10 pages, 5 figures

    ACM Class: I.2.7

  9. arXiv:2406.00121  [pdf, other]

    cs.CV

    Empowering Visual Creativity: A Vision-Language Assistant to Image Editing Recommendations

    Authors: Tiancheng Shen, Jun Hao Liew, Long Mai, Lu Qi, Jiashi Feng, Jiaya Jia

    Abstract: Advances in text-based image generation and editing have revolutionized content creation, enabling users to create impressive content from imaginative text prompts. However, existing methods are not designed to work well with the oversimplified prompts that are often encountered in typical scenarios when users start their editing with only vague or abstract purposes in mind. Those scenarios demand…

    Submitted 31 May, 2024; originally announced June 2024.

  10. arXiv:2405.19257  [pdf, other]

    cs.RO cs.DC

    Hybrid-Parallel: Achieving High Performance and Energy Efficient Distributed Inference on Robots

    Authors: Zekai Sun, Xiuxian Guan, Junming Wang, Haoze Song, Yuhao Qing, Tianxiang Shen, Dong Huang, Fangming Liu, Heming Cui

    Abstract: The rapid advancements in machine learning techniques have led to significant achievements in various real-world robotic tasks. These tasks heavily rely on fast and energy-efficient inference of deep neural network (DNN) models when deployed on robots. To enhance inference performance, distributed inference has emerged as a promising approach, parallelizing inference across multiple powerful GPU d…

    Submitted 29 May, 2024; originally announced May 2024.

  11. arXiv:2405.17840  [pdf, other]

    cs.CL

    Benchmarks Underestimate the Readiness of Multi-lingual Dialogue Agents

    Authors: Andrew H. Lee, Sina J. Semnani, Galo Castillo-López, Gäel de Chalendar, Monojit Choudhury, Ashna Dua, Kapil Rajesh Kavitha, Sungkyun Kim, Prashant Kodali, Ponnurangam Kumaraguru, Alexis Lombard, Mehrad Moradshahi, Gihyun Park, Nasredine Semmar, Jiwon Seo, Tianhao Shen, Manish Shrivastava, Deyi Xiong, Monica S. Lam

    Abstract: Creating multilingual task-oriented dialogue (TOD) agents is challenging due to the high cost of training data acquisition. Following the research trend of improving training data efficiency, we show for the first time that in-context learning is sufficient to tackle multilingual TOD. To handle the challenging dialogue state tracking (DST) subtask, we break it down to simpler steps that are mor…

    Submitted 16 June, 2024; v1 submitted 28 May, 2024; originally announced May 2024.

  12. arXiv:2405.15356  [pdf, other]

    cs.CV

    Alleviating Hallucinations in Large Vision-Language Models through Hallucination-Induced Optimization

    Authors: Beitao Chen, Xinyu Lyu, Lianli Gao, Jingkuan Song, Heng Tao Shen

    Abstract: Although Large Visual Language Models (LVLMs) have demonstrated exceptional abilities in understanding multimodal data, they invariably suffer from hallucinations, leading to a disconnect between the generated text and the corresponding images. Almost all current visual contrastive decoding methods attempt to mitigate these hallucinations by introducing visual uncertainty information that appropri…

    Submitted 24 May, 2024; originally announced May 2024.

    Comments: 10 pages. arXiv admin note: text overlap with arXiv:2311.16922 by other authors

  13. arXiv:2405.10576  [pdf, other]

    cs.RO

    An Efficient Learning Control Framework With Sim-to-Real for String-Type Artificial Muscle-Driven Robotic Systems

    Authors: Jiyue Tao, Yunsong Zhang, Sunil Kumar Rajendran, Feitian Zhang, Dexin Zhao, Tongsheng Shen

    Abstract: Robotic systems driven by artificial muscles present unique challenges due to the nonlinear dynamics of actuators and the complex designs of mechanical structures. Traditional model-based controllers often struggle to achieve desired control performance in such systems. Deep reinforcement learning (DRL), a trending machine learning technique widely adopted in robot control, offers a promising alte…

    Submitted 7 June, 2024; v1 submitted 17 May, 2024; originally announced May 2024.

  14. MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels

    Authors: Qi Chen, Xiubo Geng, Corby Rosset, Carolyn Buractaon, Jingwen Lu, Tao Shen, Kun Zhou, Chenyan Xiong, Yeyun Gong, Paul Bennett, Nick Craswell, Xing Xie, Fan Yang, Bryan Tower, Nikhil Rao, Anlei Dong, Wenqi Jiang, Zheng Liu, Mingqin Li, Chuanjie Liu, Zengzhong Li, Rangan Majumder, Jennifer Neville, Andy Oakley, Knut Magne Risvik, et al. (6 additional authors not shown)

    Abstract: Recent breakthroughs in large models have highlighted the critical significance of data scale, labels and modals. In this paper, we introduce MS MARCO Web Search, the first large-scale information-rich web dataset, featuring millions of real clicked query-document labels. This dataset closely mimics real-world web document and query distribution, provides rich information for various kinds of down…

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: 10 pages, 6 figures, for associated dataset, see http://github.com/microsoft/MS-MARCO-Web-Search

  15. arXiv:2405.03971  [pdf, other]

    cs.CV cs.MA

    Unified End-to-End V2X Cooperative Autonomous Driving

    Authors: Zhiwei Li, Bozhen Zhang, Lei Yang, Tianyu Shen, Nuo Xu, Ruosen Hao, Weiting Li, Tao Yan, Huaping Liu

    Abstract: V2X cooperation, through the integration of sensor data from both vehicles and infrastructure, is considered a pivotal approach to advancing autonomous driving technology. Current research primarily focuses on enhancing perception accuracy, often overlooking the systematic improvement of accident prediction accuracy through end-to-end learning, leading to insufficient attention to the safety issue…

    Submitted 6 May, 2024; originally announced May 2024.

  16. arXiv:2405.00797  [pdf, other]

    cs.RO cs.CV

    ADM: Accelerated Diffusion Model via Estimated Priors for Robust Motion Prediction under Uncertainties

    Authors: Jiahui Li, Tianle Shen, Zekai Gu, Jiawei Sun, Chengran Yuan, Yuhang Han, Shuo Sun, Marcelo H. Ang Jr

    Abstract: Motion prediction is a challenging problem in autonomous driving as it demands the system to comprehend stochastic dynamics and the multi-modal nature of real-world agent interactions. Diffusion models have recently risen to prominence, and have proven particularly effective in pedestrian motion prediction tasks. However, the significant time consumption and sensitivity to noise have limited the r…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 7 pages, 4 figures

  17. arXiv:2404.19372  [pdf, other]

    cs.NI

    AutoNet: Automatic Reachability Policy Management in Public Cloud Networks

    Authors: German Sviridov, Zheng Tao Shen, Jorge Cardoso

    Abstract: Virtual Private Cloud (VPC) is the main network abstraction technology used in public cloud systems. VPCs are composed of a set of network services that permit the definition of complex network reachability properties among internal and external cloud entities such as tenants' VMs or some generic internet nodes. Although hiding the underlying complexity through a comprehensible abstraction layer,…

    Submitted 30 April, 2024; originally announced April 2024.

  18. arXiv:2404.11108  [pdf, other]

    cs.CV

    LADDER: An Efficient Framework for Video Frame Interpolation

    Authors: Tong Shen, Dong Li, Ziheng Gao, Lu Tian, Emad Barsoum

    Abstract: Video Frame Interpolation (VFI) is a crucial technique in various applications such as slow-motion generation, frame rate conversion, and video frame restoration. This paper introduces an efficient video frame interpolation framework that aims to strike a favorable balance between efficiency and quality. Our framework follows a general paradigm consisting of a flow estimator and a refinement modul…

    Submitted 17 April, 2024; originally announced April 2024.

  19. arXiv:2403.19211  [pdf, other]

    cs.LG cs.AI cs.CL

    Dual-Personalizing Adapter for Federated Foundation Models

    Authors: Yiyuan Yang, Guodong Long, Tao Shen, Jing Jiang, Michael Blumenstein

    Abstract: Recently, foundation models, particularly large language models (LLMs), have demonstrated an impressive ability to adapt to various tasks by fine-tuning large amounts of instruction data. Notably, federated foundation models emerge as a privacy preservation method to fine-tune models collaboratively under federated learning (FL) settings by leveraging many distributed datasets with non-IID data. T…

    Submitted 28 March, 2024; originally announced March 2024.

  20. arXiv:2403.11607  [pdf, other]

    cs.RO

    AGRNav: Efficient and Energy-Saving Autonomous Navigation for Air-Ground Robots in Occlusion-Prone Environments

    Authors: Junming Wang, Zekai Sun, Xiuxian Guan, Tianxiang Shen, Zongyuan Zhang, Tianyang Duan, Dong Huang, Shixiong Zhao, Heming Cui

    Abstract: The exceptional mobility and long endurance of air-ground robots are raising interest in their usage to navigate complex environments (e.g., forests and large buildings). However, such environments often contain occluded and unknown regions, and without accurate prediction of unobserved obstacles, the movement of the air-ground robot often suffers a suboptimal trajectory under existing mapping-bas…

    Submitted 18 March, 2024; originally announced March 2024.

    Comments: Accepted to ICRA 2024

  21. arXiv:2403.10252  [pdf, other]

    cs.CV

    Region-aware Distribution Contrast: A Novel Approach to Multi-Task Partially Supervised Learning

    Authors: Meixuan Li, Tianyu Li, Guoqing Wang, Peng Wang, Yang Yang, Heng Tao Shen

    Abstract: In this study, we address the intricate challenge of multi-task dense prediction, encompassing tasks such as semantic segmentation, depth estimation, and surface normal estimation, particularly when dealing with partially annotated data (MTPSL). The complexity arises from the absence of complete task labels for each training image. Given the inter-related nature of these pixel-wise dense tasks, ou…

    Submitted 15 March, 2024; originally announced March 2024.

  22. arXiv:2403.03134  [pdf, other]

    cs.CV cs.AI q-bio.NC

    Simplicity in Complexity: Explaining Visual Complexity using Deep Segmentation Models

    Authors: Tingke Shen, Surabhi S Nath, Aenne Brielmann, Peter Dayan

    Abstract: The complexity of visual stimuli plays an important role in many cognitive phenomena, including attention, engagement, memorability, time perception and aesthetic evaluation. Despite its importance, complexity is poorly understood and ironically, previous models of image complexity have been quite complex. There have been many attempts to find handcrafted features that explain complexity, but thes…

    Submitted 6 May, 2024; v1 submitted 5 March, 2024; originally announced March 2024.

  23. arXiv:2402.18458  [pdf, other]

    cs.CL

    Meta-Task Prompting Elicits Embedding from Large Language Models

    Authors: Yibin Lei, Di Wu, Tianyi Zhou, Tao Shen, Yu Cao, Chongyang Tao, Andrew Yates

    Abstract: In this work, we introduce a new unsupervised embedding method, Meta-Task Prompting with Explicit One-Word Limitation (MetaEOL), for generating high-quality sentence embeddings from Large Language Models (LLMs) without the need for model fine-tuning or task-specific engineering. Leveraging meta-task prompting, MetaEOL guides LLMs to produce embeddings through a series of carefully designed prompts…

    Submitted 28 February, 2024; originally announced February 2024.

  24. arXiv:2402.18031  [pdf, other]

    cs.IR cs.CL

    Corpus-Steered Query Expansion with Large Language Models

    Authors: Yibin Lei, Yu Cao, Tianyi Zhou, Tao Shen, Andrew Yates

    Abstract: Recent studies demonstrate that query expansions generated by large language models (LLMs) can considerably enhance information retrieval systems by generating hypothetical documents that answer the queries as expansions. However, challenges arise from misalignments between the expansions and the retrieval corpus, resulting in issues like hallucinations and outdated information due to the limited…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: EACL 2024 (Short)

  25. arXiv:2402.16153  [pdf, other]

    cs.SD cs.AI cs.CL cs.LG cs.MM eess.AS

    ChatMusician: Understanding and Generating Music Intrinsically with LLM

    Authors: Ruibin Yuan, Hanfeng Lin, Yi Wang, Zeyue Tian, Shangda Wu, Tianhao Shen, Ge Zhang, Yuhang Wu, Cong Liu, Ziya Zhou, Ziyang Ma, Liumeng Xue, Ziyu Wang, Qin Liu, Tianyu Zheng, Yizhi Li, Yinghao Ma, Yiming Liang, Xiaowei Chi, Ruibo Liu, Zili Wang, Pengfei Li, Jingcheng Wu, Chenghua Lin, Qifeng Liu, et al. (10 additional authors not shown)

    Abstract: While Large Language Models (LLMs) demonstrate impressive capabilities in text generation, we find that their ability has yet to be generalized to music, humanity's creative language. We introduce ChatMusician, an open-source LLM that integrates intrinsic musical abilities. It is based on continual pre-training and finetuning LLaMA2 on a text-compatible music representation, ABC notation, and the…

    Submitted 25 February, 2024; originally announced February 2024.

    Comments: GitHub: https://shanghaicannon.github.io/ChatMusician/

  26. arXiv:2402.14658  [pdf, other]

    cs.SE cs.AI cs.CL

    OpenCodeInterpreter: Integrating Code Generation with Execution and Refinement

    Authors: Tianyu Zheng, Ge Zhang, Tianhao Shen, Xueling Liu, Bill Yuchen Lin, Jie Fu, Wenhu Chen, Xiang Yue

    Abstract: The introduction of large language models has significantly advanced code generation. However, open-source models often lack the execution capabilities and iterative refinement of advanced systems like the GPT-4 Code Interpreter. To address this, we introduce OpenCodeInterpreter, a family of open-source code systems designed for generating, executing, and iteratively refining code. Supported by Co…

    Submitted 27 February, 2024; v1 submitted 22 February, 2024; originally announced February 2024.

  27. arXiv:2402.13116  [pdf, other]

    cs.CL

    A Survey on Knowledge Distillation of Large Language Models

    Authors: Xiaohan Xu, Ming Li, Chongyang Tao, Tao Shen, Reynold Cheng, Jinyang Li, Can Xu, Dacheng Tao, Tianyi Zhou

    Abstract: In the era of Large Language Models (LLMs), Knowledge Distillation (KD) emerges as a pivotal methodology for transferring advanced capabilities from leading proprietary LLMs, such as GPT-4, to their open-source counterparts like LLaMA and Mistral. Additionally, as open-source LLMs flourish, KD plays a crucial role in both compressing these models, and facilitating their self-improvement by employi…

    Submitted 8 March, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

    Comments: 44 pages

  28. arXiv:2402.12048  [pdf, other]

    cs.CL

    Model Tailor: Mitigating Catastrophic Forgetting in Multi-modal Large Language Models

    Authors: Didi Zhu, Zhongyi Sun, Zexi Li, Tao Shen, Ke Yan, Shouhong Ding, Kun Kuang, Chao Wu

    Abstract: Catastrophic forgetting emerges as a critical challenge when fine-tuning multi-modal large language models (MLLMs), where improving performance on unseen tasks often leads to a significant performance drop on the original tasks. This paper presents a comprehensive analysis of catastrophic forgetting in MLLMs and introduces a post-training adjustment method called Model Tailor. Our method primarily…

    Submitted 19 February, 2024; originally announced February 2024.

  29. arXiv:2402.10699  [pdf, other]

    cs.CL

    Rethinking Human-like Translation Strategy: Integrating Drift-Diffusion Model with Large Language Models for Machine Translation

    Authors: Hongbin Na, Zimu Wang, Mieradilijiang Maimaiti, Tong Chen, Wei Wang, Tao Shen, Ling Chen

    Abstract: Large language models (LLMs) have demonstrated promising potential in various downstream tasks, including machine translation. However, prior work on LLM-based machine translation has mainly focused on better utilizing training data, demonstrations, or pre-defined and universal knowledge to improve performance, with a lack of consideration of decision-making like human translators. In this paper,…

    Submitted 16 February, 2024; originally announced February 2024.

    Comments: Under review

  30. arXiv:2402.09055  [pdf, other]

    cs.CV cs.AI

    Comment-aided Video-Language Alignment via Contrastive Pre-training for Short-form Video Humor Detection

    Authors: Yang Liu, Tongfei Shen, Dong Zhang, Qingying Sun, Shoushan Li, Guodong Zhou

    Abstract: The growing importance of multi-modal humor detection within affective computing correlates with the expanding influence of short-form video sharing on social media platforms. In this paper, we propose a novel two-branch hierarchical model for short-form video humor detection (SVHD), named Comment-aided Video-Language Alignment (CVLA) via data-augmented multi-modal contrastive pre-training. Notabl…

    Submitted 14 April, 2024; v1 submitted 14 February, 2024; originally announced February 2024.

    Comments: Accepted by ICMR 2024

  31. arXiv:2402.02555  [pdf, other]

    cs.CV cs.CL

    Generalizable Entity Grounding via Assistance of Large Language Model

    Authors: Lu Qi, Yi-Wen Chen, Lehan Yang, Tiancheng Shen, Xiangtai Li, Weidong Guo, Yu Xu, Ming-Hsuan Yang

    Abstract: In this work, we propose a novel approach to densely ground visual entities from a long caption. We leverage a large multimodal model (LMM) to extract semantic nouns, a class-agnostic segmentation model to generate entity-level segmentation, and the proposed multi-modal feature fusion module to associate each semantic noun with its corresponding segmentation mask. Additionally, we introduce a stra…

    Submitted 4 February, 2024; originally announced February 2024.

  32. arXiv:2402.01342  [pdf, other]

    cs.LG stat.ML

    Training-time Neuron Alignment through Permutation Subspace for Improving Linear Mode Connectivity and Model Fusion

    Authors: Zexi Li, Zhiqi Li, Jie Lin, Tao Shen, Tao Lin, Chao Wu

    Abstract: In deep learning, stochastic gradient descent often yields functionally similar yet widely scattered solutions in the weight space even under the same initialization, causing barriers in the Linear Mode Connectivity (LMC) landscape. Overcoming these barriers is crucial for understanding deep learning dynamics and enhancing model-fusion algorithms. Previous studies highlight the role of permutation…

    Submitted 2 February, 2024; originally announced February 2024.

    Comments: preprint

  33. arXiv:2401.16433  [pdf, other]

    cs.IR cs.LG

    Within-basket Recommendation via Neural Pattern Associator

    Authors: Kai Luo, Tianshu Shen, Lan Yao, Ga Wu, Aaron Liblong, Istvan Fehervari, Ruijian An, Jawad Ahmed, Harshit Mishra, Charu Pujari

    Abstract: Within-basket recommendation (WBR) refers to the task of recommending items to the end of completing a non-empty shopping basket during a shopping session. While the latest innovations in this space demonstrate remarkable performance improvement on benchmark datasets, they often overlook the complexity of user behaviors in practice, such as 1) co-existence of multiple shopping intentions, 2) multi…

    Submitted 14 March, 2024; v1 submitted 25 January, 2024; originally announced January 2024.

    Comments: 13 pages, 9 figures

  34. arXiv:2401.07103  [pdf, other]

    cs.CL

    Leveraging Large Language Models for NLG Evaluation: Advances and Challenges

    Authors: Zhen Li, Xiaohan Xu, Tao Shen, Can Xu, Jia-Chen Gu, Yuxuan Lai, Chongyang Tao, Shuai Ma

    Abstract: In the rapidly evolving domain of Natural Language Generation (NLG) evaluation, introducing Large Language Models (LLMs) has opened new avenues for assessing generated content quality, e.g., coherence, creativity, and context relevance. This paper aims to provide a thorough overview of leveraging LLMs for NLG evaluation, a burgeoning area that lacks a systematic analysis. We propose a coherent tax…

    Submitted 12 June, 2024; v1 submitted 13 January, 2024; originally announced January 2024.

    Comments: 21 pages, 5 figures

  35. arXiv:2401.03141  [pdf, other]

    cs.RO

    Estimating the Lateral Motion States of an Underwater Robot by Propeller Wake Sensing Using an Artificial Lateral Line

    Authors: Jun Wang, Dexin Zhao, Youxi Zhao, Feitian Zhang, Tongsheng Shen

    Abstract: An artificial lateral line (ALL) is a bioinspired flow sensing system of an underwater robot that consists of distributed flow sensors. The ALL has achieved great success in sensing the motion states of bioinspired underwater robots, e.g., robotic fish, that are driven by body undulation and/or tail flapping. However, the ALL has not been systematically tested and studied in the sensing of underwa…

    Submitted 6 January, 2024; originally announced January 2024.

    Comments: 10 pages, 8 figures

  36. arXiv:2312.17425  [pdf, other]

    cs.CV cs.AI

    ALF: Adaptive Label Finetuning for Scene Graph Generation

    Authors: Qishen Chen, Jianzhi Liu, Xinyu Lyu, Lianli Gao, Heng Tao Shen, Jingkuan Song

    Abstract: Scene Graph Generation (SGG) endeavors to predict the relationships between subjects and objects in a given image. Nevertheless, the long-tail distribution of relations often leads to biased prediction on coarse labels, presenting a substantial hurdle in SGG. To address this issue, researchers focus on unbiased SGG and introduce data transfer methods to transfer coarse-grained predicates into fine…

    Submitted 23 May, 2024; v1 submitted 28 December, 2023; originally announced December 2023.

  37. arXiv:2312.16132  [pdf, other]

    cs.CL

    RoleEval: A Bilingual Role Evaluation Benchmark for Large Language Models

    Authors: Tianhao Shen, Sun Li, Quan Tu, Deyi Xiong

    Abstract: The rapid evolution of large language models necessitates effective benchmarks for evaluating their role knowledge, which is essential for establishing connections with the real world and providing more immersive interactions. This paper introduces RoleEval, a bilingual benchmark designed to assess the memorization, utilization, and reasoning capabilities of role knowledge. RoleEval comprises Role…

    Submitted 16 February, 2024; v1 submitted 26 December, 2023; originally announced December 2023.

    Comments: Our dataset is available at https://github.com/Magnetic2014/RoleEval

  38. arXiv:2312.15263  [pdf, other]

    cs.CV

    Self-Supervised Depth Completion Guided by 3D Perception and Geometry Consistency

    Authors: Yu Cai, Tianyu Shen, Shi-Sheng Huang, Hua Huang

    Abstract: Depth completion, aiming to predict dense depth maps from sparse depth measurements, plays a crucial role in many computer vision related applications. Deep learning approaches have demonstrated overwhelming success in this task. However, high-precision depth completion without relying on the ground-truth data, which are usually costly, still remains challenging. The reason lies on the ignorance o…

    Submitted 23 December, 2023; originally announced December 2023.

    Comments: 13 pages, 7 figures

  39. arXiv:2312.12826  [pdf, other]

    cs.CV

    ReCo-Diff: Explore Retinex-Based Condition Strategy in Diffusion Model for Low-Light Image Enhancement

    Authors: Yuhui Wu, Guoqing Wang, Zhiwen Wang, Yang Yang, Tianyu Li, Peng Wang, Chongyi Li, Heng Tao Shen

    Abstract: Low-light image enhancement (LLIE) has achieved promising performance by employing conditional diffusion models. In this study, we propose ReCo-Diff, a novel approach that incorporates Retinex-based prior as an additional pre-processing condition to regulate the generating capabilities of the diffusion model. ReCo-Diff first leverages a pre-trained decomposition network to produce initial reflecta…

    Submitted 20 December, 2023; originally announced December 2023.

  40. arXiv:2312.12478  [pdf, other]

    cs.CV

    ProS: Prompting-to-simulate Generalized knowledge for Universal Cross-Domain Retrieval

    Authors: Kaipeng Fang, Jingkuan Song, Lianli Gao, Pengpeng Zeng, Zhi-Qi Cheng, Xiyao Li, Heng Tao Shen

    Abstract: The goal of Universal Cross-Domain Retrieval (UCDR) is to achieve robust performance in generalized test scenarios, wherein data may belong to strictly unknown domains and categories during training. Recently, pre-trained models with prompt tuning have shown strong generalization capabilities and attained noteworthy achievements in various downstream tasks, such as few-shot learning and video-text…

    Submitted 29 February, 2024; v1 submitted 19 December, 2023; originally announced December 2023.

  41. arXiv:2312.09894  [pdf, other]

    cs.CV cs.AI

    PathoDuet: Foundation Models for Pathological Slide Analysis of H&E and IHC Stains

    Authors: Shengyi Hua, Fang Yan, Tianle Shen, Xiaofan Zhang

    Abstract: Large amounts of digitized histopathological data display a promising future for developing pathological foundation models via self-supervised learning methods. Foundation models pretrained with these methods serve as a good basis for downstream tasks. However, the gap between natural and histopathological images hinders the direct application of existing methods. In this work, we present PathoDue…

    Submitted 15 December, 2023; originally announced December 2023.

  42. arXiv:2312.07549  [pdf, other]

    cs.CV

    Make-A-Storyboard: A General Framework for Storyboard with Disentangled and Merged Control

    Authors: Sitong Su, Litao Guo, Lianli Gao, Heng Tao Shen, Jingkuan Song

    Abstract: Story Visualization aims to generate images aligned with story prompts, reflecting the coherence of storybooks through visual consistency among characters and scenes. Whereas current approaches exclusively concentrate on characters and neglect the visual consistency among contextually correlated scenes, resulting in independent character images without inter-image coherence. To tackle this issue, we…

    Submitted 6 December, 2023; originally announced December 2023.

  43. arXiv:2312.01598  [pdf, other]

    cs.CV

    Good Questions Help Zero-Shot Image Reasoning

    Authors: Kaiwen Yang, Tao Shen, Xinmei Tian, Xiubo Geng, Chongyang Tao, Dacheng Tao, Tianyi Zhou

    Abstract: Aligning the recent large language models (LLMs) with computer vision models leads to large vision-language models (LVLMs), which have paved the way for zero-shot image reasoning tasks. However, LVLMs are usually trained on short high-level captions only referring to sparse focus regions in images. Such a "tunnel vision" limits LVLMs to exploring other relevant contexts in complex scenes. To add…

    Submitted 8 December, 2023; v1 submitted 3 December, 2023; originally announced December 2023.

  44. arXiv:2312.00840  [pdf, other]

    cs.LG cs.CV

    Towards Redundancy-Free Sub-networks in Continual Learning

    Authors: Cheng Chen, Jingkuan Song, LianLi Gao, Heng Tao Shen

    Abstract: Catastrophic Forgetting (CF) is a prominent issue in continual learning. Parameter isolation addresses this challenge by masking a sub-network for each task to mitigate interference with old tasks. However, these sub-networks are constructed relying on weight magnitude, which does not necessarily correspond to the importance of weights, resulting in maintaining unimportant weights and constructing…

    Submitted 11 January, 2024; v1 submitted 30 November, 2023; originally announced December 2023.

  45. Continual Referring Expression Comprehension via Dual Modular Memorization

    Authors: Heng Tao Shen, Cheng Chen, Peng Wang, Lianli Gao, Meng Wang, Jingkuan Song

    Abstract: Referring Expression Comprehension (REC) aims to localize an image region of a given object described by a natural-language expression. While promising performance has been demonstrated, existing REC algorithms make a strong assumption that training data feeding into a model are given upfront, which degrades its practicality for real-world scenarios. In this paper, we propose Continual Referring E…

    Submitted 24 November, 2023; originally announced November 2023.

    Comments: IEEE Transactions on Image Processing

  46. arXiv:2311.10091  [pdf, other]

    cs.CV cs.GR

    Adaptive Shells for Efficient Neural Radiance Field Rendering

    Authors: Zian Wang, Tianchang Shen, Merlin Nimier-David, Nicholas Sharp, Jun Gao, Alexander Keller, Sanja Fidler, Thomas Müller, Zan Gojcic

    Abstract: Neural radiance fields achieve unprecedented quality for novel view synthesis, but their volumetric formulation remains expensive, requiring a huge number of samples to render high-resolution images. Volumetric encodings are essential to represent fuzzy geometry such as foliage and hair, and they are well-suited for stochastic optimization. Yet, many scenes ultimately consist largely of solid surf…

    Submitted 16 November, 2023; originally announced November 2023.

    Comments: SIGGRAPH Asia 2023. Project page: research.nvidia.com/labs/toronto-ai/adaptive-shells/

  47. arXiv:2311.08734  [pdf, other]

    cs.CL

    Thread of Thought Unraveling Chaotic Contexts

    Authors: Yucheng Zhou, Xiubo Geng, Tao Shen, Chongyang Tao, Guodong Long, Jian-Guang Lou, Jianbing Shen

    Abstract: Large Language Models (LLMs) have ushered in a transformative era in the field of natural language processing, excelling in tasks related to text comprehension and generation. Nevertheless, they encounter difficulties when confronted with chaotic contexts (e.g., distractors rather than long irrelevant context), leading to the inadvertent omission of certain details within the chaotic context. In r…

    Submitted 15 November, 2023; originally announced November 2023.

    Comments: 11 pages, 7 figures, 5 tables

  48. arXiv:2311.03352  [pdf, other]

    cs.CV

    Rethinking Evaluation Metrics of Open-Vocabulary Segmentaion

    Authors: Hao Zhou, Tiancheng Shen, Xu Yang, Hai Huang, Xiangtai Li, Lu Qi, Ming-Hsuan Yang

    Abstract: In this paper, we highlight a problem of evaluation metrics adopted in the open-vocabulary segmentation. That is, the evaluation process still heavily relies on closed-set metrics on zero-shot or cross-dataset pipelines without considering the similarity between predicted and ground truth categories. To tackle this issue, we first survey eleven similarity measurements between two categorical words…

    Submitted 6 November, 2023; originally announced November 2023.

  49. arXiv:2311.02083  [pdf, other]

    cs.IR cs.AI

    MaRU: A Manga Retrieval and Understanding System Connecting Vision and Language

    Authors: Conghao Tom Shen, Violet Yao, Yixin Liu

    Abstract: Manga, a widely celebrated Japanese comic art form, is renowned for its diverse narratives and distinct artistic styles. However, the inherently visual and intricate structure of Manga, which comprises images housing multiple panels, poses significant challenges for content retrieval. To address this, we present MaRU (Manga Retrieval and Understanding), a multi-staged system that connects vision a…

    Submitted 22 October, 2023; originally announced November 2023.

  50. arXiv:2310.09930  [pdf, other]

    cs.CL

    FiLM: Fill-in Language Models for Any-Order Generation

    Authors: Tianxiao Shen, Hao Peng, Ruoqi Shen, Yao Fu, Zaid Harchaoui, Yejin Choi

    Abstract: Language models have become the backbone of today's AI systems. However, their predominant left-to-right generation limits the use of bidirectional context, which is essential for tasks that involve filling text in the middle. We propose the Fill-in Language Model (FiLM), a new language modeling approach that allows for flexible generation at any position without adhering to a specific generation…

    Submitted 15 October, 2023; originally announced October 2023.