Showing 1–50 of 689 results for author: Lu, C

  1. arXiv:2407.03245  [pdf, other]

    cs.RO cs.AI eess.SY

    TieBot: Learning to Knot a Tie from Visual Demonstration through a Real-to-Sim-to-Real Approach

    Authors: Weikun Peng, Jun Lv, Yuwei Zeng, Haonan Chen, Siheng Zhao, Jicheng Sun, Cewu Lu, Lin Shao

    Abstract: The tie-knotting task is highly challenging due to the tie's high deformation and long-horizon manipulation actions. This work presents TieBot, a Real-to-Sim-to-Real system for learning from visual demonstration that enables robots to knot a tie. We introduce the Hierarchical Feature Matching approach to estimate a sequence of tie meshes from the demonstration video. With these estimated meshes…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: initial commit

  2. arXiv:2407.02911  [pdf, other]

    eess.IV cs.CV

    Non-Adversarial Learning: Vector-Quantized Common Latent Space for Multi-Sequence MRI

    Authors: Luyi Han, Tao Tan, Tianyu Zhang, Xin Wang, Yuan Gao, Chunyao Lu, Xinglong Liang, Haoran Dou, Yunzhi Huang, Ritse Mann

    Abstract: Adversarial learning helps generative models translate MRI from source to target sequence when lacking paired samples. However, implementing MRI synthesis with adversarial learning in clinical settings is challenging due to training instability and mode collapse. To address this issue, we leverage intermediate sequences to estimate the common latent space among multi-sequence MRI, enabling the rec…

    Submitted 3 July, 2024; originally announced July 2024.

  3. arXiv:2407.00769  [pdf, other]

    quant-ph cs.DC

    Achieving Energetic Superiority Through System-Level Quantum Circuit Simulation

    Authors: Rong Fu, Zhongling Su, Han-Sen Zhong, Xiti Zhao, Jianyang Zhang, Feng Pan, Pan Zhang, Xianhe Zhao, Ming-Cheng Chen, Chao-Yang Lu, Jian-Wei Pan, Zhiling Pei, Xingcheng Zhang, Wanli Ouyang

    Abstract: Quantum Computational Superiority boasts rapid computation and high energy efficiency. Despite recent advances in classical algorithms aimed at refuting the milestone claim of Google's Sycamore, challenges remain in generating uncorrelated samples of random quantum circuits. In this paper, we present a groundbreaking large-scale system technology that leverages optimization on global, node, and de…

    Submitted 30 June, 2024; originally announced July 2024.

  4. arXiv:2407.00495  [pdf, other]

    cs.LG

    A Bayesian Solution To The Imitation Gap

    Authors: Risto Vuorio, Mattie Fellows, Cong Lu, Clémence Grislain, Shimon Whiteson

    Abstract: In many real-world settings, an agent must learn to act in environments where no reward signal can be specified, but a set of expert demonstrations is available. Imitation learning (IL) is a popular framework for learning policies from such demonstrations. However, in some cases, differences in observability between the expert and the agent can give rise to an imitation gap such that the expert's…

    Submitted 29 June, 2024; originally announced July 2024.

  5. arXiv:2407.00299  [pdf, other]

    cs.RO cs.AI cs.CV cs.HC cs.LG

    Human-Agent Joint Learning for Efficient Robot Manipulation Skill Acquisition

    Authors: Shengcheng Luo, Quanquan Peng, Jun Lv, Kaiwen Hong, Katherine Rose Driggs-Campbell, Cewu Lu, Yong-Lu Li

    Abstract: Employing a teleoperation system for gathering demonstrations offers the potential for more efficient learning of robot manipulation. However, teleoperating a robot arm equipped with a dexterous hand or gripper via a teleoperation system poses significant challenges due to its high dimensionality, complex motions, and differences in physiological structure. In this study, we introduce a novel s…

    Submitted 2 July, 2024; v1 submitted 28 June, 2024; originally announced July 2024.

    Comments: 8 pages, 6 figures

  6. arXiv:2406.19972  [pdf, other]

    cs.RO

    HumanVLA: Towards Vision-Language Directed Object Rearrangement by Physical Humanoid

    Authors: Xinyu Xu, Yizheng Zhang, Yong-Lu Li, Lei Han, Cewu Lu

    Abstract: Physical Human-Scene Interaction (HSI) plays a crucial role in numerous applications. However, existing HSI techniques are limited to specific object dynamics and privileged information, which prevents the development of more comprehensive applications. To address this limitation, we introduce HumanVLA for general object rearrangement directed by practical vision and language. A teacher-stud…

    Submitted 28 June, 2024; originally announced June 2024.

  7. arXiv:2406.19485  [pdf, other]

    eess.IV cs.CV

    GAPNet: Granularity Attention Network with Anatomy-Prior-Constraint for Carotid Artery Segmentation

    Authors: Lin Zhang, Chenggang Lu, Xin-yang Shi, Caifeng Shan, Jiong Zhang, Da Chen, Laurent D. Cohen

    Abstract: Atherosclerosis is a chronic, progressive disease that primarily affects the arterial walls. It is one of the major causes of cardiovascular disease. Magnetic Resonance (MR) black-blood vessel wall imaging (BB-VWI) offers crucial insights into vascular disease diagnosis by clearly visualizing vascular structures. However, the complex anatomy of the neck poses challenges in distinguishing the carot…

    Submitted 27 June, 2024; originally announced June 2024.

  8. arXiv:2406.19131  [pdf, other]

    cs.CV

    CELLO: Causal Evaluation of Large Vision-Language Models

    Authors: Meiqi Chen, Bo Peng, Yan Zhang, Chaochao Lu

    Abstract: Causal reasoning is fundamental to human intelligence and crucial for effective decision-making in real-world environments. Despite recent advancements in large vision-language models (LVLMs), their ability to comprehend causality remains unclear. Previous work typically focuses on commonsense causality between events and/or actions, which is insufficient for applications like embodied agents and…

    Submitted 27 June, 2024; originally announced June 2024.

  9. arXiv:2406.17274  [pdf, other]

    cs.CL cs.LG

    Can We Trust the Performance Evaluation of Uncertainty Estimation Methods in Text Summarization?

    Authors: Jianfeng He, Runing Yang, Linlin Yu, Changbin Li, Ruoxi Jia, Feng Chen, Ming Jin, Chang-Tien Lu

    Abstract: Text summarization, a key natural language generation (NLG) task, is vital in various domains. However, the high cost of inaccurate summaries in risk-critical applications, particularly those involving human-in-the-loop decision-making, raises concerns about the reliability of uncertainty estimation on text summarization (UE-TS) evaluation methods. This concern stems from the dependency of uncerta…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 63 pages, 41 figures, 11 tables

  10. arXiv:2406.16605  [pdf, other]

    cs.CL cs.AI cs.LG stat.ME

    CLEAR: Can Language Models Really Understand Causal Graphs?

    Authors: Sirui Chen, Mengying Xu, Kun Wang, Xingyu Zeng, Rui Zhao, Shengjie Zhao, Chaochao Lu

    Abstract: Causal reasoning is a cornerstone of how humans interpret the world. To model and reason about causality, causal graphs offer a concise yet effective solution. Given the impressive advancements in language models, a crucial question arises: can they really understand causal graphs? To this end, we pioneer an investigation into language models' understanding of causal graphs. Specifically, we devel…

    Submitted 24 June, 2024; originally announced June 2024.

  11. arXiv:2406.15042  [pdf, other]

    cs.LG cs.AI

    Behaviour Distillation

    Authors: Andrei Lupu, Chris Lu, Jarek Liesen, Robert Tjarko Lange, Jakob Foerster

    Abstract: Dataset distillation aims to condense large datasets into a small number of synthetic examples that can be used as drop-in replacements when training new models. It has applications to interpretability, neural architecture search, privacy, and continual learning. Despite strong successes in supervised domains, such methods have not yet been extended to reinforcement learning, where the lack of a f…

    Submitted 21 June, 2024; originally announced June 2024.

    Comments: Published as a conference paper at ICLR 2024

  12. arXiv:2406.12731  [pdf, other]

    cs.RO

    Tactile SoftHand-A: 3D-Printed, Tactile, Highly-underactuated, Anthropomorphic Robot Hand with an Antagonistic Tendon Mechanism

    Authors: Haoran Li, Christopher J. Ford, Chenghua Lu, Yijiong Lin, Matteo Bianchi, Manuel G. Catalano, Efi Psomopoulou, Nathan F. Lepora

    Abstract: For tendon-driven multi-fingered robotic hands, ensuring grasp adaptability while minimizing the number of actuators needed to provide human-like functionality is a challenging problem. Inspired by the Pisa/IIT SoftHand, this paper introduces a 3D-printed, highly-underactuated, five-finger robotic hand named the Tactile SoftHand-A, which features only two actuators. The dual-tendon design allows f…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 17 pages, 13 figures

  13. arXiv:2406.12589  [pdf, other]

    cs.LG

    Discovering Minimal Reinforcement Learning Environments

    Authors: Jarek Liesen, Chris Lu, Andrei Lupu, Jakob N. Foerster, Henning Sprekeler, Robert T. Lange

    Abstract: Reinforcement learning (RL) agents are commonly trained and evaluated in the same environment. In contrast, humans often train in a specialized environment before being evaluated, such as studying a book before taking an exam. The potential of such specialized training environments is still vastly underexplored, despite their capacity to dramatically speed up training. The framework of synthetic…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 10 pages, 7 figures

  14. arXiv:2406.12053  [pdf, other]

    cs.CL

    InternalInspector $I^2$: Robust Confidence Estimation in LLMs through Internal States

    Authors: Mohammad Beigi, Ying Shen, Runing Yang, Zihao Lin, Qifan Wang, Ankith Mohan, Jianfeng He, Ming Jin, Chang-Tien Lu, Lifu Huang

    Abstract: Despite their vast capabilities, Large Language Models (LLMs) often struggle with generating reliable outputs, frequently producing high-confidence inaccuracies known as hallucinations. Addressing this challenge, our research introduces InternalInspector, a novel framework designed to enhance confidence estimation in LLMs by leveraging contrastive learning on internal states including attention st…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 8 pages

  15. arXiv:2406.11905  [pdf, other]

    cs.NE cs.LG

    EvIL: Evolution Strategies for Generalisable Imitation Learning

    Authors: Silvia Sapora, Gokul Swamy, Chris Lu, Yee Whye Teh, Jakob Nicolaus Foerster

    Abstract: Oftentimes in imitation learning (IL), the environment we collect expert demonstrations in and the environment we want to deploy our learned policy in aren't exactly the same (e.g. demonstrations collected in simulation but deployment in the real world). Compared to policy-centric approaches to IL like behavioural cloning, reward-centric approaches like inverse reinforcement learning (IRL) often…

    Submitted 15 June, 2024; originally announced June 2024.

    Comments: 17 pages, 8 figures, ICML 2024

  16. arXiv:2406.11429  [pdf, other]

    cs.CL cs.AI

    Fusion Makes Perfection: An Efficient Multi-Grained Matching Approach for Zero-Shot Relation Extraction

    Authors: Shilong Li, Ge Bai, Zhang Zhang, Ying Liu, Chenji Lu, Daichi Guo, Ruifang Liu, Yong Sun

    Abstract: Predicting unseen relations that cannot be observed during the training phase is a challenging task in relation extraction. Previous works have made progress by matching the semantics between input instances and label descriptions. However, fine-grained matching often requires laborious manual annotation, and rich interactions between instances and label descriptions come with significant computat…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Accepted to the main conference of NAACL2024

  17. arXiv:2406.11142  [pdf, other]

    cs.RO cs.CV

    Graspness Discovery in Clutters for Fast and Accurate Grasp Detection

    Authors: Chenxi Wang, Hao-Shu Fang, Minghao Gou, Hongjie Fang, Jin Gao, Cewu Lu

    Abstract: Efficient and robust grasp pose detection is vital for robotic manipulation. For general 6 DoF grasping, conventional methods treat all points in a scene equally and usually adopt uniform sampling to select grasp candidates. However, we discover that ignoring where to grasp greatly harms the speed and accuracy of current grasp pose detection methods. In this paper, we propose "graspness", a qualit…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: ICCV 2021

  18. arXiv:2406.08414  [pdf, other]

    cs.LG

    Discovering Preference Optimization Algorithms with and for Large Language Models

    Authors: Chris Lu, Samuel Holt, Claudio Fanconi, Alex J. Chan, Jakob Foerster, Mihaela van der Schaar, Robert Tjarko Lange

    Abstract: Offline preference optimization is a key method for enhancing and controlling the quality of Large Language Model (LLM) outputs. Typically, preference optimization is approached as an offline supervised learning task using manually-crafted convex loss functions. While these methods are based on theoretical insights, they are inherently constrained by human creativity, so the large search space of…

    Submitted 12 June, 2024; originally announced June 2024.

  19. arXiv:2406.07294  [pdf, other]

    cs.RO cs.CV

    OTO Planner: An Efficient Only Travelling Once Exploration Planner for Complex and Unknown Environments

    Authors: Bo Zhou, Chuanzhao Lu, Yan Pan, Fu Chen

    Abstract: Autonomous exploration in complex and cluttered environments is essential for various applications. However, there are many challenges due to the lack of global heuristic information. Existing exploration methods suffer from repeated paths and considerable computational resource requirements in large-scale environments. To address the above issues, this letter proposes an efficient exploration…

    Submitted 11 June, 2024; originally announced June 2024.

  20. arXiv:2406.06040  [pdf, other]

    cs.CV

    Vript: A Video Is Worth Thousands of Words

    Authors: Dongjie Yang, Suyuan Huang, Chengqiang Lu, Xiaodong Han, Haoxin Zhang, Yan Gao, Yao Hu, Hai Zhao

    Abstract: Advancements in multimodal learning, particularly in video understanding and generation, require high-quality video-text datasets for improved model performance. Vript addresses this issue with a meticulously annotated corpus of 12K high-resolution videos, offering detailed, dense, and script-like captions for over 420K clips. Each clip has a caption of ~145 words, which is over 10x longer than mo…

    Submitted 10 June, 2024; originally announced June 2024.

    Comments: submitted to NeurIPS Dataset & Benchmark track

  21. arXiv:2406.04257  [pdf, ps, other]

    cs.LG cs.IR

    Data Measurements for Decentralized Data Markets

    Authors: Charles Lu, Mohammad Mohammadi Amiri, Ramesh Raskar

    Abstract: Decentralized data markets can provide more equitable forms of data acquisition for machine learning. However, to realize practical marketplaces, efficient techniques for seller selection need to be developed. We propose and benchmark federated data measurements to allow a data buyer to find sellers with relevant and diverse datasets. Diversity and relevance measures enable a buyer to make relativ…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 20 pages, 11 figures

  22. arXiv:2406.03793  [pdf, other]

    cs.LG cs.CV

    Low-Rank Similarity Mining for Multimodal Dataset Distillation

    Authors: Yue Xu, Zhilin Lin, Yusong Qiu, Cewu Lu, Yong-Lu Li

    Abstract: Though dataset distillation has witnessed rapid development in recent years, the distillation of multimodal data, e.g., image-text pairs, poses unique and under-explored challenges. Unlike unimodal data, image-text contrastive learning (ITC) data lack inherent categorization and should instead place greater emphasis on modality correspondence. In this work, we propose Low-Rank Similarity Mining (L…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Accepted at ICML 2024

  23. arXiv:2406.00392  [pdf, other]

    cs.AI

    Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning

    Authors: Jonathan Cook, Chris Lu, Edward Hughes, Joel Z. Leibo, Jakob Foerster

    Abstract: Cultural accumulation drives the open-ended and diverse progress in capabilities spanning human history. It builds an expanding body of knowledge and skills by combining individual exploration with inter-generational information transmission. Despite its widespread success among humans, the capacity for artificial learning agents to accumulate culture remains under-explored. In particular, approac…

    Submitted 1 June, 2024; originally announced June 2024.

  24. arXiv:2405.17705  [pdf, other]

    cs.CV

    DC-Gaussian: Improving 3D Gaussian Splatting for Reflective Dash Cam Videos

    Authors: Linhan Wang, Kai Cheng, Shuo Lei, Shengkun Wang, Wei Yin, Chenyang Lei, Xiaoxiao Long, Chang-Tien Lu

    Abstract: We present DC-Gaussian, a new method for generating novel views from in-vehicle dash cam videos. While neural rendering techniques have made significant strides in driving scenarios, existing methods are primarily designed for videos collected by autonomous vehicles. However, these videos are limited in both quantity and diversity compared to dash cam videos, which are more widely used across vari…

    Submitted 29 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: 9 pages, 7 figures; project page: https://linhanwang.github.io/dcgaussian/

  25. arXiv:2405.17358  [pdf, other]

    cs.LG cs.AI

    Rethinking Transformers in Solving POMDPs

    Authors: Chenhao Lu, Ruizhe Shi, Yuyao Liu, Kaizhe Hu, Simon S. Du, Huazhe Xu

    Abstract: Sequential decision-making algorithms such as reinforcement learning (RL) in real-world scenarios inevitably face environments with partial observability. This paper scrutinizes the effectiveness of a popular architecture, namely Transformers, in Partially Observable Markov Decision Processes (POMDPs) and reveals its theoretical limitations. We establish that regular languages, which Transformers…

    Submitted 30 May, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024; references added; typos fixed

  26. arXiv:2405.17188  [pdf, other]

    cs.CV

    The SkatingVerse Workshop & Challenge: Methods and Results

    Authors: Jian Zhao, Lei Jin, Jianshu Li, Zheng Zhu, Yinglei Teng, Jiaojiao Zhao, Sadaf Gulshad, Zheng Wang, Bo Zhao, Xiangbo Shu, Yunchao Wei, Xuecheng Nie, Xiaojie Jin, Xiaodan Liang, Shin'ichi Satoh, Yandong Guo, Cewu Lu, Junliang Xing, Jane Shen Shengmei

    Abstract: The SkatingVerse Workshop & Challenge aims to encourage research in developing novel and accurate methods for human action understanding. The SkatingVerse dataset used for the SkatingVerse Challenge has been publicly released. There are two subsets in the dataset, i.e., the training subset and testing subset. The training subset consists of 19,993 RGB video sequences, and the testing subset cons…

    Submitted 27 May, 2024; originally announced May 2024.

  27. arXiv:2405.16409  [pdf, other]

    cs.AI cs.LG

    Network Interdiction Goes Neural

    Authors: Lei Zhang, Zhiqian Chen, Chang-Tien Lu, Liang Zhao

    Abstract: Network interdiction problems are combinatorial optimization problems involving two players: one aims to solve an optimization problem on a network, while the other seeks to modify the network to thwart the first player's objectives. Such problems typically emerge in an attacker-defender context, encompassing areas such as military operations, disease spread analysis, and communication network man…

    Submitted 25 May, 2024; originally announced May 2024.

  28. arXiv:2405.16220  [pdf]

    cs.CV

    DAFFNet: A Dual Attention Feature Fusion Network for Classification of White Blood Cells

    Authors: Yuzhuo Chen, Zetong Chen, Yunuo An, Chenyang Lu, Xu Qiao

    Abstract: The precise categorization of white blood cells (WBCs) is crucial for diagnosing blood-related disorders. However, manual analysis in clinical settings is time-consuming, labor-intensive, and prone to errors. Numerous studies have employed machine learning and deep learning techniques to achieve objective WBC classification, yet these studies have not fully utilized the information of WBC images. Th…

    Submitted 25 May, 2024; originally announced May 2024.

  29. arXiv:2405.15143  [pdf, other]

    cs.LG cs.AI cs.CL

    Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models

    Authors: Cong Lu, Shengran Hu, Jeff Clune

    Abstract: Go-Explore is a powerful family of algorithms designed to solve hard-exploration problems, built on the principle of archiving discovered states, and iteratively returning to and exploring from the most promising states. This approach has led to superhuman performance across a wide variety of challenging problems including Atari games and robotic control, but requires manually designing heuristics…

    Submitted 30 May, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

  30. arXiv:2405.14014  [pdf, other]

    cs.CV cs.AI cs.LG cs.RO

    RadarOcc: Robust 3D Occupancy Prediction with 4D Imaging Radar

    Authors: Fangqiang Ding, Xiangyu Wen, Lawrence Zhu, Yiming Li, Chris Xiaoxuan Lu

    Abstract: The 3D occupancy-based perception pipeline has significantly advanced autonomous driving by capturing detailed scene descriptions and demonstrating strong generalizability across various object categories and shapes. Current methods predominantly rely on LiDAR or camera inputs for 3D occupancy prediction. These methods are susceptible to adverse weather conditions, limiting the all-weather deployment…

    Submitted 13 June, 2024; v1 submitted 22 May, 2024; originally announced May 2024.

    Comments: 16 pages, 3 figures

  31. arXiv:2405.09591  [pdf, other]

    cs.LG cs.AI

    A Comprehensive Survey on Data Augmentation

    Authors: Zaitian Wang, Pengfei Wang, Kunpeng Liu, Pengyang Wang, Yanjie Fu, Chang-Tien Lu, Charu C. Aggarwal, Jian Pei, Yuanchun Zhou

    Abstract: Data augmentation is a series of techniques that generate high-quality artificial data by manipulating existing data samples. By leveraging data augmentation techniques, AI models can achieve significantly improved applicability in tasks involving scarce or imbalanced datasets, thereby substantially enhancing AI models' generalization capabilities. Existing literature surveys only focus on a certa…

    Submitted 17 May, 2024; v1 submitted 15 May, 2024; originally announced May 2024.

  32. arXiv:2405.07391  [pdf, other]

    cs.RO cs.AI cs.LG

    AnyRotate: Gravity-Invariant In-Hand Object Rotation with Sim-to-Real Touch

    Authors: Max Yang, Chenghua Lu, Alex Church, Yijiong Lin, Chris Ford, Haoran Li, Efi Psomopoulou, David A. W. Barton, Nathan F. Lepora

    Abstract: Human hands are capable of in-hand manipulation in the presence of different hand motions. For a robot hand, harnessing rich tactile information to achieve this level of dexterity still remains a significant challenge. In this paper, we present AnyRotate, a system for gravity-invariant multi-axis in-hand object rotation using dense featured sim-to-real touch. We tackle this problem by training a d…

    Submitted 11 June, 2024; v1 submitted 12 May, 2024; originally announced May 2024.

    Comments: Project website can be found at https://maxyang27896.github.io/anyrotate/

  33. arXiv:2405.07309  [pdf, other]

    cs.RO cs.AI cs.CV cs.LG

    DiffGen: Robot Demonstration Generation via Differentiable Physics Simulation, Differentiable Rendering, and Vision-Language Model

    Authors: Yang Jin, Jun Lv, Shuqiang Jiang, Cewu Lu

    Abstract: Generating robot demonstrations through simulation is widely recognized as an effective way to scale up robot data. Previous work often trained reinforcement learning agents to generate expert policies, but this approach lacks sample efficiency. Recently, a line of work has attempted to generate robot demonstrations via differentiable simulation, which is promising but heavily relies on reward des…

    Submitted 12 May, 2024; originally announced May 2024.

  34. arXiv:2405.05852  [pdf, other]

    cs.CV cs.AI cs.CL cs.LG cs.RO stat.ML

    Pre-trained Text-to-Image Diffusion Models Are Versatile Representation Learners for Control

    Authors: Gunshi Gupta, Karmesh Yadav, Yarin Gal, Dhruv Batra, Zsolt Kira, Cong Lu, Tim G. J. Rudner

    Abstract: Embodied AI agents require a fine-grained understanding of the physical world mediated through visual and language inputs. Such capabilities are difficult to learn solely from task-specific data. This has led to the emergence of pre-trained vision-language models as a tool for transferring representations learned from internet-scale data to downstream tasks and new domains. However, commonly used…

    Submitted 9 May, 2024; originally announced May 2024.

  35. arXiv:2405.03150  [pdf, other]

    cs.CV cs.LG

    Video Diffusion Models: A Survey

    Authors: Andrew Melnik, Michal Ljubljanac, Cong Lu, Qi Yan, Weiming Ren, Helge Ritter

    Abstract: Diffusion generative models have recently become a robust technique for producing and modifying coherent, high-quality video. This survey offers a systematic overview of critical elements of diffusion models for video generation, covering applications, architectural choices, and the modeling of temporal dynamics. Recent advancements in the field are summarized and grouped into development trends.…

    Submitted 6 May, 2024; originally announced May 2024.

  36. arXiv:2405.02897  [pdf, other]

    cs.RO

    DexiTac: Soft Dexterous Tactile Gripping

    Authors: Chenghua Lu, Kailuan Tang, Max Yang, Tianqi Yue, Nathan F. Lepora

    Abstract: Grasping objects, whether they are flat, round, or narrow and whether they have regular or irregular shapes, introduces difficulties in determining the ideal grasping posture, even for state-of-the-art grippers. In this article, we present a reconfigurable pneumatic gripper with fingers that can be set in various configurations, such as hooking, supporting, closuring, and pinching. Each…

    Submitted 5 May, 2024; originally announced May 2024.

    Comments: 11 pages, 12 figures

  37. arXiv:2405.00622  [pdf, other]

    cs.CL cs.AI cs.LG

    Causal Evaluation of Language Models

    Authors: Sirui Chen, Bo Peng, Meiqi Chen, Ruiqi Wang, Mengying Xu, Xingyu Zeng, Rui Zhao, Shengjie Zhao, Yu Qiao, Chaochao Lu

    Abstract: Causal reasoning is viewed as crucial for achieving human-level machine intelligence. Recent advances in language models have expanded the horizons of artificial intelligence across various domains, sparking inquiries into their potential for causal reasoning. In this work, we introduce Causal evaluation of Language Models (CaLM), which, to the best of our knowledge, is the first comprehensive ben…

    Submitted 1 May, 2024; originally announced May 2024.

    Comments: 315 pages, 230 figures, 21 tables. Project website: https://opencausalab.github.io/CaLM

  38. arXiv:2404.18151  [pdf, other]

    cs.LO

    Decidability of Graph Neural Networks via Logical Characterizations

    Authors: Michael Benedikt, Chia-Hsuan Lu, Boris Motik, Tony Tan

    Abstract: We present results concerning the expressiveness and decidability of a popular graph learning formalism, graph neural networks (GNNs), exploiting connections with logic. We use a family of recently-discovered decidable logics involving "Presburger quantifiers". We show how to use these logics to measure the expressiveness of classes of GNNs, in some cases getting exact correspondences between the…

    Submitted 23 May, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

  39. Beyond Imitation: A Life-long Policy Learning Framework for Path Tracking Control of Autonomous Driving

    Authors: C. Gong, C. Lu, Z. Li, Z. Liu, J. Gong, X. Chen

    Abstract: Model-free learning-based control methods have recently shown significant advantages over traditional control methods in avoiding complex vehicle characteristic estimation and parameter tuning. As a primary policy learning method, imitation learning (IL) is capable of learning control policies directly from expert demonstrations. However, the performance of IL policies is highly dependent on the d…

    Submitted 26 April, 2024; originally announced April 2024.

    Journal ref: IEEE Transactions on Vehicular Technology 2024 Pages 1-14

  40. arXiv:2404.13475  [pdf, other]

    quant-ph cs.AI cs.CR cs.ET cs.LG

    PristiQ: A Co-Design Framework for Preserving Data Security of Quantum Learning in the Cloud

    Authors: Zhepeng Wang, Yi Sheng, Nirajan Koirala, Kanad Basu, Taeho Jung, Cheng-Chang Lu, Weiwen Jiang

    Abstract: Benefiting from cloud computing, today's early-stage quantum computers can be remotely accessed via the cloud services, known as Quantum-as-a-Service (QaaS). However, it poses a high risk of data leakage in quantum machine learning (QML). To run a QML model with QaaS, users need to locally compile their quantum circuits including the subcircuit of data encoding first and then send the compiled cir…

    Submitted 20 April, 2024; originally announced April 2024.

  41. arXiv:2404.12281  [pdf, other]

    cs.RO

    RISE: 3D Perception Makes Real-World Robot Imitation Simple and Effective

    Authors: Chenxi Wang, Hongjie Fang, Hao-Shu Fang, Cewu Lu

    Abstract: Precise robot manipulations require rich spatial information in imitation learning. Image-based policies model object positions from fixed cameras, which are sensitive to camera view changes. Policies utilizing 3D point clouds usually predict keyframes rather than continuous actions, posing difficulty in dynamic and contact-rich scenarios. To utilize 3D perception efficiently, we present RISE, an…

    Submitted 21 April, 2024; v1 submitted 18 April, 2024; originally announced April 2024.

  42. arXiv:2404.10324  [pdf]

    cs.LG cs.CE eess.SY

    Graph neural network-based surrogate modelling for real-time hydraulic prediction of urban drainage networks

    Authors: Zhiyu Zhang, Chenkaixiang Lu, Wenchong Tian, Zhenliang Liao, Zhiguo Yuan

    Abstract: Physics-based models are computationally time-consuming and infeasible for real-time scenarios of urban drainage networks, and a surrogate model is needed to accelerate the online predictive modelling. Fully-connected neural networks (NNs) are potential surrogate models, but may suffer from low interpretability and efficiency in fitting complex targets. Owing to the state-of-the-art modelling powe…

    Submitted 16 April, 2024; originally announced April 2024.

  43. arXiv:2404.10227  [pdf, other]

    cs.CV cs.RO

    MS-MANO: Enabling Hand Pose Tracking with Biomechanical Constraints

    Authors: Pengfei Xie, Wenqiang Xu, Tutian Tang, Zhenjun Yu, Cewu Lu

    Abstract: This work proposes a novel learning framework for visual hand dynamics analysis that takes into account the physiological aspects of hand motion. The existing models, which are simplified joint-actuated systems, often produce unnatural motions. To address this, we integrate a musculoskeletal system with a learnable parametric hand model, MANO, to create a new model, MS-MANO. This model emulates th…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: 11 pages, 5 figures; CVPR 2024

  44. arXiv:2404.06687  [pdf, other]

    cs.RO eess.SY

    Fast and Accurate Relative Motion Tracking for Two Industrial Robots

    Authors: Honglu He, Chen-lung Lu, Glenn Saunders, Pinghai Yang, Jeffrey Schoonover, John Wason, Santiago Paternain, Agung Julius, John T. Wen

    Abstract: Industrial robotic applications such as spraying, welding, and additive manufacturing frequently require fast, accurate, and uniform motion along a 3D spatial curve. To increase process throughput, some manufacturers propose a dual-robot setup to overcome the speed limitation of a single robot. Industrial robot motion is programmed through waypoints connected by motion primitives (Cartesian linear…

    Submitted 9 April, 2024; originally announced April 2024.

  45. arXiv:2404.06356  [pdf, other]

    cs.LG cs.AI cs.RO

    Policy-Guided Diffusion

    Authors: Matthew Thomas Jackson, Michael Tryfan Matthews, Cong Lu, Benjamin Ellis, Shimon Whiteson, Jakob Foerster

    Abstract: In many real-world settings, agents must learn from an offline dataset gathered by some prior behavior policy. Such a setting naturally leads to distribution shift between the behavior policy and the target policy being trained - requiring policy conservatism to avoid instability and overestimation bias. Autoregressive world models offer a different solution to this by generating synthetic, on-pol…

    Submitted 9 April, 2024; originally announced April 2024.

    Comments: Previously at the NeurIPS 2023 Workshop on Robot Learning

  46. arXiv:2404.03590  [pdf, other]

    cs.CV cs.AI

    SemGrasp: Semantic Grasp Generation via Language Aligned Discretization

    Authors: Kailin Li, Jingbo Wang, Lixin Yang, Cewu Lu, Bo Dai

    Abstract: Generating natural human grasps necessitates consideration of not just object geometry but also semantic information. Solely depending on object shape for grasp generation confines the applications of prior methods in downstream tasks. This paper presents a novel semantic-based grasp generation method, termed SemGrasp, which generates a static human grasp pose by incorporating semantic information…

    Submitted 4 April, 2024; originally announced April 2024.

  47. arXiv:2404.01078  [pdf, other]

    cs.LG

    Energy-based Model for Accurate Shapley Value Estimation in Interpretable Deep Learning Predictive Modeling

    Authors: Cheng Lu, Jiusun Zeng, Yu Xia, Jinhui Cai, Shihua Luo

    Abstract: As a favorable tool for explainable artificial intelligence (XAI), Shapley value has been widely used to interpret deep learning based predictive models. However, accurate and efficient estimation of Shapley value is difficult since the computation load grows exponentially with the increase of input features. Most existing accelerated estimation methods have to compromise on estimation accuracy wi…

    Submitted 5 May, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

  48. arXiv:2403.19622  [pdf, other]

    cs.RO cs.CV

    RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents

    Authors: Zeren Chen, Zhelun Shi, Xiaoya Lu, Lehan He, Sucheng Qian, Hao Shu Fang, Zhenfei Yin, Wanli Ouyang, Jing Shao, Yu Qiao, Cewu Lu, Lu Sheng

    Abstract: The ultimate goal of robotic learning is to acquire a comprehensive and generalizable robotic system capable of performing both seen skills within the training distribution and unseen skills in novel environments. Recent progress in utilizing language models as high-level planners has demonstrated that the complexity of tasks can be reduced through decomposing them into primitive-level plans, mak…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: 24 pages, 12 figures, 6 tables

  49. arXiv:2403.19417  [pdf, other]

    cs.CV

    OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion

    Authors: Xinyu Zhan, Lixin Yang, Yifei Zhao, Kangrui Mao, Hanlin Xu, Zenan Lin, Kailin Li, Cewu Lu

    Abstract: We present OAKINK2, a dataset of bimanual object manipulation tasks for complex daily activities. To organize the complex tasks into a structured representation, OAKINK2 introduces three levels of abstraction for the manipulation tasks: Affordance, Primitive Task, and Complex Task. OAKINK2 features an object-centric perspective for decoding the complex tasks, treating them…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: To appear in CVPR 2024. 26 pages

  50. arXiv:2403.18346  [pdf, other]

    cs.CL cs.CV

    Quantifying and Mitigating Unimodal Biases in Multimodal Large Language Models: A Causal Perspective

    Authors: Meiqi Chen, Yixin Cao, Yan Zhang, Chaochao Lu

    Abstract: Recent advancements in Large Language Models (LLMs) have facilitated the development of Multimodal LLMs (MLLMs). Despite their impressive capabilities, MLLMs often suffer from an over-reliance on unimodal biases (e.g., language bias and vision bias), leading to incorrect answers in complex multimodal tasks. To investigate this issue, we propose a causal framework to interpret the biases in Visual…

    Submitted 3 April, 2024; v1 submitted 27 March, 2024; originally announced March 2024.