Skip to main content

Showing 1–50 of 885 results for author: Dong, H

  1. arXiv:2407.09646  [pdf, other

    cs.CV

    Hamba: Single-view 3D Hand Reconstruction with Graph-guided Bi-Scanning Mamba

    Authors: Haoye Dong, Aviral Chharia, Wenbo Gou, Francisco Vicente Carrasco, Fernando De la Torre

    Abstract: 3D Hand reconstruction from a single RGB image is challenging due to the articulated motion, self-occlusion, and interaction with objects. Existing SOTA methods employ attention-based transformers to learn the 3D hand pose and shape, but they fail to achieve robust and accurate performance due to insufficient modeling of joint spatial relations. To address this problem, we propose a novel graph-gu… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: 25 pages

  2. arXiv:2407.09025  [pdf, other

    cs.AI

    SpreadsheetLLM: Encoding Spreadsheets for Large Language Models

    Authors: Yuzhang Tian, Jianbo Zhao, Haoyu Dong, Junyu Xiong, Shiyu Xia, Mengyu Zhou, Yun Lin, José Cambronero, Yeye He, Shi Han, Dongmei Zhang

    Abstract: Spreadsheets, with their extensive two-dimensional grids, various layouts, and diverse formatting options, present notable challenges for large language models (LLMs). In response, we introduce SpreadsheetLLM, pioneering an efficient encoding method designed to unleash and optimize LLMs' powerful understanding and reasoning capability on spreadsheets. Initially, we propose a vanilla serialization… ▽ More

    Submitted 12 July, 2024; originally announced July 2024.

  3. arXiv:2407.06168  [pdf, other

    cs.RO cs.CV

    TARGO: Benchmarking Target-driven Object Grasping under Occlusions

    Authors: Yan Xia, Ran Ding, Ziyuan Qin, Guanqi Zhan, Kaichen Zhou, Long Yang, Hao Dong, Daniel Cremers

    Abstract: Recent advances in predicting 6D grasp poses from a single depth image have led to promising performance in robotic grasping. However, previous grasping models face challenges in cluttered environments where nearby objects impact the target object's grasp. In this paper, we first establish a new benchmark dataset for TARget-driven Grasping under Occlusions, named TARGO. We make the following contr… ▽ More

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: 19 pages, 17 figures

  4. arXiv:2407.05411  [pdf, other

    cs.SE

    Assessing Code Generation with Intermediate Languages

    Authors: Xun Deng, Sicheng Zhong, Honghua Dong, Jingyu Hu, Sidi Mohamed Beillahi, Xujie Si, Fan Long

    Abstract: Intermediate step methodologies like chain of thoughts (COT) have demonstrated effectiveness in enhancing the performance of Large Language Models (LLMs) on code generation. This study explores the utilization of intermediate languages, including various programming languages, natural language solutions, and pseudo-code, and systematically evaluates their impact on the performance of LLMs in code… ▽ More

    Submitted 7 July, 2024; originally announced July 2024.

  5. arXiv:2407.02483  [pdf, other

    cs.CL cs.AI

    MMedAgent: Learning to Use Medical Tools with Multi-modal Agent

    Authors: Binxu Li, Tiankai Yan, Yuanting Pan, Zhe Xu, Jie Luo, Ruiyang Ji, Shilong Liu, Haoyu Dong, Zihao Lin, Yixin Wang

    Abstract: Multi-Modal Large Language Models (MLLMs), despite being successful, exhibit limited generality and often fall short when compared to specialized models. Recently, LLM-based agents have been developed to address these challenges by selecting appropriate specialized models as tools based on user inputs. However, such advancements have not been extensively explored within the medical domain. To brid… ▽ More

    Submitted 2 July, 2024; originally announced July 2024.

  6. arXiv:2407.01518  [pdf, other

    cs.CV cs.AI cs.LG

    Towards Multimodal Open-Set Domain Generalization and Adaptation through Self-supervision

    Authors: Hao Dong, Eleni Chatzi, Olga Fink

    Abstract: The task of open-set domain generalization (OSDG) involves recognizing novel classes within unseen domains, which becomes more challenging with multiple modalities as input. Existing works have only addressed unimodal OSDG within the meta-learning framework, without considering multimodal scenarios. In this work, we introduce a novel approach to address Multimodal Open-Set Domain Generalization (M… ▽ More

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024, code: https://github.com/donghao51/MOOSA

  7. arXiv:2406.19711  [pdf, other

    cs.LG

    CHASE: A Causal Heterogeneous Graph based Framework for Root Cause Analysis in Multimodal Microservice Systems

    Authors: Ziming Zhao, Tiehua Zhang, Zhishu Shen, Hai Dong, Xingjun Ma, Xianhui Liu, Yun Yang

    Abstract: In recent years, the widespread adoption of distributed microservice architectures within the industry has significantly increased the demand for enhanced system availability and robustness. Due to the complex service invocation paths and dependencies at enterprise-level microservice systems, it is challenging to locate the anomalies promptly during service invocations, thus causing intractable is… ▽ More

    Submitted 28 June, 2024; originally announced June 2024.

  8. arXiv:2406.17898  [pdf, other

    cs.RO cs.AI

    Human-centered In-building Embodied Delivery Benchmark

    Authors: Zhuoqun Xu, Yang Liu, Xiaoqi Li, Jiyao Zhang, Hao Dong

    Abstract: Recently, the concept of embodied intelligence has been widely accepted and popularized, leading people to naturally consider the potential for commercialization in this field. In this work, we propose a specific commercial scenario simulation, human-centered in-building embodied delivery. Furthermore, for this scenario, we have developed a brand-new virtual environment system from scratch, constr… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

  9. arXiv:2406.17841  [pdf, other

    quant-ph cs.AI

    Probing many-body Bell correlation depth with superconducting qubits

    Authors: Ke Wang, Weikang Li, Shibo Xu, Mengyao Hu, Jiachen Chen, Yaozu Wu, Chuanyu Zhang, Feitong Jin, Xuhao Zhu, Yu Gao, Ziqi Tan, Aosai Zhang, Ning Wang, Yiren Zou, Tingting Li, Fanhao Shen, Jiarun Zhong, Zehang Bao, Zitian Zhu, Zixuan Song, Jinfeng Deng, Hang Dong, Xu Zhang, Pengfei Zhang, Wenjie Jiang , et al. (10 additional authors not shown)

    Abstract: Quantum nonlocality describes a stronger form of quantum correlation than that of entanglement. It refutes Einstein's belief of local realism and is among the most distinctive and enigmatic features of quantum mechanics. It is a crucial resource for achieving quantum advantages in a variety of practical applications, ranging from cryptography and certified random number generation via self-testing… ▽ More

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 11 pages,6 figures + 14 pages, 6 figures

  10. arXiv:2406.17551  [pdf, other

    quant-ph

    Sharing tripartite nonlocality sequentially using only projective measurements

    Authors: Yiyang Xu, Hao Sun, Fenzhuo Guo, Haifeng Dong, Qiaoyan Wen

    Abstract: Bell nonlocality is a valuable resource in quantum information processing tasks. Scientists are interested in whether a single entangled state can generate a long sequence of nonlocal correlations. Previous work has accomplished sequential tripartite nonlocality sharing through unsharp measurements. In this paper, we investigate the sharing of tripartite nonlocality using only projective measureme… ▽ More

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

    Comments: 11 pages, 7 figures

  11. arXiv:2406.16992  [pdf, other

    cs.LG cs.AI

    Make Graph Neural Networks Great Again: A Generic Integration Paradigm of Topology-Free Patterns for Traffic Speed Prediction

    Authors: Yicheng Zhou, Pengfei Wang, Hao Dong, Denghui Zhang, Dingqi Yang, Yanjie Fu, Pengyang Wang

    Abstract: Urban traffic speed prediction aims to estimate the future traffic speed for improving urban transportation services. Enormous efforts have been made to exploit Graph Neural Networks (GNNs) for modeling spatial correlations and temporal dependencies of traffic speed evolving patterns, regularized by graph topology.While achieving promising results, current traffic speed prediction methods still su… ▽ More

    Submitted 24 June, 2024; originally announced June 2024.

    Comments: Accepted to IJCAI 2024

  12. arXiv:2406.13642  [pdf, other

    cs.CV

    SpatialBot: Precise Spatial Understanding with Vision Language Models

    Authors: Wenxiao Cai, Yaroslav Ponomarenko, Jianhao Yuan, Xiaoqi Li, Wankou Yang, Hao Dong, Bo Zhao

    Abstract: Vision Language Models (VLMs) have achieved impressive performance in 2D image understanding, however they are still struggling with spatial understanding which is the foundation of Embodied AI. In this paper, we propose SpatialBot for better spatial understanding by feeding both RGB and depth images. Additionally, we have constructed the SpatialQA dataset, which involves multi-level depth-related… ▽ More

    Submitted 15 July, 2024; v1 submitted 19 June, 2024; originally announced June 2024.

  13. arXiv:2406.13161  [pdf, other

    cs.AI cs.CL cs.LG cs.PL

    APPL: A Prompt Programming Language for Harmonious Integration of Programs and Large Language Model Prompts

    Authors: Honghua Dong, Qidong Su, Yubo Gao, Zhaoyu Li, Yangjun Ruan, Gennady Pekhimenko, Chris J. Maddison, Xujie Si

    Abstract: Large Language Models (LLMs) have become increasingly capable of handling diverse tasks with the aid of well-crafted prompts and integration of external tools, but as task complexity rises, the workflow involving LLMs can be complicated and thus challenging to implement and maintain. To address this challenge, we propose APPL, A Prompt Programming Language that acts as a bridge between computer pr… ▽ More

    Submitted 18 June, 2024; originally announced June 2024.

  14. arXiv:2406.11548  [pdf, other

    cs.RO cs.AI cs.CV

    AIC MLLM: Autonomous Interactive Correction MLLM for Robust Robotic Manipulation

    Authors: Chuyan Xiong, Chengyu Shen, Xiaoqi Li, Kaichen Zhou, Jiaming Liu, Ruiping Wang, Hao Dong

    Abstract: The ability to reflect on and correct failures is crucial for robotic systems to interact stably with real-life objects.Observing the generalization and reasoning capabilities of Multimodal Large Language Models (MLLMs), previous approaches have aimed to utilize these models to enhance robotic systems accordingly.However, these methods typically focus on high-level planning corrections using an ad… ▽ More

    Submitted 23 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  15. arXiv:2406.07579  [pdf, other

    cs.AI cs.GR cs.LG

    GFPack++: Improving 2D Irregular Packing by Learning Gradient Field with Attention

    Authors: Tianyang Xue, Lin Lu, Yang Liu, Mingdong Wu, Hao Dong, Yanbin Zhang, Renmin Han, Baoquan Chen

    Abstract: 2D irregular packing is a classic combinatorial optimization problem with various applications, such as material utilization and texture atlas generation. This NP-hard problem requires efficient algorithms to optimize space utilization. Conventional numerical methods suffer from slow convergence and high computational cost. Existing learning-based methods, such as the score-based diffusion model,… ▽ More

    Submitted 9 June, 2024; originally announced June 2024.

  16. arXiv:2406.07549  [pdf, other

    cs.RO

    A3VLM: Actionable Articulation-Aware Vision Language Model

    Authors: Siyuan Huang, Haonan Chang, Yuhan Liu, Yimeng Zhu, Hao Dong, Peng Gao, Abdeslam Boularias, Hongsheng Li

    Abstract: Vision Language Models (VLMs) have received significant attention in recent years in the robotics community. VLMs are shown to be able to perform complex visual reasoning and scene understanding tasks, which makes them regarded as a potential universal solution for general robotics problems such as manipulation and navigation. However, previous VLMs for robotics such as RT-1, RT-2, and ManipLLM ha… ▽ More

    Submitted 13 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  17. arXiv:2406.04882  [pdf, other

    cs.RO cs.AI cs.CL cs.CV

    InstructNav: Zero-shot System for Generic Instruction Navigation in Unexplored Environment

    Authors: Yuxing Long, Wenzhe Cai, Hongcheng Wang, Guanqi Zhan, Hao Dong

    Abstract: Enabling robots to navigate following diverse language instructions in unexplored environments is an attractive goal for human-robot interaction. However, this goal is challenging because different navigation tasks require different strategies. The scarcity of instruction navigation data hinders training an instruction navigation model with varied strategies. Therefore, previous methods are all co… ▽ More

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: Submitted to CoRL 2024

  18. arXiv:2406.04316  [pdf, other

    cs.CV

    Omni6DPose: A Benchmark and Model for Universal 6D Object Pose Estimation and Tracking

    Authors: Jiyao Zhang, Weiyao Huang, Bo Peng, Mingdong Wu, Fei Hu, Zijian Chen, Bo Zhao, Hao Dong

    Abstract: 6D Object Pose Estimation is a crucial yet challenging task in computer vision, suffering from a significant lack of large-scale datasets. This scarcity impedes comprehensive evaluation of model performance, limiting research advancements. Furthermore, the restricted number of available instances or categories curtails its applications. To address these issues, this paper introduces Omni6DPose, a… ▽ More

    Submitted 6 June, 2024; originally announced June 2024.

  19. arXiv:2406.02283  [pdf, other

    cs.RO

    Broadcasting Support Relations Recursively from Local Dynamics for Object Retrieval in Clutters

    Authors: Yitong Li, Ruihai Wu, Haoran Lu, Chuanruo Ning, Yan Shen, Guanqi Zhan, Hao Dong

    Abstract: In our daily life, cluttered objects are everywhere, from scattered stationery and books cluttering the table to bowls and plates filling the kitchen sink. Retrieving a target object from clutters is an essential while challenging skill for robots, for the difficulty of safely manipulating an object without disturbing others, which requires the robot to plan a manipulation sequence and first move… ▽ More

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: RSS 2024

  20. arXiv:2406.01047  [pdf, other

    cs.DC cs.AI cs.LG

    An Advanced Reinforcement Learning Framework for Online Scheduling of Deferrable Workloads in Cloud Computing

    Authors: Hang Dong, Liwen Zhu, Zhao Shan, Bo Qiao, Fangkai Yang, Si Qin, Chuan Luo, Qingwei Lin, Yuwen Yang, Gurpreet Virdi, Saravan Rajmohan, Dongmei Zhang, Thomas Moscibroda

    Abstract: Efficient resource utilization and perfect user experience usually conflict with each other in cloud computing platforms. Great efforts have been invested in increasing resource utilization but trying not to affect users' experience for cloud computing platforms. In order to better utilize the remaining pieces of computing resources spread over the whole platform, deferrable jobs are provided with… ▽ More

    Submitted 3 June, 2024; originally announced June 2024.

  21. arXiv:2406.00439  [pdf, other

    cs.RO cs.CV

    Learning Manipulation by Predicting Interaction

    Authors: Jia Zeng, Qingwen Bu, Bangjun Wang, Wenke Xia, Li Chen, Hao Dong, Haoming Song, Dong Wang, Di Hu, Ping Luo, Heming Cui, Bin Zhao, Xuelong Li, Yu Qiao, Hongyang Li

    Abstract: Representation learning approaches for robotic manipulation have boomed in recent years. Due to the scarcity of in-domain robot data, prevailing methodologies tend to leverage large-scale human video datasets to extract generalizable features for visuomotor policy learning. Despite the progress achieved, prior endeavors disregard the interactive dynamics that capture behavior patterns and physical… ▽ More

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: Accepted to RSS 2024. Project page: https://github.com/OpenDriveLab/MPI

  22. arXiv:2405.19092  [pdf, other

    cs.CV

    Benchmarking and Improving Detail Image Caption

    Authors: Hongyuan Dong, Jiawen Li, Bohong Wu, Jiacong Wang, Yuan Zhang, Haoyuan Guo

    Abstract: Image captioning has long been regarded as a fundamental task in visual understanding. Recently, however, few large vision-language model (LVLM) research discusses model's image captioning performance because of the outdated short-caption benchmarks and unreliable evaluation metrics. In this work, we propose to benchmark detail image caption task by curating high-quality evaluation datasets annota… ▽ More

    Submitted 7 July, 2024; v1 submitted 29 May, 2024; originally announced May 2024.

  23. arXiv:2405.17419  [pdf, other

    cs.CV cs.AI cs.LG

    MultiOOD: Scaling Out-of-Distribution Detection for Multiple Modalities

    Authors: Hao Dong, Yue Zhao, Eleni Chatzi, Olga Fink

    Abstract: Detecting out-of-distribution (OOD) samples is important for deploying machine learning models in safety-critical applications such as autonomous driving and robot-assisted surgery. Existing research has mainly focused on unimodal scenarios on image data. However, real-world applications are inherently multimodal, which makes it essential to leverage information from multiple modalities to enhance… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Code and MultiOOD benchmark: https://github.com/donghao51/MultiOOD

  24. arXiv:2405.17158  [pdf, other

    cs.CV

    PatchScaler: An Efficient Patch-Independent Diffusion Model for Super-Resolution

    Authors: Yong Liu, Hang Dong, Jinshan Pan, Qingji Dong, Kai Chen, Rongxiang Zhang, Lean Fu, Fei Wang

    Abstract: Diffusion models significantly improve the quality of super-resolved images with their impressive content generation capabilities. However, the huge computational costs limit the applications of these methods.Recent efforts have explored reasonable inference acceleration to reduce the number of sampling steps, but the computational cost remains high as each step is performed on the entire image.Th… ▽ More

    Submitted 11 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

  25. arXiv:2405.17147  [pdf, other

    cs.MM

    Large Language Models (LLMs): Deployment, Tokenomics and Sustainability

    Authors: Haiwei Dong, Shuang Xie

    Abstract: The rapid advancement of Large Language Models (LLMs) has significantly impacted human-computer interaction, epitomized by the release of GPT-4o, which introduced comprehensive multi-modality capabilities. In this paper, we first explored the deployment strategies, economic considerations, and sustainability challenges associated with the state-of-the-art LLMs. More specifically, we discussed the… ▽ More

    Submitted 27 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE CTSoc-NCT

  26. arXiv:2405.16734  [pdf, other

    stat.ML cs.LG

    Faster Sampling via Stochastic Gradient Proximal Sampler

    Authors: Xunpeng Huang, Difan Zou, Yi-An Ma, Hanze Dong, Tong Zhang

    Abstract: Stochastic gradients have been widely integrated into Langevin-based methods to improve their scalability and efficiency in solving large-scale sampling problems. However, the proximal sampler, which exhibits much faster convergence than Langevin-based algorithms in the deterministic setting Lee et al. (2021), has yet to be explored in its stochastic variants. In this paper, we study the Stochasti… ▽ More

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: 48 pages, 2 figures, 5 tables

  27. arXiv:2405.16387  [pdf, other

    stat.ML cs.LG

    Reverse Transition Kernel: A Flexible Framework to Accelerate Diffusion Inference

    Authors: Xunpeng Huang, Difan Zou, Hanze Dong, Yi Zhang, Yi-An Ma, Tong Zhang

    Abstract: To generate data from trained diffusion models, most inference algorithms, such as DDPM, DDIM, and other variants, rely on discretizing the reverse SDEs or their equivalent ODEs. In this paper, we view such approaches as decomposing the entire denoising diffusion process into several segments, each corresponding to a reverse transition kernel (RTK) sampling subproblem. Specifically, DDPM uses a Ga… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

    Comments: 68 pages, 2 figures

  28. arXiv:2405.16234  [pdf, other

    cs.CV

    Vision Language Models for Spreadsheet Understanding: Challenges and Opportunities

    Authors: Shiyu Xia, Junyu Xiong, Haoyu Dong, Jianbo Zhao, Yuzhang Tian, Mengyu Zhou, Yeye He, Shi Han, Dongmei Zhang

    Abstract: This paper explores capabilities of Vision Language Models on spreadsheet comprehension. We propose three self-supervised challenges with corresponding evaluation metrics to comprehensively evaluate VLMs on Optical Character Recognition (OCR), spatial perception, and visual format recognition. Additionally, we utilize the spreadsheet table detection task to assess the overall performance of VLMs b… ▽ More

    Submitted 25 May, 2024; originally announced May 2024.

  29. arXiv:2405.14156  [pdf, other

    cs.CV

    Unveiling the Tapestry of Consistency in Large Vision-Language Models

    Authors: Yuan Zhang, Fei Xiao, Tao Huang, Chun-Kai Fan, Hongyuan Dong, Jiawen Li, Jiacong Wang, Kuan Cheng, Shanghang Zhang, Haoyuan Guo

    Abstract: Large vision-language models (LVLMs) have recently achieved rapid progress, exhibiting great perception and reasoning abilities concerning visual information. However, when faced with prompts in different sizes of solution spaces, LVLMs fail to always give consistent answers regarding the same knowledge point. This inconsistency of answers between different solution spaces is prevalent in LVLMs an… ▽ More

    Submitted 7 June, 2024; v1 submitted 23 May, 2024; originally announced May 2024.

    Comments: This project is available at https://github.com/foundation-multimodal-models/ConBench

  30. arXiv:2405.13063  [pdf, other

    physics.ao-ph cs.LG

    Aurora: A Foundation Model of the Atmosphere

    Authors: Cristian Bodnar, Wessel P. Bruinsma, Ana Lucic, Megan Stanley, Johannes Brandstetter, Patrick Garvan, Maik Riechert, Jonathan Weyn, Haiyu Dong, Anna Vaughan, Jayesh K. Gupta, Kit Tambiratnam, Alex Archibald, Elizabeth Heider, Max Welling, Richard E. Turner, Paris Perdikaris

    Abstract: Deep learning foundation models are revolutionizing many facets of science by leveraging vast amounts of data to learn general-purpose representations that can be adapted to tackle diverse downstream tasks. Foundation models hold the promise to also transform our ability to model our planet and its subsystems by exploiting the vast expanse of Earth system data. Here we introduce Aurora, a large-sc… ▽ More

    Submitted 28 May, 2024; v1 submitted 20 May, 2024; originally announced May 2024.

  31. arXiv:2405.10440  [pdf, other

    cs.CL

    Retrieving and Refining: A Hybrid Framework with Large Language Models for Rare Disease Identification

    Authors: Jinge Wu, Hang Dong, Zexi Li, Arijit Patra, Honghan Wu

    Abstract: The infrequency and heterogeneity of clinical presentations in rare diseases often lead to underdiagnosis and their exclusion from structured datasets. This necessitates the utilization of unstructured text data for comprehensive analysis. However, the manual identification from clinical reports is an arduous and intrinsically subjective task. This study proposes a novel hybrid approach that syner… ▽ More

    Submitted 16 May, 2024; originally announced May 2024.

  32. arXiv:2405.08099  [pdf, other

    cs.CL

    KET-QA: A Dataset for Knowledge Enhanced Table Question Answering

    Authors: Mengkang Hu, Haoyu Dong, Ping Luo, Shi Han, Dongmei Zhang

    Abstract: Due to the concise and structured nature of tables, the knowledge contained therein may be incomplete or missing, posing a significant challenge for table question answering (TableQA) and data analysis systems. Most existing datasets either fail to address the issue of external knowledge in TableQA or only utilize unstructured text as supplementary information for tables. In this paper, we propose… ▽ More

    Submitted 13 May, 2024; originally announced May 2024.

    Comments: LREC-Coling 2024

  33. arXiv:2405.07863  [pdf, other

    cs.LG cs.AI cs.CL stat.ML

    RLHF Workflow: From Reward Modeling to Online RLHF

    Authors: Hanze Dong, Wei Xiong, Bo Pang, Haoxiang Wang, Han Zhao, Yingbo Zhou, Nan Jiang, Doyen Sahoo, Caiming Xiong, Tong Zhang

    Abstract: We present the workflow of Online Iterative Reinforcement Learning from Human Feedback (RLHF) in this technical report, which is widely reported to outperform its offline counterpart by a large margin in the recent large language model (LLM) literature. However, existing open-source RLHF projects are still largely confined to the offline learning setting. In this technical report, we aim to fill i… ▽ More

    Submitted 12 June, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

  34. arXiv:2405.07759  [pdf, other

    cs.MM cs.AI cs.NI eess.IV

    MADRL-Based Rate Adaptation for 360° Video Streaming with Multi-Viewpoint Prediction

    Authors: Haopeng Wang, Zijian Long, Haiwei Dong, Abdulmotaleb El Saddik

    Abstract: Over the last few years, 360° video traffic on the network has grown significantly. A key challenge of 360° video playback is ensuring a high quality of experience (QoE) with limited network bandwidth. Currently, most studies focus on tile-based adaptive bitrate (ABR) streaming based on single viewport prediction to reduce bandwidth consumption. However, the performance of models for single-viewpo… ▽ More

    Submitted 17 May, 2024; v1 submitted 13 May, 2024; originally announced May 2024.

    Comments: Accepted by IEEE Internet of Things Journal

  35. arXiv:2405.06903  [pdf, other

    cs.CV

    UniGarmentManip: A Unified Framework for Category-Level Garment Manipulation via Dense Visual Correspondence

    Authors: Ruihai Wu, Haoran Lu, Yiyan Wang, Yubo Wang, Hao Dong

    Abstract: Garment manipulation (e.g., unfolding, folding and hanging clothes) is essential for future robots to accomplish home-assistant tasks, while highly challenging due to the diversity of garment configurations, geometries and deformations. Although able to manipulate similar shaped garments in a certain task, previous works mostly have to design different policies for different tasks, could not gener… ▽ More

    Submitted 11 May, 2024; originally announced May 2024.

    Comments: CVPR 2024

  36. arXiv:2405.05769  [pdf, other

    cs.CV

    Exploring Text-Guided Single Image Editing for Remote Sensing Images

    Authors: Fangzhou Han, Lingyu Si, Hongwei Dong, Lamei Zhang, Hao Chen, Bo Du

    Abstract: Artificial Intelligence Generative Content (AIGC) technologies have significantly influenced the remote sensing domain, particularly in the realm of image generation. However, remote sensing image editing, an equally vital research area, has not garnered sufficient attention. Different from text-guided editing in natural images, which relies on extensive text-image paired data for semantic correla… ▽ More

    Submitted 9 May, 2024; originally announced May 2024.

  37. arXiv:2405.02369  [pdf, other

    cs.NE cs.AI cs.LG

    No One-Size-Fits-All Neurons: Task-based Neurons for Artificial Neural Networks

    Authors: Feng-Lei Fan, Meng Wang, Hang-Cheng Dong, Jianwei Ma, Tieyong Zeng

    Abstract: Biologically, the brain does not rely on a single type of neuron that universally functions in all aspects. Instead, it acts as a sophisticated designer of task-based neurons. In this study, we address the following question: since the human brain is a task-based neuron user, can the artificial network design go from the task-based architecture design to the task-based neuron design? Since methodo… ▽ More

    Submitted 3 May, 2024; originally announced May 2024.

    Comments: 12 pages, 4 figures

  38. arXiv:2404.19202  [pdf, other

    hep-ph hep-ex nucl-ex

    Dihadron helicity correlation in photon-nucleus collisions

    Authors: Zhao-Xuan Chen, Hui Dong, Shu-Yi Wei

    Abstract: The helicity correlation of two back-to-back hadrons is a powerful tool that makes it possible to probe the longitudinal spin transfer, $G_{1L}$, in unpolarized hadronic collisions. In this work, we investigate the helicity correlation of back-to-back dihadrons produced in photon-nucleus collisions with both space-like and quasireal photons and explore its potential in understanding the flavor dep… ▽ More

    Submitted 29 April, 2024; originally announced April 2024.

    Comments: 10 pages

  39. arXiv:2404.18356  [pdf, other

    cs.DC

    FEDQ-Trust: Efficient Data-Driven Trust Prediction for Mobile Edge-Based IoT Systems

    Authors: Jiahui Bai, Hai Dong, Athman Bouguettaya

    Abstract: We introduce FEDQ-Trust, an innovative data-driven trust prediction approach designed for mobile edge-based Internet of Things (IoT) environments. The decentralized nature of mobile edge environments introduces challenges due to variations in data distribution, impacting the accuracy and training efficiency of existing distributed data-driven trust prediction models. FEDQ-Trust effectively tackles… ▽ More

    Submitted 28 April, 2024; originally announced April 2024.

    Comments: 14 pages, 6 figures, submitted to IEEE Transactions on Services Computing

  40. arXiv:2404.18195  [pdf, other

    cond-mat.stat-mech physics.bio-ph

    Power-Efficiency Constraint for Biological Rotary Motor Driven by Chemical Gradient

    Authors: Ruo-Xun Zhai, Hui Dong

    Abstract: Biological motors, as counterpart of heat engines, are driven by a chemical gradient to output mechanical work and play critical roles in biological functions, such as the ATP synthesis. Its thermodynamic efficiency, along with power, are two vital criteria for evaluating the performance of these motors. In this letter, we investigate a model of a microscopic chemical engine and assess the constra… ▽ More

    Submitted 5 May, 2024; v1 submitted 28 April, 2024; originally announced April 2024.

    Comments: 9pages, 7 figures

  41. arXiv:2404.17780  [pdf, other

    cs.MA cs.AI

    Verco: Learning Coordinated Verbal Communication for Multi-agent Reinforcement Learning

    Authors: Dapeng Li, Hang Dong, Lu Wang, Bo Qiao, Si Qin, Qingwei Lin, Dongmei Zhang, Qi Zhang, Zhiwei Xu, Bin Zhang, Guoliang Fan

    Abstract: In recent years, multi-agent reinforcement learning algorithms have made significant advancements in diverse gaming environments, leading to increased interest in the broader application of such techniques. To address the prevalent challenge of partial observability, communication-based algorithms have improved cooperative performance through the sharing of numerical embedding between agents. Howe… ▽ More

    Submitted 27 April, 2024; originally announced April 2024.

    Comments: 12 pages, 6 figures

  42. Tile-Weighted Rate-Distortion Optimized Packet Scheduling for 360$^\circ$ VR Video Streaming

    Authors: Haopeng Wang, Haiwei Dong, Abdulmotaleb El Saddik

    Abstract: A key challenge of 360$^\circ$ VR video streaming is ensuring high quality with limited network bandwidth. Currently, most studies focus on tile-based adaptive bitrate streaming to reduce bandwidth consumption, where resources in network nodes are not fully utilized. This article proposes a tile-weighted rate-distortion (TWRD) packet scheduling optimization system to reduce data volume and improve… ▽ More

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted by IEEE Intelligent Systems

  43. arXiv:2404.11336  [pdf, other

    eess.SY cs.CV cs.RO

    Vision-based control for landing an aerial vehicle on a marine vessel

    Authors: Haohua Dong

    Abstract: This work addresses the landing problem of an aerial vehicle, exemplified by a simple quadrotor, on a moving platform using image-based visual servo control. First, the mathematical model of the quadrotor aircraft is introduced, followed by the design of the inner-loop control. At the second stage, the image features on the textured target plane are exploited to derive a vision-based control law.… ▽ More

    Submitted 17 April, 2024; originally announced April 2024.

  44. arXiv:2404.09957  [pdf, other

    cs.CV cs.LG

    How to build the best medical image segmentation algorithm using foundation models: a comprehensive empirical study with Segment Anything Model

    Authors: Hanxue Gu, Haoyu Dong, Jichen Yang, Maciej A. Mazurowski

    Abstract: Automated segmentation is a fundamental medical image analysis task, which enjoys significant advances due to the advent of deep learning. While foundation models have been useful in natural language processing and some vision tasks for some time, the foundation model developed with image segmentation in mind - Segment Anything Model (SAM) - has been developed only recently and has shown similar p… ▽ More

    Submitted 13 May, 2024; v1 submitted 15 April, 2024; originally announced April 2024.

    Comments: Code available at https://github.com/mazurowski-lab/finetune-SAM

  45. arXiv:2404.07318  [pdf, other

    eess.IV cs.CV cs.LG

    Rethinking Perceptual Metrics for Medical Image Translation

    Authors: Nicholas Konz, Yuwen Chen, Hanxue Gu, Haoyu Dong, Maciej A. Mazurowski

    Abstract: Modern medical image translation methods use generative models for tasks such as the conversion of CT images to MRI. Evaluating these methods typically relies on some chosen downstream task in the target domain, such as segmentation. On the other hand, task-agnostic metrics are attractive, such as the network feature-based perceptual metrics (e.g., FID) that are common to image translation in gene… ▽ More

    Submitted 10 April, 2024; originally announced April 2024.

  46. arXiv:2404.04050  [pdf, other

    cs.CV

    No Time to Train: Empowering Non-Parametric Networks for Few-shot 3D Scene Segmentation

    Authors: Xiangyang Zhu, Renrui Zhang, Bowei He, Ziyu Guo, Jiaming Liu, Han Xiao, Chaoyou Fu, Hao Dong, Peng Gao

    Abstract: To reduce the reliance on large-scale datasets, recent works in 3D segmentation resort to few-shot learning. Current 3D few-shot segmentation methods first pre-train models on 'seen' classes, and then evaluate their generalization performance on 'unseen' classes. However, the prior pre-training stage not only introduces excessive time overhead but also incurs a significant domain gap on 'unseen' c… ▽ More

    Submitted 5 April, 2024; originally announced April 2024.

    Comments: CVPR Highlight. Code is available at https://github.com/yangyangyang127/Seg-NN. arXiv admin note: text overlap with arXiv:2308.12961

  47. arXiv:2404.03634  [pdf, other

    cs.RO cs.CV

    PreAfford: Universal Affordance-Based Pre-Grasping for Diverse Objects and Environments

    Authors: Kairui Ding, Boyuan Chen, Ruihai Wu, Yuyang Li, Zongzheng Zhang, Huan-ang Gao, Siqi Li, Guyue Zhou, Yixin Zhu, Hao Dong, Hao Zhao

    Abstract: Robotic manipulation with two-finger grippers is challenged by objects lacking distinct graspable features. Traditional pre-grasping methods, which typically involve repositioning objects or utilizing external aids like table edges, are limited in their adaptability across different object categories and environments. To overcome these limitations, we introduce PreAfford, a novel pre-grasping plan… ▽ More

    Submitted 4 July, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

    Comments: Project Page: https://air-discover.github.io/PreAfford/

  48. arXiv:2404.03418  [pdf, ps, other

    cs.LO cs.AI

    Permissible Knowledge Pooling

    Authors: Huimin Dong

    Abstract: Information pooling has been extensively formalised across various logical frameworks in distributed systems, characterized by diverse information-sharing patterns. These approaches generally adopt an intersection perspective, aggregating all possible information, regardless of whether it is known or unknown to the agents. In contrast, this work adopts a unique stance, emphasising that sharing kno… ▽ More

    Submitted 15 May, 2024; v1 submitted 4 April, 2024; originally announced April 2024.

  49. arXiv:2404.02653  [pdf, ps, other

    astro-ph.SR

    A Statistical Investigation of the Neupert Effect in Solar Flares Observed with ASO-S/HXI

    Authors: Dong Li, Hanyang Dong, Wei Chen, Yang Su, Yu Huang, Zongjun Ning

    Abstract: The Neupert effect refers to the strong correlation between the soft X-ray (SXR) light curve and the time-integrated hard X-rays (HXR) or microwave flux, which is frequently observed in solar flares. In this article, we therefore utilized the newly launched Hard X-ray Imager (HXI) on board the Advanced Space-based Solar Observatory to investigate the Neupert effect during solar flares. By checking… ▽ More

    Submitted 3 April, 2024; originally announced April 2024.

    Comments: 23 pages, 6 figures

  50. arXiv:2404.01365  [pdf, other

    cs.LG cs.AI cs.CL

    Prompt-prompted Mixture of Experts for Efficient LLM Generation

    Authors: Harry Dong, Beidi Chen, Yuejie Chi

    Abstract: With the development of transformer-based large language models (LLMs), they have been applied to many fields due to their remarkable utility, but this comes at a considerable computational cost at deployment. Fortunately, some methods such as pruning or constructing a mixture of experts (MoE) aim at exploiting sparsity in transformer feedforward (FF) blocks to gain boosts in speed and reduction i… ▽ More

    Submitted 5 April, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

    Comments: Revision 1: Updated abstract with code link; re-ran top-k + sampling rows in Table 4, conclusions unchanged