Showing 1–50 of 3,016 results for author: Huang, H

  1. arXiv:2407.11701  [pdf, other]

    cs.CV

    Novel Artistic Scene-Centric Datasets for Effective Transfer Learning in Fragrant Spaces

    Authors: Shumei Liu, Haiting Huang, Mathias Zinnen, Andreas Maier, Vincent Christlein

    Abstract: Olfaction, often overlooked in cultural heritage studies, holds profound significance in shaping human experiences and identities. Examining historical depictions of olfactory scenes can offer valuable insights into the role of smells in history. We show that a transfer-learning approach using weakly labeled training data can remarkably improve the classification of fragrant spaces and, more gener…

    Submitted 16 July, 2024; originally announced July 2024.

    ACM Class: I.4.8; J.5

  2. arXiv:2407.11298  [pdf, other]

    cs.RO

    ThinkGrasp: A Vision-Language System for Strategic Part Grasping in Clutter

    Authors: Yaoyao Qian, Xupeng Zhu, Ondrej Biza, Shuo Jiang, Linfeng Zhao, Haojie Huang, Yu Qi, Robert Platt

    Abstract: Robotic grasping in cluttered environments remains a significant challenge due to occlusions and complex object arrangements. We have developed ThinkGrasp, a plug-and-play vision-language grasping system that makes use of GPT-4o's advanced contextual reasoning for heavy clutter environment grasping strategies. ThinkGrasp can effectively identify and generate grasp poses for target objects, even wh…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Project Website: https://h-freax.github.io/thinkgrasp_page/

  3. arXiv:2407.10953  [pdf, other]

    cs.CL

    MMM: Multilingual Mutual Reinforcement Effect Mix Datasets & Test with Open-domain Information Extraction Large Language Models

    Authors: Chengguang Gan, Qingyu Yin, Xinyang He, Hanjun Wei, Yunhao Liang, Younghun Lim, Shijian Wang, Hexiang Huang, Qinghao Zhang, Shiwen Ni, Tatsunori Mori

    Abstract: The Mutual Reinforcement Effect (MRE) represents a promising avenue in information extraction and multitasking research. Nevertheless, its applicability has been constrained due to the exclusive availability of MRE mix datasets in Japanese, thereby limiting comprehensive exploration by the global research community. To address this limitation, we introduce a Multilingual MRE mix dataset (MMM) that…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Under Review. 11 pages, 5 figures

  4. arXiv:2407.10892  [pdf, other]

    hep-ex astro-ph.SR nucl-ex

    First Measurement of Solar $^8$B Neutrino Flux through Coherent Elastic Neutrino-Nucleus Scattering in PandaX-4T

    Authors: PandaX Collaboration, Zihao Bo, Wei Chen, Xun Chen, Yunhua Chen, Zhaokan Cheng, Xiangyi Cui, Yingjie Fan, Deqing Fang, Zhixing Gao, Lisheng Geng, Karl Giboni, Xunan Guo, Xuyuan Guo, Zichao Guo, Chencheng Han, Ke Han, Changda He, Jinrong He, Di Huang, Houqi Huang, Junting Huang, Ruquan Hou, Yu Hou, Xiangdong Ji, et al. (77 additional authors not shown)

    Abstract: The PandaX-4T liquid xenon detector at the China Jinping Underground Laboratory is used to measure the solar $^8$B neutrino flux by detecting neutrinos through coherent scattering with xenon nuclei. Data samples requiring the coincidence of scintillation and ionization signals (paired), as well as unpaired ionization-only signals (US2), are selected with energy threshold of approximately 1.1 keV (…

    Submitted 15 July, 2024; originally announced July 2024.

  5. arXiv:2407.10687  [pdf, other]

    cs.CV cs.GR

    FRI-Net: Floorplan Reconstruction via Room-wise Implicit Representation

    Authors: Honghao Xu, Juzhan Xu, Zeyu Huang, Pengfei Xu, Hui Huang, Ruizhen Hu

    Abstract: In this paper, we introduce a novel method called FRI-Net for 2D floorplan reconstruction from 3D point cloud. Existing methods typically rely on corner regression or box regression, which lack consideration for the global shapes of rooms. To address these issues, we propose a novel approach using a room-wise implicit representation with structural regularization to characterize the shapes of room…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: ECCV 2024

  6. arXiv:2407.10636  [pdf, other]

    cs.CV

    Temporal Residual Guided Diffusion Framework for Event-Driven Video Reconstruction

    Authors: Lin Zhu, Yunlong Zheng, Yijun Zhang, Xiao Wang, Lizhi Wang, Hua Huang

    Abstract: Event-based video reconstruction has garnered increasing attention due to its advantages, such as high dynamic range and rapid motion capture capabilities. However, current methods often prioritize the extraction of temporal information from continuous event flow, leading to an overemphasis on low-frequency texture features in the scene, resulting in over-smoothing and blurry artifacts. Addressing…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV2024

  7. arXiv:2407.10494  [pdf, other]

    cs.LG cs.CV

    Learning to Unlearn for Robust Machine Unlearning

    Authors: Mark He Huang, Lin Geng Foo, Jun Liu

    Abstract: Machine unlearning (MU) seeks to remove knowledge of specific data samples from trained models without the necessity for complete retraining, a task made challenging by the dual objectives of effective erasure of data and maintaining the overall performance of the model. Despite recent advances in this field, balancing between the dual objectives of unlearning remains challenging. From a fresh per…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: Accepted by ECCV 2024

    ACM Class: I.2.6

  8. arXiv:2407.09952  [pdf, other]

    physics.ins-det hep-ex

    The STAR Forward Silicon Tracker

    Authors: J. D. Brandenburg, Y. Chang, J. Dong, Y. He, Y. Hu, H. Huang, T. Huang, H. Li, M. Nie, R. Sharma, X. Sun, P. Tribedy, F. Videbæk, G. Visser, G. Wilks, P. Wang, G. Xie, G. Yan, Z. Ye, L. Yi, Y. Yang, S. Zhang, Z. Zhang

    Abstract: The Forward Silicon Tracker (FST) is a pivotal component of the forward upgrade of the Solenoidal Tracker at RHIC (STAR), designed to discern hadron charge signs with a momentum resolution better than 30% for $0.2 < p_T < 2$ GeV/c in the $2.5 < η < 4$ pseudorapidity range. Its compact design features three disks along the beam direction, minimized material budget and scattering effects. The FST us…

    Submitted 13 July, 2024; originally announced July 2024.

    Comments: 29 pages, 28 figures

  9. arXiv:2407.09807  [pdf, other]

    eess.AS

    A Streaming Multi-Channel End-to-End Speech Recognition System with Realistic Evaluations

    Authors: Xiangzhu Kong, Tianqi Ning, Hao Huang, Zhijian Ou

    Abstract: Recently multi-channel end-to-end (ME2E) ASR systems have emerged. While streaming single-channel end-to-end ASR has been extensively studied, streaming ME2E ASR is limited in exploration. Additionally, recent studies call attention to the gap between in-distribution (ID) and out-of-distribution (OOD) tests and doing realistic evaluations. This paper focuses on two research problems: realizing str…

    Submitted 13 July, 2024; originally announced July 2024.

  10. arXiv:2407.09509  [pdf, other]

    q-bio.NC cs.HC

    Brain Dialogue Interface (BDI): A User-Friendly fMRI Model for Interactive Brain Decoding

    Authors: Heng Huang, Lin Zhao, Zihao Wu, Xiaowei Yu, Jing Zhang, Xintao Hu, Dajiang Zhu, Tianming Liu

    Abstract: Brain decoding techniques are essential for understanding the neurocognitive system. Although numerous methods have been introduced in this field, accurately aligning complex external stimuli with brain activities remains a formidable challenge. To alleviate alignment difficulties, many studies have simplified their models by employing single-task paradigms and establishing direct links between br…

    Submitted 17 June, 2024; originally announced July 2024.

  11. arXiv:2407.09094  [pdf, other]

    eess.IV cs.CV

    Beyond Image Prior: Embedding Noise Prior into Conditional Denoising Transformer

    Authors: Yuanfei Huang, Hua Huang

    Abstract: Existing learning-based denoising methods typically train models to generalize the image prior from large-scale datasets, suffering from the variability in noise distributions encountered in real-world scenarios. In this work, we propose a new perspective on the denoising challenge by highlighting the distinct separation between noise and image priors. This insight forms the basis for our developm…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Under Review

  12. arXiv:2407.09091  [pdf, other]

    cs.RO

    Accurate Prior-centric Monocular Positioning with Offline LiDAR Fusion

    Authors: Jinhao He, Huaiyang Huang, Shuyang Zhang, Jianhao Jiao, Chengju Liu, Ming Liu

    Abstract: Unmanned vehicles usually rely on Global Positioning System (GPS) and Light Detection and Ranging (LiDAR) sensors to achieve high-precision localization results for navigation purposes. However, this combination, with its associated costs and infrastructure demands, poses challenges for widespread adoption in mass-market applications. In this paper, we aim to use only a monocular camera to achieve…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: ICRA 2024

  13. arXiv:2407.08813  [pdf, other]

    eess.IV cs.AI cs.CV

    FairDomain: Achieving Fairness in Cross-Domain Medical Image Segmentation and Classification

    Authors: Yu Tian, Congcong Wen, Min Shi, Muhammad Muneeb Afzal, Hao Huang, Muhammad Osama Khan, Yan Luo, Yi Fang, Mengyu Wang

    Abstract: Addressing fairness in artificial intelligence (AI), particularly in medical AI, is crucial for ensuring equitable healthcare outcomes. Recent efforts to enhance fairness have introduced new methodologies and datasets in medical AI. However, the fairness issue under the setting of domain transfer is almost unexplored, while it is common that clinics rely on different imaging technologies (e.g., di…

    Submitted 11 July, 2024; originally announced July 2024.

    Comments: ECCV 2024; Codes available at https://github.com/Harvard-Ophthalmology-AI-Lab/FairDomain

  14. arXiv:2407.08499  [pdf]

    cond-mat.mes-hall physics.optics

    Spin-valley-locked Electroluminescence for High-Performance Circularly-Polarized Organic Light-Emitting Diodes

    Authors: Yibo Deng, Teng Long, Pingyang Wang, Han Huang, Zijian Deng, Chunling Gu, Cunbin An, Bo Liao, Guillaume Malpuech, Dmitry Solnyshkov, Hongbing Fu, Qing Liao

    Abstract: Circularly polarized (CP) organic light-emitting diodes (OLEDs) have attracted attention in potential applications including novel display and photonic technologies. However, conventional approaches cannot meet the requirements of device performance, such as high dissymmetry factor, high directionality, narrowband emission, simplified device structure and low costs. Here, we demonstrate spin-valle…

    Submitted 11 July, 2024; originally announced July 2024.

  15. arXiv:2407.07921  [pdf, other]

    cs.CR cs.AI cs.LG eess.SP

    A Trustworthy AIoT-enabled Localization System via Federated Learning and Blockchain

    Authors: Junfei Wang, He Huang, Jingze Feng, Steven Wong, Lihua Xie, Jianfei Yang

    Abstract: There is a significant demand for indoor localization technology in smart buildings, and the most promising solution in this field is using RF sensors and fingerprinting-based methods that employ machine learning models trained on crowd-sourced user data gathered from IoT devices. However, this raises security and privacy issues in practice. Some researchers propose to use federated learning to pa…

    Submitted 8 July, 2024; originally announced July 2024.

  16. arXiv:2407.07754  [pdf, other]

    quant-ph cond-mat.str-el cs.CC cs.IT

    Random unitaries in extremely low depth

    Authors: Thomas Schuster, Jonas Haferkamp, Hsin-Yuan Huang

    Abstract: We prove that random quantum circuits on any geometry, including a 1D line, can form approximate unitary designs over $n$ qubits in $\log n$ depth. In a similar manner, we construct pseudorandom unitaries (PRUs) in 1D circuits in $\text{poly} \log n $ depth, and in all-to-all-connected circuits in $\text{poly} \log \log n $ depth. In all three cases, the $n$ dependence is optimal and improves expo…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 12 pages, 6 figures + 46-page appendix

  17. arXiv:2407.06579  [pdf, other]

    cs.CL

    NoisyAG-News: A Benchmark for Addressing Instance-Dependent Noise in Text Classification

    Authors: Hongfei Huang, Tingting Liang, Xixi Sun, Zikang Jin, Yuyu Yin

    Abstract: Existing research on learning with noisy labels predominantly focuses on synthetic label noise. Although synthetic noise possesses well-defined structural properties, it often fails to accurately replicate real-world noise patterns. In recent years, there has been a concerted effort to construct generalizable and controllable instance-dependent noise datasets for image classification, significantl…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 20 pages, 13 figures

  18. arXiv:2407.06562  [pdf, other]

    gr-qc

    Identifying \textit{doppelgänge} Black Holes through Shadow Images

    Authors: Yukun Xu, Hyat Huang, Meng-Yun Lai, De-Cheng Zou

    Abstract: Recently, an interesting \textit{doppelgänge} black hole solution is obtained in the string-inspired Euler-Heisenberg theory, where the black holes have the same radii but share different charges. We found, however, they possess different ISCOs and photon spheres, and hence affect their shadow images. In this work, we investigate the optical appearances, illuminated by an optically and geometrical…

    Submitted 9 July, 2024; originally announced July 2024.

    Comments: 20 pages, 13 figures

  19. arXiv:2407.05640  [pdf, other]

    physics.plasm-ph

    Particle-In-Cell simulations of filamentation process in magnetized plasma of capacitively-coupled radio-frequency discharge

    Authors: Huidong Huang, Jian Chen, Zhibin Wang

    Abstract: In the uniform radio-frequency capacitively-coupled plasma (RF-CCP) between a large electrode pair, adding an axial magnetic field induces diverse longitudinal filaments. This phenomenon, termed 'filamentation', challenges conventional understanding and remains poorly understood to date. To reveal its pattern dynamics, we conduct 2D Particle-In-Cell simulations to comprehensively examine whole pro…

    Submitted 8 July, 2024; originally announced July 2024.

  20. arXiv:2407.05598  [pdf, ps, other]

    hep-ph nucl-th

    Probing the nature of the anticharmed-strange pentaquark states: mass spectra, decays, and magnetic moments

    Authors: Xuejie Liu, Yue Tan, Xiaoyun Chen, Dianyong Chen, Hongxia Huang, Jialun Ping

    Abstract: Within the framework of the quark delocalization color screening model, a systematic investigation of the anticharmed-strange pentaquark system is performed using the resonance group method. The current estimations predict three bound states with estimated masses to be 2886 MeV, 3039 MeV, and 3153 MeV, respectively. Additionally, three resonance states are identified in various scattering phase…

    Submitted 8 July, 2024; originally announced July 2024.

  21. arXiv:2407.05231  [pdf, other]

    cs.CG cs.DS

    Fréchet Distance in Subquadratic Time

    Authors: Siu-Wing Cheng, Haoqiang Huang

    Abstract: Let $m$ and $n$ be the numbers of vertices of two polygonal curves in $\mathbb{R}^d$ for any fixed $d$ such that $m \leq n$. Since it was known in 1995 how to compute the Fréchet distance of these two curves in $O(mn\log (mn))$ time, it has been an open problem whether the running time can be reduced to $o(n^2)$ when $m = Ω(n)$. In the meantime, several well-known quadratic time barriers in compu…

    Submitted 6 July, 2024; originally announced July 2024.

  22. arXiv:2407.05060  [pdf, other]

    q-bio.NC

    Volume-optimal persistence homological scaffolds of hemodynamic networks covary with MEG theta-alpha aperiodic dynamics

    Authors: Nghi Nguyen, Tao Hou, Enrico Amico, Jingyi Zheng, Huajun Huang, Alan D. Kaplan, Giovanni Petri, Joaquín Goñi, Yize Zhao, Duy Duong-Tran, Li Shen

    Abstract: Higher-order properties of functional magnetic resonance imaging (fMRI) induced connectivity have been shown to unravel many exclusive topological and dynamical insights beyond pairwise interactions. Nonetheless, whether these fMRI-induced higher-order properties play a role in disentangling other neuroimaging modalities' insights remains largely unexplored and poorly understood. In this work, by…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: The code for our analyses is provided in https://github.com/ngcaonghi/scaffold_noise

  23. arXiv:2407.04387  [pdf, ps, other]

    math.AP math.PR

    The rigorous derivation of Vlasov equations with local alignments from moderately interacting particle systems

    Authors: Jinhuan Wang, Mengdi Zhuang, Hui Huang

    Abstract: In this paper, we present a rigorous derivation of the mean-field limit for a moderately interacting particle system in $\mathbb{R}^d$ $(d\geq 2)$. For stochastic initial data, we demonstrate that the solution to the interacting particle model, with an appropriately applied cut-off, converges in probabilistic sense to the solution of the characteristics of the regularized Vlasov models featuring local ali…

    Submitted 5 July, 2024; originally announced July 2024.

    MSC Class: 35Q83; 82C22; 35Q70; 60K35

  24. arXiv:2407.03978  [pdf, other]

    cs.CL cs.AI

    Benchmarking Complex Instruction-Following with Multiple Constraints Composition

    Authors: Bosi Wen, Pei Ke, Xiaotao Gu, Lindong Wu, Hao Huang, Jinfeng Zhou, Wenchuang Li, Binxin Hu, Wendy Gao, Jiaxin Xu, Yiming Liu, Jie Tang, Hongning Wang, Minlie Huang

    Abstract: Instruction following is one of the fundamental capabilities of large language models (LLMs). As the ability of LLMs is constantly improving, they have been increasingly applied to deal with complex human instructions in real-world scenarios. Therefore, how to evaluate the ability of complex instruction-following of LLMs has become a critical research problem. Existing benchmarks mainly focus on m…

    Submitted 11 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

    Comments: 20 pages, 7 figures

  25. arXiv:2407.03750  [pdf, other]

    cs.DB

    GriDB: Scaling Blockchain Database via Sharding and Off-Chain Cross-Shard Mechanism

    Authors: Zicong Hong, Song Guo, Enyuan Zhou, Wuhui Chen, Huawei Huang, Albert Zomaya

    Abstract: Blockchain databases have attracted widespread attention but suffer from poor scalability due to underlying non-scalable blockchains. While blockchain sharding is necessary for a scalable blockchain database, it poses a new challenge named on-chain cross-shard database services. Each cross-shard database service (e.g., cross-shard queries or inter-shard load balancing) involves massive cross-shard…

    Submitted 4 July, 2024; originally announced July 2024.

  26. arXiv:2407.03556  [pdf]

    cond-mat.supr-con

    High-temperature Superconductivity in Perovskite Hydride below 10 GPa

    Authors: Mingyang Du, Hongyu Huang, Zihan Zhang, Min Wang, Hao Song, Defang Duan, Tian Cui

    Abstract: Hydrogen and hydride materials have long been considered promising materials for high-temperature superconductivity. But the extreme pressures required for the metallization of hydrogen-based superconductors limit their applications. Here, we have designed a series of high-temperature perovskite hydrides that can be stable within 10 GPa. Our research covered 182 ternary systems and ultimately det…

    Submitted 3 July, 2024; originally announced July 2024.

  27. arXiv:2407.03531  [pdf, other]

    cs.RO

    OrbitGrasp: $SE(3)$-Equivariant Grasp Learning

    Authors: Boce Hu, Xupeng Zhu, Dian Wang, Zihao Dong, Haojie Huang, Chenghao Wang, Robin Walters, Robert Platt

    Abstract: While grasp detection is an important part of any robotic manipulation pipeline, reliable and accurate grasp detection in $SE(3)$ remains a research challenge. Many robotics applications in unstructured environments such as the home or warehouse would benefit a lot from better grasp performance. This paper proposes a novel framework for detecting $SE(3)$ grasp poses based on point cloud input. Our…

    Submitted 3 July, 2024; originally announced July 2024.

  28. arXiv:2407.02935  [pdf, other]

    nucl-ex hep-ex nucl-th

    Properties of the QCD Matter -- An Experimental Review of Selected Results from RHIC BES Program

    Authors: Jinhui Chen, Xin Dong, Xionghong He, Huanzhong Huang, Feng Liu, Xiaofeng Luo, Yu-Gang Ma, Lijuan Ruan, Ming Shao, Shusu Shi, Xu Sun, Aihong Tang, Zebo Tang, Fuqiang Wang, Hai Wang, Yi Wang, Zhigang Xiao, Guannan Xie, Nu Xu, Qinghua Xu, Zhangbu Xu, Chi Yang, Shuai Yang, Wangmei Zha, Yapeng Zhang, et al. (3 additional authors not shown)

    Abstract: In the paper, we discuss the development of the multi-gap resistive plate chamber Time-of-Flight (TOF) technology and the production of the STAR TOF detector in China at the beginning of the 21st century. Then we review recent experimental results from the first beam energy scan program (BES-I) at the Relativistic Heavy Ion Collider (RHIC). Topics cover measurements of collectivity, chirality, cri…

    Submitted 3 July, 2024; originally announced July 2024.

    Comments: 31 pages, 33 figures. This review is dedicated to Professor Wenqing Shen on the occasion to celebrate his leadership of the Chinese STAR Collaboration, the development and production of the STAR MRPC TOF detector in China and many physics analyses

  29. arXiv:2407.02395  [pdf, other]

    cs.SE cs.CL

    Is Your AI-Generated Code Really Safe? Evaluating Large Language Models on Secure Code Generation with CodeSecEval

    Authors: Jiexin Wang, Xitong Luo, Liuwen Cao, Hongkui He, Hailin Huang, Jiayuan Xie, Adam Jatowt, Yi Cai

    Abstract: Large language models (LLMs) have brought significant advancements to code generation and code repair, benefiting both novice and experienced developers. However, their training using unsanitized data from open-source repositories, like GitHub, raises the risk of inadvertently propagating security vulnerabilities. Despite numerous studies investigating the safety of code LLMs, there remains a gap…

    Submitted 4 July, 2024; v1 submitted 2 July, 2024; originally announced July 2024.

    Comments: arXiv admin note: text overlap with arXiv:2310.16263

  30. arXiv:2407.02157  [pdf, other]

    cs.CV cs.HC

    FineCLIPER: Multi-modal Fine-grained CLIP for Dynamic Facial Expression Recognition with AdaptERs

    Authors: Haodong Chen, Haojian Huang, Junhao Dong, Mingzhe Zheng, Dian Shao

    Abstract: Dynamic Facial Expression Recognition (DFER) is crucial for understanding human behavior. However, current methods exhibit limited performance mainly due to the scarcity of high-quality data, the insufficient utilization of facial dynamics, and the ambiguity of expression semantics, etc. To this end, we propose a novel framework, named Multi-modal Fine-grained CLIP for Dynamic Facial Expression Re…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Project Page: https://haroldchen19.github.io/FineCLIPER-Page/

  31. arXiv:2407.01902  [pdf, other]

    cs.CR cs.AI cs.CL

    SoP: Unlock the Power of Social Facilitation for Automatic Jailbreak Attack

    Authors: Yan Yang, Zeguan Xiao, Xin Lu, Hongru Wang, Hailiang Huang, Guanhua Chen, Yun Chen

    Abstract: The widespread applications of large language models (LLMs) have brought about concerns regarding their potential misuse. Although aligned with human preference data before release, LLMs remain vulnerable to various malicious attacks. In this paper, we adopt a red-teaming strategy to enhance LLM safety and introduce SoP, a simple yet effective framework to design jailbreak prompts automatically. I…

    Submitted 1 July, 2024; originally announced July 2024.

  32. arXiv:2407.01886  [pdf, other]

    cs.LG cs.AI

    Core Knowledge Learning Framework for Graph Adaptation and Scalability Learning

    Authors: Bowen Zhang, Zhichao Huang, Genan Dai, Guangning Xu, Xiaomao Fan, Hu Huang

    Abstract: Graph classification is a pivotal challenge in machine learning, especially within the realm of graph-based data, given its importance in numerous real-world applications such as social network analysis, recommendation systems, and bioinformatics. Despite its significance, graph classification faces several hurdles, including adapting to diverse prediction tasks, training across multiple target do…

    Submitted 1 July, 2024; originally announced July 2024.

  33. arXiv:2407.01812  [pdf, other]

    cs.RO cs.LG

    Equivariant Diffusion Policy

    Authors: Dian Wang, Stephen Hart, David Surovik, Tarik Kelestemur, Haojie Huang, Haibo Zhao, Mark Yeatman, Jiuguang Wang, Robin Walters, Robert Platt

    Abstract: Recent work has shown diffusion models are an effective approach to learning the multimodal distributions arising from demonstration data in behavior cloning. However, a drawback of this approach is the need to learn a denoising function, which is significantly more complex than learning an explicit policy. In this work, we propose Equivariant Diffusion Policy, a novel diffusion policy learning me…

    Submitted 1 July, 2024; originally announced July 2024.

  34. arXiv:2407.01755  [pdf, other]

    cs.RO

    An Intelligent Robotic System for Perceptive Pancake Batter Stirring and Precise Pouring

    Authors: Xinyuan Luo, Shengmiao Jin, Hung-Jui Huang, Wenzhen Yuan

    Abstract: Cooking robots have long been desired by the commercial market, while the technical challenge is still significant. A major difficulty comes from the demand of perceiving and handling liquid with different properties. This paper presents a robot system that mixes batter and makes pancakes out of it, where understanding and handling the viscous liquid is an essential component. The system integrate…

    Submitted 1 July, 2024; originally announced July 2024.

    Comments: 8 pages, 10 figures, Accepted to IROS 2024

  35. arXiv:2407.00935  [pdf, other]

    cs.LG cs.CL

    Look Ahead or Look Around? A Theoretical Comparison Between Autoregressive and Masked Pretraining

    Authors: Qi Zhang, Tianqi Du, Haotian Huang, Yifei Wang, Yisen Wang

    Abstract: In recent years, the rise of generative self-supervised learning (SSL) paradigms has exhibited impressive performance across visual, language, and multi-modal domains. While the varied designs of generative SSL objectives lead to distinct properties in downstream tasks, a theoretical understanding of these differences remains largely unexplored. In this paper, we establish the first theoretical co…

    Submitted 30 June, 2024; originally announced July 2024.

  36. arXiv:2407.00560  [pdf, other]

    q-bio.BM math.OC

    DCI: An Accurate Quality Assessment Criteria for Protein Complex Structure Models

    Authors: Wenda Wang, Jiaqi Zhai, He Huang, Xinqi Gong

    Abstract: The structure of proteins is the basis for studying protein function and drug design. The emergence of AlphaFold 2 has greatly promoted the prediction of protein 3D structures, and it is of great significance to give an overall and accurate evaluation of the predicted models, especially the complex models. Among the existing methods for evaluating multimer structures, DockQ is the most commonly us…

    Submitted 29 June, 2024; originally announced July 2024.

  37. arXiv:2407.00125  [pdf, other]

    cs.SE cs.AI cs.DC

    A Survey on Failure Analysis and Fault Injection in AI Systems

    Authors: Guangba Yu, Gou Tan, Haojia Huang, Zhenyu Zhang, Pengfei Chen, Roberto Natella, Zibin Zheng

    Abstract: The rapid advancement of Artificial Intelligence (AI) has led to its integration into various areas, especially with Large Language Models (LLMs) significantly enhancing capabilities in Artificial Intelligence Generated Content (AIGC). However, the complexity of AI systems has also exposed their vulnerabilities, necessitating robust methods for failure analysis (FA) and fault injection (FI) to ens…

    Submitted 27 June, 2024; originally announced July 2024.

  38. arXiv:2406.19954  [pdf, other]

    cs.CL cs.HC cs.SD eess.AS

    BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5

    Authors: Zhehuai Chen, He Huang, Oleksii Hrinchuk, Krishna C. Puvvada, Nithin Rao Koluguri, Piotr Żelasko, Jagadeesh Balam, Boris Ginsburg

    Abstract: Incorporating speech understanding capabilities into pretrained large-language models has become a vital research direction (SpeechLLM). The previous architectures can be categorized as: i) GPT-style, prepend speech prompts to the text prompts as a sequence of LLM inputs like a decoder-only model; ii) T5-style, introduce speech cross-attention to each layer of the pretrained LLMs. We propose BESTO…

    Submitted 28 June, 2024; originally announced June 2024.

    MSC Class: 68T10; ACM Class: I.2.7

  39. arXiv:2406.19674  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    Less is More: Accurate Speech Recognition & Translation without Web-Scale Data

    Authors: Krishna C. Puvvada, Piotr Żelasko, He Huang, Oleksii Hrinchuk, Nithin Rao Koluguri, Kunal Dhawan, Somshubra Majumdar, Elena Rastorgueva, Zhehuai Chen, Vitaly Lavrukhin, Jagadeesh Balam, Boris Ginsburg

    Abstract: Recent advances in speech recognition and translation rely on hundreds of thousands of hours of Internet speech data. We argue that state-of-the-art accuracy can be reached without relying on web-scale data. Canary - multilingual ASR and speech translation model, outperforms current state-of-the-art models - Whisper, OWSM, and Seamless-M4T on English, French, Spanish, and German languages, while b…

    Submitted 28 June, 2024; originally announced June 2024.

    Comments: Accepted at Interspeech-2024

  40. arXiv:2406.19212  [pdf, other]

    quant-ph

    JuliVQC: an Efficient Variational Quantum Circuit Simulator for Near-Term Quantum Algorithms

    Authors: Wei-You Liao, Xiang Wang, Xiao-Yue Xu, Chen Ding, Shuo Zhang, He-Liang Huang, Chu Guo

    Abstract: We introduce JuliVQC: a light-weight, yet extremely efficient variational quantum circuit simulator. JuliVQC is part of an effort for classical simulation of the \textit{Zuchongzhi} quantum processors, where it is extensively used to characterize the circuit noises, as a building block in the Schrödinger-Feynman algorithm for classical verification and performance benchmarking, and…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: 12 pages, 8 figures

  41. arXiv:2406.18871  [pdf, other]

    eess.AS cs.CL

    DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment

    Authors: Ke-Han Lu, Zhehuai Chen, Szu-Wei Fu, He Huang, Boris Ginsburg, Yu-Chiang Frank Wang, Hung-yi Lee

    Abstract: Recent speech language models (SLMs) typically incorporate pre-trained speech models to extend the capabilities from large language models (LLMs). In this paper, we propose a Descriptive Speech-Text Alignment approach that leverages speech captioning to bridge the gap between speech and text modalities, enabling SLMs to interpret and generate comprehensive natural language descriptions, thereby fa…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted to Interspeech 2024

  42. arXiv:2406.18254  [pdf, other]

    cs.IR cs.AI cs.MM

    Improving the Consistency in Cross-Lingual Cross-Modal Retrieval with 1-to-K Contrastive Learning

    Authors: Zhijie Nie, Richong Zhang, Zhangchi Feng, Hailang Huang, Xudong Liu

    Abstract: Cross-lingual Cross-modal Retrieval (CCR) is an essential task in web search, which aims to break the barriers between modality and language simultaneously and achieves image-text retrieval in the multi-lingual scenario with a single model. In recent years, excellent progress has been made based on cross-lingual cross-modal pre-training; particularly, the methods based on contrastive learning on l…

    Submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted by KDD 2024 Research Track

  43. arXiv:2406.18146  [pdf, other]

    cs.CV

    A Refer-and-Ground Multimodal Large Language Model for Biomedicine

    Authors: Xiaoshuang Huang, Haifeng Huang, Lingdong Shen, Yehui Yang, Fangxin Shang, Junwei Liu, Jia Liu

    Abstract: With the rapid development of multimodal large language models (MLLMs), especially their capabilities in visual chat through refer and ground functionalities, their significance is increasingly recognized. However, the biomedical field currently exhibits a substantial gap in this area, primarily due to the absence of a dedicated refer and ground dataset for biomedical images. To address this chall…

    Submitted 28 June, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

    Comments: Accepted by MICCAI2024

  44. arXiv:2406.18018  [pdf, other]

    eess.IV

    A Cross Spatio-Temporal Pathology-based Lung Nodule Dataset

    Authors: Muwei Jian, Haoran Zhang, Mingju Shao, Hongyu Chen, Huihui Huang, Yanjie Zhong, Changlei Zhang, Bin Wang, Penghui Gao

    Abstract: Recently, intelligent analysis of lung nodules with the assistance of computer aided detection (CAD) techniques can improve the accuracy rate of lung cancer diagnosis. However, existing CAD systems and pulmonary datasets mainly focus on Computed Tomography (CT) images from one single period, while ignoring the cross spatio-temporal features associated with the progression of nodules contained in im…

    Submitted 25 June, 2024; originally announced June 2024.

  45. arXiv:2406.17770  [pdf, other]

    cs.CV

    MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning

    Authors: Xiangyu Zhao, Xiangtai Li, Haodong Duan, Haian Huang, Yining Li, Kai Chen, Hua Yang

    Abstract: Multi-modal large language models (MLLMs) have made significant strides in various visual understanding tasks. However, the majority of these models are constrained to process low-resolution images, which limits their effectiveness in perception tasks that necessitate detailed visual information. In our study, we present MG-LLaVA, an innovative MLLM that enhances the model's visual processing capa…

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  46. arXiv:2406.17565  [pdf, other]

    cs.DC

    MemServe: Context Caching for Disaggregated LLM Serving with Elastic Memory Pool

    Authors: Cunchen Hu, Heyang Huang, Junhao Hu, Jiang Xu, Xusheng Chen, Tao Xie, Chenxi Wang, Sa Wang, Yungang Bao, Ninghui Sun, Yizhou Shan

    Abstract: Large language model (LLM) serving has transformed from stateless to stateful systems, utilizing techniques like context caching and disaggregated inference. These optimizations extend the lifespan and domain of the KV cache, necessitating a new architectural approach. We present MemServe, a unified system that integrates both inter-request and intra-request optimizations. MemServe introduces MemP…

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  47. arXiv:2406.17507  [pdf, other]

    cs.IR

    ACE: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling

    Authors: Minghui Fang, Shengpeng Ji, Jialong Zuo, Hai Huang, Yan Xia, Jieming Zhu, Xize Cheng, Xiaoda Yang, Wenrui Liu, Gang Wang, Zhenhua Dong, Zhou Zhao

    Abstract: Generative retrieval, which has demonstrated effectiveness in text-to-text retrieval, utilizes a sequence-to-sequence model to directly generate candidate identifiers based on natural language queries. Without explicitly computing the similarity between queries and candidates, generative retrieval surpasses dual-tower models in both speed and accuracy on large-scale corpora, providing new insights…

    Submitted 25 June, 2024; originally announced June 2024.

  48. arXiv:2406.16520  [pdf]

    cond-mat.str-el cond-mat.mtrl-sci cond-mat.supr-con

    Gigantic-oxidative atomically layered epitaxy for designed complex oxides

    Authors: Guangdi Zhou, Haoliang Huang, Fengzhe Wang, Heng Wang, Qishuo Yang, Zihao Nie, Wei Lv, Cui Ding, Yueying Li, Danfeng Li, Yujie Sun, Junhao Lin, Guang-Ming Zhang, Qi-Kun Xue, Zhuoyu Chen

    Abstract: In designing material functionality within the intricate realm of transition metal oxides, lattice structure and d-orbital occupancy are two principal determinants of the correlated physical properties, such as superconductivity. However, the modulation of these two factors is inherently limited by the need to balance thermodynamic stability, kinetic mobility, and synthesis precision, particularly…

    Submitted 24 June, 2024; originally announced June 2024.

  49. arXiv:2406.16137  [pdf, other]

    cs.CV

    MLPHand: Real Time Multi-View 3D Hand Mesh Reconstruction via MLP Modeling

    Authors: Jian Yang, Jiakun Li, Guoming Li, Zhen Shen, Huai-Yu Wu, Zhaoxin Fan, Heng Huang

    Abstract: Multi-view hand mesh reconstruction is a critical task for applications in virtual reality and human-computer interaction, but it remains a formidable challenge. Although existing multi-view hand reconstruction methods achieve remarkable accuracy, they typically come with an intensive computational burden that hinders real-time inference. To this end, we propose MLPHand, a novel method designed fo…

    Submitted 23 June, 2024; originally announced June 2024.

  50. arXiv:2406.15677  [pdf, other]

    cs.RO

    Open-vocabulary Pick and Place via Patch-level Semantic Maps

    Authors: Mingxi Jia, Haojie Huang, Zhewen Zhang, Chenghao Wang, Linfeng Zhao, Dian Wang, Jason Xinyu Liu, Robin Walters, Robert Platt, Stefanie Tellex

    Abstract: Controlling robots through natural language instructions in open-vocabulary scenarios is pivotal for enhancing human-robot collaboration and complex robot behavior synthesis. However, achieving this capability poses significant challenges due to the need for a system that can generalize from limited data to a wide range of tasks and environments. Existing methods rely on large, costly datasets and…

    Submitted 21 June, 2024; originally announced June 2024.