Showing 1–50 of 194 results for author: Hu, P

  1. arXiv:2407.03621  [pdf, other]

    cs.CL

    The Mysterious Case of Neuron 1512: Injectable Realignment Architectures Reveal Internal Characteristics of Meta's Llama 2 Model

    Authors: Brenden Smith, Dallin Baker, Clayton Chase, Myles Barney, Kaden Parker, Makenna Allred, Peter Hu, Alex Evans, Nancy Fulda

    Abstract: Large Language Models (LLMs) have an unrivaled and invaluable ability to "align" their output to a diverse range of human preferences, by mirroring them in the text they generate. The internal characteristics of such models, however, remain largely opaque. This work presents the Injectable Realignment Model (IRM) as a novel approach to language model interpretability and explainability. Inspired b…

    Submitted 4 July, 2024; originally announced July 2024.

    Comments: 21 pages, 17 figures

  2. arXiv:2406.16655  [pdf, other]

    cs.CL

    Large Language Models Are Cross-Lingual Knowledge-Free Reasoners

    Authors: Peng Hu, Sizhe Liu, Changjiang Gao, Xin Huang, Xue Han, Junlan Feng, Chao Deng, Shujian Huang

    Abstract: Large Language Models have demonstrated impressive reasoning capabilities across multiple languages. However, the relationship between capabilities in different languages is less explored. In this work, we decompose the process of reasoning tasks into two separate parts: knowledge retrieval and knowledge-free reasoning, and analyze the cross-lingual transferability of them. With adapted and const…

    Submitted 24 June, 2024; originally announced June 2024.

  3. arXiv:2406.10928  [pdf, other]

    cs.CR cs.AI cs.NI

    Make Your Home Safe: Time-aware Unsupervised User Behavior Anomaly Detection in Smart Homes via Loss-guided Mask

    Authors: Jingyu Xiao, Zhiyao Xu, Qingsong Zou, Qing Li, Dan Zhao, Dong Fang, Ruoyu Li, Wenxin Tang, Kang Li, Xudong Zuo, Penghui Hu, Yong Jiang, Zixuan Weng, Michael R. Lyv

    Abstract: Smart homes, powered by the Internet of Things, offer great convenience but also pose security concerns due to abnormal behaviors, such as improper operations of users and potential attacks from malicious attackers. Several behavior modeling methods have been proposed to identify abnormal behaviors and mitigate potential risks. However, their performance often falls short because they do not effec…

    Submitted 18 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: KDD 2024

  4. arXiv:2406.08757  [pdf, other]

    cs.CL cs.AI

    SRFUND: A Multi-Granularity Hierarchical Structure Reconstruction Benchmark in Form Understanding

    Authors: Jiefeng Ma, Yan Wang, Chenyu Liu, Jun Du, Yu Hu, Zhenrong Zhang, Pengfei Hu, Qing Wang, Jianshu Zhang

    Abstract: Accurately identifying and organizing textual content is crucial for the automation of document processing in the field of form understanding. Existing datasets, such as FUNSD and XFUND, support entity classification and relationship prediction tasks but are typically limited to local and entity-level annotations. This limitation overlooks the hierarchically structured representation of documents,…

    Submitted 12 June, 2024; originally announced June 2024.

    Comments: NeurIPS 2024 Track on Datasets and Benchmarks under review

  5. arXiv:2406.08454  [pdf, other]

    cs.SD eess.AS

    Towards Musically Informed Evaluation of Piano Transcription Models

    Authors: Patricia Hu, Lukáš Samuel Marták, Carlos Cancino-Chacón, Gerhard Widmer

    Abstract: Automatic piano transcription models are typically evaluated using simple frame- or note-wise information retrieval (IR) metrics. Such benchmark metrics do not provide insights into the transcription quality of specific musical aspects such as articulation, dynamics, or rhythmic precision of the output, which are essential in the context of expressive performance analysis. Furthermore, in recent y…

    Submitted 12 June, 2024; originally announced June 2024.

  6. arXiv:2406.07393  [pdf, other]

    cs.CL

    Limited Out-of-Context Knowledge Reasoning in Large Language Models

    Authors: Peng Hu, Changjiang Gao, Ruiqi Gao, Jiajun Chen, Shujian Huang

    Abstract: Large Language Models (LLMs) have demonstrated strong capabilities as knowledge bases and significant in-context reasoning capabilities. However, previous work challenges their out-of-context reasoning ability, i.e., the ability to infer information from their training data, instead of from the context or prompt. This paper focuses on a significant facet of out-of-context reasoning: Out-of-Context…

    Submitted 24 June, 2024; v1 submitted 11 June, 2024; originally announced June 2024.

  7. arXiv:2406.01953  [pdf, other]

    cs.NI

    On-Demand Routing in LEO Mega-Constellations with Dynamic Laser Inter-Satellite Links

    Authors: Dhiraj Bhattacharjee, Pablo G. Madoery, Aizaz U. Chaudhry, Halim Yanikomeroglu, Gunes Karabulut Kurt, Peng Hu, Khaled Ahmed, Stephane Martel

    Abstract: Low Earth orbit (LEO) satellite mega constellations are beginning to include laser inter-satellite links (LISLs) to extend the Internet to the most remote locations on Earth. Since the process of establishing these links incurs a setup delay on the order of seconds, a static network topology is generally established well in advance, which is then used for the routing calculations. However, this in…

    Submitted 4 June, 2024; originally announced June 2024.

  8. arXiv:2405.15438  [pdf, other]

    cs.CV cs.LG eess.IV

    Comparing remote sensing-based forest biomass mapping approaches using new forest inventory plots in contrasting forests in northeastern and southwestern China

    Authors: Wenquan Dong, Edward T. A. Mitchard, Yuwei Chen, Man Chen, Congfeng Cao, Peilun Hu, Cong Xu, Steven Hancock

    Abstract: Large-scale high spatial resolution aboveground biomass (AGB) maps play a crucial role in determining forest carbon stocks and how they are changing, which is instrumental in understanding the global carbon cycle, and implementing policy to mitigate climate change. The advent of the new space-borne LiDAR sensor, NASA's GEDI instrument, provides unparalleled possibilities for the accurate and unbia…

    Submitted 24 May, 2024; originally announced May 2024.

  9. arXiv:2405.11862  [pdf, other]

    cs.CV

    SEMv3: A Fast and Robust Approach to Table Separation Line Detection

    Authors: Chunxia Qin, Zhenrong Zhang, Pengfei Hu, Chenyu Liu, Jiefeng Ma, Jun Du

    Abstract: Table structure recognition (TSR) aims to parse the inherent structure of a table from its input image. The "split-and-merge" paradigm is a pivotal approach to parse table structure, where the table separation line detection is crucial. However, challenges such as wireless and deformed tables make it demanding. In this paper, we adhere to the "split-and-merge" paradigm and propose SEMv3 (SEM: Spl…

    Submitted 20 May, 2024; originally announced May 2024.

    Comments: 9 pages, 6 figures, 5 tables. Accepted by IJCAI2024 main track

  10. arXiv:2404.17875  [pdf, other]

    cs.LG

    Noisy Node Classification by Bi-level Optimization based Multi-teacher Distillation

    Authors: Yujing Liu, Zongqian Wu, Zhengyu Lu, Ci Nie, Guoqiu Wen, Ping Hu, Xiaofeng Zhu

    Abstract: Previous graph neural networks (GNNs) usually assume that graph data have clean labels for representation learning, which is not true in real applications. In this paper, we propose a new multi-teacher distillation method based on bi-level optimization (namely BO-NNC), to conduct noisy node classification on the graph data. Specifically, we first employ multiple self-supervised learning me…

    Submitted 8 May, 2024; v1 submitted 27 April, 2024; originally announced April 2024.

  11. arXiv:2404.11577  [pdf, other]

    cs.LG cs.AI

    Towards Reliable Empirical Machine Unlearning Evaluation: A Game-Theoretic View

    Authors: Yiwen Tu, Pingbang Hu, Jiaqi Ma

    Abstract: Machine unlearning is the process of updating machine learning models to remove the information of specific training data samples, in order to comply with data protection regulations that allow individuals to request the removal of their personal data. Despite the recent development of numerous unlearning algorithms, reliable evaluation of these algorithms remains an open research question. In thi…

    Submitted 12 June, 2024; v1 submitted 17 April, 2024; originally announced April 2024.

  12. arXiv:2404.04659  [pdf, other]

    cs.CL

    Multilingual Pretraining and Instruction Tuning Improve Cross-Lingual Knowledge Alignment, But Only Shallowly

    Authors: Changjiang Gao, Hongda Hu, Peng Hu, Jiajun Chen, Jixing Li, Shujian Huang

    Abstract: Despite their strong ability to retrieve knowledge in English, current large language models show imbalanced abilities across different languages. Two approaches have been proposed to address this: multilingual pretraining and multilingual instruction tuning. However, whether and how such methods contribute to the cross-lingual knowledge alignment inside the models is unknown. In this paper, we prop…

    Submitted 6 April, 2024; originally announced April 2024.

  13. arXiv:2404.04346  [pdf, other]

    cs.CV

    Koala: Key frame-conditioned long video-LLM

    Authors: Reuben Tan, Ximeng Sun, Ping Hu, Jui-hsien Wang, Hanieh Deilamsalehy, Bryan A. Plummer, Bryan Russell, Kate Saenko

    Abstract: Long video question answering is a challenging task that involves recognizing short-term activities and reasoning about their fine-grained relationships. State-of-the-art video Large Language Models (vLLMs) hold promise as a viable solution due to their demonstrated emergent capabilities on new tasks. However, despite being trained on millions of short seconds-long videos, vLLMs are unable to unde…

    Submitted 3 May, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

    Comments: Accepted at CVPR 2024 as a poster highlight

  14. arXiv:2404.00855  [pdf, other]

    cs.CV cs.AI

    TSOM: Small Object Motion Detection Neural Network Inspired by Avian Visual Circuit

    Authors: Pignge Hu, Xiaoteng Zhang, Mengmeng Li, Yingjie Zhu, Li Shi

    Abstract: Detecting small moving objects in complex backgrounds from an overhead perspective is a highly challenging task for machine vision systems. As an inspiration from nature, the avian visual system is capable of processing motion information in various complex aerial scenes, and its Retina-OT-Rt visual circuit is highly sensitive to capturing the motion information of small objects from high altitude…

    Submitted 31 March, 2024; originally announced April 2024.

  15. arXiv:2403.19386  [pdf, other]

    cs.CV cs.AI

    PointCloud-Text Matching: Benchmark Datasets and a Baseline

    Authors: Yanglin Feng, Yang Qin, Dezhong Peng, Hongyuan Zhu, Xi Peng, Peng Hu

    Abstract: In this paper, we present and study a new instance-level retrieval task: PointCloud-Text Matching (PTM), which aims to find the exact cross-modal instance that matches a given point-cloud query or text query. PTM could be applied to various scenarios, such as indoor/urban-canyon localization and scene retrieval. However, there exists no suitable and targeted dataset for PTM in practice. Therefore,…

    Submitted 28 March, 2024; originally announced March 2024.

  16. Towards Balanced RGB-TSDF Fusion for Consistent Semantic Scene Completion by 3D RGB Feature Completion and a Classwise Entropy Loss Function

    Authors: Laiyan Ding, Panwen Hu, Jie Li, Rui Huang

    Abstract: Semantic Scene Completion (SSC) aims to jointly infer semantics and occupancies of 3D scenes. Truncated Signed Distance Function (TSDF), a 3D encoding of depth, has been a common input for SSC. Furthermore, RGB-TSDF fusion seems promising since these two modalities provide color and geometry information, respectively. Nevertheless, RGB-TSDF fusion has been considered nontrivial and commonly-used…

    Submitted 25 March, 2024; originally announced March 2024.

  17. arXiv:2403.13936  [pdf, other]

    cs.NI

    Secure and Efficient Group Handover Protocol in 5G Non-Terrestrial Networks

    Authors: Bohan Zhang, Peng Hu, Ahmad Akbari Azirani, Mohammad A. Salahuddin, Diogo Barradas, Noura Limam, Raouf Boutaba

    Abstract: The growing low-Earth orbit (LEO) satellite constellations have become an essential part of the fifth-generation (5G) non-terrestrial network (NTN) market. These satellites can enable direct-to-cell connectivity for mobile devices and support various applications with ubiquitous coverage for 5G and beyond networks. However, satellite-based NTNs bring several challenges to the 5G handover protocol…

    Submitted 20 March, 2024; originally announced March 2024.

    Comments: Accepted by the 2024 IEEE International Conference on Communications (ICC), 9-13 June 2024, Denver, CO, USA

  18. arXiv:2403.11549  [pdf, other]

    cs.CV

    Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters

    Authors: Jiazuo Yu, Yunzhi Zhuge, Lu Zhang, Ping Hu, Dong Wang, Huchuan Lu, You He

    Abstract: Continual learning can empower vision-language models to continuously acquire new knowledge, without the need for access to the entire historical dataset. However, mitigating the performance degradation in large-scale models is non-trivial due to (i) parameter shifts throughout lifelong learning and (ii) significant computational burdens associated with full-model tuning. In this work, we present…

    Submitted 3 June, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: This work is accepted by CVPR2024. More modifications may be performed

  19. arXiv:2403.08292  [pdf, other]

    math.NA cs.AI math.DS

    Weak Collocation Regression for Inferring Stochastic Dynamics with Lévy Noise

    Authors: Liya Guo, Liwei Lu, Zhijun Zeng, Pipi Hu, Yi Zhu

    Abstract: With the rapid increase of observational, experimental and simulated data for stochastic systems, tremendous efforts have been devoted to identifying governing laws underlying the evolution of these systems. Despite the broad applications of non-Gaussian fluctuations in numerous physical phenomena, the data-driven approaches to extracting stochastic dynamics with Lévy noise are relatively few. In…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: 19 pages, 5 figures, 10 tables

  20. arXiv:2403.07153  [pdf, other]

    cs.CV

    2023 Low-Power Computer Vision Challenge (LPCVC) Summary

    Authors: Leo Chen, Benjamin Boardley, Ping Hu, Yiru Wang, Yifan Pu, Xin Jin, Yongqiang Yao, Ruihao Gong, Bo Li, Gao Huang, Xianglong Liu, Zifu Wan, Xinwang Chen, Ning Liu, Ziyi Zhang, Dongping Liu, Ruijie Shan, Zhengping Che, Fachao Zhang, Xiaofeng Mou, Jian Tang, Maxim Chuprov, Ivan Malofeev, Alexander Goncharenko, Andrey Shcherbin, et al. (5 additional authors not shown)

    Abstract: This article describes the 2023 IEEE Low-Power Computer Vision Challenge (LPCVC). Since 2015, LPCVC has been an international competition devoted to tackling the challenge of computer vision (CV) on edge devices. Most CV researchers focus on improving accuracy, at the expense of ever-growing sizes of machine models. LPCVC balances accuracy with resource requirements. Winners must achieve high accu…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: LPCVC 2023, website: https://lpcv.ai/

  21. arXiv:2403.05002  [pdf, other]

    cs.RO

    LHMap-loc: Cross-Modal Monocular Localization Using LiDAR Point Cloud Heat Map

    Authors: Xinrui Wu, Jianbo Xu, Puyuan Hu, Guangming Wang, Hesheng Wang

    Abstract: Localization using a monocular camera in the pre-built LiDAR point cloud map has drawn increasing attention in the field of autonomous driving and mobile robotics. However, there are still many challenges (e.g. difficulties of map storage, poor localization robustness in large scenes) in accurately and efficiently implementing cross-modal localization. To solve these problems, a novel pipeline ter…

    Submitted 10 March, 2024; v1 submitted 7 March, 2024; originally announced March 2024.

    Comments: Accepted by 2024 IEEE International Conference on Robotics and Automation (ICRA 2024)

  22. arXiv:2402.16297  [pdf, other]

    cs.LG cs.AI

    A Poisson-Gamma Dynamic Factor Model with Time-Varying Transition Dynamics

    Authors: Jiahao Wang, Sikun Yang, Heinz Koeppl, Xiuzhen Cheng, Pengfei Hu, Guoming Zhang

    Abstract: Probabilistic approaches for handling count-valued time sequences have attracted considerable research attention because of their ability to infer explainable latent structures and to estimate uncertainties, and are thus especially suitable for dealing with noisy and incomplete count data. Among these models, Poisson-Gamma Dynamical Systems (PGDSs) are proven to be effective in capturing…

    Submitted 23 May, 2024; v1 submitted 25 February, 2024; originally announced February 2024.

  23. arXiv:2402.15141  [pdf, ps, other]

    math.NA cs.LG

    A note on the adjoint method for neural ordinary differential equation network

    Authors: Pipi Hu

    Abstract: Perturbation and operator adjoint methods are used to derive the correct adjoint form rigorously. From the derivation, we obtain the following results: 1) The loss gradient is not an ODE; it is an integral, and we show the reason; 2) The traditional adjoint form is not equivalent to the back-propagation results. 3) The adjoint operator analysis shows that if and only if the discrete adjoint has the sa…

    Submitted 23 February, 2024; originally announced February 2024.

  24. arXiv:2401.11818  [pdf, other]

    cs.MM

    MInD: Improving Multimodal Sentiment Analysis via Multimodal Information Disentanglement

    Authors: Weichen Dai, Xingyu Li, Pengbo Hu, Zeyu Wang, Ji Qi, Jianlin Peng, Yi Zhou

    Abstract: Learning effective joint representations has been a central task in multimodal sentiment analysis. Previous methods focus on leveraging the correlations between different modalities and enhancing performance through sophisticated fusion techniques. However, challenges still exist due to the inherent heterogeneity of distinct modalities, which may lead to a distributional gap, impeding the full explo…

    Submitted 22 January, 2024; originally announced January 2024.

  25. arXiv:2401.10370  [pdf, other]

    q-fin.CP cs.LG q-fin.RM q-fin.ST

    Deep Generative Modeling for Financial Time Series with Application in VaR: A Comparative Review

    Authors: Lars Ericson, Xuejun Zhu, Xusi Han, Rao Fu, Shuang Li, Steve Guo, Ping Hu

    Abstract: In the financial services industry, forecasting the risk factor distribution conditional on the history and the current market environment is the key to market risk modeling in general and value at risk (VaR) model in particular. As one of the most widely adopted VaR models in commercial banks, Historical simulation (HS) uses the empirical distribution of daily returns in a historical window as th…

    Submitted 18 January, 2024; originally announced January 2024.

  26. arXiv:2401.07842  [pdf, ps, other]

    cs.NI

    Closing the Performance and Management Gaps with Satellite Internet: Challenges, Approaches, and Future Directions

    Authors: Peng Hu

    Abstract: Recent advancements in low-Earth orbit (LEO) satellites represented by large constellations and advanced payloads hold great promise for enabling beyond 5G and 6G telecommunications and high-quality and ubiquitous Internet connectivity to everyone anywhere on Earth. LEO satellite networks are envisioned to bridge the urban-rural connectivity gap for the digital divide. However, the digital div…

    Submitted 15 January, 2024; originally announced January 2024.

    Comments: Published at the IAB Workshop on Barriers to Internet Access of Services (BIAS) 2024. Available at: https://www.ietf.org/slides/slides-biasws-closing-the-performance-and-management-gaps-with-satellite-internet-challenges-approaches-and-future-directions-01.pdf

  27. arXiv:2401.06786  [pdf, other]

    cs.DC cs.AI

    CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation

    Authors: Yifei Xu, Yuning Chen, Xumiao Zhang, Xianshang Lin, Pan Hu, Yunfei Ma, Songwu Lu, Wan Du, Zhuoqing Mao, Ennan Zhai, Dennis Cai

    Abstract: Amid the thriving ecosystem of cloud computing and the proliferation of Large Language Model (LLM)-based code generation tools, there is a lack of benchmarks for code generation in cloud-native applications. In response to this need, we present CloudEval-YAML, a practical benchmark for cloud configuration generation. CloudEval-YAML tackles the diversity challenge by focusing on YAML, the de fac…

    Submitted 9 November, 2023; originally announced January 2024.

  28. arXiv:2401.02869  [pdf, ps, other]

    cs.LO

    Practical Reasoning in DatalogMTL

    Authors: Dingmin Wang, Przemysław A. Wałęga, Pan Hu, Bernardo Cuenca Grau

    Abstract: DatalogMTL is an extension of Datalog with metric temporal operators that has found an increasing number of applications in recent years. Reasoning in DatalogMTL is, however, of high computational complexity, which makes reasoning in modern data-intensive applications challenging. In this paper we present a practical reasoning algorithm for the full DatalogMTL language, which we have implemented i…

    Submitted 5 January, 2024; originally announced January 2024.

    Comments: Under consideration in Theory and Practice of Logic Programming (TPLP). arXiv admin note: text overlap with arXiv:2208.07100

  29. arXiv:2401.01077  [pdf, other]

    cs.LG

    Constrained Online Two-stage Stochastic Optimization: Algorithm with (and without) Predictions

    Authors: Piao Hu, Jiashuo Jiang, Guodong Lyu, Hao Su

    Abstract: We consider an online two-stage stochastic optimization with long-term constraints over a finite horizon of $T$ periods. At each period, we take the first-stage action, observe a model parameter realization and then take the second-stage action from a feasible set that depends both on the first-stage decision and the model parameter. We aim to minimize the cumulative objective value while guarante…

    Submitted 2 January, 2024; originally announced January 2024.

    Comments: arXiv admin note: substantial text overlap with arXiv:2302.00997

  30. arXiv:2401.00435  [pdf, other]

    cs.CV cs.AI

    Bidirectional Trained Tree-Structured Decoder for Handwritten Mathematical Expression Recognition

    Authors: Hanbo Cheng, Chenyu Liu, Pengfei Hu, Zhenrong Zhang, Jiefeng Ma, Jun Du

    Abstract: The Handwritten Mathematical Expression Recognition (HMER) task is a critical branch in the field of OCR. Recent studies have demonstrated that incorporating bidirectional context information significantly improves the performance of HMER models. However, existing methods fail to effectively utilize bidirectional context information during the inference stage. Furthermore, current bidirectional tr…

    Submitted 31 December, 2023; originally announced January 2024.

  31. arXiv:2312.11297  [pdf, other]

    cs.DB cs.AI

    Optimised Storage for Datalog Reasoning

    Authors: Xinyue Zhang, Pan Hu, Yavor Nenov, Ian Horrocks

    Abstract: Materialisation facilitates Datalog reasoning by precomputing all consequences of the facts and the rules so that queries can be directly answered over the materialised facts. However, storing all materialised facts may be infeasible in practice, especially when the rules are complex and the given set of facts is large. We observe that for certain combinations of rules, there exist data structures…

    Submitted 19 December, 2023; v1 submitted 18 December, 2023; originally announced December 2023.

    Comments: 19 pages

  32. arXiv:2312.05583  [pdf, other]

    cs.LG cs.AI math.NA

    Better Neural PDE Solvers Through Data-Free Mesh Movers

    Authors: Peiyan Hu, Yue Wang, Zhi-Ming Ma

    Abstract: Recently, neural networks have been extensively employed to solve partial differential equations (PDEs) in physical system modeling. While major studies focus on learning system evolution on predefined static mesh discretizations, some methods utilize reinforcement learning or supervised learning techniques to create adaptive and dynamic meshes, due to the dynamic nature of these systems. However,…

    Submitted 19 February, 2024; v1 submitted 9 December, 2023; originally announced December 2023.

  33. arXiv:2312.04038  [pdf, other]

    cs.LG math.DS math.NA

    Reconstruction of dynamical systems from data without time labels

    Authors: Zhijun Zeng, Pipi Hu, Chenglong Bao, Yi Zhu, Zuoqiang Shi

    Abstract: In this paper, we study the method to reconstruct dynamical systems from data without time labels. Data without time labels appear in many applications, such as molecular dynamics, single-cell RNA sequencing etc. Reconstruction of dynamical system from time sequence data has been studied extensively. However, these methods do not apply if time labels are unknown. Without time labels, sequence data…

    Submitted 8 April, 2024; v1 submitted 6 December, 2023; originally announced December 2023.

  34. arXiv:2312.03018  [pdf, other]

    cs.CV

    DreamVideo: High-Fidelity Image-to-Video Generation with Image Retention and Text Guidance

    Authors: Cong Wang, Jiaxi Gu, Panwen Hu, Songcen Xu, Hang Xu, Xiaodan Liang

    Abstract: Image-to-video generation, which aims to generate a video starting from a given reference image, has drawn great attention. Existing methods try to extend pre-trained text-guided image diffusion models to image-guided video generation models. Nevertheless, these methods often result in either low fidelity or flickering over time due to their limitation to shallow image guidance and poor temporal c…

    Submitted 12 December, 2023; v1 submitted 4 December, 2023; originally announced December 2023.

  35. arXiv:2312.00823  [pdf, other]

    cs.LG cs.AI cs.CV

    Adaptive Multi-Modality Prompt Learning

    Authors: Zongqian Wu, Yujing Liu, Mengmeng Zhan, Jialie Shen, Ping Hu, Xiaofeng Zhu

    Abstract: Although current prompt learning methods have successfully been designed to effectively reuse the large pre-trained models without fine-tuning their large number of parameters, they still have limitations to be addressed, i.e., without considering the adverse impact of meaningless patches in every image and without simultaneously considering in-sample generalization and out-of-sample generalizatio…

    Submitted 30 November, 2023; originally announced December 2023.

  36. Decoupling and Interacting Multi-Task Learning Network for Joint Speech and Accent Recognition

    Authors: Qijie Shao, Pengcheng Guo, Jinghao Yan, Pengfei Hu, Lei Xie

    Abstract: Accents, as variations from standard pronunciation, pose significant challenges for speech recognition systems. Although joint automatic speech recognition (ASR) and accent recognition (AR) training has been proven effective in handling multi-accent scenarios, current multi-task ASR-AR approaches overlook the granularity differences between tasks. Fine-grained units capture pronunciation-related a…

    Submitted 17 November, 2023; v1 submitted 12 November, 2023; originally announced November 2023.

    Comments: Accepted by IEEE Transactions on Audio, Speech and Language Processing (TASLP)

  37. arXiv:2310.18946  [pdf, other]

    cs.CV cs.MM

    Video Frame Interpolation with Many-to-many Splatting and Spatial Selective Refinement

    Authors: Ping Hu, Simon Niklaus, Lu Zhang, Stan Sclaroff, Kate Saenko

    Abstract: In this work, we first propose a fully differentiable Many-to-Many (M2M) splatting framework to interpolate frames efficiently. Given a frame pair, we estimate multiple bidirectional flows to directly forward warp the pixels to the desired time step before fusing overlapping pixels. In doing so, each source pixel renders multiple target pixels and each target pixel can be synthesized from a larger…

    Submitted 29 October, 2023; originally announced October 2023.

    Comments: T-PAMI. arXiv admin note: substantial text overlap with arXiv:2204.03513

  38. arXiv:2310.17468  [pdf, other]

    cs.CV cs.LG

    Cross-modal Active Complementary Learning with Self-refining Correspondence

    Authors: Yang Qin, Yuan Sun, Dezhong Peng, Joey Tianyi Zhou, Xi Peng, Peng Hu

    Abstract: Recently, image-text matching has attracted more and more attention from academia and industry, which is fundamental to understanding the latent correspondence across visual and textual modalities. However, most existing methods implicitly assume the training pairs are well-aligned while ignoring the ubiquitous annotation noise, a.k.a. noisy correspondence (NC), thereby inevitably leading to a perf…

    Submitted 7 January, 2024; v1 submitted 26 October, 2023; originally announced October 2023.

    Comments: This paper is accepted by NeurIPS 2023

  39. arXiv:2310.11989  [pdf, other]

    cs.LG

    Image Clustering with External Guidance

    Authors: Yunfan Li, Peng Hu, Dezhong Peng, Jiancheng Lv, Jianping Fan, Xi Peng

    Abstract: The core of clustering is incorporating prior knowledge to construct supervision signals. From classic k-means based on data compactness to recent contrastive clustering guided by self-supervision, the evolution of clustering methods intrinsically corresponds to the progression of supervision signals. At present, substantial efforts have been devoted to mining internal supervision signals from dat…

    Submitted 16 May, 2024; v1 submitted 18 October, 2023; originally announced October 2023.

    Journal ref: ICML 2024

  40. arXiv:2310.11598  [pdf, other]

    cs.CV

    Learning Neural Implicit through Volume Rendering with Attentive Depth Fusion Priors

    Authors: Pengchong Hu, Zhizhong Han

    Abstract: Learning neural implicit representations has achieved remarkable performance in 3D reconstruction from multi-view images. Current methods use volume rendering to render implicit representations into either RGB or depth images that are supervised by multi-view ground truth. However, rendering a view each time suffers from incomplete depth at holes and unawareness of occluded structures from the dep…

    Submitted 7 January, 2024; v1 submitted 17 October, 2023; originally announced October 2023.

    Comments: NeurIPS 2023

  41. arXiv:2310.09297  [pdf, other]

    cs.LG cs.AI cs.CL

    A Framework for Inference Inspired by Human Memory Mechanisms

    Authors: Xiangyu Zeng, Jie Lin, Piao Hu, Ruizheng Huang, Zhicheng Zhang

    Abstract: How humans and machines make sense of current inputs for relation reasoning and question-answering while putting the perceived information into context of our past memories has been a challenging conundrum in cognitive science and artificial intelligence. Inspired by human brain's memory system and cognitive architectures, we propose a PMI framework that consists of perception, memory and inferen…

    Submitted 20 May, 2024; v1 submitted 1 October, 2023; originally announced October 2023.

  42. arXiv:2309.12113  [pdf, other]

    cs.AI

    Incentivizing Massive Unknown Workers for Budget-Limited Crowdsensing: From Off-Line and On-Line Perspectives

    Authors: Feng Li, Yuqi Chai, Huan Yang, Pengfei Hu, Lingjie Duan

    Abstract: How to incentivize strategic workers using limited budget is a very fundamental problem for crowdsensing systems; nevertheless, since the sensing abilities of the workers may not always be known as prior knowledge due to the diversities of their sensor devices and behaviors, it is difficult to properly select and pay the unknown workers. Although the uncertainties of the workers can be addressed b…

    Submitted 2 January, 2024; v1 submitted 21 September, 2023; originally announced September 2023.

  43. arXiv:2309.07925

    eess.AS cs.AI cs.MM cs.SD

    Hierarchical Audio-Visual Information Fusion with Multi-label Joint Decoding for MER 2023

    Authors: Haotian Wang, Yuxuan Xi, Hang Chen, Jun Du, Yan Song, Qing Wang, Hengshun Zhou, Chenxi Wang, Jiefeng Ma, Pengfei Hu, Ya Jiang, Shi Cheng, Jie Zhang, Yuzhe Weng

    Abstract: In this paper, we propose a novel framework for recognizing both discrete and dimensional emotions. In our framework, deep features extracted from foundation models are used as robust acoustic and visual representations of raw video. Three different structures based on attention-guided feature gathering (AFG) are designed for deep feature fusion. Then, we introduce a joint decoding structure for e…

    Submitted 10 September, 2023; originally announced September 2023.

    Comments: 5 pages, 4 figures

    Journal ref: The 31st ACM International Conference on Multimedia (MM'23), 2023

  44. arXiv:2309.04814

    cs.CV

    Speech2Lip: High-fidelity Speech to Lip Generation by Learning from a Short Video

    Authors: Xiuzhe Wu, Pengfei Hu, Yang Wu, Xiaoyang Lyu, Yan-Pei Cao, Ying Shan, Wenming Yang, Zhongqian Sun, Xiaojuan Qi

    Abstract: Synthesizing realistic videos according to a given speech is still an open challenge. Previous works have been plagued by issues such as inaccurate lip shape generation and poor image quality. The key reason is that only motions and appearances on limited facial areas (e.g., lip area) are mainly driven by the input speech. Therefore, directly learning a mapping function from speech to the entire h…

    Submitted 9 September, 2023; originally announced September 2023.

  45. arXiv:2309.02399

    cs.SD cs.DL eess.AS

    The Batik-plays-Mozart Corpus: Linking Performance to Score to Musicological Annotations

    Authors: Patricia Hu, Gerhard Widmer

    Abstract: We present the Batik-plays-Mozart Corpus, a piano performance dataset combining professional Mozart piano sonata performances with expert-labelled scores at a note-precise level. The performances originate from a recording by Viennese pianist Roland Batik on a computer-monitored Bösendorfer grand piano, and are available both as MIDI files and audio recordings. They have been precisely aligned, no…

    Submitted 6 September, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

    Comments: To be published in the Proceedings of the 24th International Society for Music Information Retrieval Conference (ISMIR 2023), Milan, Italy

  46. arXiv:2308.14667

    cs.CV cs.AI

    Neural Network-Based Histologic Remission Prediction In Ulcerative Colitis

    Authors: Yemin Li, Zhongcheng Liu, Xiaoying Lou, Mirigual Kurban, Miao Li, Jie Yang, Kaiwei Che, Jiankun Wang, Max Q.-H. Meng, Yan Huang, Qin Guo, Pinjin Hu

    Abstract: BACKGROUND & AIMS: Histological remission (HR) is advocated and considered as a new therapeutic target in ulcerative colitis (UC). Diagnosis of histologic remission currently relies on biopsy; during this process, patients are at risk for bleeding, infection, and post-biopsy fibrosis. In addition, histologic response scoring is complex and time-consuming, and there is heterogeneity among pathologi…

    Submitted 28 August, 2023; originally announced August 2023.

  47. arXiv:2308.12350

    cs.CV

    Diffusion-based Image Translation with Label Guidance for Domain Adaptive Semantic Segmentation

    Authors: Duo Peng, Ping Hu, Qiuhong Ke, Jun Liu

    Abstract: Translating images from a source domain to a target domain for learning target models is one of the most common strategies in domain adaptive semantic segmentation (DASS). However, existing methods still struggle to preserve semantically-consistent local details between the original and translated images. In this work, we present an innovative approach that addresses this challenge by using source…

    Submitted 23 August, 2023; originally announced August 2023.

    Comments: Accepted to ICCV 2023

  48. arXiv:2308.11164

    cs.CV

    Decoupled Contrastive Multi-View Clustering with High-Order Random Walks

    Authors: Yiding Lu, Yijie Lin, Mouxing Yang, Dezhong Peng, Peng Hu, Xi Peng

    Abstract: Recently, some robust contrastive multi-view clustering (MvC) methods have been proposed, which construct data pairs from neighborhoods to alleviate the false negative issue, i.e., some intra-cluster samples are wrongly treated as negative pairs. Although promising performance has been achieved by these methods, the false negative issue is still far from addressed and the false positive issue eme…

    Submitted 18 January, 2024; v1 submitted 21 August, 2023; originally announced August 2023.

    Comments: Accepted by AAAI 2024

  49. arXiv:2308.09911

    cs.CV cs.MM

    Noisy-Correspondence Learning for Text-to-Image Person Re-identification

    Authors: Yang Qin, Yingke Chen, Dezhong Peng, Xi Peng, Joey Tianyi Zhou, Peng Hu

    Abstract: Text-to-image person re-identification (TIReID) is a compelling topic in the cross-modal community, which aims to retrieve the target person based on a textual query. Although numerous TIReID methods have been proposed and achieved promising performance, they implicitly assume the training image-text pairs are correctly aligned, which is not always the case in real-world scenarios. In practice, th…

    Submitted 28 March, 2024; v1 submitted 19 August, 2023; originally announced August 2023.

  50. arXiv:2308.09658

    cs.CL cs.AI cs.CV

    Tree-of-Mixed-Thought: Combining Fast and Slow Thinking for Multi-hop Visual Reasoning

    Authors: Pengbo Hu, Ji Qi, Xingyu Li, Hong Li, Xinqi Wang, Bing Quan, Ruiyu Wang, Yi Zhou

    Abstract: There emerges a promising trend of using large language models (LLMs) to generate code-like plans for complex inference tasks such as visual reasoning. This paradigm, known as LLM-based planning, provides flexibility in problem solving and endows better interpretability. However, current research is mostly limited to basic scenarios of simple questions that can be straightforwardly answered in a few…

    Submitted 20 August, 2023; v1 submitted 18 August, 2023; originally announced August 2023.

    Comments: 16 pages, 1 figure, under review