
Showing 1–50 of 1,605 results for author: Cao, J

  1. arXiv:2407.11385  [pdf, other]

    cs.RO cs.GR

    Grasping Diverse Objects with Simulated Humanoids

    Authors: Zhengyi Luo, Jinkun Cao, Sammy Christen, Alexander Winkler, Kris Kitani, Weipeng Xu

    Abstract: We present a method for controlling a simulated humanoid to grasp an object and move it to follow an object trajectory. Due to the challenges in controlling a humanoid with dexterous hands, prior methods often use a disembodied hand and only consider vertical lifts or short trajectories. This limited scope hampers their applicability for object manipulation required for animation and simulation. T…

    Submitted 16 July, 2024; originally announced July 2024.

    Comments: Project page: https://www.zhengyiluo.com/Omnigrasp/

  2. arXiv:2407.10613  [pdf, other]

    physics.plasm-ph

    Global destabilization of drift-tearing mode with coupling to discretized electron drift-wave instability

    Authors: J. Bao, W. L. Zhang, Z. Lin, H. S. Cai, D. J. Liu, H. T. Chen, C. Dong, J. T. Cao, D. Li

    Abstract: The global linear behaviors of 2/1 DTM in the collisional regime are investigated based on a concise resistive drift-MHD model. Besides DTM, extra normal modes including EDW and SAW are coupled together and destabilized in different parameter regimes by considering resistivity in this system. The EVP approach is applied for solving the eigenstate spectra with the distribution of all unstable sol…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 23 pages, 15 figures

  3. arXiv:2407.10486  [pdf, other]

    cs.AI cs.CL

    IDEAL: Leveraging Infinite and Dynamic Characterizations of Large Language Models for Query-focused Summarization

    Authors: Jie Cao, Dian Jiao, Qiang Yan, Wenqiao Zhang, Siliang Tang, Yueting Zhuang

    Abstract: Query-focused summarization (QFS) aims to produce summaries that answer particular questions of interest, enabling greater user control and personalization. Large language models (LLMs) have shown an impressive capability for textual understanding through large-scale pretraining, which implies great potential for extractive snippet generation. In this paper, we systematically i…

    Submitted 15 July, 2024; originally announced July 2024.

  4. arXiv:2407.08136  [pdf, other]

    cs.CV

    EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions

    Authors: Zhiyuan Chen, Jiajiong Cao, Zhiquan Chen, Yuming Li, Chenguang Ma

    Abstract: The area of portrait image animation, propelled by audio input, has witnessed notable progress in the generation of lifelike and dynamic portraits. Conventional methods are limited to utilizing either audio or facial keypoints to drive images into videos; while they can yield satisfactory results, certain issues exist. For instance, methods driven solely by audio can be unstable at times due to…

    Submitted 11 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

  5. arXiv:2407.07690  [pdf]

    physics.optics physics.app-ph

    High power GaSb-based distributed feedback laser with laterally coupled dielectric gratings at 1.95 μm

    Authors: Zhengqing Ding, Juntian Cao, Kun Zhan, Yihang Chen, Lidan Zhou, Hao Tan, Chenao Yang, Ying Yu, Zhichuan Niu, Siyuan Yu

    Abstract: Traditional Distributed Feedback (DFB) or Distributed Bragg Reflector (DBR) lasers typically utilize buried gratings as frequency-selective optical feedback mechanisms. However, the fabrication of such gratings often necessitates regrowth processes, which can pose technical challenges for materials platforms such as GaAs and GaSb. Metal gratings were also used for GaSb lasers but they introduce ad…

    Submitted 10 July, 2024; originally announced July 2024.

    Comments: 9 pages, 7 figures, 1 table

    MSC Class: 78A60 ACM Class: J.2.6

  6. Heat transfer enhancement by mist/air two-phase flow in a high-temperature channel

    Authors: Junxian Cao, Mengqi Ye, Haiwang Li, Tianyou Wang, Zhizhao Che

    Abstract: Mist/air two-phase flow is a promising cooling technique for many applications such as internal cooling of gas turbine blades. A significant enhancement of heat transfer can be achieved with a low mass fraction of droplets by utilizing the latent heat of the droplets. Using newly designed atomizers to accurately control the mist droplets, this study experimentally explores the heat transfer perfor…

    Submitted 6 July, 2024; originally announced July 2024.

    Comments: 16 pages, 10 figures

    Journal ref: International Journal of Heat and Mass Transfer, Volume 193, 1 September 2022, 122966

  7. arXiv:2407.03937  [pdf, other]

    cs.CL

    TongGu: Mastering Classical Chinese Understanding with Knowledge-Grounded Large Language Models

    Authors: Jiahuan Cao, Dezhi Peng, Peirong Zhang, Yongxin Shi, Yang Liu, Kai Ding, Lianwen Jin

    Abstract: Classical Chinese is a gateway to the rich heritage and wisdom of ancient China, yet its complexities pose formidable comprehension barriers for most modern people without specialized knowledge. While Large Language Models (LLMs) have shown remarkable capabilities in Natural Language Processing (NLP), they struggle with Classical Chinese Understanding (CCU), especially in data-demanding and knowle…

    Submitted 4 July, 2024; originally announced July 2024.

  8. arXiv:2407.02759  [pdf]

    cs.LG cs.AI

    Multi-Scenario Combination Based on Multi-Agent Reinforcement Learning to Optimize the Advertising Recommendation System

    Authors: Yang Zhao, Chang Zhou, Jin Cao, Yi Zhao, Shaobo Liu, Chiyu Cheng, Xingchen Li

    Abstract: This paper explores multi-scenario optimization on large platforms using multi-agent reinforcement learning (MARL). We address this by treating scenarios like search, recommendation, and advertising as a cooperative, partially observable multi-agent decision problem. We introduce the Multi-Agent Recurrent Deterministic Policy Gradient (MARDPG) algorithm, which aligns different scenarios under a sh…

    Submitted 2 July, 2024; originally announced July 2024.

    Comments: Accepted by the 2024 5th International Conference on Artificial Intelligence and Electromechanical Automation, IEEE (ISBN: 979-8-3503-6617-4)

  9. arXiv:2407.00692  [pdf, ps, other]

    math.AG

    The motivic fundamental group of a punctured elliptic curve and algebraic cycles

    Authors: Jin Cao, Tomohide Terasoma

    Abstract: In this paper, we consider the motivic fundamental group of the punctured elliptic curves as a DG complex in the DG category of elliptic motives and describe its resolution via Schur complexes. During this process, we find the algebraic cycles analogous to the Bloch-Totaro cycles.

    Submitted 30 June, 2024; originally announced July 2024.

    MSC Class: 14C15; 14C25

  10. arXiv:2407.00639  [pdf, other]

    astro-ph.HE

    GRB 221009A/SN 2022xiw: A Supernova Obscured by a Gamma-Ray Burst Afterglow?

    Authors: De-Feng Kong, Xiang-Gao Wang, WeiKang Zheng, Hou-Jun Lü, L. P. Xin, Da-Bin Lin, Jia-Xin Cao, Ming-Xuan Lu, B. Ren, Edgar P. Vidal, J. Y. Wei, En-Wei Liang, Alexei V. Filippenko

    Abstract: We present optical photometry for the afterglow of GRB 221009A, in some respects the most extraordinary gamma-ray burst (GRB) ever observed. A good-quality R-band light curve is obtained, covering 0.32–19.57 days after the Fermi-GBM trigger. We find that a weak bump emerges from the declining afterglow at $t \approx 11$ days; a supernova (SN) may be responsible. We use a smooth broken power-la…

    Submitted 30 June, 2024; originally announced July 2024.

  11. arXiv:2407.00187  [pdf, other]

    cs.RO cs.CV cs.GR

    SMPLOlympics: Sports Environments for Physically Simulated Humanoids

    Authors: Zhengyi Luo, Jiashun Wang, Kangni Liu, Haotian Zhang, Chen Tessler, Jingbo Wang, Ye Yuan, Jinkun Cao, Zihui Lin, Fengyi Wang, Jessica Hodgins, Kris Kitani

    Abstract: We present SMPLOlympics, a collection of physically simulated environments that allow humanoids to compete in a variety of Olympic sports. Sports simulation offers a rich and standardized testing ground for evaluating and improving the capabilities of learning algorithms due to the diversity and physically demanding nature of athletic activities. As humans have been competing in these sports for m…

    Submitted 28 June, 2024; originally announced July 2024.

    Comments: Project page: https://smplolympics.github.io/SMPLOlympics

  12. arXiv:2406.19969  [pdf, other]

    q-bio.QM

    Enhancing Terrestrial Net Primary Productivity Estimation with EXP-CASA: A Novel Light Use Efficiency Model Approach

    Authors: Guanzhou Chen, Kaiqi Zhang, Xiaodong Zhang, Hong Xie, Haobo Yang, Xiaoliang Tan, Tong Wang, Yule Ma, Qing Wang, Jinzhou Cao, Weihong Cui

    Abstract: The Light Use Efficiency model, epitomized by the CASA model, is extensively applied in the quantitative estimation of vegetation Net Primary Productivity. However, the classic CASA model is marked by significant complexity: the estimation of environmental stress parameters, in particular, necessitates multi-source observation data, adding to the complexity and uncertainty of the model's operation…

    Submitted 28 June, 2024; originally announced June 2024.

  13. arXiv:2406.19434  [pdf, other]

    cs.GR cs.AI

    Lightweight Predictive 3D Gaussian Splats

    Authors: Junli Cao, Vidit Goel, Chaoyang Wang, Anil Kag, Ju Hu, Sergei Korolev, Chenfanfu Jiang, Sergey Tulyakov, Jian Ren

    Abstract: Recent approaches representing 3D objects and scenes using Gaussian splats show increased rendering speed across a variety of platforms and devices. While rendering such representations is indeed extremely efficient, storing and transmitting them is often prohibitively expensive. To represent large-scale scenes, one often needs to store millions of 3D Gaussians, occupying gigabytes of disk space.…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Project Page: https://plumpuddings.github.io/LPGS//

  14. arXiv:2406.18069  [pdf, other]

    eess.SP cs.AI cs.CL

    Large Language Models for Cuffless Blood Pressure Measurement From Wearable Biosignals

    Authors: Zengding Liu, Chen Chen, Jiannong Cao, Minglei Pan, Jikui Liu, Nan Li, Fen Miao, Ye Li

    Abstract: Large language models (LLMs) have captured significant interest from both academia and industry due to their impressive performance across various textual tasks. However, the potential of LLMs to analyze physiological time-series data remains an emerging research field. Particularly, there is a notable gap in the utilization of LLMs for analyzing wearable biosignals to achieve cuffless blood press…

    Submitted 4 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

  15. arXiv:2406.17624  [pdf, other]

    cs.CL cs.AI

    Self-assessment, Exhibition, and Recognition: a Review of Personality in Large Language Models

    Authors: Zhiyuan Wen, Yu Yang, Jiannong Cao, Haoming Sun, Ruosong Yang, Shuaiqi Liu

    Abstract: As large language models (LLMs) appear to behave in increasingly human-like ways in text-based interactions, a growing number of researchers have become interested in investigating personality in LLMs. However, the diversity of psychological personality research and the rapid development of LLMs have led to a broad yet fragmented landscape of studies in this interdisciplinary field. Extensive studies across differen…

    Submitted 25 June, 2024; originally announced June 2024.

  16. arXiv:2406.17309  [pdf, other]

    cs.CV

    Zero-Shot Long-Form Video Understanding through Screenplay

    Authors: Yongliang Wu, Bozheng Li, Jiawang Cao, Wenbo Zhu, Yi Lu, Weiheng Chi, Chuyun Xie, Haolin Zheng, Ziyue Su, Jay Wu, Xu Yang

    Abstract: The Long-form Video Question-Answering task requires the comprehension and analysis of extended video content to respond accurately to questions by utilizing both temporal and contextual information. In this paper, we present MM-Screenplayer, an advanced video understanding system with multi-modal perception capabilities that can convert any video into textual screenplay representations. Unlike pr…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: Highest Score Award in the CVPR'2024 LOVEU Track 1 Challenge

  17. arXiv:2406.17307  [pdf, other]

    stat.CO

    Scalable Sampling of Truncated Multivariate Normals Using Sequential Nearest-Neighbor Approximation

    Authors: Jian Cao, Matthias Katzfuss

    Abstract: We propose a linear-complexity method for sampling from truncated multivariate normal (TMVN) distributions with high fidelity by applying nearest-neighbor approximations to a product-of-conditionals decomposition of the TMVN density. To make the sequential sampling based on the decomposition feasible, we introduce a novel method that avoids the intractable high-dimensional TMVN distribution by sam…

    Submitted 25 June, 2024; originally announced June 2024.

  18. arXiv:2406.16245  [pdf]

    cond-mat.mtrl-sci

    Single-Layer Fe-Cu Interphase in Ferritic Steels Stabilized by Magnetic Friedel Oscillations

    Authors: Wen-Qiang Xie, Jin-Li Cao, Wen-Tong Geng

    Abstract: Copper precipitation is a technique extensively deployed in steel strengthening. Being as tiny as a few nanometers in diameter, the Cu precipitates present a real challenge to experimental techniques in determining their composition. The late Professor Morris Fine called it a mystery when addressing the discrepancy between the fact of low solubility of Fe in bulk Cu and the remarkable content…

    Submitted 23 June, 2024; originally announced June 2024.

    Comments: 17 pages, 11 figures

  19. arXiv:2406.15781  [pdf, other]

    cs.CL

    DABL: Detecting Semantic Anomalies in Business Processes Using Large Language Models

    Authors: Wei Guan, Jian Cao, Jianqi Gao, Haiyan Zhao, Shiyou Qian

    Abstract: Detecting anomalies in business processes is crucial for ensuring operational success. While many existing methods rely on statistical frequency to detect anomalies, it's important to note that infrequent behavior doesn't necessarily imply undesirability. To address this challenge, detecting anomalies from a semantic viewpoint proves to be a more effective approach. However, current semantic anoma…

    Submitted 22 June, 2024; originally announced June 2024.

  20. arXiv:2406.15769  [pdf, other]

    cs.DC

    Humas: A Heterogeneity- and Upgrade-aware Microservice Auto-scaling Framework in Large-scale Data Centers

    Authors: Qin Hua, Dingyu Yang, Shiyou Qian, Jian Cao, Guangtao Xue, Minglu Li

    Abstract: An effective auto-scaling framework is essential for microservices to ensure performance stability and resource efficiency under dynamic workloads. As revealed by many prior studies, the key to efficient auto-scaling lies in accurately learning performance patterns, i.e., the relationship between performance metrics and workloads in data-driven schemes. However, we notice that there are two signif…

    Submitted 22 June, 2024; originally announced June 2024.

    Comments: 14 pages; 27 figures

  21. arXiv:2406.14644  [pdf, other]

    cs.CL

    Unveiling the Spectrum of Data Contamination in Language Models: A Survey from Detection to Remediation

    Authors: Chunyuan Deng, Yilun Zhao, Yuzhao Heng, Yitong Li, Jiannan Cao, Xiangru Tang, Arman Cohan

    Abstract: Data contamination has garnered increased attention in the era of large language models (LLMs) due to the reliance on extensive internet-derived training corpora. The issue of training corpus overlap with evaluation benchmarks--referred to as contamination--has been the focus of significant recent research. This body of work aims to identify contamination, understand its impacts, and explore mitig…

    Submitted 20 June, 2024; originally announced June 2024.

    Comments: ACL 2024 Camera-Ready Version

  22. arXiv:2406.14558  [pdf, other]

    cs.RO cs.AI

    CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics

    Authors: Jiawei Gao, Ziqin Wang, Zeqi Xiao, Jingbo Wang, Tai Wang, Jinkun Cao, Xiaolin Hu, Si Liu, Jifeng Dai, Jiangmiao Pang

    Abstract: Recent years have seen significant advancements in humanoid control, largely due to the availability of large-scale motion capture data and the application of reinforcement learning methodologies. However, many real-world tasks, such as moving large and heavy furniture, require multi-character collaboration. Given the scarcity of data on multi-character collaboration and the efficiency challenges…

    Submitted 20 June, 2024; originally announced June 2024.

  23. arXiv:2406.13201  [pdf, other]

    cs.LG cs.SI

    Toward Structure Fairness in Dynamic Graph Embedding: A Trend-aware Dual Debiasing Approach

    Authors: Yicong Li, Yu Yang, Jiannong Cao, Shuaiqi Liu, Haoran Tang, Guandong Xu

    Abstract: Recent studies successfully learned static graph embeddings that are structurally fair by preventing the effectiveness disparity of high- and low-degree vertex groups in downstream graph mining tasks. However, achieving structure fairness in dynamic graph embedding remains an open problem. Neglecting degree changes in dynamic graphs will significantly impair embedding effectiveness without notably…

    Submitted 19 June, 2024; originally announced June 2024.

  24. arXiv:2406.12912  [pdf, other]

    physics.ins-det hep-ex

    Burn-in Test and Thermal Performance Evaluation of Silicon Photomultipliers for the JUNO-TAO Experiment

    Authors: X. Chen, G. F. Cao, M. H. Qu, H. W. Wang, N. Anfimov, A. Rybnikov, J. Y. Xu, A. Q. Su, Z. L. Chen, J. Cao, Y. C. Li, M. Qi

    Abstract: This study evaluates more than 4,000 tiles made of Hamamatsu visual-sensitive silicon photomultiplier (SiPM), each with dimensions of 5 $\times$ 5 cm$^2$, intended for the central detector of the Taishan Anti-neutrino Observatory (TAO), a satellite experiment of the Jiangmen Underground Neutrino Observatory (JUNO) aimed at measuring the reactor anti-neutrino energy spectrum with unprecedented energ…

    Submitted 13 June, 2024; originally announced June 2024.

    Comments: 15 pages, 15 figures, submitted to JINST

    Report number: JUNO-doc-11626

  25. arXiv:2406.12902  [pdf, other]

    cs.LG cs.AI cs.PL cs.SE

    Can AI Beat Undergraduates in Entry-level Java Assignments? Benchmarking Large Language Models on JavaBench

    Authors: Jialun Cao, Zhiyong Chen, Jiarong Wu, Shing-chi Cheung, Chang Xu

    Abstract: Code generation benchmarks such as HumanEval are widely adopted to evaluate LLMs' capabilities. However, after consolidating the latest 24 benchmarks, we noticed three significant imbalances. First, programming languages are imbalanced: 95.8% of benchmarks involve Python, while only 5 benchmarks involve Java. Second, code granularity is imbalanced: function-/statement-level benchmarks account for over 83.…

    Submitted 10 June, 2024; originally announced June 2024.

  26. arXiv:2406.12380  [pdf, other]

    hep-ex physics.ins-det

    Search for fractionally charged particles with CUORE

    Authors: CUORE Collaboration, D. Q. Adams, C. Alduino, K. Alfonso, F. T. Avignone III, O. Azzolini, G. Bari, F. Bellini, G. Benato, M. Beretta, M. Biassoni, A. Branca, C. Brofferio, C. Bucci, J. Camilleri, A. Caminata, A. Campani, J. Cao, S. Capelli, C. Capelli, L. Cappelli, L. Cardani, P. Carniti, N. Casali, E. Celi, et al. (95 additional authors not shown)

    Abstract: The Cryogenic Underground Observatory for Rare Events (CUORE) is a detector array composed of 988 5$\;$cm$\times$5$\;$cm$\times$5$\;$cm TeO$_2$ crystals held below 20 mK, primarily searching for neutrinoless double-beta decay in $^{130}$Te. Unprecedented in size amongst cryogenic calorimetric experiments, CUORE provides a promising setting for the study of exotic through-going particles. Using th…

    Submitted 18 June, 2024; originally announced June 2024.

    Comments: 7 pages, 5 figures

  27. arXiv:2406.12200  [pdf, other]

    cs.LG cs.DC cs.ET cs.MM cs.NE

    SFedCA: Credit Assignment-Based Active Client Selection Strategy for Spiking Federated Learning

    Authors: Qiugang Zhan, Jinbo Cao, Xiurui Xie, Malu Zhang, Huajin Tang, Guisong Liu

    Abstract: Spiking federated learning is an emerging distributed learning paradigm that allows resource-constrained devices to train collaboratively at low power consumption without exchanging local data. It takes advantage of both the privacy computation property in federated learning (FL) and the energy efficiency in spiking neural networks (SNN). Thus, it is highly promising to revolutionize the efficient…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: 9 pages

  28. arXiv:2406.11517  [pdf, other]

    cs.LG cs.AI

    Revisiting Spurious Correlation in Domain Generalization

    Authors: Bin Qin, Jiangmeng Li, Yi Li, Xuesong Wu, Yupeng Wang, Wenwen Qiang, Jianwen Cao

    Abstract: In general, existing machine learning techniques may learn spurious correlations that depend on the domain, which degrades the generalization of models in out-of-distribution (OOD) scenarios. To address this issue, recent works build a structural causal model (SCM) to describe the causality within the data generation process, thereby motivating methods to avoid the learning of spurious…

    Submitted 17 June, 2024; originally announced June 2024.

  29. arXiv:2406.11501  [pdf, other]

    cs.LG cs.AI stat.ME

    Teleporter Theory: A General and Simple Approach for Modeling Cross-World Counterfactual Causality

    Authors: Jiangmeng Li, Bin Qin, Qirui Ji, Yi Li, Wenwen Qiang, Jianwen Cao, Fanjiang Xu

    Abstract: Leveraging the development of structural causal models (SCMs), researchers can establish graphical models for exploring the causal mechanisms behind machine learning techniques. As the complexity of machine learning applications rises, single-world interventional causal analysis encounters theoretical adaptation limitations. Accordingly, the cross-world counterfactual approach extends our understanding…

    Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

  30. arXiv:2406.11180  [pdf, other]

    cond-mat.mes-hall cond-mat.mtrl-sci

    Definition and Frequency Dependence of Intrinsic Nonlinear Current

    Authors: Cong Xiao, Jin Cao, Qian Niu, Shengyuan A. Yang

    Abstract: We show that the three commonly employed approaches that define the same intrinsic linear anomalous Hall response actually lead to different results for intrinsic nonlinear transport. The difference arises from an intrinsic anomalous distribution. It originates from scattering, but its value is completely independent of scattering, because it represents the local equilibration of electron wave pac…

    Submitted 16 June, 2024; originally announced June 2024.

  31. arXiv:2406.11087  [pdf, other]

    cs.CR cs.AI cs.CL cs.LG

    MemDPT: Differential Privacy for Memory Efficient Language Models

    Authors: Yanming Liu, Xinyue Peng, Jiannan Cao, Yuwei Zhang, Chen Ma, Songhang Deng, Mengchen Fu, Xuhong Zhang, Sheng Cheng, Xun Wang, Jianwei Yin, Tianyu Du

    Abstract: Large language models have consistently demonstrated remarkable performance across a wide spectrum of applications. Nonetheless, the deployment of these models can inadvertently expose user privacy to potential risks. The substantial memory demands of these models during training represent a significant resource consumption challenge. The sheer size of these models imposes a considerable burden on…

    Submitted 20 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

    Comments: 12 pages, first version

  32. arXiv:2406.10744  [pdf, other]

    cs.CV

    Technique Report of CVPR 2024 PBDL Challenges

    Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, Cong Li, Senyan Xu, et al. (75 additional authors not shown)

    Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a…

    Submitted 12 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

    Comments: CVPR 2024 PBDL Challenges: https://pbdl-ws.github.io/pbdl2024/challenge/index.html

  33. arXiv:2406.10424  [pdf, other]

    cs.CV cs.AI

    What is the Visual Cognition Gap between Humans and Multimodal LLMs?

    Authors: Xu Cao, Bolin Lai, Wenqian Ye, Yunsheng Ma, Joerg Heintz, Jintai Chen, Jianguo Cao, James M. Rehg

    Abstract: Recently, Multimodal Large Language Models (MLLMs) have shown great promise in language-guided perceptual tasks such as recognition, segmentation, and object detection. However, their effectiveness in addressing visual cognition problems that require high-level reasoning is not well-established. One such challenge is abstract visual reasoning (AVR) -- the cognitive ability to discern relationships…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: 14 pages, 4 figures, the appendix will be updated soon

    MSC Class: 68T01

  34. arXiv:2406.10239  [pdf]

    cs.IR cs.LG

    Predict Click-Through Rates with Deep Interest Network Model in E-commerce Advertising

    Authors: Chang Zhou, Yang Zhao, Yuelin Zou, Jin Cao, Wenhan Fan, Yi Zhao, Chiyu Cheng

    Abstract: This paper proposes new methods to enhance click-through rate (CTR) prediction models using the Deep Interest Network (DIN) model, specifically applied to the advertising system of Alibaba's Taobao platform. Unlike traditional deep learning approaches, this research focuses on localized user behavior activation for tailored ad targeting by leveraging extensive user behavior data. Compared to tradi…

    Submitted 4 June, 2024; originally announced June 2024.

    Comments: Accepted by the 5th International Conference on Information Science, Parallel and Distributed Systems (ISPDS 2024), 2024 IEEE

  35. arXiv:2406.09828  [pdf, other]

    cs.RO

    Dynamic Decentralized 3D Urban Coverage and Patrol with UAVs

    Authors: Wai Lun Leong, Jiawei Cao, Rodney Teo

    Abstract: In the event of natural or man-made disasters in an urban environment, such as fires, floods, and earthquakes, a swarm of unmanned aerial vehicles (UAVs) can rapidly sweep and provide coverage to monitor the area of interest and locate survivors. We propose a modular framework and patrol strategy that enables a swarm of UAVs to perform cooperative and periodic coverage in such scenarios. Our appro…

    Submitted 14 June, 2024; originally announced June 2024.

    Comments: Accepted for the 2024 International Conference on Unmanned Aircraft Systems (ICUAS 2024) in Chania, Greece

  36. arXiv:2406.09779  [pdf, other]

    cs.AI cs.CL cs.CV

    OSPC: Detecting Harmful Memes with Large Language Model as a Catalyst

    Authors: Jingtao Cao, Zheng Zhang, Hongru Wang, Bin Liang, Hao Wang, Kam-Fai Wong

    Abstract: Memes, which rapidly disseminate personal opinions and positions across the internet, also pose significant challenges in propagating social bias and prejudice. This study presents a novel approach to detecting harmful memes, particularly within the multicultural and multilingual context of Singapore. Our methodology integrates image captioning, Optical Character Recognition (OCR), and Large Langu…

    Submitted 14 June, 2024; originally announced June 2024.

  37. arXiv:2406.07944  [pdf, other]

    cs.SE cs.AI

    DLLens: Testing Deep Learning Libraries via LLM-aided Synthesis

    Authors: Meiziniu Li, Dongze Li, Jianmeng Liu, Jialun Cao, Yongqiang Tian, Shing-Chi Cheung

    Abstract: Testing is a major approach to ensuring the quality of deep learning (DL) libraries. Existing testing techniques commonly adopt differential testing to relieve the need for test oracle construction. However, these techniques are limited in finding implementations that offer the same functionality and generating diverse test inputs for differential testing. This paper introduces DLLens, a novel dif…

    Submitted 12 June, 2024; originally announced June 2024.

    ACM Class: D.2.5; I.2.5

  38. arXiv:2406.07789  [pdf, ps, other]

    math.NA

    A posteriori error estimates for the exponential midpoint method for linear and semilinear parabolic equations

    Authors: Xianfa Hu, Wansheng Wang, Mengli Mao, Jiliang Cao

    Abstract: In this paper, the a posteriori error estimates of the exponential midpoint method for time discretization are studied for linear and semilinear parabolic equations. Using the exponential midpoint approximation defined by a continuous and piecewise linear interpolation of nodal values yields the suboptimal order estimates. Based on the property of the entire function, we introduce a continuous and…

    Submitted 11 June, 2024; originally announced June 2024.

  39. arXiv:2406.07472  [pdf, other]

    cs.CV

    4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models

    Authors: Heng Yu, Chaoyang Wang, Peiye Zhuang, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Laszlo A Jeni, Sergey Tulyakov, Hsin-Ying Lee

    Abstract: Existing dynamic scene generation methods mostly rely on distilling knowledge from pre-trained 3D generative models, which are typically fine-tuned on synthetic object datasets. As a result, the generated scenes are often object-centric and lack photorealism. To address these limitations, we introduce a novel pipeline designed for photorealistic text-to-4D scene generation, discarding the dependen…

    Submitted 11 June, 2024; originally announced June 2024.

  40. arXiv:2406.07038  [pdf, ps, other]

    physics.comp-ph physics.flu-dyn

    A Multi-Scale Boltzmann Equation for Complex Systems of Neutral Gases across All Flow Regimes

    Authors: Sha Liu, Junzhe Cao, Sirui Yang, Chengwen Zhong

    Abstract: A Multi-scale Boltzmann Equation (MBE) is established from the gas-kinetic theory and the direct modeling philosophy as a master equation for complex physical systems of neutral gases across all flow regimes, which lie between the continuum limit and the free-molecular limit, covering a vast range of applications such as hypersonic flows over aerospace craft and delicate flows around MEMS. The most…

    Submitted 11 June, 2024; originally announced June 2024.

  41. arXiv:2406.05572  [pdf, other]

    cs.RO cs.AI

    Trust the PRoC3S: Solving Long-Horizon Robotics Problems with LLMs and Constraint Satisfaction

    Authors: Aidan Curtis, Nishanth Kumar, Jing Cao, Tomás Lozano-Pérez, Leslie Pack Kaelbling

    Abstract: Recent developments in pretrained large language models (LLMs) applied to robotics have demonstrated their capacity for sequencing a set of discrete skills to achieve open-ended goals in simple robotic tasks. In this paper, we examine the topic of LLM planning for a set of continuously parameterized skills whose execution must avoid violations of a set of kinematic, geometric, and physical constra…

    Submitted 8 June, 2024; originally announced June 2024.

  42. arXiv:2406.04844  [pdf, other]

    cs.CV

    Multi-Granularity Language-Guided Multi-Object Tracking

    Authors: Yuhao Li, Muzammal Naseer, Jiale Cao, Yu Zhu, Jinqiu Sun, Yanning Zhang, Fahad Shahbaz Khan

    Abstract: Most existing multi-object tracking methods learn visual tracking features by maximizing the dissimilarities between different instances and minimizing the dissimilarities within the same instance. While such a feature learning scheme achieves promising performance, learning discriminative features solely from visual information is challenging, especially under environmental interference such as o…

    Submitted 7 June, 2024; originally announced June 2024.

  43. arXiv:2406.04658  [pdf, other]

    cs.CR cs.AI cs.LG

    Advanced Payment Security System: XGBoost, CatBoost and SMOTE Integrated

    Authors: Qi Zheng, Chang Yu, Jin Cao, Yongshun Xu, Qianwen Xing, Yinxin Jin

    Abstract: With the rise of various online and mobile payment systems, transaction fraud has become a significant threat to financial security. This study explores the application of advanced machine learning models, specifically XGBoost and LightGBM, for developing a more accurate and robust Payment Security Protection Model. To enhance data reliability, we meticulously processed the data sources and used SM…

    Submitted 7 June, 2024; originally announced June 2024.

    Comments: This paper is received by https://ieee-metacom.org

  44. arXiv:2406.04504  [pdf, other]

    math.NA math-ph

    Mixed Finite Element Method for Multi-layer Elastic Contact Systems

    Authors: Zhizhuo Zhang, Mikaël Barboteu, Xiaobing Nie, Serge Dumont, Mahmoud Abdel-Aty, Jinde Cao

    Abstract: With the development of multi-layer elastic systems in the field of engineering mechanics, the corresponding variational inequality theory and algorithm design have attracted increasing research attention. In this study, a class of equivalent saddle point problems with interlayer Tresca friction conditions and the mixed finite element method are proposed and analyzed. Then, the convergence of the num…

    Submitted 6 June, 2024; originally announced June 2024.

  45. arXiv:2406.04499  [pdf, other]

    math.NA math-ph

    A layer decomposition method for multi-layer elastic contact systems with interlayer Tresca friction

    Authors: Zhizhuo Zhang, Xiaobing Nie, Mikaël Barboteu, Jinde Cao

    Abstract: With the increasing demand for accuracy in numerical simulations of pavement mechanics, the variational inequality model and its induced finite element method, which can simulate the interlayer contact state, have become a potential solution. In this paper, a layer decomposition algorithm for solving variational inequality models of multi-layer elastic contact systems with interlayer Tresca friction…

    Submitted 6 June, 2024; originally announced June 2024.

  46. arXiv:2406.04466  [pdf, other]

    cs.HC

    Dog Heart Rate and Blood Oxygen Metaverse Interaction System

    Authors: Yanhui Jiang, Jin Cao, Chang Yu

    Abstract: This study developed an improved dog heart rate and blood oxygen sensor system using Arduino. Traditional methods face accuracy and reliability issues. Our system integrates advanced computational techniques with hardware-based sensing to enhance measurement precision. An Arduino microcontroller connected to a heart rate and blood oxygen sensor collects raw data, which is preprocessed and filtered…

    Submitted 10 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: 7 pages, 6 figures, accepted at the IEEE MetaCom conference (https://ieee-metacom.org/)

  47. arXiv:2406.04465  [pdf, other]

    cs.HC

    Rough Set improved Therapy-Based Metaverse Assisting System

    Authors: Jin Cao, Yanhui Jiang, Chang Yu, Feiwei Qin, Zekun Jiang

    Abstract: Chronic neck and shoulder pain (CNSP) is a major global public health issue. Traditional treatments like physiotherapy and rehabilitation have drawbacks, including high costs, low precision, and user discomfort. This paper presents an interactive system based on Cognitive Behavioral Therapy (CBT) for CNSP treatment. The system includes a pain detection module using EMG and IMU to monitor pain and opti…

    Submitted 10 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

    Comments: 7 pages, 5 figures, accepted at the IEEE MetaCom conference (https://ieee-metacom.org/)

  48. arXiv:2406.04333  [pdf, other]

    cs.CV

    BitsFusion: 1.99 bits Weight Quantization of Diffusion Model

    Authors: Yang Sui, Yanyu Li, Anil Kag, Yerlan Idelbayev, Junli Cao, Ju Hu, Dhritiman Sagar, Bo Yuan, Sergey Tulyakov, Jian Ren

    Abstract: Diffusion-based image generation models have achieved great success in recent years by showing the capability of synthesizing high-quality content. However, these models contain a huge number of parameters, resulting in a very large model size. Saving and transferring them is a major bottleneck for various applications, especially those running on resource-constrained devices. In this wor…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Project Page: https://snap-research.github.io/BitsFusion

  49. arXiv:2406.04324  [pdf, other]

    cs.CV eess.IV

    SF-V: Single Forward Video Generation Model

    Authors: Zhixing Zhang, Yanyu Li, Yushu Wu, Yanwu Xu, Anil Kag, Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Dimitris Metaxas, Sergey Tulyakov, Jian Ren

    Abstract: Diffusion-based video generation models have demonstrated remarkable success in obtaining high-fidelity videos through the iterative denoising process. However, these models require multiple denoising steps during sampling, resulting in high computational costs. In this work, we propose a novel approach to obtain single-step video generation models by leveraging adversarial training to fine-tune p…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: Project Page: https://snap-research.github.io/SF-V

  50. arXiv:2406.03807  [pdf, other]

    cs.AI cs.CL cs.RO

    Tool-Planner: Dynamic Solution Tree Planning for Large Language Model with Tool Clustering

    Authors: Yanming Liu, Xinyue Peng, Yuwei Zhang, Jiannan Cao, Xuhong Zhang, Sheng Cheng, Xun Wang, Jianwei Yin, Tianyu Du

    Abstract: Large language models (LLMs) have demonstrated exceptional reasoning capabilities, enabling them to solve various complex problems. Recently, this ability has been applied to the paradigm of tool learning. Tool learning involves providing examples of tool usage and their corresponding functions, allowing LLMs to formulate plans and demonstrate the process of invoking and executing each tool. LLMs…

    Submitted 6 June, 2024; originally announced June 2024.

    Comments: 46 pages, first version