subscribe to arXiv mailings

Grasping Diverse Objects with Simulated Humanoids

Authors: Zhengyi Luo, Jinkun Cao, Sammy Christen, Alexander Winkler, Kris Kitani, Weipeng Xu

Abstract: We present a method for controlling a simulated humanoid to grasp an object and move it to follow an object trajectory. Due to the challenges in controlling a humanoid with dexterous hands, prior methods often use a disembodied hand and only consider vertical lifts or short trajectories. This limited scope hampers their applicability for object manipulation required for animation and simulation. T… ▽ More We present a method for controlling a simulated humanoid to grasp an object and move it to follow an object trajectory. Due to the challenges in controlling a humanoid with dexterous hands, prior methods often use a disembodied hand and only consider vertical lifts or short trajectories. This limited scope hampers their applicability for object manipulation required for animation and simulation. To close this gap, we learn a controller that can pick up a large number (>1200) of objects and carry them to follow randomly generated trajectories. Our key insight is to leverage a humanoid motion representation that provides human-like motor skills and significantly speeds up training. Using only simplistic reward, state, and object representations, our method shows favorable scalability on diverse object and trajectories. For training, we do not need dataset of paired full-body motion and object trajectories. At test time, we only require the object mesh and desired trajectories for grasping and transporting. To demonstrate the capabilities of our method, we show state-of-the-art success rates in following object trajectories and generalizing to unseen objects. Code and models will be released. △ Less

Submitted 16 July, 2024; originally announced July 2024.

Comments: Project page: https://www.zhengyiluo.com/Omnigrasp/

arXiv:2407.10613 [pdf, other]

Global destabilization of drift-tearing mode with coupling to discretized electron drift-wave instability

Authors: J. Bao, W. L. Zhang, Z. Lin, H. S. Cai, D. J. Liu, H. T. Chen, C. Dong, J. T. Cao, D. Li

Abstract: The global linear behaviors of 2/1 DTM in the collisional regime are investigated based on a concisely resistive drift-MHD model. Besides DTM, extra normal modes including EDW and SAW are coupled together and destabilized in different parameter regimes by considering resistivity in this system. The EVP approach is applied for solving the eigenstate spectra with the distribution of all unstable sol… ▽ More The global linear behaviors of 2/1 DTM in the collisional regime are investigated based on a concisely resistive drift-MHD model. Besides DTM, extra normal modes including EDW and SAW are coupled together and destabilized in different parameter regimes by considering resistivity in this system. The EVP approach is applied for solving the eigenstate spectra with the distribution of all unstable solutions. It is found that in the small EDD frequency (omega_*e) regime, DTM growth rate agrees well with local theory that is reduced with increasing omega_*e. However, when omega_*e exceeds a critical threshold omega_*crit, the strongly linear coupling between DTM and other discretized EDW instabilities happens so that the free energies from current and pressure channels can be released together and thus enhance the DTM, of which growth rate increases with increasing omega_*e and deviates from local theory results qualitatively. Correspondingly, a cross-scale mode structure forms with mixed polarization, namely, phi perturbation is dominated by electrostatic polarized short-wavelength oscillation as EDW instability character, and A_para perturbation remains typical tearing mode solution of Alfvenic polarized macroscopic structure. Within omega_*e > omega_*crit, the additional IDD causes phi oscillating structure to shift towards small density gradient domain, which cancels the extra drive from ion channel and thus DTM growth rate is insensitive to IDD frequency. Compared to EDD effects, the IDD effect alone with zero-omega_*e only leads to the stabilization of RTM that shows agreements between global simulation and local theory, which is no longer the condition for DTM regime. These results are useful for clarifying the DTM global properties with underlying physics mechanisms, which occurs in the regime of omega_*e >> gamma_c that is relevant to nowadays tokamak discharges with hot plasmas. △ Less

Submitted 15 July, 2024; originally announced July 2024.

Comments: 23 pages, 15 figues

arXiv:2407.10486 [pdf, other]

IDEAL: Leveraging Infinite and Dynamic Characterizations of Large Language Models for Query-focused Summarization

Authors: Jie Cao, Dian Jiao, Qiang Yan, Wenqiao Zhang, Siliang Tang, Yueting Zhuang

Abstract: Query-focused summarization (QFS) aims to produce summaries that answer particular questions of interest, enabling greater user control and personalization. With the advent of large language models (LLMs), shows their impressive capability of textual understanding through large-scale pretraining, which implies the great potential of extractive snippet generation. In this paper, we systematically i… ▽ More Query-focused summarization (QFS) aims to produce summaries that answer particular questions of interest, enabling greater user control and personalization. With the advent of large language models (LLMs), shows their impressive capability of textual understanding through large-scale pretraining, which implies the great potential of extractive snippet generation. In this paper, we systematically investigated two indispensable characteristics that the LLMs-based QFS models should be harnessed, Lengthy Document Summarization and Efficiently Fine-grained Query-LLM Alignment, respectively. Correspondingly, we propose two modules called Query-aware HyperExpert and Query-focused Infini-attention to access the aforementioned characteristics. These innovations pave the way for broader application and accessibility in the field of QFS technology. Extensive experiments conducted on existing QFS benchmarks indicate the effectiveness and generalizability of the proposed approach. Our code is publicly available at https://github.com/DCDmllm/IDEAL_Summary. △ Less

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.08136 [pdf, other]

EchoMimic: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditions

Authors: Zhiyuan Chen, Jiajiong Cao, Zhiquan Chen, Yuming Li, Chenguang Ma

Abstract: The area of portrait image animation, propelled by audio input, has witnessed notable progress in the generation of lifelike and dynamic portraits. Conventional methods are limited to utilizing either audios or facial key points to drive images into videos, while they can yield satisfactory results, certain issues exist. For instance, methods driven solely by audios can be unstable at times due to… ▽ More The area of portrait image animation, propelled by audio input, has witnessed notable progress in the generation of lifelike and dynamic portraits. Conventional methods are limited to utilizing either audios or facial key points to drive images into videos, while they can yield satisfactory results, certain issues exist. For instance, methods driven solely by audios can be unstable at times due to the relatively weaker audio signal, while methods driven exclusively by facial key points, although more stable in driving, can result in unnatural outcomes due to the excessive control of key point information. In addressing the previously mentioned challenges, in this paper, we introduce a novel approach which we named EchoMimic. EchoMimic is concurrently trained using both audios and facial landmarks. Through the implementation of a novel training strategy, EchoMimic is capable of generating portrait videos not only by audios and facial landmarks individually, but also by a combination of both audios and selected facial landmarks. EchoMimic has been comprehensively compared with alternative algorithms across various public datasets and our collected dataset, showcasing superior performance in both quantitative and qualitative evaluations. Additional visualization and access to the source code can be located on the EchoMimic project page. △ Less

Submitted 11 July, 2024; v1 submitted 10 July, 2024; originally announced July 2024.

arXiv:2407.07690 [pdf]

High power GaSb-based distributed feedback laser with laterally coupled dielectric gratings at 1.95μm

Authors: Zhengqing Ding, Juntian Cao, Kun Zhan, Yihang Chen, Lidan Zhou, Hao Tan, Chenao Yang, Ying Yu, Zhichuan Niu, Siyuan Yu

Abstract: Traditional Distributed Feedback (DFB) or Distributed Bragg Reflector (DBR) lasers typically utilize buried gratings as frequency-selective optical feedback mechanisms. However, the fabrication of such gratings often necessitates regrowth processes, which can pose technical challenges for materials platforms such as GaAs and GaSb. Metal gratings were also used for GaSb lasers but they introduce ad… ▽ More Traditional Distributed Feedback (DFB) or Distributed Bragg Reflector (DBR) lasers typically utilize buried gratings as frequency-selective optical feedback mechanisms. However, the fabrication of such gratings often necessitates regrowth processes, which can pose technical challenges for materials platforms such as GaAs and GaSb. Metal gratings were also used for GaSb lasers but they introduce additional absorption loss that limits device efficiency and output power. In this paper, we introduce a novel laterally coupled dielectric Bragg grating structure, which enables highly controllable, deterministic, and stable coupling between the grating and the optical mode. Our device demonstrates a continuous-wave output power of 47.02 mW at room temperature, exhibiting stable single-mode operation from 300-1000 mA and achieving a maximum side mode suppression ratio of 46.7 dB. These results underscore the innovative lateral coupled dielectric grating as a feasible and technologically superior approach for fabricating DFB and DBR lasers, which hold universal applicability across different material platforms and wavelength bands. △ Less

Submitted 10 July, 2024; originally announced July 2024.

Comments: 9 pages, 7 figures, 1 table

MSC Class: 78A60 ACM Class: J.2.6

arXiv:2407.05130 [pdf, other]

doi 10.1016/j.ijheatmasstransfer.2022.122966

Heat transfer enhancement by mist/air two-phase flow in a high-temperature channel

Authors: Junxian Cao, Mengqi Ye, Haiwang Li, Tianyou Wang, Zhizhao Che

Abstract: Mist/air two-phase flow is a promising cooling technique for many applications such as internal cooling of gas turbine blades. A significant enhancement of heat transfer can be achieved with a low mass fraction of droplets by utilizing the latent heat of the droplets. Using newly designed atomizers to accurately control the mist droplets, this study experimentally explores the heat transfer perfor… ▽ More Mist/air two-phase flow is a promising cooling technique for many applications such as internal cooling of gas turbine blades. A significant enhancement of heat transfer can be achieved with a low mass fraction of droplets by utilizing the latent heat of the droplets. Using newly designed atomizers to accurately control the mist droplets, this study experimentally explores the heat transfer performance of mist/air flow in a high-temperature channel with a maximum temperature of 880 K. The effects of the mist/air mass ratio, droplet size, Reynolds number, and wall heat flux are studied. The results show that the cooling performance of the test section can be significantly improved by even adding a small amount of droplets. Considering mist droplets of different sizes, larger droplets can cause more remarkable temperature reduction, while smaller droplets can improve the uniformity of temperature distribution. For large droplets, the cooling effect in the upstream is more obvious than that in the downstream due to the interaction between the wall and the droplets, and with the increase of mist/air mass ratio, the area with obvious cooling extends downstream. The performance of mist/air cooling is tested by increasing the heat flux until the maximum temperature at the outlet reaches a predetermined value. Compared with air-only cooling, the increment in the wall heat flux by the mist/air cooling with a mass ratio of 3% can be up to 18.4%. △ Less

Submitted 6 July, 2024; originally announced July 2024.

Comments: 16 pages, 10 figures

Journal ref: International Journal of Heat and Mass Transfer. Volume 193, 1 September 2022, 122966Volume 193, 1 September 2022, 122966

arXiv:2407.03937 [pdf, other]

TongGu: Mastering Classical Chinese Understanding with Knowledge-Grounded Large Language Models

Authors: Jiahuan Cao, Dezhi Peng, Peirong Zhang, Yongxin Shi, Yang Liu, Kai Ding, Lianwen Jin

Abstract: Classical Chinese is a gateway to the rich heritage and wisdom of ancient China, yet its complexities pose formidable comprehension barriers for most modern people without specialized knowledge. While Large Language Models (LLMs) have shown remarkable capabilities in Natural Language Processing (NLP), they struggle with Classical Chinese Understanding (CCU), especially in data-demanding and knowle… ▽ More Classical Chinese is a gateway to the rich heritage and wisdom of ancient China, yet its complexities pose formidable comprehension barriers for most modern people without specialized knowledge. While Large Language Models (LLMs) have shown remarkable capabilities in Natural Language Processing (NLP), they struggle with Classical Chinese Understanding (CCU), especially in data-demanding and knowledge-intensive tasks. In response to this dilemma, we propose \textbf{TongGu} (mean understanding ancient and modern), the first CCU-specific LLM, underpinned by three core contributions. First, we construct a two-stage instruction-tuning dataset ACCN-INS derived from rich classical Chinese corpora, aiming to unlock the full CCU potential of LLMs. Second, we propose Redundancy-Aware Tuning (RAT) to prevent catastrophic forgetting, enabling TongGu to acquire new capabilities while preserving its foundational knowledge. Third, we present a CCU Retrieval-Augmented Generation (CCU-RAG) technique to reduce hallucinations based on knowledge-grounding. Extensive experiments across 24 diverse CCU tasks validate TongGu's superior ability, underscoring the effectiveness of RAT and CCU-RAG. The model and dataset will be public available. △ Less

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2407.02759 [pdf]

Multi-Scenario Combination Based on Multi-Agent Reinforcement Learning to Optimize the Advertising Recommendation System

Authors: Yang Zhao, Chang Zhou, Jin Cao, Yi Zhao, Shaobo Liu, Chiyu Cheng, Xingchen Li

Abstract: This paper explores multi-scenario optimization on large platforms using multi-agent reinforcement learning (MARL). We address this by treating scenarios like search, recommendation, and advertising as a cooperative, partially observable multi-agent decision problem. We introduce the Multi-Agent Recurrent Deterministic Policy Gradient (MARDPG) algorithm, which aligns different scenarios under a sh… ▽ More This paper explores multi-scenario optimization on large platforms using multi-agent reinforcement learning (MARL). We address this by treating scenarios like search, recommendation, and advertising as a cooperative, partially observable multi-agent decision problem. We introduce the Multi-Agent Recurrent Deterministic Policy Gradient (MARDPG) algorithm, which aligns different scenarios under a shared objective and allows for strategy communication to boost overall performance. Our results show marked improvements in metrics such as click-through rate (CTR), conversion rate, and total sales, confirming our method's efficacy in practical settings. △ Less

Submitted 2 July, 2024; originally announced July 2024.

Comments: Accepted by 2024 5th International Conference on Artificial Intelligence and Electromechanical Automation IEEE (ISBN: 979-8-3503-6617-4)

arXiv:2407.00692 [pdf, ps, other]

The motivic fundamental group of a punctured elliptic curve and algebraic cycles

Authors: Jin Cao, Tomohide Terasoma

Abstract: In this paper, we consider the motivic fundamental group of the punctured elliptic curves as a DG complex in the DG category of elliptic motives and describe its resolution via Schur complexes. During this process, we find the algebraic cycles analogous to the Bloch-Totaro cycles. In this paper, we consider the motivic fundamental group of the punctured elliptic curves as a DG complex in the DG category of elliptic motives and describe its resolution via Schur complexes. During this process, we find the algebraic cycles analogous to the Bloch-Totaro cycles. △ Less

Submitted 30 June, 2024; originally announced July 2024.

MSC Class: 14C15; 14C25

arXiv:2407.00639 [pdf, other]

GRB 221009A/SN 2022xiw: A Supernova Obscured by a Gamma-Ray Burst Afterglow?

Authors: De-Feng Kong, Xiang-Gao Wang, WeiKang Zheng, Hou-Jun Lü, L. P. Xin, Da-Bin Lin, Jia-Xin Cao, Ming-Xuan Lu, B. Ren, Edgar P. Vidal, J. Y. Wei, En-Wei Liang, Alexei V. Filippenko

Abstract: We present optical photometry for the afterglow of GRB 221009A, in some respects the most extraordinary gamma-ray burst (GRB) ever observed. Good quality in the R-band light curve is obtained, covering 0.32-19.57 days since the Fermi-GBM trigger. We find that a weak bump emerges fromthe declining afterglow at $t \approx 11$ days; a supernova (SN) may be responsible. We use a smooth broken power-la… ▽ More We present optical photometry for the afterglow of GRB 221009A, in some respects the most extraordinary gamma-ray burst (GRB) ever observed. Good quality in the R-band light curve is obtained, covering 0.32-19.57 days since the Fermi-GBM trigger. We find that a weak bump emerges fromthe declining afterglow at $t \approx 11$ days; a supernova (SN) may be responsible. We use a smooth broken power-law and $^{56}\mathrm{Ni}$ model to fit the light curve. The best-fitting results reveal that the SN ejected a total mass of $M_\mathrm{ej} = 3.70 M_\odot$, a $^{56}\mathrm{Ni}$ mass of $M_\mathrm{Ni} = 0.23 M_\odot$, and a kinetic energy of $E_\mathrm{SN,K} = 2.35 \times 10^{52} \mathrm{erg}$. We also compare GRB 221009A with other GRB-SN events based on a GRB-associated SN sample, and find that only SN 2003lw and SN 2011kl can be obviously revealed in the afterglow of GRB 221009A by setting these objects at its distance. This suggests that a supernova (SN 2022xiw) is possibly obscured by the brighter afterglow emission from GRB 221009A. △ Less

Submitted 30 June, 2024; originally announced July 2024.

arXiv:2407.00187 [pdf, other]

SMPLOlympics: Sports Environments for Physically Simulated Humanoids

Authors: Zhengyi Luo, Jiashun Wang, Kangni Liu, Haotian Zhang, Chen Tessler, Jingbo Wang, Ye Yuan, Jinkun Cao, Zihui Lin, Fengyi Wang, Jessica Hodgins, Kris Kitani

Abstract: We present SMPLOlympics, a collection of physically simulated environments that allow humanoids to compete in a variety of Olympic sports. Sports simulation offers a rich and standardized testing ground for evaluating and improving the capabilities of learning algorithms due to the diversity and physically demanding nature of athletic activities. As humans have been competing in these sports for m… ▽ More We present SMPLOlympics, a collection of physically simulated environments that allow humanoids to compete in a variety of Olympic sports. Sports simulation offers a rich and standardized testing ground for evaluating and improving the capabilities of learning algorithms due to the diversity and physically demanding nature of athletic activities. As humans have been competing in these sports for many years, there is also a plethora of existing knowledge on the preferred strategy to achieve better performance. To leverage these existing human demonstrations from videos and motion capture, we design our humanoid to be compatible with the widely-used SMPL and SMPL-X human models from the vision and graphics community. We provide a suite of individual sports environments, including golf, javelin throw, high jump, long jump, and hurdling, as well as competitive sports, including both 1v1 and 2v2 games such as table tennis, tennis, fencing, boxing, soccer, and basketball. Our analysis shows that combining strong motion priors with simple rewards can result in human-like behavior in various sports. By providing a unified sports benchmark and baseline implementation of state and reward designs, we hope that SMPLOlympics can help the control and animation communities achieve human-like and performant behaviors. △ Less

Submitted 28 June, 2024; originally announced July 2024.

Comments: Project page: https://smplolympics.github.io/SMPLOlympics

arXiv:2406.19969 [pdf, other]

Enhancing Terrestrial Net Primary Productivity Estimation with EXP-CASA: A Novel Light Use Efficiency Model Approach

Authors: Guanzhou Chen, Kaiqi Zhang, Xiaodong Zhang, Hong Xie, Haobo Yang, Xiaoliang Tan, Tong Wang, Yule Ma, Qing Wang, Jinzhou Cao, Weihong Cui

Abstract: The Light Use Efficiency model, epitomized by the CASA model, is extensively applied in the quantitative estimation of vegetation Net Primary Productivity. However, the classic CASA model is marked by significant complexity: the estimation of environmental stress parameters, in particular, necessitates multi-source observation data, adding to the complexity and uncertainty of the model's operation… ▽ More The Light Use Efficiency model, epitomized by the CASA model, is extensively applied in the quantitative estimation of vegetation Net Primary Productivity. However, the classic CASA model is marked by significant complexity: the estimation of environmental stress parameters, in particular, necessitates multi-source observation data, adding to the complexity and uncertainty of the model's operation. Additionally, the saturation effect of the Normalized Difference Vegetation Index (NDVI), a key variable in the CASA model, weakened the accuracy of CASA's NPP predictions in densely vegetated areas. To address these limitations, this study introduces the Exponential-CASA (EXP-CASA) model. The EXP-CASA model effectively improves the CASA model by using novel functions for estimating the fraction of absorbed photosynthetically active radiation (FPAR) and environmental stress, by utilizing long-term observational data from FLUXNET and MODIS surface reflectance data. In a comparative analysis of NPP estimation accuracy among four different NPP products, EXP-CASA ($R^2 = 0.68, RMSE= 1.1gC\cdot m^{-2} \cdot d^{-1}$) outperforms others, followed by GLASS-NPP, and lastly MODIS-NPP and classic CASA. Additionally, this research assesses the EXP-CASA model's adaptability to various vegetation indices, evaluates the sensitivity and stability of its parameters over time, and compares its accuracy against other leading NPP estimation products. The findings reveal that the EXP-CASA model exhibits strong adaptability to diverse vegetation indices and stability of model parameters over time series. By introducing a novel estimation approach that optimizes model construction, the EXP-CASA model remarkably improves the accuracy of NPP estimations and paves the way for global-scale, consistent, and continuous assessment of vegetation NPP. △ Less

Submitted 28 June, 2024; originally announced June 2024.

arXiv:2406.19434 [pdf, other]

Lightweight Predictive 3D Gaussian Splats

Authors: Junli Cao, Vidit Goel, Chaoyang Wang, Anil Kag, Ju Hu, Sergei Korolev, Chenfanfu Jiang, Sergey Tulyakov, Jian Ren

Abstract: Recent approaches representing 3D objects and scenes using Gaussian splats show increased rendering speed across a variety of platforms and devices. While rendering such representations is indeed extremely efficient, storing and transmitting them is often prohibitively expensive. To represent large-scale scenes, one often needs to store millions of 3D Gaussians, occupying gigabytes of disk space.… ▽ More Recent approaches representing 3D objects and scenes using Gaussian splats show increased rendering speed across a variety of platforms and devices. While rendering such representations is indeed extremely efficient, storing and transmitting them is often prohibitively expensive. To represent large-scale scenes, one often needs to store millions of 3D Gaussians, occupying gigabytes of disk space. This poses a very practical limitation, prohibiting widespread adoption.Several solutions have been proposed to strike a balance between disk size and rendering quality, noticeably reducing the visual quality. In this work, we propose a new representation that dramatically reduces the hard drive footprint while featuring similar or improved quality when compared to the standard 3D Gaussian splats. When compared to other compact solutions, ours offers higher quality renderings with significantly reduced storage, being able to efficiently run on a mobile device in real-time. Our key observation is that nearby points in the scene can share similar representations. Hence, only a small ratio of 3D points needs to be stored. We introduce an approach to identify such points which are called parent points. The discarded points called children points along with attributes can be efficiently predicted by tiny MLPs. △ Less

Submitted 27 June, 2024; originally announced June 2024.

Comments: Project Page: https://plumpuddings.github.io/LPGS//

arXiv:2406.18069 [pdf, other]

Large Language Models for Cuffless Blood Pressure Measurement From Wearable Biosignals

Authors: Zengding Liu, Chen Chen, Jiannong Cao, Minglei Pan, Jikui Liu, Nan Li, Fen Miao, Ye Li

Abstract: Large language models (LLMs) have captured significant interest from both academia and industry due to their impressive performance across various textual tasks. However, the potential of LLMs to analyze physiological time-series data remains an emerging research field. Particularly, there is a notable gap in the utilization of LLMs for analyzing wearable biosignals to achieve cuffless blood press… ▽ More Large language models (LLMs) have captured significant interest from both academia and industry due to their impressive performance across various textual tasks. However, the potential of LLMs to analyze physiological time-series data remains an emerging research field. Particularly, there is a notable gap in the utilization of LLMs for analyzing wearable biosignals to achieve cuffless blood pressure (BP) measurement, which is critical for the management of cardiovascular diseases. This paper presents the first work to explore the capacity of LLMs to perform cuffless BP estimation based on wearable biosignals. We extracted physiological features from electrocardiogram (ECG) and photoplethysmogram (PPG) signals and designed context-enhanced prompts by combining these features with BP domain knowledge and user information. Subsequently, we adapted LLMs to BP estimation tasks through fine-tuning. To evaluate the proposed approach, we conducted assessments of ten advanced LLMs using a comprehensive public dataset of wearable biosignals from 1,272 participants. The experimental results demonstrate that the optimally fine-tuned LLM significantly surpasses conventional task-specific baselines, achieving an estimation error of 0.00 $\pm$ 9.25 mmHg for systolic BP and 1.29 $\pm$ 6.37 mmHg for diastolic BP. Notably, the ablation studies highlight the benefits of our context enhancement strategy, leading to an 8.9% reduction in mean absolute error for systolic BP estimation. This paper pioneers the exploration of LLMs for cuffless BP measurement, providing a potential solution to enhance the accuracy of cuffless BP measurement. △ Less

Submitted 4 July, 2024; v1 submitted 26 June, 2024; originally announced June 2024.

arXiv:2406.17624 [pdf, other]

Self-assessment, Exhibition, and Recognition: a Review of Personality in Large Language Models

Authors: Zhiyuan Wen, Yu Yang, Jiannong Cao, Haoming Sun, Ruosong Yang, Shuaiqi Liu

Abstract: As large language models (LLMs) appear to behave increasingly human-like in text-based interactions, more and more researchers become interested in investigating personality in LLMs. However, the diversity of psychological personality research and the rapid development of LLMs have led to a broad yet fragmented landscape of studies in this interdisciplinary field. Extensive studies across differen… ▽ More As large language models (LLMs) appear to behave increasingly human-like in text-based interactions, more and more researchers become interested in investigating personality in LLMs. However, the diversity of psychological personality research and the rapid development of LLMs have led to a broad yet fragmented landscape of studies in this interdisciplinary field. Extensive studies across different research focuses, different personality psychometrics, and different LLMs make it challenging to have a holistic overview and further pose difficulties in applying findings to real-world applications. In this paper, we present a comprehensive review by categorizing current studies into three research problems: self-assessment, exhibition, and recognition, based on the intrinsic characteristics and external manifestations of personality in LLMs. For each problem, we provide a thorough analysis and conduct in-depth comparisons of their corresponding solutions. Besides, we summarize research findings and open challenges from current studies and further discuss their underlying causes. We also collect extensive publicly available resources to facilitate interested researchers and developers. Lastly, we discuss the potential future research directions and application scenarios. Our paper is the first comprehensive survey of up-to-date literature on personality in LLMs. By presenting a clear taxonomy, in-depth analysis, promising future directions, and extensive resource collections, we aim to provide a better understanding and facilitate further advancements in this emerging field. △ Less

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.17309 [pdf, other]

Zero-Shot Long-Form Video Understanding through Screenplay

Authors: Yongliang Wu, Bozheng Li, Jiawang Cao, Wenbo Zhu, Yi Lu, Weiheng Chi, Chuyun Xie, Haolin Zheng, Ziyue Su, Jay Wu, Xu Yang

Abstract: The Long-form Video Question-Answering task requires the comprehension and analysis of extended video content to respond accurately to questions by utilizing both temporal and contextual information. In this paper, we present MM-Screenplayer, an advanced video understanding system with multi-modal perception capabilities that can convert any video into textual screenplay representations. Unlike pr… ▽ More The Long-form Video Question-Answering task requires the comprehension and analysis of extended video content to respond accurately to questions by utilizing both temporal and contextual information. In this paper, we present MM-Screenplayer, an advanced video understanding system with multi-modal perception capabilities that can convert any video into textual screenplay representations. Unlike previous storytelling methods, we organize video content into scenes as the basic unit, rather than just visually continuous shots. Additionally, we developed a ``Look Back'' strategy to reassess and validate uncertain information, particularly targeting breakpoint mode. MM-Screenplayer achieved highest score in the CVPR'2024 LOng-form VidEo Understanding (LOVEU) Track 1 Challenge, with a global accuracy of 87.5% and a breakpoint accuracy of 68.8%. △ Less

Submitted 25 June, 2024; originally announced June 2024.

Comments: Highest Score Award to the CVPR'2024 LOVEU Track 1 Challenge

arXiv:2406.17307 [pdf, other]

Scalable Sampling of Truncated Multivariate Normals Using Sequential Nearest-Neighbor Approximation

Authors: Jian Cao, Matthias Katzfuss

Abstract: We propose a linear-complexity method for sampling from truncated multivariate normal (TMVN) distributions with high fidelity by applying nearest-neighbor approximations to a product-of-conditionals decomposition of the TMVN density. To make the sequential sampling based on the decomposition feasible, we introduce a novel method that avoids the intractable high-dimensional TMVN distribution by sam… ▽ More We propose a linear-complexity method for sampling from truncated multivariate normal (TMVN) distributions with high fidelity by applying nearest-neighbor approximations to a product-of-conditionals decomposition of the TMVN density. To make the sequential sampling based on the decomposition feasible, we introduce a novel method that avoids the intractable high-dimensional TMVN distribution by sampling sequentially from $m$-dimensional TMVN distributions, where $m$ is a tuning parameter controlling the fidelity. This allows us to overcome the existing methods' crucial problem of rapidly decreasing acceptance rates for increasing dimension. Throughout our experiments with up to tens of thousands of dimensions, we can produce high-fidelity samples with $m$ in the dozens, achieving superior scalability compared to existing state-of-the-art methods. We study a tetrachloroethylene concentration dataset that has $3{,}971$ observed responses and $20{,}730$ undetected responses, together modeled as a partially censored Gaussian process, where our method enables posterior inference for the censored responses through sampling a $20{,}730$-dimensional TMVN distribution. △ Less

Submitted 25 June, 2024; originally announced June 2024.

arXiv:2406.16245 [pdf]

Single-Layer Fe-Cu Interphase in Ferritic Steels Stabilized by Magnetic Friedel Oscillations

Authors: Wen-Qiang Xie, Jin-Li Cao, Wen-Tong Geng

Abstract: Copper precipitation is a technique extensively deployed in steel strengthening. Being as tiny as a few nanometers in diameter, the Cu precipitates present a real challenge to experimental techniques in determination of their composition. The late Professor Morris Fine called it a mystery when addressing the discrepancy between the fact of low solubility of Fe in bulk Cu and the remarkable content… ▽ More Copper precipitation is a technique extensively deployed in steel strengthening. Being as tiny as a few nanometers in diameter, the Cu precipitates present a real challenge to experimental techniques in determination of their composition. The late Professor Morris Fine called it a mystery when addressing the discrepancy between the fact of low solubility of Fe in bulk Cu and the remarkable content of Fe in Cu precipitates according to atom probe tomography measurement. With a thorough search using rigorous first-principles density functional theory calculations, we are surprised to find that the interface between Cu precipitate and Fe matrix is neither immiscibly sharp, nor commonly miscible over a range of atomic layers, but rather manifests itself as a single-layer mixed Fe-Cu interphase. Our detailed analysis reveals that spin polarization is a key factor in defining such an interphase. The magnetic Friedel oscillations through spin polarization is the driving force to stabilize it. When the single-layer Fe-Cu interphase is counted in, the content of Fe in Cu precipitates would be significant due to their small size. Our finding is a strong demonstration that quantum mechanical effects such as Friedel oscillations can have explicit consequence also in structural materials. The accurate knowledge of the Fe-Cu interface is crucial for the design of advanced high-strength steels. △ Less

Submitted 23 June, 2024; originally announced June 2024.

Comments: 17 pages, 11 figures

arXiv:2406.15781 [pdf, other]

DABL: Detecting Semantic Anomalies in Business Processes Using Large Language Models

Authors: Wei Guan, Jian Cao, Jianqi Gao, Haiyan Zhao, Shiyou Qian

Abstract: Detecting anomalies in business processes is crucial for ensuring operational success. While many existing methods rely on statistical frequency to detect anomalies, it's important to note that infrequent behavior doesn't necessarily imply undesirability. To address this challenge, detecting anomalies from a semantic viewpoint proves to be a more effective approach. However, current semantic anoma… ▽ More Detecting anomalies in business processes is crucial for ensuring operational success. While many existing methods rely on statistical frequency to detect anomalies, it's important to note that infrequent behavior doesn't necessarily imply undesirability. To address this challenge, detecting anomalies from a semantic viewpoint proves to be a more effective approach. However, current semantic anomaly detection methods treat a trace (i.e., process instance) as multiple event pairs, disrupting long-distance dependencies. In this paper, we introduce DABL, a novel approach for detecting semantic anomalies in business processes using large language models (LLMs). We collect 143,137 real-world process models from various domains. By generating normal traces through the playout of these process models and simulating both ordering and exclusion anomalies, we fine-tune Llama 2 using the resulting log. Through extensive experiments, we demonstrate that DABL surpasses existing state-of-the-art semantic anomaly detection methods in terms of both generalization ability and learning of given processes. Users can directly apply DABL to detect semantic anomalies in their own datasets without the need for additional training. Furthermore, DABL offers the capability to interpret the causes of anomalies in natural language, providing valuable insights into the detected anomalies. △ Less

Submitted 22 June, 2024; originally announced June 2024.

arXiv:2406.15769 [pdf, other]

Humas: A Heterogeneity- and Upgrade-aware Microservice Auto-scaling Framework in Large-scale Data Centers

Authors: Qin Hua, Dingyu Yang, Shiyou Qian, Jian Cao, Guangtao Xue, Minglu Li

Abstract: An effective auto-scaling framework is essential for microservices to ensure performance stability and resource efficiency under dynamic workloads. As revealed by many prior studies, the key to efficient auto-scaling lies in accurately learning performance patterns, i.e., the relationship between performance metrics and workloads in data-driven schemes. However, we notice that there are two signif… ▽ More An effective auto-scaling framework is essential for microservices to ensure performance stability and resource efficiency under dynamic workloads. As revealed by many prior studies, the key to efficient auto-scaling lies in accurately learning performance patterns, i.e., the relationship between performance metrics and workloads in data-driven schemes. However, we notice that there are two significant challenges in characterizing performance patterns for large-scale microservices. Firstly, diverse microservices demonstrate varying sensitivities to heterogeneous machines, causing difficulty in quantifying the performance difference in a fixed manner. Secondly, frequent version upgrades of microservices result in uncertain changes in performance patterns, known as pattern drifts, leading to imprecise resource capacity estimation issues. To address these challenges, we propose Humas, a heterogeneity- and upgrade-aware auto-scaling framework for large-scale microservices. Firstly, Humas quantifies the difference in resource efficiency among heterogeneous machines for various microservices online and normalizes their resources in standard units. Additionally, Humas develops a least squares density-difference (LSDD) based algorithm to identify pattern drifts caused by upgrades. Lastly, Humas generates capacity adjustment plans for microservices based on the latest performance patterns and predicted workloads. The experiment results conducted on 50 real microservices with over 11,000 containers demonstrate that Humas improves resource efficiency and performance stability by approximately 30.4% and 48.0%, respectively, compared to state-of-the-art approaches. △ Less

Submitted 22 June, 2024; originally announced June 2024.

Comments: 14 pages; 27 figures

arXiv:2406.14644 [pdf, other]

Unveiling the Spectrum of Data Contamination in Language Models: A Survey from Detection to Remediation

Authors: Chunyuan Deng, Yilun Zhao, Yuzhao Heng, Yitong Li, Jiannan Cao, Xiangru Tang, Arman Cohan

Abstract: Data contamination has garnered increased attention in the era of large language models (LLMs) due to the reliance on extensive internet-derived training corpora. The issue of training corpus overlap with evaluation benchmarks--referred to as contamination--has been the focus of significant recent research. This body of work aims to identify contamination, understand its impacts, and explore mitig… ▽ More Data contamination has garnered increased attention in the era of large language models (LLMs) due to the reliance on extensive internet-derived training corpora. The issue of training corpus overlap with evaluation benchmarks--referred to as contamination--has been the focus of significant recent research. This body of work aims to identify contamination, understand its impacts, and explore mitigation strategies from diverse perspectives. However, comprehensive studies that provide a clear pathway from foundational concepts to advanced insights are lacking in this nascent field. Therefore, we present a comprehensive survey in the field of data contamination, laying out the key issues, methodologies, and findings to date, and highlighting areas in need of further research and development. In particular, we begin by examining the effects of data contamination across various stages and forms. We then provide a detailed analysis of current contamination detection methods, categorizing them to highlight their focus, assumptions, strengths, and limitations. We also discuss mitigation strategies, offering a clear guide for future research. This survey serves as a succinct overview of the most recent advancements in data contamination research, providing a straightforward guide for the benefit of future research endeavors. △ Less

Submitted 20 June, 2024; originally announced June 2024.

Comments: ACL 2024 Camera-Ready Version

arXiv:2406.14558 [pdf, other]

CooHOI: Learning Cooperative Human-Object Interaction with Manipulated Object Dynamics

Authors: Jiawei Gao, Ziqin Wang, Zeqi Xiao, Jingbo Wang, Tai Wang, Jinkun Cao, Xiaolin Hu, Si Liu, Jifeng Dai, Jiangmiao Pang

Abstract: Recent years have seen significant advancements in humanoid control, largely due to the availability of large-scale motion capture data and the application of reinforcement learning methodologies. However, many real-world tasks, such as moving large and heavy furniture, require multi-character collaboration. Given the scarcity of data on multi-character collaboration and the efficiency challenges… ▽ More Recent years have seen significant advancements in humanoid control, largely due to the availability of large-scale motion capture data and the application of reinforcement learning methodologies. However, many real-world tasks, such as moving large and heavy furniture, require multi-character collaboration. Given the scarcity of data on multi-character collaboration and the efficiency challenges associated with multi-agent learning, these tasks cannot be straightforwardly addressed using training paradigms designed for single-agent scenarios. In this paper, we introduce Cooperative Human-Object Interaction (CooHOI), a novel framework that addresses multi-character objects transporting through a two-phase learning paradigm: individual skill acquisition and subsequent transfer. Initially, a single agent learns to perform tasks using the Adversarial Motion Priors (AMP) framework. Following this, the agent learns to collaborate with others by considering the shared dynamics of the manipulated object during parallel training using Multi Agent Proximal Policy Optimization (MAPPO). When one agent interacts with the object, resulting in specific object dynamics changes, the other agents learn to respond appropriately, thereby achieving implicit communication and coordination between teammates. Unlike previous approaches that relied on tracking-based methods for multi-character HOI, CooHOI is inherently efficient, does not depend on motion capture data of multi-character interactions, and can be seamlessly extended to include more participants and a wide range of object types △ Less

Submitted 20 June, 2024; originally announced June 2024.

arXiv:2406.13201 [pdf, other]

Toward Structure Fairness in Dynamic Graph Embedding: A Trend-aware Dual Debiasing Approach

Authors: Yicong Li, Yu Yang, Jiannong Cao, Shuaiqi Liu, Haoran Tang, Guandong Xu

Abstract: Recent studies successfully learned static graph embeddings that are structurally fair by preventing the effectiveness disparity of high- and low-degree vertex groups in downstream graph mining tasks. However, achieving structure fairness in dynamic graph embedding remains an open problem. Neglecting degree changes in dynamic graphs will significantly impair embedding effectiveness without notably… ▽ More Recent studies successfully learned static graph embeddings that are structurally fair by preventing the effectiveness disparity of high- and low-degree vertex groups in downstream graph mining tasks. However, achieving structure fairness in dynamic graph embedding remains an open problem. Neglecting degree changes in dynamic graphs will significantly impair embedding effectiveness without notably improving structure fairness. This is because the embedding performance of high-degree and low-to-high-degree vertices will significantly drop close to the generally poorer embedding performance of most slightly changed vertices in the long-tail part of the power-law distribution. We first identify biased structural evolutions in a dynamic graph based on the evolving trend of vertex degree and then propose FairDGE, the first structurally Fair Dynamic Graph Embedding algorithm. FairDGE learns biased structural evolutions by jointly embedding the connection changes among vertices and the long-short-term evolutionary trend of vertex degrees. Furthermore, a novel dual debiasing approach is devised to encode fair embeddings contrastively, customizing debiasing strategies for different biased structural evolutions. This innovative debiasing strategy breaks the effectiveness bottleneck of embeddings without notable fairness loss. Extensive experiments demonstrate that FairDGE achieves simultaneous improvement in the effectiveness and fairness of embeddings. △ Less

Submitted 19 June, 2024; originally announced June 2024.

arXiv:2406.12912 [pdf, other]

Burn-in Test and Thermal Performance Evaluation of Silicon Photomultipliers for the JUNO-TAO Experiment

Authors: X. Chen, G. F. Cao, M. H. Qu, H. W. Wang, N. Anfimov, A. Rybnikov, J. Y. Xu, A. Q. Su, Z. L. Chen, J. Cao, Y. C. Li, M. Qi

Abstract: This study evaluates more than 4,000 tiles made of Hamamatsu visual-sensitive silicon photomultipier (SiPM), each with dimensions of 5 $\times$ 5 cm$^2$, intended for the central detector of the Taishan Anti-neutrino Observatory (TAO), a satellite experiment of the Jiangmen Underground Neutrino Observatory (JUNO) aimed at measuring the reactor anti-neutrino energy spectrum with unprecedented energ… ▽ More This study evaluates more than 4,000 tiles made of Hamamatsu visual-sensitive silicon photomultipier (SiPM), each with dimensions of 5 $\times$ 5 cm$^2$, intended for the central detector of the Taishan Anti-neutrino Observatory (TAO), a satellite experiment of the Jiangmen Underground Neutrino Observatory (JUNO) aimed at measuring the reactor anti-neutrino energy spectrum with unprecedented energy resolution. All SiPM tiles underwent a room temperature burn-in test in the dark for two weeks, while cryogenic testing analyzed the thermal dependence of parameters for some sampled SiPMs. Results from these comprehensive tests provide crucial insights into the long-term performance and stability of the 10 square meters of SiPMs operating at -50°C to detect scintillation photons in the TAO detector. Despite some anomalies awaiting further inspection, all SiPMs successfully passed the burn-in test over two weeks at room temperature, which is equivalent to 6.7 years at -50°C. Results are also used to guide optimal SiPM selection, configuration, and operation, ensuring reliability and sustainability in reactor neutrino measurements. This work also provides insights for a rapid and robust quality assessment in future experiments that employ large-scale SiPMs as detection systems. △ Less

Submitted 13 June, 2024; originally announced June 2024.

Comments: 15 pages, 15 figures, submitted to JINST

Report number: JUNO-doc-11626

arXiv:2406.12902 [pdf, other]

Can AI Beat Undergraduates in Entry-level Java Assignments? Benchmarking Large Language Models on JavaBench

Authors: Jialun Cao, Zhiyong Chen, Jiarong Wu, Shing-chi Cheung, Chang Xu

Abstract: Code generation benchmarks such as HumanEval are widely adopted to evaluate LLMs' capabilities. However, after consolidating the latest 24 benchmarks, we noticed three significant imbalances. First, imbalanced programming language. 95.8% of benchmarks involve Python, while only 5 benchmarks involve Java. Second, imbalanced code granularity. Function-/statement-level benchmarks account for over 83.… ▽ More Code generation benchmarks such as HumanEval are widely adopted to evaluate LLMs' capabilities. However, after consolidating the latest 24 benchmarks, we noticed three significant imbalances. First, imbalanced programming language. 95.8% of benchmarks involve Python, while only 5 benchmarks involve Java. Second, imbalanced code granularity. Function-/statement-level benchmarks account for over 83.3% of benchmarks. Only a mere handful extends to class-/project-levels, and all are limited to Python. Third, lacking advanced features. Existing benchmarks primarily assess basic coding skills, while overlooking advanced Object-Oriented Programming (OOP) features (i.e., encapsulation, inheritance, and polymorphism). To fill these gaps, we propose JavaBench, a project-level Java benchmark that exercises OOP features. It comprises four Java projects with 389 methods in 106 Java classes. The test coverage is up to 92%, and JavaBench is attested by 282 undergraduate students, reaching a 90.93/100 average score (i.e., pass rate against the test suite), ensuring the quality of documentation, code skeleton, and tests. To better evaluate LLM's capability against JavaBench, we introduce a systematic evaluation design covering three context settings and five synthesis strategies at two granularities using three hierarchical metrics. Our extensive experiment yields several interesting findings. First, we noticed that regarding project-level Java programming, LLMs are far behind undergraduate students (no project can be correctly completed by any studied LLMs, and at most 41.17% Pass@5 in a more relaxed evaluation). Second, using method signature as prompt context may strike an ideal balance for project-level code generation. JavaBench is publicly available at https://github.com/java-bench/JavaBench. △ Less

Submitted 10 June, 2024; originally announced June 2024.

arXiv:2406.12380 [pdf, other]

Search for fractionally charged particles with CUORE

Authors: CUORE Collaboration, D. Q. Adams, C. Alduino, K. Alfonso, F. T. Avignone III, O. Azzolini, G. Bari, F. Bellini, G. Benato, M. Beretta, M. Biassoni, A. Branca, C. Brofferio, C. Bucci, J. Camilleri, A. Caminata, A. Campani, J. Cao, S. Capelli, C. Capelli, L. Cappelli, L. Cardani, P. Carniti, N. Casali, E. Celi , et al. (95 additional authors not shown)

Abstract: The Cryogenic Underground Observatory for Rare Events (CUORE) is a detector array comprised by 988 5$\;$cm$\times$5$\;$cm$\times$5$\;$cm TeO$_2$ crystals held below 20 mK, primarily searching for neutrinoless double-beta decay in $^{130}$Te. Unprecedented in size amongst cryogenic calorimetric experiments, CUORE provides a promising setting for the study of exotic through-going particles. Using th… ▽ More The Cryogenic Underground Observatory for Rare Events (CUORE) is a detector array comprised by 988 5$\;$cm$\times$5$\;$cm$\times$5$\;$cm TeO$_2$ crystals held below 20 mK, primarily searching for neutrinoless double-beta decay in $^{130}$Te. Unprecedented in size amongst cryogenic calorimetric experiments, CUORE provides a promising setting for the study of exotic through-going particles. Using the first tonne-year of CUORE's exposure, we perform a search for hypothesized fractionally charged particles (FCPs), which are well-motivated by various Standard Model extensions and would have suppressed interactions with matter. No excess of FCP candidate tracks is observed over background, setting leading limits on the underground FCP flux with charges between $e/24-e/5$ at 90\% confidence level. Using the low background environment and segmented geometry of CUORE, we establish the sensitivity of tonne-scale sub-Kelvin detectors to diverse signatures of new physics. △ Less

Submitted 18 June, 2024; originally announced June 2024.

Comments: 7 pages, 5 figures

arXiv:2406.12200 [pdf, other]

SFedCA: Credit Assignment-Based Active Client Selection Strategy for Spiking Federated Learning

Authors: Qiugang Zhan, Jinbo Cao, Xiurui Xie, Malu Zhang, Huajin Tang, Guisong Liu

Abstract: Spiking federated learning is an emerging distributed learning paradigm that allows resource-constrained devices to train collaboratively at low power consumption without exchanging local data. It takes advantage of both the privacy computation property in federated learning (FL) and the energy efficiency in spiking neural networks (SNN). Thus, it is highly promising to revolutionize the efficient… ▽ More Spiking federated learning is an emerging distributed learning paradigm that allows resource-constrained devices to train collaboratively at low power consumption without exchanging local data. It takes advantage of both the privacy computation property in federated learning (FL) and the energy efficiency in spiking neural networks (SNN). Thus, it is highly promising to revolutionize the efficient processing of multimedia data. However, existing spiking federated learning methods employ a random selection approach for client aggregation, assuming unbiased client participation. This neglect of statistical heterogeneity affects the convergence and accuracy of the global model significantly. In our work, we propose a credit assignment-based active client selection strategy, the SFedCA, to judiciously aggregate clients that contribute to the global sample distribution balance. Specifically, the client credits are assigned by the firing intensity state before and after local model training, which reflects the local data distribution difference from the global model. Comprehensive experiments are conducted on various non-identical and independent distribution (non-IID) scenarios. The experimental results demonstrate that the SFedCA outperforms the existing state-of-the-art spiking federated learning methods, and requires fewer communication rounds. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: 9 pages

arXiv:2406.11517 [pdf, other]

Revisiting Spurious Correlation in Domain Generalization

Authors: Bin Qin, Jiangmeng Li, Yi Li, Xuesong Wu, Yupeng Wang, Wenwen Qiang, Jianwen Cao

Abstract: Without loss of generality, existing machine learning techniques may learn spurious correlation dependent on the domain, which exacerbates the generalization of models in out-of-distribution (OOD) scenarios. To address this issue, recent works build a structural causal model (SCM) to describe the causality within data generation process, thereby motivating methods to avoid the learning of spurious… ▽ More Without loss of generality, existing machine learning techniques may learn spurious correlation dependent on the domain, which exacerbates the generalization of models in out-of-distribution (OOD) scenarios. To address this issue, recent works build a structural causal model (SCM) to describe the causality within data generation process, thereby motivating methods to avoid the learning of spurious correlation by models. However, from the machine learning viewpoint, such a theoretical analysis omits the nuanced difference between the data generation process and representation learning process, resulting in that the causal analysis based on the former cannot well adapt to the latter. To this end, we explore to build a SCM for representation learning process and further conduct a thorough analysis of the mechanisms underlying spurious correlation. We underscore that adjusting erroneous covariates introduces bias, thus necessitating the correct selection of spurious correlation mechanisms based on practical application scenarios. In this regard, we substantiate the correctness of the proposed SCM and further propose to control confounding bias in OOD generalization by introducing a propensity score weighted estimator, which can be integrated into any existing OOD method as a plug-and-play module. The empirical results comprehensively demonstrate the effectiveness of our method on synthetic and large-scale real OOD datasets. △ Less

Submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.11501 [pdf, other]

Teleporter Theory: A General and Simple Approach for Modeling Cross-World Counterfactual Causality

Authors: Jiangmeng Li, Bin Qin, Qirui Ji, Yi Li, Wenwen Qiang, Jianwen Cao, Fanjiang Xu

Abstract: Leveraging the development of structural causal model (SCM), researchers can establish graphical models for exploring the causal mechanisms behind machine learning techniques. As the complexity of machine learning applications rises, single-world interventionism causal analysis encounters theoretical adaptation limitations. Accordingly, cross-world counterfactual approach extends our understanding… ▽ More Leveraging the development of structural causal model (SCM), researchers can establish graphical models for exploring the causal mechanisms behind machine learning techniques. As the complexity of machine learning applications rises, single-world interventionism causal analysis encounters theoretical adaptation limitations. Accordingly, cross-world counterfactual approach extends our understanding of causality beyond observed data, enabling hypothetical reasoning about alternative scenarios. However, the joint involvement of cross-world variables, encompassing counterfactual variables and real-world variables, challenges the construction of the graphical model. Twin network is a subtle attempt, establishing a symbiotic relationship, to bridge the gap between graphical modeling and the introduction of counterfactuals albeit with room for improvement in generalization. In this regard, we demonstrate the theoretical breakdowns of twin networks in certain cross-world counterfactual scenarios. To this end, we propose a novel teleporter theory to establish a general and simple graphical representation of counterfactuals, which provides criteria for determining teleporter variables to connect multiple worlds. In theoretical application, we determine that introducing the proposed teleporter theory can directly obtain the conditional independence between counterfactual variables and real-world variables from the cross-world SCM without requiring complex algebraic derivations. Accordingly, we can further identify counterfactual causal effects through cross-world symbolic derivation. We demonstrate the generality of the teleporter theory to the practical application. Adhering to the proposed theory, we build a plug-and-play module, and the effectiveness of which are substantiated by experiments on benchmarks. △ Less

Submitted 18 June, 2024; v1 submitted 17 June, 2024; originally announced June 2024.

arXiv:2406.11180 [pdf, other]

Definition and Frequency Dependence of Intrinsic Nonlinear Current

Authors: Cong Xiao, Jin Cao, Qian Niu, Shengyuan A. Yang

Abstract: We show that the three commonly employed approaches that define the same intrinsic linear anomalous Hall response actually lead to different results for intrinsic nonlinear transport. The difference arises from an intrinsic anomalous distribution. It originates from scattering, but its value is completely independent of scattering, because it represents the local equilibration of electron wave pac… ▽ More We show that the three commonly employed approaches that define the same intrinsic linear anomalous Hall response actually lead to different results for intrinsic nonlinear transport. The difference arises from an intrinsic anomalous distribution. It originates from scattering, but its value is completely independent of scattering, because it represents the local equilibration of electron wave packets with field corrected energy. As a manifestation, we find that under ac driving, the intrinsic contributions in rectified component and in double-frequency component exhibit distinct frequency dependence, which can be probed in experiment. Using first-principles calculations, we estimate the signals that can be probed in antiferromagnetic CuMnAs. △ Less

Submitted 16 June, 2024; originally announced June 2024.

arXiv:2406.11087 [pdf, other]

MemDPT: Differential Privacy for Memory Efficient Language Models

Authors: Yanming Liu, Xinyue Peng, Jiannan Cao, Yuwei Zhang, Chen Ma, Songhang Deng, Mengchen Fu, Xuhong Zhang, Sheng Cheng, Xun Wang, Jianwei Yin, Tianyu Du

Abstract: Large language models have consistently demonstrated remarkable performance across a wide spectrum of applications. Nonetheless, the deployment of these models can inadvertently expose user privacy to potential risks. The substantial memory demands of these models during training represent a significant resource consumption challenge. The sheer size of these models imposes a considerable burden on… ▽ More Large language models have consistently demonstrated remarkable performance across a wide spectrum of applications. Nonetheless, the deployment of these models can inadvertently expose user privacy to potential risks. The substantial memory demands of these models during training represent a significant resource consumption challenge. The sheer size of these models imposes a considerable burden on memory resources, which is a matter of significant concern in practice. In this paper, we present an innovative training framework MemDPT that not only reduces the memory cost of large language models but also places a strong emphasis on safeguarding user data privacy. MemDPT provides edge network and reverse network designs to accommodate various differential privacy memory-efficient fine-tuning schemes. Our approach not only achieves $2 \sim 3 \times$ memory optimization but also provides robust privacy protection, ensuring that user data remains secure and confidential. Extensive experiments have demonstrated that MemDPT can effectively provide differential privacy efficient fine-tuning across various task scenarios. △ Less

Submitted 20 June, 2024; v1 submitted 16 June, 2024; originally announced June 2024.

Comments: 12 pages first version

arXiv:2406.10744 [pdf, other]

Technique Report of CVPR 2024 PBDL Challenges

Authors: Ying Fu, Yu Li, Shaodi You, Boxin Shi, Linwei Chen, Yunhao Zou, Zichun Wang, Yichen Li, Yuze Han, Yingkai Zhang, Jianan Wang, Qinglin Liu, Wei Yu, Xiaoqian Lv, Jianing Li, Shengping Zhang, Xiangyang Ji, Yuanpei Chen, Yuhan Zhang, Weihang Peng, Liwen Zhang, Zhe Xu, Dingyong Gou, Cong Li, Senyan Xu , et al. (75 additional authors not shown)

Abstract: The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, a… ▽ More The intersection of physics-based vision and deep learning presents an exciting frontier for advancing computer vision technologies. By leveraging the principles of physics to inform and enhance deep learning models, we can develop more robust and accurate vision systems. Physics-based vision aims to invert the processes to recover scene properties such as shape, reflectance, light distribution, and medium properties from images. In recent years, deep learning has shown promising improvements for various vision tasks, and when combined with physics-based vision, these approaches can enhance the robustness and accuracy of vision systems. This technical report summarizes the outcomes of the Physics-Based Vision Meets Deep Learning (PBDL) 2024 challenge, held in CVPR 2024 workshop. The challenge consisted of eight tracks, focusing on Low-Light Enhancement and Detection as well as High Dynamic Range (HDR) Imaging. This report details the objectives, methodologies, and results of each track, highlighting the top-performing solutions and their innovative approaches. △ Less

Submitted 12 July, 2024; v1 submitted 15 June, 2024; originally announced June 2024.

Comments: CVPR 2024 PBDL Challenges: https://pbdl-ws.github.io/pbdl2024/challenge/index.html

arXiv:2406.10424 [pdf, other]

What is the Visual Cognition Gap between Humans and Multimodal LLMs?

Authors: Xu Cao, Bolin Lai, Wenqian Ye, Yunsheng Ma, Joerg Heintz, Jintai Chen, Jianguo Cao, James M. Rehg

Abstract: Recently, Multimodal Large Language Models (MLLMs) have shown great promise in language-guided perceptual tasks such as recognition, segmentation, and object detection. However, their effectiveness in addressing visual cognition problems that require high-level reasoning is not well-established. One such challenge is abstract visual reasoning (AVR) -- the cognitive ability to discern relationships… ▽ More Recently, Multimodal Large Language Models (MLLMs) have shown great promise in language-guided perceptual tasks such as recognition, segmentation, and object detection. However, their effectiveness in addressing visual cognition problems that require high-level reasoning is not well-established. One such challenge is abstract visual reasoning (AVR) -- the cognitive ability to discern relationships among patterns in a set of images and extrapolate to predict subsequent patterns. This skill is crucial during the early neurodevelopmental stages of children. Inspired by the AVR tasks in Raven's Progressive Matrices (RPM) and Wechsler Intelligence Scale for Children (WISC), we propose a new dataset MaRs-VQA and a new benchmark VCog-Bench containing three datasets to evaluate the zero-shot AVR capability of MLLMs and compare their performance with existing human intelligent investigation. Our comparative experiments with different open-source and closed-source MLLMs on the VCog-Bench revealed a gap between MLLMs and human intelligence, highlighting the visual cognitive limitations of current MLLMs. We believe that the public release of VCog-Bench, consisting of MaRs-VQA, and the inference pipeline will drive progress toward the next generation of MLLMs with human-like visual cognition abilities. △ Less

Submitted 14 June, 2024; originally announced June 2024.

Comments: 14 pages, 4 figures, the appendix will be updated soon

MSC Class: 68T01

arXiv:2406.10239 [pdf]

Predict Click-Through Rates with Deep Interest Network Model in E-commerce Advertising

Authors: Chang Zhou, Yang Zhao, Yuelin Zou, Jin Cao, Wenhan Fan, Yi Zhao, Chiyu Cheng

Abstract: This paper proposes new methods to enhance click-through rate (CTR) prediction models using the Deep Interest Network (DIN) model, specifically applied to the advertising system of Alibaba's Taobao platform. Unlike traditional deep learning approaches, this research focuses on localized user behavior activation for tailored ad targeting by leveraging extensive user behavior data. Compared to tradi… ▽ More This paper proposes new methods to enhance click-through rate (CTR) prediction models using the Deep Interest Network (DIN) model, specifically applied to the advertising system of Alibaba's Taobao platform. Unlike traditional deep learning approaches, this research focuses on localized user behavior activation for tailored ad targeting by leveraging extensive user behavior data. Compared to traditional models, this method demonstrates superior ability to handle diverse and dynamic user data, thereby improving the efficiency of ad systems and increasing revenue. △ Less

Submitted 4 June, 2024; originally announced June 2024.

Comments: Accepted by the 5th International Conference on Information Science, Parallel and Distributed Systems (ISPDS 2024), 2024 IEEE

arXiv:2406.09828 [pdf, other]

Dynamic Decentralized 3D Urban Coverage and Patrol with UAVs

Authors: Wai Lun Leong, Jiawei Cao, Rodney Teo

Abstract: In the event of natural or man-made disasters in an urban environment, such as fires, floods, and earthquakes, a swarm of unmanned aerial vehicles (UAVs) can rapidly sweep and provide coverage to monitor the area of interest and locate survivors. We propose a modular framework and patrol strategy that enables a swarm of UAVs to perform cooperative and periodic coverage in such scenarios. Our appro… ▽ More In the event of natural or man-made disasters in an urban environment, such as fires, floods, and earthquakes, a swarm of unmanned aerial vehicles (UAVs) can rapidly sweep and provide coverage to monitor the area of interest and locate survivors. We propose a modular framework and patrol strategy that enables a swarm of UAVs to perform cooperative and periodic coverage in such scenarios. Our approach first discretizes the area of interest into viewpoints connected via closed paths. UAVs are assigned to teams via task allocation to cooperatively patrol these closed paths. We propose a minimal, scalable, and robust patrol strategy where UAVs within a team move in a random direction along their assigned closed path and "bounce" off each other when they meet. Our simulation results show that such a minimal strategy can exhibit an emergent behaviour that provides periodic and complete coverage in a 3D urban environment. △ Less

Submitted 14 June, 2024; originally announced June 2024.

Comments: Accepted for the 2024 International Conference on Unmanned Aircraft Systems (ICUAS 2024) in Chania, Greece

arXiv:2406.09779 [pdf, other]

doi 10.1145/3589335.3665995

OSPC: Detecting Harmful Memes with Large Language Model as a Catalyst

Authors: Jingtao Cao, Zheng Zhang, Hongru Wang, Bin Liang, Hao Wang, Kam-Fai Wong

Abstract: Memes, which rapidly disseminate personal opinions and positions across the internet, also pose significant challenges in propagating social bias and prejudice. This study presents a novel approach to detecting harmful memes, particularly within the multicultural and multilingual context of Singapore. Our methodology integrates image captioning, Optical Character Recognition (OCR), and Large Langu… ▽ More Memes, which rapidly disseminate personal opinions and positions across the internet, also pose significant challenges in propagating social bias and prejudice. This study presents a novel approach to detecting harmful memes, particularly within the multicultural and multilingual context of Singapore. Our methodology integrates image captioning, Optical Character Recognition (OCR), and Large Language Model (LLM) analysis to comprehensively understand and classify harmful memes. Utilizing the BLIP model for image captioning, PP-OCR and TrOCR for text recognition across multiple languages, and the Qwen LLM for nuanced language understanding, our system is capable of identifying harmful content in memes created in English, Chinese, Malay, and Tamil. To enhance the system's performance, we fine-tuned our approach by leveraging additional data labeled using GPT-4V, aiming to distill the understanding capability of GPT-4V for harmful memes to our system. Our framework achieves top-1 at the public leaderboard of the Online Safety Prize Challenge hosted by AI Singapore, with the AUROC as 0.7749 and accuracy as 0.7087, significantly ahead of the other teams. Notably, our approach outperforms previous benchmarks, with FLAVA achieving an AUROC of 0.5695 and VisualBERT an AUROC of 0.5561. △ Less

Submitted 14 June, 2024; originally announced June 2024.

arXiv:2406.07944 [pdf, other]

DLLens: Testing Deep Learning Libraries via LLM-aided Synthesis

Authors: Meiziniu Li, Dongze Li, Jianmeng Liu, Jialun Cao, Yongqiang Tian, Shing-Chi Cheung

Abstract: Testing is a major approach to ensuring the quality of deep learning (DL) libraries. Existing testing techniques commonly adopt differential testing to relieve the need for test oracle construction. However, these techniques are limited in finding implementations that offer the same functionality and generating diverse test inputs for differential testing. This paper introduces DLLens, a novel dif… ▽ More Testing is a major approach to ensuring the quality of deep learning (DL) libraries. Existing testing techniques commonly adopt differential testing to relieve the need for test oracle construction. However, these techniques are limited in finding implementations that offer the same functionality and generating diverse test inputs for differential testing. This paper introduces DLLens, a novel differential testing technique for DL library testing. Our insight is that APIs in different DL libraries are commonly designed to accomplish various computations for the same set of published DL algorithms. Although the mapping of these APIs is not often one-to-one, we observe that their computations can be mutually simulated after proper composition and adaptation. The use of these simulation counterparts facilitates differential testing for the detection of functional DL library bugs. Leveraging the insight, we propose DLLens as a novel mechanism that utilizes a large language model (LLM) to synthesize valid counterparts of DL library APIs. To generate diverse test inputs, DLLens incorporates a static analysis method aided by LLM to extract path constraints from all execution paths in each API and its counterpart's implementations. These path constraints are then used to guide the generation of diverse test inputs. We evaluate DLLens on two popular DL libraries, TensorFlow and PyTorch. Our evaluation shows that DLLens can synthesize counterparts for more than twice as many APIs found by state-of-the-art techniques on these libraries. Moreover, DLLens can extract 26.7% more constraints and detect 2.5 times as many bugs as state-of-the-art techniques. DLLens has successfully found 56 bugs in recent TensorFlow and PyTorch libraries. Among them, 41 are previously unknown, 39 of which have been confirmed by developers after reporting, and 19 of those confirmed bugs have been fixed by developers. △ Less

Submitted 12 June, 2024; originally announced June 2024.

ACM Class: D.2.5; I.2.5

arXiv:2406.07789 [pdf, ps, other]

A posteriori error estimates for the exponential midpoint method for linear and semilinear parabolic equations

Authors: Xianfa Hu, Wansheng Wang, Mengli Mao, Jiliang Cao

Abstract: In this paper, the a posteriori error estimates of the exponential midpoint method for time discretization are studied for linear and semilinear parabolic equations. Using the exponential midpoint approximation defined by a continuous and piecewise linear interpolation of nodal values yields the suboptimal order estimates. Based on the property of the entire function, we introduce a continuous and… ▽ More In this paper, the a posteriori error estimates of the exponential midpoint method for time discretization are studied for linear and semilinear parabolic equations. Using the exponential midpoint approximation defined by a continuous and piecewise linear interpolation of nodal values yields the suboptimal order estimates. Based on the property of the entire function, we introduce a continuous and piecewise quadratic time reconstruction of the exponential midpoint method to derive the optimal order estimates, and the error bounds are solely dependent on the discretization parameters, the data of the problem and the approximation of the entire function. Several numerical examples are implemented to illustrate the theoretical results. △ Less

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.07472 [pdf, other]

4Real: Towards Photorealistic 4D Scene Generation via Video Diffusion Models

Authors: Heng Yu, Chaoyang Wang, Peiye Zhuang, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Laszlo A Jeni, Sergey Tulyakov, Hsin-Ying Lee

Abstract: Existing dynamic scene generation methods mostly rely on distilling knowledge from pre-trained 3D generative models, which are typically fine-tuned on synthetic object datasets. As a result, the generated scenes are often object-centric and lack photorealism. To address these limitations, we introduce a novel pipeline designed for photorealistic text-to-4D scene generation, discarding the dependen… ▽ More Existing dynamic scene generation methods mostly rely on distilling knowledge from pre-trained 3D generative models, which are typically fine-tuned on synthetic object datasets. As a result, the generated scenes are often object-centric and lack photorealism. To address these limitations, we introduce a novel pipeline designed for photorealistic text-to-4D scene generation, discarding the dependency on multi-view generative models and instead fully utilizing video generative models trained on diverse real-world datasets. Our method begins by generating a reference video using the video generation model. We then learn the canonical 3D representation of the video using a freeze-time video, delicately generated from the reference video. To handle inconsistencies in the freeze-time video, we jointly learn a per-frame deformation to model these imperfections. We then learn the temporal deformation based on the canonical representation to capture dynamic interactions in the reference video. The pipeline facilitates the generation of dynamic scenes with enhanced photorealism and structural integrity, viewable from multiple perspectives, thereby setting a new standard in 4D scene generation. △ Less

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.07038 [pdf, ps, other]

A Multi-Scale Boltzmann Equation for Complex Systems of Neutral Gases across All Flow Regimes

Authors: Sha Liu, Junzhe Cao, Sirui Yang, Chengwen Zhong

Abstract: A Multi-scale Boltzmann Equation (MBE) is found from the gas-kinetic theory and the direct modeling philosophy as a master equation for complex physical systems of neutral gases across all flow regimes, which locates between the continuum limit and the free-molecular limit, covering a vast range of applications such as hypersonic flows over aerospace crafts and delicate flows around MEMS. The most… ▽ More A Multi-scale Boltzmann Equation (MBE) is found from the gas-kinetic theory and the direct modeling philosophy as a master equation for complex physical systems of neutral gases across all flow regimes, which locates between the continuum limit and the free-molecular limit, covering a vast range of applications such as hypersonic flows over aerospace crafts and delicate flows around MEMS. The most explicit characteristic of MBE is evolving the variable observation time in the expression, which distinguishes the MBE from the single-scale master or governing equation where a fixed scale is implied in the assumptions. The fundamental properties of MBE, such as the asymptotic property, are proved theoretically, while a concise numerical scheme is developed for MBE to demonstrate its validity by benchmark multi-scale problems. △ Less

Submitted 11 June, 2024; originally announced June 2024.

arXiv:2406.05572 [pdf, other]

Trust the PRoC3S: Solving Long-Horizon Robotics Problems with LLMs and Constraint Satisfaction

Authors: Aidan Curtis, Nishanth Kumar, Jing Cao, Tomás Lozano-Pérez, Leslie Pack Kaelbling

Abstract: Recent developments in pretrained large language models (LLMs) applied to robotics have demonstrated their capacity for sequencing a set of discrete skills to achieve open-ended goals in simple robotic tasks. In this paper, we examine the topic of LLM planning for a set of continuously parameterized skills whose execution must avoid violations of a set of kinematic, geometric, and physical constra… ▽ More Recent developments in pretrained large language models (LLMs) applied to robotics have demonstrated their capacity for sequencing a set of discrete skills to achieve open-ended goals in simple robotic tasks. In this paper, we examine the topic of LLM planning for a set of continuously parameterized skills whose execution must avoid violations of a set of kinematic, geometric, and physical constraints. We prompt the LLM to output code for a function with open parameters, which, together with environmental constraints, can be viewed as a Continuous Constraint Satisfaction Problem (CCSP). This CCSP can be solved through sampling or optimization to find a skill sequence and continuous parameter settings that achieve the goal while avoiding constraint violations. Additionally, we consider cases where the LLM proposes unsatisfiable CCSPs, such as those that are kinematically infeasible, dynamically unstable, or lead to collisions, and re-prompt the LLM to form a new CCSP accordingly. Experiments across three different simulated 3D domains demonstrate that our proposed strategy, PRoC3S, is capable of solving a wide range of complex manipulation tasks with realistic constraints on continuous parameters much more efficiently and effectively than existing baselines. △ Less

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.04844 [pdf, other]

Multi-Granularity Language-Guided Multi-Object Tracking

Authors: Yuhao Li, Muzammal Naseer, Jiale Cao, Yu Zhu, Jinqiu Sun, Yanning Zhang, Fahad Shahbaz Khan

Abstract: Most existing multi-object tracking methods typically learn visual tracking features via maximizing dis-similarities of different instances and minimizing similarities of the same instance. While such a feature learning scheme achieves promising performance, learning discriminative features solely based on visual information is challenging especially in case of environmental interference such as o… ▽ More Most existing multi-object tracking methods typically learn visual tracking features via maximizing dis-similarities of different instances and minimizing similarities of the same instance. While such a feature learning scheme achieves promising performance, learning discriminative features solely based on visual information is challenging especially in case of environmental interference such as occlusion, blur and domain variance. In this work, we argue that multi-modal language-driven features provide complementary information to classical visual features, thereby aiding in improving the robustness to such environmental interference. To this end, we propose a new multi-object tracking framework, named LG-MOT, that explicitly leverages language information at different levels of granularity (scene-and instance-level) and combines it with standard visual features to obtain discriminative representations. To develop LG-MOT, we annotate existing MOT datasets with scene-and instance-level language descriptions. We then encode both instance-and scene-level language information into high-dimensional embeddings, which are utilized to guide the visual features during training. At inference, our LG-MOT uses the standard visual features without relying on annotated language descriptions. Extensive experiments on three benchmarks, MOT17, DanceTrack and SportsMOT, reveal the merits of the proposed contributions leading to state-of-the-art performance. On the DanceTrack test set, our LG-MOT achieves an absolute gain of 2.2\% in terms of target object association (IDF1 score), compared to the baseline using only visual features. Further, our LG-MOT exhibits strong cross-domain generalizability. The dataset and code will be available at ~\url{https://github.com/WesLee88524/LG-MOT}. △ Less

Submitted 7 June, 2024; originally announced June 2024.

arXiv:2406.04658 [pdf, other]

Advanced Payment Security System:XGBoost, CatBoost and SMOTE Integrated

Authors: Qi Zheng, Chang Yu, Jin Cao, Yongshun Xu, Qianwen Xing, Yinxin Jin

Abstract: With the rise of various online and mobile payment systems, transaction fraud has become a significant threat to financial security. This study explores the application of advanced machine learning models, specifically XGBoost and LightGBM, for developing a more accurate and robust Payment Security Protection Model.To enhance data reliability, we meticulously processed the data sources and used SM… ▽ More With the rise of various online and mobile payment systems, transaction fraud has become a significant threat to financial security. This study explores the application of advanced machine learning models, specifically XGBoost and LightGBM, for developing a more accurate and robust Payment Security Protection Model.To enhance data reliability, we meticulously processed the data sources and used SMOTE (Synthetic Minority Over-sampling Technique) to address class imbalance and improve data representation. By selecting highly correlated features, we aimed to strengthen the training process and boost model performance.We conducted thorough performance evaluations of our proposed models, comparing them against traditional methods including Random Forest, Neural Network, and Logistic Regression. Key metrics such as Precision, Recall, and F1 Score were used to rigorously assess their effectiveness.Our detailed analyses and comparisons reveal that the combination of SMOTE with XGBoost and LightGBM offers a highly efficient and powerful mechanism for payment security protection. The results show that these models not only outperform traditional approaches but also hold significant promise for advancing the field of transaction fraud prevention. △ Less

Submitted 7 June, 2024; originally announced June 2024.

Comments: This paper is received by https://ieee-metacom.org

arXiv:2406.04504 [pdf, other]

Mixed Finite Element Method for Multi-layer Elastic Contact Systems

Authors: Zhizhuo Zhang, Mikaël Barboteu, Xiaobing Nie, Serge Dumont, Mahmoud Abdel-Aty, Jinde Cao

Abstract: With the development of multi-layer elastic systems in the field of engineering mechanics, the corresponding variational inequality theory and algorithm design have received more attention and research. In this study, a class of equivalent saddle point problems with interlayer Tresca friction conditions and the mixed finite element method are proposed and analyzed. Then, the convergence of the num… ▽ More With the development of multi-layer elastic systems in the field of engineering mechanics, the corresponding variational inequality theory and algorithm design have received more attention and research. In this study, a class of equivalent saddle point problems with interlayer Tresca friction conditions and the mixed finite element method are proposed and analyzed. Then, the convergence of the numerical solution of the mixed finite element method is theoretically proven, and the corresponding algebraic dual algorithm is given. Finally, through numerical experiments, the mixed finite element method is not only compared with the layer decomposition method, but also its convergence relationship with respect to the spatial discretization parameter $H$ is verified. △ Less

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.04499 [pdf, other]

A layer decomposition method for multi-layer elastic contact systems with interlayer Tresca friction

Authors: Zhizhuo Zhang, Xiaobing Nie, Mikaël Barboteu, Jinde Cao

Abstract: With the increasing demand for the accuracy of numerical simulation of pavement mechanics, the variational inequality model and its induced finite element method which can simulate the interlayer contact state becomes a potential solution. In this paper, a layer decomposition algorithm for solving variational inequality models of multi-layer elastic contact systems with interlayer Tresca friction… ▽ More With the increasing demand for the accuracy of numerical simulation of pavement mechanics, the variational inequality model and its induced finite element method which can simulate the interlayer contact state becomes a potential solution. In this paper, a layer decomposition algorithm for solving variational inequality models of multi-layer elastic contact systems with interlayer Tresca friction conditions is studied. Continuous and discrete versions of the algorithm and their convergence theorems have been proposed and proved successively. Then, the algebraic form of the executable optimization algorithm and the numerical experimental results verify the practicability of the variational inequality model and its algorithm in the pavement mechanics modeling. △ Less

Submitted 6 June, 2024; originally announced June 2024.

arXiv:2406.04466 [pdf, other]

Dog Heart Rate and Blood Oxygen Metaverse Interaction System

Authors: Yanhui Jiang, Jin Cao, Chang Yu

Abstract: This study developed an improved dog heart rate and blood oxygen sensor system using Arduino. Traditional methods face accuracy and reliability issues. Our system integrates advanced computational techniques with hardware-based sensing to enhance measurement precision. An Arduino microcontroller connected to a heart rate and blood oxygen sensor collects raw data, which is preprocessed and filtered… ▽ More This study developed an improved dog heart rate and blood oxygen sensor system using Arduino. Traditional methods face accuracy and reliability issues. Our system integrates advanced computational techniques with hardware-based sensing to enhance measurement precision. An Arduino microcontroller connected to a heart rate and blood oxygen sensor collects raw data, which is preprocessed and filtered to remove noise. Experimental plans include long-term monitoring of multiple dogs and comparative analysis with traditional methods to validate the system's effectiveness, aiming for widespread use in the home or veterinary clinics. Additionally, the system offers dog owners the possibility to interact with their virtual dogs in the metaverse using AR/VR, allowing them to better understand their real dogs' health conditions by observing their virtual health status. △ Less

Submitted 10 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

Comments: 7 pages, 6 figures, conference for IEEE metacom accepted (https://ieee-metacom.org/)

arXiv:2406.04465 [pdf, other]

Rough Set improved Therapy-Based Metaverse Assisting System

Authors: Jin Cao, Yanhui Jiang, Chang Yu, Feiwei Qin, Zekun Jiang

Abstract: Chronic neck and shoulder pain (CNSP) is a major global public health issue. Traditional treatments like physiotherapy and rehabilitation have drawbacks, including high costs, low precision, and user discomfort. This paper presents an interactive system based on Cognitive Therapy Theory (CBT) for CNSP treatment. The system includes a pain detection module using EMG and IMU to monitor pain and opti… ▽ More Chronic neck and shoulder pain (CNSP) is a major global public health issue. Traditional treatments like physiotherapy and rehabilitation have drawbacks, including high costs, low precision, and user discomfort. This paper presents an interactive system based on Cognitive Therapy Theory (CBT) for CNSP treatment. The system includes a pain detection module using EMG and IMU to monitor pain and optimize data with Rough Set theory, and a cognitive therapy module that processes this data further for CBT-based interventions, including massage and heating therapy. An experimental plan is outlined to evaluate the system's effectiveness and performance. The goal is to create an accessible device for CNSP therapy. Additionally, the paper explores the system's application in a metaverse environment, enhancing treatment immersion and personalization. The metaverse platform simulates treatment environments and responds to real-time patient data, allowing for continuous monitoring and adjustment of treatment plans. △ Less

Submitted 10 June, 2024; v1 submitted 6 June, 2024; originally announced June 2024.

Comments: 7 pages, 5 figures, conference for IEEE metacom accepted (https://ieee-metacom.org/)

arXiv:2406.04333 [pdf, other]

BitsFusion: 1.99 bits Weight Quantization of Diffusion Model

Authors: Yang Sui, Yanyu Li, Anil Kag, Yerlan Idelbayev, Junli Cao, Ju Hu, Dhritiman Sagar, Bo Yuan, Sergey Tulyakov, Jian Ren

Abstract: Diffusion-based image generation models have achieved great success in recent years by showing the capability of synthesizing high-quality content. However, these models contain a huge number of parameters, resulting in a significantly large model size. Saving and transferring them is a major bottleneck for various applications, especially those running on resource-constrained devices. In this wor… ▽ More Diffusion-based image generation models have achieved great success in recent years by showing the capability of synthesizing high-quality content. However, these models contain a huge number of parameters, resulting in a significantly large model size. Saving and transferring them is a major bottleneck for various applications, especially those running on resource-constrained devices. In this work, we develop a novel weight quantization method that quantizes the UNet from Stable Diffusion v1.5 to 1.99 bits, achieving a model with 7.9X smaller size while exhibiting even better generation quality than the original one. Our approach includes several novel techniques, such as assigning optimal bits to each layer, initializing the quantized model for better performance, and improving the training strategy to dramatically reduce quantization error. Furthermore, we extensively evaluate our quantized model across various benchmark datasets and through human evaluation to demonstrate its superior generation quality. △ Less

Submitted 6 June, 2024; originally announced June 2024.

Comments: Project Page: https://snap-research.github.io/BitsFusion

arXiv:2406.04324 [pdf, other]

SF-V: Single Forward Video Generation Model

Authors: Zhixing Zhang, Yanyu Li, Yushu Wu, Yanwu Xu, Anil Kag, Ivan Skorokhodov, Willi Menapace, Aliaksandr Siarohin, Junli Cao, Dimitris Metaxas, Sergey Tulyakov, Jian Ren

Abstract: Diffusion-based video generation models have demonstrated remarkable success in obtaining high-fidelity videos through the iterative denoising process. However, these models require multiple denoising steps during sampling, resulting in high computational costs. In this work, we propose a novel approach to obtain single-step video generation models by leveraging adversarial training to fine-tune p… ▽ More Diffusion-based video generation models have demonstrated remarkable success in obtaining high-fidelity videos through the iterative denoising process. However, these models require multiple denoising steps during sampling, resulting in high computational costs. In this work, we propose a novel approach to obtain single-step video generation models by leveraging adversarial training to fine-tune pre-trained video diffusion models. We show that, through the adversarial training, the multi-steps video diffusion model, i.e., Stable Video Diffusion (SVD), can be trained to perform single forward pass to synthesize high-quality videos, capturing both temporal and spatial dependencies in the video data. Extensive experiments demonstrate that our method achieves competitive generation quality of synthesized videos with significantly reduced computational overhead for the denoising process (i.e., around $23\times$ speedup compared with SVD and $6\times$ speedup compared with existing works, with even better generation quality), paving the way for real-time video synthesis and editing. More visualization results are made publicly available at https://snap-research.github.io/SF-V. △ Less

Submitted 6 June, 2024; originally announced June 2024.

Comments: Project Page: https://snap-research.github.io/SF-V

arXiv:2406.03807 [pdf, other]

Tool-Planner: Dynamic Solution Tree Planning for Large Language Model with Tool Clustering

Authors: Yanming Liu, Xinyue Peng, Yuwei Zhang, Jiannan Cao, Xuhong Zhang, Sheng Cheng, Xun Wang, Jianwei Yin, Tianyu Du

Abstract: Large language models (LLMs) have demonstrated exceptional reasoning capabilities, enabling them to solve various complex problems. Recently, this ability has been applied to the paradigm of tool learning. Tool learning involves providing examples of tool usage and their corresponding functions, allowing LLMs to formulate plans and demonstrate the process of invoking and executing each tool. LLMs… ▽ More Large language models (LLMs) have demonstrated exceptional reasoning capabilities, enabling them to solve various complex problems. Recently, this ability has been applied to the paradigm of tool learning. Tool learning involves providing examples of tool usage and their corresponding functions, allowing LLMs to formulate plans and demonstrate the process of invoking and executing each tool. LLMs can address tasks that they cannot complete independently, thereby enhancing their potential across different tasks. However, this approach faces two key challenges. First, redundant error correction leads to unstable planning and long execution time. Additionally, designing a correct plan among multiple tools is also a challenge in tool learning. To address these issues, we propose Tool-Planner, a task-processing framework based on toolkits. Tool-Planner groups tools based on the API functions with the same function into a toolkit and allows LLMs to implement planning across the various toolkits. When a tool error occurs, the language model can reselect and adjust tools based on the toolkit. Experiments show that our approach demonstrates a high pass and win rate across different datasets and optimizes the planning scheme for tool learning in models such as GPT-4 and Claude 3, showcasing the potential of our method. △ Less

Submitted 6 June, 2024; originally announced June 2024.

Comments: 46pages first version

Showing 1–50 of 1,605 results for author: Cao, J