Showing 1–50 of 113 results for author: Cai, R

  1. arXiv:2407.04064  [pdf, other]

    cs.RO

    Collision Avoidance for Multiple UAVs in Unknown Scenarios with Causal Representation Disentanglement

    Authors: Jiafan Zhuang, Zihao Xia, Gaofei Han, Boxi Wang, Wenji Li, Dongliang Wang, Zhifeng Hao, Ruichu Cai, Zhun Fan

    Abstract: Deep reinforcement learning (DRL) has achieved remarkable progress in online path planning tasks for multi-UAV systems. However, existing DRL-based methods often suffer from performance degradation when tackling unseen scenarios, since the non-causal factors in visual representations adversely affect policy learning. To address this issue, we propose a novel representation learning approach, i.e.,…

    Submitted 15 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

  2. arXiv:2407.04056  [pdf, other]

    cs.RO

    Robust Policy Learning for Multi-UAV Collision Avoidance with Causal Feature Selection

    Authors: Jiafan Zhuang, Gaofei Han, Zihao Xia, Boxi Wang, Wenji Li, Dongliang Wang, Zhifeng Hao, Ruichu Cai, Zhun Fan

    Abstract: In unseen and complex outdoor environments, collision avoidance navigation for unmanned aerial vehicle (UAV) swarms presents a challenging problem. It requires UAVs to navigate through various obstacles and complex backgrounds. Existing collision avoidance navigation methods based on deep reinforcement learning show promising performance but suffer from poor generalization abilities, resulting in…

    Submitted 15 July, 2024; v1 submitted 4 July, 2024; originally announced July 2024.

  3. arXiv:2406.19195  [pdf, other]

    cs.LG cs.AI

    Estimating Long-term Heterogeneous Dose-response Curve: Generalization Bound Leveraging Optimal Transport Weights

    Authors: Zeqin Yang, Weilin Chen, Ruichu Cai, Yuguang Yan, Zhifeng Hao, Zhipeng Yu, Zhichao Zou, Zhen Peng, Jiecheng Guo

    Abstract: Long-term causal effect estimation is a significant but challenging problem in many applications. Existing methods rely on ideal assumptions to estimate long-term average effects, e.g., no unobserved confounders or a binary treatment, while in numerous real-world applications, these assumptions could be violated and average effects are unable to provide individual-level suggestions. In this paper, we…

    Submitted 27 June, 2024; originally announced June 2024.

  4. arXiv:2406.13227  [pdf, other]

    cs.CV

    Controllable and Gradual Facial Blemishes Retouching via Physics-Based Modelling

    Authors: Chenhao Shuai, Rizhao Cai, Bandara Dissanayake, Amanda Newman, Dayan Guan, Dennis Sng, Ling Li, Alex Kot

    Abstract: Face retouching aims to remove facial blemishes, such as pigmentation and acne, and still retain fine-grain texture details. Nevertheless, existing methods just remove the blemishes but focus little on realism of the intermediate process, limiting their use more to beautifying facial images on social media rather than being effective tools for simulating changes in facial pigmentation and acne. Mo…

    Submitted 19 June, 2024; originally announced June 2024.

    Comments: 7 pages, 6 figures. The paper has been accepted by the IEEE Conference on Multimedia and Expo 2024

  5. arXiv:2406.11819  [pdf, other]

    cs.CV

    MegaScenes: Scene-Level View Synthesis at Scale

    Authors: Joseph Tung, Gene Chou, Ruojin Cai, Guandao Yang, Kai Zhang, Gordon Wetzstein, Bharath Hariharan, Noah Snavely

    Abstract: Scene-level novel view synthesis (NVS) is fundamental to many vision and graphics applications. Recently, pose-conditioned diffusion models have led to significant progress by extracting 3D information from 2D foundation models, but these methods are limited by the lack of scene-level training data. Common dataset choices either consist of isolated objects (Objaverse), or of object-centric scenes…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: Our project page is at https://megascenes.github.io

  6. arXiv:2406.10260  [pdf, other]

    cs.CL cs.LG

    Flextron: Many-in-One Flexible Large Language Model

    Authors: Ruisi Cai, Saurav Muralidharan, Greg Heinrich, Hongxu Yin, Zhangyang Wang, Jan Kautz, Pavlo Molchanov

    Abstract: Training modern LLMs is extremely resource intensive, and customizing them for various deployment scenarios characterized by limited compute and memory resources through repeated training is impractical. In this paper, we introduce Flextron, a network architecture and post-training model optimization framework supporting flexible model deployment. The Flextron architecture utilizes a nested elasti…

    Submitted 10 June, 2024; originally announced June 2024.

  7. arXiv:2406.07020  [pdf, other]

    cs.LG

    Learning Discrete Latent Variable Structures with Tensor Rank Conditions

    Authors: Zhengming Chen, Ruichu Cai, Feng Xie, Jie Qiao, Anpeng Wu, Zijian Li, Zhifeng Hao, Kun Zhang

    Abstract: Unobserved discrete data are ubiquitous in many scientific disciplines, and how to learn the causal structure of these latent variables is crucial for uncovering data patterns. Most studies focus on the linear latent variable model or impose strict constraints on latent structures, which fail to address cases in discrete data involving non-linear relationships or complex latent structures. To achi…

    Submitted 11 June, 2024; originally announced June 2024.

  8. arXiv:2406.05317  [pdf, other]

    cs.LG cs.CL

    LoCoCo: Dropping In Convolutions for Long Context Compression

    Authors: Ruisi Cai, Yuandong Tian, Zhangyang Wang, Beidi Chen

    Abstract: This paper tackles the memory hurdle of processing long context sequences in Large Language Models (LLMs), by presenting a novel approach, Dropping In Convolutions for Long Context Compression (LoCoCo). LoCoCo employs only a fixed-size Key-Value (KV) cache, and can enhance efficiency in both inference and fine-tuning stages. Diverging from prior methods that selectively drop KV pairs based on heur…

    Submitted 7 June, 2024; originally announced June 2024.

  9. arXiv:2406.02902  [pdf, other]

    cs.CL

    S$^2$GSL: Incorporating Segment to Syntactic Enhanced Graph Structure Learning for Aspect-based Sentiment Analysis

    Authors: Bingfeng Chen, Qihan Ouyang, Yongqi Luo, Boyan Xu, Ruichu Cai, Zhifeng Hao

    Abstract: Previous graph-based approaches in Aspect-based Sentiment Analysis (ABSA) have demonstrated impressive performance by utilizing graph neural networks and attention mechanisms to learn structures of static dependency trees and dynamic latent trees. However, incorporating both semantic and syntactic information simultaneously within complex global structures can introduce irrelevant contexts and synt…

    Submitted 7 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: ACL2024(main)

  10. arXiv:2405.16130  [pdf, ps, other]

    cs.LG stat.ME

    Automating the Selection of Proxy Variables of Unmeasured Confounders

    Authors: Feng Xie, Zhengming Chen, Shanshan Luo, Wang Miao, Ruichu Cai, Zhi Geng

    Abstract: Recently, interest has grown in the use of proxy variables of unobserved confounding for inferring the causal effect in the presence of unmeasured confounders from observational data. One difficulty inhibiting the practical use is finding valid proxy variables of unobserved confounding to a target causal effect of interest. These proxy variables are typically justified by background knowledge. In…

    Submitted 25 May, 2024; originally announced May 2024.

  11. arXiv:2405.16083  [pdf, other]

    cs.LG

    From Orthogonality to Dependency: Learning Disentangled Representation for Multi-Modal Time-Series Sensing Signals

    Authors: Ruichu Cai, Zhifang Jiang, Zijian Li, Weilin Chen, Xuexin Chen, Zhifeng Hao, Yifan Shen, Guangyi Chen, Kun Zhang

    Abstract: Existing methods for multi-modal time series representation learning aim to disentangle the modality-shared and modality-specific latent variables. Although achieving notable performances on downstream tasks, they usually assume an orthogonal latent space. However, the modality-specific and modality-shared latent variables might be dependent in real-world scenarios. Therefore, we propose a general…

    Submitted 25 May, 2024; originally announced May 2024.

  12. arXiv:2405.15325  [pdf, other]

    cs.LG stat.ML

    On the Identification of Temporally Causal Representation with Instantaneous Dependence

    Authors: Zijian Li, Yifan Shen, Kaitao Zheng, Ruichu Cai, Xiangchen Song, Mingming Gong, Zhengmao Zhu, Guangyi Chen, Kun Zhang

    Abstract: Temporally causal representation learning aims to identify the latent causal process from time series observations, but most methods require the assumption that the latent causal processes do not have instantaneous relations. Although some recent methods achieve identifiability in the instantaneous causality case, they require either interventions on the latent variables or grouping of the observa…

    Submitted 7 June, 2024; v1 submitted 24 May, 2024; originally announced May 2024.

  13. arXiv:2405.08001  [pdf, other]

    math.OC cs.GR

    Preconditioned Nonlinear Conjugate Gradient Method for Real-time Interior-point Hyperelasticity

    Authors: Xing Shen, Runyuan Cai, Mengxiao Bi, Tangjie Lv

    Abstract: The linear conjugate gradient method is widely used in physical simulation, particularly for solving large-scale linear systems derived from Newton's method. The nonlinear conjugate gradient method generalizes the conjugate gradient method to nonlinear optimization, which is extensively utilized in solving practical large-scale unconstrained optimization problems. However, it is rarely discussed i…

    Submitted 6 May, 2024; originally announced May 2024.

  14. arXiv:2405.03342  [pdf, other]

    cs.LG

    Doubly Robust Causal Effect Estimation under Networked Interference via Targeted Learning

    Authors: Weilin Chen, Ruichu Cai, Zeqin Yang, Jie Qiao, Yuguang Yan, Zijian Li, Zhifeng Hao

    Abstract: Causal effect estimation under networked interference is an important but challenging problem. Available parametric methods are limited in their model space, while previous semiparametric methods, e.g., leveraging neural networks to fit only one single nuisance function, may still encounter misspecification problems under networked interference without appropriate assumptions on the data generatio…

    Submitted 5 July, 2024; v1 submitted 6 May, 2024; originally announced May 2024.

    Comments: Accepted by ICML 2024

  15. arXiv:2404.13999  [pdf, other]

    cs.CV

    CoFInAl: Enhancing Action Quality Assessment with Coarse-to-Fine Instruction Alignment

    Authors: Kanglei Zhou, Junlin Li, Ruizhi Cai, Liyuan Wang, Xingxing Zhang, Xiaohui Liang

    Abstract: Action Quality Assessment (AQA) is pivotal for quantifying actions across domains like sports and medical care. Existing methods often rely on pre-trained backbones from large-scale action recognition datasets to boost performance on smaller AQA datasets. However, this common strategy yields suboptimal results due to the inherent struggle of these backbones to capture the subtle cues essential for…

    Submitted 22 April, 2024; originally announced April 2024.

    Comments: Accepted by IJCAI 2024

  16. arXiv:2404.06220  [pdf, other]

    cs.LG cs.MM

    Zero-Shot Relational Learning for Multimodal Knowledge Graphs

    Authors: Rui Cai, Shichao Pei, Xiangliang Zhang

    Abstract: Relational learning is an essential task in the domain of knowledge representation, particularly in knowledge graph completion (KGC). While relational learning in traditional single-modal settings has been extensively studied, exploring it within a multimodal KGC context presents distinct challenges and opportunities. One of the major challenges is inference on newly discovered relations without an…

    Submitted 9 April, 2024; originally announced April 2024.

  17. arXiv:2404.04997  [pdf, other]

    cs.LG cs.AI cs.CL

    Adapting LLMs for Efficient Context Processing through Soft Prompt Compression

    Authors: Cangqing Wang, Yutian Yang, Ruisi Li, Dan Sun, Ruicong Cai, Yuzhu Zhang, Chengqian Fu, Lillian Floyd

    Abstract: The rapid advancement of Large Language Models (LLMs) has inaugurated a transformative epoch in natural language processing, fostering unprecedented proficiency in text generation, comprehension, and contextual scrutiny. Nevertheless, effectively handling extensive contexts, crucial for myriad applications, poses a formidable obstacle owing to the intrinsic constraints of the models' context windo…

    Submitted 18 April, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

    Comments: This paper has been accepted by the 2024 International Conference on Image Processing and Computer Applications (IPCA 2024)

  18. arXiv:2403.16523  [pdf, other]

    stat.ML cs.AI cs.LG

    Causal Discovery from Poisson Branching Structural Causal Model Using High-Order Cumulant with Path Analysis

    Authors: Jie Qiao, Yu Xiang, Zhengming Chen, Ruichu Cai, Zhifeng Hao

    Abstract: Count data naturally arise in many fields, such as finance, neuroscience, and epidemiology, and discovering causal structure among count data is a crucial task in various scientific and industrial scenarios. One of the most common characteristics of count data is the inherent branching structure described by a binomial thinning operator and an independent Poisson distribution that captures both br…

    Submitted 25 March, 2024; originally announced March 2024.

    Comments: Accepted by AAAI-2024

  19. arXiv:2403.14390  [pdf, other]

    cs.CL

    From Large to Tiny: Distilling and Refining Mathematical Expertise for Math Word Problems with Weakly Supervision

    Authors: Qingwen Lin, Boyan Xu, Zhengting Huang, Ruichu Cai

    Abstract: Addressing the challenge of high annotation costs in solving Math Word Problems (MWPs) through full supervision with intermediate equations, recent works have proposed weakly supervised task settings that rely solely on the final answer as a supervised signal. Existing leading approaches typically employ various search techniques to infer intermediate equations, but cannot ensure their semantic co…

    Submitted 21 March, 2024; originally announced March 2024.

  20. arXiv:2402.19298  [pdf, other]

    cs.CV

    Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-Spoofing

    Authors: Xun Lin, Shuai Wang, Rizhao Cai, Yizhong Liu, Ying Fu, Zitong Yu, Wenzhong Tang, Alex Kot

    Abstract: Face Anti-Spoofing (FAS) is crucial for securing face recognition systems against presentation attacks. With advancements in sensor manufacture and multi-modal learning techniques, many multi-modal FAS approaches have emerged. However, they face challenges in generalizing to unseen attacks and deployment conditions. These challenges arise from (1) modality unreliability, where some modality sensor…

    Submitted 5 March, 2024; v1 submitted 29 February, 2024; originally announced February 2024.

    Comments: Accepted by CVPR 2024

  21. arXiv:2402.15819  [pdf, other]

    cs.IR cs.LG

    Debiased Model-based Interactive Recommendation

    Authors: Zijian Li, Ruichu Cai, Haiqin Huang, Sili Zhang, Yuguang Yan, Zhifeng Hao, Zhenghua Dong

    Abstract: Existing model-based interactive recommendation systems are trained by querying a world model to capture the user preference, but learning the world model from historical logged data will easily suffer from bias issues such as popularity bias and sampling bias. This is why some debiased methods have been proposed recently. However, two essential drawbacks still remain: 1) ignoring the dynamics of…

    Submitted 24 February, 2024; originally announced February 2024.

  22. arXiv:2402.12767  [pdf, other]

    cs.LG

    When and How: Learning Identifiable Latent States for Nonstationary Time Series Forecasting

    Authors: Zijian Li, Ruichu Cai, Zhenhui Yang, Haiqin Huang, Guangyi Chen, Yifan Shen, Zhengming Chen, Xiangchen Song, Kun Zhang

    Abstract: Temporal distribution shifts are ubiquitous in time series data. One of the most popular methods assumes that the temporal distribution shift occurs uniformly to disentangle the stationary and nonstationary dependencies. But this assumption is difficult to meet, as we do not know when the distribution shifts occur. To solve this problem, we propose to learn IDentifiable latEnt stAtes (IDEA) to det…

    Submitted 7 June, 2024; v1 submitted 20 February, 2024; originally announced February 2024.

  23. arXiv:2402.09165  [pdf, other]

    cs.LG

    Unifying Invariance and Spuriousity for Graph Out-of-Distribution via Probability of Necessity and Sufficiency

    Authors: Xuexin Chen, Ruichu Cai, Kaitao Zheng, Zhifan Jiang, Zhengting Huang, Zhifeng Hao, Zijian Li

    Abstract: Graph Out-of-Distribution (OOD), requiring that models trained on biased data generalize to the unseen test data, has a massive number of real-world applications. One of the most mainstream methods is to extract the invariant subgraph by aligning the original and augmented data with the help of environment augmentation. However, these solutions might lead to the loss or redundancy of semantic subgraph an…

    Submitted 14 February, 2024; originally announced February 2024.

  24. arXiv:2402.08845  [pdf, other]

    cs.LG stat.ME

    Feature Attribution with Necessity and Sufficiency via Dual-stage Perturbation Test for Causal Explanation

    Authors: Xuexin Chen, Ruichu Cai, Zhengting Huang, Yuxuan Zhu, Julien Horwood, Zhifeng Hao, Zijian Li, Jose Miguel Hernandez-Lobato

    Abstract: We investigate the problem of explainability for machine learning models, focusing on Feature Attribution Methods (FAMs) that evaluate feature importance through perturbation tests. Despite their utility, FAMs struggle to distinguish the contributions of different features, when their prediction changes are similar after perturbation. To enhance FAMs' discriminative power, we introduce Feature Att…

    Submitted 4 June, 2024; v1 submitted 13 February, 2024; originally announced February 2024.

    Comments: Accepted in the Proceedings of the 41st International Conference on Machine Learning (ICML2024)

  25. arXiv:2402.04869  [pdf, other]

    cs.LG cs.AI

    Learning by Doing: An Online Causal Reinforcement Learning Framework with Causal-Aware Policy

    Authors: Ruichu Cai, Siyang Huang, Jie Qiao, Wei Chen, Yan Zeng, Keli Zhang, Fuchun Sun, Yang Yu, Zhifeng Hao

    Abstract: As a key component to intuitive cognition and reasoning solutions in human intelligence, causal knowledge provides great potential for reinforcement learning (RL) agents' interpretability towards decision-making by helping reduce the searching space. However, there is still a considerable gap in discovering and incorporating causality into RL, which hinders the rapid development of causal RL. In t…

    Submitted 7 February, 2024; originally announced February 2024.

  26. arXiv:2312.13628  [pdf, other]

    cs.LG

    Where and How to Attack? A Causality-Inspired Recipe for Generating Counterfactual Adversarial Examples

    Authors: Ruichu Cai, Yuxuan Zhu, Jie Qiao, Zefeng Liang, Furui Liu, Zhifeng Hao

    Abstract: Deep neural networks (DNNs) have been demonstrated to be vulnerable to well-crafted adversarial examples, which are generated through either well-conceived $\mathcal{L}_p$-norm restricted or unrestricted attacks. Nevertheless, the majority of those approaches assume that adversaries can modify any features as they wish, and neglect the causal generating process of the data, which is unreaso…

    Submitted 26 January, 2024; v1 submitted 21 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI-2024

  27. arXiv:2312.12206  [pdf, other]

    cs.LG cs.AI stat.ME

    Identification of Causal Structure in the Presence of Missing Data with Additive Noise Model

    Authors: Jie Qiao, Zhengming Chen, Jianhua Yu, Ruichu Cai, Zhifeng Hao

    Abstract: Missing data are an unavoidable complication frequently encountered in many causal discovery tasks. When a missing process depends on the missing values themselves (known as self-masking missingness), the recovery of the joint distribution becomes unattainable, and detecting the presence of such self-masking missingness remains a perplexing challenge. Consequently, due to the inability to reconstr…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI-2024

  28. arXiv:2312.11934  [pdf, other]

    cs.LG cs.AI stat.ME

    Identification of Causal Structure with Latent Variables Based on Higher Order Cumulants

    Authors: Wei Chen, Zhiyi Huang, Ruichu Cai, Zhifeng Hao, Kun Zhang

    Abstract: Causal discovery with latent variables is a crucial but challenging task. Despite the emergence of numerous methods aimed at addressing this challenge, they are not fully identified to the structure that two observed variables are influenced by one latent variable and there might be a directed edge in between. Interestingly, we notice that this structure can be identified through the utilization o…

    Submitted 19 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI 2024

  29. arXiv:2312.02896  [pdf, other]

    cs.CV

    BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models

    Authors: Rizhao Cai, Zirui Song, Dayan Guan, Zhenhao Chen, Xing Luo, Chenyu Yi, Alex Kot

    Abstract: Large Multimodal Models (LMMs) such as GPT-4V and LLaVA have shown remarkable capabilities in visual reasoning with common image styles. However, their robustness against diverse style shifts, crucial for practical applications, remains largely unexplored. In this paper, we propose a new benchmark, BenchLMM, to assess the robustness of LMMs against three different styles: artistic image style, ima…

    Submitted 5 December, 2023; v1 submitted 5 December, 2023; originally announced December 2023.

    Comments: Code is available at https://github.com/AIFEG/BenchLMM

  30. arXiv:2311.07475   

    cs.CV

    Masked Face Dataset Generation and Masked Face Recognition

    Authors: Rui Cai, Xuying Ning, Peter N. Belhumeur

    Abstract: In the post-pandemic era, wearing face masks has posed a great challenge to ordinary face recognition. In a previous study, researchers applied pretrained VGG16 and ResNet50 to extract features on the elaborately curated existing masked face recognition (MFR) datasets, RMFRD and SMFRD. To make the model more adaptable to the real world situation where the sample size is smaller and the came…

    Submitted 25 December, 2023; v1 submitted 13 November, 2023; originally announced November 2023.

    Comments: This is not a conference paper and is just a technical report

  31. arXiv:2311.04837  [pdf, other]

    cs.LG cs.AI q-bio.QM

    Identifying Semantic Component for Robust Molecular Property Prediction

    Authors: Zijian Li, Zunhong Xu, Ruichu Cai, Zhenhui Yang, Yuguang Yan, Zhifeng Hao, Guangyi Chen, Kun Zhang

    Abstract: Although graph neural networks have achieved great success in the task of molecular property prediction in recent years, their generalization ability under out-of-distribution (OOD) settings is still under-explored. Different from existing methods that learn discriminative representations for prediction, we propose a generative model with semantic-components identifiability, named SCI. We demonstr…

    Submitted 8 November, 2023; originally announced November 2023.

  32. arXiv:2310.04723  [pdf, other]

    cs.LG stat.ML

    Subspace Identification for Multi-Source Domain Adaptation

    Authors: Zijian Li, Ruichu Cai, Guangyi Chen, Boyang Sun, Zhifeng Hao, Kun Zhang

    Abstract: Multi-source domain adaptation (MSDA) methods aim to transfer knowledge from multiple labeled source domains to an unlabeled target domain. Although current methods achieve target joint distribution identifiability by enforcing minimal changes across domains, they often necessitate stringent conditions, such as an adequate number of domains, monotonic transformation of latent variables, and invari…

    Submitted 14 December, 2023; v1 submitted 7 October, 2023; originally announced October 2023.

    Comments: NeurIPS2023 Spotlight

  33. arXiv:2309.11092  [pdf, other]

    cs.CV cs.MM

    Forgery-aware Adaptive Vision Transformer for Face Forgery Detection

    Authors: Anwei Luo, Rizhao Cai, Chenqi Kong, Xiangui Kang, Jiwu Huang, Alex C. Kot

    Abstract: With the advancement in face manipulation technologies, the importance of face forgery detection in protecting authentication integrity becomes increasingly evident. Previous Vision Transformer (ViT)-based detectors have demonstrated subpar performance in cross-database evaluations, primarily because fully fine-tuning with limited Deepfake data often leads to forgetting pre-trained knowledge and o…

    Submitted 20 September, 2023; originally announced September 2023.

  34. arXiv:2309.04038  [pdf, other]

    cs.CV

    S-Adapter: Generalizing Vision Transformer for Face Anti-Spoofing with Statistical Tokens

    Authors: Rizhao Cai, Zitong Yu, Chenqi Kong, Haoliang Li, Changsheng Chen, Yongjian Hu, Alex Kot

    Abstract: Face Anti-Spoofing (FAS) aims to detect malicious attempts to invade a face recognition system by presenting spoofed faces. State-of-the-art FAS techniques predominantly rely on deep learning models but their cross-domain generalization capabilities are often hindered by the domain shift problem, which arises due to different distributions between training and testing data. In this study, we devel…

    Submitted 19 June, 2024; v1 submitted 7 September, 2023; originally announced September 2023.

    Comments: Accepted by IEEE Transactions on Information Forensics and Security (June 2024)

  35. arXiv:2309.02420  [pdf, other]

    cs.CV

    Doppelgangers: Learning to Disambiguate Images of Similar Structures

    Authors: Ruojin Cai, Joseph Tung, Qianqian Wang, Hadar Averbuch-Elor, Bharath Hariharan, Noah Snavely

    Abstract: We consider the visual disambiguation task of determining whether a pair of visually similar images depict the same or distinct 3D surfaces (e.g., the same or opposite sides of a symmetric building). Illusory image matches, where two images observe distinct but visually similar 3D surfaces, can be challenging for humans to differentiate, and can also lead 3D reconstruction algorithms to produce er…

    Submitted 5 September, 2023; originally announced September 2023.

    Comments: Published in ICCV 2023 (Oral); Project page: http://doppelgangers-3d.github.io/

  36. arXiv:2308.10110  [pdf, other]

    cs.CV cs.AI cs.LG

    Robust Mixture-of-Expert Training for Convolutional Neural Networks

    Authors: Yihua Zhang, Ruisi Cai, Tianlong Chen, Guanhua Zhang, Huan Zhang, Pin-Yu Chen, Shiyu Chang, Zhangyang Wang, Sijia Liu

    Abstract: Sparsely-gated Mixture of Expert (MoE), an emerging deep model architecture, has demonstrated great promise to enable high-accuracy and ultra-efficient model inference. Despite the growing popularity of MoE, little work investigated its potential to advance convolutional neural networks (CNNs), especially in the plane of adversarial robustness. Since the lack of robustness has become one of the…

    Submitted 19 August, 2023; originally announced August 2023.

    Comments: ICCV 2023

  37. arXiv:2308.09107  [pdf, other]

    cs.CV

    Hyperbolic Face Anti-Spoofing

    Authors: Shuangpeng Han, Rizhao Cai, Yawen Cui, Zitong Yu, Yongjian Hu, Alex Kot

    Abstract: Learning generalized face anti-spoofing (FAS) models against presentation attacks is essential for the security of face recognition systems. Previous FAS methods usually encourage models to extract discriminative features, of which the distances within the same class (bonafide or attack) are pushed close while those between bonafide and attack are pulled away. However, these methods are designed b…

    Submitted 17 August, 2023; originally announced August 2023.

  38. arXiv:2308.06718  [pdf, other]

    cs.LG cs.AI stat.ME

    Generalized Independent Noise Condition for Estimating Causal Structure with Latent Variables

    Authors: Feng Xie, Biwei Huang, Zhengming Chen, Ruichu Cai, Clark Glymour, Zhi Geng, Kun Zhang

    Abstract: We investigate the task of learning causal structure in the presence of latent variables, including locating latent variables and determining their quantity, and identifying causal relationships among both latent and observed variables. To this end, we propose a Generalized Independent Noise (GIN) condition for linear non-Gaussian acyclic causal models that incorporate latent variables, which esta…

    Submitted 9 June, 2024; v1 submitted 13 August, 2023; originally announced August 2023.

  39. arXiv:2308.04011  [pdf, other]

    cs.LG stat.ME

    Generalization bound for estimating causal effects from observational network data

    Authors: Ruichu Cai, Zeqin Yang, Weilin Chen, Yuguang Yan, Zhifeng Hao

    Abstract: Estimating causal effects from observational network data is a significant but challenging problem. Existing works in causal inference for observational network data lack an analysis of the generalization bound, which can theoretically provide support for alleviating the complex confounding bias and practically guide the design of learning objectives in a principled manner. To fill this gap, we de…

    Submitted 7 August, 2023; originally announced August 2023.

  40. arXiv:2307.16405  [pdf, other]

    cs.LG stat.ME stat.ML

    Causal-learn: Causal Discovery in Python

    Authors: Yujia Zheng, Biwei Huang, Wei Chen, Joseph Ramsey, Mingming Gong, Ruichu Cai, Shohei Shimizu, Peter Spirtes, Kun Zhang

    Abstract: Causal discovery aims at revealing causal relations from observational data, which is a fundamental task in science and engineering. We describe causal-learn, an open-source Python library for causal discovery. This library focuses on bringing a comprehensive collection of causal discovery methods to both practitioners and researchers. It provides easy-to-use APIs for non-specialists, m…

    Submitted 31 July, 2023; originally announced July 2023.

    Journal ref: Journal of Machine Learning Research 25 (2024)

  41. arXiv:2307.13958  [pdf, other]

    cs.CV

    Visual Prompt Flexible-Modal Face Anti-Spoofing

    Authors: Zitong Yu, Rizhao Cai, Yawen Cui, Ajian Liu, Changsheng Chen

    Abstract: Recently, vision transformer based multimodal learning methods have been proposed to improve the robustness of face anti-spoofing (FAS) systems. However, multimodal face data collected from the real world is often imperfect due to missing modalities from various imaging sensors. Recently, flexible-modal FAS~\cite{yu2023flexible} has attracted more attention, which aims to develop a unified multimo…

    Submitted 26 July, 2023; originally announced July 2023.

    Comments: arXiv admin note: text overlap with arXiv:2303.03369 by other authors

  42. arXiv:2306.14114  [pdf, other]

    cs.LG cs.AI

    TNPAR: Topological Neural Poisson Auto-Regressive Model for Learning Granger Causal Structure from Event Sequences

    Authors: Yuequn Liu, Ruichu Cai, Wei Chen, Jie Qiao, Yuguang Yan, Zijian Li, Keli Zhang, Zhifeng Hao

    Abstract: Learning Granger causality from event sequences is a challenging but essential task across various applications. Most existing methods rely on the assumption that event sequences are independent and identically distributed (i.i.d.). However, this i.i.d. assumption is often violated due to the inherent dependencies among the event sequences. Fortunately, in practice, we find these dependencies can…

    Submitted 12 March, 2024; v1 submitted 24 June, 2023; originally announced June 2023.

  43. arXiv:2306.14048  [pdf, other]

    cs.LG

    H$_2$O: Heavy-Hitter Oracle for Efficient Generative Inference of Large Language Models

    Authors: Zhenyu Zhang, Ying Sheng, Tianyi Zhou, Tianlong Chen, Lianmin Zheng, Ruisi Cai, Zhao Song, Yuandong Tian, Christopher Ré, Clark Barrett, Zhangyang Wang, Beidi Chen

    Abstract: Large Language Models (LLMs), despite their recent impressive accomplishments, are notably cost-prohibitive to deploy, particularly for applications involving long-content generation, such as dialogue systems and story writing. Often, a large amount of transient state information, referred to as the KV cache, is stored in GPU memory in addition to model parameters, scaling linearly with the sequen…

    Submitted 18 December, 2023; v1 submitted 24 June, 2023; originally announced June 2023.

  44. arXiv:2306.10125  [pdf, other]

    cs.LG cs.AI eess.SP stat.AP

    Self-Supervised Learning for Time Series Analysis: Taxonomy, Progress, and Prospects

    Authors: Kexin Zhang, Qingsong Wen, Chaoli Zhang, Rongyao Cai, Ming Jin, Yong Liu, James Zhang, Yuxuan Liang, Guansong Pang, Dongjin Song, Shirui Pan

    Abstract: Self-supervised learning (SSL) has recently achieved impressive performance on various time series tasks. The most prominent advantage of SSL is that it reduces the dependence on labeled data. Based on the pre-training and fine-tuning strategy, even a small amount of labeled data can achieve high performance. Compared with many published self-supervised surveys on computer vision and natural langu…

    Submitted 8 April, 2024; v1 submitted 16 June, 2023; originally announced June 2023.

    Comments: Accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI); 26 pages, 200+ references; the first work to comprehensively and systematically summarize self-supervised learning for time series analysis (SSL4TS). The GitHub repository is https://github.com/qingsongedu/Awesome-SSL4TS

  45. arXiv:2306.07970  [pdf, other]

    cs.CV

    Neural Scene Chronology

    Authors: Haotong Lin, Qianqian Wang, Ruojin Cai, Sida Peng, Hadar Averbuch-Elor, Xiaowei Zhou, Noah Snavely

    Abstract: In this work, we aim to reconstruct a time-varying 3D model, capable of rendering photo-realistic renderings with independent control of viewpoint, illumination, and time, from Internet photos of large-scale landmarks. The core challenges are twofold. First, different types of temporal changes, such as illumination and changes to the underlying scene itself (such as replacing one graffiti artwork…

    Submitted 13 June, 2023; originally announced June 2023.

    Comments: CVPR 2023; Project page: https://zju3dv.github.io/neusc/

  46. arXiv:2306.05422  [pdf, other]

    cs.CV

    Tracking Everything Everywhere All at Once

    Authors: Qianqian Wang, Yen-Yu Chang, Ruojin Cai, Zhengqi Li, Bharath Hariharan, Aleksander Holynski, Noah Snavely

    Abstract: We present a new test-time optimization method for estimating dense and long-range motion from a video sequence. Prior optical flow or particle video tracking algorithms typically operate within limited temporal windows, struggling to track through occlusions and maintain global consistency of estimated motion trajectories. We propose a complete and globally consistent motion representation, dubbe…

    Submitted 12 September, 2023; v1 submitted 8 June, 2023; originally announced June 2023.

    Comments: ICCV 2023

  47. arXiv:2305.19582  [pdf, ps, other]

    cs.LG cs.AI stat.ME

    Causal Discovery with Latent Confounders Based on Higher-Order Cumulants

    Authors: Ruichu Cai, Zhiyi Huang, Wei Chen, Zhifeng Hao, Kun Zhang

    Abstract: Causal discovery with latent confounders is an important but challenging task in many scientific areas. Despite the success of some overcomplete independent component analysis (OICA) based methods in certain domains, they are computationally expensive and can easily get stuck into local optima. We notice that interestingly, by making use of higher-order cumulants, there exists a closed-form soluti…

    Submitted 31 May, 2023; originally announced May 2023.

    Comments: Accepted by ICML 2023

  48. arXiv:2305.05986  [pdf, other]

    cs.LG cs.AI stat.ME

    Structural Hawkes Processes for Learning Causal Structure from Discrete-Time Event Sequences

    Authors: Jie Qiao, Ruichu Cai, Siyu Wu, Yu Xiang, Keli Zhang, Zhifeng Hao

    Abstract: Learning causal structure among event types from discrete-time event sequences is a particularly important but challenging task. Existing methods, such as the multivariate Hawkes processes based methods, mostly boil down to learning the so-called Granger causality which assumes that the cause event happens strictly prior to its effect event. Such an assumption is often untenable beyond application…

    Submitted 10 May, 2023; originally announced May 2023.

    Comments: Accepted by IJCAI 2023

  49. arXiv:2303.09914  [pdf, other]

    cs.CV

    Rehearsal-Free Domain Continual Face Anti-Spoofing: Generalize More and Forget Less

    Authors: Rizhao Cai, Yawen Cui, Zhi Li, Zitong Yu, Haoliang Li, Yongjian Hu, Alex Kot

    Abstract: Face Anti-Spoofing (FAS) is recently studied under the continual learning setting, where the FAS models are expected to evolve after encountering the data from new domains. However, existing methods need extra replay buffers to store previous data for rehearsal, which becomes infeasible when previous data is unavailable because of privacy issues. In this paper, we propose the first rehearsal-free…

    Submitted 16 March, 2023; originally announced March 2023.

  50. arXiv:2302.12480  [pdf, other]

    cs.LG

    Robust Weight Signatures: Gaining Robustness as Easy as Patching Weights?

    Authors: Ruisi Cai, Zhenyu Zhang, Zhangyang Wang

    Abstract: Given a robust model trained to be resilient to one or multiple types of distribution shifts (e.g., natural image corruptions), how is that "robustness" encoded in the model weights, and how easily can it be disentangled and/or "zero-shot" transferred to some other models? This paper empirically suggests a surprisingly simple answer: linearly - by straightforward model weight arithmetic! We start…

    Submitted 24 February, 2023; originally announced February 2023.