Showing 1–50 of 134 results for author: Tan, R

  1. arXiv:2407.07510  [pdf, other]

    cs.CR cs.CV eess.SY

    Invisible Optical Adversarial Stripes on Traffic Sign against Autonomous Vehicles

    Authors: Dongfang Guo, Yuting Wu, Yimin Dai, Pengfei Zhou, Xin Lou, Rui Tan

    Abstract: Camera-based computer vision is essential to autonomous vehicle's perception. This paper presents an attack that uses light-emitting diodes and exploits the camera's rolling shutter effect to create adversarial stripes in the captured images to mislead traffic sign recognition. The attack is stealthy because the stripes on the traffic sign are invisible to humans. For the attack to be threatening,…

    Submitted 10 July, 2024; originally announced July 2024.

    Journal ref: In Proceedings of the 22nd Annual International Conference on Mobile Systems, Applications and Services (MobiSys 2024), 534-546

  2. arXiv:2407.05870  [pdf]

    cs.SD cs.HC eess.AS

    Cervical Auscultation Machine Learning for Dysphagia Assessment

    Authors: An An Chia, Stacy Lum, Michelle Boo, Rex Tan, Balamurali B T, Jer-Ming Chen

    Abstract: This study evaluates the use of machine learning, specifically the Random Forest Classifier, to differentiate normal and pathological swallowing sounds. Employing a commercially available wearable stethoscope, we recorded swallows from both healthy adults and patients with dysphagia. The analysis revealed statistically significant differences in acoustic features, such as spectral crest, and zero-…

    Submitted 8 July, 2024; originally announced July 2024.

    Comments: International Conference on Signal Processing and Communications (SPCOM) July 01 - 04, 2024

  3. arXiv:2406.16671  [pdf, other]

    cs.RO

    STAR: Swarm Technology for Aerial Robotics Research

    Authors: Jimmy Chiun, Yan Rui Tan, Yuhong Cao, John Tan, Guillaume Sartoretti

    Abstract: In recent years, the field of aerial robotics has witnessed significant progress, finding applications in diverse domains, including post-disaster search and rescue operations. Despite these strides, the prohibitive acquisition costs associated with deploying physical multi-UAV systems have posed challenges, impeding their widespread utilization in research endeavors. To overcome these challenges,…

    Submitted 24 June, 2024; originally announced June 2024.

  4. arXiv:2406.11707  [pdf, other]

    cs.CR cs.CV cs.LG

    A First Physical-World Trajectory Prediction Attack via LiDAR-induced Deceptions in Autonomous Driving

    Authors: Yang Lou, Yi Zhu, Qun Song, Rui Tan, Chunming Qiao, Wei-Bin Lee, Jianping Wang

    Abstract: Trajectory prediction forecasts nearby agents' moves based on their historical trajectories. Accurate trajectory prediction is crucial for autonomous vehicles. Existing attacks compromise the prediction model of a victim AV by directly manipulating the historical trajectory of an attacker AV, which has limited real-world applicability. This paper, for the first time, explores an indirect attack ap…

    Submitted 17 June, 2024; originally announced June 2024.

    Comments: In Proceedings of the 33rd USENIX Security Symposium 2024

  5. arXiv:2406.03722  [pdf, other]

    cs.LG cs.AI cs.NE

    Offline Multi-Objective Optimization

    Authors: Ke Xue, Rong-Xi Tan, Xiaobin Huang, Chao Qian

    Abstract: Offline optimization aims to maximize a black-box objective function with a static dataset and has wide applications. In addition to the objective function being black-box and expensive to evaluate, numerous complex real-world problems entail optimizing multiple conflicting objectives, i.e., multi-objective optimization (MOO). Nevertheless, offline MOO has not progressed as much as offline single-…

    Submitted 5 June, 2024; originally announced June 2024.

    Comments: ICML 2024

  6. arXiv:2405.14646  [pdf, other]

    cs.CL

    Unveiling the Achilles' Heel of NLG Evaluators: A Unified Adversarial Framework Driven by Large Language Models

    Authors: Yiming Chen, Chen Zhang, Danqing Luo, Luis Fernando D'Haro, Robby T. Tan, Haizhou Li

    Abstract: The automatic evaluation of natural language generation (NLG) systems presents a long-lasting challenge. Recent studies have highlighted various neural metrics that align well with human evaluations. Yet, the robustness of these evaluators against adversarial perturbations remains largely under-explored due to the unique challenges in obtaining adversarial data for different NLG evaluation tasks.…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: ACL24 Findings

  7. arXiv:2404.16885  [pdf]

    cs.CV cs.AI cs.CY cs.LG

    Adapting an Artificial Intelligence Sexually Transmitted Diseases Symptom Checker Tool for Mpox Detection: The HeHealth Experience

    Authors: Rayner Kay Jin Tan, Dilruk Perera, Salomi Arasaratnam, Yudara Kularathne

    Abstract: Artificial Intelligence applications have shown promise in the management of pandemics and have been widely used to assist the identification, classification, and diagnosis of medical images. In response to the global outbreak of Monkeypox (Mpox), the HeHealth.ai team leveraged an existing tool to screen for sexually transmitted diseases to develop a digital screening test for symptomatic Mpox thr…

    Submitted 23 April, 2024; originally announced April 2024.

    Comments: 15 pages, 4 figures

  8. arXiv:2404.04346  [pdf, other]

    cs.CV

    Koala: Key frame-conditioned long video-LLM

    Authors: Reuben Tan, Ximeng Sun, Ping Hu, Jui-hsien Wang, Hanieh Deilamsalehy, Bryan A. Plummer, Bryan Russell, Kate Saenko

    Abstract: Long video question answering is a challenging task that involves recognizing short-term activities and reasoning about their fine-grained relationships. State-of-the-art video Large Language Models (vLLMs) hold promise as a viable solution due to their demonstrated emergent capabilities on new tasks. However, despite being trained on millions of short seconds-long videos, vLLMs are unable to unde…

    Submitted 3 May, 2024; v1 submitted 5 April, 2024; originally announced April 2024.

    Comments: Accepted at CVPR 2024 as a poster highlight

  9. arXiv:2404.01958  [pdf, other]

    cs.LG

    MESEN: Exploit Multimodal Data to Design Unimodal Human Activity Recognition with Few Labels

    Authors: Lilin Xu, Chaojie Gu, Rui Tan, Shibo He, Jiming Chen

    Abstract: Human activity recognition (HAR) will be an essential function of various emerging applications. However, HAR typically encounters challenges related to modality limitations and label scarcity, leading to an application gap between current solutions and real-world requirements. In this work, we propose MESEN, a multimodal-empowered unimodal sensing framework, to utilize unlabeled multimodal data a…

    Submitted 2 April, 2024; originally announced April 2024.

    Comments: Accepted to the 21st ACM Conference on Embedded Networked Sensor Systems (SenSys 2023)

  10. arXiv:2403.19278  [pdf, other]

    cs.CV

    CAT: Exploiting Inter-Class Dynamics for Domain Adaptive Object Detection

    Authors: Mikhail Kennerley, Jian-Gang Wang, Bharadwaj Veeravalli, Robby T. Tan

    Abstract: Domain adaptive object detection aims to adapt detection models to domains where annotated data is unavailable. Existing methods have been proposed to address the domain gap using the semi-supervised student-teacher framework. However, a fundamental issue arises from the class imbalance in the labelled training set, which can result in inaccurate pseudo-labels. The relationship between classes, es…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: Accepted into CVPR 2024

  11. arXiv:2403.12529  [pdf, other]

    cs.LG

    Contextualized Messages Boost Graph Representations

    Authors: Brian Godwin Lim, Galvin Brice Lim, Renzo Roel Tan, Kazushi Ikeda

    Abstract: Graph neural networks (GNNs) have gained significant attention in recent years for their ability to process data that may be represented as graphs. This success has prompted several studies to explore the representational capability of GNNs based on the graph isomorphism task. These works inherently assume a countable node feature representation, potentially limiting their applicability. Interesti…

    Submitted 22 May, 2024; v1 submitted 19 March, 2024; originally announced March 2024.

  12. arXiv:2403.07408  [pdf, other]

    cs.CV

    NightHaze: Nighttime Image Dehazing via Self-Prior Learning

    Authors: Beibei Lin, Yeying Jin, Wending Yan, Wei Ye, Yuan Yuan, Robby T. Tan

    Abstract: Masked autoencoder (MAE) shows that severe augmentation during training produces robust representations for high-level tasks. This paper brings the MAE-like framework to nighttime image enhancement, demonstrating that severe augmentation during training produces strong network priors that are resilient to real-world night haze degradations. We propose a novel nighttime image dehazing method with s…

    Submitted 12 March, 2024; originally announced March 2024.

  13. arXiv:2403.06485  [pdf, other]

    cs.SE cs.CL cs.LG

    Knowledge-aware Alert Aggregation in Large-scale Cloud Systems: a Hybrid Approach

    Authors: Jinxi Kuang, Jinyang Liu, Junjie Huang, Renyi Zhong, Jiazhen Gu, Lan Yu, Rui Tan, Zengyin Yang, Michael R. Lyu

    Abstract: Due to the scale and complexity of cloud systems, a system failure would trigger an "alert storm", i.e., massive correlated alerts. Although these alerts can be traced back to a few root causes, the overwhelming number makes it infeasible for manual handling. Alert aggregation is thus critical to help engineers concentrate on the root cause and facilitate failure resolution. Existing methods typic…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: Accepted by Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Practice (ICSE SEIP 2024)

  14. arXiv:2402.18600  [pdf]

    eess.IV cs.AI q-bio.TO

    Artificial Intelligence and Diabetes Mellitus: An Inside Look Through the Retina

    Authors: Yasin Sadeghi Bazargani, Majid Mirzaei, Navid Sobhi, Mirsaeed Abdollahi, Ali Jafarizadeh, Siamak Pedrammehr, Roohallah Alizadehsani, Ru San Tan, Sheikh Mohammed Shariful Islam, U. Rajendra Acharya

    Abstract: Diabetes mellitus (DM) predisposes patients to vascular complications. Retinal images and vasculature reflect the body's micro- and macrovascular health. They can be used to diagnose DM complications, including diabetic retinopathy (DR), neuropathy, nephropathy, and atherosclerotic cardiovascular disease, as well as forecast the risk of cardiovascular events. Artificial intelligence (AI)-enabled s…

    Submitted 27 February, 2024; originally announced February 2024.

    Comments: 44 Pages, 6 figures, 1 table, 166 references

    ACM Class: J.3.2; J.3.3

  15. arXiv:2402.09975  [pdf]

    eess.IV cs.CV

    Current and future roles of artificial intelligence in retinopathy of prematurity

    Authors: Ali Jafarizadeh, Shadi Farabi Maleki, Parnia Pouya, Navid Sobhi, Mirsaeed Abdollahi, Siamak Pedrammehr, Chee Peng Lim, Houshyar Asadi, Roohallah Alizadehsani, Ru-San Tan, Sheikh Mohammad Shariful Islam, U. Rajendra Acharya

    Abstract: Retinopathy of prematurity (ROP) is a severe condition affecting premature infants, leading to abnormal retinal blood vessel growth, retinal detachment, and potential blindness. While semi-automated systems have been used in the past to diagnose ROP-related plus disease by quantifying retinal vessel features, traditional machine learning (ML) models face challenges like accuracy and overfitting. R…

    Submitted 15 February, 2024; originally announced February 2024.

    Comments: 28 pages, 8 figures, 2 tables, 235 references, 1 supplementary table

    ACM Class: J.3.2; J.3.3

  16. arXiv:2402.03753  [pdf, other]

    cs.LG physics.comp-ph

    Enhanced sampling of robust molecular datasets with uncertainty-based collective variables

    Authors: Aik Rui Tan, Johannes C. B. Dietschreit, Rafael Gomez-Bombarelli

    Abstract: Generating a data set that is representative of the accessible configuration space of a molecular system is crucial for the robustness of machine learned interatomic potentials (MLIP). However, the complexity of molecular systems, characterized by intricate potential energy surfaces (PESs) with numerous local minima and energy barriers, presents a significant challenge. Traditional methods of data…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: 13 pages, 4 figures, 10 pages of Supplementary Information

  17. arXiv:2401.11150  [pdf, other]

    cs.CV

    Simultaneous Gesture Classification and Localization with an Automatic Gesture Annotation Model

    Authors: Junxiao Shen, Xuhai Xu, Ran Tan, Amy Karlson, Evan Strasnick

    Abstract: Training a real-time gesture recognition model heavily relies on annotated data. However, manual data annotation is costly and demands substantial human effort. In order to address this challenge, we propose a novel annotation model that can automatically annotate gesture classes and identify their temporal ranges. Our ablation study demonstrates that our annotation model design surpasses the base…

    Submitted 20 January, 2024; originally announced January 2024.

  18. arXiv:2401.11144  [pdf, other]

    cs.CV

    Towards Open-World Gesture Recognition

    Authors: Junxiao Shen, Matthias De Lange, Xuhai "Orson" Xu, Enmin Zhou, Ran Tan, Naveen Suda, Maciej Lazarewicz, Per Ola Kristensson, Amy Karlson, Evan Strasnick

    Abstract: Static machine learning methods in gesture recognition assume that training and test data come from the same underlying distribution. However, in real-world applications involving gesture recognition on wrist-worn devices, data distribution may change over time. We formulate this problem of adapting recognition models to new tasks, where new data patterns emerge, as open-world gesture recognition…

    Submitted 20 January, 2024; originally announced January 2024.

  19. arXiv:2401.08732  [pdf, other]

    cs.LG cs.CV cs.IT

    Bayes Conditional Distribution Estimation for Knowledge Distillation Based on Conditional Mutual Information

    Authors: Linfeng Ye, Shayan Mohajer Hamidi, Renhao Tan, En-Hui Yang

    Abstract: It is believed that in knowledge distillation (KD), the role of the teacher is to provide an estimate for the unknown Bayes conditional probability distribution (BCPD) to be used in the student training process. Conventionally, this estimate is obtained by training the teacher using maximum log-likelihood (MLL) method. To improve this estimate for KD, in this paper we introduce the concept of cond…

    Submitted 7 March, 2024; v1 submitted 16 January, 2024; originally announced January 2024.

    Comments: 32 pages, 19 figures, Published as a conference paper at ICLR 2024

    MSC Class: 68T30 ACM Class: I.2.6

    Journal ref: International Conference on Learning Representations 2024 (ICLR)

  20. Semantic Segmentation in Multiple Adverse Weather Conditions with Domain Knowledge Retention

    Authors: Xin Yang, Wending Yan, Yuan Yuan, Michael Bi Mi, Robby T. Tan

    Abstract: Semantic segmentation's performance is often compromised when applied to unlabeled adverse weather conditions. Unsupervised domain adaptation is a potential approach to enhancing the model's adaptability and robustness to adverse weather. However, existing methods encounter difficulties when sequentially adapting the model to multiple unlabeled adverse weather conditions. They struggle to acquire…

    Submitted 14 January, 2024; originally announced January 2024.

  21. arXiv:2401.00729  [pdf, other]

    cs.CV

    NightRain: Nighttime Video Deraining via Adaptive-Rain-Removal and Adaptive-Correction

    Authors: Beibei Lin, Yeying Jin, Wending Yan, Wei Ye, Yuan Yuan, Shunli Zhang, Robby Tan

    Abstract: Existing deep-learning-based methods for nighttime video deraining rely on synthetic data due to the absence of real-world paired data. However, the intricacies of the real world, particularly with the presence of light effects and low-light regions affected by noise, create significant domain gaps, hampering synthetic-trained models in removing rain streaks properly and leading to over-saturation…

    Submitted 10 January, 2024; v1 submitted 1 January, 2024; originally announced January 2024.

    Comments: Accepted by AAAI24

  22. arXiv:2312.17492  [pdf, other]

    cs.CV

    HEAP: Unsupervised Object Discovery and Localization with Contrastive Grouping

    Authors: Xin Zhang, Jinheng Xie, Yuan Yuan, Michael Bi Mi, Robby T. Tan

    Abstract: Unsupervised object discovery and localization aims to detect or segment objects in an image without any supervision. Recent efforts have demonstrated a notable potential to identify salient foreground objects by utilizing self-supervised transformer features. However, their scopes only build upon patch-level features within an image, neglecting region/image-level and cross-image relationships at…

    Submitted 4 January, 2024; v1 submitted 29 December, 2023; originally announced December 2023.

    Comments: Accepted by AAAI24

  23. arXiv:2312.10586  [pdf, other]

    cs.CV

    Few-Shot Learning from Augmented Label-Uncertain Queries in Bongard-HOI

    Authors: Qinqian Lei, Bo Wang, Robby T. Tan

    Abstract: Detecting human-object interactions (HOI) in a few-shot setting remains a challenge. Existing meta-learning methods struggle to extract representative features for classification due to the limited data, while existing few-shot HOI models rely on HOI text labels for classification. Moreover, some query images may display visual similarity to those outside their class, such as similar backgrounds b…

    Submitted 16 December, 2023; originally announced December 2023.

    Comments: 9 pages, 4 figures

  24. arXiv:2311.12698  [pdf, other]

    cs.DS

    Informative Path Planning with Limited Adaptivity

    Authors: Rayen Tan, Rohan Ghuge, Viswanath Nagarajan

    Abstract: We consider the informative path planning ($\mathtt{IPP}$) problem in which a robot interacts with an uncertain environment and gathers information by visiting locations. The goal is to minimize its expected travel cost to cover a given submodular function. Adaptive solutions, where the robot incorporates all available information to select the next location to visit, achieve the best objective. H…

    Submitted 21 November, 2023; originally announced November 2023.

    Comments: 35 pages, 9 figures

  25. arXiv:2311.07609  [pdf]

    q-bio.QM cs.CV eess.IV physics.med-ph

    Artificial Intelligence in Assessing Cardiovascular Diseases and Risk Factors via Retinal Fundus Images: A Review of the Last Decade

    Authors: Mirsaeed Abdollahi, Ali Jafarizadeh, Amirhosein Ghafouri Asbagh, Navid Sobhi, Keysan Pourmoghtader, Siamak Pedrammehr, Houshyar Asadi, Roohallah Alizadehsani, Ru-San Tan, U. Rajendra Acharya

    Abstract: Background: Cardiovascular diseases (CVDs) are the leading cause of death globally. The use of artificial intelligence (AI) methods - in particular, deep learning (DL) - has been on the rise lately for the analysis of different CVD-related topics. The use of fundus images and optical coherence tomography angiography (OCTA) in the diagnosis of retinal diseases has also been extensively studied. To…

    Submitted 28 April, 2024; v1 submitted 11 November, 2023; originally announced November 2023.

    Comments: 41 pages, 5 figures, 3 tables, 114 references

    ACM Class: J.3.2; J.3.3

  26. arXiv:2311.01454  [pdf, other]

    cs.RO cs.AI

    NOIR: Neural Signal Operated Intelligent Robots for Everyday Activities

    Authors: Ruohan Zhang, Sharon Lee, Minjune Hwang, Ayano Hiranaka, Chen Wang, Wensi Ai, Jin Jie Ryan Tan, Shreya Gupta, Yilun Hao, Gabrael Levine, Ruohan Gao, Anthony Norcia, Li Fei-Fei, Jiajun Wu

    Abstract: We present Neural Signal Operated Intelligent Robots (NOIR), a general-purpose, intelligent brain-robot interface system that enables humans to command robots to perform everyday activities through brain signals. Through this interface, humans communicate their intended objects of interest and actions to the robots using electroencephalography (EEG). Our novel system demonstrates success in an exp…

    Submitted 2 November, 2023; originally announced November 2023.

  27. arXiv:2310.13016  [pdf]

    cs.OH cs.AI

    Solving the multiplication problem of a large language model system using a graph-based method

    Authors: Turker Tuncer, Sengul Dogan, Mehmet Baygin, Prabal Datta Barua, Abdul Hafeez-Baig, Ru-San Tan, Subrata Chakraborty, U. Rajendra Acharya

    Abstract: The generative pre-trained transformer (GPT)-based chatbot software ChatGPT possesses excellent natural language processing capabilities but is inadequate for solving arithmetic problems, especially multiplication. Its GPT structure uses a computational graph for multiplication, which has limited accuracy beyond simple multiplication operations. We developed a graph-based multiplication algorithm…

    Submitted 18 October, 2023; originally announced October 2023.

    Comments: 9 pages, 3 figures

  28. arXiv:2310.05183  [pdf, other]

    cs.CE

    ChiMera: Learning with noisy labels by contrasting mixed-up augmentations

    Authors: Zixuan Liu, Xin Zhang, Junjun He, Dan Fu, Dimitris Samaras, Robby Tan, Xiao Wang, Sheng Wang

    Abstract: Learning with noisy labels has been studied to address incorrect label annotations in real-world applications. In this paper, we present ChiMera, a two-stage learning-from-noisy-labels framework based on semi-supervised learning, developed based on a novel contrastive learning technique MixCLR. The key idea of MixCLR is to learn and refine the representations of mixed augmentations from two differ…

    Submitted 8 October, 2023; originally announced October 2023.

  29. arXiv:2309.13294  [pdf, other]

    cs.CV

    MP-MVS: Multi-Scale Windows PatchMatch and Planar Prior Multi-View Stereo

    Authors: Rongxuan Tan, Qing Wang, Xueyan Wang, Chao Yan, Yang Sun, Youyang Feng

    Abstract: Significant strides have been made in enhancing the accuracy of Multi-View Stereo (MVS)-based 3D reconstruction. However, untextured areas with unstable photometric consistency often remain incompletely reconstructed. In this paper, we propose a resilient and effective multi-view stereo approach (MP-MVS). We design a multi-scale windows PatchMatch (mPM) to obtain reliable depth of untextured areas…

    Submitted 23 September, 2023; originally announced September 2023.

  30. arXiv:2309.12183  [pdf, other]

    cs.CV

    ORTexME: Occlusion-Robust Human Shape and Pose via Temporal Average Texture and Mesh Encoding

    Authors: Yu Cheng, Bo Wang, Robby T. Tan

    Abstract: In 3D human shape and pose estimation from a monocular video, models trained with limited labeled data cannot generalize well to videos with occlusion, which is common in in-the-wild videos. The recent human neural rendering approaches focusing on novel view synthesis initialized by the off-the-shelf human shape and pose methods have the potential to correct the initial human shape. However, the exis…

    Submitted 21 September, 2023; originally announced September 2023.

    Comments: 8 pages, 8 figures

  31. arXiv:2309.09123  [pdf, other]

    cs.LG cs.AI

    Conditional Mutual Information Constrained Deep Learning for Classification

    Authors: En-Hui Yang, Shayan Mohajer Hamidi, Linfeng Ye, Renhao Tan, Beverly Yang

    Abstract: The concepts of conditional mutual information (CMI) and normalized conditional mutual information (NCMI) are introduced to measure the concentration and separation performance of a classification deep neural network (DNN) in the output probability distribution space of the DNN, where CMI and the ratio between CMI and NCMI represent the intra-class concentration and inter-class separation of the D…

    Submitted 16 September, 2023; originally announced September 2023.

  32. arXiv:2308.16741  [pdf, other]

    cs.AI cs.CV

    Socratis: Are large multimodal models emotionally aware?

    Authors: Katherine Deng, Arijit Ray, Reuben Tan, Saadia Gabriel, Bryan A. Plummer, Kate Saenko

    Abstract: Existing emotion prediction benchmarks contain coarse emotion labels which do not consider the diversity of emotions that an image and text can elicit in humans due to various reasons. Learning diverse reactions to multimodal content is important as intelligent machines take a central role in generating and delivering content to society. To address this gap, we propose Socratis, a societal reactio…

    Submitted 2 November, 2023; v1 submitted 31 August, 2023; originally announced August 2023.

    Comments: ICCV 2023 WECIA

  33. arXiv:2308.08942  [pdf, other]

    cs.CV

    Auxiliary Tasks Benefit 3D Skeleton-based Human Motion Prediction

    Authors: Chenxin Xu, Robby T. Tan, Yuhong Tan, Siheng Chen, Xinchao Wang, Yanfeng Wang

    Abstract: Exploring spatial-temporal dependencies from observed motions is one of the core challenges of human motion prediction. Previous methods mainly focus on dedicated network structures to model the spatial and temporal dependencies. This paper considers a new direction by introducing a model learning framework with auxiliary tasks. In our auxiliary tasks, partial body joints' coordinates are corrupte…

    Submitted 2 September, 2023; v1 submitted 17 August, 2023; originally announced August 2023.

    Comments: Accepted to ICCV2023

  34. arXiv:2308.01738  [pdf, other]

    cs.CV

    Enhancing Visibility in Nighttime Haze Images Using Guided APSF and Gradient Adaptive Convolution

    Authors: Yeying Jin, Beibei Lin, Wending Yan, Yuan Yuan, Wei Ye, Robby T. Tan

    Abstract: Visibility in hazy nighttime scenes is frequently reduced by multiple factors, including low light, intense glow, light scattering, and the presence of multicolored light sources. Existing nighttime dehazing methods often struggle with handling glow or low-light conditions, resulting in either excessively dark visuals or unsuppressed glow outputs. In this paper, we enhance the visibility from a si…

    Submitted 21 January, 2024; v1 submitted 3 August, 2023; originally announced August 2023.

    Comments: Accepted to ACM'MM2023, https://github.com/jinyeying/nighttime_dehaze

    Journal ref: Published in ACM'MM2023

  35. Uncertainty-Encoded Multi-Modal Fusion for Robust Object Detection in Autonomous Driving

    Authors: Yang Lou, Qun Song, Qian Xu, Rui Tan, Jianping Wang

    Abstract: Multi-modal fusion has shown initial promising results for object detection of autonomous driving perception. However, many existing fusion schemes do not consider the quality of each fusion input and may suffer from adverse conditions on one or more sensors. While predictive uncertainty has been applied to characterize single-modal object detection performance at run time, incorporating uncertain…

    Submitted 30 July, 2023; originally announced July 2023.

    Comments: In proceedings of the 26th European Conference on Artificial Intelligence ECAI 2023. 8 pages + 2 appendix pages

  36. arXiv:2307.12854  [pdf, other]

    cs.CV

    Multiscale Video Pretraining for Long-Term Activity Forecasting

    Authors: Reuben Tan, Matthias De Lange, Michael Iuzzolino, Bryan A. Plummer, Kate Saenko, Karl Ridgeway, Lorenzo Torresani

    Abstract: Long-term activity forecasting is an especially challenging research problem because it requires understanding the temporal relationships between observed actions, as well as the variability and complexity of human activities. Despite relying on strong supervision via expensive human annotations, state-of-the-art forecasting approaches often generalize poorly to unseen data. To alleviate this issu…

    Submitted 24 July, 2023; originally announced July 2023.

  37. arXiv:2307.05784  [pdf, other]

    cs.CV cs.AI

    EgoAdapt: A multi-stream evaluation study of adaptation to real-world egocentric user video

    Authors: Matthias De Lange, Hamid Eghbalzadeh, Reuben Tan, Michael Iuzzolino, Franziska Meier, Karl Ridgeway

    Abstract: In egocentric action recognition a single population model is typically trained and subsequently embodied on a head-mounted device, such as an augmented reality headset. While this model remains static for new users and environments, we introduce an adaptive paradigm of two phases, where after pretraining a population model, the model adapts on-device and online to the user's experience. This sett…

    Submitted 11 July, 2023; originally announced July 2023.

    Comments: Preprint

  38. arXiv:2305.12228  [pdf, other]

    cs.CL

    Dynamic Transformers Provide a False Sense of Efficiency

    Authors: Yiming Chen, Simin Chen, Zexin Li, Wei Yang, Cong Liu, Robby T. Tan, Haizhou Li

    Abstract: Despite much success in natural language processing (NLP), pre-trained language models typically lead to a high computational cost during inference. Multi-exit is a mainstream approach to address this issue by making a trade-off between efficiency and accuracy, where the saving of computation comes from an early exit. However, whether such saving from early-exiting is robust remains unknown. Motiv…

    Submitted 20 May, 2023; originally announced May 2023.

    Comments: Accepted by ACL2023

  39. arXiv:2305.11522  [pdf, other]

    cs.CV

    DSFNet: Dual Space Fusion Network for Occlusion-Robust 3D Dense Face Alignment

    Authors: Heyuan Li, Bo Wang, Yu Cheng, Mohan Kankanhalli, Robby T. Tan

    Abstract: Sensitivity to severe occlusion and large view angles limits the usage scenarios of the existing monocular 3D dense face alignment methods. The state-of-the-art 3DMM-based method directly regresses the model's coefficients, underutilizing the low-level 2D spatial and semantic information, which can actually offer cues for face shape and orientation. In this work, we demonstrate how modeling 3D fa…

    Submitted 19 May, 2023; originally announced May 2023.

    Comments: Accepted into CVPR'23

  40. arXiv:2305.01754  [pdf, other]

    cs.LG physics.chem-ph

    Single-model uncertainty quantification in neural network potentials does not consistently outperform model ensembles

    Authors: Aik Rui Tan, Shingo Urata, Samuel Goldman, Johannes C. B. Dietschreit, Rafael Gómez-Bombarelli

    Abstract: Neural networks (NNs) often assign high confidence to their predictions, even for points far out-of-distribution, making uncertainty quantification (UQ) a challenge. When they are employed to model interatomic potentials in materials systems, this problem leads to unphysical structures that disrupt simulations, or to biased statistics and dynamics that do not reflect the true physics. Differentiab…

    Submitted 2 May, 2023; originally announced May 2023.

    Comments: 27 pages, 4 figures, Supporting Information (22 pages)

  41. arXiv:2303.17480  [pdf, other]

    cs.CV cs.AI eess.IV

    Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert

    Authors: Jiadong Wang, Xinyuan Qian, Malu Zhang, Robby T. Tan, Haizhou Li

    Abstract: Talking face generation, also known as speech-to-lip generation, reconstructs facial motions concerning lips given coherent speech input. The previous studies revealed the importance of lip-speech synchronization and visual quality. Despite much progress, they hardly focus on the content of lip movements i.e., the visual intelligibility of the spoken words, which is an important aspect of generati…

    Submitted 29 March, 2023; originally announced March 2023.

    Comments: accepted by CVPR 2023

  42. arXiv:2303.16342  [pdf, other]

    cs.CV cs.AI cs.CL

    Language-Guided Audio-Visual Source Separation via Trimodal Consistency

    Authors: Reuben Tan, Arijit Ray, Andrea Burns, Bryan A. Plummer, Justin Salamon, Oriol Nieto, Bryan Russell, Kate Saenko

    Abstract: We propose a self-supervised approach for learning to perform audio source separation in videos based on natural language queries, using only unlabeled video and audio pairs as training data. A key challenge in this task is learning to associate the linguistic description of a sound-emitting object to its visual features and the corresponding components of the audio waveform, all without access to…

    Submitted 23 September, 2023; v1 submitted 28 March, 2023; originally announced March 2023.

    Comments: Accepted at CVPR 2023

  43. arXiv:2303.13853  [pdf, other]

    cs.CV

    2PCNet: Two-Phase Consistency Training for Day-to-Night Unsupervised Domain Adaptive Object Detection

    Authors: Mikhail Kennerley, Jian-Gang Wang, Bharadwaj Veeravalli, Robby T. Tan

    Abstract: Object detection at night is a challenging problem due to the absence of night image annotations. Despite several domain adaptation methods, achieving high-precision results remains an issue. False-positive error propagation is still observed in methods using the well-established student-teacher framework, particularly for small-scale and low-light objects. This paper proposes a two-phase consiste…

    Submitted 24 March, 2023; originally announced March 2023.

    Comments: Accepted into CVPR'23

  44. arXiv:2303.12798  [pdf, ps, other]

    cs.NI cs.LG eess.SY

    Interpersonal Distance Tracking with mmWave Radar and IMUs

    Authors: Yimin Dai, Xian Shuai, Rui Tan, Guoliang Xing

    Abstract: Tracking interpersonal distances is essential for real-time social distancing management and ex-post contact tracing to prevent spreads of contagious diseases. Bluetooth neighbor discovery has been employed for such purposes in combating COVID-19, but does not provide satisfactory spatiotemporal resolutions. This paper presents ImmTrack, a system that uses a millimeter wave radar and exploit…

    Submitted 28 February, 2023; originally announced March 2023.

  45. arXiv:2303.10876  [pdf, other]

    cs.CV cs.MA

    EqMotion: Equivariant Multi-agent Motion Prediction with Invariant Interaction Reasoning

    Authors: Chenxin Xu, Robby T. Tan, Yuhong Tan, Siheng Chen, Yu Guang Wang, Xinchao Wang, Yanfeng Wang

    Abstract: Learning to predict agent motions with relationship reasoning is important for many applications. In motion prediction tasks, maintaining motion equivariance under Euclidean geometric transformations and invariance of agent interaction is a critical and fundamental principle. However, such equivariance and invariance properties are overlooked by most existing methods. To fill this gap, we propose…

    Submitted 27 March, 2023; v1 submitted 20 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023

  46. arXiv:2211.14751  [pdf, other]

    cs.CV

    Estimating Reflectance Layer from A Single Image: Integrating Reflectance Guidance and Shadow/Specular Aware Learning

    Authors: Yeying Jin, Ruoteng Li, Wenhan Yang, Robby T. Tan

    Abstract: Estimating the reflectance layer from a single image is a challenging task. It becomes more challenging when the input image contains shadows or specular highlights, which often render an inaccurate estimate of the reflectance layer. Therefore, we propose a two-stage learning method, including reflectance guidance and a Shadow/Specular-Aware (S-Aware) network to tackle the problem. In the first st…

    Submitted 5 August, 2023; v1 submitted 27 November, 2022; originally announced November 2022.

    Comments: Accepted to AAAI2023, https://github.com/jinyeying/S-Aware-network

    Journal ref: published AAAI2023

  47. arXiv:2211.13409  [pdf, other]

    cs.CV

    Object Detection in Foggy Scenes by Embedding Depth and Reconstruction into Domain Adaptation

    Authors: Xin Yang, Michael Bi Mi, Yuan Yuan, Xin Wang, Robby T. Tan

    Abstract: Most existing domain adaptation (DA) methods align the features based on the domain feature distributions and ignore aspects related to fog, background and target objects, rendering suboptimal performance. In our DA framework, we retain the depth and background information during the domain feature alignment. A consistency loss between the generated depth and fog transmission map is introduced to…

    Submitted 23 November, 2022; originally announced November 2022.

    Comments: Accepted by ACCV

  48. arXiv:2211.08772  [pdf, other]

    cs.CV

    MIMT: Multi-Illuminant Color Constancy via Multi-Task Local Surface and Light Color Learning

    Authors: Shuwei Li, Jikai Wang, Michael S. Brown, Robby T. Tan

    Abstract: The assumption of a uniform light color distribution is no longer applicable in scenes that have multiple light colors. Most color constancy methods are designed to deal with a single light color, and thus are erroneous when applied to multiple light colors. The spatial variability in multiple light colors causes the color constancy problem to be more challenging and requires the extraction of loc…

    Submitted 22 August, 2023; v1 submitted 16 November, 2022; originally announced November 2022.

    Comments: 8 pages, 6 figures

  49. arXiv:2211.08089  [pdf, other]

    cs.CV

    DeS3: Adaptive Attention-driven Self and Soft Shadow Removal using ViT Similarity

    Authors: Yeying Jin, Wei Ye, Wenhan Yang, Yuan Yuan, Robby T. Tan

    Abstract: Removing soft and self shadows that lack clear boundaries from a single image is still challenging. Self shadows are shadows that are cast on the object itself. Most existing methods rely on binary shadow masks, without considering the ambiguous boundaries of soft and self shadows. In this paper, we present DeS3, a method that removes hard, soft and self shadows based on adaptive attention and ViT…

    Submitted 14 April, 2024; v1 submitted 15 November, 2022; originally announced November 2022.

    Comments: Accepted to AAAI2024, diffusion shadow removal, https://github.com/jinyeying/DeS3_Deshadow

  50. arXiv:2211.08007  [pdf, other]

    cs.CV

    Uncertainty-aware Gait Recognition via Learning from Dirichlet Distribution-based Evidence

    Authors: Beibei Lin, Chen Liu, Ming Wang, Lincheng Li, Shunli Zhang, Robby T. Tan, Xin Yu

    Abstract: Existing gait recognition frameworks retrieve an identity in the gallery based on the distance between a probe sample and the identities in the gallery. However, existing methods often neglect that the gallery may not contain identities corresponding to the probes, leading to recognition errors rather than raising an alarm. In this paper, we introduce a novel uncertainty-aware gait recognition met…

    Submitted 13 October, 2023; v1 submitted 15 November, 2022; originally announced November 2022.