Showing 1–50 of 123 results for author: Hsu, H

  1. arXiv:2406.10923  [pdf, other]

    cs.CV cs.CL cs.LG

    Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies

    Authors: Hung-Ting Su, Chun-Tong Chao, Ya-Ching Hsu, Xudong Lin, Yulei Niu, Hung-Yi Lee, Winston H. Hsu

    Abstract: Large Language Models (LLMs) have demonstrated effectiveness not only in language tasks but also in video reasoning. This paper introduces a novel dataset, Tropes in Movies (TiM), designed as a testbed for exploring two critical yet previously overlooked video reasoning skills: (1) Abstract Perception: understanding and tokenizing abstract concepts in videos, and (2) Long-range Compositional Reaso…

    Submitted 16 June, 2024; originally announced June 2024.

    Comments: Project page: https://ander1119.github.io/TiM

  2. arXiv:2406.00761  [pdf, other]

    cs.LG cs.AI

    Shared-unique Features and Task-aware Prioritized Sampling on Multi-task Reinforcement Learning

    Authors: Po-Shao Lin, Jia-Fong Yeh, Yi-Ting Chen, Winston H. Hsu

    Abstract: We observe that current state-of-the-art (SOTA) methods suffer from the performance imbalance issue when performing multi-task reinforcement learning (MTRL) tasks. While these methods may achieve impressive performance on average, they perform extremely poorly on a few tasks. To address this, we propose a new and effective method called STARS, which consists of two novel strategies: a shared-uniqu…

    Submitted 2 June, 2024; originally announced June 2024.

    Comments: The first two authors contribute equally

  3. arXiv:2405.17507  [pdf, other]

    cs.LG cs.AI cs.NI

    Enhancing Sustainable Urban Mobility Prediction with Telecom Data: A Spatio-Temporal Framework Approach

    Authors: ChungYi Lin, Shen-Lung Tung, Hung-Ting Su, Winston H. Hsu

    Abstract: Traditional traffic prediction, limited by the scope of sensor data, falls short in comprehensive traffic management. Mobile networks offer a promising alternative using network activity counts, but these lack crucial directionality. Thus, we present the TeltoMob dataset, featuring undirected telecom counts and corresponding directional flows, to predict directional mobility flows on roadways. To…

    Submitted 26 May, 2024; originally announced May 2024.

    Comments: 8 Figures, 5 Tables. Just accepted by IJCAI (to appear)

  4. arXiv:2405.16545  [pdf, other]

    cs.RO

    VICtoR: Learning Hierarchical Vision-Instruction Correlation Rewards for Long-horizon Manipulation

    Authors: Kuo-Han Hung, Pang-Chi Lo, Jia-Fong Yeh, Han-Yuan Hsu, Yi-Ting Chen, Winston H. Hsu

    Abstract: We study reward models for long-horizon manipulation tasks by learning from action-free videos and language instructions, which we term the visual-instruction correlation (VIC) problem. Recent advancements in cross-modality modeling have highlighted the potential of reward modeling through visual and language correlations. However, existing VIC methods face challenges in learning rewards for long-…

    Submitted 26 May, 2024; originally announced May 2024.

  5. arXiv:2405.14981  [pdf, other]

    cs.LG

    MaSS: Multi-attribute Selective Suppression for Utility-preserving Data Transformation from an Information-theoretic Perspective

    Authors: Yizhuo Chen, Chun-Fu Chen, Hsiang Hsu, Shaohan Hu, Marco Pistoia, Tarek Abdelzaher

    Abstract: The growing richness of large-scale datasets has been crucial in driving the rapid advancement and wide adoption of machine learning technologies. The massive collection and usage of data, however, pose an increasing risk for people's private and sensitive information due to either inadvertent mishandling or malicious exploitation. Besides legislative solutions, many technical approaches have been…

    Submitted 23 May, 2024; originally announced May 2024.

    Comments: ICML 2024

  6. arXiv:2405.11478  [pdf, other]

    cs.CV eess.IV

    Unsupervised Image Prior via Prompt Learning and CLIP Semantic Guidance for Low-Light Image Enhancement

    Authors: Igor Morawski, Kai He, Shusil Dangi, Winston H. Hsu

    Abstract: Currently, low-light conditions present a significant challenge for machine cognition. In this paper, rather than optimizing models by assuming that human and machine cognition are correlated, we use zero-reference low-light enhancement to improve the performance of downstream task models. We propose to improve the zero-reference low-light enhancement method by leveraging the rich visual-linguisti…

    Submitted 19 May, 2024; originally announced May 2024.

    Comments: Accepted to CVPR 2024 Workshop NTIRE: New Trends in Image Restoration and Enhancement workshop and Challenges

  7. arXiv:2404.10728  [pdf, other]

    cs.LG stat.ML

    Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning

    Authors: Hao-Lun Hsu, Weixin Wang, Miroslav Pajic, Pan Xu

    Abstract: We present the first study on provably efficient randomized exploration in cooperative multi-agent reinforcement learning (MARL). We propose a unified algorithm framework for randomized exploration in parallel Markov Decision Processes (MDPs), and two Thompson Sampling (TS)-type algorithms, CoopTS-PHE and CoopTS-LMC, incorporating the perturbed-history exploration (PHE) strategy and the Langevin M…

    Submitted 16 April, 2024; originally announced April 2024.

    Comments: 80 pages, 14 figures, 1 table. Hao-Lun Hsu and Weixin Wang contributed equally to this work

  8. arXiv:2403.16451  [pdf, other]

    cs.LG cs.AI

    DeepMachining: Online Prediction of Machining Errors of Lathe Machines

    Authors: Xiang-Li Lu, Hwai-Jung Hsu, Che-Wei Chou, H. T. Kung, Chen-Hsin Lee, Sheng-Mao Cheng

    Abstract: We describe DeepMachining, a deep learning-based AI system for online prediction of machining errors of lathe machine operations. We have built and evaluated DeepMachining based on manufacturing data from factories. Specifically, we first pretrain a deep learning model for a given lathe machine's operations to learn the salient features of machining states. Then, we fine-tune the pretrained model…

    Submitted 28 March, 2024; v1 submitted 25 March, 2024; originally announced March 2024.

  9. arXiv:2403.12991  [pdf, other]

    cs.CV cs.LG

    Tel2Veh: Fusion of Telecom Data and Vehicle Flow to Predict Camera-Free Traffic via a Spatio-Temporal Framework

    Authors: ChungYi Lin, Shen-Lung Tung, Hung-Ting Su, Winston H. Hsu

    Abstract: Vehicle flow, a crucial indicator for transportation, is often limited by detector coverage. With the advent of extensive mobile network coverage, we can leverage mobile user activities, or cellular traffic, on roadways as a proxy for vehicle flow. However, as counts of cellular traffic may not directly align with vehicle flow due to data from various user types, we present a new task: predicting…

    Submitted 5 March, 2024; originally announced March 2024.

    Comments: 4 pages, 5 figures, 4 tables. Accepted by WWW'24, to appear

  10. arXiv:2403.10542  [pdf, other]

    cs.AR cs.CV

    SF-MMCN: A Low Power Re-configurable Server Flow Convolution Neural Network Accelerator

    Authors: Huan-Ke Hsu, I-Chyn Wey, T. Hui Teo

    Abstract: Convolution Neural Network (CNN) accelerators have been developed rapidly in recent studies. There are lots of CNN accelerators equipped with a variety of function and algorithm which results in low power and high-speed performances. However, the scale of a PE array in traditional CNN accelerators is too big, which costs the most energy consumption while conducting multiply and accumulation (MAC)…

    Submitted 8 March, 2024; originally announced March 2024.

    Comments: 16 pages, 16 figures

  11. arXiv:2403.06814  [pdf, other]

    cs.LG q-bio.NC

    ε-Neural Thompson Sampling of Deep Brain Stimulation for Parkinson Disease Treatment

    Authors: Hao-Lun Hsu, Qitong Gao, Miroslav Pajic

    Abstract: Deep Brain Stimulation (DBS) stands as an effective intervention for alleviating the motor symptoms of Parkinson's disease (PD). Traditional commercial DBS devices are only able to deliver fixed-frequency periodic pulses to the basal ganglia (BG) regions of the brain, i.e., continuous DBS (cDBS). However, they in general suffer from energy inefficiency and side effects, such as speech impairment.…

    Submitted 11 March, 2024; originally announced March 2024.

    Comments: 11 pages, 12 figures, 2 tables. To appear in the 15th ACM/IEEE International Conference on Cyber-Physical Systems (ICCPS'2024)

  12. arXiv:2402.04129  [pdf, other]

    cs.LG cs.CV

    OVOR: OnePrompt with Virtual Outlier Regularization for Rehearsal-Free Class-Incremental Learning

    Authors: Wei-Cheng Huang, Chun-Fu Chen, Hsiang Hsu

    Abstract: Recent works have shown that by using large pre-trained models along with learnable prompts, rehearsal-free methods for class-incremental learning (CIL) settings can achieve superior performance to prominent rehearsal-based ones. Rehearsal-free CIL methods struggle with distinguishing classes from different tasks, as those are not trained together. In this work we propose a regularization method b…

    Submitted 6 February, 2024; originally announced February 2024.

    Comments: Accepted by ICLR 2024

  13. arXiv:2402.03860  [pdf, other]

    cs.RO

    AED: Adaptable Error Detection for Few-shot Imitation Policy

    Authors: Jia-Fong Yeh, Kuo-Han Hung, Pang-Chi Lo, Chi-Ming Chung, Tsung-Han Wu, Hung-Ting Su, Yi-Ting Chen, Winston H. Hsu

    Abstract: We introduce a new task called Adaptable Error Detection (AED), which aims to identify behavior errors in few-shot imitation (FSI) policies based on visual observations in novel environments. The potential to cause serious damage to surrounding areas limits the application of FSI policies in real-world scenarios. Thus, a robust system is necessary to notify operators when FSI policies are inconsis…

    Submitted 25 May, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

  14. arXiv:2402.00728  [pdf, other]

    cs.LG stat.ML

    Dropout-Based Rashomon Set Exploration for Efficient Predictive Multiplicity Estimation

    Authors: Hsiang Hsu, Guihong Li, Shaohan Hu, Chun-Fu Chen

    Abstract: Predictive multiplicity refers to the phenomenon in which classification tasks may admit multiple competing models that achieve almost-equally-optimal performance, yet generate conflicting outputs for individual samples. This presents significant concerns, as it can potentially result in systemic exclusion, inexplicable discrimination, and unfairness in practical applications. Measuring and mitiga…

    Submitted 1 February, 2024; originally announced February 2024.

    Comments: ICLR 2024

  15. arXiv:2402.00351  [pdf, other]

    cs.LG cs.CV

    Machine Unlearning for Image-to-Image Generative Models

    Authors: Guihong Li, Hsiang Hsu, Chun-Fu Chen, Radu Marculescu

    Abstract: Machine unlearning has emerged as a new paradigm to deliberately forget data samples from a given model in order to adhere to stringent regulations. However, existing machine unlearning methods have been primarily focused on classification models, leaving the landscape of unlearning for generative models relatively unexplored. This paper serves as a bridge, addressing the gap by providing a unifyi…

    Submitted 1 February, 2024; v1 submitted 1 February, 2024; originally announced February 2024.

    Comments: ICLR 2024

  16. arXiv:2401.03138  [pdf, other]

    cs.LG cs.AI

    TelTrans: Applying Multi-Type Telecom Data to Transportation Evaluation and Prediction via Multifaceted Graph Modeling

    Authors: ChungYi Lin, Shen-Lung Tung, Hung-Ting Su, Winston H. Hsu

    Abstract: To address the limitations of traffic prediction from location-bound detectors, we present Geographical Cellular Traffic (GCT) flow, a novel data source that leverages the extensive coverage of cellular traffic to capture mobility patterns. Our extensive analysis validates its potential for transportation. Focusing on vehicle-related GCT flow prediction, we propose a graph neural network that inte…

    Submitted 6 January, 2024; originally announced January 2024.

    Comments: 7 pages, 7 figures, 4 tables. Accepted by AAAI-24-IAAI, to appear

  17. arXiv:2312.15549  [pdf, other]

    cs.LG cs.MA math.ST stat.ML

    Finite-Time Frequentist Regret Bounds of Multi-Agent Thompson Sampling on Sparse Hypergraphs

    Authors: Tianyuan Jin, Hao-Lun Hsu, William Chang, Pan Xu

    Abstract: We study the multi-agent multi-armed bandit (MAMAB) problem, where $m$ agents are factored into $ρ$ overlapping groups. Each group represents a hyperedge, forming a hypergraph over the agents. At each round of interaction, the learner pulls a joint arm (composed of individual arms for each agent) and receives a reward according to the hypergraph structure. Specifically, we assume there is a local…

    Submitted 24 December, 2023; originally announced December 2023.

    Comments: 22 pages, 7 figures, 2 tables. To appear in the proceedings of the 38th Annual AAAI Conference on Artificial Intelligence (AAAI'2024)

  18. arXiv:2312.14923  [pdf, other]

    cs.LG

    Fast-NTK: Parameter-Efficient Unlearning for Large-Scale Models

    Authors: Guihong Li, Hsiang Hsu, Chun-Fu Chen, Radu Marculescu

    Abstract: The rapid growth of machine learning has spurred legislative initiatives such as "the Right to be Forgotten," allowing users to request data removal. In response, "machine unlearning" proposes the selective removal of unwanted data without the need for retraining from scratch. While the Neural-Tangent-Kernel-based (NTK-based) unlearning method excels in performance, it suffers from significant…

    Submitted 22 December, 2023; originally announced December 2023.

    Comments: 6 pages, 1 figure

  19. arXiv:2312.06519  [pdf, other]

    cs.LG cs.AI cs.SI

    A GAN Approach for Node Embedding in Heterogeneous Graphs Using Subgraph Sampling

    Authors: Hung Chun Hsu, Bo-Jun Wu, Ming-Yi Hong, Che Lin, Chih-Yu Wang

    Abstract: Our research addresses class imbalance issues in heterogeneous graphs using graph neural networks (GNNs). We propose a novel method combining the strengths of Generative Adversarial Networks (GANs) with GNNs, creating synthetic nodes and edges that effectively balance the dataset. This approach directly targets and rectifies imbalances at the data level. The proposed framework resolves issues such…

    Submitted 11 December, 2023; originally announced December 2023.

  20. arXiv:2311.02338  [pdf]

    cs.CV cs.AI cs.LG

    Potato Leaf Disease Classification using Deep Learning: A Convolutional Neural Network Approach

    Authors: Utkarsh Yashwant Tambe, A. Shobanadevi, A. Shanthini, Hsiu-Chun Hsu

    Abstract: In this study, a Convolutional Neural Network (CNN) is used to classify potato leaf illnesses using Deep Learning. The suggested approach entails preprocessing the leaf image data, training a CNN model on that data, and assessing the model's success on a test set. The experimental findings show that the CNN model, with an overall accuracy of 99.1%, is highly accurate in identifying two kinds of po…

    Submitted 4 November, 2023; originally announced November 2023.

    Comments: Accepted at the International Conference on Recent Trends in Data Science and its Applications (ICRTDA 2023), 6 pages, 6 figures, 1 table

  21. arXiv:2310.03821  [pdf, other]

    cs.CV cs.RO

    WLST: Weak Labels Guided Self-training for Weakly-supervised Domain Adaptation on 3D Object Detection

    Authors: Tsung-Lin Tsou, Tsung-Han Wu, Winston H. Hsu

    Abstract: In the field of domain adaptation (DA) on 3D object detection, most of the work is dedicated to unsupervised domain adaptation (UDA). Yet, without any target annotations, the performance gap between the UDA approaches and the fully-supervised approach is still noticeable, which is impractical for real-world applications. On the other hand, weakly-supervised domain adaptation (WDA) is an underexplo…

    Submitted 7 February, 2024; v1 submitted 5 October, 2023; originally announced October 2023.

    Comments: Accepted to ICRA 2024. Code is available at https://github.com/jacky121298/WLST

  22. arXiv:2308.03243  [pdf, other]

    cs.LG

    Unsupervised Adversarial Detection without Extra Model: Training Loss Should Change

    Authors: Chien Cheng Chyou, Hung-Ting Su, Winston H. Hsu

    Abstract: Adversarial robustness poses a critical challenge in the deployment of deep learning models for real-world applications. Traditional approaches to adversarial training and supervised detection rely on prior knowledge of attack types and access to labeled training data, which is often impractical. Existing unsupervised adversarial detection methods identify whether the target model works properly,…

    Submitted 6 August, 2023; originally announced August 2023.

    Comments: AdvML in ICML 2023. Code: https://github.com/CycleBooster/Unsupervised-adversarial-detection-without-extra-model

  23. arXiv:2306.09425  [pdf, other]

    cs.LG cs.CY cs.IT

    Arbitrariness Lies Beyond the Fairness-Accuracy Frontier

    Authors: Carol Xuan Long, Hsiang Hsu, Wael Alghamdi, Flavio P. Calmon

    Abstract: Machine learning tasks may admit multiple competing models that achieve similar performance yet produce conflicting outputs for individual samples -- a phenomenon known as predictive multiplicity. We demonstrate that fairness interventions in machine learning optimized solely for group fairness and accuracy can exacerbate predictive multiplicity. Consequently, state-of-the-art fairness interventio…

    Submitted 15 June, 2023; originally announced June 2023.

  24. arXiv:2306.07408  [pdf, other]

    cs.LG cs.AI cs.RO

    Robust Reinforcement Learning through Efficient Adversarial Herding

    Authors: Juncheng Dong, Hao-Lun Hsu, Qitong Gao, Vahid Tarokh, Miroslav Pajic

    Abstract: Although reinforcement learning (RL) is considered the gold standard for policy design, it may not always provide a robust solution in various scenarios. This can result in severe performance degradation when the environment is exposed to potential disturbances. Adversarial training using a two-player max-min game has been proven effective in enhancing the robustness of RL agents. In this work, we…

    Submitted 12 June, 2023; originally announced June 2023.

  25. arXiv:2305.12976  [pdf, other]

    cs.IR cs.LG

    Attentive Graph-based Text-aware Preference Modeling for Top-N Recommendation

    Authors: Ming-Hao Juan, Pu-Jen Cheng, Hui-Neng Hsu, Pin-Hsin Hsiao

    Abstract: Textual data are commonly used as auxiliary information for modeling user preference nowadays. While many prior works utilize user reviews for rating prediction, few focus on top-N recommendation, and even few try to incorporate item textual contents such as title and description. Though delivering promising performance for rating prediction, we empirically find that many review-based models canno…

    Submitted 22 May, 2023; originally announced May 2023.

  26. arXiv:2304.03754  [pdf, other]

    cs.CL cs.CV

    Language Models are Causal Knowledge Extractors for Zero-shot Video Question Answering

    Authors: Hung-Ting Su, Yulei Niu, Xudong Lin, Winston H. Hsu, Shih-Fu Chang

    Abstract: Causal Video Question Answering (CVidQA) queries not only association or temporal relations but also causal relations in a video. Existing question synthesis methods pre-trained question generation (QG) systems on reading comprehension datasets with text descriptions as inputs. However, QG models only learn to ask association questions (e.g., "what is someone doing...") and result in inferior pe…

    Submitted 7 April, 2023; originally announced April 2023.

    Comments: CVPR 2023 Workshop L3D-IVU

  27. arXiv:2303.16637  [pdf, other]

    cs.CV

    MuRAL: Multi-Scale Region-based Active Learning for Object Detection

    Authors: Yi-Syuan Liou, Tsung-Han Wu, Jia-Fong Yeh, Wen-Chin Chen, Winston H. Hsu

    Abstract: Obtaining large-scale labeled object detection dataset can be costly and time-consuming, as it involves annotating images with bounding boxes and class labels. Thus, some specialized active learning methods have been proposed to reduce the cost by selecting either coarse-grained samples or fine-grained instances from unlabeled data for labeling. However, the former approaches suffer from redundant…

    Submitted 29 March, 2023; originally announced March 2023.

  28. arXiv:2303.15937  [pdf, other]

    cs.CV

    PosterLayout: A New Benchmark and Approach for Content-aware Visual-Textual Presentation Layout

    Authors: HsiaoYuan Hsu, Xiangteng He, Yuxin Peng, Hao Kong, Qing Zhang

    Abstract: Content-aware visual-textual presentation layout aims at arranging spatial space on the given canvas for pre-defined elements, including text, logo, and underlay, which is a key to automatic template-free creative graphic design. In practical applications, e.g., poster designs, the canvas is originally non-empty, and both inter-element relationships as well as inter-layer relationships should be c…

    Submitted 28 March, 2023; originally announced March 2023.

    Comments: Accepted to CVPR 2023. Dataset and code are available at https://github.com/PKU-ICST-MIPL/PosterLayout-CVPR2023

  29. arXiv:2303.04027  [pdf, other]

    cs.MM cs.RO

    BIRD-PCC: Bi-directional Range Image-based Deep LiDAR Point Cloud Compression

    Authors: Chia-Sheng Liu, Jia-Fong Yeh, Hao Hsu, Hung-Ting Su, Ming-Sui Lee, Winston H. Hsu

    Abstract: The large amount of data collected by LiDAR sensors brings the issue of LiDAR point cloud compression (PCC). Previous works on LiDAR PCC have used range image representations and followed the predictive coding paradigm to create a basic prototype of a coding framework. However, their prediction methods give an inaccurate result due to the negligence of invalid pixels in range images and the omissi…

    Submitted 8 March, 2023; v1 submitted 7 March, 2023; originally announced March 2023.

    Comments: Accepted to ICASSP 2023

  30. arXiv:2302.14517  [pdf, other]

    cs.LG cs.CR cs.CY stat.ML

    Arbitrary Decisions are a Hidden Cost of Differentially Private Training

    Authors: Bogdan Kulynych, Hsiang Hsu, Carmela Troncoso, Flavio P. Calmon

    Abstract: Mechanisms used in privacy-preserving machine learning often aim to guarantee differential privacy (DP) during model training. Practical DP-ensuring training methods use randomization when fitting model parameters to privacy-sensitive data (e.g., adding Gaussian noise to clipped gradients). We demonstrate that such randomization incurs predictive multiplicity: for a given input example, the output…

    Submitted 15 May, 2023; v1 submitted 28 February, 2023; originally announced February 2023.

    Comments: To appear in ACM FAccT 2023

  31. arXiv:2212.08464  [pdf, other]

    cs.CV

    Free-form 3D Scene Inpainting with Dual-stream GAN

    Authors: Ru-Fen Jheng, Tsung-Han Wu, Jia-Fong Yeh, Winston H. Hsu

    Abstract: Nowadays, the need for user editing in a 3D scene has rapidly increased due to the development of AR and VR technology. However, the existing 3D scene completion task (and datasets) cannot suit the need because the missing regions in scenes are generated by the sensor limitation or object occlusion. Thus, we present a novel task named free-form 3D scene inpainting. Unlike scenes in previous 3D com…

    Submitted 16 December, 2022; originally announced December 2022.

    Comments: BMVC 2022

  32. Data-driven identification and analysis of the glass transition in polymer melts

    Authors: Atreyee Banerjee, Hsiao-Ping Hsu, Kurt Kremer, Oleksandra Kukharenko

    Abstract: Understanding the nature of glass transition, as well as precise estimation of the glass transition temperature for polymeric materials, remain open questions in both experimental and theoretical polymer sciences. We propose a data-driven approach, which utilizes the high-resolution details accessible through the molecular dynamics simulation and considers the structural information of individual…

    Submitted 1 August, 2023; v1 submitted 25 November, 2022; originally announced November 2022.

    Journal ref: ACS Macro Letters 2023 12 (6), 679-684

  33. arXiv:2210.15575  [pdf, other]

    cs.LG cs.AI stat.ML

    A Graph Is More Than Its Nodes: Towards Structured Uncertainty-Aware Learning on Graphs

    Authors: Hans Hao-Hsun Hsu, Yuesong Shen, Daniel Cremers

    Abstract: Current graph neural networks (GNNs) that tackle node classification on graphs tend to only focus on nodewise scores and are solely evaluated by nodewise metrics. This limits uncertainty estimation on graphs since nodewise marginals do not fully characterize the joint distribution given the graph structure. In this work, we propose novel edgewise metrics, namely the edgewise expected calibration e…

    Submitted 27 October, 2022; originally announced October 2022.

    Comments: Presented at NeurIPS 2022 New Frontiers in Graph Learning Workshop (NeurIPS GLFrontiers 2022)

  34. arXiv:2210.06391  [pdf, other]

    cs.LG cs.AI

    What Makes Graph Neural Networks Miscalibrated?

    Authors: Hans Hao-Hsun Hsu, Yuesong Shen, Christian Tomani, Daniel Cremers

    Abstract: Given the importance of getting calibrated predictions and reliable uncertainty estimations, various post-hoc calibration methods have been developed for neural networks on standard multi-class classification tasks. However, these methods are not well suited for calibrating graph neural networks (GNNs), which presents unique challenges such as accounting for the graph structure and the graph-induc…

    Submitted 12 October, 2022; originally announced October 2022.

    Comments: Accepted to NeurIPS 2022

  35. arXiv:2210.03941  [pdf, other]

    cs.CV cs.CL

    Learning Fine-Grained Visual Understanding for Video Question Answering via Decoupling Spatial-Temporal Modeling

    Authors: Hsin-Ying Lee, Hung-Ting Su, Bing-Chen Tsai, Tsung-Han Wu, Jia-Fong Yeh, Winston H. Hsu

    Abstract: While recent large-scale video-language pre-training made great progress in video question answering, the design of spatial modeling of video-language models is less fine-grained than that of image-language models; existing practices of temporal modeling also suffer from weak and noisy alignment between modalities. To learn fine-grained visual understanding, we decouple spatial-temporal modeling a…

    Submitted 8 October, 2022; originally announced October 2022.

    Comments: BMVC 2022. Code is available at https://github.com/shinying/dest

  36. arXiv:2210.02045  [pdf, other]

    cs.CV cs.RO

    Coarse-to-Fine Point Cloud Registration with SE(3)-Equivariant Representations

    Authors: Cheng-Wei Lin, Tung-I Chen, Hsin-Ying Lee, Wen-Chin Chen, Winston H. Hsu

    Abstract: Point cloud registration is a crucial problem in computer vision and robotics. Existing methods either rely on matching local geometric features, which are sensitive to the pose differences, or leverage global shapes, which leads to inconsistency when facing distribution variances such as partial overlapping. Combining the advantages of both types of methods, we adopt a coarse-to-fine pipeline tha…

    Submitted 4 March, 2023; v1 submitted 5 October, 2022; originally announced October 2022.

    Comments: ICRA 2023

  37. arXiv:2209.13507  [pdf, other]

    cs.CV cs.RO

    CrossDTR: Cross-view and Depth-guided Transformers for 3D Object Detection

    Authors: Ching-Yu Tseng, Yi-Rong Chen, Hsin-Ying Lee, Tsung-Han Wu, Wen-Chin Chen, Winston H. Hsu

    Abstract: To achieve accurate 3D object detection at a low cost for autonomous driving, many multi-camera methods have been proposed and solved the occlusion problem of monocular approaches. However, due to the lack of accurate estimated depth, existing multi-camera methods often generate multiple bounding boxes along a ray of depth direction for difficult small objects such as pedestrians, resulting in an…

    Submitted 3 February, 2023; v1 submitted 27 September, 2022; originally announced September 2022.

    Comments: Accepted by IEEE International Conference on Robotics and Automation (ICRA) 2023. The code is available at https://github.com/sty61010/CrossDTR

  38. arXiv:2209.13274  [pdf, other]

    cs.RO cs.CV

    Orbeez-SLAM: A Real-time Monocular Visual SLAM with ORB Features and NeRF-realized Mapping

    Authors: Chi-Ming Chung, Yang-Che Tseng, Ya-Ching Hsu, Xiang-Qian Shi, Yun-Hung Hua, Jia-Fong Yeh, Wen-Chin Chen, Yi-Ting Chen, Winston H. Hsu

    Abstract: A spatial AI that can perform complex tasks through visual signals and cooperate with humans is highly anticipated. To achieve this, we need a visual SLAM that easily adapts to new scenes without pre-training and generates dense maps for downstream tasks in real-time. None of the previous learning-based and non-learning-based visual SLAMs satisfy all needs due to the intrinsic limitations of their…

    Submitted 31 January, 2023; v1 submitted 27 September, 2022; originally announced September 2022.

  39. arXiv:2209.10729  [pdf, other]

    cs.LG cs.CV

    Fair Robust Active Learning by Joint Inconsistency

    Authors: Tsung-Han Wu, Hung-Ting Su, Shang-Tse Chen, Winston H. Hsu

    Abstract: Fairness and robustness play vital roles in trustworthy machine learning. Observing safety-critical needs in various annotation-expensive vision applications, we introduce a novel learning framework, Fair Robust Active Learning (FRAL), generalizing conventional active learning to fair and adversarial robust scenarios. This framework allows us to achieve standard and robust minimax fairness with li…

    Submitted 16 November, 2022; v1 submitted 21 September, 2022; originally announced September 2022.

    Comments: 11 pages, 2 figures, 8 tables

  40. arXiv:2209.08864  [pdf, other]

    cs.RO

    CFVS: Coarse-to-Fine Visual Servoing for 6-DoF Object-Agnostic Peg-In-Hole Assembly

    Authors: Bo-Siang Lu, Tung-I Chen, Hsin-Ying Lee, Winston H. Hsu

    Abstract: Robotic peg-in-hole assembly remains a challenging task due to its high accuracy demand. Previous work tends to simplify the problem by restricting the degree of freedom of the end-effector, or limiting the distance between the target and the initial pose position, which prevents them from being deployed in real-world manufacturing. Thus, we present a Coarse-to-Fine Visual Servoing (CFVS) peg-in-h…

    Submitted 19 January, 2023; v1 submitted 19 September, 2022; originally announced September 2022.

    Comments: Accepted by ICRA 2023

  41. arXiv:2209.08423  [pdf, other]

    cs.CV cs.LG

    Automated Segmentation and Recurrence Risk Prediction of Surgically Resected Lung Tumors with Adaptive Convolutional Neural Networks

    Authors: Marguerite B. Basta, Sarfaraz Hussein, Hsiang Hsu, Flavio P. Calmon

    Abstract: Lung cancer is the leading cause of cancer related mortality by a significant margin. While new technologies, such as image segmentation, have been paramount to improved detection and earlier diagnoses, there are still significant challenges in treating the disease. In particular, despite an increased number of curative resections, many postoperative patients still develop recurrent lesions. Conse…

    Submitted 17 September, 2022; originally announced September 2022.

    Comments: 9 pages, 5 figures

  42. arXiv:2209.07581  [pdf, other]

    physics.space-ph cs.LG eess.SP

    The Development of Spatial Attention U-Net for The Recovery of Ionospheric Measurements and The Extraction of Ionospheric Parameters

    Authors: Guan-Han Huang, Alexei V. Dmitriev, Chia-Hsien Lin, Yu-Chi Chang, Mon-Chai Hsieh, Enkhtuya Tsogtbaatar, Merlin M. Mendoza, Hao-Wei Hsu, Yu-Chiang Lin, Lung-Chih Tsai, Yung-Hui Li

    Abstract: We train a deep learning artificial neural network model, Spatial Attention U-Net to recover useful ionospheric signals from noisy ionogram data measured by Hualien's Vertical Incidence Pulsed Ionospheric Radar. Our results show that the model can well identify F2 layer ordinary and extraordinary modes (F2o, F2x) and the combined signals of the E layer (ordinary and extraordinary modes and sporadi…

    Submitted 15 September, 2022; originally announced September 2022.

    Comments: 17 pages, 7 figures, 3 tables

    Journal ref: Radio Science 57 (2022) e2022RS007471

  43. Accelerating Material Design with the Generative Toolkit for Scientific Discovery

    Authors: Matteo Manica, Jannis Born, Joris Cadow, Dimitrios Christofidellis, Ashish Dave, Dean Clarke, Yves Gaetan Nana Teukam, Giorgio Giannone, Samuel C. Hoffman, Matthew Buchan, Vijil Chenthamarakshan, Timothy Donovan, Hsiang Han Hsu, Federico Zipoli, Oliver Schilter, Akihiro Kishimoto, Lisa Hamada, Inkit Padhi, Karl Wehden, Lauren McHugh, Alexy Khrabrov, Payel Das, Seiji Takeda, John R. Smith

    Abstract: With the growing availability of data within various scientific domains, generative models hold enormous potential to accelerate scientific discovery. They harness powerful representations learned from datasets to speed up the formulation of novel hypotheses with the potential to impact material discovery broadly. We present the Generative Toolkit for Scientific Discovery (GT4SD). This extensible…

    Submitted 31 January, 2023; v1 submitted 8 July, 2022; originally announced July 2022.

    Comments: 15 pages, 2 figures

    Journal ref: Nature Partner Journals (npj) Computational Materials 9, 69 (2023)

  44. arXiv:2207.03608

    cs.CV

    GaitTAKE: Gait Recognition by Temporal Attention and Keypoint-guided Embedding

    Authors: Hung-Min Hsu, Yizhou Wang, Cheng-Yen Yang, Jenq-Neng Hwang, Hoang Le Uyen Thuc, Kwang-Ju Kim

    Abstract: Gait recognition, which refers to the recognition or identification of a person based on their body shape and walking styles, derived from video data captured from a distance, is widely used in crime prevention, forensic identification, and social security. However, to the best of our knowledge, most of the existing methods use appearance, posture and temporal features without considering a learn…

    Submitted 12 July, 2022; v1 submitted 7 July, 2022; originally announced July 2022.

    Comments: IEEE International Conference on Image Processing 2022

  45. arXiv:2206.07801

    cs.LG cs.CY cs.IT

    Beyond Adult and COMPAS: Fairness in Multi-Class Prediction

    Authors: Wael Alghamdi, Hsiang Hsu, Haewon Jeong, Hao Wang, P. Winston Michalak, Shahab Asoodeh, Flavio P. Calmon

    Abstract: We consider the problem of producing fair probabilistic classifiers for multi-class classification tasks. We formulate this problem in terms of "projecting" a pre-trained (and potentially unfair) classifier onto the set of models that satisfy target group-fairness requirements. The new, projected model is given by post-processing the outputs of the pre-trained classifier by a multiplicative factor…

    Submitted 15 June, 2022; originally announced June 2022.

    Comments: 46 pages, 15 figures

  46. arXiv:2206.01295

    cs.LG cs.IT stat.ML

    Rashomon Capacity: A Metric for Predictive Multiplicity in Classification

    Authors: Hsiang Hsu, Flavio du Pin Calmon

    Abstract: Predictive multiplicity occurs when classification models with statistically indistinguishable performances assign conflicting predictions to individual samples. When used for decision-making in applications of consequence (e.g., lending, education, criminal justice), models developed without regard for predictive multiplicity may result in unjustified and arbitrary decisions for specific individu…

    Submitted 19 October, 2022; v1 submitted 2 June, 2022; originally announced June 2022.

    Comments: NeurIPS 2022 camera-ready version (34 pages, 23 figures, 2 tables)

  47. arXiv:2205.11804

    cs.CV

    Package Theft Detection from Smart Home Security Cameras

    Authors: Hung-Min Hsu, Xinyu Yuan, Baohua Zhu, Zhongwei Cheng, Lin Chen

    Abstract: Package theft detection has been a challenging task mainly due to lack of training data and a wide variety of package theft cases in reality. In this paper, we propose a new Global and Local Fusion Package Theft Detection Embedding (GLF-PTDE) framework to generate package theft scores for each segment within a video to fulfill the real-world requirements on package theft detection. Moreover, we co…

    Submitted 24 May, 2022; originally announced May 2022.

  48. arXiv:2205.03688

    cs.CV

    GenISP: Neural ISP for Low-Light Machine Cognition

    Authors: Igor Morawski, Yu-An Chen, Yu-Sheng Lin, Shusil Dangi, Kai He, Winston H. Hsu

    Abstract: Object detection in low-light conditions remains a challenging but important problem with many practical implications. Some recent works show that, in low-light conditions, object detectors using raw image data are more robust than detectors using image data processed by a traditional ISP pipeline. To improve detection performance in low-light conditions, one can fine-tune the detector to use raw…

    Submitted 7 May, 2022; originally announced May 2022.

    Comments: Accepted to CVPR 2022 Workshop NTIRE: New Trends in Image Restoration and Enhancement workshop and Challenges

  49. arXiv:2204.13696

    cs.CV

    NeurMiPs: Neural Mixture of Planar Experts for View Synthesis

    Authors: Zhi-Hao Lin, Wei-Chiu Ma, Hao-Yu Hsu, Yu-Chiang Frank Wang, Shenlong Wang

    Abstract: We present Neural Mixtures of Planar Experts (NeurMiPs), a novel planar-based scene representation for modeling geometry and appearance. NeurMiPs leverages a collection of local planar experts in 3D space as the scene representation. Each planar expert consists of the parameters of the local rectangular shape representing geometry and a neural radiance field modeling the color and opacity. We rend…

    Submitted 28 April, 2022; originally announced April 2022.

    Comments: CVPR 2022. Project page: https://zhihao-lin.github.io/neurmips/

  50. arXiv:2203.10981

    cs.CV

    MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer

    Authors: Kuan-Chih Huang, Tsung-Han Wu, Hung-Ting Su, Winston H. Hsu

    Abstract: Monocular 3D object detection is an important yet challenging task in autonomous driving. Some existing methods leverage depth information from an off-the-shelf depth estimator to assist 3D detection, but suffer from the additional computational burden and achieve limited performance caused by inaccurate depth priors. To alleviate this, we propose MonoDTR, a novel end-to-end depth-aware transforme…

    Submitted 28 March, 2022; v1 submitted 21 March, 2022; originally announced March 2022.

    Comments: Accepted to CVPR 2022