Showing 1–50 of 91 results for author: Cha, W

  1. arXiv:2407.09030  [pdf, other]

    eess.IV cs.CV

    CAMP: Continuous and Adaptive Learning Model in Pathology

    Authors: Anh Tien Nguyen, Keunho Byeon, Kyungeun Kim, Boram Song, Seoung Wan Chae, Jin Tae Kwak

    Abstract: There exist numerous diagnostic tasks in pathology. Conventional computational pathology formulates and tackles them as independent and individual image classification problems, thereby resulting in computational inefficiency and high costs. To address the challenges, we propose a generic, unified, and universal framework, called a continuous and adaptive learning model in pathology (CAMP), for pa…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Under review

  2. arXiv:2407.08027  [pdf, other]

    cs.CV

    Fish-Vista: A Multi-Purpose Dataset for Understanding & Identification of Traits from Images

    Authors: Kazi Sajeed Mehrab, M. Maruf, Arka Daw, Harish Babu Manogaran, Abhilash Neog, Mridul Khurana, Bahadir Altintas, Yasin Bakis, Elizabeth G Campolongo, Matthew J Thompson, Xiaojun Wang, Hilmar Lapp, Wei-Lun Chao, Paula M. Mabee, Henry L. Bart Jr., Wasila Dahdul, Anuj Karpatne

    Abstract: Fishes are integral to both ecological systems and economic sectors, and studying fish traits is crucial for understanding biodiversity patterns and macro-evolution trends. To enable the analysis of visual traits from fish images, we introduce the Fish-Visual Trait Analysis (Fish-Vista) dataset - a large, annotated collection of about 60K fish images spanning 1900 different species, supporting sev…

    Submitted 10 July, 2024; originally announced July 2024.

  3. arXiv:2406.16341  [pdf, other]

    cs.CL

    EHRCon: Dataset for Checking Consistency between Unstructured Notes and Structured Tables in Electronic Health Records

    Authors: Yeonsu Kwon, Jiho Kim, Gyubok Lee, Seongsu Bae, Daeun Kyung, Wonchul Cha, Tom Pollard, Alistair Johnson, Edward Choi

    Abstract: Electronic Health Records (EHRs) are integral for storing comprehensive patient medical records, combining structured data (e.g., medications) with detailed clinical notes (e.g., physician notes). These elements are essential for straightforward data retrieval and provide deep, contextual insights into patient care. However, they often suffer from discrepancies due to unintuitive EHR system design…

    Submitted 24 June, 2024; originally announced June 2024.

  4. arXiv:2406.02859   

    eess.AS cs.SD

    ConPCO: Preserving Phoneme Characteristics for Automatic Pronunciation Assessment Leveraging Contrastive Ordinal Regularization

    Authors: Bi-Cheng Yan, Wei-Cheng Chao, Jiun-Ting Li, Yi-Cheng Wang, Hsin-Wei Wang, Meng-Shin Lin, Berlin Chen

    Abstract: Automatic pronunciation assessment (APA) manages to evaluate the pronunciation proficiency of a second language (L2) learner in a target language. Existing efforts typically draw on regression models for proficiency score prediction, where the models are trained to estimate target values without explicitly accounting for phoneme-awareness in the feature space. In this paper, we propose a contrasti…

    Submitted 8 June, 2024; v1 submitted 4 June, 2024; originally announced June 2024.

    Comments: This paper has been withdrawn because the authors aim to achieve better organization in writing and more detailed experimental analysis

  5. arXiv:2405.16034  [pdf, other]

    cs.CV

    DiffuBox: Refining 3D Object Detection with Point Diffusion

    Authors: Xiangyu Chen, Zhenzhen Liu, Katie Z Luo, Siddhartha Datta, Adhitya Polavaram, Yan Wang, Yurong You, Boyi Li, Marco Pavone, Wei-Lun Chao, Mark Campbell, Bharath Hariharan, Kilian Q. Weinberger

    Abstract: Ensuring robust 3D object detection and localization is crucial for many applications in robotics and autonomous driving. Recent models, however, face difficulties in maintaining high performance when applied to domains with differing sensor setups or geographic locations, often resulting in poor localization accuracy due to domain shift. To overcome this challenge, we introduce a novel diffusion-…

    Submitted 24 May, 2024; originally announced May 2024.

  6. arXiv:2405.03609  [pdf]

    cs.CC

    Decision algorithms for reversibility of one-dimensional non-linear cellular automata under null boundary conditions

    Authors: Ma Junchi, Chen Weilin, Wang Chen, Lin Defu, Wang Chao

    Abstract: The property of reversibility is quite meaningful for the classic theoretical computer science model, cellular automata. For the reversibility problem for a CA under null boundary conditions, while linear rules have been studied a lot, the non-linear rules remain unexplored at present. The paper investigates the reversibility problem of general one-dimensional CA on a finite field $\mathbb{Z}_p$,…

    Submitted 6 May, 2024; originally announced May 2024.

    Comments: in Chinese language

  7. arXiv:2404.05139  [pdf, other]

    cs.CV cs.RO

    Better Monocular 3D Detectors with LiDAR from the Past

    Authors: Yurong You, Cheng Perng Phoo, Carlos Andres Diaz-Ruiz, Katie Z Luo, Wei-Lun Chao, Mark Campbell, Bharath Hariharan, Kilian Q Weinberger

    Abstract: Accurate 3D object detection is crucial to autonomous driving. Though LiDAR-based detectors have achieved impressive performance, the high cost of LiDAR sensors precludes their widespread adoption in affordable vehicles. Camera-based detectors are cheaper alternatives but often suffer inferior performance compared to their LiDAR-based counterparts due to inherent depth ambiguities in images. In th…

    Submitted 9 April, 2024; v1 submitted 7 April, 2024; originally announced April 2024.

    Comments: Accepted by ICRA 2024. The code can be found at https://github.com/YurongYou/AsyncDepth

  8. arXiv:2403.19181  [pdf, other]

    cs.IR cs.CL cs.LG

    Make Large Language Model a Better Ranker

    Authors: Wenshuo Chao, Zhi Zheng, Hengshu Zhu, Hao Liu

    Abstract: Large Language Models (LLMs) demonstrate robust capabilities across various fields, leading to a paradigm shift in LLM-enhanced Recommender System (RS). Research to date focuses on point-wise and pair-wise recommendation paradigms, which are inefficient for LLM-based recommenders due to high computational costs. However, existing list-wise approaches also fall short in ranking tasks due to misalig…

    Submitted 24 June, 2024; v1 submitted 28 March, 2024; originally announced March 2024.

    Comments: 12 pages, 5 figures

  9. arXiv:2403.13325  [pdf, other]

    cs.IR

    Harnessing Large Language Models for Text-Rich Sequential Recommendation

    Authors: Zhi Zheng, Wenshuo Chao, Zhaopeng Qiu, Hengshu Zhu, Hui Xiong

    Abstract: Recent advances in Large Language Models (LLMs) have been changing the paradigm of Recommender Systems (RS). However, when items in the recommendation scenarios contain rich textual information, such as product descriptions in online shopping or news headlines on social media, LLMs require longer texts to comprehensively depict the historical user behavior sequence. This poses significant challeng…

    Submitted 20 March, 2024; originally announced March 2024.

  10. arXiv:2402.04476  [pdf, other]

    cs.CV cs.AI cs.CL

    Dual-View Visual Contextualization for Web Navigation

    Authors: Jihyung Kil, Chan Hee Song, Boyuan Zheng, Xiang Deng, Yu Su, Wei-Lun Chao

    Abstract: Automatic web navigation aims to build a web agent that can follow language instructions to execute complex and diverse tasks on real-world websites. Existing work primarily takes HTML documents as input, which define the contents and action spaces (i.e., actionable elements and operations) of webpages. Nevertheless, HTML documents may not provide a clear task-related context for each element, mak…

    Submitted 30 March, 2024; v1 submitted 6 February, 2024; originally announced February 2024.

    Comments: Accepted to CVPR 2024

  11. arXiv:2401.17838  [pdf, other]

    cs.LG cs.AI

    A Cross-View Hierarchical Graph Learning Hypernetwork for Skill Demand-Supply Joint Prediction

    Authors: Wenshuo Chao, Zhaopeng Qiu, Likang Wu, Zhuoning Guo, Zhi Zheng, Hengshu Zhu, Hao Liu

    Abstract: The rapidly changing landscape of technology and industries leads to dynamic skill requirements, making it crucial for employees and employers to anticipate such shifts to maintain a competitive edge in the labor market. Existing efforts in this area either rely on domain-expert knowledge or regarding skill evolution as a simplified time series forecasting problem. However, both approaches overloo…

    Submitted 31 January, 2024; originally announced January 2024.

    Comments: 11 pages, 7 figures, AAAI24

  12. arXiv:2401.10510  [pdf, other]

    cs.NE cs.AI cs.CL cs.LG

    When large language models meet evolutionary algorithms

    Authors: Wang Chao, Jiaxuan Zhao, Licheng Jiao, Lingling Li, Fang Liu, Shuyuan Yang

    Abstract: Pre-trained large language models (LLMs) have powerful capabilities for generating creative natural text. Evolutionary algorithms (EAs) can discover diverse solutions to complex real-world problems. Motivated by the common collective and directionality of text generation and evolution, this paper illustrates the parallels between LLMs and EAs, which includes multiple one-to-one key characteristics…

    Submitted 29 June, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

    Comments: A review article under two review

  13. arXiv:2401.00608  [pdf, other]

    cs.CV cs.AI

    Bringing Back the Context: Camera Trap Species Identification as Link Prediction on Multimodal Knowledge Graphs

    Authors: Vardaan Pahuja, Weidi Luo, Yu Gu, Cheng-Hao Tu, Hong-You Chen, Tanya Berger-Wolf, Charles Stewart, Song Gao, Wei-Lun Chao, Yu Su

    Abstract: Camera traps are valuable tools in animal ecology for biodiversity monitoring and conservation. However, challenges like poor generalization to deployment at new unseen locations limit their practical application. Images are naturally associated with heterogeneous forms of context possibly in different modalities. In this work, we leverage the structured context associated with the camera trap ima…

    Submitted 22 June, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

    Comments: 14 pages, 5 figures

  14. arXiv:2311.18803  [pdf, other]

    cs.CV cs.CL cs.LG

    BioCLIP: A Vision Foundation Model for the Tree of Life

    Authors: Samuel Stevens, Jiaman Wu, Matthew J Thompson, Elizabeth G Campolongo, Chan Hee Song, David Edward Carlyn, Li Dong, Wasila M Dahdul, Charles Stewart, Tanya Berger-Wolf, Wei-Lun Chao, Yu Su

    Abstract: Images of the natural world, collected by a variety of cameras, from drones to individual phones, are increasingly abundant sources of biological information. There is an explosion of computational methods and tools, particularly computer vision, for extracting biologically relevant information from images for science and conservation. Yet most of these are bespoke approaches designed for a specif…

    Submitted 14 May, 2024; v1 submitted 30 November, 2023; originally announced November 2023.

    Comments: CVPR 2024 (oral) camera-ready version; data released

  15. arXiv:2311.16517  [pdf, other]

    eess.IV cs.CV

    LFSRDiff: Light Field Image Super-Resolution via Diffusion Models

    Authors: Wentao Chao, Fuqing Duan, Xuechun Wang, Yingqian Wang, Guanghui Wang

    Abstract: Light field (LF) image super-resolution (SR) is a challenging problem due to its inherent ill-posed nature, where a single low-resolution (LR) input LF image can correspond to multiple potential super-resolved outcomes. Despite this complexity, mainstream LF image SR methods typically adopt a deterministic approach, generating only a single output supervised by pixel-wise loss functions. This tend…

    Submitted 27 November, 2023; originally announced November 2023.

  16. arXiv:2311.15954  [pdf, other]

    cs.CL eess.AS

    A Quantitative Approach to Understand Self-Supervised Models as Cross-lingual Feature Extractors

    Authors: Shuyue Stella Li, Beining Xu, Xiangyu Zhang, Hexin Liu, Wenhan Chao, Leibny Paola Garcia

    Abstract: In this work, we study the features extracted by English self-supervised learning (SSL) models in cross-lingual contexts and propose a new metric to predict the quality of feature representations. Using automatic speech recognition (ASR) as a downstream task, we analyze the effect of model size, training objectives, and model architecture on the models' performance as a feature extractor for a set…

    Submitted 27 November, 2023; originally announced November 2023.

    Comments: 12 pages, 5 figures, 4 tables

  17. arXiv:2311.04157  [pdf, other]

    cs.CV cs.AI

    A Simple Interpretable Transformer for Fine-Grained Image Classification and Analysis

    Authors: Dipanjyoti Paul, Arpita Chowdhury, Xinqi Xiong, Feng-Ju Chang, David Carlyn, Samuel Stevens, Kaiya L. Provost, Anuj Karpatne, Bryan Carstens, Daniel Rubenstein, Charles Stewart, Tanya Berger-Wolf, Yu Su, Wei-Lun Chao

    Abstract: We present a novel usage of Transformers to make image classification interpretable. Unlike mainstream classifiers that wait until the last fully connected layer to incorporate class information to make predictions, we investigate a proactive approach, asking each class to search for itself in an image. We realize this idea via a Transformer encoder-decoder inspired by DEtection TRansformer (DETR)…

    Submitted 14 June, 2024; v1 submitted 7 November, 2023; originally announced November 2023.

    Comments: Accepted to International Conference on Learning Representations 2024 (ICLR 2024)

  18. arXiv:2311.01420  [pdf, other]

    cs.LG

    Holistic Transfer: Towards Non-Disruptive Fine-Tuning with Partial Target Data

    Authors: Cheng-Hao Tu, Hong-You Chen, Zheda Mai, Jike Zhong, Vardaan Pahuja, Tanya Berger-Wolf, Song Gao, Charles Stewart, Yu Su, Wei-Lun Chao

    Abstract: We propose a learning problem involving adapting a pre-trained source model to the target domain for classifying all classes that appeared in the source data, using target data that covers only a partial label space. This problem is practical, as it is unrealistic for the target end-users to collect data for all classes prior to adaptation. However, it has received limited attention in the literat…

    Submitted 2 November, 2023; originally announced November 2023.

    Comments: Accepted to NeurIPS 2023 main track

  19. arXiv:2310.14592  [pdf, other]

    cs.CV cs.LG

    Pre-Training LiDAR-Based 3D Object Detectors Through Colorization

    Authors: Tai-Yu Pan, Chenyang Ma, Tianle Chen, Cheng Perng Phoo, Katie Z Luo, Yurong You, Mark Campbell, Kilian Q. Weinberger, Bharath Hariharan, Wei-Lun Chao

    Abstract: Accurate 3D object detection and understanding for self-driving cars heavily relies on LiDAR point clouds, necessitating large amounts of labeled data to train. In this work, we introduce an innovative pre-training approach, Grounded Point Colorization (GPC), to bridge the gap between data and labels by teaching the model to colorize LiDAR point clouds, equipping it with valuable semantic cues. To…

    Submitted 25 February, 2024; v1 submitted 23 October, 2023; originally announced October 2023.

    Comments: Accepted to ICLR 2024

  20. arXiv:2310.13248  [pdf, other]

    cs.LG cs.AI cs.CY cs.SI

    FLEE-GNN: A Federated Learning System for Edge-Enhanced Graph Neural Network in Analyzing Geospatial Resilience of Multicommodity Food Flows

    Authors: Yuxiao Qu, Jinmeng Rao, Song Gao, Qianheng Zhang, Wei-Lun Chao, Yu Su, Michelle Miller, Alfonso Morales, Patrick Huber

    Abstract: Understanding and measuring the resilience of food supply networks is a global imperative to tackle increasing food insecurity. However, the complexity of these networks, with their multidimensional interactions and decisions, presents significant challenges. This paper proposes FLEE-GNN, a novel Federated Learning System for Edge-Enhanced Graph Neural Network, designed to overcome these challenge…

    Submitted 19 October, 2023; originally announced October 2023.

    Comments: 10 pages, 5 figures

    ACM Class: I.2

    Journal ref: ACM SIGSPATIAL GeoAI 2023

  21. arXiv:2309.12140  [pdf, other]

    cs.CV cs.AI cs.LG

    Unsupervised Domain Adaptation for Self-Driving from Past Traversal Features

    Authors: Travis Zhang, Katie Luo, Cheng Perng Phoo, Yurong You, Wei-Lun Chao, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger

    Abstract: The rapid development of 3D object detection systems for self-driving cars has significantly improved accuracy. However, these systems struggle to generalize across diverse driving environments, which can lead to safety-critical failures in detecting traffic participants. To address this, we propose a method that utilizes unlabeled repeated traversals of multiple locations to adapt object detector…

    Submitted 21 September, 2023; originally announced September 2023.

  22. arXiv:2306.03228  [pdf, other]

    cs.LG cs.CV eess.IV

    Discovering Novel Biological Traits From Images Using Phylogeny-Guided Neural Networks

    Authors: Mohannad Elhamod, Mridul Khurana, Harish Babu Manogaran, Josef C. Uyeda, Meghan A. Balk, Wasila Dahdul, Yasin Bakış, Henry L. Bart Jr., Paula M. Mabee, Hilmar Lapp, James P. Balhoff, Caleb Charpentier, David Carlyn, Wei-Lun Chao, Charles V. Stewart, Daniel I. Rubenstein, Tanya Berger-Wolf, Anuj Karpatne

    Abstract: Discovering evolutionary traits that are heritable across species on the tree of life (also referred to as a phylogenetic tree) is of great interest to biologists to understand how organisms diversify and evolve. However, the measurement of traits is often a subjective and labor-intensive process, making trait discovery a highly label-scarce problem. We present a novel approach for discovering evo…

    Submitted 5 June, 2023; originally announced June 2023.

  23. arXiv:2305.20044  [pdf, other]

    cs.RO

    Probabilistic Uncertainty Quantification of Prediction Models with Application to Visual Localization

    Authors: Junan Chen, Josephine Monica, Wei-Lun Chao, Mark Campbell

    Abstract: The uncertainty quantification of prediction models (e.g., neural networks) is crucial for their adoption in many robotics applications. This is arguably as important as making accurate predictions, especially for safety-critical applications such as self-driving cars. This paper proposes our approach to uncertainty quantification in the context of visual localization for autonomous driving, where…

    Submitted 6 April, 2024; v1 submitted 31 May, 2023; originally announced May 2023.

    Comments: Extended version of our ICRA2023 paper

  24. arXiv:2305.17710  [pdf, other]

    cs.CV

    OccCasNet: Occlusion-aware Cascade Cost Volume for Light Field Depth Estimation

    Authors: Wentao Chao, Fuqing Duan, Xuechun Wang, Yingqian Wang, Guanghui Wang

    Abstract: Light field (LF) depth estimation is a crucial task with numerous practical applications. However, mainstream methods based on the multi-view stereo (MVS) are resource-intensive and time-consuming as they need to construct a finer cost volume. To address this issue and achieve a better trade-off between accuracy and efficiency, we propose an occlusion-aware cascade cost volume for LF depth (dispar…

    Submitted 28 May, 2023; originally announced May 2023.

  25. arXiv:2305.16804  [pdf, other]

    cs.CV

    Towards Open-World Segmentation of Parts

    Authors: Tai-Yu Pan, Qing Liu, Wei-Lun Chao, Brian Price

    Abstract: Segmenting object parts such as cup handles and animal bodies is important in many real-world applications but requires more annotation effort. The largest dataset nowadays contains merely two hundred object categories, implying the difficulty to scale up part segmentation to an unconstrained setting. To address this, we propose to explore a seemingly simplified but empirically useful and scalable…

    Submitted 26 May, 2023; originally announced May 2023.

    Comments: Accepted to CVPR 2023

  26. arXiv:2305.05803  [pdf, other]

    cs.CV cs.AI cs.LG

    Segment Anything Model (SAM) Enhanced Pseudo Labels for Weakly Supervised Semantic Segmentation

    Authors: Tianle Chen, Zheda Mai, Ruiwen Li, Wei-lun Chao

    Abstract: Weakly supervised semantic segmentation (WSSS) aims to bypass the need for laborious pixel-level annotation by using only image-level annotation. Most existing methods rely on Class Activation Maps (CAM) to derive pixel-level pseudo-labels and use them to train a fully supervised semantic segmentation model. Although these pseudo-labels are class-aware, indicating the coarse regions for particular…

    Submitted 3 November, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

    Comments: Tianle Chen and Zheda Mai contributed equally to this work. Accepted to NeurIPS 2023 ICBINB Workshop. Our code is available at https://github.com/cskyl/SAM_WSSS

  27. arXiv:2304.07882  [pdf, other]

    cs.CV

    Federated Learning of Shareable Bases for Personalization-Friendly Image Classification

    Authors: Hong-You Chen, Jike Zhong, Mingda Zhang, Xuhui Jia, Hang Qi, Boqing Gong, Wei-Lun Chao, Li Zhang

    Abstract: Personalized federated learning (PFL) aims to harness the collective wisdom of clients' data while building personalized models tailored to individual clients' data distributions. Existing works offer personalization primarily to clients who participate in the FL process, making it hard to encompass new clients who were absent or newly show up. In this paper, we propose FedBasis, a novel PFL frame…

    Submitted 31 October, 2023; v1 submitted 16 April, 2023; originally announced April 2023.

    Comments: Preprint

  28. arXiv:2304.06813  [pdf, other]

    cs.LG cs.AI cs.CV

    Unified Out-Of-Distribution Detection: A Model-Specific Perspective

    Authors: Reza Averly, Wei-Lun Chao

    Abstract: Out-of-distribution (OOD) detection aims to identify test examples that do not belong to the training distribution and are thus unlikely to be predicted reliably. Despite a plethora of existing works, most of them focused only on the scenario where OOD examples come from semantic shift (e.g., unseen categories), ignoring other possible causes (e.g., covariate shift). In this paper, we present a no…

    Submitted 3 November, 2023; v1 submitted 13 April, 2023; originally announced April 2023.

    Comments: Published in International Conference on Computer Vision (ICCV 2023): https://openaccess.thecvf.com/content/ICCV2023/papers/Averly_Unified_Out-Of-Distribution_Detection_A_Model-Specific_Perspective_ICCV_2023_paper.pdf. Extra references added

  29. arXiv:2303.15286  [pdf, other]

    cs.CV cs.LG

    Unsupervised Adaptation from Repeated Traversals for Autonomous Driving

    Authors: Yurong You, Cheng Perng Phoo, Katie Z Luo, Travis Zhang, Wei-Lun Chao, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger

    Abstract: For a self-driving car to operate reliably, its perceptual system must generalize to the end-user's environment -- ideally without additional annotation efforts. One potential solution is to leverage unlabeled data (e.g., unlabeled LiDAR point clouds) collected from the end-users' environments (i.e. target domain) to adapt the system to the difference between training and testing environments. Whi…

    Submitted 27 March, 2023; originally announced March 2023.

    Comments: Accepted by NeurIPS 2022. Code is available at https://github.com/YurongYou/Rote-DA

  30. arXiv:2303.12722  [pdf, other]

    cs.CV cs.LG

    Learning Fractals by Gradient Descent

    Authors: Cheng-Hao Tu, Hong-You Chen, David Carlyn, Wei-Lun Chao

    Abstract: Fractals are geometric shapes that can display complex and self-similar patterns found in nature (e.g., clouds and plants). Recent works in visual recognition have leveraged this property to create random fractal images for model pre-training. In this paper, we study the inverse problem -- given a target image (not necessarily a fractal), we aim to generate a fractal image that looks like it. We p…

    Submitted 14 March, 2023; originally announced March 2023.

    Comments: Accepted by AAAI 2023

  31. arXiv:2303.06530  [pdf, other]

    cs.LG cs.AI

    Making Batch Normalization Great in Federated Deep Learning

    Authors: Jike Zhong, Hong-You Chen, Wei-Lun Chao

    Abstract: Batch Normalization (BN) is widely used in centralized deep learning to improve convergence and generalization. However, in federated learning (FL) with decentralized data, prior work has observed that training with BN could hinder performance and suggested replacing it with Group Normalization (GN). In this paper, we revisit this substitution by expanding the empirical study conducted in prio…

    Submitted 28 March, 2024; v1 submitted 11 March, 2023; originally announced March 2023.

    Comments: An extended version of the workshop paper in NeurIPS 2023 (https://federated-learning.org/fl@fm-neurips-2023/)

  32. arXiv:2303.03041  [pdf]

    cs.CV cs.AI

    Automatic detection of aerial survey ground control points based on Yolov5-OBB

    Authors: Cheng Chuanxiang, Yang Jia, Wang Chao, Zheng Zhi, Li Xiaopeng, Dong Di, Chang Mengxia, Zhuang Zhiheng

    Abstract: The use of ground control points (GCPs) for georeferencing is the most common strategy in unmanned aerial vehicle (UAV) photogrammetry, but at the same time their collection represents the most time-consuming and expensive part of UAV campaigns. Recently, deep learning has been rapidly developed in the field of small object detection. In this letter, to automatically extract coordinates informatio…

    Submitted 6 March, 2023; originally announced March 2023.

    Comments: 6 pages, 4 figures

  33. arXiv:2212.12454  [pdf]

    cs.CL

    Generalizable Natural Language Processing Framework for Migraine Reporting from Social Media

    Authors: Yuting Guo, Swati Rajwal, Sahithi Lakamana, Chia-Chun Chiang, Paul C. Menell, Adnan H. Shahid, Yi-Chieh Chen, Nikita Chhabra, Wan-Ju Chao, Chieh-Ju Chao, Todd J. Schwedt, Imon Banerjee, Abeed Sarker

    Abstract: Migraine is a high-prevalence and disabling neurological disorder. However, information migraine management in real-world settings could be limited to traditional health information sources. In this paper, we (i) verify that there is substantial migraine-related chatter available on social media (Twitter and Reddit), self-reported by migraine sufferers; (ii) develop a platform-independent text cla…

    Submitted 23 December, 2022; originally announced December 2022.

    Comments: Accepted by AMIA 2023 Informatics Summit

  34. arXiv:2212.04088  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG cs.RO

    LLM-Planner: Few-Shot Grounded Planning for Embodied Agents with Large Language Models

    Authors: Chan Hee Song, Jiaman Wu, Clayton Washington, Brian M. Sadler, Wei-Lun Chao, Yu Su

    Abstract: This study focuses on using large language models (LLMs) as a planner for embodied agents that can follow natural language instructions to complete complex tasks in a visually-perceived environment. The high data cost and poor sample efficiency of existing methods hinders the development of versatile agents that are capable of many tasks and can learn new tasks quickly. In this work, we propose a…

    Submitted 30 March, 2023; v1 submitted 8 December, 2022; originally announced December 2022.

    Comments: 14 pages, 5 figures

    Report number: ICCV 2023

  35. arXiv:2212.03220  [pdf, other]

    cs.LG cs.AI cs.CV

    Visual Query Tuning: Towards Effective Usage of Intermediate Representations for Parameter and Memory Efficient Transfer Learning

    Authors: Cheng-Hao Tu, Zheda Mai, Wei-Lun Chao

    Abstract: Intermediate features of a pre-trained model have been shown informative for making accurate predictions on downstream tasks, even if the model backbone is kept frozen. The key challenge is how to utilize these intermediate features given their gigantic amount. We propose visual query tuning (VQT), a simple yet effective approach to aggregate intermediate features of Vision Transformers. Through i…

    Submitted 26 April, 2023; v1 submitted 6 December, 2022; originally announced December 2022.

    Comments: Accepted by CVPR 2023. Cheng-Hao Tu and Zheda Mai contributed equally to this work

  36. arXiv:2209.11673  [pdf, other]

    cs.CV cs.RO

    Image-to-Image Translation for Autonomous Driving from Coarsely-Aligned Image Pairs

    Authors: Youya Xia, Josephine Monica, Wei-Lun Chao, Bharath Hariharan, Kilian Q Weinberger, Mark Campbell

    Abstract: A self-driving car must be able to reliably handle adverse weather conditions (e.g., snowy) to operate safely. In this paper, we investigate the idea of turning sensor inputs (i.e., images) captured in an adverse condition into a benign one (i.e., sunny), upon which the downstream tasks (e.g., semantic segmentation) can attain high accuracy. Prior work primarily formulates this as an unpaired imag…

    Submitted 23 September, 2022; originally announced September 2022.

    Comments: Submitted to the International Conference on Robotics and Automation (ICRA) 2023

  37. arXiv:2209.05534  [pdf, other]

    cs.CV cs.CL

    PreSTU: Pre-Training for Scene-Text Understanding

    Authors: Jihyung Kil, Soravit Changpinyo, Xi Chen, Hexiang Hu, Sebastian Goodman, Wei-Lun Chao, Radu Soricut

    Abstract: The ability to recognize and reason about text embedded in visual inputs is often lacking in vision-and-language (V&L) models, perhaps because V&L pre-training methods have often failed to include such an ability in their training objective. In this paper, we propose PreSTU, a novel pre-training recipe dedicated to scene-text understanding (STU). PreSTU introduces OCR-aware pre-training objectives…

    Submitted 19 August, 2023; v1 submitted 12 September, 2022; originally announced September 2022.

    Comments: Accepted to ICCV 2023

  38. arXiv:2208.09688  [pdf, other]

    cs.CV eess.IV

    Learning Sub-Pixel Disparity Distribution for Light Field Depth Estimation

    Authors: Wentao Chao, Xuechun Wang, Yingqian Wang, Guanghui Wang, Fuqing Duan

    Abstract: Light field (LF) depth estimation plays a crucial role in many LF-based applications. Existing LF depth estimation methods consider depth estimation as a regression problem, where a pixel-wise L1 loss is employed to supervise the training process. However, the disparity map is only a sub-space projection (i.e., an expectation) of the disparity distribution, which is essential for models to learn.…

    Submitted 21 November, 2023; v1 submitted 20 August, 2022; originally announced August 2022.

    Comments: Accepted by IEEE Transactions on Computational Imaging

  39. arXiv:2208.01166  [pdf, other]

    cs.CV

    Ithaca365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions

    Authors: Carlos A. Diaz-Ruiz, Youya Xia, Yurong You, Jose Nino, Junan Chen, Josephine Monica, Xiangyu Chen, Katie Luo, Yan Wang, Marc Emond, Wei-Lun Chao, Bharath Hariharan, Kilian Q. Weinberger, Mark Campbell

    Abstract: Advances in perception for self-driving cars have accelerated in recent years due to the availability of large-scale datasets, typically collected at specific locations and under nice weather conditions. Yet, to achieve the high safety requirement, these perceptual systems must operate robustly under a wide variety of weather conditions including snow and rain. In this paper, we present a new data…

    Submitted 1 August, 2022; originally announced August 2022.

    Comments: Accepted by CVPR 2022

  40. arXiv:2207.04587  [pdf, other]

    cs.CV cs.AI cs.LG

    Gradual Domain Adaptation without Indexed Intermediate Domains

    Authors: Hong-You Chen, Wei-Lun Chao

    Abstract: The effectiveness of unsupervised domain adaptation degrades when there is a large discrepancy between the source and target domains. Gradual domain adaptation (GDA) is one promising way to mitigate such an issue, by leveraging additional unlabeled data that gradually shift from the source to the target. Through sequentially adapting the model along the "indexed" intermediate domains, GDA substant…

    Submitted 10 July, 2022; originally announced July 2022.

    Comments: Accepted to NeurIPS 2021

  41. arXiv:2206.11488  [pdf, other]

    cs.LG cs.AI cs.CV

    On the Importance and Applicability of Pre-Training for Federated Learning

    Authors: Hong-You Chen, Cheng-Hao Tu, Ziwei Li, Han-Wei Shen, Wei-Lun Chao

    Abstract: Pre-training is prevalent in nowadays deep learning to improve the learned model's performance. However, in the literature on federated learning (FL), neural networks are mostly initialized with random weights. These attract our interest in conducting a systematic study to explore pre-training for FL. Across multiple visual recognition benchmarks, we found that pre-training can not only improve FL…

    Submitted 22 March, 2023; v1 submitted 23 June, 2022; originally announced June 2022.

    Comments: Accepted to ICLR 2023

  42. arXiv:2203.15882  [pdf, other]

    cs.CV

    Learning to Detect Mobile Objects from LiDAR Scans Without Labels

    Authors: Yurong You, Katie Z Luo, Cheng Perng Phoo, Wei-Lun Chao, Wen Sun, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger

    Abstract: Current 3D object detectors for autonomous driving are almost entirely trained on human-annotated data. Although of high quality, the generation of such data is laborious and costly, restricting them to a few specific locations and object types. This paper proposes an alternative approach entirely based on unlabeled data, which can be collected cheaply and in abundance almost everywhere on earth.…

    Submitted 29 March, 2022; originally announced March 2022.

    Comments: Accepted by CVPR 2022. Code is available at https://github.com/YurongYou/MODEST

  43. arXiv:2203.11405  [pdf, other]

    cs.CV

    Hindsight is 20/20: Leveraging Past Traversals to Aid 3D Perception

    Authors: Yurong You, Katie Z Luo, Xiangyu Chen, Junan Chen, Wei-Lun Chao, Wen Sun, Bharath Hariharan, Mark Campbell, Kilian Q. Weinberger

    Abstract: Self-driving cars must detect vehicles, pedestrians, and other traffic participants accurately to operate safely. Small, far-away, or highly occluded objects are particularly challenging because there is limited information in the LiDAR point clouds for detecting them. To address this challenge, we leverage valuable information from the past: in particular, data collected in past traversals of the…

    Submitted 21 March, 2022; originally announced March 2022.

    Comments: Accepted by ICLR 2022. Code is available at https://github.com/YurongYou/Hindsight

  44. arXiv:2202.11124  [pdf, other]

    cs.CV

    Learning with Free Object Segments for Long-Tailed Instance Segmentation

    Authors: Cheng Zhang, Tai-Yu Pan, Tianle Chen, Jike Zhong, Wenjin Fu, Wei-Lun Chao

    Abstract: One fundamental challenge in building an instance segmentation model for a large number of classes in complex scenes is the lack of training examples, especially for rare objects. In this paper, we explore the possibility to increase the training examples without laborious data collection and annotation. We find that an abundance of instance segments can potentially be obtained freely from object-…

    Submitted 4 October, 2022; v1 submitted 22 February, 2022; originally announced February 2022.

    Comments: Accepted to ECCV 2022

  45. arXiv:2202.07028  [pdf, other]

    cs.AI cs.CL cs.CV cs.LG cs.RO

    One Step at a Time: Long-Horizon Vision-and-Language Navigation with Milestones

    Authors: Chan Hee Song, Jihyung Kil, Tai-Yu Pan, Brian M. Sadler, Wei-Lun Chao, Yu Su

    Abstract: We study the problem of developing autonomous agents that can follow human instructions to infer and perform a sequence of actions to complete the underlying task. Significant progress has been made in recent years, especially for tasks with short horizons. However, when it comes to long-horizon tasks with extended sequences of actions, an agent can easily ignore some instructions or get stuck in…

    Submitted 10 June, 2022; v1 submitted 14 February, 2022; originally announced February 2022.

    Comments: 10 pages, 5 figures. Accepted to CVPR 2022

    Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 15482-15491

  46. arXiv:2111.04910  [pdf]

    cs.SE

    Structure-Behavior Coalescence Process Algebra -- Toward a Unified View of the System in Model-Based Systems Engineering

    Authors: William S. Chao

    Abstract: In Model-Based Systems Engineering (MBSE), the Systems Modeling Language (SysML) specification includes a metamodel that defines the language concepts and a user model that defines how the language concepts are represented. In SysML, an important use of metamodel is to provide an integrated semantic framework that every diagram in the user model can be projected as a view of the metamodel. However…

    Submitted 8 November, 2021; originally announced November 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2110.15526

  47. arXiv:2110.15526  [pdf]

    cs.SE

    The Structure-Behavior Coalescence Method -- Toward a Unified View of the Software System in Model-Driven Engineering

    Authors: William S. Chao

    Abstract: In Model-Driven Engineering (MDE), the Unified Modeling Language (UML) 2.0 specification includes a metamodel that defines the language concepts and a user model that defines how the language concepts are represented. In UML 2.0, an important use of metamodel is to provide an integrated semantic framework that every diagram in the user model can be projected as a view of the metamodel. However, mo…

    Submitted 27 June, 2022; v1 submitted 28 October, 2021; originally announced October 2021.

  48. arXiv:2110.08998  [pdf]

    cs.SE

    Using Structure-Behavior Coalescence Method for Systems Definition 2.0

    Authors: William S. Chao

    Abstract: Systems definition is an artifact created by humans to describe what a system is. A system has been defined, by systems definition 1.0, hopefully to be an integrated whole, embodied in its components, their interrelationships with each other and the environment, and the principles and guidelines governing its design and evolution. This systems definition 1.0 defining the system possesses one cardi…

    Submitted 30 June, 2022; v1 submitted 17 October, 2021; originally announced October 2021.

  49. Two-Stage Mesh Deep Learning for Automated Tooth Segmentation and Landmark Localization on 3D Intraoral Scans

    Authors: Tai-Hsien Wu, Chunfeng Lian, Sanghee Lee, Matthew Pastewait, Christian Piers, Jie Liu, Fang Wang, Li Wang, Chiung-Ying Chiu, Wenchi Wang, Christina Jackson, Wei-Lun Chao, Dinggang Shen, Ching-Chang Ko

    Abstract: Accurately segmenting teeth and identifying the corresponding anatomical landmarks on dental mesh models are essential in computer-aided orthodontic treatment. Manually performing these two tasks is time-consuming, tedious, and, more importantly, highly dependent on orthodontists' experiences due to the abnormality and large-scale variance of patients' teeth. Some machine learning-based methods ha…

    Submitted 2 June, 2022; v1 submitted 24 September, 2021; originally announced September 2021.

    Comments: 9 pages, 8 figures, accepted by IEEE TMI

  50. arXiv:2109.09840  [pdf, other]

    cs.RO

    Sequential Joint Shape and Pose Estimation of Vehicles with Application to Automatic Amodal Segmentation Labeling

    Authors: Josephine Monica, Wei-Lun Chao, Mark Campbell

    Abstract: Shape and pose estimation is a critical perception problem for a self-driving car to fully understand its surrounding environment. One fundamental challenge in solving this problem is the incomplete sensor signal (e.g., LiDAR scans), especially for faraway or occluded objects. In this paper, we propose a novel algorithm to address this challenge, which explicitly leverages the sensor signal captur…

    Submitted 1 July, 2022; v1 submitted 20 September, 2021; originally announced September 2021.

    Comments: Accepted to International Conference on Robotics and Automation (ICRA) 2022