
The BiGGen Bench: A Principled Benchmark for Fine-grained Evaluation of Language Models with Language Models

Authors: Seungone Kim, Juyoung Suk, Ji Yong Cho, Shayne Longpre, Chaeeun Kim, Dongkeun Yoon, Guijin Son, Yejin Cho, Sheikh Shafayat, Jinheon Baek, Sue Hyun Park, Hyeonbin Hwang, Jinkyung Jo, Hyowon Cho, Haebin Shin, Seongyun Lee, Hanseok Oh, Noah Lee, Namgyu Ho, Se June Joo, Miyoung Ko, Yoonjoo Lee, Hyungjoo Chae, Jamin Shin, Joel Jang, et al. (7 additional authors not shown)

Abstract: As language models (LMs) become capable of handling a wide range of tasks, their evaluation is becoming as challenging as their development. Most generation benchmarks currently assess LMs using abstract evaluation criteria like helpfulness and harmlessness, which often lack the flexibility and granularity of human assessment. Additionally, these benchmarks tend to focus disproportionately on specific capabilities such as instruction following, leading to coverage bias. To overcome these limitations, we introduce the BiGGen Bench, a principled generation benchmark designed to thoroughly evaluate nine distinct capabilities of LMs across 77 diverse tasks. A key feature of the BiGGen Bench is its use of instance-specific evaluation criteria, closely mirroring the nuanced discernment of human evaluation. We apply this benchmark to assess 103 frontier LMs using five evaluator LMs. Our code, data, and evaluation results are all publicly available at https://github.com/prometheus-eval/prometheus-eval/tree/main/BiGGen-Bench.

Submitted 9 June, 2024; originally announced June 2024.

Comments: Work in Progress

arXiv:2404.14760 [pdf, other]

Retrieval Augmented Generation for Domain-specific Question Answering

Authors: Sanat Sharma, David Seunghyun Yoon, Franck Dernoncourt, Dewang Sultania, Karishma Bagga, Mengjiao Zhang, Trung Bui, Varun Kotte

Abstract: Question answering (QA) has become an important application in the advanced development of large language models. General pre-trained large language models for question-answering are not trained to properly understand the knowledge or terminology for a specific domain, such as finance, healthcare, education, and customer service for a product. To better cater to domain-specific understanding, we build an in-house question-answering system for Adobe products. We propose a novel framework to compile a large question-answer database and develop the approach for retrieval-aware fine-tuning of a large language model. We showcase that fine-tuning the retriever leads to major improvements in the final generation. Our overall approach reduces hallucinations during generation while keeping in context the latest retrieval information for contextual grounding.

Submitted 29 May, 2024; v1 submitted 23 April, 2024; originally announced April 2024.

Comments: AAAI 2024 (Association for the Advancement of Artificial Intelligence) Scientific Document Understanding Workshop

arXiv:2404.01537 [pdf, other]

Are Doppler Velocity Measurements Useful for Spinning Radar Odometry?

Authors: Daniil Lisus, Keenan Burnett, David J. Yoon, Richard Poulton, John Marshall, Timothy D. Barfoot

Abstract: Spinning, frequency-modulated continuous-wave (FMCW) radars with 360 degree coverage have been gaining popularity for autonomous-vehicle navigation. However, unlike 'fixed' automotive radar, commercially available spinning radar systems typically do not produce radial velocities due to the lack of repeated measurements in the same direction and the fundamental hardware setup. To make these radial velocities observable, we modified the firmware of a commercial spinning radar to use triangular frequency modulation. In this paper, we develop a novel way to use this modulation to extract radial Doppler velocity measurements from single raw radar intensity scans without any required data association. We show that these noisy, error-prone measurements contain enough information to provide good ego-velocity estimates, and incorporate these estimates into different modern odometry pipelines. We extensively evaluate the pipelines on over 110 km of driving data in progressively more geometrically challenging autonomous-driving environments. We show that Doppler velocity measurements improve odometry in well-defined geometric conditions and enable it to continue functioning even in severely geometrically degenerate environments, such as long tunnels.

Submitted 12 July, 2024; v1 submitted 1 April, 2024; originally announced April 2024.

Comments: 8 pages, 7 figures, 2 tables, submitted to Robotics and Automation Letters (RA-L)

arXiv:2402.13781 [pdf, other]

Preserving Near-Optimal Gradient Sparsification Cost for Scalable Distributed Deep Learning

Authors: Daegun Yoon, Sangyoon Oh

Abstract: Communication overhead is a major obstacle to scaling distributed training systems. Gradient sparsification is a potential optimization approach to reduce the communication volume without significant loss of model fidelity. However, existing gradient sparsification methods have low scalability owing to inefficient design of their algorithms, which raises the communication overhead significantly. In particular, gradient build-up and inadequate sparsity control methods degrade the sparsification performance considerably. Moreover, communication traffic increases drastically owing to workload imbalance of gradient selection between workers. To address these challenges, we propose a novel gradient sparsification scheme called ExDyna. In ExDyna, the gradient tensor of the model comprises fine-grained blocks, and contiguous blocks are grouped into non-overlapping partitions. Each worker selects gradients in its exclusively allocated partition so that gradient build-up never occurs. To balance the workload of gradient selection between workers, ExDyna adjusts the topology of partitions by comparing the workloads of adjacent partitions. In addition, ExDyna supports online threshold scaling, which estimates the accurate threshold of gradient selection on-the-fly. Accordingly, ExDyna can satisfy the user-required sparsity level during a training period regardless of models and datasets. Therefore, ExDyna can enhance the scalability of distributed training systems by preserving near-optimal gradient sparsification cost. In experiments, ExDyna outperformed state-of-the-art sparsifiers in terms of training speed and sparsification performance while achieving high accuracy.

Submitted 21 February, 2024; originally announced February 2024.

Comments: 24th IEEE/ACM International Symposium on Cluster, Cloud, and Internet Computing (CCGrid 2024). Code: https://github.com/kljp/exdyna

arXiv:2401.10695 [pdf, other]

LangBridge: Multilingual Reasoning Without Multilingual Supervision

Authors: Dongkeun Yoon, Joel Jang, Sungdong Kim, Seungone Kim, Sheikh Shafayat, Minjoon Seo

Abstract: We introduce LangBridge, a zero-shot approach to adapt language models for multilingual reasoning tasks without multilingual supervision. LangBridge operates by bridging two models, each specialized in different aspects: (1) one specialized in understanding multiple languages (e.g., mT5 encoder) and (2) one specialized in reasoning (e.g., MetaMath). LangBridge connects the two models by introducing minimal trainable parameters between them. Despite utilizing only English data for training, LangBridge considerably enhances the performance of language models on low-resource languages across mathematical reasoning, code completion, logical reasoning, and commonsense reasoning. Our analysis suggests that the efficacy of LangBridge stems from the language-agnostic characteristics of multilingual representations. We publicly release our code and models.

Submitted 3 June, 2024; v1 submitted 19 January, 2024; originally announced January 2024.

Comments: ACL 2024 Main

arXiv:2312.02819 [pdf, other]

Deterministic Guidance Diffusion Model for Probabilistic Weather Forecasting

Authors: Donggeun Yoon, Minseok Seo, Doyi Kim, Yeji Choi, Donghyeon Cho

Abstract: Weather forecasting requires not only accuracy but also the ability to perform probabilistic prediction. However, deterministic weather forecasting methods do not support probabilistic predictions, and conversely, probabilistic models tend to be less accurate. To address these challenges, in this paper, we introduce the Deterministic Guidance Diffusion Model (DGDM) for probabilistic weather forecasting, integrating benefits of both deterministic and probabilistic approaches. During the forward process, both the deterministic and probabilistic models are trained end-to-end. In the reverse process, weather forecasting leverages the predicted result from the deterministic model, using it as an intermediate starting point for the probabilistic model. By fusing deterministic models with probabilistic models in this manner, DGDM is capable of providing accurate forecasts while also offering probabilistic predictions. To evaluate DGDM, we assess it on the global weather forecasting dataset (WeatherBench) and the common video frame prediction benchmark (Moving MNIST). We also introduce and evaluate the Pacific Northwest Windstorm (PNW)-Typhoon weather satellite dataset to verify the effectiveness of DGDM in high-resolution regional forecasting. As a result of our experiments, DGDM achieves state-of-the-art results not only in global forecasting but also in regional forecasting. The code is available at: https://github.com/DongGeun-Yoon/DGDM.

Submitted 5 December, 2023; originally announced December 2023.

Comments: 16 pages

arXiv:2310.00967 [pdf, other]

MiCRO: Near-Zero Cost Gradient Sparsification for Scaling and Accelerating Distributed DNN Training

Authors: Daegun Yoon, Sangyoon Oh

Abstract: Gradient sparsification is a communication optimisation technique for scaling and accelerating distributed deep neural network (DNN) training. It reduces the increasing communication traffic for gradient aggregation. However, existing sparsifiers have poor scalability because of the high computational cost of gradient selection and/or increase in communication traffic. In particular, an increase in communication traffic is caused by gradient build-up and an inappropriate threshold for gradient selection. To address these challenges, we propose a novel gradient sparsification method called MiCRO. In MiCRO, the gradient vector is partitioned, and each partition is assigned to the corresponding worker. Each worker then selects gradients from its partition, and the aggregated gradients are free from gradient build-up. Moreover, MiCRO estimates the accurate threshold to maintain the communication traffic as per user requirement by minimising the compression ratio error. MiCRO enables near-zero cost gradient sparsification by solving existing problems that hinder the scalability and acceleration of distributed DNN training. In our extensive experiments, MiCRO outperformed state-of-the-art sparsifiers with an outstanding convergence rate.

Submitted 20 February, 2024; v1 submitted 2 October, 2023; originally announced October 2023.

Comments: 30th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC 2023). Code: https://github.com/kljp/micro
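
The partition-exclusive selection described in the MiCRO abstract above (and, in similar terms, in the ExDyna and DEFT abstracts) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed threshold, the equal-size contiguous partitions, and the summing of all workers' values at the selected indices are assumptions made for illustration. The sketch only shows why disjoint partitions keep the union of selected indices from growing with the number of workers.

```python
from typing import Dict, List


def partition_bounds(n: int, workers: int) -> List[range]:
    """Split indices 0..n-1 into contiguous, non-overlapping partitions."""
    base, extra = divmod(n, workers)
    bounds, start = [], 0
    for w in range(workers):
        size = base + (1 if w < extra else 0)
        bounds.append(range(start, start + size))
        start += size
    return bounds


def exclusive_selection(worker_grads: List[List[float]], threshold: float) -> Dict[int, float]:
    """Each worker picks indices only inside its own partition (assumed scheme)."""
    n = len(worker_grads[0])
    parts = partition_bounds(n, len(worker_grads))
    indices: List[int] = []
    for grad, part in zip(worker_grads, parts):
        # Selection uses the worker's local gradient, restricted to its partition.
        indices.extend(i for i in part if abs(grad[i]) >= threshold)
    # Assumed aggregation step: sum every worker's value at the selected indices.
    return {i: sum(g[i] for g in worker_grads) for i in indices}


def naive_selection(worker_grads: List[List[float]], threshold: float) -> Dict[int, float]:
    """Every worker selects over the whole vector; the union of indices can grow."""
    indices = set()
    for grad in worker_grads:
        indices.update(i for i, g in enumerate(grad) if abs(g) >= threshold)
    return {i: sum(g[i] for g in worker_grads) for i in sorted(indices)}


if __name__ == "__main__":
    w0 = [1.0, 0.25, 0.125, 0.75]
    w1 = [0.25, 0.75, 1.0, 0.125]
    print(exclusive_selection([w0, w1], 0.5))  # two indices are communicated
    print(naive_selection([w0, w1], 0.5))      # four indices are communicated
```

With the same threshold, the naive variant communicates the union of every worker's picks, whereas the exclusive variant communicates at most one partition's worth of picks per worker.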

arXiv:2309.08872 [pdf, other]

PDFTriage: Question Answering over Long, Structured Documents

Authors: Jon Saad-Falcon, Joe Barrow, Alexa Siu, Ani Nenkova, David Seunghyun Yoon, Ryan A. Rossi, Franck Dernoncourt

Abstract: Large Language Models (LLMs) have issues with document question answering (QA) in situations where the document is unable to fit in the small context length of an LLM. To overcome this issue, most existing works focus on retrieving the relevant context from the document, representing them as plain text. However, documents such as PDFs, web pages, and presentations are naturally structured with different pages, tables, sections, and so on. Representing such structured documents as plain text is incongruous with the user's mental model of these documents with rich structure. When a system has to query the document for context, this incongruity is brought to the fore, and seemingly trivial questions can trip up the QA system. To bridge this fundamental gap in handling structured documents, we propose an approach called PDFTriage that enables models to retrieve the context based on either structure or content. Our experiments demonstrate the effectiveness of the proposed PDFTriage-augmented models across several classes of questions where existing retrieval-augmented LLMs fail. To facilitate further research on this fundamental problem, we release our benchmark dataset consisting of 900+ human-generated questions over 80 structured documents from 10 different categories of question types for document QA. Our code and datasets will be released soon on GitHub.

Submitted 8 November, 2023; v1 submitted 16 September, 2023; originally announced September 2023.

arXiv:2307.03500 [pdf, other]

doi 10.1145/3605573.3605609

DEFT: Exploiting Gradient Norm Difference between Model Layers for Scalable Gradient Sparsification

Authors: Daegun Yoon, Sangyoon Oh

Abstract: Gradient sparsification is a widely adopted solution for reducing the excessive communication traffic in distributed deep learning. However, most existing gradient sparsifiers have relatively poor scalability because of the considerable computational cost of gradient selection and/or increased communication traffic owing to gradient build-up. To address these challenges, we propose a novel gradient sparsification scheme, DEFT, that partitions the gradient selection task into subtasks and distributes them to workers. DEFT differs from existing sparsifiers, wherein every worker selects gradients among all gradients. Consequently, the computational cost can be reduced as the number of workers increases. Moreover, gradient build-up can be eliminated because DEFT allows workers to select gradients in partitions that are non-intersecting (between workers). Therefore, even if the number of workers increases, the communication traffic can be maintained as per user requirement. To avoid the loss of significance of gradient selection, DEFT selects more gradients in the layers that have a larger gradient norm than the other layers. Because every layer has a different computational load, DEFT allocates layers to workers using a bin-packing algorithm to maintain a balanced load of gradient selection between workers. In our empirical evaluation, DEFT shows a significant improvement in training performance in terms of speed in gradient selection over existing sparsifiers while achieving high convergence performance.

Submitted 13 July, 2023; v1 submitted 7 July, 2023; originally announced July 2023.

Comments: International Conference on Parallel Processing (ICPP) 2023. Code: https://github.com/kljp/deft

arXiv:2306.07052 [pdf, other]

Gradient Ascent Post-training Enhances Language Model Generalization

Authors: Dongkeun Yoon, Joel Jang, Sungdong Kim, Minjoon Seo

Abstract: In this work, we empirically show that updating pretrained LMs (350M, 1.3B, 2.7B) with just a few steps of Gradient Ascent Post-training (GAP) on random, unlabeled text corpora enhances their zero-shot generalization capabilities across diverse NLP tasks. Specifically, we show that GAP can allow LMs to become comparable to 2-3x larger LMs across 12 different NLP tasks. We also show that applying GAP on out-of-distribution corpora leads to the most reliable performance improvements. Our findings indicate that GAP can be a promising method for improving the generalization capability of LMs without any task-specific fine-tuning.

Submitted 12 June, 2023; originally announced June 2023.

Comments: ACL 2023 Main Conference (Short Paper)

arXiv:2305.09248 [pdf, other]

Maximum-Width Rainbow-Bisecting Empty Annulus

Authors: Sang Won Bae, Sandip Banerjee, Arpita Baral, Priya Ranjan Sinha Mahapatra, Sang Duk Yoon

Abstract: Given a set of $n$ colored points with $k$ colors in the plane, we study the problem of computing a maximum-width rainbow-bisecting empty annulus, specifically for axis-parallel square, axis-parallel rectangular, and circular annuli. We call a region rainbow if it contains at least one point of each color. The maximum-width rainbow-bisecting empty annulus problem asks to find an annulus $A$ of a particular shape with maximum possible width such that $A$ does not contain any input points and it bisects the input point set into two parts, each of which is a rainbow. We compute a maximum-width rainbow-bisecting empty axis-parallel square, axis-parallel rectangular and circular annulus in $O(n^3)$ time using $O(n)$ space, in $O(k^2n^2\log n)$ time using $O(n\log n)$ space and in $O(n^3)$ time using $O(n^2)$ space respectively.

Submitted 26 March, 2024; v1 submitted 16 May, 2023; originally announced May 2023.

Comments: A preliminary version is accepted in EuroCG 2021 and the expanded version is accepted in the journal Computational Geometry: Theory and Applications

arXiv:2304.13215 [pdf, other]

PROBE3.0: A Systematic Framework for Design-Technology Pathfinding with Improved Design Enablement

Authors: Suhyeong Choi, Jinwook Jung, Andrew B. Kahng, Minsoo Kim, Chul-Hong Park, Bodhisatta Pramanik, Dooseok Yoon

Abstract: We propose a systematic framework to conduct design-technology pathfinding for PPAC in advanced nodes. Our goal is to provide configurable, scalable generation of process design kit (PDK) and standard-cell library, spanning key scaling boosters (backside PDN and buried power rail), to explore PPAC across given technology and design parameters. We build on PROBE2.0, which addressed only area and cost (AC), to include power and performance (PP) evaluations through automated generation of full design enablements. We also improve the use of artificial designs in the PPAC assessment of technology and design configurations. We generate more realistic artificial designs by applying a machine learning-based parameter tuning flow. We further employ clustering-based cell width-regularized placements at the core of routability assessment, enabling more realistic placement utilization and improved experimental efficiency. We demonstrate PPAC evaluation across scaling boosters and artificial designs in a predictive technology node.

Submitted 25 April, 2023; originally announced April 2023.

Comments: 14 pages, 17 figures, submitted to IEEE Trans. on CAD

arXiv:2304.10805 [pdf, other]

RPLKG: Robust Prompt Learning with Knowledge Graph

Authors: Yewon Kim, YongTaek Lim, Dokyung Yoon, KyungWoo Song

Abstract: Large-scale pre-trained models are known to be transferable and to generalize well to unseen datasets. Recently, multimodal pre-trained models such as CLIP show significant performance improvement in diverse experiments. However, when the labeled dataset is limited, generalization to a new dataset or domain is still challenging. To improve the generalization performance on few-shot learning, there have been diverse efforts, such as prompt learning and adapters. However, the current few-shot adaptation methods are not interpretable, and they require a high computation cost for adaptation. In this study, we propose a new method, robust prompt learning with knowledge graph (RPLKG). Based on the knowledge graph, we automatically design diverse interpretable and meaningful prompt sets. Our model obtains cached embeddings of prompt sets after one forward pass through a large pre-trained model. After that, the model optimizes the prompt selection process with Gumbel-Softmax. In this way, our model is trained using relatively little memory and learning time. Also, RPLKG selects the optimal interpretable prompt automatically, depending on the dataset. In summary, RPLKG i) is interpretable, ii) requires small computation resources, and iii) makes it easy to incorporate prior human knowledge. To validate RPLKG, we provide comprehensive experimental results on few-shot learning, domain generalization, and new class generalization settings. RPLKG shows a significant performance improvement compared to zero-shot learning and competitive performance against several prompt learning methods using much lower resources.

Submitted 21 April, 2023; originally announced April 2023.
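
The RPLKG abstract above mentions optimizing prompt selection with Gumbel-Softmax. As a rough illustration of that relaxation only (not the paper's model), the sketch below draws a soft one-hot vector over a handful of candidate prompts from per-prompt scores. The function name `gumbel_softmax`, the example scores, and the temperature values are hypothetical.

```python
import math
import random
from typing import List


def gumbel_softmax(logits: List[float], temperature: float, rng: random.Random) -> List[float]:
    """Return a relaxed one-hot sample over len(logits) choices."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    eps = 1e-12
    noisy = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logs are defined.
        u = min(max(rng.random(), eps), 1.0 - eps)
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)
    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    rng = random.Random(0)
    prompt_scores = [0.5, 1.5, -0.2]  # hypothetical scores for three candidate prompts
    weights = gumbel_softmax(prompt_scores, temperature=0.7, rng=rng)
    print(weights, sum(weights))
```

Lower temperatures push the weights toward a hard one-hot choice, while higher temperatures keep them closer to uniform; in a trainable setting the scores would be learned parameters.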

arXiv:2304.03456 [pdf, other]

Rethinking Evaluation Protocols of Visual Representations Learned via Self-supervised Learning

Authors: Jae-Hun Lee, Doyoung Yoon, ByeongMoon Ji, Kyungyul Kim, Sangheum Hwang

Abstract: Linear probing (LP) (and $k$-NN) on the upstream dataset with labels (e.g., ImageNet) and transfer learning (TL) to various downstream datasets are commonly employed to evaluate the quality of visual representations learned via self-supervised learning (SSL). Although existing SSL methods have shown good performances under those evaluation protocols, we observe that the performances are very sensitive to the hyperparameters involved in LP and TL. We argue that this is an undesirable behavior since truly generic representations should be easily adapted to any other visual recognition task, i.e., the learned representations should be robust to the settings of LP and TL hyperparameters. In this work, we try to figure out the cause of performance sensitivity by conducting extensive experiments with state-of-the-art SSL methods. First, we find that input normalization for LP is crucial to eliminate performance variations according to the hyperparameters. Specifically, batch normalization before feeding inputs to a linear classifier considerably improves the stability of evaluation, and also resolves inconsistency of $k$-NN and LP metrics. Second, for TL, we demonstrate that a weight decay parameter in SSL significantly affects the transferability of learned representations, which cannot be identified by LP or $k$-NN evaluations on the upstream dataset. We believe that the findings of this study will be beneficial for the community by drawing attention to the shortcomings in the current SSL evaluation schemes and underscoring the need to reconsider them.

Submitted 6 April, 2023; originally announced April 2023.

arXiv:2303.06511 [pdf, other]

Need for Speed: Fast Correspondence-Free Lidar-Inertial Odometry Using Doppler Velocity

Authors: David J. Yoon, Keenan Burnett, Johann Laconte, Yi Chen, Heethesh Vhavle, Soeren Kammel, James Reuther, Timothy D. Barfoot

Abstract: In this paper, we present a fast, lightweight odometry method that uses the Doppler velocity measurements from a Frequency-Modulated Continuous-Wave (FMCW) lidar without data association. FMCW lidar is a recently emerging technology that enables per-return relative radial velocity measurements via the Doppler effect. Since the Doppler measurement model is linear with respect to the 6-degrees-of-freedom (DOF) vehicle velocity, we can formulate a linear continuous-time estimation problem for the velocity and numerically integrate for the 6-DOF pose estimate afterward. The caveat is that angular velocity is not observable with a single FMCW lidar. We address this limitation by also incorporating the angular velocity measurements from a gyroscope. This results in an extremely efficient odometry method that processes lidar frames at an average wall-clock time of 5.64ms on a single thread, well below the 10Hz operating rate of the lidar we tested. We show experimental results on real-world driving sequences and compare against state-of-the-art Iterative Closest Point (ICP)-based odometry methods, presenting a compelling trade-off between accuracy and computation. We also present an algebraic observability study, where we demonstrate in theory that the Doppler measurements from multiple FMCW lidars are capable of observing all 6 degrees of freedom (translational and angular velocity).

Submitted 29 September, 2023; v1 submitted 11 March, 2023; originally announced March 2023.

Comments: Accepted and presented at IROS 2023
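
The abstract above notes that the Doppler measurement model is linear in the vehicle velocity. A minimal sketch of what that linearity buys is given below, under strong simplifying assumptions that are not taken from the paper: only translational velocity is estimated, the scene is static, the sensor sits at the vehicle origin, and a return along unit direction d reads radial speed r = -d · v. The paper itself formulates a continuous-time problem and uses a gyroscope for angular velocity; the function names here are hypothetical.

```python
from typing import List, Sequence

Vec3 = Sequence[float]


def solve3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system with Gaussian elimination and partial pivoting."""
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("directions do not constrain all three axes")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        rest = sum(m[i][j] * x[j] for j in range(i + 1, 3))
        x[i] = (m[i][3] - rest) / m[i][i]
    return x


def estimate_velocity(directions: Sequence[Vec3], radial_speeds: Sequence[float]) -> List[float]:
    """Least-squares v minimizing sum_i (r_i + d_i . v)^2 via the normal equations."""
    ata = [[0.0] * 3 for _ in range(3)]
    atb = [0.0] * 3
    for d, r in zip(directions, radial_speeds):
        for i in range(3):
            atb[i] += -d[i] * r  # assumed sign convention: r = -d . v
            for j in range(3):
                ata[i][j] += d[i] * d[j]
    return solve3(ata, atb)


if __name__ == "__main__":
    dirs = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (0.6, 0.8, 0.0)]
    v_true = (10.0, -2.0, 0.5)
    speeds = [-(d[0] * v_true[0] + d[1] * v_true[1] + d[2] * v_true[2]) for d in dirs]
    print(estimate_velocity(dirs, speeds))
```

Because each return contributes one linear equation, no point-to-point data association between scans is needed for this estimate, which matches the correspondence-free framing of the abstract.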

arXiv:2303.06507 [pdf, other]

Towards Consistent Batch State Estimation Using a Time-Correlated Measurement Noise Model

Authors: David J. Yoon, Timothy D. Barfoot

Abstract: In this paper, we present an algorithm for learning time-correlated measurement covariances for application in batch state estimation. We parameterize the inverse measurement covariance matrix to be block-banded, which conveniently factorizes and results in a computationally efficient approach for correlating measurements across the entire trajectory. We train our covariance model through supervised learning using the groundtruth trajectory. In applications where the measurements are time-correlated, we demonstrate improved performance in both the mean posterior estimate and the covariance (i.e., improved estimator consistency). We use an experimental dataset collected using a mobile robot equipped with a laser rangefinder to demonstrate the improvement in performance. We also verify estimator consistency in a controlled simulation using a statistical test over several trials.

Submitted 11 March, 2023; originally announced March 2023.

Comments: ICRA 2023

arXiv:2301.08443 [pdf, other]

doi 10.1109/ICIP46576.2022.9898012

DIFAI: Diverse Facial Inpainting using StyleGAN Inversion

Authors: Dongsik Yoon, Jeong-gi Kwak, Yuanming Li, David Han, Hanseok Ko

Abstract: Image inpainting is an old problem in computer vision that restores occluded regions and completes damaged images. In the case of facial image inpainting, most of the methods generate only one result for each masked image, even though there are other reasonable possibilities. To prevent any potential biases and unnatural constraints stemming from generating only one image, we propose a novel framework for diverse facial inpainting exploiting the embedding space of StyleGAN. Our framework employs pSp encoder and SeFa algorithm to identify semantic components of the StyleGAN embeddings and feed them into our proposed SPARN decoder that adopts region normalization for plausible inpainting. We demonstrate that our proposed method outperforms several state-of-the-art methods.

Submitted 20 January, 2023; originally announced January 2023.

Comments: ICIP 2022

arXiv:2301.08044 [pdf, other]

Reference Guided Image Inpainting using Facial Attributes

Authors: Dongsik Yoon, Jeonggi Kwak, Yuanming Li, David Han, Youngsaeng Jin, Hanseok Ko

Abstract: Image inpainting is a technique of completing missing pixels such as occluded region restoration, distracting objects removal, and facial completion. Among these inpainting tasks, facial completion algorithm performs face inpainting according to the user direction. Existing approaches require delicate and well controlled input by the user, thus it is difficult for an average user to provide the guidance sufficiently accurate for the algorithm to generate desired results. To overcome this limitation, we propose an alternative user-guided inpainting architecture that manipulates facial attributes using a single reference image as the guide. Our end-to-end model consists of attribute extractors for accurate reference image attribute transfer and an inpainting model to map the attributes realistically and accurately to generated images. We customize MS-SSIM loss and learnable bidirectional attention maps in which importance structures remain intact even with irregular shaped masks. Based on our evaluation using the publicly available dataset CelebA-HQ, we demonstrate that the proposed method delivers superior performance compared to some state-of-the-art methods specialized in inpainting tasks.

Submitted 19 January, 2023; originally announced January 2023.

Comments: BMVC 2021

arXiv:2301.00310 [pdf, other]

Graphlets over Time: A New Lens for Temporal Network Analysis

Authors: Deukryeol Yoon, Dongjin Lee, Minyoung Choe, Kijung Shin

Abstract: Graphs are widely used for modeling various types of interactions, such as email communications and online discussions. Many of such real-world graphs are temporal, and specifically, they grow over time with new nodes and edges. Counting the instances of each graphlet (i.e., an induced subgraph isomorphism class) has been successful in characterizing local structures of graphs, with many applications. While graphlets have been extended for temporal graphs, the extensions are designed for examining temporally-local subgraphs composed of edges with close arrival times, instead of long-term changes in local structures. In this paper, as a new lens for temporal graph analysis, we study the evolution of distributions of graphlet instances over time in real-world graphs at three different levels (graphs, nodes, and edges). At the graph level, we first discover that the evolution patterns are significantly different from those in random graphs. Then, we suggest a graphlet transition graph for measuring the similarity of the evolution patterns of graphs, and we find out a surprising similarity between the graphs from the same domain. At the node and edge levels, we demonstrate that the local structures around nodes and edges in their early stage provide a strong signal regarding their future importance. In particular, we significantly improve the predictability of the future importance of nodes and edges using the counts of the roles (a.k.a., orbits) that they take within graphlets.

Submitted 3 January, 2023; v1 submitted 31 December, 2022; originally announced January 2023.

Comments: 13 pages, 7 figures

arXiv:2212.02059 [pdf, other]

Region-Conditioned Orthogonal 3D U-Net for Weather4Cast Competition

Authors: Taehyeon Kim, Shinhwan Kang, Hyeonjeong Shin, Deukryeol Yoon, Seongha Eom, Kijung Shin, Se-Young Yun

Abstract: The Weather4Cast competition (hosted by NeurIPS 2022) required competitors to predict super-resolution rain movies in various regions of Europe when low-resolution satellite contexts covering wider regions are given. In this paper, we show that a general baseline 3D U-Net can be significantly improved with region-conditioned layers as well as orthogonality regularizations on 1x1x1 convolutional layers. Additionally, we facilitate the generalization with a bag of training strategies: mixup data augmentation, self-distillation, and feature-wise linear modulation (FiLM). Presented modifications outperform the baseline algorithms (3D U-Net) by up to 19.54% with less than 1% additional parameters, which won the 4th place in the core test leaderboard.

Submitted 5 December, 2022; originally announced December 2022.

Comments: workshop at NeurIPS 2022 Competition Track on Weather4Cast

arXiv:2211.14807 [pdf, other]

Universal convex covering problems under translation and discrete rotations

Authors: Mook Kwon Jung, Sang Duk Yoon, Hee-Kap Ahn, Takeshi Tokuyama

Abstract: We consider the smallest-area universal covering of planar objects of perimeter 2 (or equivalently closed curves of length 2) allowing translation and discrete rotations. In particular, we show that the solution is an equilateral triangle of height 1 when translation and discrete rotation of $π$ are allowed. Our proof is purely geometric and elementary. We also give convex coverings of closed curves of length 2 under translation and discrete rotations of multiples of $π/2$ and $2π/3$. We show a minimality of the covering for discrete rotation of multiples of $π/2$, which is an equilateral triangle of height smaller than 1, and conjecture that the covering is the smallest-area convex covering. Finally, we give the smallest-area convex coverings of all unit segments under translation and discrete rotations $2π/k$ for all integers $k\ge 3$.

Submitted 27 November, 2022; originally announced November 2022.

MSC Class: 52C15; 05B40 ACM Class: F.0; G.0

arXiv:2210.07760 [pdf, other]

Lightweight Alpha Matting Network Using Distillation-Based Channel Pruning

Authors: Donggeun Yoon, Jinsun Park, Donghyeon Cho

Abstract: Recently, alpha matting has received a lot of attention because of its usefulness in mobile applications such as selfies. Therefore, there has been a demand for a lightweight alpha matting model due to the limited computational resources of commercial portable devices. To this end, we suggest a distillation-based channel pruning method for the alpha matting networks. In the pruning step, we remove channels of a student network having fewer impacts on mimicking the knowledge of a teacher network. Then, the pruned lightweight student network is trained by the same distillation loss. A lightweight alpha matting model from the proposed method outperforms existing lightweight methods. To show superiority of our algorithm, we provide various quantitative and qualitative experiments with in-depth analyses. Furthermore, we demonstrate the versatility of the proposed distillation-based channel pruning method by applying it to semantic segmentation.

Submitted 14 October, 2022; originally announced October 2022.

Comments: Accepted by ACCV2022

arXiv:2210.01504 [pdf, other]

Knowledge Unlearning for Mitigating Privacy Risks in Language Models

Authors: Joel Jang, Dongkeun Yoon, Sohee Yang, Sungmin Cha, Moontae Lee, Lajanugen Logeswaran, Minjoon Seo

Abstract: Pretrained Language Models (LMs) memorize a vast amount of knowledge during initial pretraining, including information that may violate the privacy of personal lives and identities. Previous work addressing privacy issues for language models has mostly focused on data preprocessing and differential privacy methods, both requiring re-training the underlying LM. We propose knowledge unlearning as an alternative method to reduce privacy risks for LMs post hoc. We show that simply performing gradient ascent on target token sequences is effective at forgetting them with little to no degradation of general language modeling performances for larger LMs; it sometimes even substantially improves the underlying LM with just a few iterations. We also find that sequential unlearning is better than trying to unlearn all the data at once and that unlearning is highly dependent on which kind of data (domain) is forgotten. By showing comparisons with a previous data preprocessing method and a decoding method known to mitigate privacy risks for LMs, we show that unlearning can give a stronger empirical privacy guarantee in scenarios where the data vulnerable to extraction attacks are known a priori while being much more efficient and robust. We release the code and dataset needed to replicate our results at https://github.com/joeljang/knowledge-unlearning.

Submitted 19 December, 2022; v1 submitted 4 October, 2022; originally announced October 2022.

arXiv:2209.08497 [pdf, other]

Empirical Analysis on Top-k Gradient Sparsification for Distributed Deep Learning in a Supercomputing Environment

Authors: Daegun Yoon, Sangyoon Oh

Abstract: To train deep learning models faster, distributed training on multiple GPUs has become a very popular scheme in recent years. However, the communication bandwidth is still a major bottleneck of training performance. To improve overall training performance, recent works have proposed gradient sparsification methods that reduce the communication traffic significantly. Most of them, such as Top-k gradient sparsification (Top-k SGD), require gradient sorting to select meaningful gradients. However, Top-k SGD is limited in how much it can speed up overall training because gradient sorting is significantly inefficient on GPUs. In this paper, we conduct experiments that show the inefficiency of Top-k SGD and provide insight into its low performance. Based on observations from our empirical analysis, we plan to develop a high-performance gradient sparsification method as future work.

Submitted 18 September, 2022; originally announced September 2022.

Comments: 4 pages, 4 figures, The 8th International Conference on Next Generation Computing (ICNGC) 2022

arXiv:2209.03304 [pdf, other]

Picking Up Speed: Continuous-Time Lidar-Only Odometry using Doppler Velocity Measurements

Authors: Yuchen Wu, David J. Yoon, Keenan Burnett, Soeren Kammel, Yi Chen, Heethesh Vhavle, Timothy D. Barfoot

Abstract: Frequency-Modulated Continuous-Wave (FMCW) lidar is a recently emerging technology that additionally enables per-return instantaneous relative radial velocity measurements via the Doppler effect. In this letter, we present the first continuous-time lidar-only odometry algorithm using these Doppler velocity measurements from an FMCW lidar to aid odometry in geometrically degenerate environments. We apply an existing continuous-time framework that efficiently estimates the vehicle trajectory using Gaussian process regression to compensate for motion distortion due to the scanning-while-moving nature of any mechanically actuated lidar (FMCW and non-FMCW). We evaluate our proposed algorithm on several real-world datasets, including publicly available ones and datasets we collected. Our algorithm outperforms the only existing method that also uses Doppler velocity measurements, and we study difficult conditions where including this extra information greatly improves performance. We additionally demonstrate state-of-the-art performance of lidar-only odometry with and without using Doppler velocity measurements in nominal conditions. Code for this project can be found at: https://github.com/utiasASRL/steam_icp.

Submitted 3 December, 2022; v1 submitted 7 September, 2022; originally announced September 2022.

Comments: RA-L & ICRA2023

arXiv:2208.12392 [pdf, other]

DiVa: An Accelerator for Differentially Private Machine Learning

Authors: Beomsik Park, Ranggi Hwang, Dongho Yoon, Yoonhyuk Choi, Minsoo Rhu

Abstract: The widespread deployment of machine learning (ML) is raising serious concerns on protecting the privacy of users who contributed to the collection of training data. Differential privacy (DP) is rapidly gaining momentum in the industry as a practical standard for privacy protection. Despite DP's importance, however, little has been explored within the computer systems community regarding the implication of this emerging ML algorithm on system designs. In this work, we conduct a detailed workload characterization on a state-of-the-art differentially private ML training algorithm named DP-SGD. We uncover several unique properties of DP-SGD (e.g., its high memory capacity and computation requirements vs. non-private ML), root-causing its key bottlenecks. Based on our analysis, we propose an accelerator for differentially private ML named DiVa, which provides a significant improvement in compute utilization, leading to 2.6x higher energy-efficiency vs. conventional systolic arrays.

Submitted 25 August, 2022; originally announced August 2022.

Comments: Accepted for publication at the 55th IEEE/ACM International Symposium on Microarchitecture (MICRO-55), 2022

arXiv:2207.10257 [pdf, other]

Injecting 3D Perception of Controllable NeRF-GAN into StyleGAN for Editable Portrait Image Synthesis

Authors: Jeong-gi Kwak, Yuanming Li, Dongsik Yoon, Donghyeon Kim, David Han, Hanseok Ko

Abstract: Over the years, 2D GANs have achieved great successes in photorealistic portrait generation. However, they lack 3D understanding in the generation process, thus they suffer from multi-view inconsistency problem. To alleviate the issue, many 3D-aware GANs have been proposed and shown notable results, but 3D GANs struggle with editing semantic attributes. The controllability and interpretability of 3D GANs have not been much explored. In this work, we propose two solutions to overcome these weaknesses of 2D GANs and 3D-aware GANs. We first introduce a novel 3D-aware GAN, SURF-GAN, which is capable of discovering semantic attributes during training and controlling them in an unsupervised manner. After that, we inject the prior of SURF-GAN into StyleGAN to obtain a high-fidelity 3D-controllable generator. Unlike existing latent-based methods allowing implicit pose control, the proposed 3D-controllable StyleGAN enables explicit pose control over portrait generation. This distillation allows direct compatibility between 3D control and many StyleGAN-based techniques (e.g., inversion and stylization), and also brings an advantage in terms of computational resources. Our codes are available at https://github.com/jgkwak95/SURF-GAN.

Submitted 26 July, 2022; v1 submitted 20 July, 2022; originally announced July 2022.

Comments: ECCV 2022, project page: https://jgkwak95.github.io/surfgan/

arXiv:2206.04386 [pdf]

doi 10.1145/3491101.3519859

Interaction Design for VR Applications: Understanding Needs for University Curricula

Authors: Oloff C. Biermann, Daniel Ajisafe, Dongwook Yoon

Abstract: As virtual reality (VR) is emerging in the tech sector, developers and designers are under pressure to create immersive experiences for their products. However, the current curricula from top institutions focus primarily on technical considerations for building VR applications, missing out on concerns and usability problems specific to VR interaction design. To better understand current needs, we examined the status quo of existing university pedagogies by carrying out a content analysis of undergraduate and graduate courses about VR and related areas offered in the major citadels of learning and conducting interviews with 7 industry experts. Our analysis reveals that the current teaching practices underemphasize design thinking, prototyping, and evaluation skills, while focusing on technical implementation. We recommend VR curricula should emphasize design principles and guidelines, offer training in prototyping and ideation, prioritize practical design exercises while providing industry insights, and encourage students to solve VR design problems beyond the classroom.

Submitted 9 June, 2022; originally announced June 2022.

Comments: 7 pages, 2 figures, published to CHI EA. For the associated presentation, see https://dl.acm.org/action/downloadSupplement?doi=10.1145%2F3491101.3519859&file=3491101.3519859-talk-video.mp4

ACM Class: K.3.2; H.5.1

Journal ref: CHI '22 Extended Abstracts. ACM, New York, NY, USA, 7 pages (2022)

arXiv:2205.02974 [pdf, other]

Generate and Edit Your Own Character in a Canonical View

Authors: Jeong-gi Kwak, Yuanming Li, Dongsik Yoon, David Han, Hanseok Ko

Abstract: Recently, synthesizing personalized characters from a single user-given portrait has received remarkable attention as a drastic popularization of social media and the metaverse. The input image is not always in frontal view, thus it is important to acquire or predict canonical view for 3D modeling or other applications. Although the progress of generative models enables the stylization of a portrait, obtaining the stylized image in canonical view is still a challenging task. There have been several studies on face frontalization but their performance significantly decreases when input is not in the real image domain, e.g., cartoon or painting. Stylizing after frontalization also results in degenerated output. In this paper, we propose a novel and unified framework which generates stylized portraits in canonical view. With a proposed latent mapper, we analyze and discover frontalization mapping in a latent space of StyleGAN to stylize and frontalize at once. In addition, our model can be trained with unlabelled 2D image sets, without any 3D supervision. The effectiveness of our method is demonstrated by experimental results.

Submitted 11 July, 2022; v1 submitted 5 May, 2022; originally announced May 2022.

Comments: AI for Content Creation Workshop at CVPR 2022

arXiv:2203.10174 [pdf, other]

Are We Ready for Radar to Replace Lidar in All-Weather Mapping and Localization?

Authors: Keenan Burnett, Yuchen Wu, David J. Yoon, Angela P. Schoellig, Timothy D. Barfoot

Abstract: We present an extensive comparison between three topometric localization systems: radar-only, lidar-only, and a cross-modal radar-to-lidar system across varying seasonal and weather conditions using the Boreas dataset. Contrary to our expectations, our experiments showed that our lidar-only pipeline achieved the best localization accuracy even during a snowstorm. Our results seem to suggest that the sensitivity of lidar localization to moderate precipitation has been exaggerated in prior works. However, our radar-only pipeline was able to achieve competitive accuracy with a much smaller map. Furthermore, radar localization and radar sensors still have room to improve and may yet prove valuable in extreme weather or as a redundant backup system. Code for this project can be found at: https://github.com/utiasASRL/vtr3

Submitted 8 June, 2023; v1 submitted 18 March, 2022; originally announced March 2022.

Comments: Version 3: Accepted to RA-L, presented at IROS 2022. Localization results updated due to improved ground truth and calibration. Also switched Huber Loss for Cauchy Loss for the radar-based approaches

arXiv:2203.10168 [pdf, other]

Boreas: A Multi-Season Autonomous Driving Dataset

Authors: Keenan Burnett, David J. Yoon, Yuchen Wu, Andrew Zou Li, Haowei Zhang, Shichen Lu, Jingxing Qian, Wei-Kang Tseng, Andrew Lambert, Keith Y. K. Leung, Angela P. Schoellig, Timothy D. Barfoot

Abstract: The Boreas dataset was collected by driving a repeated route over the course of one year, resulting in stark seasonal variations and adverse weather conditions such as rain and falling snow. In total, the Boreas dataset includes over 350km of driving data featuring a 128-channel Velodyne Alpha Prime lidar, a 360$^\circ$ Navtech CIR304-H scanning radar, a 5MP FLIR Blackfly S camera, and centimetre-accurate post-processed ground truth poses. Our dataset will support live leaderboards for odometry, metric localization, and 3D object detection. The dataset and development kit are available at https://www.boreas.utias.utoronto.ca

Submitted 26 January, 2023; v1 submitted 18 March, 2022; originally announced March 2022.

Comments: Accepted in IJRR as a data paper

arXiv:2202.13799 [pdf, other]

One-shot Ultra-high-Resolution Generative Adversarial Network That Synthesizes 16K Images On A Single GPU

Authors: Junseok Oh, Donghwee Yoon, Injung Kim

Abstract: We propose a one-shot ultra-high-resolution generative adversarial network (OUR-GAN) framework that generates non-repetitive 16K (16,384 x 8,640) images from a single training image and is trainable on a single consumer GPU. OUR-GAN generates an initial image that is visually plausible and varied in shape at low resolution, and then gradually increases the resolution by adding detail through super-resolution. Since OUR-GAN learns from a real ultra-high-resolution (UHR) image, it can synthesize large shapes with fine details and long-range coherence, which is difficult to achieve with conventional generative models that rely on the patch distribution learned from relatively small images. OUR-GAN can synthesize high-quality 16K images with 12.5 GB of GPU memory and 4K images with only 4.29 GB as it synthesizes a UHR image part by part through seamless subregion-wise super-resolution. Additionally, OUR-GAN improves visual coherence while maintaining diversity by applying vertical positional convolution. In experiments on the ST4K and RAISE datasets, OUR-GAN exhibited improved fidelity, visual coherency, and diversity compared with the baseline one-shot synthesis models. To the best of our knowledge, OUR-GAN is the first one-shot image synthesizer that generates non-repetitive UHR images on a single consumer GPU. The synthesized image samples are presented at https://our-gan.github.io.

Submitted 28 August, 2023; v1 submitted 28 February, 2022; originally announced February 2022.

Comments: 36 pages, 26 figures

arXiv:2112.04283 [pdf, other]

Adverse Weather Image Translation with Asymmetric and Uncertainty-aware GAN

Authors: Jeong-gi Kwak, Youngsaeng Jin, Yuanming Li, Dongsik Yoon, Donghyeon Kim, Hanseok Ko

Abstract: Adverse weather image translation belongs to the unsupervised image-to-image (I2I) translation task which aims to transfer adverse condition domain (e.g., rainy night) to standard domain (e.g., day). It is a challenging task because images from adverse domains have some artifacts and insufficient information. Recently, many studies employing Generative Adversarial Networks (GANs) have achieved notable success in I2I translation but there are still limitations in applying them to adverse weather enhancement. Symmetric architecture based on bidirectional cycle-consistency loss is adopted as a standard framework for unsupervised domain transfer methods. However, it can lead to inferior translation result if the two domains have imbalanced information. To address this issue, we propose a novel GAN model, i.e., AU-GAN, which has an asymmetric architecture for adverse domain translation. We insert a proposed feature transfer network (${T}$-net) in only a normal domain generator (i.e., rainy night -> day) to enhance encoded features of the adverse domain image. In addition, we introduce asymmetric feature matching for disentanglement of encoded features. Finally, we propose uncertainty-aware cycle-consistency loss to address the regional uncertainty of a cyclic reconstructed image. We demonstrate the effectiveness of our method by qualitative and quantitative comparisons with state-of-the-art models. Codes are available at https://github.com/jgkwak95/AU-GAN.

Submitted 14 February, 2022; v1 submitted 8 December, 2021; originally announced December 2021.

Comments: BMVC 2021, codes are available in here: https://github.com/jgkwak95/AU-GAN

arXiv:2109.03845 [pdf, other]

StripBrush: A Constraint-Relaxed 3D Brush Reduces Physical Effort and Enhances the Quality of Spatial Drawing

Authors: Enrique Rosales, Jafet Rodriguez, Chrystiano Araújo, Nicholas Vining, Dongwook Yoon, Alla Sheffer

Abstract: Spatial drawing using ruled-surface brush strokes is a popular mode of content creation in immersive VR, yet little is known about the usability of existing spatial drawing interfaces or potential improvements. We address these questions in a three-phase study. (1) Our exploratory need-finding study (N=8) indicates that popular spatial brushes require users to perform large wrist motions, causing physical strain. We speculate that this is partly due to constraining users to align their 3D controllers with their intended stroke normal orientation. (2) We designed and implemented a new brush interface that significantly reduces the physical effort and wrist motion involved in VR drawing, with the additional benefit of increasing drawing accuracy. We achieve this by relaxing the normal alignment constraints, allowing users to control stroke rulings, and estimating normals from them instead. (3) Our comparative evaluation of StripBrush (N=17) against the traditional brush shows that StripBrush requires significantly less physical effort and allows users to more accurately depict their intended shapes while offering competitive ease-of-use and speed.

Submitted 8 September, 2021; originally announced September 2021.

Comments: 12 pages, 13 figures, for associated video and supplementary files, see https://www.cs.ubc.ca/labs/imager/tr/2021/StripBrush/

MSC Class: 68Uxx ACM Class: I.3.4; H.5.2

arXiv:2105.14152 [pdf, other]

Radar Odometry Combining Probabilistic Estimation and Unsupervised Feature Learning

Authors: Keenan Burnett, David J. Yoon, Angela P. Schoellig, Timothy D. Barfoot

Abstract: This paper presents a radar odometry method that combines probabilistic trajectory estimation and deep learned features without needing groundtruth pose information. The feature network is trained unsupervised, using only the on-board radar data. With its theoretical foundation based on a data likelihood objective, our method leverages a deep network for processing rich radar data, and a non-differentiable classic estimator for probabilistic inference. We provide extensive experimental results on both the publicly available Oxford Radar RobotCar Dataset and an additional 100 km of driving collected in an urban setting. Our sliding-window implementation of radar odometry outperforms most hand-crafted methods and approaches the current state of the art without requiring a groundtruth trajectory for training. We also demonstrate the effectiveness of radar odometry under adverse weather conditions. Code for this project can be found at: https://github.com/utiasASRL/hero_radar_odometry

Submitted 30 June, 2021; v1 submitted 28 May, 2021; originally announced May 2021.

Comments: Accepted to Robotics Science and Systems 2021

arXiv:2102.11261 [pdf, other]

Unsupervised Learning of Lidar Features for Use in a Probabilistic Trajectory Estimator

Authors: David J. Yoon, Haowei Zhang, Mona Gridseth, Hugues Thomas, Timothy D. Barfoot

Abstract: We present unsupervised parameter learning in a Gaussian variational inference setting that combines classic trajectory estimation for mobile robots with deep learning for rich sensor data, all under a single learning objective. The framework is an extension of an existing system identification method that optimizes for the observed data likelihood, which we improve with modern advances in batch trajectory estimation and deep learning. Though the framework is general to any form of parameter learning and sensor modality, we demonstrate application to feature and uncertainty learning with a deep network for 3D lidar odometry. Our framework learns from only the on-board lidar data, and does not require any form of groundtruth supervision. We demonstrate that our lidar odometry performs better than existing methods that learn the full estimator with a deep network, and comparable to state-of-the-art ICP-based methods on the KITTI odometry dataset. We additionally show results on lidar data from the Oxford RobotCar dataset.

Submitted 22 February, 2021; originally announced February 2021.

Comments: Accepted for publication in RA-L 2021

arXiv:2006.12000 [pdf, other]

Self-Knowledge Distillation with Progressive Refinement of Targets

Authors: Kyungyul Kim, ByeongMoon Ji, Doyoung Yoon, Sangheum Hwang

Abstract: The generalization capability of deep neural networks has been substantially improved by applying a wide spectrum of regularization methods, e.g., restricting function space, injecting randomness during training, augmenting data, etc. In this work, we propose a simple yet effective regularization method named progressive self-knowledge distillation (PS-KD), which progressively distills a model's own knowledge to soften hard targets (i.e., one-hot vectors) during training. Hence, it can be interpreted within a framework of knowledge distillation as a student becomes a teacher itself. Specifically, targets are adjusted adaptively by combining the ground-truth and past predictions from the model itself. We show that PS-KD provides an effect of hard example mining by rescaling gradients according to difficulty in classifying examples. The proposed method is applicable to any supervised learning tasks with hard targets and can be easily combined with existing regularization methods to further enhance the generalization performance. Furthermore, it is confirmed that PS-KD achieves not only better accuracy, but also provides high quality of confidence estimates in terms of calibration as well as ordinal ranking. Extensive experimental results on three different tasks, image classification, object detection, and machine translation, demonstrate that our method consistently improves the performance of the state-of-the-art baselines. The code is available at https://github.com/lgcnsai/PS-KD-Pytorch.

Submitted 7 October, 2021; v1 submitted 22 June, 2020; originally announced June 2020.

Comments: Accepted at ICCV 2021 (oral presentation)

arXiv:2004.08752 [pdf, other]

Zeus: A System Description of the Two-Time Winner of the Collegiate SAE AutoDrive Competition

Authors: Keenan Burnett, Jingxing Qian, Xintong Du, Linqiao Liu, David J. Yoon, Tianchang Shen, Susan Sun, Sepehr Samavi, Michael J. Sorocky, Mollie Bianchi, Kaicheng Zhang, Arkady Arkhangorodsky, Quinlan Sykora, Shichen Lu, Yizhou Huang, Angela P. Schoellig, Timothy D. Barfoot

Abstract: The SAE AutoDrive Challenge is a three-year collegiate competition to develop a self-driving car by 2020. The second year of the competition was held in June 2019 at MCity, a mock town built for self-driving car testing at the University of Michigan. Teams were required to autonomously navigate a series of intersections while handling pedestrians, traffic lights, and traffic signs. Zeus is aUToronto's winning entry in the AutoDrive Challenge. This article describes the system design and development of Zeus as well as many of the lessons learned along the way. This includes details on the team's organizational structure, sensor suite, software components, and performance at the Year 2 competition. With a team of mostly undergraduates and minimal resources, aUToronto has made progress towards a functioning self-driving vehicle, in just two years. This article may prove valuable to researchers looking to develop their own self-driving platform.

Submitted 18 April, 2020; originally announced April 2020.

Comments: Submitted to the Journal of Field Robotics

arXiv:2003.09736 [pdf, other]

doi 10.1109/LRA.2020.3007381

Variational Inference with Parameter Learning Applied to Vehicle Trajectory Estimation

Authors: Jeremy N. Wong, David J. Yoon, Angela P. Schoellig, Timothy D. Barfoot

Abstract: We present parameter learning in a Gaussian variational inference setting using only noisy measurements (i.e., no groundtruth). This is demonstrated in the context of vehicle trajectory estimation, although the method we propose is general. The paper extends the Exactly Sparse Gaussian Variational Inference (ESGVI) framework, which has previously been used for large-scale nonlinear batch state estimation. Our contribution is to additionally learn parameters of our system models (which may be difficult to choose in practice) within the ESGVI framework. In this paper, we learn the covariances for the motion and sensor models used within vehicle trajectory estimation. Specifically, we learn the parameters of a white-noise-on-acceleration motion model and the parameters of an Inverse-Wishart prior over measurement covariances for our sensor model. We demonstrate our technique using a 36 km dataset consisting of a car using lidar to localize against a high-definition map; we learn the parameters on a training section of the data and then show that we achieve high-quality state estimates on a test section, even in the presence of outliers. Lastly, we show that our framework can be used to solve pose graph optimization even with many false loop closures.

Submitted 9 July, 2020; v1 submitted 21 March, 2020; originally announced March 2020.

Comments: IEEE Robotics and Automation Letters (RA-L). 8 pages, 4 figures

arXiv:2003.07734 [pdf, other]

A Novel Online Action Detection Framework from Untrimmed Video Streams

Authors: Da-Hye Yoon, Nam-Gyu Cho, Seong-Whan Lee

Abstract: Online temporal action localization from an untrimmed video stream is a challenging problem in computer vision. It is challenging because i) in an untrimmed video stream, more than one action instance may appear, including background scenes, and ii) in online settings, only past and current information is available. Therefore, temporal priors, such as the average action duration of training data, which have been exploited by previous action detection methods, are not suitable for this task because of the high intra-class variation in human actions. We propose a novel online action detection framework that considers actions as a set of temporally ordered subclasses and leverages a future frame generation network to cope with the limited information issue associated with the problem outlined above. Additionally, we augment our data by varying the lengths of videos to allow the proposed method to learn about the high intra-class variation in human actions. We evaluate our method using two benchmark datasets, THUMOS'14 and ActivityNet, for an online temporal action localization scenario and demonstrate that the performance is comparable to state-of-the-art methods that have been proposed for offline settings.

Submitted 17 March, 2020; originally announced March 2020.

arXiv:2001.00977 [pdf, other]

RF Fingerprinting and Deep Learning Assisted UE Positioning in 5G

Authors: M Majid Butt, Anil Rao, Daejung Yoon

Abstract: In this work, we investigate user equipment (UE) positioning assisted by deep learning (DL) in 5G and beyond networks. As compared to state-of-the-art positioning algorithms used in today's networks, radio signal fingerprinting and machine learning (ML) assisted positioning requires smaller additional feedback overhead; and the positioning estimates are made directly inside the radio access network (RAN), thereby assisting in radio resource management. The conventional positioning algorithms will be used as back-up for the environments with high variability in conditions; but ML-assisted positioning serves as a more efficient and simpler technique to provide better or similar positioning accuracy. In this regard, we study ML-assisted positioning methods and evaluate their performance using system level simulations for an outdoor scenario in Lincoln Park, Chicago. The study is based on the use of raytracing tools, a 3GPP 5G NR compliant system level simulator and DL framework to estimate positioning accuracy of the UE. The use of a raytracing tool and system level simulator helps avoid expensive drive test measurements in practical scenarios. Our proposed mechanism is a first step towards more proactive mobility management in future networks. We evaluate and compare performance of various DL models and show mean positioning error in the range of 1-1.5m for the best DL configuration with appropriate system feature-modeling.

Submitted 3 January, 2020; originally announced January 2020.

Comments: submitted to ICC 2020

arXiv:1912.03443 [pdf, other]

Joins on Samples: A Theoretical Guide for Practitioners

Authors: Dawei Huang, Dong Young Yoon, Seth Pettie, Barzan Mozafari

Abstract: Despite decades of research on approximate query processing (AQP), our understanding of sample-based joins has remained limited and, to some extent, even superficial. The common belief in the community is that joining random samples is futile. This belief is largely based on an early result showing that the join of two uniform samples is not an independent sample of the original join, and that it leads to quadratically fewer output tuples. However, unfortunately, this result has little applicability to the key questions practitioners face. For example, the success metric is often the final approximation's accuracy, rather than output cardinality. Moreover, there are many non-uniform sampling strategies that one can employ. Is sampling for joins still futile in all of these settings? If not, what is the best sampling strategy in each case? To the best of our knowledge, there is no formal study answering these questions. This paper aims to improve our understanding of sample-based joins and offer a guideline for practitioners building and using real-world AQP systems. We study limitations of offline samples in approximating join queries: given an offline sampling budget, how well can one approximate the join of two tables? We answer this question for two success metrics: output size and estimator variance. We show that maximizing output size is easy, while there is an information-theoretical lower bound on the lowest variance achievable by any sampling strategy.
We then define a hybrid sampling scheme that captures all combinations of stratified, universe, and Bernoulli sampling, and show that this scheme with our optimal parameters achieves the theoretical lower bound within a constant factor. Since computing these optimal parameters requires shuffling statistics across the network, we also propose a decentralized variant where each node acts autonomously using minimal statistics.

Submitted 24 January, 2020; v1 submitted 7 December, 2019; originally announced December 2019.

Comments: 19 pages

arXiv:1911.12988 [pdf, other]

Empty Squares in Arbitrary Orientation Among Points

Authors: Sang Won Bae, Sang Duk Yoon

Abstract: This paper studies empty squares in arbitrary orientation among a set $P$ of $n$ points in the plane. We prove that the number of empty squares with four contact pairs is between $\Omega(n)$ and $O(n^2)$, and that these bounds are tight, provided $P$ is in a certain general position. A contact pair of a square is a pair of a point $p\in P$ and a side $\ell$ of the square with $p\in \ell$. The upper bound $O(n^2)$ also applies to the number of empty squares with four contact points, while we construct a point set among which there is no square of four contact points. These combinatorial results are based on new observations on the $L_\infty$ Voronoi diagram with the axes rotated and its close connection to empty squares in arbitrary orientation. We then present an algorithm that maintains a combinatorial structure of the $L_\infty$ Voronoi diagram of $P$, while the axes of the plane continuously rotate by $90$ degrees, and simultaneously reports all empty squares with four contact pairs among $P$ in an output-sensitive way within $O(s\log n)$ time and $O(n)$ space, where $s$ denotes the number of reported squares. Several new algorithmic results are also obtained: a largest empty square among $P$ and a square annulus of minimum width or minimum area that encloses $P$ over all orientations can be computed in worst-case $O(n^2 \log n)$ time.

Submitted 29 November, 2019; originally announced November 2019.

Comments: 39 pages, 11 figures

arXiv:1911.08333 [pdf, other]

Exactly Sparse Gaussian Variational Inference with Application to Derivative-Free Batch Nonlinear State Estimation

Authors: Timothy D. Barfoot, James R. Forbes, David Yoon

Abstract: We present a Gaussian Variational Inference (GVI) technique that can be applied to large-scale nonlinear batch state estimation problems. The main contribution is to show how to fit both the mean and (inverse) covariance of a Gaussian to the posterior efficiently, by exploiting factorization of the joint likelihood of the state and data, as is common in practical problems. This is different than Maximum A Posteriori (MAP) estimation, which seeks the point estimate for the state that maximizes the posterior (i.e., the mode). The proposed Exactly Sparse Gaussian Variational Inference (ESGVI) technique stores the inverse covariance matrix, which is typically very sparse (e.g., block-tridiagonal for classic state estimation). We show that the only blocks of the (dense) covariance matrix that are required during the calculations correspond to the non-zero blocks of the inverse covariance matrix, and further show how to calculate these blocks efficiently in the general GVI problem. ESGVI operates iteratively, and while we can use analytical derivatives at each iteration, Gaussian cubature can be substituted, thereby producing an efficient derivative-free batch formulation. ESGVI simplifies to precisely the Rauch-Tung-Striebel (RTS) smoother in the batch linear estimation case, but goes beyond the 'extended' RTS smoother in the nonlinear case since it finds the best-fit Gaussian (mean and covariance), not the MAP point estimate.
We demonstrate the technique on controlled simulation problems and a batch nonlinear Simultaneous Localization and Mapping (SLAM) problem with an experimental dataset.

Submitted 9 April, 2020; v1 submitted 9 November, 2019; originally announced November 2019.

Comments: Accepted to the International Journal of Robotics Research (IJRR) on 8 April 2020, # IJR-19-3748; 31 pages, 10 figures

arXiv:1811.09128 [pdf, other]

doi 10.1109/ACCESS.2020.3032344

Driver Behavior Recognition via Interwoven Deep Convolutional Neural Nets with Multi-stream Inputs

Authors: Chaoyun Zhang, Rui Li, Woojin Kim, Daesub Yoon, Paul Patras

Abstract: Understanding driver activity is vital for in-vehicle systems that aim to reduce the incidence of car accidents rooted in cognitive distraction. Automating real-time behavior recognition while ensuring action classification with high accuracy is however challenging, given the multitude of circumstances surrounding drivers, the unique traits of individuals, and the computational constraints imposed by in-vehicle embedded platforms. Prior work fails to jointly meet these runtime/accuracy requirements and mostly relies on a single sensing modality, which in turn can be a single point of failure. In this paper, we harness the exceptional feature extraction abilities of deep learning and propose a dedicated Interwoven Deep Convolutional Neural Network (InterCNN) architecture to tackle the problem of accurate classification of driver behaviors in real-time. The proposed solution exploits information from multi-stream inputs, i.e., in-vehicle cameras with different fields of view and optical flows computed based on recorded images, and merges through multiple fusion layers abstract features that it extracts. This builds a tight ensembling system, which significantly improves the robustness of the model. In addition, we introduce a temporal voting scheme based on historical inference instances, to enhance the classification accuracy.
Experiments conducted with a dataset that we collect in a mock-up car environment demonstrate that the proposed InterCNN with MobileNet convolutional blocks can classify 9 different behaviors with 73.97% accuracy, and 5 'aggregated' behaviors with 81.66% accuracy. We further show that our architecture is highly computationally efficient, as it performs inferences within 15ms, which satisfies the real-time constraints of intelligent cars. Nevertheless, our InterCNN is robust to lossy input, as the classification remains accurate when two input streams are occluded.

Submitted 21 February, 2021; v1 submitted 22 November, 2018; originally announced November 2018.

Comments: 13 pages, 15 figures

Journal ref: IEEE Access, vol. 8, pp. 191138-191151, 2020

arXiv:1809.06972 [pdf, other]

Mapless Online Detection of Dynamic Objects in 3D Lidar

Authors: David J. Yoon, Tim Y. Tang, Timothy D. Barfoot

Abstract: This paper presents a model-free, setting-independent method for online detection of dynamic objects in 3D lidar data. We explicitly compensate for the moving-while-scanning operation (motion distortion) of present-day 3D spinning lidar sensors. Our detection method uses a motion-compensated freespace querying algorithm and classifies between dynamic (currently moving) and static (currently stationary) labels at the point level. For a quantitative analysis, we establish a benchmark with motion-distorted lidar data using CARLA, an open-source simulator for autonomous driving research. We also provide a qualitative analysis with real data using a Velodyne HDL-64E in driving scenarios. Compared to existing 3D lidar methods that are model-free, our method is unique because of its setting independence and compensation for pointcloud motion distortion.

Submitted 18 September, 2018; originally announced September 2018.

Comments: 7 pages, 8 figures

arXiv:1809.06518 [pdf, other]

A White-Noise-On-Jerk Motion Prior for Continuous-Time Trajectory Estimation on SE(3)

Authors: Tim Y. Tang, David J. Yoon, Timothy D. Barfoot

Abstract: Simultaneous trajectory estimation and mapping (STEAM) offers an efficient approach to continuous-time trajectory estimation, by representing the trajectory as a Gaussian process (GP). Previous formulations of the STEAM framework use a GP prior that assumes white-noise-on-acceleration, with the prior mean encouraging constant body-centric velocity. We show that such a prior cannot sufficiently represent trajectory sections with non-zero acceleration, resulting in a bias to the posterior estimates. This paper derives a novel motion prior that assumes white-noise-on-jerk, where the prior mean encourages constant body-centric acceleration. With the new prior, we formulate a variation of STEAM that estimates the pose, body-centric velocity, and body-centric acceleration. By evaluating across several datasets, we show that the new prior greatly outperforms the white-noise-on-acceleration prior in terms of solution accuracy.

Submitted 12 January, 2019; v1 submitted 17 September, 2018; originally announced September 2018.

Comments: To appear in IEEE Robotics and Automation Letters (RA-L). 8 pages, 5 figures

arXiv:1809.02292 [pdf, other]

A Block Coordinate Ascent Algorithm for Mean-Variance Optimization

Authors: Bo Liu, Tengyang Xie, Yangyang Xu, Mohammad Ghavamzadeh, Yinlam Chow, Daoming Lyu, Daesub Yoon

Abstract: Risk management in dynamic decision problems is a primary concern in many fields, including financial investment, autonomous driving, and healthcare. The mean-variance function is one of the most widely used objective functions in risk management due to its simplicity and interpretability. Existing algorithms for mean-variance optimization are based on multi-time-scale stochastic approximation, whose learning rate schedules are often hard to tune, and have only asymptotic convergence proofs. In this paper, we develop a model-free policy search framework for mean-variance optimization with finite-sample error bound analysis (to local optima). Our starting point is a reformulation of the original mean-variance function with its Fenchel dual, from which we propose a stochastic block coordinate ascent policy search algorithm. Both the asymptotic convergence guarantee of the last iteration's solution and the convergence rate of the randomly picked solution are provided, and their applicability is demonstrated on several benchmark domains.

Submitted 1 November, 2018; v1 submitted 6 September, 2018; originally announced September 2018.

Comments: Accepted by NIPS 2018

arXiv:1808.07383 [pdf, other]

Dynamic Self-Attention : Computing Attention over Words Dynamically for Sentence Embedding

Authors: Deunsol Yoon, Dongbok Lee, SangKeun Lee

Abstract: In this paper, we propose Dynamic Self-Attention (DSA), a new self-attention mechanism for sentence embedding. We design DSA by modifying dynamic routing in capsule network (Sabour et al., 2017) for natural language processing. DSA attends to informative words with a dynamic weight vector. We achieve new state-of-the-art results among sentence encoding methods in Stanford Natural Language Inference (SNLI) dataset with the least number of parameters, while showing comparable results in Stanford Sentiment Treebank (SST) dataset.

Submitted 22 August, 2018; originally announced August 2018.

Comments: 7 pages, 4 figures

arXiv:1807.07711 [pdf, other]

Blind Signal Classification for Non-Orthogonal Multiple Access in Vehicular Networks

Authors: Minseok Choi, Daejung Yoon, Joongheon Kim

Abstract: For downlink multiple-user (MU) transmission based on non-orthogonal multiple access (NOMA), an advanced receiver strategy is required to cancel the inter-user interference, e.g., successive interference cancellation (SIC). The SIC process is applicable only when information about the co-scheduled signal is known at the user terminal (UT) side. In particular, the UT should know whether the received signal is OMA or NOMA, whether SIC is required or not, and which modulation orders and power ratios have been used for the superposed UTs, before decoding the signal. An efficient network, e.g., vehicular network, requires that the UTs blindly classify the received signal and apply a matching receiver strategy to reduce the high-layer signaling overhead which is essential for high-mobility vehicular networks. In this paper, we first analyze the performance impact of errors in NOMA signal classification and address ensuing receiver challenges in practical MU usage cases. In order to reduce the blind signal classification error rate, we propose transmission schemes that rotate data symbols or pilots to a specific phase according to the transmitted signal format. In the case of pilot rotation, a new signal classification algorithm is also proposed. The performance improvements by the proposed methods are verified by intensive simulation results.

Submitted 24 January, 2020; v1 submitted 20 July, 2018; originally announced July 2018.

Comments: 13 pages, 15 figures

Showing 1–50 of 54 results for author: Yoon, D