
FeatureSORT: Essential Features for Effective Tracking

Authors: Hamidreza Hashempoor, Rosemary Koikara, Yu Dong Hwang

Abstract: In this work, we introduce a novel tracker designed for online multiple object tracking with a focus on being simple yet effective. We provide multiple feature modules, each of which represents a particular kind of appearance information. By integrating distinct appearance features, including clothing color, style, and target direction, alongside a ReID network for robust embedding extraction, our tracker significantly enhances online tracking accuracy. Additionally, we propose incorporating a stronger detector and provide advanced post-processing methods that further elevate the tracker's performance. During real-time operation, we establish a measurement-to-track association distance function that includes IoU, direction, color, style, and ReID feature similarity information, where each metric is calculated separately. With the design of our feature-related distance function, it is possible to track objects through longer periods of occlusion while keeping the number of identity switches comparatively low. Extensive experimental evaluation demonstrates notable improvement in tracking accuracy and reliability, as evidenced by reduced identity switches and enhanced occlusion handling. These advancements not only contribute to the state of the art in object tracking but also open new avenues for future research and practical applications demanding high precision and reliability.

Submitted 5 July, 2024; originally announced July 2024.
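
The association step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the weighted-sum combination, the weight values, the dictionary layout, and the function names are all assumptions made for illustration, and only three of the listed cues (IoU, ReID, color) are shown.

```python
from math import sqrt

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = sqrt(sum(x * x for x in u))
    nv = sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def association_distance(det, trk, weights):
    """Hypothetical combined distance between a detection and a track.

    Each cue is scored separately and turned into a distance in [0, 1];
    mixing them with a weighted sum is an assumption of this sketch.
    """
    terms = {
        "iou": 1.0 - iou(det["box"], trk["box"]),
        "reid": 1.0 - cosine_similarity(det["reid"], trk["reid"]),
        "color": 1.0 - cosine_similarity(det["color"], trk["color"]),
    }
    return sum(weights[name] * terms[name] for name in terms)
```

In a full tracker, a matrix of such distances between all detections and tracks would typically be passed to an assignment solver; that step is omitted here.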

arXiv:2406.01833 [pdf, other]

doi 10.1145/3637528.3671724

CAFO: Feature-Centric Explanation on Time Series Classification

Authors: Jaeho Kim, Seok-Ju Hahn, Yoontae Hwang, Junghye Lee, Seulki Lee

Abstract: In multivariate time series (MTS) classification, finding the important features (e.g., sensors) for model performance is crucial yet challenging due to the complex, high-dimensional nature of MTS data, intricate temporal dynamics, and the necessity for domain-specific interpretations. Current explanation methods for MTS mostly focus on time-centric explanations, apt for pinpointing important time periods but less effective in identifying key features. This limitation underscores the pressing need for a feature-centric approach, a vital yet often overlooked perspective that complements time-centric analysis. To bridge this gap, our study introduces a novel feature-centric explanation and evaluation framework for MTS, named CAFO (Channel Attention and Feature Orthogonalization). CAFO employs a convolution-based approach with channel attention mechanisms, incorporating a depth-wise separable channel attention module (DepCA) and a QR decomposition-based loss for promoting feature-wise orthogonality. We demonstrate that this orthogonalization enhances the separability of attention distributions, thereby refining and stabilizing the ranking of feature importance. This improvement in feature-wise ranking enhances our understanding of feature explainability in MTS. Furthermore, we develop metrics to evaluate global and class-specific feature importance. Our framework's efficacy is validated through extensive empirical analyses on two major public benchmarks and real-world datasets, both synthetic and self-collected, specifically designed to highlight class-wise discriminative features. The results confirm CAFO's robustness and informative capacity in assessing feature importance in MTS classification tasks. This study not only advances the understanding of feature-centric explanations in MTS but also sets a foundation for future explorations in feature-centric explanations.

Submitted 11 June, 2024; v1 submitted 3 June, 2024; originally announced June 2024.

Comments: Accepted to KDD 2024 Research Track

arXiv:2406.00841 [pdf, other]

doi 10.1145/3643834.3660721

Understanding On-the-Fly End-User Robot Programming

Authors: Laura Stegner, Yuna Hwang, David Porfirio, Bilge Mutlu

Abstract: Novel end-user programming (EUP) tools enable on-the-fly (i.e., spontaneous, easy, and rapid) creation of interactions with robotic systems. These tools are expected to empower users in determining system behavior, although very little is understood about how end users perceive, experience, and use these systems. In this paper, we seek to address this gap by investigating end-user experience with on-the-fly robot EUP. We trained 21 end users to use an existing on-the-fly EUP tool, asked them to create robot interactions for four scenarios, and assessed their overall experience. Our findings provide insight into how these systems should be designed to better support end-user experience with on-the-fly EUP, focusing on user interaction with an automatic program synthesizer that resolves imprecise user input, the use of multimodal inputs to express user intent, and the general process of programming a robot.

Submitted 2 June, 2024; originally announced June 2024.

Comments: To appear at DIS'24. Stegner and Hwang contributed equally to this research

arXiv:2405.20867 [pdf, other]

Automatic Channel Pruning for Multi-Head Attention

Authors: Eunho Lee, Youngbae Hwang

Abstract: Despite the strong performance of Transformers, their quadratic computation complexity presents challenges in applying them to vision tasks. Automatic pruning is one of the effective methods for reducing computation complexity without heuristic approaches. However, directly applying it to multi-head attention is not straightforward due to channel misalignment. In this paper, we propose an automatic channel pruning method that takes the multi-head attention mechanism into account. First, we incorporate channel similarity-based weights into the pruning indicator to preserve more informative channels in each head. Then, we adjust the pruning indicator to enforce removal of channels in equal proportions across all heads, preventing channel misalignment. We also add a reweight module to compensate for information loss resulting from channel removal, and an effective initialization step for the pruning indicator based on the difference in attention between the original structure and each channel. Our proposed method can be applied not only to the original attention but also to linear attention, which is more efficient because its complexity is linear in the number of tokens. On ImageNet-1K, applying our pruning method to the FLattenTransformer, which includes both attention mechanisms, achieves higher accuracy at several MAC budgets than previous state-of-the-art efficient models and pruning methods. Code will be available soon.

Submitted 31 May, 2024; originally announced May 2024.

arXiv:2404.01842 [pdf, other]

Semi-Supervised Domain Adaptation for Wildfire Detection

Authors: JooYoung Jang, Youngseo Cha, Jisu Kim, SooHyung Lee, Geonu Lee, Minkook Cho, Young Hwang, Nojun Kwak

Abstract: Recently, both the frequency and intensity of wildfires have increased worldwide, primarily due to climate change. In this paper, we propose a novel protocol for wildfire detection, leveraging semi-supervised Domain Adaptation for object detection, accompanied by a corresponding dataset designed for use by both academics and industries. Our dataset encompasses 30 times more diverse labeled scenes than the current largest benchmark wildfire dataset, HPWREN, and introduces a new labeling policy for wildfire detection. Inspired by CoordConv, we propose a robust baseline, Location-Aware Object Detection for Semi-Supervised Domain Adaptation (LADA), utilizing a teacher-student based framework capable of extracting translational variance features characteristic of wildfires. Using only 1% of the target-domain labeled data, our framework outperforms our source-only baseline by a notable margin of 3.8% in mean Average Precision on the HPWREN wildfire dataset. Our dataset is available at https://github.com/BloomBerry/LADA.

Submitted 2 April, 2024; originally announced April 2024.

Comments: 16 pages, 5 figures, 22 tables

arXiv:2403.05814 [pdf, other]

MP2D: An Automated Topic Shift Dialogue Generation Framework Leveraging Knowledge Graphs

Authors: Yerin Hwang, Yongil Kim, Yunah Jang, Jeesoo Bang, Hyunkyung Bae, Kyomin Jung

Abstract: Despite advancements in on-topic dialogue systems, effectively managing topic shifts within dialogues remains a persistent challenge, largely attributed to the limited availability of training datasets. To address this issue, we propose Multi-Passage to Dialogue (MP2D), a data generation framework that automatically creates conversational question-answering datasets with natural topic transitions. By leveraging the relationships between entities in a knowledge graph, MP2D maps the flow of topics within a dialogue, effectively mirroring the dynamics of human conversation. It retrieves relevant passages corresponding to the topics and transforms them into dialogues through the passage-to-dialogue method. Through quantitative and qualitative experiments, we demonstrate MP2D's efficacy in generating dialogue with natural topic shifts. Furthermore, this study introduces a novel benchmark for topic shift dialogues, TS-WikiDialog. Utilizing the dataset, we demonstrate that even Large Language Models (LLMs) struggle to handle topic shifts in dialogue effectively, and we showcase the performance improvements of models trained on datasets generated by MP2D across diverse topic shift dialogue tasks.

Submitted 9 March, 2024; originally announced March 2024.

Comments: 20 pages

arXiv:2401.09770 [pdf, other]

Reliability-based G1 Continuous Arc Spline Approximation

Authors: Jinhwan Jeon, Yoonjin Hwang, Seibum B. Choi

Abstract: In this paper, we present an algorithm to approximate a set of data points with G1 continuous arcs, using points' covariance data. To the best of our knowledge, previous arc spline approximation approaches assumed that all data points contribute equally (i.e. have the same weights) during the approximation process. However, this assumption may cause serious instability in the algorithm, if the collected data contains outliers. To resolve this issue, a robust method for arc spline approximation is suggested in this work, assuming that the 2D covariance for each data point is given. Starting with the definition of models and parameters for single arc approximation, the framework is extended to multiple-arc approximation for general usage. Then the proposed algorithm is verified using generated noisy data and real-world collected data via vehicle experiment in Sejong City, South Korea.

Submitted 18 January, 2024; originally announced January 2024.

Comments: 42 pages, 19 figures, Submitted to Computer Aided Geometric Design

arXiv:2311.07589 [pdf, other]

Dialogizer: Context-aware Conversational-QA Dataset Generation from Textual Sources

Authors: Yerin Hwang, Yongil Kim, Hyunkyung Bae, Jeesoo Bang, Hwanhee Lee, Kyomin Jung

Abstract: To address the data scarcity issue in Conversational question answering (ConvQA), a dialog inpainting method, which utilizes documents to generate ConvQA datasets, has been proposed. However, the original dialog inpainting model is trained solely on the dialog reconstruction task, resulting in the generation of questions with low contextual relevance due to insufficient learning of question-answer alignment. To overcome this limitation, we propose a novel framework called Dialogizer, which has the capability to automatically generate ConvQA datasets with high contextual relevance from textual sources. The framework incorporates two training tasks: question-answer matching (QAM) and topic-aware dialog generation (TDG). Moreover, re-ranking is conducted during the inference phase based on the contextual relevance of the generated questions. Using our framework, we produce four ConvQA datasets by utilizing documents from multiple domains as the primary source. Through automatic evaluation using diverse metrics, as well as human evaluation, we validate that our proposed framework exhibits the ability to generate datasets of higher quality compared to the baseline dialog inpainting model.

Submitted 9 November, 2023; originally announced November 2023.

Comments: Accepted to EMNLP 2023 main conference

arXiv:2311.05373 [pdf]

What is prompt literacy? An exploratory study of language learners' development of new literacy skill using generative AI

Authors: Yohan Hwang, Jang Ho Lee, Dongkwang Shin

Abstract: In the current study, we propose that, in the era of generative AI, there is now a new form of literacy called "prompt literacy," which refers to the ability to generate precise prompts as input for AI systems, interpret the outputs, and iteratively refine prompts to achieve desired results. To explore the emergence and development of this literacy skill, the current study examined 30 EFL students' engagement in an AI-powered image creation project, through which they created artworks representing the socio-cultural meanings of English words by iteratively drafting and refining prompts in generative AI tools. By examining AI-generated images and the participants' drafting and revision of their prompts, this study demonstrated the emergence of learners' prompt literacy skills. The survey data further showed the participants' perceived improvement in their vocabulary learning strategies as a result of engaging in the target AI-powered project. In addition, the participants' post-project reflection revealed three benefits of developing prompt literacy: enjoyment from manifesting imagined outcomes; recognition of its importance for communication, problem-solving and career development; and the enhanced understanding of the collaborative nature of human-AI interaction. These findings suggest that prompt literacy is an increasingly crucial literacy for the AI era.

Submitted 9 November, 2023; originally announced November 2023.

Comments: 22 pages

arXiv:2311.00737 [pdf]

Real-Time Magnetic Tracking and Diagnosis of COVID-19 via Machine Learning

Authors: Dang Nguyen, Phat K. Huynh, Vinh Duc An Bui, Kee Young Hwang, Nityanand Jain, Chau Nguyen, Le Huu Nhat Minh, Le Van Truong, Xuan Thanh Nguyen, Dinh Hoang Nguyen, Le Tien Dung, Trung Q. Le, Manh-Huong Phan

Abstract: The COVID-19 pandemic underscored the importance of reliable, noninvasive diagnostic tools for robust public health interventions. In this work, we fused magnetic respiratory sensing technology (MRST) with machine learning (ML) to create a diagnostic platform for real-time tracking and diagnosis of COVID-19 and other respiratory diseases. The MRST precisely captures breathing patterns through three specific breath testing protocols: normal breath, holding breath, and deep breath. We collected breath data from both COVID-19 patients and healthy subjects in Vietnam using this platform, which then served to train and validate ML models. Our evaluation encompassed multiple ML algorithms, including support vector machines and deep learning models, assessing their ability to diagnose COVID-19. Our multi-model validation methodology ensures a thorough comparison and grants the adaptability to select the most suitable model, striking a balance between diagnostic precision and model interpretability. The findings highlight the exceptional potential of our diagnostic tool in pinpointing respiratory anomalies, achieving over 90% accuracy. This innovative sensor technology can be seamlessly integrated into healthcare settings for patient monitoring, marking a significant enhancement for the healthcare infrastructure.

Submitted 1 November, 2023; originally announced November 2023.

arXiv:2308.10571 [pdf, other]

Overcoming Overconfidence for Active Learning

Authors: Yujin Hwang, Won Jo, Juyoung Hong, Yukyung Choi

Abstract: It is not an exaggeration to say that the recent progress in artificial intelligence technology depends on large-scale and high-quality data. Simultaneously, a prevalent issue exists everywhere: the budget for data labeling is constrained. Active learning is a prominent approach for addressing this issue, where valuable data for labeling is selected through a model and utilized to iteratively adjust the model. However, due to the limited amount of data in each iteration, the model is vulnerable to bias; thus, it is more likely to yield overconfident predictions. In this paper, we present two novel methods to address the problem of overconfidence that arises in the active learning scenario. The first is an augmentation strategy named Cross-Mix-and-Mix (CMaM), which aims to calibrate the model by expanding the limited training distribution. The second is a selection strategy named Ranked Margin Sampling (RankedMS), which prevents choosing data that leads to overly confident predictions. Through various experiments and analyses, we are able to demonstrate that our proposals facilitate efficient data selection by alleviating overconfidence, even though they are readily applicable.

Submitted 21 August, 2023; originally announced August 2023.
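
For context on the selection strategy mentioned in the abstract above, the following is a minimal sketch of plain margin sampling, the standard active-learning baseline. It is not RankedMS, whose ranking rule is not described in this listing; the function names and the list-of-probabilities input format are assumptions made for illustration.

```python
def margin(probs):
    """Gap between the two largest class probabilities of one sample."""
    top1, top2 = sorted(probs, reverse=True)[:2]
    return top1 - top2

def select_by_margin(pool_probs, budget):
    """Return indices of the `budget` unlabeled samples with the smallest margin.

    A small margin means the model can barely separate its top two classes,
    so the sample is treated as informative and sent for labeling.
    """
    ranked = sorted(range(len(pool_probs)), key=lambda i: margin(pool_probs[i]))
    return ranked[:budget]
```

An overconfident model inflates the top probability and therefore the margin, which is why calibration matters for this family of selection rules.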

arXiv:2308.10166 [pdf, other]

Cell Spatial Analysis in Crohn's Disease: Unveiling Local Cell Arrangement Pattern with Graph-based Signatures

Authors: Shunxing Bao, Sichen Zhu, Vasantha L Kolachala, Lucas W. Remedios, Yeonjoo Hwang, Yutong Sun, Ruining Deng, Can Cui, Yike Li, Jia Li, Joseph T. Roland, Qi Liu, Ken S. Lau, Subra Kugathasan, Peng Qiu, Keith T. Wilson, Lori A. Coburn, Bennett A. Landman, Yuankai Huo

Abstract: Crohn's disease (CD) is a chronic and relapsing inflammatory condition that affects segments of the gastrointestinal tract. CD activity is determined by histological findings, particularly the density of neutrophils observed on Hematoxylin and Eosin stains (H&E) imaging. However, understanding the broader morphometry and local cell arrangement beyond cell counting and tissue morphology remains challenging. To address this, we characterize six distinct cell types from H&E images and develop a novel approach for the local spatial signature of each cell. Specifically, we create a 10-cell neighborhood matrix, representing neighboring cell arrangements for each individual cell. Utilizing t-SNE for non-linear spatial projection in scatter-plot and Kernel Density Estimation contour-plot formats, our study examines patterns of differences in the cellular environment associated with the odds ratio of spatial patterns between active CD and control groups. This analysis is based on data collected at the two research institutes. The findings reveal heterogeneous nearest-neighbor patterns, signifying distinct tendencies of cell clustering, with a particular focus on the rectum region. These variations underscore the impact of data heterogeneity on cell spatial arrangements in CD patients. Moreover, the spatial distribution disparities between the two research sites highlight the significance of collaborative efforts among healthcare organizations. All research analysis pipeline tools are available at https://github.com/MASILab/cellNN.

Submitted 20 August, 2023; originally announced August 2023.

Comments: Submitted to SPIE Medical Imaging. San Diego, CA. February 2024

arXiv:2303.14711 [pdf, other]

Unsupervised detection of small hyperreflective features in ultrahigh resolution optical coherence tomography

Authors: Marcel Reimann, Jungeun Won, Hiroyuki Takahashi, Antonio Yaghy, Yunchan Hwang, Stefan Ploner, Junhong Lin, Jessica Girgis, Kenneth Lam, Siyu Chen, Nadia K. Waheed, Andreas Maier, James G. Fujimoto

Abstract: Recent advances in optical coherence tomography such as the development of high speed ultrahigh resolution scanners and corresponding signal processing techniques may reveal new potential biomarkers in retinal diseases. Newly visible features are, for example, small hyperreflective specks in age-related macular degeneration. Identifying these new markers is crucial to investigate potential association with disease progression and treatment outcomes. Therefore, it is necessary to reliably detect these features in 3D volumetric scans. Because manual labeling of entire volumes is infeasible, a need for automatic detection arises. Labeled datasets are often not publicly available and there are usually large variations in scan protocols and scanner types. Thus, this work focuses on an unsupervised approach that is based on local peak-detection and random walker segmentation to detect small features on each B-scan of the volume.

Submitted 26 March, 2023; originally announced March 2023.

Comments: Accepted as poster at BVM workshop 2023 (https://www.bvm-workshop.org/). The arXiv version provides full quality figures. 6 pages content (2 figures)

arXiv:2303.08389 [pdf, other]

PR-MCS: Perturbation Robust Metric for MultiLingual Image Captioning

Authors: Yongil Kim, Yerin Hwang, Hyeongu Yun, Seunghyun Yoon, Trung Bui, Kyomin Jung

Abstract: Vulnerability to lexical perturbation is a critical weakness of automatic evaluation metrics for image captioning. This paper proposes Perturbation Robust Multi-Lingual CLIPScore (PR-MCS), which exhibits robustness to such perturbations, as a novel reference-free image captioning metric applicable to multiple languages. To achieve perturbation robustness, we fine-tune the text encoder of CLIP with our language-agnostic method to distinguish the perturbed text from the original text. To verify the robustness of PR-MCS, we introduce a new fine-grained evaluation dataset consisting of detailed captions, critical objects, and the relationships between the objects for 3,000 images in five languages. In our experiments, PR-MCS significantly outperforms baseline metrics in capturing lexical noise of all perturbation types in all five languages, proving that PR-MCS is highly robust to lexical perturbations.

Submitted 15 March, 2023; originally announced March 2023.

arXiv:2303.08329 [pdf, other]

Cross-speaker Emotion Transfer by Manipulating Speech Style Latents

Authors: Suhee Jo, Younggun Lee, Yookyung Shin, Yeongtae Hwang, Taesu Kim

Abstract: In recent years, emotional text-to-speech has shown considerable progress. However, it requires a large amount of labeled data, which is not easily accessible. Even if it is possible to acquire an emotional speech dataset, there is still a limitation in controlling emotion intensity. In this work, we propose a novel method for cross-speaker emotion transfer and manipulation using vector arithmetic in latent style space. By leveraging only a few labeled samples, we generate emotional speech from reading-style speech without losing the speaker identity. Furthermore, emotion strength is readily controllable using a scalar value, providing an intuitive way for users to manipulate speech. Experimental results show the proposed method affords superior performance in terms of expressiveness, naturalness, and controllability, preserving speaker identity.

Submitted 14 March, 2023; originally announced March 2023.

Comments: accepted to ICASSP 2023
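
The idea of vector arithmetic in a latent style space with a scalar strength, as mentioned in the abstract above, can be sketched as follows. This is an illustration under stated assumptions and not the authors' method: the mean-difference direction, the plain-list vector representation, and all function names are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def emotion_direction(emotional_styles, neutral_styles):
    """Assumed emotion direction: mean emotional style minus mean neutral style."""
    mu_e = mean_vector(emotional_styles)
    mu_n = mean_vector(neutral_styles)
    return [e - n for e, n in zip(mu_e, mu_n)]

def apply_emotion(style, direction, strength):
    """Shift a speaker's style vector along the direction, scaled by a scalar."""
    return [s + strength * d for s, d in zip(style, direction)]
```

With `strength` set to 0 the style vector is returned unchanged, and larger values move it further along the assumed emotion direction.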

arXiv:2211.01629 [pdf, other]

Image-based Early Detection System for Wildfires

Authors: Omkar Ranadive, Jisu Kim, Serin Lee, Youngseo Cha, Heechan Park, Minkook Cho, Young K. Hwang

Abstract: Wildfires are a disastrous phenomenon which cause damage to land, loss of property, air pollution, and even loss of human life. Due to the warmer and drier conditions created by climate change, more severe and uncontrollable wildfires are expected to occur in the coming years. This could lead to a global wildfire crisis and have dire consequences on our planet. Hence, it has become imperative to use technology to help prevent the spread of wildfires. One way to prevent the spread of wildfires before they become too large is to perform early detection, i.e., detecting the smoke before the actual fire starts. In this paper, we present our Wildfire Detection and Alert System, which uses machine learning to detect wildfire smoke with a high degree of accuracy and can send immediate alerts to users. Our technology is currently being used in the USA to monitor data coming in from hundreds of cameras daily. We show that our system has a high true detection rate and a low false detection rate. Our performance evaluation study also shows that, on average, our system detects wildfire smoke faster than an actual person.

Submitted 3 November, 2022; originally announced November 2022.

Comments: Published in Tackling Climate Change with Machine Learning workshop, Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022)

arXiv:2210.11672 [pdf, other]

Stochastic Adaptive Activation Function

Authors: Kyungsu Lee, Jaeseung Yang, Haeyun Lee, Jae Youn Hwang

Abstract: The simulation of human neurons and neurotransmission mechanisms has been realized in deep neural networks based on the theoretical implementations of activation functions. However, recent studies have reported that the threshold potential of neurons exhibits different values according to the locations and types of individual neurons, and that the activation functions have limitations in terms of representing this variability. Therefore, this study proposes a simple yet effective activation function that facilitates different thresholds and adaptive activations according to the positions of units and the contexts of inputs. Furthermore, the proposed activation function mathematically exhibits a more generalized form of the Swish activation function, and thus we denoted it as Adaptive SwisH (ASH). ASH highlights informative features that exhibit large values in the top percentiles in an input, whereas it rectifies low values. Most importantly, ASH exhibits trainable, adaptive, and context-aware properties compared to other activation functions. Furthermore, ASH represents a general formula of the previously studied activation function and provides a reasonable mathematical background for the superior performance. To validate the effectiveness and robustness of ASH, we implemented ASH into many deep learning models for various tasks, including classification, detection, segmentation, and image generation. Experimental analysis demonstrates that our activation function can provide the benefits of more accurate prediction and earlier convergence in many deep learning applications.

Submitted 20 October, 2022; originally announced October 2022.
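
The idea of a Swish-like activation whose threshold depends on the input can be illustrated with a small sketch. This is not the ASH formula from the paper, which the abstract does not state. The function name `adaptive_gate_activation`, the threshold `mean + z * std`, and the parameters `z` and `slope` are assumptions chosen only to show how values in the upper range of an input can pass through while lower values are pushed toward zero.

```python
import math
import statistics


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def swish(x: float) -> float:
    """Standard Swish activation: x * sigmoid(x)."""
    return x * sigmoid(x)


def adaptive_gate_activation(xs, z=1.0, slope=4.0):
    """Hypothetical Swish-like activation with an input-dependent threshold.

    Assumption (not from the paper): the threshold is mean + z * std of the
    input, so only values in the upper range of this input pass the gate.
    """
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    threshold = mu + z * sigma
    return [x * sigmoid(slope * (x - threshold)) for x in xs]


if __name__ == "__main__":
    features = [-2.0, -0.5, 0.0, 0.3, 0.8, 5.0]
    print(adaptive_gate_activation(features))
```

In this sketch, an input with zero mean, `z=0`, and `slope=1` gives a threshold of zero, so the function reduces to plain Swish. That matches the abstract's description of ASH as a generalization of Swish only in spirit.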

arXiv:2209.09491 [pdf, other]

Deep Q-Network for AI Soccer

Authors: Curie Kim, Yewon Hwang, Jong-Hwan Kim

Abstract: Reinforcement learning has shown an outstanding performance in the applications of games, particularly in Atari games as well as Go. Based on these successful examples, we attempt to apply one of the well-known reinforcement learning algorithms, Deep Q-Network, to the AI Soccer game. AI Soccer is a 5:5 robot soccer game where each participant develops an algorithm that controls five robots in a team to defeat the opponent participant. Deep Q-Network is designed to implement our original rewards, the state space, and the action space to train each agent so that it can take proper actions in different situations during the game. Our algorithm was able to successfully train the agents, and its performance was preliminarily proven through the mini-competition against 10 teams wishing to take part in the AI Soccer international competition. The competition was organized by the AI World Cup committee, in conjunction with the WCG 2019 Xi'an AI Masters. With our algorithm, we got the achievement of advancing to the round of 16 in this international competition with 130 teams from 39 countries.

Submitted 21 September, 2022; v1 submitted 20 September, 2022; originally announced September 2022.

arXiv:2207.06000 [pdf, other]

Text-driven Emotional Style Control and Cross-speaker Style Transfer in Neural TTS

Authors: Yookyung Shin, Younggun Lee, Suhee Jo, Yeongtae Hwang, Taesu Kim

Abstract: Expressive text-to-speech has shown improved performance in recent years. However, the style control of synthetic speech is often restricted to discrete emotion categories and requires training data recorded by the target speaker in the target style. In many practical situations, users may not have reference speech recorded in target emotion but still be interested in controlling speech style just by typing text description of desired emotional style. In this paper, we propose a text-based interface for emotional style control and cross-speaker style transfer in multi-speaker TTS. We propose the bi-modal style encoder which models the semantic relationship between text description embedding and speech style embedding with a pretrained language model. To further improve cross-speaker style transfer on disjoint, multi-style datasets, we propose the novel style loss. The experimental results show that our model can generate high-quality expressive speech even in unseen style.

Submitted 13 July, 2022; originally announced July 2022.

Comments: Accepted to Interspeech 2022

arXiv:2205.09185 [pdf, other]

doi 10.1016/j.nima.2022.167748

AI-assisted Optimization of the ECCE Tracking System at the Electron Ion Collider

Authors: C. Fanelli, Z. Papandreou, K. Suresh, J. K. Adkins, Y. Akiba, A. Albataineh, M. Amaryan, I. C. Arsene, C. Ayerbe Gayoso, J. Bae, X. Bai, M. D. Baker, M. Bashkanov, R. Bellwied, F. Benmokhtar, V. Berdnikov, J. C. Bernauer, F. Bock, W. Boeglin, M. Borysova, E. Brash, P. Brindza, W. J. Briscoe, M. Brooks, S. Bueltmann, et al. (258 additional authors not shown)

Abstract: The Electron-Ion Collider (EIC) is a cutting-edge accelerator facility that will study the nature of the "glue" that binds the building blocks of the visible matter in the universe. The proposed experiment will be realized at Brookhaven National Laboratory in approximately 10 years from now, with detector design and R&D currently ongoing. Notably, EIC is one of the first large-scale facilities to leverage Artificial Intelligence (AI) already starting from the design and R&D phases. The EIC Comprehensive Chromodynamics Experiment (ECCE) is a consortium that proposed a detector design based on a 1.5T solenoid. The EIC detector proposal review concluded that the ECCE design will serve as the reference design for an EIC detector. Herein we describe a comprehensive optimization of the ECCE tracker using AI. The work required a complex parametrization of the simulated detector system. Our approach dealt with an optimization problem in a multidimensional design space driven by multiple objectives that encode the detector performance, while satisfying several mechanical constraints. We describe our strategy and show results obtained for the ECCE tracking system. The AI-assisted design is agnostic to the simulation framework and can be extended to other sub-detectors or to a system of sub-detectors to further optimize the performance of the EIC detector.

Submitted 19 May, 2022; v1 submitted 18 May, 2022; originally announced May 2022.

Comments: 16 pages, 18 figures, 2 appendices, 3 tables

arXiv:2204.02403 [pdf, other]

Explainable Deep Learning Algorithm for Distinguishing Incomplete Kawasaki Disease by Coronary Artery Lesions on Echocardiographic Imaging

Authors: Haeyun Lee, Yongsoon Eun, Jae Youn Hwang, Lucy Youngmin Eun

Abstract: Background and Objective: Incomplete Kawasaki disease (KD) has often been misdiagnosed due to a lack of the clinical manifestations of classic KD. However, it is associated with a markedly higher prevalence of coronary artery lesions. Identifying coronary artery lesions by echocardiography is important for the timely diagnosis of and favorable outcomes in KD. Moreover, similar to KD, coronavirus disease 2019, currently causing a worldwide pandemic, also manifests with fever; therefore, it is crucial at this moment that KD should be distinguished clearly among the febrile diseases in children. In this study, we aimed to validate a deep learning algorithm for classification of KD and other acute febrile diseases. Methods: We obtained coronary artery images by echocardiography of children (n = 88 for KD; n = 65 for pneumonia). We trained six deep learning networks (VGG19, Xception, ResNet50, ResNext50, SE-ResNet50, and SE-ResNext50) using the collected data. Results: SE-ResNext50 showed the best performance in terms of accuracy, specificity, and precision in the classification. SE-ResNext50 offered a precision of 76.35%, a sensitivity of 82.64%, and a specificity of 58.12%. Conclusions: The results of our study suggested that deep learning algorithms have similar performance to an experienced cardiologist in detecting coronary artery lesions to facilitate the diagnosis of KD.

Submitted 5 April, 2022; originally announced April 2022.

arXiv:2202.09150 [pdf, other]

Personalization Trade-offs in Designing a Dialogue-based Information System for Support-Seeking of Sexual Violence Survivors

Authors: Hyeok Kim, Youjin Hwang, Jieun Lee, Youngjin Kwon, Yujin Park, Joonhwan Lee

Abstract: The lack of reliable, personalized information often complicates sexual violence survivors' support-seeking. Recently, there is an emerging approach to conversational information systems for support-seeking of sexual violence survivors, featuring personalization with wide availability and anonymity. However, a single best solution might not exist as sexual violence survivors have different needs and purposes in seeking support channels. To better envision conversational support-seeking systems for sexual violence survivors, we explore personalization trade-offs in designing such information systems. We implement a high-fidelity prototype dialogue-based information system through four design workshop sessions with three professional caregivers and interviewed with four self-identified survivors using our prototype. We then identify two forms of personalization trade-offs for conversational support-seeking systems: (1) specificity and sensitivity in understanding users and (2) relevancy and inclusiveness in providing information. To handle these trade-offs, we propose a reversed approach that starts from designing information and inclusive tailoring that considers unspecified needs, respectively.

Submitted 18 February, 2022; originally announced February 2022.

Comments: 15 pages, 2 figures, 1 table, accepted for CHI 2022

arXiv:2202.01863 [pdf]

Best Practices and Scoring System on Reviewing A.I. based Medical Imaging Papers: Part 1 Classification

Authors: Timothy L. Kline, Felipe Kitamura, Ian Pan, Amine M. Korchi, Neil Tenenholtz, Linda Moy, Judy Wawira Gichoya, Igor Santos, Steven Blumer, Misha Ysabel Hwang, Kim-Ann Git, Abishek Shroff, Elad Walach, George Shih, Steve Langer

Abstract: With the recent advances in A.I. methodologies and their application to medical imaging, there has been an explosion of related research programs utilizing these techniques to produce state-of-the-art classification performance. Ultimately, these research programs culminate in submission of their work for consideration in peer reviewed journals. To date, the criteria for acceptance vs. rejection is often subjective; however, reproducible science requires reproducible review. The Machine Learning Education Sub-Committee of SIIM has identified a knowledge gap and a serious need to establish guidelines for reviewing these studies. Although there have been several recent papers with this goal, this present work is written from the machine learning practitioners standpoint. In this series, the committee will address the best practices to be followed in an A.I.-based study and present the required sections in terms of examples and discussion of what should be included to make the studies cohesive, reproducible, accurate, and self-contained. This first entry in the series focuses on the task of image classification. Elements such as dataset curation, data pre-processing steps, defining an appropriate reference standard, data partitioning, model architecture and training are discussed. The sections are presented as they would be detailed in a typical manuscript, with content describing the necessary information that should be included to make sure the study is of sufficient quality to be considered for publication. The goal of this series is to provide resources to not only help improve the review process for A.I.-based medical imaging papers, but to facilitate a standard for the information that is presented within all components of the research study. We hope to provide quantitative metrics in what otherwise may be a qualitative review process.

Submitted 3 February, 2022; originally announced February 2022.

arXiv:2202.00783 [pdf, other]

Modeling ventilation in a low-income house in Dhaka, Bangladesh

Authors: Yunjae Hwang, Laura Kwong, Mohammad Saeed Munim, Fosiul Alam Nizame, Stephen Luby, Catherine Gorlé

Abstract: According to UNICEF, pneumonia is the leading cause of death in children under 5. 70% of worldwide pneumonia deaths occur in only 15 countries, including Bangladesh. Previous research has indicated a potential association between the incidence of pneumonia and the presence of cross-ventilation in slum housing in Dhaka, Bangladesh. The objective of this research is to establish a validated computational framework that can predict ventilation rates in slum homes to support further studies investigating this correlation. To achieve this objective we employ a building thermal model (BTM) in combination with uncertainty quantification (UQ). The BTM solves for the time-evolution of volume-averaged temperatures in a typical home, considering different ventilation configurations. The UQ method propagates uncertainty in model parameters, weather inputs, and physics models to predict mean values and 95% confidence intervals for the quantities of interest, namely temperatures and ventilation rates in terms of air changes per hour (ACH). The model predictions are compared to on-site field measurements of air and thermal mass temperatures, and of ACH. The results indicate that the use of standard cross- or single-sided ventilation models limits the accuracy of the ACH predictions; in contrast, a model based on a similarity relationship informed by the available ACH measurements can produce more accurate predictions with confidence intervals that encompass the measurements for 12 of the 17 available data points.

Submitted 30 January, 2022; originally announced February 2022.

arXiv:2111.01254 [pdf, ps, other]

Unique Games hardness of Quantum Max-Cut, and a conjectured vector-valued Borell's inequality

Authors: Yeongwoo Hwang, Joe Neeman, Ojas Parekh, Kevin Thompson, John Wright

Abstract: The Gaussian noise stability of a function $f:\mathbb{R}^n \to \{-1, 1\}$ is the expected value of $f(\boldsymbol{x}) \cdot f(\boldsymbol{y})$ over $ρ$-correlated Gaussian random variables $\boldsymbol{x}$ and $\boldsymbol{y}$. Borell's inequality states that for $-1 \leq ρ\leq 0$, this is minimized by the halfspace $f(x) = \mathrm{sign}(x_1)$. In this work, we generalize this result to hold for functions $f:\mathbb{R}^n \to S^{k-1}$ which output $k$-dimensional unit vectors. Our main conjecture, which we call the $\textit{vector-valued Borell's inequality}$, asserts that the expected value of $\langle f(\boldsymbol{x}), f(\boldsymbol{y})\rangle$ is minimized by the function $f(x) = x_{\leq k} / \Vert x_{\leq k} \Vert$, where $x_{\leq k} = (x_1, \ldots, x_k)$. We give several pieces of evidence in favor of this conjecture, including a proof that it does indeed hold in the special case of $n = k$. As an application of this conjecture, we show that it implies several hardness of approximation results for a special case of the local Hamiltonian problem related to the anti-ferromagnetic Heisenberg model known as Quantum Max-Cut. This can be viewed as a natural quantum analogue of the classical Max-Cut problem and has been proposed as a useful testbed for developing algorithms. We show the following, assuming our conjecture: (1) The integrality gap of the basic SDP is $0.498$, matching an existing rounding algorithm. Combined with existing results, this shows that the basic SDP does not achieve the optimal approximation ratio. (2) It is Unique Games-hard (UG-hard) to compute a $(0.956+\varepsilon)$-approximation to the value of the best product state, matching an existing approximation algorithm. (3) It is UG-hard to compute a $(0.956+\varepsilon)$-approximation to the value of the best (possibly entangled) state.

Submitted 28 September, 2022; v1 submitted 1 November, 2021; originally announced November 2021.

Comments: 76 pages; v3 treats the vector-valued Borell's inequality as a conjecture rather than a theorem, due to an error in previous versions
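
The definition of Gaussian noise stability in the abstract above can be checked numerically for the scalar halfspace case. The sketch below is an illustration only and is not taken from the paper. It estimates the expected value of f(x) · f(y) for f(x) = sign(x_1) by Monte Carlo and compares it with the known closed form (2/π) · arcsin(ρ) (Sheppard's formula). The function names, sample count, and seed are assumptions; because sign(x_1) depends only on the first coordinate, sampling one coordinate pair is enough.

```python
import math
import random


def correlated_gaussians(rho: float, rng: random.Random):
    """Draw one pair (x, y) of standard Gaussians with correlation rho."""
    x = rng.gauss(0.0, 1.0)
    z = rng.gauss(0.0, 1.0)
    y = rho * x + math.sqrt(1.0 - rho * rho) * z
    return x, y


def sign(v: float) -> int:
    return 1 if v >= 0 else -1


def halfspace_stability_mc(rho: float, samples: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[sign(x_1) * sign(y_1)] for rho-correlated inputs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        x, y = correlated_gaussians(rho, rng)
        total += sign(x) * sign(y)
    return total / samples


def halfspace_stability_exact(rho: float) -> float:
    """Closed form for the halfspace: (2 / pi) * arcsin(rho)."""
    return (2.0 / math.pi) * math.asin(rho)


if __name__ == "__main__":
    for rho in (-1.0, -0.5, 0.0):
        print(rho, halfspace_stability_mc(rho), halfspace_stability_exact(rho))
```

The vector-valued conjecture replaces the product of signs with an inner product of unit vectors. That case is not covered by this sketch.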

arXiv:2109.05712 [pdf, other]

Contrastive Learning for Context-aware Neural Machine Translation Using Coreference Information

Authors: Yongkeun Hwang, Hyungu Yun, Kyomin Jung

Abstract: Context-aware neural machine translation (NMT) incorporates contextual information of surrounding texts, that can improve the translation quality of document-level machine translation. Many existing works on context-aware NMT have focused on developing new model architectures for incorporating additional contexts and have shown some promising results. However, most existing works rely on cross-entropy loss, resulting in limited use of contextual information. In this paper, we propose CorefCL, a novel data augmentation and contrastive learning scheme based on coreference between the source and contextual sentences. By corrupting automatically detected coreference mentions in the contextual sentence, CorefCL can train the model to be sensitive to coreference inconsistency. We experimented with our method on common context-aware NMT models and two document-level translation tasks. In the experiments, our method consistently improved BLEU of compared models on English-German and English-Korean tasks. We also show that our method significantly improves coreference resolution in the English-German contrastive test suite.

Submitted 13 September, 2021; originally announced September 2021.

Comments: WMT 2021

arXiv:2109.01729 [pdf, other]

Applying the Persona of User's Family Member and the Doctor to the Conversational Agents for Healthcare

Authors: Youjin Hwang, Donghoon Shin, Sion Baek, Bongwon Suh, Joonhwan Lee

Abstract: Conversational agents have been showing lots of opportunities in healthcare by taking over a lot of tasks that used to be done by a human. One of the major functions of conversational healthcare agent is intervening users' daily behaviors. In this case, forming an intimate and trustful relationship with users is one of the major issues. Factors affecting human-agent relationship should be deeply explored to improve long-term acceptance of healthcare agent. Even though a bunch of ideas and researches have been suggested to increase the acceptance of conversational agents in healthcare, challenges still remain. From the preliminary work we conducted, we suggest an idea of applying the personas of users' family members and the doctor who are in the relationship with users in the real world as a solution for forming the rigid relationship between humans and the chatbot.

Submitted 2 September, 2021; originally announced September 2021.

Comments: Accepted at CHI 2020 Workshop on Conversational Agents for Health and Wellbeing

arXiv:2108.09030 [pdf, other]

Type Anywhere You Want: An Introduction to Invisible Mobile Keyboard

Authors: Sahng-Min Yoo, Ue-Hwan Kim, Yewon Hwang, Jong-Hwan Kim

Abstract: Contemporary soft keyboards possess limitations: the lack of physical feedback results in an increase of typos, and the interface of soft keyboards degrades the utility of the screen. To overcome these limitations, we propose an Invisible Mobile Keyboard (IMK), which lets users freely type on the desired area without any constraints. To facilitate a data-driven IMK decoding task, we have collected the most extensive text-entry dataset (approximately 2M pairs of typing positions and the corresponding characters). Additionally, we propose our baseline decoder along with a semantic typo correction mechanism based on self-attention, which decodes such unconstrained inputs with high accuracy (96.0%). Moreover, the user study reveals that the users could type faster and feel convenience and satisfaction to IMK with our decoder. Lastly, we make the source code and the dataset public to contribute to the research community.

Submitted 20 August, 2021; originally announced August 2021.

Comments: Accepted by IJCAI 2021

arXiv:2104.09021 [pdf, other]

Writing in The Air: Unconstrained Text Recognition from Finger Movement Using Spatio-Temporal Convolution

Authors: Ue-Hwan Kim, Yewon Hwang, Sun-Kyung Lee, Jong-Hwan Kim

Abstract: In this paper, we introduce a new benchmark dataset for the challenging writing in the air (WiTA) task -- an elaborate task bridging vision and NLP. WiTA implements an intuitive and natural writing method with finger movement for human-computer interaction (HCI). Our WiTA dataset will facilitate the development of data-driven WiTA systems which thus far have displayed unsatisfactory performance -- due to lack of dataset as well as traditional statistical models they have adopted. Our dataset consists of five sub-datasets in two languages (Korean and English) and amounts to 209,926 video instances from 122 participants. We capture finger movement for WiTA with RGB cameras to ensure wide accessibility and cost-efficiency. Next, we propose spatio-temporal residual network architectures inspired by 3D ResNet. These models perform unconstrained text recognition from finger movement, guarantee a real-time operation by processing 435 and 697 decoding frames-per-second for Korean and English, respectively, and will serve as an evaluation standard. Our dataset and the source codes are available at https://github.com/Uehwan/WiTA.

Submitted 18 April, 2021; originally announced April 2021.

Comments: 10 pages, 6 figures, 6 tables

arXiv:2004.11819 [pdf]

doi 10.1109/TGRS.2020.3010055

Domain Adaptive Transfer Attack (DATA)-based Segmentation Networks for Building Extraction from Aerial Images

Authors: Younghwan Na, Jun Hee Kim, Kyungsu Lee, Juhum Park, Jae Youn Hwang, Jihwan P. Choi

Abstract: Semantic segmentation models based on convolutional neural networks (CNNs) have gained much attention in relation to remote sensing and have achieved remarkable performance for the extraction of buildings from high-resolution aerial images. However, the issue of limited generalization for unseen images remains. When there is a domain gap between the training and test datasets, CNN-based segmentation models trained by a training dataset fail to segment buildings for the test dataset. In this paper, we propose segmentation networks based on a domain adaptive transfer attack (DATA) scheme for building extraction from aerial images. The proposed system combines the domain transfer and adversarial attack concepts. Based on the DATA scheme, the distribution of the input images can be shifted to that of the target images while turning images into adversarial examples against a target network. Defending adversarial examples adapted to the target domain can overcome the performance degradation due to the domain gap and increase the robustness of the segmentation model. Cross-dataset experiments and the ablation study are conducted for the three different datasets: the Inria aerial image labeling dataset, the Massachusetts building dataset, and the WHU East Asia dataset. Compared to the performance of the segmentation network without the DATA scheme, the proposed method shows improvements in the overall IoU. Moreover, it is verified that the proposed method outperforms even when compared to feature adaptation (FA) and output space adaptation (OSA).

Submitted 29 April, 2020; v1 submitted 11 April, 2020; originally announced April 2020.

Comments: 11 pages, 12 figures

arXiv:2001.01401 [pdf, other]

Mel-spectrogram augmentation for sequence to sequence voice conversion

Authors: Yeongtae Hwang, Hyemin Cho, Hongsun Yang, Dong-Ok Won, Insoo Oh, Seong-Whan Lee

Abstract: For training the sequence-to-sequence voice conversion model, we need to handle an issue of insufficient data about the number of speech pairs which consist of the same utterance. This study experimentally investigated the effects of Mel-spectrogram augmentation on training the sequence-to-sequence voice conversion (VC) model from scratch. For Mel-spectrogram augmentation, we adopted the policies proposed in SpecAugment. In addition, we proposed new policies (i.e., frequency warping, loudness and time length control) for more data variations. Moreover, to find the appropriate hyperparameters of augmentation policies without training the VC model, we proposed hyperparameter search strategy and the new metric for reducing experimental cost, namely deformation per deteriorating ratio. We compared the effect of these Mel-spectrogram augmentation methods based on various sizes of training set and augmentation policies. In the experimental results, the time axis warping based policies (i.e., time length control and time warping.) showed better performance than other policies. These results indicate that the use of the Mel-spectrogram augmentation is more beneficial for training the VC model.

Submitted 15 June, 2020; v1 submitted 6 January, 2020; originally announced January 2020.

Comments: 5 pages, 1 figure, 8 tables

arXiv:1902.02905 [pdf, other]

Mobile Artificial Intelligence Technology for Detecting Macula Edema and Subretinal Fluid on OCT Scans: Initial Results from the DATUM alpha Study

Authors: Stephen G. Odaibo, Mikelson MomPremier, Richard Y. Hwang, Salman J. Yousuf, Steven L. Williams, Joshua Grant

Abstract: Artificial Intelligence (AI) is necessary to address the large and growing deficit in retina and healthcare access globally. And mobile AI diagnostic platforms running in the Cloud may effectively and efficiently distribute such AI capability. Here we sought to evaluate the feasibility of Cloud-based mobile artificial intelligence for detection of retinal disease. And to evaluate the accuracy of a particular such system for detection of subretinal fluid (SRF) and macula edema (ME) on OCT scans. A multicenter retrospective image analysis was conducted in which board-certified ophthalmologists with fellowship training in retina evaluated OCT images of the macula. They noted the presence or absence of ME or SRF, then compared their assessment to that obtained from Fluid Intelligence, a mobile AI app that detects SRF and ME on OCT scans. Investigators consecutively selected retinal OCTs, while making effort to balance the number of scans with retinal fluid and scans without. Exclusion criteria included poor scan quality, ambiguous features, macula holes, retinoschisis, and dense epiretinal membranes. Accuracy in the form of sensitivity and specificity of the AI mobile App was determined by comparing its assessments to those of the retina specialists. At the time of this submission, five centers have completed their initial studies. This consists of a total of 283 OCT scans of which 155 had either ME or SRF ("wet") and 128 did not ("dry"). The sensitivity ranged from 82.5% to 97% with a weighted average of 89.3%. The specificity ranged from 52% to 100% with a weighted average of 81.23%. CONCLUSION: Cloud-based Mobile AI technology is feasible for the detection of retinal disease. In particular, Fluid Intelligence (alpha version) is sufficiently accurate as a screening tool for SRF and ME, especially in underserved areas. Further studies and technology development are needed.

Submitted 12 February, 2019; v1 submitted 7 February, 2019; originally announced February 2019.

Comments: Initial results of the DATUM alpha Study were initially presented on August 13th 2018 in the Keynote Address at the 116th National Medical Association Annual Meeting & Scientific Assembly's New Innovations in Ophthalmology Session. The results were also presented on September 21st 2018 in a Podium Lecture during Alumni Day at the University of Michigan--Ann Arbor Kellogg Eye Center
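
The abstract above reports per-center sensitivity and specificity along with weighted averages. The sketch below shows one common way such figures are computed from confusion counts. The per-center counts are hypothetical and are not data from the study. Weighting each center by its number of relevant scans is also an assumption, since the abstract does not state which weights were used.

```python
from dataclasses import dataclass


@dataclass
class CenterCounts:
    """Confusion counts for one center (hypothetical values, not study data)."""
    tp: int  # "wet" scans the app flagged as wet
    fn: int  # "wet" scans the app missed
    tn: int  # "dry" scans the app called dry
    fp: int  # "dry" scans the app flagged as wet


def sensitivity(c: CenterCounts) -> float:
    return c.tp / (c.tp + c.fn)


def specificity(c: CenterCounts) -> float:
    return c.tn / (c.tn + c.fp)


def weighted_average(values, weights) -> float:
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def pooled_metrics(centers):
    """Average sensitivity by wet-scan count and specificity by dry-scan count."""
    sens = weighted_average(
        [sensitivity(c) for c in centers], [c.tp + c.fn for c in centers]
    )
    spec = weighted_average(
        [specificity(c) for c in centers], [c.tn + c.fp for c in centers]
    )
    return sens, spec


if __name__ == "__main__":
    centers = [
        CenterCounts(tp=18, fn=2, tn=15, fp=5),
        CenterCounts(tp=27, fn=3, tn=10, fp=0),
    ]
    print(pooled_metrics(centers))
```

With these weights, the weighted average equals the sensitivity or specificity computed over all centers pooled together.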

arXiv:1809.07998 [pdf, ps, other]

Hierarchical System Mapping for Large-Scale Fault-Tolerant Quantum Computing

Authors: Yongsoo Hwang, Byung-Soo Choi

Abstract: For a large-scale quantum computer, it is important to know precisely and quickly how many quantum computational resources are necessary. Unfortunately, previous methods cannot practically support large-scale quantum computing, and therefore its analysis, because they usually use a non-structured code. To overcome this problem, we propose a fast mapping that uses the hierarchical assembly code, which is much more compact than the non-structured code. During the mapping process, the necessary modules and their interconnections can be dynamically mapped by using the communication bus at the cost of additional qubits. In our study, the proposed method is very fast, taking 1 hour rather than 1500 days for Shor's algorithm to factor a 512-bit integer. Meanwhile, since the hierarchical assembly code has a high degree of locality, it has shorter SWAP chains and hence does not increase the quantum computation time beyond what is expected.

Submitted 21 September, 2018; originally announced September 2018.

arXiv:1807.06233 [pdf, other]

Robust Deep Multi-modal Learning Based on Gated Information Fusion Network

Authors: Jaekyum Kim, Junho Koh, Yecheol Kim, Jaehyung Choi, Youngbae Hwang, Jun Won Choi

Abstract: The goal of multi-modal learning is to use complementary information on the relevant task provided by the multiple modalities to achieve reliable and robust performance. Recently, deep learning has led to significant improvement in multi-modal learning by allowing for information fusion at the intermediate feature levels. This paper addresses the problem of designing a robust deep multi-modal learning architecture in the presence of imperfect modalities. We introduce a deep fusion architecture for object detection which processes each modality using a separate convolutional neural network (CNN) and constructs the joint feature map by combining the intermediate features from the CNNs. In order to facilitate robustness to degraded modalities, we employ the gated information fusion (GIF) network, which weights the contribution from each modality according to the input feature maps to be fused. The weights are determined through convolutional layers followed by a sigmoid function and trained along with the information fusion network in an end-to-end fashion. Our experiments show that the proposed GIF network offers additional architectural flexibility to achieve robust performance in handling some degraded modalities, and show a significant performance improvement based on the Single Shot Detector (SSD) on the KITTI dataset using the proposed fusion network and data augmentation schemes.

Submitted 2 November, 2018; v1 submitted 17 July, 2018; originally announced July 2018.

Comments: 2018 Asian Conference on Computer Vision (ACCV)

arXiv:1801.03009 [pdf, other]

doi 10.1016/j.cma.2018.12.022

Development of hp-inverse model by using generalized polynomial chaos

Authors: Kyongmin Yeo, Youngdeok Hwang, Xiao Liu, Jayant Kalagnanam

Abstract: We present an $hp$-inverse model to estimate a smooth, non-negative source function from a limited number of observations for a two-dimensional linear source inversion problem. A standard least-squares inverse model is formulated by using a set of Gaussian radial basis functions (GRBF) on a rectangular mesh system with a uniform grid space. Here, the choice of the mesh system is modeled as a random variable and the generalized polynomial chaos (gPC) expansion is used to represent the random mesh system. It is shown that the convolution of gPC and GRBF provides hierarchical basis functions for the linear source inverse model with the $hp$-refinement capability. We propose a mixed $l_1$ and $l_2$ regularization to exploit the hierarchical nature of the basis functions to find a sparse solution. The $hp$-inverse model has an advantage over the standard least-squares inverse model when the number of data is limited. It is shown that the $hp$-inverse model provides a good estimate of the source function even when the number of unknown parameters ($m$) is much larger than the number of data ($n$), e.g., $m/n > 40$.

Submitted 14 December, 2018; v1 submitted 9 January, 2018; originally announced January 2018.

arXiv:1712.09721 [pdf, ps, other]

Analysis of the Game-Theoretic Modeling of Backscatter Wireless Sensor Networks under Smart Interference

Authors: Seung Gwan Hong, Yu Min Hwang, Sun Yui Lee, Yoan Shin, Dong In Kim, Jin Young Kim

Abstract: In this paper, we study an interference avoidance scenario in the presence of a smart interferer which can rapidly observe the transmit power of a backscatter wireless sensor network (WSN) and effectively interrupt backscatter signals. We consider power control with sub-channel allocation to avoid interference attacks and a time-switching ratio for backscattering and RF energy harvesting in backscatter WSNs. We formulate the problem based on Stackelberg game theory and compute the optimal transmit power, time-switching ratio, and sub-channel allocation parameter to maximize a utility function against the smart interference. We propose two algorithms for the utility maximization using Lagrangian dual decomposition for the backscatter WSN and the smart interference to prove the existence of the Stackelberg equilibrium. Numerical results show that the proposed algorithms effectively maximize the utility, compared to that of the algorithm based on the Nash game, so as to overcome smart interference in backscatter communications.

Submitted 21 December, 2017; originally announced December 2017.

Comments: 13 pages

arXiv:1601.05447 [pdf, other]

Detecting Temporally Consistent Objects in Videos through Object Class Label Propagation

Authors: Subarna Tripathi, Serge Belongie, Youngbae Hwang, Truong Nguyen

Abstract: Object proposals for detecting moving or static video objects need to address issues such as speed, memory complexity and temporal consistency. We propose an efficient Video Object Proposal (VOP) generation method and show its efficacy in learning a better video object detector. A deep-learning based video object detector learned using the proposed VOP achieves state-of-the-art detection performance on the Youtube-Objects dataset. We further propose a clustering of VOPs which can efficiently be used for detecting objects in video in a streaming fashion. As opposed to applying per-frame convolutional neural network (CNN) based object detection, our proposed method called Objects in Video Enabler thRough LAbel Propagation (OVERLAP) needs to classify only a small fraction of all candidate proposals in every video frame through streaming clustering of object proposals and class-label propagation. Source code will be made available soon.

Submitted 20 January, 2016; originally announced January 2016.

Comments: Accepted for publication in WACV 2016

arXiv:1511.08343 [pdf, other]

The Automatic Statistician: A Relational Perspective

Authors: Yunseong Hwang, Anh Tong, Jaesik Choi

Abstract: Gaussian Processes (GPs) provide a general and analytically tractable way of modeling complex time-varying, nonparametric functions. The Automatic Bayesian Covariance Discovery (ABCD) system constructs natural-language descriptions of time-series data by treating unknown time-series data nonparametrically using a GP with a composite covariance kernel function. Unfortunately, learning a composite covariance kernel with a single time-series data set often results in a less informative kernel that may not give qualitative, distinctive descriptions of data. We address this challenge by proposing two relational kernel learning methods which can model multiple time-series data sets by finding common, shared causes of changes. We show that the relational kernel learning methods find more accurate models for regression problems on several real-world data sets: US stock data, US house price index data and currency exchange rate data.

Submitted 11 February, 2016; v1 submitted 26 November, 2015; originally announced November 2015.

arXiv:1509.02441 [pdf]

Semantic Video Segmentation: Exploring Inference Efficiency

Authors: Subarna Tripathi, Serge Belongie, Youngbae Hwang, Truong Nguyen

Abstract: We explore the efficiency of CRF inference beyond image-level semantic segmentation and perform joint inference in video frames. The key idea is to combine the best of two worlds: semantic co-labeling and more expressive models. Our formulation enables us to perform inference over ten thousand images within seconds and makes the system amenable to performing video semantic segmentation most effectively. On the CamVid dataset, with TextonBoost unaries, our proposed method achieves up to 8% improvement in accuracy over individual semantic image segmentation without additional time overhead. The source code is available at https://github.com/subtri/video_inference

Submitted 4 September, 2015; originally announced September 2015.

Comments: To appear in Proc. of ISOCC 2015

arXiv:1404.5020 [pdf, ps, other]

doi 10.1109/IECON.2014.7049328

Training-Free Non-Intrusive Load Monitoring of Electric Vehicle Charging with Low Sampling Rate

Authors: Zhilin Zhang, Jae Hyun Son, Ying Li, Mark Trayer, Zhouyue Pi, Dong Yoon Hwang, Joong Ki Moon

Abstract: Non-intrusive load monitoring (NILM) is an important topic in smart-grid and smart-home. Many energy disaggregation algorithms have been proposed to detect various individual appliances from one aggregated signal observation. However, few works have studied the energy disaggregation of plug-in electric vehicle (EV) charging in the residential environment, since EV charging at home has emerged only recently. Recent studies showed that EV charging has a large impact on smart-grid, especially in summer. Therefore, EV charging monitoring has become a more important and urgent missing piece in energy disaggregation. In this paper, we present a novel method to disaggregate EV charging signals from aggregated real power signals. The proposed method can effectively mitigate interference coming from air-conditioner (AC), enabling accurate EV charging detection and energy estimation under the presence of AC power signals. Besides, the proposed algorithm requires no training, demands a light computational load, delivers high estimation accuracy, and works well for data recorded at the low sampling rate of 1/60 Hz. When the algorithm is tested on real-world data recorded from 11 houses over about a whole year (total 125 months' worth of data), the averaged error in estimating energy consumption of EV charging is 15.7 kWh/month (while the true averaged energy consumption of EV charging is 208.5 kWh/month), and the averaged normalized mean square error in disaggregating EV charging load signals is 0.19.

Submitted 6 August, 2014; v1 submitted 20 April, 2014; originally announced April 2014.

Comments: Accepted by The 40th Annual Conference of the IEEE Industrial Electronics Society (IECON 2014)

arXiv:1402.3557 [pdf]

Improving Streaming Video Segmentation with Early and Mid-Level Visual Processing

Authors: Subarna Tripathi, Youngbae Hwang, Serge Belongie, Truong Nguyen

Abstract: Despite recent advances in video segmentation, many opportunities remain to improve it using a variety of low and mid-level visual cues. We propose improvements to the leading streaming graph-based hierarchical video segmentation (streamGBH) method based on early and mid level visual processing. The extensive experimental analysis of our approach validates the improvement of hierarchical supervoxel representation by incorporating motion and color with effective filtering. We also pose and illuminate some open questions towards intermediate level video analysis as further extension to streamGBH. We exploit the supervoxels as an initialization towards estimation of dominant affine motion regions, followed by merging of such motion regions in order to hierarchically segment a video in a novel motion-segmentation framework which aims at subsequent applications such as foreground recognition.

Submitted 14 February, 2014; originally announced February 2014.

Comments: WACV accepted paper

arXiv:1312.5794 [pdf, ps, other]

Random Basketball Routing for ZigBee based Sensor Networks

Authors: Dong Min Kim, Young Ju Hwang, Seong-Lyun Kim, Gwang-Ja Jin, Bong-Soo Kim

Abstract: Random basketball routing (BR) \cite{Hwang} is a simple protocol that integrates MAC and multihop routing in a cross-layer optimized manner. Due to its lightness and performance, BR would be quite suitable for sensor networks, where communication nodes are usually simple devices. In this paper, we describe how we implemented BR in a ZigBee-based (IEEE 802.15.4) sensor network. In \cite{Hwang}, it is verified that BR takes advantage of dynamic environments (in particular, node mobility); here, however, we focus on how BR works under static situations. For implementation purposes, we add some features, such as destination RSSI measuring and a loop-free procedure, to the original BR. With the implemented testbed, we compare the performance of BR with that of the simplified AODV with CSMA/CA. The result is that BR has merits in terms of the number of hops to traverse the network. Considering the simple structure of BR and its possible energy efficiency, we can conclude that BR can be a good candidate for sensor networks under both dynamic and static environments.

Submitted 19 December, 2013; originally announced December 2013.

Journal ref: in Proc. IEEE APWCS 2007, Hsinchu, Taiwan, August, 2007

Showing 1–42 of 42 results for author: Hwang, Y