-
FeatureSORT: Essential Features for Effective Tracking
Authors:
Hamidreza Hashempoor,
Rosemary Koikara,
Yu Dong Hwang
Abstract:
In this work, we introduce a novel tracker designed for online multiple object tracking with a focus on being simple, while being effective. we provide multiple feature modules each of which stands for a particular appearance information. By integrating distinct appearance features, including clothing color, style, and target direction, alongside a ReID network for robust embedding extraction, our…
▽ More
In this work, we introduce a novel tracker designed for online multiple object tracking with a focus on being simple, while being effective. we provide multiple feature modules each of which stands for a particular appearance information. By integrating distinct appearance features, including clothing color, style, and target direction, alongside a ReID network for robust embedding extraction, our tracker significantly enhances online tracking accuracy. Additionally, we propose the incorporation of a stronger detector and also provide an advanced post processing methods that further elevate the tracker's performance. During real time operation, we establish measurement to track associated distance function which includes the IoU, direction, color, style, and ReID features similarity information, where each metric is calculated separately. With the design of our feature related distance function, it is possible to track objects through longer period of occlusions, while keeping the number of identity switches comparatively low. Extensive experimental evaluation demonstrates notable improvement in tracking accuracy and reliability, as evidenced by reduced identity switches and enhanced occlusion handling. These advancements not only contribute to the state of the art in object tracking but also open new avenues for future research and practical applications demanding high precision and reliability.
△ Less
Submitted 5 July, 2024;
originally announced July 2024.
-
CAFO: Feature-Centric Explanation on Time Series Classification
Authors:
Jaeho Kim,
Seok-Ju Hahn,
Yoontae Hwang,
Junghye Lee,
Seulki Lee
Abstract:
In multivariate time series (MTS) classification, finding the important features (e.g., sensors) for model performance is crucial yet challenging due to the complex, high-dimensional nature of MTS data, intricate temporal dynamics, and the necessity for domain-specific interpretations. Current explanation methods for MTS mostly focus on time-centric explanations, apt for pinpointing important time…
▽ More
In multivariate time series (MTS) classification, finding the important features (e.g., sensors) for model performance is crucial yet challenging due to the complex, high-dimensional nature of MTS data, intricate temporal dynamics, and the necessity for domain-specific interpretations. Current explanation methods for MTS mostly focus on time-centric explanations, apt for pinpointing important time periods but less effective in identifying key features. This limitation underscores the pressing need for a feature-centric approach, a vital yet often overlooked perspective that complements time-centric analysis. To bridge this gap, our study introduces a novel feature-centric explanation and evaluation framework for MTS, named CAFO (Channel Attention and Feature Orthgonalization). CAFO employs a convolution-based approach with channel attention mechanisms, incorporating a depth-wise separable channel attention module (DepCA) and a QR decomposition-based loss for promoting feature-wise orthogonality. We demonstrate that this orthogonalization enhances the separability of attention distributions, thereby refining and stabilizing the ranking of feature importance. This improvement in feature-wise ranking enhances our understanding of feature explainability in MTS. Furthermore, we develop metrics to evaluate global and class-specific feature importance. Our framework's efficacy is validated through extensive empirical analyses on two major public benchmarks and real-world datasets, both synthetic and self-collected, specifically designed to highlight class-wise discriminative features. The results confirm CAFO's robustness and informative capacity in assessing feature importance in MTS classification tasks. This study not only advances the understanding of feature-centric explanations in MTS but also sets a foundation for future explorations in feature-centric explanations.
△ Less
Submitted 11 June, 2024; v1 submitted 3 June, 2024;
originally announced June 2024.
-
Understanding On-the-Fly End-User Robot Programming
Authors:
Laura Stegner,
Yuna Hwang,
David Porfirio,
Bilge Mutlu
Abstract:
Novel end-user programming (EUP) tools enable on-the-fly (i.e., spontaneous, easy, and rapid) creation of interactions with robotic systems. These tools are expected to empower users in determining system behavior, although very little is understood about how end users perceive, experience, and use these systems. In this paper, we seek to address this gap by investigating end-user experience with…
▽ More
Novel end-user programming (EUP) tools enable on-the-fly (i.e., spontaneous, easy, and rapid) creation of interactions with robotic systems. These tools are expected to empower users in determining system behavior, although very little is understood about how end users perceive, experience, and use these systems. In this paper, we seek to address this gap by investigating end-user experience with on-the-fly robot EUP. We trained 21 end users to use an existing on-the-fly EUP tool, asked them to create robot interactions for four scenarios, and assessed their overall experience. Our findings provide insight into how these systems should be designed to better support end-user experience with on-the-fly EUP, focusing on user interaction with an automatic program synthesizer that resolves imprecise user input, the use of multimodal inputs to express user intent, and the general process of programming a robot.
△ Less
Submitted 2 June, 2024;
originally announced June 2024.
-
Automatic Channel Pruning for Multi-Head Attention
Authors:
Eunho Lee,
Youngbae Hwang
Abstract:
Despite the strong performance of Transformers, their quadratic computation complexity presents challenges in applying them to vision tasks. Automatic pruning is one of effective methods for reducing computation complexity without heuristic approaches. However, directly applying it to multi-head attention is not straightforward due to channel misalignment. In this paper, we propose an automatic ch…
▽ More
Despite the strong performance of Transformers, their quadratic computation complexity presents challenges in applying them to vision tasks. Automatic pruning is one of effective methods for reducing computation complexity without heuristic approaches. However, directly applying it to multi-head attention is not straightforward due to channel misalignment. In this paper, we propose an automatic channel pruning method to take into account the multi-head attention mechanism. First, we incorporate channel similarity-based weights into the pruning indicator to preserve more informative channels in each head. Then, we adjust pruning indicator to enforce removal of channels in equal proportions across all heads, preventing the channel misalignment. We also add a reweight module to compensate for information loss resulting from channel removal, and an effective initialization step for pruning indicator based on difference of attention between original structure and each channel. Our proposed method can be used to not only original attention, but also linear attention, which is more efficient as linear complexity with respect to the number of tokens. On ImageNet-1K, applying our pruning method to the FLattenTransformer, which includes both attention mechanisms, shows outperformed accuracy for several MACs compared with previous state-of-the-art efficient models and pruned methods. Code will be available soon.
△ Less
Submitted 31 May, 2024;
originally announced May 2024.
-
Semi-Supervised Domain Adaptation for Wildfire Detection
Authors:
JooYoung Jang,
Youngseo Cha,
Jisu Kim,
SooHyung Lee,
Geonu Lee,
Minkook Cho,
Young Hwang,
Nojun Kwak
Abstract:
Recently, both the frequency and intensity of wildfires have increased worldwide, primarily due to climate change. In this paper, we propose a novel protocol for wildfire detection, leveraging semi-supervised Domain Adaptation for object detection, accompanied by a corresponding dataset designed for use by both academics and industries. Our dataset encompasses 30 times more diverse labeled scenes…
▽ More
Recently, both the frequency and intensity of wildfires have increased worldwide, primarily due to climate change. In this paper, we propose a novel protocol for wildfire detection, leveraging semi-supervised Domain Adaptation for object detection, accompanied by a corresponding dataset designed for use by both academics and industries. Our dataset encompasses 30 times more diverse labeled scenes for the current largest benchmark wildfire dataset, HPWREN, and introduces a new labeling policy for wildfire detection. Inspired by CoordConv, we propose a robust baseline, Location-Aware Object Detection for Semi-Supervised Domain Adaptation (LADA), utilizing a teacher-student based framework capable of extracting translational variance features characteristic of wildfires. With only using 1% target domain labeled data, our framework significantly outperforms our source-only baseline by a notable margin of 3.8% in mean Average Precision on the HPWREN wildfire dataset. Our dataset is available at https://github.com/BloomBerry/LADA.
△ Less
Submitted 2 April, 2024;
originally announced April 2024.
-
MP2D: An Automated Topic Shift Dialogue Generation Framework Leveraging Knowledge Graphs
Authors:
Yerin Hwang,
Yongil Kim,
Yunah Jang,
Jeesoo Bang,
Hyunkyung Bae,
Kyomin Jung
Abstract:
Despite advancements in on-topic dialogue systems, effectively managing topic shifts within dialogues remains a persistent challenge, largely attributed to the limited availability of training datasets. To address this issue, we propose Multi-Passage to Dialogue (MP2D), a data generation framework that automatically creates conversational question-answering datasets with natural topic transitions.…
▽ More
Despite advancements in on-topic dialogue systems, effectively managing topic shifts within dialogues remains a persistent challenge, largely attributed to the limited availability of training datasets. To address this issue, we propose Multi-Passage to Dialogue (MP2D), a data generation framework that automatically creates conversational question-answering datasets with natural topic transitions. By leveraging the relationships between entities in a knowledge graph, MP2D maps the flow of topics within a dialogue, effectively mirroring the dynamics of human conversation. It retrieves relevant passages corresponding to the topics and transforms them into dialogues through the passage-to-dialogue method. Through quantitative and qualitative experiments, we demonstrate MP2D's efficacy in generating dialogue with natural topic shifts. Furthermore, this study introduces a novel benchmark for topic shift dialogues, TS-WikiDialog. Utilizing the dataset, we demonstrate that even Large Language Models (LLMs) struggle to handle topic shifts in dialogue effectively, and we showcase the performance improvements of models trained on datasets generated by MP2D across diverse topic shift dialogue tasks.
△ Less
Submitted 9 March, 2024;
originally announced March 2024.
-
Reliability-based G1 Continuous Arc Spline Approximation
Authors:
Jinhwan Jeon,
Yoonjin Hwang,
Seibum B. Choi
Abstract:
In this paper, we present an algorithm to approximate a set of data points with G1 continuous arcs, using points' covariance data. To the best of our knowledge, previous arc spline approximation approaches assumed that all data points contribute equally (i.e. have the same weights) during the approximation process. However, this assumption may cause serious instability in the algorithm, if the col…
▽ More
In this paper, we present an algorithm to approximate a set of data points with G1 continuous arcs, using points' covariance data. To the best of our knowledge, previous arc spline approximation approaches assumed that all data points contribute equally (i.e. have the same weights) during the approximation process. However, this assumption may cause serious instability in the algorithm, if the collected data contains outliers. To resolve this issue, a robust method for arc spline approximation is suggested in this work, assuming that the 2D covariance for each data point is given. Starting with the definition of models and parameters for single arc approximation, the framework is extended to multiple-arc approximation for general usage. Then the proposed algorithm is verified using generated noisy data and real-world collected data via vehicle experiment in Sejong City, South Korea.
△ Less
Submitted 18 January, 2024;
originally announced January 2024.
-
Dialogizer: Context-aware Conversational-QA Dataset Generation from Textual Sources
Authors:
Yerin Hwang,
Yongil Kim,
Hyunkyung Bae,
Jeesoo Bang,
Hwanhee Lee,
Kyomin Jung
Abstract:
To address the data scarcity issue in Conversational question answering (ConvQA), a dialog inpainting method, which utilizes documents to generate ConvQA datasets, has been proposed. However, the original dialog inpainting model is trained solely on the dialog reconstruction task, resulting in the generation of questions with low contextual relevance due to insufficient learning of question-answer…
▽ More
To address the data scarcity issue in Conversational question answering (ConvQA), a dialog inpainting method, which utilizes documents to generate ConvQA datasets, has been proposed. However, the original dialog inpainting model is trained solely on the dialog reconstruction task, resulting in the generation of questions with low contextual relevance due to insufficient learning of question-answer alignment. To overcome this limitation, we propose a novel framework called Dialogizer, which has the capability to automatically generate ConvQA datasets with high contextual relevance from textual sources. The framework incorporates two training tasks: question-answer matching (QAM) and topic-aware dialog generation (TDG). Moreover, re-ranking is conducted during the inference phase based on the contextual relevance of the generated questions. Using our framework, we produce four ConvQA datasets by utilizing documents from multiple domains as the primary source. Through automatic evaluation using diverse metrics, as well as human evaluation, we validate that our proposed framework exhibits the ability to generate datasets of higher quality compared to the baseline dialog inpainting model.
△ Less
Submitted 9 November, 2023;
originally announced November 2023.
-
What is prompt literacy? An exploratory study of language learners' development of new literacy skill using generative AI
Authors:
Yohan Hwang,
Jang Ho Lee,
Dongkwang Shin
Abstract:
In the current study,we propose that, in the era of generative AI, there is now a new form of literacy called "prompt literacy," which refers to the ability to generate precise prompts as input for AI systems, interpret the outputs, and iteratively refine prompts to achieve desired results. To explore the emergence and development of this literacy skill, the current study examined 30 EFL students'…
▽ More
In the current study,we propose that, in the era of generative AI, there is now a new form of literacy called "prompt literacy," which refers to the ability to generate precise prompts as input for AI systems, interpret the outputs, and iteratively refine prompts to achieve desired results. To explore the emergence and development of this literacy skill, the current study examined 30 EFL students' engagement in an AI-powered image creation project, through which they created artworks representing the socio-cultural meanings of English words by iteratively drafting and refining prompts in generative AI tools. By examining AI-generated images and the participants' drafting and revision of their prompts, this study demonstrated the emergence of learners' prompt literacy skills. The survey data further showed the participants' perceived improvement in their vocabulary learning strategies as a result of engaging in the target AI-powered project. In addition, the participants' post-project reflection revealed three benefits of developing prompt literacy: enjoyment from manifesting imagined outcomes; recognition of its importance for communication, problem-solving and career development; and the enhanced understanding of the collaborative nature of human-AI interaction. These findings suggest that prompt literacy is an increasingly crucial literacy for the AI era.
△ Less
Submitted 9 November, 2023;
originally announced November 2023.
-
Real-Time Magnetic Tracking and Diagnosis of COVID-19 via Machine Learning
Authors:
Dang Nguyen,
Phat K. Huynh,
Vinh Duc An Bui,
Kee Young Hwang,
Nityanand Jain,
Chau Nguyen,
Le Huu Nhat Minh,
Le Van Truong,
Xuan Thanh Nguyen,
Dinh Hoang Nguyen,
Le Tien Dung,
Trung Q. Le,
Manh-Huong Phan
Abstract:
The COVID-19 pandemic underscored the importance of reliable, noninvasive diagnostic tools for robust public health interventions. In this work, we fused magnetic respiratory sensing technology (MRST) with machine learning (ML) to create a diagnostic platform for real-time tracking and diagnosis of COVID-19 and other respiratory diseases. The MRST precisely captures breathing patterns through thre…
▽ More
The COVID-19 pandemic underscored the importance of reliable, noninvasive diagnostic tools for robust public health interventions. In this work, we fused magnetic respiratory sensing technology (MRST) with machine learning (ML) to create a diagnostic platform for real-time tracking and diagnosis of COVID-19 and other respiratory diseases. The MRST precisely captures breathing patterns through three specific breath testing protocols: normal breath, holding breath, and deep breath. We collected breath data from both COVID-19 patients and healthy subjects in Vietnam using this platform, which then served to train and validate ML models. Our evaluation encompassed multiple ML algorithms, including support vector machines and deep learning models, assessing their ability to diagnose COVID-19. Our multi-model validation methodology ensures a thorough comparison and grants the adaptability to select the most optimal model, striking a balance between diagnostic precision with model interpretability. The findings highlight the exceptional potential of our diagnostic tool in pinpointing respiratory anomalies, achieving over 90% accuracy. This innovative sensor technology can be seamlessly integrated into healthcare settings for patient monitoring, marking a significant enhancement for the healthcare infrastructure.
△ Less
Submitted 1 November, 2023;
originally announced November 2023.
-
Overcoming Overconfidence for Active Learning
Authors:
Yujin Hwang,
Won Jo,
Juyoung Hong,
Yukyung Choi
Abstract:
It is not an exaggeration to say that the recent progress in artificial intelligence technology depends on large-scale and high-quality data. Simultaneously, a prevalent issue exists everywhere: the budget for data labeling is constrained. Active learning is a prominent approach for addressing this issue, where valuable data for labeling is selected through a model and utilized to iteratively adju…
▽ More
It is not an exaggeration to say that the recent progress in artificial intelligence technology depends on large-scale and high-quality data. Simultaneously, a prevalent issue exists everywhere: the budget for data labeling is constrained. Active learning is a prominent approach for addressing this issue, where valuable data for labeling is selected through a model and utilized to iteratively adjust the model. However, due to the limited amount of data in each iteration, the model is vulnerable to bias; thus, it is more likely to yield overconfident predictions. In this paper, we present two novel methods to address the problem of overconfidence that arises in the active learning scenario. The first is an augmentation strategy named Cross-Mix-and-Mix (CMaM), which aims to calibrate the model by expanding the limited training distribution. The second is a selection strategy named Ranked Margin Sampling (RankedMS), which prevents choosing data that leads to overly confident predictions. Through various experiments and analyses, we are able to demonstrate that our proposals facilitate efficient data selection by alleviating overconfidence, even though they are readily applicable.
△ Less
Submitted 21 August, 2023;
originally announced August 2023.
-
Cell Spatial Analysis in Crohn's Disease: Unveiling Local Cell Arrangement Pattern with Graph-based Signatures
Authors:
Shunxing Bao,
Sichen Zhu,
Vasantha L Kolachala,
Lucas W. Remedios,
Yeonjoo Hwang,
Yutong Sun,
Ruining Deng,
Can Cui,
Yike Li,
Jia Li,
Joseph T. Roland,
Qi Liu,
Ken S. Lau,
Subra Kugathasan,
Peng Qiu,
Keith T. Wilson,
Lori A. Coburn,
Bennett A. Landman,
Yuankai Huo
Abstract:
Crohn's disease (CD) is a chronic and relapsing inflammatory condition that affects segments of the gastrointestinal tract. CD activity is determined by histological findings, particularly the density of neutrophils observed on Hematoxylin and Eosin stains (H&E) imaging. However, understanding the broader morphometry and local cell arrangement beyond cell counting and tissue morphology remains cha…
▽ More
Crohn's disease (CD) is a chronic and relapsing inflammatory condition that affects segments of the gastrointestinal tract. CD activity is determined by histological findings, particularly the density of neutrophils observed on Hematoxylin and Eosin stains (H&E) imaging. However, understanding the broader morphometry and local cell arrangement beyond cell counting and tissue morphology remains challenging. To address this, we characterize six distinct cell types from H&E images and develop a novel approach for the local spatial signature of each cell. Specifically, we create a 10-cell neighborhood matrix, representing neighboring cell arrangements for each individual cell. Utilizing t-SNE for non-linear spatial projection in scatter-plot and Kernel Density Estimation contour-plot formats, our study examines patterns of differences in the cellular environment associated with the odds ratio of spatial patterns between active CD and control groups. This analysis is based on data collected at the two research institutes. The findings reveal heterogeneous nearest-neighbor patterns, signifying distinct tendencies of cell clustering, with a particular focus on the rectum region. These variations underscore the impact of data heterogeneity on cell spatial arrangements in CD patients. Moreover, the spatial distribution disparities between the two research sites highlight the significance of collaborative efforts among healthcare organizations. All research analysis pipeline tools are available at https://github.com/MASILab/cellNN.
△ Less
Submitted 20 August, 2023;
originally announced August 2023.
-
Unsupervised detection of small hyperreflective features in ultrahigh resolution optical coherence tomography
Authors:
Marcel Reimann,
Jungeun Won,
Hiroyuki Takahashi,
Antonio Yaghy,
Yunchan Hwang,
Stefan Ploner,
Junhong Lin,
Jessica Girgis,
Kenneth Lam,
Siyu Chen,
Nadia K. Waheed,
Andreas Maier,
James G. Fujimoto
Abstract:
Recent advances in optical coherence tomography such as the development of high speed ultrahigh resolution scanners and corresponding signal processing techniques may reveal new potential biomarkers in retinal diseases. Newly visible features are, for example, small hyperreflective specks in age-related macular degeneration. Identifying these new markers is crucial to investigate potential associa…
▽ More
Recent advances in optical coherence tomography such as the development of high speed ultrahigh resolution scanners and corresponding signal processing techniques may reveal new potential biomarkers in retinal diseases. Newly visible features are, for example, small hyperreflective specks in age-related macular degeneration. Identifying these new markers is crucial to investigate potential association with disease progression and treatment outcomes. Therefore, it is necessary to reliably detect these features in 3D volumetric scans. Because manual labeling of entire volumes is infeasible a need for automatic detection arises. Labeled datasets are often not publicly available and there are usually large variations in scan protocols and scanner types. Thus, this work focuses on an unsupervised approach that is based on local peak-detection and random walker segmentation to detect small features on each B-scan of the volume.
△ Less
Submitted 26 March, 2023;
originally announced March 2023.
-
PR-MCS: Perturbation Robust Metric for MultiLingual Image Captioning
Authors:
Yongil Kim,
Yerin Hwang,
Hyeongu Yun,
Seunghyun Yoon,
Trung Bui,
Kyomin Jung
Abstract:
Vulnerability to lexical perturbation is a critical weakness of automatic evaluation metrics for image captioning. This paper proposes Perturbation Robust Multi-Lingual CLIPScore(PR-MCS), which exhibits robustness to such perturbations, as a novel reference-free image captioning metric applicable to multiple languages. To achieve perturbation robustness, we fine-tune the text encoder of CLIP with…
▽ More
Vulnerability to lexical perturbation is a critical weakness of automatic evaluation metrics for image captioning. This paper proposes Perturbation Robust Multi-Lingual CLIPScore(PR-MCS), which exhibits robustness to such perturbations, as a novel reference-free image captioning metric applicable to multiple languages. To achieve perturbation robustness, we fine-tune the text encoder of CLIP with our language-agnostic method to distinguish the perturbed text from the original text. To verify the robustness of PR-MCS, we introduce a new fine-grained evaluation dataset consisting of detailed captions, critical objects, and the relationships between the objects for 3, 000 images in five languages. In our experiments, PR-MCS significantly outperforms baseline metrics in capturing lexical noise of all various perturbation types in all five languages, proving that PR-MCS is highly robust to lexical perturbations.
△ Less
Submitted 15 March, 2023;
originally announced March 2023.
-
Cross-speaker Emotion Transfer by Manipulating Speech Style Latents
Authors:
Suhee Jo,
Younggun Lee,
Yookyung Shin,
Yeongtae Hwang,
Taesu Kim
Abstract:
In recent years, emotional text-to-speech has shown considerable progress. However, it requires a large amount of labeled data, which is not easily accessible. Even if it is possible to acquire an emotional speech dataset, there is still a limitation in controlling emotion intensity. In this work, we propose a novel method for cross-speaker emotion transfer and manipulation using vector arithmetic…
▽ More
In recent years, emotional text-to-speech has shown considerable progress. However, it requires a large amount of labeled data, which is not easily accessible. Even if it is possible to acquire an emotional speech dataset, there is still a limitation in controlling emotion intensity. In this work, we propose a novel method for cross-speaker emotion transfer and manipulation using vector arithmetic in latent style space. By leveraging only a few labeled samples, we generate emotional speech from reading-style speech without losing the speaker identity. Furthermore, emotion strength is readily controllable using a scalar value, providing an intuitive way for users to manipulate speech. Experimental results show the proposed method affords superior performance in terms of expressiveness, naturalness, and controllability, preserving speaker identity.
△ Less
Submitted 14 March, 2023;
originally announced March 2023.
-
Image-based Early Detection System for Wildfires
Authors:
Omkar Ranadive,
Jisu Kim,
Serin Lee,
Youngseo Cha,
Heechan Park,
Minkook Cho,
Young K. Hwang
Abstract:
Wildfires are a disastrous phenomenon which cause damage to land, loss of property, air pollution, and even loss of human life. Due to the warmer and drier conditions created by climate change, more severe and uncontrollable wildfires are expected to occur in the coming years. This could lead to a global wildfire crisis and have dire consequences on our planet. Hence, it has become imperative to u…
▽ More
Wildfires are a disastrous phenomenon which cause damage to land, loss of property, air pollution, and even loss of human life. Due to the warmer and drier conditions created by climate change, more severe and uncontrollable wildfires are expected to occur in the coming years. This could lead to a global wildfire crisis and have dire consequences on our planet. Hence, it has become imperative to use technology to help prevent the spread of wildfires. One way to prevent the spread of wildfires before they become too large is to perform early detection i.e, detecting the smoke before the actual fire starts. In this paper, we present our Wildfire Detection and Alert System which use machine learning to detect wildfire smoke with a high degree of accuracy and can send immediate alerts to users. Our technology is currently being used in the USA to monitor data coming in from hundreds of cameras daily. We show that our system has a high true detection rate and a low false detection rate. Our performance evaluation study also shows that on an average our system detects wildfire smoke faster than an actual person.
△ Less
Submitted 3 November, 2022;
originally announced November 2022.
-
Stochastic Adaptive Activation Function
Authors:
Kyungsu Lee,
Jaeseung Yang,
Haeyun Lee,
Jae Youn Hwang
Abstract:
The simulation of human neurons and neurotransmission mechanisms has been realized in deep neural networks based on the theoretical implementations of activation functions. However, recent studies have reported that the threshold potential of neurons exhibits different values according to the locations and types of individual neurons, and that the activation functions have limitations in terms of…
▽ More
The simulation of human neurons and neurotransmission mechanisms has been realized in deep neural networks based on the theoretical implementations of activation functions. However, recent studies have reported that the threshold potential of neurons exhibits different values according to the locations and types of individual neurons, and that the activation functions have limitations in terms of representing this variability. Therefore, this study proposes a simple yet effective activation function that facilitates different thresholds and adaptive activations according to the positions of units and the contexts of inputs. Furthermore, the proposed activation function mathematically exhibits a more generalized form of Swish activation function, and thus we denoted it as Adaptive SwisH (ASH). ASH highlights informative features that exhibit large values in the top percentiles in an input, whereas it rectifies low values. Most importantly, ASH exhibits trainable, adaptive, and context-aware properties compared to other activation functions. Furthermore, ASH represents general formula of the previously studied activation function and provides a reasonable mathematical background for the superior performance. To validate the effectiveness and robustness of ASH, we implemented ASH into many deep learning models for various tasks, including classification, detection, segmentation, and image generation. Experimental analysis demonstrates that our activation function can provide the benefits of more accurate prediction and earlier convergence in many deep learning applications.
△ Less
Submitted 20 October, 2022;
originally announced October 2022.
-
Deep Q-Network for AI Soccer
Authors:
Curie Kim,
Yewon Hwang,
Jong-Hwan Kim
Abstract:
Reinforcement learning has shown an outstanding performance in the applications of games, particularly in Atari games as well as Go. Based on these successful examples, we attempt to apply one of the well-known reinforcement learning algorithms, Deep Q-Network, to the AI Soccer game. AI Soccer is a 5:5 robot soccer game where each participant develops an algorithm that controls five robots in a te…
▽ More
Reinforcement learning has shown an outstanding performance in the applications of games, particularly in Atari games as well as Go. Based on these successful examples, we attempt to apply one of the well-known reinforcement learning algorithms, Deep Q-Network, to the AI Soccer game. AI Soccer is a 5:5 robot soccer game where each participant develops an algorithm that controls five robots in a team to defeat the opponent participant. Deep Q-Network is designed to implement our original rewards, the state space, and the action space to train each agent so that it can take proper actions in different situations during the game. Our algorithm was able to successfully train the agents, and its performance was preliminarily proven through the mini-competition against 10 teams wishing to take part in the AI Soccer international competition. The competition was organized by the AI World Cup committee, in conjunction with the WCG 2019 Xi'an AI Masters. With our algorithm, we got the achievement of advancing to the round of 16 in this international competition with 130 teams from 39 countries.
△ Less
Submitted 21 September, 2022; v1 submitted 20 September, 2022;
originally announced September 2022.
-
Text-driven Emotional Style Control and Cross-speaker Style Transfer in Neural TTS
Authors:
Yookyung Shin,
Younggun Lee,
Suhee Jo,
Yeongtae Hwang,
Taesu Kim
Abstract:
Expressive text-to-speech has shown improved performance in recent years. However, the style control of synthetic speech is often restricted to discrete emotion categories and requires training data recorded by the target speaker in the target style. In many practical situations, users may not have reference speech recorded in target emotion but still be interested in controlling speech style just…
▽ More
Expressive text-to-speech has shown improved performance in recent years. However, the style control of synthetic speech is often restricted to discrete emotion categories and requires training data recorded by the target speaker in the target style. In many practical situations, users may not have reference speech recorded in target emotion but still be interested in controlling speech style just by typing text description of desired emotional style. In this paper, we propose a text-based interface for emotional style control and cross-speaker style transfer in multi-speaker TTS. We propose the bi-modal style encoder which models the semantic relationship between text description embedding and speech style embedding with a pretrained language model. To further improve cross-speaker style transfer on disjoint, multi-style datasets, we propose the novel style loss. The experimental results show that our model can generate high-quality expressive speech even in unseen style.
△ Less
Submitted 13 July, 2022;
originally announced July 2022.
-
AI-assisted Optimization of the ECCE Tracking System at the Electron Ion Collider
Authors:
C. Fanelli,
Z. Papandreou,
K. Suresh,
J. K. Adkins,
Y. Akiba,
A. Albataineh,
M. Amaryan,
I. C. Arsene,
C. Ayerbe Gayoso,
J. Bae,
X. Bai,
M. D. Baker,
M. Bashkanov,
R. Bellwied,
F. Benmokhtar,
V. Berdnikov,
J. C. Bernauer,
F. Bock,
W. Boeglin,
M. Borysova,
E. Brash,
P. Brindza,
W. J. Briscoe,
M. Brooks,
S. Bueltmann
, et al. (258 additional authors not shown)
Abstract:
The Electron-Ion Collider (EIC) is a cutting-edge accelerator facility that will study the nature of the "glue" that binds the building blocks of the visible matter in the universe. The proposed experiment will be realized at Brookhaven National Laboratory in approximately 10 years from now, with detector design and R&D currently ongoing. Notably, EIC is one of the first large-scale facilities to…
▽ More
The Electron-Ion Collider (EIC) is a cutting-edge accelerator facility that will study the nature of the "glue" that binds the building blocks of the visible matter in the universe. The proposed experiment will be realized at Brookhaven National Laboratory in approximately 10 years from now, with detector design and R&D currently ongoing. Notably, EIC is one of the first large-scale facilities to leverage Artificial Intelligence (AI) already starting from the design and R&D phases. The EIC Comprehensive Chromodynamics Experiment (ECCE) is a consortium that proposed a detector design based on a 1.5T solenoid. The EIC detector proposal review concluded that the ECCE design will serve as the reference design for an EIC detector. Herein we describe a comprehensive optimization of the ECCE tracker using AI. The work required a complex parametrization of the simulated detector system. Our approach dealt with an optimization problem in a multidimensional design space driven by multiple objectives that encode the detector performance, while satisfying several mechanical constraints. We describe our strategy and show results obtained for the ECCE tracking system. The AI-assisted design is agnostic to the simulation framework and can be extended to other sub-detectors or to a system of sub-detectors to further optimize the performance of the EIC detector.
△ Less
Submitted 19 May, 2022; v1 submitted 18 May, 2022;
originally announced May 2022.
-
Explainable Deep Learning Algorithm for Distinguishing Incomplete Kawasaki Disease by Coronary Artery Lesions on Echocardiographic Imaging
Authors:
Haeyun Lee,
Yongsoon Eun,
Jae Youn Hwang,
Lucy Youngmin Eun
Abstract:
Background and Objective: Incomplete Kawasaki disease (KD) has often been misdiagnosed due to a lack of the clinical manifestations of classic KD. However, it is associated with a markedly higher prevalence of coronary artery lesions. Identifying coronary artery lesions by echocardiography is important for the timely diagnosis of and favorable outcomes in KD. Moreover, similar to KD, coronavirus d…
▽ More
Background and Objective: Incomplete Kawasaki disease (KD) has often been misdiagnosed due to a lack of the clinical manifestations of classic KD. However, it is associated with a markedly higher prevalence of coronary artery lesions. Identifying coronary artery lesions by echocardiography is important for the timely diagnosis of and favorable outcomes in KD. Moreover, similar to KD, coronavirus disease 2019, currently causing a worldwide pandemic, also manifests with fever; therefore, it is crucial at this moment that KD should be distinguished clearly among the febrile diseases in children. In this study, we aimed to validate a deep learning algorithm for classification of KD and other acute febrile diseases.
Methods: We obtained coronary artery images by echocardiography of children (n = 88 for KD; n = 65 for pneumonia). We trained six deep learning networks (VGG19, Xception, ResNet50, ResNext50, SE-ResNet50, and SE-ResNext50) using the collected data.
Results: SE-ResNext50 showed the best performance in terms of accuracy, specificity, and precision in the classification. SE-ResNext50 offered a precision of 76.35%, a sensitivity of 82.64%, and a specificity of 58.12%.
Conclusions: The results of our study suggested that deep learning algorithms have similar performance to an experienced cardiologist in detecting coronary artery lesions to facilitate the diagnosis of KD.
△ Less
Submitted 5 April, 2022;
originally announced April 2022.
-
Personalization Trade-offs in Designing a Dialogue-based Information System for Support-Seeking of Sexual Violence Survivors
Authors:
Hyeok Kim,
Youjin Hwang,
Jieun Lee,
Youngjin Kwon,
Yujin Park,
Joonhwan Lee
Abstract:
The lack of reliable, personalized information often complicates sexual violence survivors' support-seeking. Recently, there is an emerging approach to conversational information systems for support-seeking of sexual violence survivors, featuring personalization with wide availability and anonymity. However, a single best solution might not exist as sexual violence survivors have different needs a…
▽ More
The lack of reliable, personalized information often complicates sexual violence survivors' support-seeking. Recently, there is an emerging approach to conversational information systems for support-seeking of sexual violence survivors, featuring personalization with wide availability and anonymity. However, a single best solution might not exist as sexual violence survivors have different needs and purposes in seeking support channels. To better envision conversational support-seeking systems for sexual violence survivors, we explore personalization trade-offs in designing such information systems. We implement a high-fidelity prototype dialogue-based information system through four design workshop sessions with three professional caregivers and interviewed with four self-identified survivors using our prototype. We then identify two forms of personalization trade-offs for conversational support-seeking systems: (1) specificity and sensitivity in understanding users and (2) relevancy and inclusiveness in providing information. To handle these trade-offs, we propose a reversed approach that starts from designing information and inclusive tailoring that considers unspecified needs, respectively.
△ Less
Submitted 18 February, 2022;
originally announced February 2022.
-
Best Practices and Scoring System on Reviewing A.I. based Medical Imaging Papers: Part 1 Classification
Authors:
Timothy L. Kline,
Felipe Kitamura,
Ian Pan,
Amine M. Korchi,
Neil Tenenholtz,
Linda Moy,
Judy Wawira Gichoya,
Igor Santos,
Steven Blumer,
Misha Ysabel Hwang,
Kim-Ann Git,
Abishek Shroff,
Elad Walach,
George Shih,
Steve Langer
Abstract:
With the recent advances in A.I. methodologies and their application to medical imaging, there has been an explosion of related research programs utilizing these techniques to produce state-of-the-art classification performance. Ultimately, these research programs culminate in submission of their work for consideration in peer reviewed journals. To date, the criteria for acceptance vs. rejection i…
▽ More
With the recent advances in A.I. methodologies and their application to medical imaging, there has been an explosion of related research programs utilizing these techniques to produce state-of-the-art classification performance. Ultimately, these research programs culminate in submission of their work for consideration in peer reviewed journals. To date, the criteria for acceptance vs. rejection is often subjective; however, reproducible science requires reproducible review. The Machine Learning Education Sub-Committee of SIIM has identified a knowledge gap and a serious need to establish guidelines for reviewing these studies. Although there have been several recent papers with this goal, this present work is written from the machine learning practitioners standpoint. In this series, the committee will address the best practices to be followed in an A.I.-based study and present the required sections in terms of examples and discussion of what should be included to make the studies cohesive, reproducible, accurate, and self-contained. This first entry in the series focuses on the task of image classification. Elements such as dataset curation, data pre-processing steps, defining an appropriate reference standard, data partitioning, model architecture and training are discussed. The sections are presented as they would be detailed in a typical manuscript, with content describing the necessary information that should be included to make sure the study is of sufficient quality to be considered for publication. The goal of this series is to provide resources to not only help improve the review process for A.I.-based medical imaging papers, but to facilitate a standard for the information that is presented within all components of the research study. We hope to provide quantitative metrics in what otherwise may be a qualitative review process.
△ Less
Submitted 3 February, 2022;
originally announced February 2022.
-
Modeling ventilation in a low-income house in Dhaka, Bangladesh
Authors:
Yunjae Hwang,
Laura,
Kwong,
Mohammad Saeed Munim,
Fosiul Alam Nizame,
Stephen Luby,
Catherine Gorlé
Abstract:
According to UNICEF, pneumonia is the leading cause of death in children under 5. 70% of worldwide pneumonia deaths occur in only 15 countries, including Bangladesh. Previous research has indicated a potential association between the incidence of pneumonia and the presence of cross-ventilation in slum housing in Dhaka, Bangladesh. The objective of this research is to establish a validated computat…
▽ More
According to UNICEF, pneumonia is the leading cause of death in children under 5. 70% of worldwide pneumonia deaths occur in only 15 countries, including Bangladesh. Previous research has indicated a potential association between the incidence of pneumonia and the presence of cross-ventilation in slum housing in Dhaka, Bangladesh. The objective of this research is to establish a validated computational framework that can predict ventilation rates in slum homes to support further studies investigating this correlation. To achieve this objective we employ a building thermal model (BTM) in combination with uncertainty quantification (UQ). The BTM solves for the time-evolution of volume-averaged temperatures in a typical home, considering different ventilation configurations. The UQ method propagates uncertainty in model parameters, weather inputs, and physics models to predict mean values and 95% confidence intervals for the quantities of interest, namely temperatures and ventilation rates in terms of air changes per hour (ACH). The model predictions are compared to on-site field measurements of air and thermal mass temperatures, and of ACH. The results indicate that the use of standard cross- or single-sided ventilation models limits the accuracy of the ACH predictions; in contrast, a model based on a similarity relationship informed by the available ACH measurements can produce more accurate predictions with confidence intervals that encompass the measurements for 12 of the 17 available data points.
△ Less
Submitted 30 January, 2022;
originally announced February 2022.
-
Unique Games hardness of Quantum Max-Cut, and a conjectured vector-valued Borell's inequality
Authors:
Yeongwoo Hwang,
Joe Neeman,
Ojas Parekh,
Kevin Thompson,
John Wright
Abstract:
The Gaussian noise stability of a function $f:\mathbb{R}^n \to \{-1, 1\}$ is the expected value of $f(\boldsymbol{x}) \cdot f(\boldsymbol{y})$ over $ρ$-correlated Gaussian random variables $\boldsymbol{x}$ and $\boldsymbol{y}$. Borell's inequality states that for $-1 \leq ρ\leq 0$, this is minimized by the halfspace $f(x) = \mathrm{sign}(x_1)$. In this work, we generalize this result to hold for f…
▽ More
The Gaussian noise stability of a function $f:\mathbb{R}^n \to \{-1, 1\}$ is the expected value of $f(\boldsymbol{x}) \cdot f(\boldsymbol{y})$ over $ρ$-correlated Gaussian random variables $\boldsymbol{x}$ and $\boldsymbol{y}$. Borell's inequality states that for $-1 \leq ρ\leq 0$, this is minimized by the halfspace $f(x) = \mathrm{sign}(x_1)$. In this work, we generalize this result to hold for functions $f:\mathbb{R}^n \to S^{k-1}$ which output $k$-dimensional unit vectors. Our main conjecture, which we call the $\textit{vector-valued Borell's inequality}$, asserts that the expected value of $\langle f(\boldsymbol{x}), f(\boldsymbol{y})\rangle$ is minimized by the function $f(x) = x_{\leq k} / \Vert x_{\leq k} \Vert$, where $x_{\leq k} = (x_1, \ldots, x_k)$. We give several pieces of evidence in favor of this conjecture, including a proof that it does indeed hold in the special case of $n = k$.
As an application of this conjecture, we show that it implies several hardness of approximation results for a special case of the local Hamiltonian problem related to the anti-ferromagnetic Heisenberg model known as Quantum Max-Cut. This can be viewed as a natural quantum analogue of the classical Max-Cut problem and has been proposed as a useful testbed for developing algorithms. We show the following, assuming our conjecture:
(1) The integrality gap of the basic SDP is $0.498$, matching an existing rounding algorithm. Combined with existing results, this shows that the basic SDP does not achieve the optimal approximation ratio.
(2) It is Unique Games-hard (UG-hard) to compute a $(0.956+\varepsilon)$-approximation to the value of the best product state, matching an existing approximation algorithm.
(3) It is UG-hard to compute a $(0.956+\varepsilon)$-approximation to the value of the best (possibly entangled) state.
△ Less
Submitted 28 September, 2022; v1 submitted 1 November, 2021;
originally announced November 2021.
-
Contrastive Learning for Context-aware Neural Machine TranslationUsing Coreference Information
Authors:
Yongkeun Hwang,
Hyungu Yun,
Kyomin Jung
Abstract:
Context-aware neural machine translation (NMT) incorporates contextual information of surrounding texts, that can improve the translation quality of document-level machine translation. Many existing works on context-aware NMT have focused on developing new model architectures for incorporating additional contexts and have shown some promising results. However, most existing works rely on cross-ent…
▽ More
Context-aware neural machine translation (NMT) incorporates contextual information of surrounding texts, that can improve the translation quality of document-level machine translation. Many existing works on context-aware NMT have focused on developing new model architectures for incorporating additional contexts and have shown some promising results. However, most existing works rely on cross-entropy loss, resulting in limited use of contextual information. In this paper, we propose CorefCL, a novel data augmentation and contrastive learning scheme based on coreference between the source and contextual sentences. By corrupting automatically detected coreference mentions in the contextual sentence, CorefCL can train the model to be sensitive to coreference inconsistency. We experimented with our method on common context-aware NMT models and two document-level translation tasks. In the experiments, our method consistently improved BLEU of compared models on English-German and English-Korean tasks. We also show that our method significantly improves coreference resolution in the English-German contrastive test suite.
△ Less
Submitted 13 September, 2021;
originally announced September 2021.
-
Applying the Persona of User's Family Member and the Doctor to the Conversational Agents for Healthcare
Authors:
Youjin Hwang,
Donghoon Shin,
Sion Baek,
Bongwon Suh,
Joonhwan Lee
Abstract:
Conversational agents have been showing lots of opportunities in healthcare by taking over a lot of tasks that used to be done by a human. One of the major functions of conversational healthcare agent is intervening users' daily behaviors. In this case, forming an intimate and trustful relationship with users is one of the major issues. Factors affecting human-agent relationship should be deeply e…
▽ More
Conversational agents have been showing lots of opportunities in healthcare by taking over a lot of tasks that used to be done by a human. One of the major functions of conversational healthcare agent is intervening users' daily behaviors. In this case, forming an intimate and trustful relationship with users is one of the major issues. Factors affecting human-agent relationship should be deeply explored to improve long-term acceptance of healthcare agent. Even though a bunch of ideas and researches have been suggested to increase the acceptance of conversational agents in healthcare, challenges still remain. From the preliminary work we conducted, we suggest an idea of applying the personas of users' family members and the doctor who are in the relationship with users in the real world as a solution for forming the rigid relationship between humans and the chatbot.
△ Less
Submitted 2 September, 2021;
originally announced September 2021.
-
Type Anywhere You Want: An Introduction to Invisible Mobile Keyboard
Authors:
Sahng-Min Yoo,
Ue-Hwan Kim,
Yewon Hwang,
Jong-Hwan Kim
Abstract:
Contemporary soft keyboards possess limitations: the lack of physical feedback results in an increase of typos, and the interface of soft keyboards degrades the utility of the screen. To overcome these limitations, we propose an Invisible Mobile Keyboard (IMK), which lets users freely type on the desired area without any constraints. To facilitate a data-driven IMK decoding task, we have collected…
▽ More
Contemporary soft keyboards possess limitations: the lack of physical feedback results in an increase of typos, and the interface of soft keyboards degrades the utility of the screen. To overcome these limitations, we propose an Invisible Mobile Keyboard (IMK), which lets users freely type on the desired area without any constraints. To facilitate a data-driven IMK decoding task, we have collected the most extensive text-entry dataset (approximately 2M pairs of typing positions and the corresponding characters). Additionally, we propose our baseline decoder along with a semantic typo correction mechanism based on self-attention, which decodes such unconstrained inputs with high accuracy (96.0%). Moreover, the user study reveals that the users could type faster and feel convenience and satisfaction to IMK with our decoder. Lastly, we make the source code and the dataset public to contribute to the research community.
△ Less
Submitted 20 August, 2021;
originally announced August 2021.
-
Writing in The Air: Unconstrained Text Recognition from Finger Movement Using Spatio-Temporal Convolution
Authors:
Ue-Hwan Kim,
Yewon Hwang,
Sun-Kyung Lee,
Jong-Hwan Kim
Abstract:
In this paper, we introduce a new benchmark dataset for the challenging writing in the air (WiTA) task -- an elaborate task bridging vision and NLP. WiTA implements an intuitive and natural writing method with finger movement for human-computer interaction (HCI). Our WiTA dataset will facilitate the development of data-driven WiTA systems which thus far have displayed unsatisfactory performance --…
▽ More
In this paper, we introduce a new benchmark dataset for the challenging writing in the air (WiTA) task -- an elaborate task bridging vision and NLP. WiTA implements an intuitive and natural writing method with finger movement for human-computer interaction (HCI). Our WiTA dataset will facilitate the development of data-driven WiTA systems which thus far have displayed unsatisfactory performance -- due to lack of dataset as well as traditional statistical models they have adopted. Our dataset consists of five sub-datasets in two languages (Korean and English) and amounts to 209,926 video instances from 122 participants. We capture finger movement for WiTA with RGB cameras to ensure wide accessibility and cost-efficiency. Next, we propose spatio-temporal residual network architectures inspired by 3D ResNet. These models perform unconstrained text recognition from finger movement, guarantee a real-time operation by processing 435 and 697 decoding frames-per-second for Korean and English, respectively, and will serve as an evaluation standard. Our dataset and the source codes are available at https://github.com/Uehwan/WiTA.
△ Less
Submitted 18 April, 2021;
originally announced April 2021.
-
Domain Adaptive Transfer Attack (DATA)-based Segmentation Networks for Building Extraction from Aerial Images
Authors:
Younghwan Na,
Jun Hee Kim,
Kyungsu Lee,
Juhum Park,
Jae Youn Hwang,
Jihwan P. Choi
Abstract:
Semantic segmentation models based on convolutional neural networks (CNNs) have gained much attention in relation to remote sensing and have achieved remarkable performance for the extraction of buildings from high-resolution aerial images. However, the issue of limited generalization for unseen images remains. When there is a domain gap between the training and test datasets, CNN-based segmentati…
▽ More
Semantic segmentation models based on convolutional neural networks (CNNs) have gained much attention in relation to remote sensing and have achieved remarkable performance for the extraction of buildings from high-resolution aerial images. However, the issue of limited generalization for unseen images remains. When there is a domain gap between the training and test datasets, CNN-based segmentation models trained by a training dataset fail to segment buildings for the test dataset. In this paper, we propose segmentation networks based on a domain adaptive transfer attack (DATA) scheme for building extraction from aerial images. The proposed system combines the domain transfer and adversarial attack concepts. Based on the DATA scheme, the distribution of the input images can be shifted to that of the target images while turning images into adversarial examples against a target network. Defending adversarial examples adapted to the target domain can overcome the performance degradation due to the domain gap and increase the robustness of the segmentation model. Cross-dataset experiments and the ablation study are conducted for the three different datasets: the Inria aerial image labeling dataset, the Massachusetts building dataset, and the WHU East Asia dataset. Compared to the performance of the segmentation network without the DATA scheme, the proposed method shows improvements in the overall IoU. Moreover, it is verified that the proposed method outperforms even when compared to feature adaptation (FA) and output space adaptation (OSA).
△ Less
Submitted 29 April, 2020; v1 submitted 11 April, 2020;
originally announced April 2020.
-
Mel-spectrogram augmentation for sequence to sequence voice conversion
Authors:
Yeongtae Hwang,
Hyemin Cho,
Hongsun Yang,
Dong-Ok Won,
Insoo Oh,
Seong-Whan Lee
Abstract:
For training the sequence-to-sequence voice conversion model, we need to handle an issue of insufficient data about the number of speech pairs which consist of the same utterance. This study experimentally investigated the effects of Mel-spectrogram augmentation on training the sequence-to-sequence voice conversion (VC) model from scratch. For Mel-spectrogram augmentation, we adopted the policies…
▽ More
For training the sequence-to-sequence voice conversion model, we need to handle an issue of insufficient data about the number of speech pairs which consist of the same utterance. This study experimentally investigated the effects of Mel-spectrogram augmentation on training the sequence-to-sequence voice conversion (VC) model from scratch. For Mel-spectrogram augmentation, we adopted the policies proposed in SpecAugment. In addition, we proposed new policies (i.e., frequency warping, loudness and time length control) for more data variations. Moreover, to find the appropriate hyperparameters of augmentation policies without training the VC model, we proposed hyperparameter search strategy and the new metric for reducing experimental cost, namely deformation per deteriorating ratio. We compared the effect of these Mel-spectrogram augmentation methods based on various sizes of training set and augmentation policies. In the experimental results, the time axis warping based policies (i.e., time length control and time warping.) showed better performance than other policies. These results indicate that the use of the Mel-spectrogram augmentation is more beneficial for training the VC model.
△ Less
Submitted 15 June, 2020; v1 submitted 6 January, 2020;
originally announced January 2020.
-
Mobile Artificial Intelligence Technology for Detecting Macula Edema and Subretinal Fluid on OCT Scans: Initial Results from the DATUM alpha Study
Authors:
Stephen G. Odaibo,
Mikelson MomPremier,
Richard Y. Hwang,
Salman J. Yousuf,
Steven L. Williams,
Joshua Grant
Abstract:
Artificial Intelligence (AI) is necessary to address the large and growing deficit in retina and healthcare access globally. And mobile AI diagnostic platforms running in the Cloud may effectively and efficiently distribute such AI capability. Here we sought to evaluate the feasibility of Cloud-based mobile artificial intelligence for detection of retinal disease. And to evaluate the accuracy of a…
▽ More
Artificial Intelligence (AI) is necessary to address the large and growing deficit in retina and healthcare access globally. And mobile AI diagnostic platforms running in the Cloud may effectively and efficiently distribute such AI capability. Here we sought to evaluate the feasibility of Cloud-based mobile artificial intelligence for detection of retinal disease. And to evaluate the accuracy of a particular such system for detection of subretinal fluid (SRF) and macula edema (ME) on OCT scans. A multicenter retrospective image analysis was conducted in which board-certified ophthalmologists with fellowship training in retina evaluated OCT images of the macula. They noted the presence or absence of ME or SRF, then compared their assessment to that obtained from Fluid Intelligence, a mobile AI app that detects SRF and ME on OCT scans. Investigators consecutively selected retinal OCTs, while making effort to balance the number of scans with retinal fluid and scans without. Exclusion criteria included poor scan quality, ambiguous features, macula holes, retinoschisis, and dense epiretinal membranes. Accuracy in the form of sensitivity and specificity of the AI mobile App was determined by comparing its assessments to those of the retina specialists. At the time of this submission, five centers have completed their initial studies. This consists of a total of 283 OCT scans of which 155 had either ME or SRF ("wet") and 128 did not ("dry"). The sensitivity ranged from 82.5% to 97% with a weighted average of 89.3%. The specificity ranged from 52% to 100% with a weighted average of 81.23%. CONCLUSION: Cloud-based Mobile AI technology is feasible for the detection retinal disease. In particular, Fluid Intelligence (alpha version), is sufficiently accurate as a screening tool for SRF and ME, especially in underserved areas. Further studies and technology development is needed.
△ Less
Submitted 12 February, 2019; v1 submitted 7 February, 2019;
originally announced February 2019.
-
Hierarchical System Mapping for Large-Scale Fault-Tolerant Quantum Computing
Authors:
Yongsoo Hwang,
Byung-Soo Choi
Abstract:
Considering the large-scale quantum computer, it is important to know how much quantum computational resources is necessary precisely and quickly. Unfortunately the previous methods so far cannot support a large-scale quantum computing practically and therefore the analysis because they usually use a non-structured code. To overcome this problem, we propose a fast mapping by using the hierarchical…
▽ More
Considering the large-scale quantum computer, it is important to know how much quantum computational resources is necessary precisely and quickly. Unfortunately the previous methods so far cannot support a large-scale quantum computing practically and therefore the analysis because they usually use a non-structured code. To overcome this problem, we propose a fast mapping by using the hierarchical assembly code which is much more compact than the non-structured code. During the mapping process, the necessary modules and their interconnection can be dynamically mapped by using the communication bus at the cost of additional qubits. In our study, the proposed method works very fast such as 1 hour than 1500 days for Shor algorithm to factorize 512-bit integer. Meanwhile, since the hierarchical assembly code has high degree of locality, it has shorter SWAP chains and hence it does not increase the quantum computation time than expected.
△ Less
Submitted 21 September, 2018;
originally announced September 2018.
-
Robust Deep Multi-modal Learning Based on Gated Information Fusion Network
Authors:
Jaekyum Kim,
Junho Koh,
Yecheol Kim,
Jaehyung Choi,
Youngbae Hwang,
Jun Won Choi
Abstract:
The goal of multi-modal learning is to use complimentary information on the relevant task provided by the multiple modalities to achieve reliable and robust performance. Recently, deep learning has led significant improvement in multi-modal learning by allowing for the information fusion in the intermediate feature levels. This paper addresses a problem of designing robust deep multi-modal learnin…
▽ More
The goal of multi-modal learning is to use complimentary information on the relevant task provided by the multiple modalities to achieve reliable and robust performance. Recently, deep learning has led significant improvement in multi-modal learning by allowing for the information fusion in the intermediate feature levels. This paper addresses a problem of designing robust deep multi-modal learning architecture in the presence of imperfect modalities. We introduce deep fusion architecture for object detection which processes each modality using the separate convolutional neural network (CNN) and constructs the joint feature map by combining the intermediate features from the CNNs. In order to facilitate the robustness to the degraded modalities, we employ the gated information fusion (GIF) network which weights the contribution from each modality according to the input feature maps to be fused. The weights are determined through the convolutional layers followed by a sigmoid function and trained along with the information fusion network in an end-to-end fashion. Our experiments show that the proposed GIF network offers the additional architectural flexibility to achieve robust performance in handling some degraded modalities, and show a significant performance improvement based on Single Shot Detector (SSD) for KITTI dataset using the proposed fusion network and data augmentation schemes.
△ Less
Submitted 2 November, 2018; v1 submitted 17 July, 2018;
originally announced July 2018.
-
Development of hp-inverse model by using generalized polynomial chaos
Authors:
Kyongmin Yeo,
Youngdeok Hwang,
Xiao Liu,
Jayant Kalagnanam
Abstract:
We present a hp-inverse model to estimate a smooth, non-negative source function from a limited number of observations for a two-dimensional linear source inversion problem. A standard least-square inverse model is formulated by using a set of Gaussian radial basis functions (GRBF) on a rectangular mesh system with a uniform grid space. Here, the choice of the mesh system is modeled as a random va…
▽ More
We present a hp-inverse model to estimate a smooth, non-negative source function from a limited number of observations for a two-dimensional linear source inversion problem. A standard least-square inverse model is formulated by using a set of Gaussian radial basis functions (GRBF) on a rectangular mesh system with a uniform grid space. Here, the choice of the mesh system is modeled as a random variable and the generalized polynomial chaos (gPC) expansion is used to represent the random mesh system. It is shown that the convolution of gPC and GRBF provides hierarchical basis functions for the linear source inverse model with the $hp$-refinement capability. We propose a mixed l_1 and l_2 regularization to exploit the hierarchical nature of the basis functions to find a sparse solution. The $hp$-inverse model has an advantage over the standard least-square inverse model when the number of data is limited. It is shown that the hp-inverse model provides a good estimate of the source function even when the number of unknown parameters ($m$) is much larger the number of data ($n$), e.g., m/n > 40.
△ Less
Submitted 14 December, 2018; v1 submitted 9 January, 2018;
originally announced January 2018.
-
Analysis of the Game-Theoretic Modeling of Backscatter Wireless Sensor Networks under Smart Interference
Authors:
Seung Gwan Hong,
Yu Min Hwang,
Sun Yui Lee,
Yoan Shin,
Dong In Kim,
Jin Young Kim
Abstract:
In this paper, we study an interference avoidance scenario in the presence of a smart interferer which can rapidly observe the transmit power of a backscatter wireless sensor network (WSN) and effectively interrupt backscatter signals. We consider a power control with a sub-channel allocation to avoid interference attacks and a time-switching ratio for backscattering and RF energy harvesting in ba…
▽ More
In this paper, we study an interference avoidance scenario in the presence of a smart interferer which can rapidly observe the transmit power of a backscatter wireless sensor network (WSN) and effectively interrupt backscatter signals. We consider a power control with a sub-channel allocation to avoid interference attacks and a time-switching ratio for backscattering and RF energy harvesting in backscatter WSNs. We formulate the problem based on a Stackelberg game theory and compute the optimal transmit power, time-switching ratio, and sub-channel allocation parameter to maximize a utility function against the smart interference. We propose two algorithms for the utility maximization using Lagrangian dual decomposition for the backscatter WSN and the smart interference to prove the existence of the Stackelberg equilibrium. Numerical results show that the proposed algorithms effectively maximize the utility, compared to that of the algorithm based on the Nash game, so as to overcome smart interference in backscatter communications.
△ Less
Submitted 21 December, 2017;
originally announced December 2017.
-
Detecting Temporally Consistent Objects in Videos through Object Class Label Propagation
Authors:
Subarna Tripathi,
Serge Belongie,
Youngbae Hwang,
Truong Nguyen
Abstract:
Object proposals for detecting moving or static video objects need to address issues such as speed, memory complexity and temporal consistency. We propose an efficient Video Object Proposal (VOP) generation method and show its efficacy in learning a better video object detector. A deep-learning based video object detector learned using the proposed VOP achieves state-of-the-art detection performan…
▽ More
Object proposals for detecting moving or static video objects need to address issues such as speed, memory complexity and temporal consistency. We propose an efficient Video Object Proposal (VOP) generation method and show its efficacy in learning a better video object detector. A deep-learning based video object detector learned using the proposed VOP achieves state-of-the-art detection performance on the Youtube-Objects dataset. We further propose a clustering of VOPs which can efficiently be used for detecting objects in video in a streaming fashion. As opposed to applying per-frame convolutional neural network (CNN) based object detection, our proposed method called Objects in Video Enabler thRough LAbel Propagation (OVERLAP) needs to classify only a small fraction of all candidate proposals in every video frame through streaming clustering of object proposals and class-label propagation. Source code will be made available soon.
△ Less
Submitted 20 January, 2016;
originally announced January 2016.
-
The Automatic Statistician: A Relational Perspective
Authors:
Yunseong Hwang,
Anh Tong,
Jaesik Choi
Abstract:
Gaussian Processes (GPs) provide a general and analytically tractable way of modeling complex time-varying, nonparametric functions. The Automatic Bayesian Covariance Discovery (ABCD) system constructs natural-language description of time-series data by treating unknown time-series data nonparametrically using GP with a composite covariance kernel function. Unfortunately, learning a composite cova…
▽ More
Gaussian Processes (GPs) provide a general and analytically tractable way of modeling complex time-varying, nonparametric functions. The Automatic Bayesian Covariance Discovery (ABCD) system constructs natural-language description of time-series data by treating unknown time-series data nonparametrically using GP with a composite covariance kernel function. Unfortunately, learning a composite covariance kernel with a single time-series data set often results in less informative kernel that may not give qualitative, distinctive descriptions of data. We address this challenge by proposing two relational kernel learning methods which can model multiple time-series data sets by finding common, shared causes of changes. We show that the relational kernel learning methods find more accurate models for regression problems on several real-world data sets; US stock data, US house price index data and currency exchange rate data.
△ Less
Submitted 11 February, 2016; v1 submitted 26 November, 2015;
originally announced November 2015.
-
Semantic Video Segmentation : Exploring Inference Efficiency
Authors:
Subarna Tripathi,
Serge Belongie,
Youngbae Hwang,
Truong Nguyen
Abstract:
We explore the efficiency of the CRF inference beyond image level semantic segmentation and perform joint inference in video frames. The key idea is to combine best of two worlds: semantic co-labeling and more expressive models. Our formulation enables us to perform inference over ten thousand images within seconds and makes the system amenable to perform video semantic segmentation most effective…
▽ More
We explore the efficiency of the CRF inference beyond image level semantic segmentation and perform joint inference in video frames. The key idea is to combine best of two worlds: semantic co-labeling and more expressive models. Our formulation enables us to perform inference over ten thousand images within seconds and makes the system amenable to perform video semantic segmentation most effectively. On CamVid dataset, with TextonBoost unaries, our proposed method achieves up to 8% improvement in accuracy over individual semantic image segmentation without additional time overhead. The source code is available at https://github.com/subtri/video_inference
△ Less
Submitted 4 September, 2015;
originally announced September 2015.
-
Training-Free Non-Intrusive Load Monitoring of Electric Vehicle Charging with Low Sampling Rate
Authors:
Zhilin Zhang,
Jae Hyun Son,
Ying Li,
Mark Trayer,
Zhouyue Pi,
Dong Yoon Hwang,
Joong Ki Moon
Abstract:
Non-intrusive load monitoring (NILM) is an important topic in smart-grid and smart-home. Many energy disaggregation algorithms have been proposed to detect various individual appliances from one aggregated signal observation. However, few works studied the energy disaggregation of plug-in electric vehicle (EV) charging in the residential environment since EVs charging at home has emerged only rece…
▽ More
Non-intrusive load monitoring (NILM) is an important topic in smart-grid and smart-home. Many energy disaggregation algorithms have been proposed to detect various individual appliances from one aggregated signal observation. However, few works studied the energy disaggregation of plug-in electric vehicle (EV) charging in the residential environment since EVs charging at home has emerged only recently. Recent studies showed that EV charging has a large impact on smart-grid especially in summer. Therefore, EV charging monitoring has become a more important and urgent missing piece in energy disaggregation. In this paper, we present a novel method to disaggregate EV charging signals from aggregated real power signals. The proposed method can effectively mitigate interference coming from air-conditioner (AC), enabling accurate EV charging detection and energy estimation under the presence of AC power signals. Besides, the proposed algorithm requires no training, demands a light computational load, delivers high estimation accuracy, and works well for data recorded at the low sampling rate 1/60 Hz. When the algorithm is tested on real-world data recorded from 11 houses over about a whole year (total 125 months worth of data), the averaged error in estimating energy consumption of EV charging is 15.7 kwh/month (while the true averaged energy consumption of EV charging is 208.5 kwh/month), and the averaged normalized mean square error in disaggregating EV charging load signals is 0.19.
△ Less
Submitted 6 August, 2014; v1 submitted 20 April, 2014;
originally announced April 2014.
-
Improving Streaming Video Segmentation with Early and Mid-Level Visual Processing
Authors:
Subarna Tripathi,
Youngbae Hwang,
Serge Belongie,
Truong Nguyen
Abstract:
Despite recent advances in video segmentation, many opportunities remain to improve it using a variety of low and mid-level visual cues. We propose improvements to the leading streaming graph-based hierarchical video segmentation (streamGBH) method based on early and mid level visual processing. The extensive experimental analysis of our approach validates the improvement of hierarchical supervoxe…
▽ More
Despite recent advances in video segmentation, many opportunities remain to improve it using a variety of low and mid-level visual cues. We propose improvements to the leading streaming graph-based hierarchical video segmentation (streamGBH) method based on early and mid level visual processing. The extensive experimental analysis of our approach validates the improvement of hierarchical supervoxel representation by incorporating motion and color with effective filtering. We also pose and illuminate some open questions towards intermediate level video analysis as further extension to streamGBH. We exploit the supervoxels as an initialization towards estimation of dominant affine motion regions, followed by merging of such motion regions in order to hierarchically segment a video in a novel motion-segmentation framework which aims at subsequent applications such as foreground recognition.
△ Less
Submitted 14 February, 2014;
originally announced February 2014.
-
Random Basketball Routing for ZigBee based Sensor Networks
Authors:
Dong Min Kim,
Young Ju Hwang,
Seong-Lyun Kim,
Gwang-Ja Jin,
Bong-Soo Kim
Abstract:
Random basketball routing (BR) \cite {Hwang} is a simple protocol that integrates MAC and multihop routing in a cross-layer optimized manner. Due to its lightness and performance, BR would be quite suitable for sensor networks, where communication nodes are usually simple devices. In this paper, we describe how we implemented BR in a ZigBee-based (IEEE 802.15.4) sensor network. In \cite{Hwang}, it…
▽ More
Random basketball routing (BR) \cite {Hwang} is a simple protocol that integrates MAC and multihop routing in a cross-layer optimized manner. Due to its lightness and performance, BR would be quite suitable for sensor networks, where communication nodes are usually simple devices. In this paper, we describe how we implemented BR in a ZigBee-based (IEEE 802.15.4) sensor network. In \cite{Hwang}, it is verified that BR takes advantages of dynamic environments (in particular, node mobility), however, here we focus on how BR works under static situations. For implementation purposes, we add some features such as destination RSSI measuring and loop-free procedure, to the original BR. With implemented testbed, we compare the performance of BR with that of the simplified AODV with CSMA/CA. The result is that BR has merits in terms of number of hops to traverse the network. Considering the simple structure of BR and its possible energy-efficiency, we can conclude that BR can be a good candidate for sensor networks both under dynamic- and static environments.
△ Less
Submitted 19 December, 2013;
originally announced December 2013.