
Dominant Design Prediction with Phylogenetic Networks

Authors: Youwei He, Jeong-Dong Lee, Dawoon Jeong, Sungjun Choi, Jiyong Kim

Abstract: This study proposes an effective method to predict technology development from an evolutionary perspective. Product evolution is the result of technological evolution and market selection. A phylogenetic network is the main method to study product evolution. The formation of the dominant design determines the trajectory of technology development. How to predict future dominant design has become a key issue in technology forecasting and new product development. We define the dominant product and use machine learning methods, combined with product evolutionary theory, to construct a Fully Connected Phylogenetic Network dataset to effectively predict the future dominant design.

Submitted 14 July, 2024; originally announced July 2024.

arXiv:2407.02694

LLM-Select: Feature Selection with Large Language Models

Authors: Daniel P. Jeong, Zachary C. Lipton, Pradeep Ravikumar

Abstract: In this paper, we demonstrate a surprising capability of large language models (LLMs): given only input feature names and a description of a prediction task, they are capable of selecting the most predictive features, with performance rivaling the standard tools of data science. Remarkably, these models exhibit this capacity across various query mechanisms. For example, we zero-shot prompt an LLM to output a numerical importance score for a feature (e.g., "blood pressure") in predicting an outcome of interest (e.g., "heart failure"), with no additional context. In particular, we find that the latest models, such as GPT-4, can consistently identify the most predictive features regardless of the query mechanism and across various prompting strategies. We illustrate these findings through extensive experiments on real-world data, where we show that LLM-based feature selection consistently achieves strong performance competitive with data-driven methods such as the LASSO, despite never having looked at the downstream training data. Our findings suggest that LLMs may be useful not only for selecting the best features for training but also for deciding which features to collect in the first place. This could potentially benefit practitioners in domains like healthcare, where collecting high-quality data comes at a high cost.

Submitted 2 July, 2024; originally announced July 2024.

Comments: Preprint

arXiv:2405.02367

Enhancing Social Media Post Popularity Prediction with Visual Content

Authors: Dahyun Jeong, Hyelim Son, Yunjin Choi, Keunwoo Kim

Abstract: Our study presents a framework for predicting image-based social media content popularity that focuses on addressing complex image information and a hierarchical data structure. We utilize the Google Cloud Vision API to effectively extract key image and color information from users' postings, achieving 6.8% higher accuracy compared to using non-image covariates alone. For prediction, we explore a wide range of prediction models, including Linear Mixed Model, Support Vector Regression, Multi-layer Perceptron, Random Forest, and XGBoost, with linear regression as the benchmark. Our comparative study demonstrates that models that are capable of capturing the underlying nonlinear interactions between covariates outperform other methods.

Submitted 8 May, 2024; v1 submitted 3 May, 2024; originally announced May 2024.

Report number: JKSS-D-23-00299R1

arXiv:2405.01974

Multitask Extension of Geometrically Aligned Transfer Encoder

Authors: Sung Moon Ko, Sumin Lee, Dae-Woong Jeong, Hyunseung Kim, Chanhui Lee, Soorin Yim, Sehui Han

Abstract: Molecular datasets often suffer from a lack of data. It is well-known that gathering data is difficult due to the complexity of experimentation or simulation involved. Here, we leverage mutual information across different tasks in molecular data to address this issue. We extend an algorithm that utilizes the geometric characteristics of the encoding space, known as the Geometrically Aligned Transfer Encoder (GATE), to a multi-task setup. Thus, we connect multiple molecular tasks by aligning the curved coordinates onto locally flat coordinates, ensuring the flow of information from source tasks to support performance on target data.

Submitted 3 May, 2024; originally announced May 2024.

Comments: 7 pages, 3 figures, 2 tables

arXiv:2404.18826

Winning the Social Media Influence Battle: Uncertainty-Aware Opinions to Understand and Spread True Information via Competitive Influence Maximization

Authors: Qi Zhang, Lance M. Kaplan, Audun Jøsang, Dong Hyun Jeong, Feng Chen, Jin-Hee Cho

Abstract: Competitive Influence Maximization (CIM) involves entities competing to maximize influence in online social networks (OSNs). Current Deep Reinforcement Learning (DRL) methods in CIM rely on simplistic binary opinion models (i.e., an opinion is represented by either 0 or 1) and often overlook the complexity of users' behavioral characteristics and their prior knowledge. We propose a novel DRL-based framework that enhances CIM analysis by integrating Subjective Logic (SL) to accommodate uncertain opinions, users' behaviors, and their preferences. This approach targets the mitigation of false information by effectively propagating true information. By modeling two competitive agents, one spreading true information and the other spreading false information, we capture the strategic interplay essential to CIM. Our framework utilizes an uncertainty-based opinion model (UOM) to assess the impact on information quality in OSNs, emphasizing the importance of user behavior alongside network topology in selecting influential seed nodes. Extensive experiments demonstrate that our approach significantly outperforms state-of-the-art methods, achieving faster and more influential results (i.e., outperforming over 20%) under realistic network conditions. Moreover, our method shows robust performance in partially observable networks, effectively doubling the performance when users are predisposed to disbelieve true information.

Submitted 29 April, 2024; v1 submitted 29 April, 2024; originally announced April 2024.

Comments: 8 pages, 3 figures, submitted to ASONAM 2024

arXiv:2404.13569

Musical Word Embedding for Music Tagging and Retrieval

Authors: SeungHeon Doh, Jongpil Lee, Dasaem Jeong, Juhan Nam

Abstract: Word embedding has become an essential means for text-based information retrieval. Typically, word embeddings are learned from large quantities of general and unstructured text data. However, in the domain of music, the word embedding may have difficulty understanding musical contexts or recognizing music-related entities like artists and tracks. To address this issue, we propose a new approach called Musical Word Embedding (MWE), which involves learning from various types of texts, including both everyday and music-related vocabulary. We integrate MWE into an audio-word joint representation framework for tagging and retrieving music, using words like tag, artist, and track that have different levels of musical specificity. Our experiments show that using a more specific musical word like track results in better retrieval performance, while using a less specific term like tag leads to better tagging performance. To balance this compromise, we suggest multi-prototype training that uses words with different levels of musical specificity jointly. We evaluate both word embedding and audio-word joint embedding on four tasks (tag rank prediction, music tagging, query-by-tag, and query-by-track) across two datasets (Million Song Dataset and MTG-Jamendo). Our findings show that the suggested MWE is more efficient and robust than the conventional word embedding.

Submitted 22 April, 2024; v1 submitted 21 April, 2024; originally announced April 2024.

Comments: Submitted to IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP)

arXiv:2404.10980

Hyper Evidential Deep Learning to Quantify Composite Classification Uncertainty

Authors: Changbin Li, Kangshuo Li, Yuzhe Ou, Lance M. Kaplan, Audun Jøsang, Jin-Hee Cho, Dong Hyun Jeong, Feng Chen

Abstract: Deep neural networks (DNNs) have been shown to perform well on exclusive, multi-class classification tasks. However, when different classes have similar visual features, it becomes challenging for human annotators to differentiate them. This scenario necessitates the use of composite class labels. In this paper, we propose a novel framework called Hyper-Evidential Neural Network (HENN) that explicitly models predictive uncertainty due to composite class labels in training data in the context of the belief theory called Subjective Logic (SL). By placing a grouped Dirichlet distribution on the class probabilities, we treat predictions of a neural network as parameters of hyper-subjective opinions and learn the network that collects both single and composite evidence leading to these hyper-opinions by a deterministic DNN from data. We introduce a new uncertainty type called vagueness originally designed for hyper-opinions in SL to quantify composite classification uncertainty for DNNs. Our results demonstrate that HENN outperforms its state-of-the-art counterparts based on four image datasets. The code and datasets are available at: https://github.com/Hugo101/HyperEvidentialNN.

Submitted 16 April, 2024; originally announced April 2024.

Comments: In Proceedings of The Twelfth International Conference on Learning Representations, ICLR 2024

arXiv:2404.06818

Towards Efficient and Real-Time Piano Transcription Using Neural Autoregressive Models

Authors: Taegyun Kwon, Dasaem Jeong, Juhan Nam

Abstract: In recent years, advancements in neural network designs and the availability of large-scale labeled datasets have led to significant improvements in the accuracy of piano transcription models. However, most previous work focused on high-performance offline transcription, neglecting deliberate consideration of model size. The goal of this work is to implement real-time inference for piano transcription while ensuring both high performance and lightweight. To this end, we propose novel architectures for convolutional recurrent neural networks, redesigning an existing autoregressive piano transcription model. First, we extend the acoustic module by adding a frequency-conditioned FiLM layer to the CNN module to adapt the convolutional filters on the frequency axis. Second, we improve note-state sequence modeling by using a pitchwise LSTM that focuses on note-state transitions within a note. In addition, we augment the autoregressive connection with an enhanced recursive context. Using these components, we propose two types of models; one for high performance and the other for high compactness. Through extensive experiments, we show that the proposed models are comparable to state-of-the-art models in terms of note accuracy on the MAESTRO dataset. We also investigate the effective model size and real-time inference latency by gradually streamlining the architecture. Finally, we conduct cross-data evaluation on unseen piano datasets and in-depth analysis to elucidate the effect of the proposed components in the view of note length and pitch range.

Submitted 10 April, 2024; originally announced April 2024.

Comments: 11 pages, 8 figures, preprint

arXiv:2404.01954

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han , et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2403.03947

Can Audio Reveal Music Performance Difficulty? Insights from the Piano Syllabus Dataset

Authors: Pedro Ramoneda, Minhee Lee, Dasaem Jeong, J. J. Valero-Mas, Xavier Serra

Abstract: Automatically estimating the performance difficulty of a music piece represents a key process in music education to create tailored curricula according to the individual needs of the students. Given its relevance, the Music Information Retrieval (MIR) field depicts some proof-of-concept works addressing this task that mainly focuses on high-level music abstractions such as machine-readable scores or music sheet images. In this regard, the potential of directly analyzing audio recordings has been generally neglected, which prevents students from exploring diverse music pieces that may not have a formal symbolic-level transcription. This work pioneers in the automatic estimation of performance difficulty of music pieces on audio recordings with two precise contributions: (i) the first audio-based difficulty estimation dataset -- namely, Piano Syllabus (PSyllabus) dataset -- featuring 7,901 piano pieces across 11 difficulty levels from 1,233 composers; and (ii) a recognition framework capable of managing different input representations -- both unimodal and multimodal manners -- directly derived from audio to perform the difficulty estimation task. The comprehensive experimentation comprising different pre-training schemes, input modalities, and multi-task scenarios prove the validity of the proposal and establishes PSyllabus as a reference dataset for audio-based difficulty estimation in the MIR field. The dataset as well as the developed code and trained models are publicly shared to promote further research in the field.

Submitted 6 March, 2024; originally announced March 2024.

arXiv:2402.13410

Bayesian Neural Networks with Domain Knowledge Priors

Authors: Dylan Sam, Rattana Pukdee, Daniel P. Jeong, Yewon Byun, J. Zico Kolter

Abstract: Bayesian neural networks (BNNs) have recently gained popularity due to their ability to quantify model uncertainty. However, specifying a prior for BNNs that captures relevant domain knowledge is often extremely challenging. In this work, we propose a framework for integrating general forms of domain knowledge (i.e., any knowledge that can be represented by a loss function) into a BNN prior through variational inference, while enabling computationally efficient posterior inference and sampling. Specifically, our approach results in a prior over neural network weights that assigns high probability mass to models that better align with our domain knowledge, leading to posterior samples that also exhibit this behavior. We show that BNNs using our proposed domain knowledge priors outperform those with standard priors (e.g., isotropic Gaussian, Gaussian process), successfully incorporating diverse types of prior information such as fairness, physics rules, and healthcare knowledge and achieving better predictive performance. We also present techniques for transferring the learned priors across different model architectures, demonstrating their broad utility across various settings.

Submitted 20 February, 2024; originally announced February 2024.

Comments: 17 pages, 4 figures

arXiv:2312.10195

SoloPose: One-Shot Kinematic 3D Human Pose Estimation with Video Data Augmentation

Authors: David C. Jeong, Hongji Liu, Saunder Salazar, Jessie Jiang, Christopher A. Kitts

Abstract: While recent two-stage many-to-one deep learning models have demonstrated great success in 3D human pose estimation, such models are inefficient ways to detect 3D key points in a sequential video relative to one-shot and many-to-many models. Another key drawback of two-stage and many-to-one models is that errors in the first stage will be passed onto the second stage. In this paper, we introduce SoloPose, a novel one-shot, many-to-many spatio-temporal transformer model for kinematic 3D human pose estimation of video. SoloPose is further fortified by HeatPose, a 3D heatmap based on Gaussian Mixture Model distributions that factors target key points as well as kinematically adjacent key points. Finally, we address data diversity constraints with the 3D AugMotion Toolkit, a methodology to augment existing 3D human pose datasets, specifically by projecting four top public 3D human pose datasets (Humans3.6M, MADS, AIST Dance++, MPI INF 3DHP) into a novel dataset (Humans7.1M) with a universal coordinate system. Extensive experiments are conducted on Human3.6M as well as the augmented Humans7.1M dataset, and SoloPose demonstrates superior results relative to the state-of-the-art approaches.

Submitted 15 December, 2023; originally announced December 2023.

Comments: 8 pages, 6 figures

arXiv:2311.08439

A Unified Approach for Comprehensive Analysis of Various Spectral and Tissue Doppler Echocardiography

Authors: Jaeik Jeon, Jiyeon Kim, Yeonggul Jang, Yeonyee E. Yoon, Dawun Jeong, Youngtaek Hong, Seung-Ah Lee, Hyuk-Jae Chang

Abstract: Doppler echocardiography offers critical insights into cardiac function and phases by quantifying blood flow velocities and evaluating myocardial motion. However, previous methods for automating Doppler analysis, ranging from initial signal processing techniques to advanced deep learning approaches, have been constrained by their reliance on electrocardiogram (ECG) data and their inability to process Doppler views collectively. We introduce a novel unified framework using a convolutional neural network for comprehensive analysis of spectral and tissue Doppler echocardiography images that combines automatic measurements and end-diastole (ED) detection into a singular method. The network automatically recognizes key features across various Doppler views, with novel Doppler shape embedding and anti-aliasing modules enhancing interpretation and ensuring consistent analysis. Empirical results indicate a consistent outperformance in performance metrics, including dice similarity coefficients (DSC) and intersection over union (IoU). The proposed framework demonstrates strong agreement with clinicians in Doppler automatic measurements and competitive performance in ED detection.

Submitted 14 November, 2023; originally announced November 2023.

arXiv:2310.09757

doi: 10.1109/IROS55552.2023.10342417

MoEmo Vision Transformer: Integrating Cross-Attention and Movement Vectors in 3D Pose Estimation for HRI Emotion Detection

Authors: David C. Jeong, Tianma Shen, Hongji Liu, Raghav Kapoor, Casey Nguyen, Song Liu, Christopher A. Kitts

Abstract: Emotion detection presents challenges to intelligent human-robot interaction (HRI). Foundational deep learning techniques used in emotion detection are limited by information-constrained datasets or models that lack the necessary complexity to learn interactions between input data elements, such as the variance of human emotions across different contexts. In the current effort, we introduce 1) MoEmo (Motion to Emotion), a cross-attention vision transformer (ViT) for human emotion detection within robotics systems based on 3D human pose estimations across various contexts, and 2) a data set that offers full-body videos of human movement and corresponding emotion labels based on human gestures and environmental contexts. Compared to existing approaches, our method effectively leverages the subtle connections between movement vectors of gestures and environmental contexts through the use of cross-attention on the extracted movement vectors of full-body human gestures/poses and feature maps of environmental contexts. We implement a cross-attention fusion model to combine movement vectors and environment contexts into a joint representation to derive emotion estimation. Leveraging our Naturalistic Motion Database, we train the MoEmo system to jointly analyze motion and context, yielding emotion detection that outperforms the current state-of-the-art.

Submitted 15 October, 2023; originally announced October 2023.

Comments: IEEE/RSJ International Conference on Intelligent Robots (IROS), Detroit, Michigan

Journal ref: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots (IROS), 2023

arXiv:2310.08897

Self supervised convolutional kernel based handcrafted feature harmonization: Enhanced left ventricle hypertension disease phenotyping on echocardiography

Authors: Jina Lee, Youngtaek Hong, Dawun Jeong, Yeonggul Jang, Jaeik Jeon, Sihyeon Jeong, Taekgeun Jung, Yeonyee E. Yoon, Inki Moon, Seung-Ah Lee, Hyuk-Jae Chang

Abstract: Radiomics, a medical imaging technique, extracts quantitative handcrafted features from images to predict diseases. Harmonization in those features ensures consistent feature extraction across various imaging devices and protocols. Methods for harmonization include standardized imaging protocols, statistical adjustments, and evaluating feature robustness. Myocardial diseases such as Left Ventricular Hypertrophy (LVH) and Hypertensive Heart Disease (HHD) are diagnosed via echocardiography, but variable imaging settings pose challenges. Harmonization techniques are crucial for applying handcrafted features in disease diagnosis in such scenario. Self-supervised learning (SSL) enhances data understanding within limited datasets and adapts to diverse data settings. ConvNeXt-V2 integrates convolutional layers into SSL, displaying superior performance in various tasks. This study focuses on convolutional filters within SSL, using them as preprocessing to convert images into feature maps for handcrafted feature harmonization. Our proposed method excelled in harmonization evaluation and exhibited superior LVH classification performance compared to existing methods.

Submitted 22 November, 2023; v1 submitted 13 October, 2023; originally announced October 2023.

Comments: 11 pages, 7 figures

arXiv:2310.06369

Geometrically Aligned Transfer Encoder for Inductive Transfer in Regression Tasks

Authors: Sung Moon Ko, Sumin Lee, Dae-Woong Jeong, Woohyung Lim, Sehui Han

Abstract: Transfer learning is a crucial technique for handling a small amount of data that is potentially related to other abundant data. However, most of the existing methods are focused on classification tasks using images and language datasets. Therefore, in order to expand the transfer learning scheme to regression tasks, we propose a novel transfer technique based on differential geometry, namely the Geometrically Aligned Transfer Encoder (GATE). In this method, we interpret the latent vectors from the model to exist on a Riemannian curved manifold. We find a proper diffeomorphism between pairs of tasks to ensure that every arbitrary point maps to a locally flat coordinate in the overlapping region, allowing the transfer of knowledge from the source to the target data. This also serves as an effective regularizer for the model to behave in extrapolation regions. In this article, we demonstrate that GATE outperforms conventional methods and exhibits stable behavior in both the latent space and extrapolation regions for various molecular graph datasets.

Submitted 10 October, 2023; originally announced October 2023.

Comments: 12+11 pages, 6+1 figures, 0+7 tables

arXiv:2310.01580

Active Learning on Neural Networks through Interactive Generation of Digit Patterns and Visual Representation

Authors: Dong H. Jeong, Jin-Hee Cho, Feng Chen, Audun Josang, Soo-Yeon Ji

Abstract: Artificial neural networks (ANNs) have been broadly utilized to analyze various data and solve different domain problems. However, neural networks (NNs) have been considered a black box operation for years because their underlying computation and meaning are hidden. Due to this nature, users often face difficulties in interpreting the underlying mechanism of the NNs and the benefits of using them. In this paper, to improve users' learning and understanding of NNs, an interactive learning system is designed to create digit patterns and recognize them in real time. To help users clearly understand the visual differences of digit patterns (i.e., 0 ~ 9) and their results with an NN, integrating visualization is considered to present all digit patterns in a two-dimensional display space with supporting multiple user interactions. An evaluation with multiple datasets is conducted to determine its usability for active learning. In addition, informal user testing is managed during a summer workshop by asking the workshop participants to use the system.

Submitted 2 October, 2023; originally announced October 2023.

arXiv:2310.00398 [pdf, other]

Optimal Impact Angle Guidance via First-Order Optimization under Nonconvex Constraints

Authors: Gyubin Park, Jiwoo Choi, Da Hoon Jeong, Jong-Han Kim

Abstract: Most of the optimal guidance problems can be formulated as nonconvex optimization problems, which can be solved indirectly by relaxation, convexification, or linearization. Although these methods are guaranteed to converge to the global optimum of the modified problems, the obtained solution may not guarantee global optimality or even the feasibility of the original nonconvex problems. In this paper, we propose a computational optimal guidance approach that directly handles the nonconvex constraints encountered in formulating the guidance problems. The proposed computational guidance approach alternately solves the least squares problems and projects the solution onto nonconvex feasible sets, which rapidly converges to feasible suboptimal solutions or sometimes to the globally optimal solutions. The proposed algorithm is verified via a series of numerical simulations on impact angle guidance problems under state dependent maneuver vector constraints, and it is demonstrated that the proposed algorithm provides guidance performance superior to that of conventional techniques.

Submitted 17 March, 2024; v1 submitted 30 September, 2023; originally announced October 2023.

Comments: To appear at 2024 American Control Conference

arXiv:2309.16287 [pdf, other]

Predicting performance difficulty from piano sheet music images

Authors: Pedro Ramoneda, Jose J. Valero-Mas, Dasaem Jeong, Xavier Serra

Abstract: Estimating the performance difficulty of a musical score is crucial in music education for adequately designing the learning curriculum of the students. Although the Music Information Retrieval community has recently shown interest in this task, existing approaches mainly use machine-readable scores, leaving the broader case of sheet music images unaddressed. Based on previous works involving sheet music images, we use a mid-level representation, bootleg score, describing notehead positions relative to staff lines coupled with a transformer model. This architecture is adapted to our task by introducing an encoding scheme that reduces the encoded sequence length to one-eighth of the original size. In terms of evaluation, we consider five datasets (more than 7500 scores with up to 9 difficulty levels), two of them compiled specifically for this work. The results obtained when pretraining the scheme on the IMSLP corpus and fine-tuning it on the considered datasets prove the proposal's validity, achieving the best-performing model with a balanced accuracy of 40.34% and a mean square error of 1.33. Finally, we provide access to our code, data, and models for transparency and reproducibility.

Submitted 28 September, 2023; originally announced September 2023.

arXiv:2309.11093 [pdf, other]

K-pop Lyric Translation: Dataset, Analysis, and Neural-Modelling

Authors: Haven Kim, Jongmin Jung, Dasaem Jeong, Juhan Nam

Abstract: Lyric translation, a field studied for over a century, is now attracting computational linguistics researchers. We identified two limitations in previous studies. First, lyric translation studies have predominantly focused on Western genres and languages, with no previous study centering on K-pop despite its popularity. Second, the field of lyric translation suffers from a lack of publicly available datasets; to the best of our knowledge, no such dataset exists. To broaden the scope of genres and languages in lyric translation studies, we introduce a novel singable lyric translation dataset, approximately 89% of which consists of K-pop song lyrics. This dataset aligns Korean and English lyrics line-by-line and section-by-section. We leveraged this dataset to unveil unique characteristics of K-pop lyric translation, distinguishing it from other extensively studied genres, and to construct a neural lyric translation model, thereby underscoring the importance of a dedicated dataset for singable lyric translations.

Submitted 17 May, 2024; v1 submitted 20 September, 2023; originally announced September 2023.

Comments: LREC-COLING 2024

arXiv:2309.04062 [pdf, other]

3D Denoisers are Good 2D Teachers: Molecular Pretraining via Denoising and Cross-Modal Distillation

Authors: Sungjun Cho, Dae-Woong Jeong, Sung Moon Ko, Jinwoo Kim, Sehui Han, Seunghoon Hong, Honglak Lee, Moontae Lee

Abstract: Pretraining molecular representations from large unlabeled data is essential for molecular property prediction due to the high cost of obtaining ground-truth labels. While there exist various 2D graph-based molecular pretraining approaches, these methods struggle to show statistically significant gains in predictive performance. Recent work has thus instead proposed 3D conformer-based pretraining under the task of denoising, which led to promising results. During downstream finetuning, however, models trained with 3D conformers require accurate atom-coordinates of previously unseen molecules, which are computationally expensive to acquire at scale. In light of this limitation, we propose D&D, a self-supervised molecular representation learning framework that pretrains a 2D graph encoder by distilling representations from a 3D denoiser. With denoising followed by cross-modal knowledge distillation, our approach enjoys use of knowledge obtained from denoising as well as painless application to downstream tasks with no access to accurate conformers. Experiments on real-world molecular property prediction datasets show that the graph encoder trained via D&D can infer 3D information based on the 2D graph and shows superior performance and label-efficiency against other baselines.

Submitted 7 September, 2023; originally announced September 2023.

Comments: 16 pages, 5 figures

arXiv:2308.16483 [pdf, other]

Improving Out-of-Distribution Detection in Echocardiographic View Classification through Enhancing Semantic Features

Authors: Jaeik Jeon, Seongmin Ha, Yeonggul Jang, Yeonyee E. Yoon, Jiyeon Kim, Hyunseok Jeong, Dawun Jeong, Youngtaek Hong, Seung-Ah Lee, Hyuk-Jae Chang

Abstract: In echocardiographic view classification, accurately detecting out-of-distribution (OOD) data is essential but challenging, especially given the subtle differences between in-distribution and OOD data. While conventional OOD detection methods, such as Mahalanobis distance (MD), are effective in far-OOD scenarios with clear distinctions between distributions, they struggle to discern the less obvious variations characteristic of echocardiographic data. In this study, we introduce a novel use of label smoothing to enhance semantic feature representation in echocardiographic images, demonstrating that these enriched semantic features are key for significantly improving near-OOD instance detection. By combining label smoothing with MD-based OOD detection, we establish a new benchmark for accuracy in echocardiographic OOD detection.

Submitted 23 November, 2023; v1 submitted 31 August, 2023; originally announced August 2023.

arXiv:2308.09092 [pdf, other]

Watch Out! Smartwatches as criminal tool and digital forensic investigations

Authors: Seungjae Jeon, Jaehyun Chung, Doowon Jeong

Abstract: In the rapidly advancing technological landscape, smartwatches have materialized as multifunctional devices integral to our daily routines. Smartwatches store a substantial amount of personal information, potentially serving as repositories of digital evidence. Thus, digital forensic researchers have devoted considerable effort to exploring smartwatch forensic techniques. However, it has been observed that prior studies have primarily treated smartwatches as mere storage mediums for digital evidence, neglecting their potential role in criminal activities. This paper presents the information leakage perpetrated through smartwatches. We represent crime scenarios in an environment where smartphones are not available, considering that the perception that smartphones can be used as tools for criminal behavior prevails in many organizations, while the potential of similar-use smartwatches is often overlooked. We detail mechanisms for information leakage via file transfer and camera control using smartwatches. Additionally, we present methods to investigate each crime incident through smartwatch forensics. Finally, we describe the limitations of post-incident responses and propose proactive measures to prepare for potential crimes involving smartwatches. Keywords: Information Leakage, Smartwatch Forensics, Android Forensics, Mobile Device Management, Security Policy

Submitted 17 August, 2023; originally announced August 2023.

arXiv:2308.04723 [pdf, other]

doi 10.13089/JKIISC.2023.33.4.671

A Forensic Methodology for Detecting Image Manipulations

Authors: Jiwon Lee, Seungjae Jeon, Yunji Park, Jaehyun Chung, Doowon Jeong

Abstract: By applying artificial intelligence to image editing technology, it has become possible to generate high-quality images with minimal traces of manipulation. However, since these technologies can be misused for criminal activities such as dissemination of false information, destruction of evidence, and denial of facts, it is crucial to implement strong countermeasures. In this study, image file and mobile forensic artifacts analysis were conducted for detecting image manipulation. Image file analysis involves parsing the metadata of manipulated images (e.g., Exif, DQT, and Filename Signature) and comparing them with a Reference DB to detect manipulation. The Reference DB is a database that collects manipulation-related traces left in image metadata, which serves as a criterion for detecting image manipulation. In the mobile forensic artifacts analysis, packages related to image editing tools were extracted and analyzed to aid the detection of image manipulation. The proposed methodology overcomes the limitations of existing graphic feature-based analysis and combines with image processing techniques, providing the advantage of reducing false positives. The research results demonstrate the significant role of such methodology in digital forensic investigation and analysis. Additionally, we provide the code for parsing image metadata and the Reference DB along with the dataset of manipulated images, aiming to contribute to related research.

Submitted 9 August, 2023; originally announced August 2023.

Journal ref: Journal of The Korea Institute of Information Security and Cryptology (2023)

arXiv:2308.02249 [pdf, other]

Finding Tori: Self-supervised Learning for Analyzing Korean Folk Song

Authors: Danbinaerin Han, Rafael Caro Repetto, Dasaem Jeong

Abstract: In this paper, we introduce a computational analysis of a field recording dataset of approximately 700 hours of Korean folk songs, which were recorded around the 1980s and 1990s. Because most of the songs were sung by non-expert musicians without accompaniment, the dataset poses several challenges. To address these challenges, we utilized self-supervised learning with a convolutional neural network based on pitch contour, then analyzed how the musical concept of tori, a classification system defined by a specific scale, ornamental notes, and an idiomatic melodic contour, is captured by the model. The experimental result shows that our approach can better capture the characteristics of tori compared to traditional pitch histograms. Using our approaches, we have examined how musical discussions proposed in existing academia manifest in the actual field recordings of Korean folk songs.

Submitted 4 August, 2023; originally announced August 2023.

Comments: Accepted at 24th International Society for Music Information Retrieval Conference (ISMIR 2023)

arXiv:2307.10943 [pdf, other]

Proxy Anchor-based Unsupervised Learning for Continuous Generalized Category Discovery

Authors: Hyungmin Kim, Sungho Suh, Daehwan Kim, Daun Jeong, Hansang Cho, Junmo Kim

Abstract: Recent advances in deep learning have significantly improved the performance of various computer vision applications. However, discovering novel categories in an incremental learning scenario remains a challenging problem due to the lack of prior knowledge about the number and nature of new categories. Existing methods for novel category discovery are limited by their reliance on labeled datasets and prior knowledge about the number of novel categories and the proportion of novel samples in the batch. To address the limitations and more accurately reflect real-world scenarios, in this paper, we propose a novel unsupervised class incremental learning approach for discovering novel categories on unlabeled sets without prior knowledge. The proposed method fine-tunes the feature extractor and proxy anchors on labeled sets, then splits samples into old and novel categories and clusters on the unlabeled dataset. Furthermore, the proxy anchors-based exemplar generates representative category vectors to mitigate catastrophic forgetting. Experimental results demonstrate that our proposed approach outperforms the state-of-the-art methods on fine-grained datasets under real-world scenarios.

Submitted 2 November, 2023; v1 submitted 20 July, 2023; originally announced July 2023.

Comments: Accepted to ICCV 2023

arXiv:2306.08480 [pdf, other]

Combining piano performance dimensions for score difficulty classification

Authors: Pedro Ramoneda, Dasaem Jeong, Vsevolod Eremenko, Nazif Can Tamer, Marius Miron, Xavier Serra

Abstract: Predicting the difficulty of playing a musical score is essential for structuring and exploring score collections. Despite its importance for music education, the automatic difficulty classification of piano scores is not yet solved, mainly due to the lack of annotated data and the subjectiveness of the annotations. This paper aims to advance the state-of-the-art in score difficulty classification with two major contributions. To address the lack of data, we present the Can I Play It? (CIPI) dataset, a machine-readable piano score dataset with difficulty annotations obtained from the renowned classical music publisher Henle Verlag. The dataset is created by matching public domain scores with difficulty labels from Henle Verlag, then reviewed and corrected by an expert pianist. As a second contribution, we explore various input representations from score information to pre-trained ML models for piano fingering and expressiveness inspired by the musicology definition of performance. We show that combining the outputs of multiple classifiers performs better than the classifiers on their own, pointing to the fact that the representations capture different aspects of difficulty. In addition, we conduct numerous experiments that lay a foundation for score difficulty classification and create a basis for future research. Our best-performing model reports a 39.47% balanced accuracy and 1.13 median square error across the nine difficulty levels proposed in this study. Code, dataset, and models are made available for reproducibility.

Submitted 27 September, 2023; v1 submitted 14 June, 2023; originally announced June 2023.

Comments: 36 pages

arXiv:2305.13758 [pdf, other]

A study of audio mixing methods for piano transcription in violin-piano ensembles

Authors: Hyemi Kim, Jiyun Park, Taegyun Kwon, Dasaem Jeong, Juhan Nam

Abstract: While piano music transcription models have shown high performance for solo piano recordings, their performance degrades when applied to ensemble recordings. This study aims to analyze the impact of different data augmentation methods on piano transcription performance, specifically focusing on mixing techniques applied to violin-piano ensembles. We apply mixing methods that consider both harmonic and temporal characteristics of the audio. To create datasets for this study, we generated the PFVN-synth dataset, which contains 7 hours of violin-piano ensemble audio by rendering MIDI files and corresponding labels, and also collected unaccompanied violin recordings and mixed them with the MAESTRO dataset. We evaluated the transcription results on both synthesized and real audio recording datasets.

Submitted 23 May, 2023; originally announced May 2023.

Comments: To Appear IEEE ICASSP 2023

arXiv:2303.11668

Focus or Not: A Baseline for Anomaly Event Detection On the Open Public Places with Satellite Images

Authors: Yongjin Jeon, Youngtack Oh, Doyoung Jeong, Hyunguk Choi, Junsik Kim

Abstract: In recent years, monitoring wide areas of the world with satellite images has emerged as an important issue. The site monitoring task can be divided into two independent tasks: 1) Change Detection and 2) Anomaly Event Detection. Unlike change detection research, which is actively conducted on numerous datasets (e.g., LEVIR-CD, WHU-CD, S2Looking, and xView2) to meet the expectations of industries or governments, research on AI models for detecting anomaly events is rarely conducted. In this paper, we introduce a novel satellite imagery dataset (AED-RS) for detecting anomaly events in open public places. The AED-RS dataset contains satellite images of normal and abnormal situations at 8 open public places from all over the world. Each place is labeled with different criteria based on its characteristics. With this dataset, we introduce TB-FLOW, a baseline model for our dataset, which can be trained in a weakly-supervised manner and shows reasonable performance on the AED-RS dataset compared with other NF (Normalizing-Flow) based anomaly detection models. Our dataset and code will be publicly available at https://github.com/SIAnalytics/RS_AnomalyDetection.git.

Submitted 4 April, 2023; v1 submitted 21 March, 2023; originally announced March 2023.

Comments: I am withdrawing my submission due to issues with content modification

arXiv:2302.10195 [pdf, other]

Uncertainty-Aware Reward-based Deep Reinforcement Learning for Intent Analysis of Social Media Information

Authors: Zhen Guo, Qi Zhang, Xinwei An, Qisheng Zhang, Audun Jøsang, Lance M. Kaplan, Feng Chen, Dong H. Jeong, Jin-Hee Cho

Abstract: Due to the various and serious adverse impacts of spreading fake news, it is often assumed that only people with malicious intent would propagate fake news. However, this is not necessarily true based on social science studies. Distinguishing the types of fake news spreaders based on their intent is critical because it will effectively guide how to intervene to mitigate the spread of fake news with different approaches. To this end, we propose an intent classification framework that can best identify the correct intent of fake news. We will leverage deep reinforcement learning (DRL) that can optimize the structural representation of each tweet by removing noisy words from the input sequence when appending an actor to the long short-term memory (LSTM) intent classifier. A policy gradient DRL model (e.g., REINFORCE) can lead the actor to a higher delayed reward. We also devise a new uncertainty-aware immediate reward using a subjective opinion that can explicitly deal with multidimensional uncertainty for effective decision-making. Via 600K training episodes from a fake news tweets dataset with an annotated intent class, we evaluate the performance of uncertainty-aware reward in DRL. Evaluation results demonstrate that our proposed framework efficiently reduces the number of selected words while maintaining a high 95% multi-class accuracy.

Submitted 18 February, 2023; originally announced February 2023.

Comments: Accepted version

arXiv:2212.06343 [pdf, other]

PPO-UE: Proximal Policy Optimization via Uncertainty-Aware Exploration

Authors: Qisheng Zhang, Zhen Guo, Audun Jøsang, Lance M. Kaplan, Feng Chen, Dong H. Jeong, Jin-Hee Cho

Abstract: Proximal Policy Optimization (PPO) is a highly popular policy-based deep reinforcement learning (DRL) approach. However, we observe that the homogeneous exploration process in PPO could cause an unexpected stability issue in the training phase. To address this issue, we propose PPO-UE, a PPO variant equipped with self-adaptive uncertainty-aware explorations (UEs) based on a ratio uncertainty level. The proposed PPO-UE is designed to improve convergence speed and performance with an optimized ratio uncertainty level. Through extensive sensitivity analysis by varying the ratio uncertainty level, our proposed PPO-UE considerably outperforms the baseline PPO in Roboschool continuous control tasks.

Submitted 12 December, 2022; originally announced December 2022.

arXiv:2211.10938 [pdf, other]

AI-KD: Adversarial learning and Implicit regularization for self-Knowledge Distillation

Authors: Hyungmin Kim, Sungho Suh, Sunghyun Baek, Daehwan Kim, Daun Jeong, Hansang Cho, Junmo Kim

Abstract: We present a novel adversarial penalized self-knowledge distillation method, named adversarial learning and implicit regularization for self-knowledge distillation (AI-KD), which regularizes the training procedure by adversarial learning and implicit distillations. Our model not only distills the deterministic and progressive knowledge from the pre-trained and previous epoch predictive probabilities but also transfers the knowledge of the deterministic predictive distributions using adversarial learning. The motivation is that the self-knowledge distillation methods regularize the predictive probabilities with soft targets, but the exact distributions may be hard to predict. Our method deploys a discriminator to distinguish the distributions between the pre-trained and student models while the student model is trained to fool the discriminator during the training procedure. Thus, the student model can not only learn the pre-trained model's predictive probabilities but also align the distributions between the pre-trained and student models. We demonstrate the effectiveness of the proposed method with network architectures on multiple datasets and show the proposed method achieves better performance than state-of-the-art methods.

Submitted 21 March, 2024; v1 submitted 20 November, 2022; originally announced November 2022.

Comments: Accepted to KBS

arXiv:2211.02681 [pdf, ps, other]

Deep Distance Sensitivity Oracles

Authors: Davin Jeong, Allison Gunby-Mann, Sarel Cohen, Maximilian Katzmann, Chau Pham, Arnav Bhakta, Tobias Friedrich, Sang Chin

Abstract: One of the most fundamental graph problems is finding a shortest path from a source to a target node. While in its basic forms the problem has been studied extensively and efficient algorithms are known, it becomes significantly harder as soon as parts of the graph are susceptible to failure. Although one can recompute a shortest replacement path after every outage, this is rather inefficient in time and/or storage. One way to overcome this problem is to shift computational burden from the queries into a pre-processing step, where a data structure is computed that allows for fast querying of replacement paths, typically referred to as a Distance Sensitivity Oracle (DSO). While DSOs have been extensively studied in the theoretical computer science community, to the best of our knowledge this is the first work to construct DSOs using deep learning techniques. We show how to use deep learning to utilize a combinatorial structure of replacement paths. More specifically, we utilize the combinatorial structure of replacement paths as a concatenation of shortest paths and use deep learning to find the pivot nodes for stitching shortest paths into replacement paths.

Submitted 18 October, 2023; v1 submitted 2 November, 2022; originally announced November 2022.

Comments: arXiv admin note: text overlap with arXiv:2007.11495 by other authors

arXiv:2209.02939 [pdf, other]

doi 10.1609/aaai.v37i7.26005

Grouping-matrix based Graph Pooling with Adaptive Number of Clusters

Authors: Sung Moon Ko, Sungjun Cho, Dae-Woong Jeong, Sehui Han, Moontae Lee, Honglak Lee

Abstract: Graph pooling is a crucial operation for encoding hierarchical structures within graphs. Most existing graph pooling approaches formulate the problem as a node clustering task which effectively captures the graph topology. Conventional methods ask users to specify an appropriate number of clusters as a hyperparameter, then assume that all input graphs share the same number of clusters. In inductive settings where the number of clusters can vary, however, the model should be able to represent this variation in its pooling layers in order to learn suitable clusters. Thus we propose GMPool, a novel differentiable graph pooling architecture that automatically determines the appropriate number of clusters based on the input data. The main intuition involves a grouping matrix defined as a quadratic form of the pooling operator, which induces use of binary classification probabilities of pairwise combinations of nodes. GMPool obtains the pooling operator by first computing the grouping matrix, then decomposing it. Extensive evaluations on molecular property prediction tasks demonstrate that our method outperforms conventional methods.

Submitted 7 September, 2022; originally announced September 2022.

Comments: 10 pages, 3 figures

arXiv:2206.05675 [pdf, other]

A Survey on Uncertainty Reasoning and Quantification for Decision Making: Belief Theory Meets Deep Learning

Authors: Zhen Guo, Zelin Wan, Qisheng Zhang, Xujiang Zhao, Feng Chen, Jin-Hee Cho, Qi Zhang, Lance M. Kaplan, Dong H. Jeong, Audun Jøsang

Abstract: An in-depth understanding of uncertainty is the first step to making effective decisions under uncertainty. Deep/machine learning (ML/DL) has been hugely leveraged to solve complex problems involved with processing high-dimensional data. However, reasoning and quantifying different types of uncertainties to achieve effective decision-making have been much less explored in ML/DL than in other Artificial Intelligence (AI) domains. In particular, belief/evidence theories have been studied in KRR since the 1960s to reason and measure uncertainties to enhance decision-making effectiveness. We found that only a few studies have leveraged the mature uncertainty research in belief/evidence theories in ML/DL to tackle complex problems under different types of uncertainty. In this survey paper, we discuss several popular belief theories and their core ideas dealing with uncertainty causes and types and quantifying them, along with the discussions of their applicability in ML/DL. In addition, we discuss three main approaches that leverage belief theories in Deep Neural Networks (DNNs), including Evidential DNNs, Fuzzy DNNs, and Rough DNNs, in terms of their uncertainty causes, types, and quantification methods along with their applicability in diverse problem domains. Based on our in-depth survey, we discuss insights, lessons learned, limitations of the current state-of-the-art bridging belief theories and ML/DL, and finally, future research directions.

Submitted 13 June, 2022; v1 submitted 12 June, 2022; originally announced June 2022.

Comments: First four authors contributed equally

arXiv:2206.04688 [pdf, other]

A New Frontier of AI: On-Device AI Training and Personalization

Authors: Ji Joong Moon, Hyun Suk Lee, Jiho Chu, Donghak Park, Seungbaek Hong, Hyungjun Seo, Donghyeon Jeong, Sungsik Kong, MyungJoo Ham

Abstract: Modern consumer electronic devices have started executing deep learning-based intelligence services on devices, not cloud servers, to keep personal data on devices and to reduce network and cloud costs. We see this trend as an opportunity to personalize intelligence services by updating neural networks with user data without exposing the data outside the devices: on-device training. However, the limited resources of devices incur significant difficulties. We propose a light-weight on-device training framework, NNTrainer, which provides highly memory-efficient neural network training techniques and proactive swapping based on fine-grained execution order analysis for neural networks. Moreover, its optimizations do not sacrifice accuracy and are transparent to training algorithms; thus, prior algorithmic studies may be implemented on top of NNTrainer. The evaluations show that NNTrainer can reduce memory consumption down to 1/20 (saving 95%!) and effectively personalizes intelligence services on devices. NNTrainer is cross-platform and practical open-source software, which is being deployed to millions of mobile devices.

Submitted 4 January, 2024; v1 submitted 9 June, 2022; originally announced June 2022.

Comments: 12 pages, 16 figures, Accepted in ICSE 2024

arXiv:2204.08307 [pdf]

Heavy Rain Face Image Restoration: Integrating Physical Degradation Model and Facial Component Guided Adversarial Learning

Authors: Chang-Hwan Son, Da-Hee Jeong

Abstract: With the recent increase in intelligent CCTVs for visual surveillance, a new image degradation that integrates resolution conversion and synthetic rain models is required. For example, in heavy rain, face images captured by CCTV from a distance have significant deterioration in both visibility and resolution. Unlike traditional image degradation models (IDM), such as rain removal and superresolution, this study addresses a new IDM referred to as a scale-aware heavy rain model and proposes a method for restoring high-resolution face images (HR-FIs) from low-resolution heavy rain face images (LRHR-FI). To this end, a 2-stage network is presented. The first stage generates low-resolution face images (LR-FIs), from which heavy rain has been removed from the LRHR-FIs to improve visibility. To realize this, an interpretable IDM-based network is constructed to predict physical parameters, such as rain streaks, transmission maps, and atmospheric light. In addition, the image reconstruction loss is evaluated to enhance the estimates of the physical parameters. For the second stage, which aims to reconstruct the HR-FIs from the LR-FIs outputted in the first stage, facial component guided adversarial learning (FCGAL) is applied to boost facial structure expressions. To focus on informative facial features and reinforce the authenticity of facial components, such as the eyes and nose, a face-parsing-guided generator and facial local discriminators are designed for FCGAL. The experimental results verify that the proposed approach based on physical-based network design and FCGAL can remove heavy rain and increase the resolution and visibility simultaneously. Moreover, the proposed heavy-rain face image restoration outperforms state-of-the-art models of heavy rain removal, image-to-image translation, and superresolution.

Submitted 18 April, 2022; originally announced April 2022.

arXiv:2110.02550 [pdf, other]

CBP: Backpropagation with constraint on weight precision using a pseudo-Lagrange multiplier method

Authors: Guhyun Kim, Doo Seok Jeong

Abstract: Backward propagation of errors (backpropagation) is a method to minimize objective functions (e.g., loss functions) of deep neural networks by identifying optimal sets of weights and biases. Imposing constraints on weight precision is often required to alleviate prohibitive workloads on hardware. Despite the remarkable success of backpropagation, the algorithm itself is not capable of considering such constraints unless additional algorithms are applied simultaneously. To address this issue, we propose the constrained backpropagation (CBP) algorithm based on a pseudo-Lagrange multiplier method to obtain the optimal set of weights that satisfy a given set of constraints. The defining characteristic of the proposed CBP algorithm is the utilization of a Lagrangian function (loss function plus constraint function) as its objective function. We considered various types of constraints--binary, ternary, one-bit shift, and two-bit shift weight constraints. As a post-training method, CBP was applied to AlexNet, ResNet-18, ResNet-50, and GoogLeNet on ImageNet, which were pre-trained using conventional backpropagation. For all cases, the proposed algorithm outperforms the state-of-the-art methods on ImageNet, e.g., 66.6%, 74.4%, and 64.0% top-1 accuracy for ResNet-18, ResNet-50, and GoogLeNet with binary weights, respectively. This highlights CBP as a learning algorithm to address diverse constraints with minimal performance loss by employing appropriate constraint functions.

Submitted 25 October, 2021; v1 submitted 6 October, 2021; originally announced October 2021.

Comments: Accepted. NeurIPS 2021. The code is available at https://github.com/dooseokjeong/CBP

arXiv:2109.12894 [pdf, other]

Training Spiking Neural Networks Using Lessons From Deep Learning

Authors: Jason K. Eshraghian, Max Ward, Emre Neftci, Xinxin Wang, Gregor Lenz, Girish Dwivedi, Mohammed Bennamoun, Doo Seok Jeong, Wei D. Lu

Abstract: The brain is the perfect place to look for inspiration to develop more efficient neural networks. The inner workings of our synapses and neurons provide a glimpse at what the future of deep learning might look like. This paper serves as a tutorial and perspective showing how to apply the lessons learnt from several decades of research in deep learning, gradient descent, backpropagation and neuroscience to biologically plausible spiking neural networks. We also explore the delicate interplay between encoding data as spikes and the learning process; the challenges and solutions of applying gradient-based learning to spiking neural networks (SNNs); the subtle link between temporal backpropagation and spike timing dependent plasticity, and how deep learning might move towards biologically plausible online learning. Some ideas are well accepted and commonly used amongst the neuromorphic engineering community, while others are presented or justified for the first time here. The fields of deep learning and spiking neural networks evolve very rapidly. We endeavour to treat this document as a 'dynamic' manuscript that will continue to be updated as the common practices in training SNNs also change. A series of companion interactive tutorials complementary to this paper using our Python package, snnTorch, are also made available. See https://snntorch.readthedocs.io/en/latest/tutorials/index.html .

Submitted 13 August, 2023; v1 submitted 27 September, 2021; originally announced September 2021.

arXiv:2105.02483 [pdf, other]

Covering Convex Polygons by Two Congruent Disks

Authors: Jongmin Choi, Dahye Jeong, Hee-Kap Ahn

Abstract: We consider the planar two-center problem for a convex polygon: given a convex polygon in the plane, find two congruent disks of minimum radius whose union contains the polygon. We present an $O(n\log n)$-time algorithm for the two-center problem for a convex polygon, where $n$ is the number of vertices of the polygon. This improves upon the previous best algorithm for the problem.

Submitted 12 May, 2021; v1 submitted 6 May, 2021; originally announced May 2021.

Comments: 21 pages, 7 figures

arXiv:2102.04680 [pdf, other]

TräumerAI: Dreaming Music with StyleGAN

Authors: Dasaem Jeong, Seungheon Doh, Taegyun Kwon

Abstract: The goal of this paper is to generate a visually appealing video that responds to music with a neural network so that each frame of the video reflects the musical characteristics of the corresponding audio clip. To achieve the goal, we propose a neural music visualizer directly mapping deep music embeddings to style embeddings of StyleGAN, named TräumerAI, which consists of a music auto-tagging model using short-chunk CNN and StyleGAN2 pre-trained on the WikiArt dataset. Rather than establishing an objective metric between musical and visual semantics, we manually labeled the pairs in a subjective manner. An annotator listened to 100 music clips, each 10 seconds long, and selected an image that suits the music among the 200 StyleGAN-generated examples. Based on the collected data, we trained a simple transfer function that converts an audio embedding to a style embedding. The generated examples show that the mapping between audio and video achieves a certain level of intra-segment similarity and inter-segment dissimilarity.

Submitted 9 February, 2021; originally announced February 2021.

Comments: presented in NeurIPS Workshop 2020: Machine Learning for Creativity and Design

arXiv:2011.01436 [pdf, other]

Developing High Quality Training Samples for Deep Learning Based Local Climate Zone Classification in Korea

Authors: Minho Kim, Doyoung Jeong, Hyoungwoo Choi, Yongil Kim

Abstract: Two out of three people will be living in urban areas by 2050, as projected by the United Nations, emphasizing the need for sustainable urban development and monitoring. Common urban footprint data provide high-resolution city extents but lack essential information on the distribution, pattern, and characteristics. The Local Climate Zone (LCZ) offers an efficient and standardized framework that can delineate the internal structure and characteristics of urban areas. Global-scale LCZ mapping has been explored, but is limited by low accuracy, variable labeling quality, or domain adaptation challenges. Instead, this study developed custom LCZ data to map key Korean cities using a multi-scale convolutional neural network. Results demonstrated that using novel, custom LCZ data with deep learning can generate more accurate LCZ map results compared to conventional community-based LCZ mapping with machine learning as well as transfer learning of the global So2Sat dataset.

Submitted 10 December, 2020; v1 submitted 2 November, 2020; originally announced November 2020.

Comments: 7 pages, 7 figures; AI for Earth Workshop at NeurIPS2020

arXiv:2010.01104 [pdf, other]

Polyphonic Piano Transcription Using Autoregressive Multi-State Note Model

Authors: Taegyun Kwon, Dasaem Jeong, Juhan Nam

Abstract: Recent advances in polyphonic piano transcription have been made primarily by a deliberate design of neural network architectures that detect different note states such as onset or sustain and model the temporal evolution of the states. The majority of them, however, use separate neural networks for each note state, thereby optimizing multiple loss functions, and they also handle the temporal evolution of note states by abstract connections between the state-wise neural networks or by using a post-processing module. In this paper, we propose a unified neural network architecture where multiple note states are predicted as a softmax output with a single loss function and the temporal order is learned by an auto-regressive connection within the single neural network. This compact model allows the number of note states to be increased without added architectural complexity. Using the MAESTRO dataset, we examine various combinations of multiple note states including on, onset, sustain, re-onset, offset, and off. We also show that the autoregressive module effectively learns inter-state dependency of notes. Finally, we show that our proposed model achieves performance comparable to state-of-the-art models with fewer parameters.

Submitted 2 October, 2020; originally announced October 2020.

Comments: 6+2 pages, 5 figures, Camera-ready version. To be published in ISMIR 2020. Project page is available at https://TaegyunKwon.github.io/ar_multi_transcription

arXiv:2002.10941 [pdf, other]

A$^3$: Accelerating Attention Mechanisms in Neural Networks with Approximation

Authors: Tae Jun Ham, Sung Jun Jung, Seonghak Kim, Young H. Oh, Yeonhong Park, Yoonho Song, Jung-Hun Park, Sanghee Lee, Kyoung Park, Jae W. Lee, Deog-Kyoon Jeong

Abstract: With the increasing computational demands of neural networks, many hardware accelerators for neural networks have been proposed. Such existing neural network accelerators often focus on popular neural network types such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs); however, not much attention has been paid to attention mechanisms, an emerging neural network primitive that enables neural networks to retrieve the most relevant information from a knowledge-base, external memory, or past states. The attention mechanism is widely adopted by many state-of-the-art neural networks for computer vision, natural language processing, and machine translation, and accounts for a large portion of total execution time. We observe that today's practice of implementing this mechanism using matrix-vector multiplication is suboptimal, as the attention mechanism is semantically a content-based search where a large portion of computations ends up not being used. Based on this observation, we design and architect A3, which accelerates attention mechanisms in neural networks with algorithmic approximation and hardware specialization. Our proposed accelerator achieves multiple orders of magnitude improvement in energy efficiency (performance/watt) as well as substantial speedup over the state-of-the-art conventional hardware.

Submitted 21 February, 2020; originally announced February 2020.

Comments: To be published in 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)

arXiv:2001.10673 [pdf, other]

doi 10.2514/6.2020-2096

Assistive Relative Pose Estimation for On-orbit Assembly using Convolutional Neural Networks

Authors: Shubham Sonawani, Ryan Alimo, Renaud Detry, Daniel Jeong, Andrew Hess, Heni Ben Amor

Abstract: Accurate real-time pose estimation of a spacecraft or object in space is a key capability necessary for on-orbit spacecraft servicing and assembly tasks. Pose estimation of objects in space is more challenging than for objects on Earth due to space images containing widely varying illumination conditions, high contrast, and poor resolution in addition to power and mass constraints. In this paper, a convolutional neural network is leveraged to uniquely determine the translation and rotation of an object of interest relative to the camera. The main idea of using a CNN model is to assist the object tracker used in on-orbit assembly tasks, where a feature-based method alone is not always sufficient. The simulation framework designed for the assembly task is used to generate a dataset for training the modified CNN models, and the results of the different models are then compared by how accurately they predict the pose. Unlike many current approaches to pose estimation of a spacecraft or object in space, the model does not rely on hand-crafted object-specific features, which makes it more robust and easier to apply to other types of spacecraft. It is shown that the model performs comparably to the current feature-selection methods and can therefore be used in conjunction with them to provide more reliable estimates.

Submitted 19 February, 2020; v1 submitted 28 January, 2020; originally announced January 2020.

Journal ref: AIAA-Scitech 2020

arXiv:1911.11326 [pdf]

Simplified calcium signaling cascade for synaptic plasticity

Authors: Vladimir Kornijcuk, Dohun Kim, Guhyun Kim, Doo Seok Jeong

Abstract: We propose a model for synaptic plasticity based on a calcium signaling cascade. The model simplifies the full signaling pathways from a calcium influx to the phosphorylation (potentiation) and dephosphorylation (depression) of glutamate receptors that are gated by fictive C1 and C2 catalysts, respectively. This model is based on tangible chemical reactions, including fictive catalysts, for long-term plasticity rather than the conceptual theories commonplace in various models, such as preset thresholds of calcium concentration. Our simplified model successfully reproduced the experimental synaptic plasticity induced by different protocols such as (i) a synchronous pairing protocol and (ii) correlated presynaptic and postsynaptic action potentials (APs). Further, the ocular dominance plasticity (or the experimental verification of the celebrated Bienenstock--Cooper--Munro theory) was reproduced by two model synapses that compete by means of back-propagating APs (bAPs). The key to this competition is synapse-specific bAPs with reference to bAP-boosting on physiological grounds.

Submitted 25 November, 2019; originally announced November 2019.

Comments: 42 pages, 7 figures, Accepted by Neural Networks

arXiv:1911.05531 [pdf, other]

Accurate Protein Structure Prediction by Embeddings and Deep Learning Representations

Authors: Iddo Drori, Darshan Thaker, Arjun Srivatsa, Daniel Jeong, Yueqi Wang, Linyong Nan, Fan Wu, Dimitri Leggas, Jinhao Lei, Weiyi Lu, Weilong Fu, Yuan Gao, Sashank Karri, Anand Kannan, Antonio Moretti, Mohammed AlQuraishi, Chen Keasar, Itsik Pe'er

Abstract: Proteins are the major building blocks of life, and actuators of almost all chemical and biophysical events in living organisms. Their native structures in turn enable their biological functions which have a fundamental role in drug design. This motivates predicting the structure of a protein from its sequence of amino acids, a fundamental problem in computational biology. In this work, we demonstrate state-of-the-art protein structure prediction (PSP) results using embeddings and deep learning models for prediction of backbone atom distance matrices and torsion angles. We recover 3D coordinates of backbone atoms and reconstruct the full-atom protein by optimization. We create a new gold standard dataset of proteins which is comprehensive and easy to use. Our dataset consists of amino acid sequences, Q8 secondary structures, position specific scoring matrices, multiple sequence alignment co-evolutionary features, backbone atom distance matrices, torsion angles, and 3D coordinates. We evaluate the quality of our structure prediction by RMSD on the latest Critical Assessment of Techniques for Protein Structure Prediction (CASP) test data and demonstrate results competitive with the winning teams and AlphaFold in CASP13 and surpass the results of the winning teams in CASP12. We make our data, models, and code publicly available.

Submitted 8 November, 2019; originally announced November 2019.

Journal ref: Machine Learning in Computational Biology, 2019

arXiv:1902.04224 [pdf, other]

Effective Network Compression Using Simulation-Guided Iterative Pruning

Authors: Dae-Woong Jeong, Jaehun Kim, Youngseok Kim, Tae-Ho Kim, Myungsu Chae

Abstract: Existing high-performance deep learning models require very intensive computing. For this reason, it is difficult to embed a deep learning model into a system with limited resources. In this paper, we propose a novel network compression idea as a way to overcome this limitation. The principle of this idea is to make iterative pruning more effective and sophisticated by simulating the reduced network. A simple experiment was conducted to evaluate the method; the results showed that the proposed method achieved higher performance than existing methods at the same pruning level.

Submitted 11 February, 2019; originally announced February 2019.

Comments: Submitted to NIPS 2018 MLPCD2

MSC Class: 68T05

arXiv:1711.08679 [pdf]

Markov chain Hebbian learning algorithm with ternary synaptic units

Authors: Guhyun Kim, Vladimir Kornijcuk, Dohun Kim, Inho Kim, Jaewook Kim, Hyo Cheon Woo, Ji Hun Kim, Cheol Seong Hwang, Doo Seok Jeong

Abstract: In spite of remarkable progress in machine learning techniques, the state-of-the-art machine learning algorithms often keep machines from real-time learning (online learning) due in part to computational complexity in parameter optimization. As an alternative, a learning algorithm to train a memory in real time is proposed, named the Markov chain Hebbian learning algorithm. The algorithm pursues efficient memory use during training in that (i) the weight matrix has ternary elements (-1, 0, 1) and (ii) each update follows a Markov chain--the upcoming update does not need past weight memory. The algorithm was verified by two proof-of-concept tasks (handwritten digit recognition and multiplication table memorization) in which numbers were taken as symbols. Particularly, the latter bases multiplication arithmetic on memory, which may be analogous to humans' mental arithmetic. The memory-based multiplication arithmetic feasibly offers the basis of factorization, supporting novel insight into the arithmetic.

Submitted 23 November, 2017; originally announced November 2017.

Comments: 25 pages, 4 figures

arXiv:1711.04480 [pdf, other]

Audio-to-score alignment of piano music using RNN-based automatic music transcription

Authors: Taegyun Kwon, Dasaem Jeong, Juhan Nam

Abstract: We propose a framework for audio-to-score alignment on piano performance that employs automatic music transcription (AMT) using neural networks. Even though the AMT result may contain some errors, the note prediction output can be regarded as a learned feature representation that is directly comparable to MIDI note or chroma representation. To this end, we employ two recurrent neural networks that work as the AMT-based feature extractors for the alignment algorithm. One predicts the presence of 88 notes or 12 chroma at the frame level and the other detects note onsets in 12 chroma. We combine the two types of learned features for the audio-to-score alignment. For comparability, we apply dynamic time warping as an alignment algorithm without any additional post-processing. We evaluate the proposed framework on the MAPS dataset and compare it to previous work. The result shows that the alignment framework with the learned features significantly improves the accuracy, achieving less than 10 ms in mean onset error.

Submitted 13 November, 2017; originally announced November 2017.

Comments: 6 pages, 5 figures. Published in the Proceedings of the 14th Sound and Music Computing Conference (SMC), 2017

Showing 1–50 of 55 results for author: Jeong, D