Crayon: Customized On-Device LLM via Instant Adapter Blending and Edge-Server Hybrid Inference

Authors: Jihwan Bang, Juntae Lee, Kyuhong Shim, Seunghan Yang, Simyung Chang

Abstract: The customization of large language models (LLMs) for user-specified tasks is becoming increasingly important. However, maintaining all the customized LLMs on cloud servers incurs substantial memory and computational overheads, and uploading user data can also lead to privacy concerns. On-device LLMs can offer a promising solution by mitigating these issues. Yet, the performance of on-device LLMs is inherently constrained by the limitations of small-scale models. To overcome these restrictions, we first propose Crayon, a novel approach for on-device LLM customization. Crayon begins by constructing a pool of diverse base adapters, and then we instantly blend them into a customized adapter without extra training. In addition, we develop a device-server hybrid inference strategy, which deftly allocates more demanding queries or non-customized tasks to a larger, more capable LLM on a server. This ensures optimal performance without sacrificing the benefits of on-device customization. We carefully craft a novel benchmark from multiple question-answer datasets, and show the efficacy of our method in LLM customization.

Submitted 11 June, 2024; originally announced June 2024.

Comments: ACL 2024 Main

arXiv:2405.03945 [pdf, other]

Role of Sensing and Computer Vision in 6G Wireless Communications

Authors: Seungnyun Kim, Jihoon Moon, Jinhong Kim, Yongjun Ahn, Donghoon Kim, Sunwoo Kim, Kyuhong Shim, Byonghyo Shim

Abstract: Recently, we are witnessing the remarkable progress and widespread adoption of sensing technologies in autonomous driving, robotics, and the metaverse. Considering the rapid advancement of computer vision (CV) technology to analyze the sensing information, we anticipate a proliferation of wireless applications exploiting the sensing and CV technologies in 6G. In this article, we provide a holistic overview of the sensing and CV-aided wireless communications (SVWC) framework for 6G. By analyzing the high-resolution sensing information through the powerful CV techniques, SVWC can quickly and accurately understand the wireless environments and then perform the wireless tasks. To demonstrate the efficacy of SVWC, we design the whole process of SVWC including the sensing dataset collection, DL model training, and execution of realistic wireless tasks. From the numerical evaluations on 6G communication scenarios, we show that SVWC achieves considerable performance gains over the conventional 5G systems in terms of positioning accuracy, data rate, and access latency.

Submitted 6 May, 2024; originally announced May 2024.

arXiv:2404.11630 [pdf, other]

SNP: Structured Neuron-level Pruning to Preserve Attention Scores

Authors: Kyunghwan Shim, Jaewoong Yun, Shinkook Choi

Abstract: Multi-head self-attention (MSA) is a key component of Vision Transformers (ViTs), which have achieved great success in various vision tasks. However, their high computational cost and memory footprint hinder their deployment on resource-constrained devices. Conventional pruning approaches can only compress and accelerate the MSA module using head pruning, although the head is not an atomic unit. To address this issue, we propose a novel graph-aware neuron-level pruning method, Structured Neuron-level Pruning (SNP). SNP prunes neurons with less informative attention scores and eliminates redundancy among heads. Specifically, it prunes graphically connected query and key layers having the least informative attention scores while preserving the overall attention scores. Value layers, which can be pruned independently, are pruned to eliminate inter-head redundancy. Our proposed method effectively compresses and accelerates Transformer-based models for both edge devices and server processors. For instance, the DeiT-Small with SNP runs 3.1$\times$ faster than the original model and achieves performance that is 21.94\% faster and 1.12\% higher than the DeiT-Tiny. Additionally, SNP combines successfully with conventional head or block pruning approaches. SNP with head pruning could compress the DeiT-Base by 80\% of the parameters and computational costs and achieve 3.85$\times$ faster inference speed on RTX3090 and 4.93$\times$ on Jetson Nano.

Submitted 17 April, 2024; originally announced April 2024.

arXiv:2403.01594 [pdf, other]

Never Tell the Trick: Covert Interactive Mixed Reality System for Immersive Storytelling

Authors: Chanwoo Lee, Kyubeom Shim, Sanggyo Seo, Gwonu Ryu, Yongsoon Choi

Abstract: This study explores the integration of Ultra-Wideband (UWB) technology into Mixed Reality (MR) Systems for immersive storytelling. Addressing the limitations of existing technologies like Microsoft Kinect and HTC Vive, the research focuses on overcoming challenges in robustness to occlusion, tracking volume, and cost efficiency in props tracking. Utilizing UWB technology, the interactive MR system enhances the scope of performance art by enabling larger tracking areas, more reliable and cheaper multi-prop tracking, and reducing occlusion issues. Preliminary user tests suggest meaningful improvements in immersive experience, promising a new possibility in Extended Reality (XR) theater, performance art, and immersive games.

Submitted 3 March, 2024; originally announced March 2024.

Comments: To be presented in IEEE VR 2024

arXiv:2312.07342 [pdf, other]

Expand-and-Quantize: Unsupervised Semantic Segmentation Using High-Dimensional Space and Product Quantization

Authors: Jiyoung Kim, Kyuhong Shim, Insu Lee, Byonghyo Shim

Abstract: Unsupervised semantic segmentation (USS) aims to discover and recognize meaningful categories without any labels. For a successful USS, two key abilities are required: 1) information compression and 2) clustering capability. Previous methods have relied on feature dimension reduction for information compression; however, this approach may hinder the process of clustering. In this paper, we propose a novel USS framework called Expand-and-Quantize Unsupervised Semantic Segmentation (EQUSS), which combines the benefits of high-dimensional spaces for better clustering and product quantization for effective information compression. Our extensive experiments demonstrate that EQUSS achieves state-of-the-art results on three standard benchmarks. In addition, we analyze the entropy of USS features, which is the first step towards understanding USS from the perspective of information theory.

Submitted 12 December, 2023; originally announced December 2023.

Comments: Accepted to AAAI 2024

arXiv:2309.00647 [pdf, other]

Improving Small Footprint Few-shot Keyword Spotting with Supervision on Auxiliary Data

Authors: Seunghan Yang, Byeonggeun Kim, Kyuhong Shim, Simyung Chang

Abstract: Few-shot keyword spotting (FS-KWS) models usually require large-scale annotated datasets to generalize to unseen target keywords. However, existing KWS datasets are limited in scale and gathering keyword-like labeled data is a costly undertaking. To mitigate this issue, we propose a framework that uses easily collectible, unlabeled reading speech data as an auxiliary source. Self-supervised learning has been widely adopted for learning representations from unlabeled data; however, it is known to be suitable for large models with enough capacity and is not practical for training a small footprint FS-KWS model. Instead, we automatically annotate and filter the data to construct a keyword-like dataset, LibriWord, enabling supervision on auxiliary data. We then adopt multi-task learning that helps the model to enhance the representation power from out-of-domain auxiliary data. Our method notably improves the performance over competitive methods in the FS-KWS benchmark.

Submitted 31 August, 2023; originally announced September 2023.

Comments: Interspeech 2023

arXiv:2308.16415 [pdf, other]

Knowledge Distillation from Non-streaming to Streaming ASR Encoder using Auxiliary Non-streaming Layer

Authors: Kyuhong Shim, Jinkyu Lee, Simyung Chang, Kyuwoong Hwang

Abstract: Streaming automatic speech recognition (ASR) models are restricted from accessing future context, which results in worse performance compared to the non-streaming models. To improve the performance of streaming ASR, knowledge distillation (KD) from the non-streaming to streaming model has been studied, mainly focusing on aligning the output token probabilities. In this paper, we propose a layer-to-layer KD from the teacher encoder to the student encoder. To ensure that features are extracted using the same context, we insert auxiliary non-streaming branches to the student and perform KD from the non-streaming teacher layer to the non-streaming auxiliary layer. We design a special KD loss that leverages the autoregressive predictive coding (APC) mechanism to encourage the streaming model to predict unseen future contexts. Experimental results show that the proposed method can significantly reduce the word error rate compared to previous token probability distillation methods.

Submitted 30 August, 2023; originally announced August 2023.

Comments: Accepted to Interspeech 2023

arXiv:2306.01388 [pdf, other]

From Large Language Models to Databases and Back: A discussion on research and education

Authors: Sihem Amer-Yahia, Angela Bonifati, Lei Chen, Guoliang Li, Kyuseok Shim, Jianliang Xu, Xiaochun Yang

Abstract: This discussion was conducted at a recent panel at the 28th International Conference on Database Systems for Advanced Applications (DASFAA 2023), held April 17-20, 2023 in Tianjin, China. The title of the panel was "What does LLM (ChatGPT) Bring to Data Science Research and Education? Pros and Cons". It was moderated by Lei Chen and Xiaochun Yang. The discussion raised several questions on how large language models (LLMs) and database research and education can help each other and the potential risks of LLMs.

Submitted 7 July, 2023; v1 submitted 2 June, 2023; originally announced June 2023.

Comments: 7 pages, 2 figures, the Panel at the 28th International Conference on Database Systems for Advanced Applications (DASFAA 2023)

arXiv:2305.13680 [pdf, other]

ChatGPT, Can You Generate Solutions for my Coding Exercises? An Evaluation on its Effectiveness in an undergraduate Java Programming Course

Authors: Eng Lieh Ouh, Benjamin Kok Siew Gan, Kyong Jin Shim, Swavek Wlodkowski

Abstract: In this study, we assess the efficacy of employing the ChatGPT language model to generate solutions for coding exercises within an undergraduate Java programming course. ChatGPT, a large-scale, deep learning-driven natural language processing model, is capable of producing programming code based on textual input. Our evaluation involves analyzing ChatGPT-generated solutions for 80 diverse programming exercises and comparing them to the correct solutions. Our findings indicate that ChatGPT accurately generates Java programming solutions, which are characterized by high readability and well-structured organization. Additionally, the model can produce alternative, memory-efficient solutions. However, as a natural language processing model, ChatGPT struggles with coding exercises containing non-textual descriptions or class files, leading to invalid solutions. In conclusion, ChatGPT holds potential as a valuable tool for students seeking to overcome programming challenges and explore alternative approaches to solving coding problems. By understanding its limitations, educators can design coding exercises that minimize the potential for misuse as a cheating aid while maintaining their validity as assessment tools.

Submitted 23 May, 2023; originally announced May 2023.

arXiv:2304.12849 [pdf, other]

Depth-Relative Self Attention for Monocular Depth Estimation

Authors: Kyuhong Shim, Jiyoung Kim, Gusang Lee, Byonghyo Shim

Abstract: Monocular depth estimation is very challenging because clues to the exact depth are incomplete in a single RGB image. To overcome the limitation, deep neural networks rely on various visual hints such as size, shade, and texture extracted from RGB information. However, we observe that if such hints are overly exploited, the network can be biased on RGB information without considering the comprehensive view. We propose a novel depth estimation model named RElative Depth Transformer (RED-T) that uses relative depth as guidance in self-attention. Specifically, the model assigns high attention weights to pixels of close depth and low attention weights to pixels of distant depth. As a result, the features of similar depth can become more similar to each other and thus less prone to misused visual hints. We show that the proposed model achieves competitive results in monocular depth estimation benchmarks and is less biased to RGB information. In addition, we propose a novel monocular depth estimation benchmark that limits the observable depth range during training in order to evaluate the robustness of the model for unseen depths.

Submitted 25 April, 2023; originally announced April 2023.

Comments: Accepted for IJCAI 2023

arXiv:2303.05692 [pdf, ps, other]

Semantic-Preserving Augmentation for Robust Image-Text Retrieval

Authors: Sunwoo Kim, Kyuhong Shim, Luong Trung Nguyen, Byonghyo Shim

Abstract: Image text retrieval is a task to search for the proper textual descriptions of the visual world and vice versa. One challenge of this task is the vulnerability to input image and text corruptions. Such corruptions are often unobserved during the training, and degrade the retrieval model decision quality substantially. In this paper, we propose a novel image text retrieval technique, referred to as robust visual semantic embedding (RVSE), which consists of novel image-based and text-based augmentation techniques called semantic preserving augmentation for image (SPAugI) and text (SPAugT). Since SPAugI and SPAugT change the original data in a way that its semantic information is preserved, we enforce the feature extractors to generate semantic aware embedding vectors regardless of the corruption, improving the model robustness significantly. From extensive experiments using benchmark datasets, we show that RVSE outperforms conventional retrieval schemes in terms of image-text retrieval performance.

Submitted 9 March, 2023; originally announced March 2023.

Comments: Accepted to ICASSP 2023

arXiv:2302.11812 [pdf, other]

Teacher Intervention: Improving Convergence of Quantization Aware Training for Ultra-Low Precision Transformers

Authors: Minsoo Kim, Kyuhong Shim, Seongmin Park, Wonyong Sung, Jungwook Choi

Abstract: Pre-trained Transformer models such as BERT have shown great success in a wide range of applications, but at the cost of substantial increases in model complexity. Quantization-aware training (QAT) is a promising method to lower the implementation cost and energy consumption. However, aggressive quantization below 2-bit causes considerable accuracy degradation due to unstable convergence, especially when the downstream dataset is not abundant. This work proposes a proactive knowledge distillation method called Teacher Intervention (TI) for fast-converging QAT of ultra-low precision pre-trained Transformers. TI intervenes layer-wise signal propagation with the intact signal from the teacher to remove the interference of propagated quantization errors, smoothing the loss surface of QAT and expediting the convergence. Furthermore, we propose a gradual intervention mechanism to stabilize the recovery of subsections of Transformer layers from quantization. The proposed schemes enable fast convergence of QAT and improve the model accuracy regardless of the diverse characteristics of downstream fine-tuning tasks. We demonstrate that TI consistently achieves superior accuracy with significantly fewer fine-tuning iterations on well-known Transformers of natural language processing as well as computer vision compared to the state-of-the-art QAT methods.

Submitted 23 February, 2023; originally announced February 2023.

Comments: Accepted to EACL 2023 (main conference)

arXiv:2302.00875 [pdf, ps, other]

Vision Transformer-based Feature Extraction for Generalized Zero-Shot Learning

Authors: Jiseob Kim, Kyuhong Shim, Junhan Kim, Byonghyo Shim

Abstract: Generalized zero-shot learning (GZSL) is a technique to train a deep learning model to identify unseen classes using the image attribute. In this paper, we put forth a new GZSL approach exploiting Vision Transformer (ViT) to maximize the attribute-related information contained in the image feature. In ViT, the entire image region is processed without the degradation of the image resolution and the local image information is preserved in patch features. To fully enjoy these benefits of ViT, we exploit patch features as well as the CLS feature in extracting the attribute-related image feature. In particular, we propose a novel attention-based module, called attribute attention module (AAM), to aggregate the attribute-related information in patch features. In AAM, the correlation between each patch feature and the synthetic image attribute is used as the importance weight for each patch. From extensive experiments on benchmark datasets, we demonstrate that the proposed technique outperforms the state-of-the-art GZSL approaches by a large margin.

Submitted 1 February, 2023; originally announced February 2023.

Comments: 21 pages, 10 figures

arXiv:2301.12444 [pdf, other]

Exploring Attention Map Reuse for Efficient Transformer Neural Networks

Authors: Kyuhong Shim, Jungwook Choi, Wonyong Sung

Abstract: Transformer-based deep neural networks have achieved great success in various sequence applications due to their powerful ability to model long-range dependency. The key module of Transformer is self-attention (SA) which extracts features from the entire sequence regardless of the distance between positions. Although SA helps Transformer perform particularly well on long-range tasks, SA requires quadratic computation and memory complexity with the input sequence length. Recently, attention map reuse, which groups multiple SA layers to share one attention map, has been proposed and achieved significant speedup for speech recognition models. In this paper, we provide a comprehensive study on attention map reuse focusing on its ability to accelerate inference. We compare the method with other SA compression techniques and conduct a breakdown analysis of its advantages for a long sequence. We demonstrate the effectiveness of attention map reuse by measuring the latency on both CPU and GPU platforms.

Submitted 29 January, 2023; originally announced January 2023.

arXiv:2210.00367 [pdf, other]

A Comparison of Transformer, Convolutional, and Recurrent Neural Networks on Phoneme Recognition

Authors: Kyuhong Shim, Wonyong Sung

Abstract: Phoneme recognition is a very important part of speech recognition that requires the ability to extract phonetic features from multiple frames. In this paper, we compare and analyze CNN, RNN, Transformer, and Conformer models using phoneme recognition. For CNN, the ContextNet model is used for the experiments. First, we compare the accuracy of various architectures under different constraints, such as the receptive field length, parameter size, and layer depth. Second, we interpret the performance difference of these models, especially when the observable sequence length varies. Our analyses show that Transformer and Conformer models benefit from the long-range accessibility of self-attention through input frames.

Submitted 1 October, 2022; originally announced October 2022.

arXiv:2204.12416 [pdf, other]

doi 10.1145/3502718.3524795

XSS for the Masses: Integrating Security in a Web Programming Course using a Security Scanner

Authors: Lwin Khin Shar, Christopher M. Poskitt, Kyong Jin Shim, Li Ying Leonard Wong

Abstract: Cybersecurity education is considered an important part of undergraduate computing curricula, but many institutions teach it only in dedicated courses or tracks. This optionality risks students graduating with limited exposure to secure coding practices that are expected in industry. An alternative approach is to integrate cybersecurity concepts across non-security courses, so as to expose students to the interplay between security and other sub-areas of computing. In this paper, we report on our experience of applying the security integration approach to an undergraduate web programming course. In particular, we added a practical introduction to secure coding, which highlighted the OWASP Top 10 vulnerabilities by example, and demonstrated how to identify them using out-of-the-box security scanner tools (e.g. ZAP). Furthermore, we incentivised students to utilise these tools in their own course projects by offering bonus marks. To assess the impact of this intervention, we scanned students' project code over the last three years, finding a reduction in the number of vulnerabilities. Finally, in focus groups and a survey, students shared that our intervention helped to raise awareness, but they also highlighted the importance of grading incentives and the need to teach security content earlier.

Submitted 26 April, 2022; originally announced April 2022.

Comments: Accepted by the 27th annual conference on Innovation and Technology in Computer Science Education (ITiCSE 2022)

Journal ref: Proc. ITiCSE'22, pages 463-469. ACM, 2022

arXiv:2203.10252 [pdf, ps, other]

Similarity and Content-based Phonetic Self Attention for Speech Recognition

Authors: Kyuhong Shim, Wonyong Sung

Abstract: Transformer-based speech recognition models have achieved great success due to the self-attention (SA) mechanism that utilizes every frame in the feature extraction process. Especially, SA heads in lower layers capture various phonetic characteristics by the query-key dot product, which is designed to compute the pairwise relationship between frames. In this paper, we propose a variant of SA to extract more representative phonetic features. The proposed phonetic self-attention (phSA) is composed of two different types of phonetic attention; one is similarity-based and the other is content-based. In short, similarity-based attention captures the correlation between frames while content-based attention only considers each frame without being affected by other frames. We identify which parts of the original dot product equation are related to two different attention patterns and improve each part with simple modifications. Our experiments on phoneme classification and speech recognition show that replacing SA with phSA for lower layers improves the recognition performance without increasing the latency and the parameter size.

Submitted 11 July, 2022; v1 submitted 19 March, 2022; originally announced March 2022.

Comments: Accepted for INTERSPEECH 2022

arXiv:2203.03583 [pdf, ps, other]

Korean Tokenization for Beam Search Rescoring in Speech Recognition

Authors: Kyuhong Shim, Hyewon Bae, Wonyong Sung

Abstract: The performance of automatic speech recognition (ASR) models can be greatly improved by proper beam-search decoding with an external language model (LM). There has been an increasing interest in Korean speech recognition, but not many studies have focused on the decoding procedure. In this paper, we propose a Korean tokenization method for neural network-based LM used for Korean ASR. Although the common approach is to use the same tokenization method for external LM as the ASR model, we show that it may not be the best choice for Korean. We propose a new tokenization method that inserts a special token, SkipTC, when there is no trailing consonant in a Korean syllable. By utilizing the proposed SkipTC token, the input sequence for LM becomes very regularly patterned so that the LM can better learn the linguistic characteristics. Our experiments show that the proposed approach achieves a lower word error rate compared to the same LM model without SkipTC. In addition, we are the first to report the ASR performance for the recently introduced large-scale 7,600h Korean speech dataset.

Submitted 28 March, 2022; v1 submitted 22 February, 2022; originally announced March 2022.

Comments: Submitted to INTERSPEECH 2022

arXiv:2201.08357 [pdf]

The Specialized High-Performance Network on Anton 3

Authors: Keun Sup Shim, Brian Greskamp, Brian Towles, Bruce Edwards, J. P. Grossman, David E. Shaw

Abstract: Molecular dynamics (MD) simulation, a computationally intensive method that provides invaluable insights into the behavior of biomolecules, typically requires large-scale parallelization. Implementation of fast parallel MD simulation demands both high bandwidth and low latency for inter-node communication, but in current semiconductor technology, neither of these properties is scaling as quickly as intra-node computational capacity. This disparity in scaling necessitates architectural innovations to maximize the utilization of computational units. For Anton 3, the latest in a family of highly successful special-purpose supercomputers designed for MD simulations, we thus designed and built a completely new specialized network as part of our ASIC. Tightly integrating this network with specialized computation pipelines enables Anton 3 to perform simulations orders of magnitude faster than any general-purpose supercomputer, and to outperform its predecessor, Anton 2 (the state of the art prior to Anton 3), by an order of magnitude. In this paper, we present the three key features of the network that contribute to the high performance of Anton 3. First, through architectural optimizations, the network achieves very low end-to-end inter-node communication latency for fine-grained messages, allowing for better overlap of computation and communication. Second, novel application-specific compression techniques reduce the size of most messages sent between nodes, thereby increasing effective inter-node bandwidth. Lastly, a new hardware synchronization primitive, called a network fence, supports fast fine-grained synchronization tailored to the data flow within a parallel MD application. These application-driven specializations to the network are critical for Anton 3's MD simulation performance advantage over all other machines.

Submitted 20 January, 2022; originally announced January 2022.

Comments: Accepted by the 28th IEEE International Symposium on High-Performance Computer Architecture (HPCA 2022)

arXiv:2112.14478 [pdf, other]

Semantic Feature Extraction for Generalized Zero-shot Learning

Authors: Junhan Kim, Kyuhong Shim, Byonghyo Shim

Abstract: Generalized zero-shot learning (GZSL) is a technique to train a deep learning model to identify unseen classes using the attribute. In this paper, we put forth a new GZSL technique that improves the GZSL classification performance greatly. Key idea of the proposed approach, henceforth referred to as semantic feature extraction-based GZSL (SE-GZSL), is to use the semantic feature containing only attribute-related information in learning the relationship between the image and the attribute. In doing so, we can remove the interference, if any, caused by the attribute-irrelevant information contained in the image feature. To train a network extracting the semantic feature, we present two novel loss functions, 1) mutual information-based loss to capture all the attribute-related information in the image feature and 2) similarity-based loss to remove unwanted attribute-irrelevant information. From extensive experiments using various datasets, we show that the proposed SE-GZSL technique outperforms conventional GZSL approaches by a large margin.

Submitted 29 December, 2021; originally announced December 2021.

Comments: Accepted at AAAI2022

arXiv:2110.03252 [pdf, other]

Layer-wise Pruning of Transformer Attention Heads for Efficient Language Modeling

Authors: Kyuhong Shim, Iksoo Choi, Wonyong Sung, Jungwook Choi

Abstract: While Transformer-based models have shown impressive language modeling performance, the large computation cost is often prohibitive for practical use. Attention head pruning, which removes unnecessary attention heads in the multihead attention, is a promising technique to solve this problem. However, it does not evenly reduce the overall load because the heavy feedforward module is not affected by head pruning. In this paper, we apply layer-wise attention head pruning on All-attention Transformer so that the entire computation and the number of parameters can be reduced proportionally to the number of pruned heads. While the architecture has the potential to fully utilize head pruning, we propose three training methods that are especially helpful to minimize performance degradation and stabilize the pruning process. Our pruned model shows consistently lower perplexity within a comparable parameter size than Transformer-XL on WikiText-103 language modeling benchmark.

Submitted 7 October, 2021; originally announced October 2021.

arXiv:2109.09073 [pdf, other]

doi 10.24251/HICSS.2022.115

Mind the Gap: Reimagining an Interactive Programming Course for the Synchronous Hybrid Classroom

Authors: Christopher M. Poskitt, Kyong Jin Shim, Yi Meng Lau, Hong Seng Ong

Abstract: COVID-19 has significantly affected universities, forcing many courses to be delivered entirely online. As countries bring the pandemic under control, a potential way to safely resume some face-to-face teaching is the synchronous hybrid classroom, in which physically and remotely attending students are taught simultaneously. This comes with challenges, however, including the risk that remotely attending students perceive a 'gap' between their engagement and that of their physical peers. In this experience report, we describe how an interactive programming course was adapted to hybrid delivery in a way that mitigated this risk. Our solution centred on the use of a professional communication platform - Slack - to equalise participation opportunities and to facilitate peer learning. Furthermore, to mitigate 'Zoom fatigue', we implemented a semi-flipped classroom, covering concepts in videos and using shorter lessons to consolidate them. Finally, we critically reflect on the results of a student survey and our own experiences of implementing the solution.

Submitted 19 September, 2021; originally announced September 2021.

Comments: Accepted by the 34th Conference on Software Engineering Education and Training (CSEE&T 2022): Special Track of the 55th Hawaii International Conference on System Sciences (HICSS 2022)

Journal ref: Proc. HICSS 2022, pages 931-940. ScholarSpace, 2022

arXiv:2103.10858 [pdf, other]

Toward Compact Deep Neural Networks via Energy-Aware Pruning

Authors: Seul-Ki Yeom, Kyung-Hwan Shim, Jee-Hyun Hwang

Abstract: Despite the remarkable performance, modern deep neural networks are inevitably accompanied by a significant amount of computational cost for learning and deployment, which may be incompatible with their usage on edge devices. Recent efforts to reduce these overheads involve pruning and decomposing the parameters of various layers without performance deterioration. Inspired by several decomposition studies, in this paper, we propose a novel energy-aware pruning method that quantifies the importance of each filter in the network using nuclear-norm (NN). Proposed energy-aware pruning leads to state-of-the-art performance for Top-1 accuracy, FLOPs, and parameter reduction across a wide range of scenarios with multiple network architectures on CIFAR-10 and ImageNet after fine-grained classification tasks. On toy experiment, without fine-tuning, we can visually observe that NN has a minute change in decision boundaries across classes and outperforms the previous popular criteria. We achieve competitive results with 40.4/49.8% of FLOPs and 45.9/52.9% of parameter reduction with 94.13/94.61% in the Top-1 accuracy with ResNet-56/110 on CIFAR-10, respectively. In addition, our observations are consistent for a variety of different pruning setting in terms of data size as well as data quality which can be emphasized in the stability of the acceleration and compression with negligible accuracy loss.

Submitted 10 March, 2022; v1 submitted 19 March, 2021; originally announced March 2021.

Comments: 10 pages, 5 figures, 3 tables

arXiv:2011.11851 [pdf, other]

Dual Supervision Framework for Relation Extraction with Distant Supervision and Human Annotation

Authors: Woohwan Jung, Kyuseok Shim

Abstract: Relation extraction (RE) has been extensively studied due to its importance in real-world applications such as knowledge base construction and question answering. Most of the existing works train the models on either distantly supervised data or human-annotated data. To take advantage of the high accuracy of human annotation and the cheap cost of distant supervision, we propose the dual supervision framework which effectively utilizes both types of data. However, simply combining the two types of data to train a RE model may decrease the prediction accuracy since distant supervision has labeling bias. We employ two separate prediction networks HA-Net and DS-Net to predict the labels by human annotation and distant supervision, respectively, to prevent the degradation of accuracy by the incorrect labeling of distant supervision. Furthermore, we propose an additional loss term called disagreement penalty to enable HA-Net to learn from distantly supervised labels. In addition, we exploit additional networks to adaptively assess the labeling bias by considering contextual information. Our performance study on sentence-level and document-level REs confirms the effectiveness of the dual supervision framework.

Submitted 23 November, 2020; originally announced November 2020.

Comments: Accepted to COLING 2020

arXiv:2005.02602 [pdf, other]

Gradual Relation Network: Decoding Intuitive Upper Extremity Movement Imaginations Based on Few-Shot EEG Learning

Authors: Kyung-Hwan Shim, Ji-Hoon Jeong, Seong-Whan Lee

Abstract: Brain-computer interface (BCI) is a communication tool that connects users and external devices. In a real-time BCI environment, a calibration procedure is particularly necessary for each user and each session. This procedure consumes a significant amount of time that hinders the application of a BCI system in a real-world scenario. To avoid this problem, we adopt the metric based few-shot learning approach for decoding intuitive upper-extremity movement imagination (MI) using a gradual relation network (GRN) that can gradually consider the combination of temporal and spectral groups. We acquired the MI data of the upper-arm, forearm, and hand associated with intuitive upper-extremity movement from 25 subjects. The grand average multiclass classification results under offline analysis were 42.57%, 55.60%, and 80.85% in 1-, 5-, and 25-shot settings, respectively. In addition, we could demonstrate the feasibility of intuitive MI decoding using the few-shot approach in real-time robotic arm control scenarios. Five participants could achieve a success rate of 78% in the drinking task. Hence, we demonstrated the feasibility of the online robotic arm control with shortened calibration time by focusing on human body parts but also the accommodation of various untrained intuitive MI decoding based on the proposed GRN.

Submitted 6 May, 2020; originally announced May 2020.

arXiv:2002.01122 [pdf, other]

Motor Imagery Classification of Single-Arm Tasks Using Convolutional Neural Network based on Feature Refining

Authors: Byeong-Hoo Lee, Ji-Hoon Jeong, Kyung-Hwan Shim, Dong-Joo Kim

Abstract: Brain-computer interface (BCI) decodes brain signals to understand user intention and status. Because of its simple and safe data acquisition process, electroencephalogram (EEG) is commonly used in non-invasive BCI. One of EEG paradigms, motor imagery (MI) is commonly used for recovery or rehabilitation of motor functions due to its signal origin. However, the EEG signals are an oscillatory and non-stationary signal that makes it difficult to collect and classify MI accurately. In this study, we proposed a band-power feature refining convolutional neural network (BFR-CNN) which is composed of two convolution blocks to achieve high classification accuracy. We collected EEG signals to create MI dataset contained the movement imagination of a single-arm. The proposed model outperforms conventional approaches in 4-class MI tasks classification. Hence, we demonstrate that the decoding of user intention is possible by using only EEG signals with robust performance using BFR-CNN.

Submitted 3 February, 2020; originally announced February 2020.

arXiv:2002.01121 [pdf, other]

Classification of Upper Limb Movements Using Convolutional Neural Network with 3D Inception Block

Authors: D. -Y. Lee, J. -H. Jeong, K. -H. Shim, D. -J. Kim

Abstract: A brain-machine interface (BMI) based on electroencephalography (EEG) can overcome the movement deficits for patients and real-world applications for healthy people. Ideally, the BMI system detects user movement intentions transforms them into a control signal for a robotic arm movement. In this study, we made progress toward user intention decoding and successfully classified six different reaching movements of the right arm in the movement execution (ME). Notably, we designed an experimental environment using robotic arm movement and proposed a convolutional neural network architecture (CNN) with inception block for robust classify executed movements of the same limb. As a result, we confirmed the classification accuracies of six different directions show 0.45 for the executed session. The results proved that the proposed architecture has approximately 6~13% performance increase compared to its conventional classification models. Hence, we demonstrate the 3D inception CNN architecture to contribute to the continuous decoding of ME.

Submitted 3 February, 2020; originally announced February 2020.

Comments: 5 pages, accepted by BCI2020

arXiv:2002.00210 [pdf, other]

Classification of High-Dimensional Motor Imagery Tasks based on An End-to-end role assigned convolutional neural network

Authors: Byeong-Hoo Lee, Ji-Hoon Jeong, Kyung-Hwan Shim, Seong-Whan Lee

Abstract: A brain-computer interface (BCI) provides a direct communication pathway between user and external devices. Electroencephalogram (EEG) motor imagery (MI) paradigm is widely used in non-invasive BCI to obtain encoded signals contained user intention of movement execution. However, EEG has intricate and non-stationary properties resulting in insufficient decoding performance. By imagining numerous movements of a single-arm, decoding performance can be improved without artificial command matching. In this study, we collected intuitive EEG data contained the nine different types of movements of a single-arm from 9 subjects. We propose an end-to-end role assigned convolutional neural network (ERA-CNN) which considers discriminative features of each upper limb region by adopting the principle of a hierarchical CNN architecture. The proposed model outperforms previous methods on 3-class, 5-class and two different types of 7-class classification tasks. Hence, we demonstrate the possibility of decoding user intention by using only EEG signals with robust performance using an ERA-CNN.

Submitted 3 February, 2020; v1 submitted 1 February, 2020; originally announced February 2020.

Comments: Pre-review version, accepted at ICASSP 2020

arXiv:1910.04397 [pdf, other]

BitNet: Learning-Based Bit-Depth Expansion

Authors: Junyoung Byun, Kyujin Shim, Changick Kim

Abstract: Bit-depth is the number of bits for each color channel of a pixel in an image. Although many modern displays support unprecedented higher bit-depth to show more realistic and natural colors with a high dynamic range, most media sources are still in bit-depth of 8 or lower. Since insufficient bit-depth may generate annoying false contours or lose detailed visual appearance, bit-depth expansion (BDE) from low bit-depth (LBD) images to high bit-depth (HBD) images becomes more and more important. In this paper, we adopt a learning-based approach for BDE and propose a novel CNN-based bit-depth expansion network (BitNet) that can effectively remove false contours and restore visual details at the same time. We have carefully designed our BitNet based on an encoder-decoder architecture with dilated convolutions and a novel multi-scale feature integration. We have performed various experiments with four different datasets including MIT-Adobe FiveK, Kodak, ESPL v2, and TESTIMAGES, and our proposed BitNet has achieved state-of-the-art performance in terms of PSNR and SSIM among other existing BDE methods and famous CNN-based image processing networks. Unlike previous methods that separately process each color channel, we treat all RGB channels at once and have greatly improved color restoration. In addition, our network has shown the fastest computational speed in near real-time.

Submitted 10 October, 2019; originally announced October 2019.

Comments: Accepted by ACCV 2018, Authors Byun and Shim contributed equally

arXiv:1904.10217 [pdf, other]

doi 10.5441/002/edbt.2019.19

Crowdsourced Truth Discovery in the Presence of Hierarchies for Knowledge Fusion

Authors: Woohwan Jung, Younghoon Kim, Kyuseok Shim

Abstract: Existing works for truth discovery in categorical data usually assume that claimed values are mutually exclusive and only one among them is correct. However, many claimed values are not mutually exclusive even for functional predicates due to their hierarchical structures. Thus, we need to consider the hierarchical structure to effectively estimate the trustworthiness of the sources and infer the truths. We propose a probabilistic model to utilize the hierarchical structures and an inference algorithm to find the truths. In addition, in the knowledge fusion, the step of automatically extracting information from unstructured data (e.g., text) generates a lot of false claims. To take advantages of the human cognitive abilities in understanding unstructured data, we utilize crowdsourcing to refine the result of the truth discovery. We propose a task assignment algorithm to maximize the accuracy of the inferred truths. The performance study with real-life datasets confirms the effectiveness of our truth inference and task assignment algorithms.

Submitted 23 April, 2019; originally announced April 2019.

ACM Class: I.2.6

Journal ref: Proceedings of the 22nd International Conference on Extending Database Technology, 2019. pp. 205-216

arXiv:1611.05339 [pdf]

CareerMapper: An Automated Resume Evaluation Tool

Authors: Vivian Lai, Kyong Jin Shim, Richard J. Oentaryo, Philips K. Prasetyo, Casey Vu, Ee-Peng Lim, David Lo

Abstract: The advent of the Web brought about major changes in the way people search for jobs and companies look for suitable candidates. As more employers and recruitment firms turn to the Web for job candidate search, an increasing number of people turn to the Web for uploading and creating their online resumes. Resumes are often the first source of information about candidates and also the first item of evaluation in candidate selection. Thus, it is imperative that resumes are complete, free of errors and well-organized. We present an automated resume evaluation tool called "CareerMapper". Our tool is designed to conduct a thorough review of a user's LinkedIn profile and provide best recommendations for improved online resumes by analyzing a large number of online user profiles.

Submitted 16 November, 2016; originally announced November 2016.

Journal ref: Proceedings of the IEEE International Conference on Big Data (IEEE BigData 2016)

arXiv:1104.3212 [pdf]

Similarity Join Size Estimation using Locality Sensitive Hashing

Authors: Hongrae Lee, Raymond T. Ng, Kyuseok Shim

Abstract: Similarity joins are important operations with a broad range of applications. In this paper, we study the problem of vector similarity join size estimation (VSJ). It is a generalization of the previously studied set similarity join size estimation (SSJ) problem and can handle more interesting cases such as TF-IDF vectors. One of the key challenges in similarity join size estimation is that the join size can change dramatically depending on the input similarity threshold. We propose a sampling based algorithm that uses the Locality-Sensitive-Hashing (LSH) scheme. The proposed algorithm LSH-SS uses an LSH index to enable effective sampling even at high thresholds. We compare the proposed technique with random sampling and the state-of-the-art technique for SSJ (adapted to VSJ) and demonstrate LSH-SS offers more accurate estimates at both high and low similarity thresholds and small variance using real-world data sets.

Submitted 16 April, 2011; originally announced April 2011.

Comments: VLDB2011

Journal ref: Proceedings of the VLDB Endowment (PVLDB), Vol. 4, No. 6, pp. 338-349 (2011)

Showing 1–32 of 32 results for author: Shim, K