subscribe to arXiv mailings

On-Device Training Empowered Transfer Learning For Human Activity Recognition

Authors: Pixi Kang, Julian Moosmann, Sizhen Bian, Michele Magno

Abstract: Human Activity Recognition (HAR) is an attractive topic to perceive human behavior and supplying assistive services. Besides the classical inertial unit and vision-based HAR methods, new sensing technologies, such as ultrasound and body-area electric fields, have emerged in HAR to enhance user experience and accommodate new application scenarios. As those sensors are often paired with AI for HAR,… ▽ More Human Activity Recognition (HAR) is an attractive topic to perceive human behavior and supplying assistive services. Besides the classical inertial unit and vision-based HAR methods, new sensing technologies, such as ultrasound and body-area electric fields, have emerged in HAR to enhance user experience and accommodate new application scenarios. As those sensors are often paired with AI for HAR, they frequently encounter challenges due to limited training data compared to the more widely IMU or vision-based HAR solutions. Additionally, user-induced concept drift (UICD) is common in such HAR scenarios. UICD is characterized by deviations in the sample distribution of new users from that of the training participants, leading to deteriorated recognition performance. This paper proposes an on-device transfer learning (ODTL) scheme tailored for energy- and resource-constrained IoT edge devices. Optimized on-device training engines are developed for two representative MCU-level edge computing platforms: STM32F756ZG and GAP9. Based on this, we evaluated the ODTL benefits in three HAR scenarios: body capacitance-based gym activity recognition, QVAR- and ultrasonic-based hand gesture recognition. We demonstrated an improvement of 3.73%, 17.38%, and 3.70% in the activity recognition accuracy, respectively. Besides this, we observed that the RISC-V-based GAP9 achieves 20x and 280x less latency and power consumption than STM32F7 MCU during the ODTL deployment, demonstrating the advantages of employing the latest low-power parallel computing devices for edge tasks. △ Less

Submitted 4 July, 2024; originally announced July 2024.

arXiv:2406.12356 [pdf, other]

A Gradient Accumulation Method for Dense Retriever under Memory Constraint

Authors: Jaehee Kim, Yukyung Lee, Pilsung Kang

Abstract: InfoNCE loss is commonly used to train dense retriever in information retrieval tasks. It is well known that a large batch is essential to stable and effective training with InfoNCE loss, which requires significant hardware resources. Due to the dependency of large batch, dense retriever has bottleneck of application and research. Recently, memory reduction methods have been broadly adopted to res… ▽ More InfoNCE loss is commonly used to train dense retriever in information retrieval tasks. It is well known that a large batch is essential to stable and effective training with InfoNCE loss, which requires significant hardware resources. Due to the dependency of large batch, dense retriever has bottleneck of application and research. Recently, memory reduction methods have been broadly adopted to resolve the hardware bottleneck by decomposing forward and backward or using a memory bank. However, current methods still suffer from slow and unstable training. To address these issues, we propose Contrastive Accumulation (ContAccum), a stable and efficient memory reduction method for dense retriever trains that uses a dual memory bank structure to leverage previously generated query and passage representations. Experiments on widely used five information retrieval datasets indicate that ContAccum can surpass not only existing memory reduction methods but also high-resource scenario. Moreover, theoretical analysis and experimental results confirm that ContAccum provides more stable dual-encoder training than current memory bank utilization methods. △ Less

Submitted 18 June, 2024; v1 submitted 18 June, 2024; originally announced June 2024.

arXiv:2405.01028 [pdf, other]

Technical Report of NICE Challenge at CVPR 2024: Caption Re-ranking Evaluation Using Ensembled CLIP and Consensus Scores

Authors: Kiyoon Jeong, Woojun Lee, Woongchan Nam, Minjeong Ma, Pilsung Kang

Abstract: This report presents the ECO (Ensembled Clip score and cOnsensus score) pipeline from team DSBA LAB, which is a new framework used to evaluate and rank captions for a given image. ECO selects the most accurate caption describing image. It is made possible by combining an Ensembled CLIP score, which considers the semantic alignment between the image and captions, with a Consensus score that account… ▽ More This report presents the ECO (Ensembled Clip score and cOnsensus score) pipeline from team DSBA LAB, which is a new framework used to evaluate and rank captions for a given image. ECO selects the most accurate caption describing image. It is made possible by combining an Ensembled CLIP score, which considers the semantic alignment between the image and captions, with a Consensus score that accounts for the essentialness of the captions. Using this framework, we achieved notable success in the CVPR 2024 Workshop Challenge on Caption Re-ranking Evaluation at the New Frontiers for Zero-Shot Image Captioning Evaluation (NICE). Specifically, we secured third place based on the CIDEr metric, second in both the SPICE and METEOR metrics, and first in the ROUGE-L and all BLEU Score metrics. The code and configuration for the ECO framework are available at https://github.com/DSBA-Lab/ECO . △ Less

Submitted 13 June, 2024; v1 submitted 2 May, 2024; originally announced May 2024.

arXiv:2404.13919 [pdf, other]

Navigating the Path of Writing: Outline-guided Text Generation with Large Language Models

Authors: Yukyung Lee, Soonwon Ka, Bokyung Son, Pilsung Kang, Jaewook Kang

Abstract: Large Language Models (LLMs) have significantly impacted the writing process, enabling collaborative content creation and enhancing productivity. However, generating high-quality, user-aligned text remains challenging. In this paper, we propose Writing Path, a framework that uses explicit outlines to guide LLMs in generating goal-oriented, high-quality pieces of writing. Our approach draws inspira… ▽ More Large Language Models (LLMs) have significantly impacted the writing process, enabling collaborative content creation and enhancing productivity. However, generating high-quality, user-aligned text remains challenging. In this paper, we propose Writing Path, a framework that uses explicit outlines to guide LLMs in generating goal-oriented, high-quality pieces of writing. Our approach draws inspiration from structured writing planning and reasoning paths, focusing on capturing and reflecting user intentions throughout the writing process. We construct a diverse dataset from unstructured blog posts to benchmark writing performance and introduce a comprehensive evaluation framework assessing the quality of outlines and generated texts. Our evaluations with GPT-3.5-turbo, GPT-4, and HyperCLOVA X demonstrate that the Writing Path approach significantly enhances text quality according to both LLMs and human evaluations. This study highlights the potential of integrating writing-specific techniques into LLMs to enhance their ability to meet the diverse writing needs of users. △ Less

Submitted 22 April, 2024; originally announced April 2024.

Comments: under review

arXiv:2403.18771 [pdf, other]

CheckEval: Robust Evaluation Framework using Large Language Model via Checklist

Authors: Yukyung Lee, Joonghoon Kim, Jaehee Kim, Hyowon Cho, Pilsung Kang

Abstract: We introduce CheckEval, a novel evaluation framework using Large Language Models, addressing the challenges of ambiguity and inconsistency in current evaluation methods. CheckEval addresses these challenges by dividing evaluation criteria into detailed sub-aspects and constructing a checklist of Boolean questions for each, simplifying the evaluation. This approach not only renders the process more… ▽ More We introduce CheckEval, a novel evaluation framework using Large Language Models, addressing the challenges of ambiguity and inconsistency in current evaluation methods. CheckEval addresses these challenges by dividing evaluation criteria into detailed sub-aspects and constructing a checklist of Boolean questions for each, simplifying the evaluation. This approach not only renders the process more interpretable but also significantly enhances the robustness and reliability of results by focusing on specific evaluation dimensions. Validated through a focused case study using the SummEval benchmark, CheckEval indicates a strong correlation with human judgments. Furthermore, it demonstrates a highly consistent Inter-Annotator Agreement. These findings highlight the effectiveness of CheckEval for objective, flexible, and precise evaluations. By offering a customizable and interactive framework, CheckEval sets a new standard for the use of LLMs in evaluation, responding to the evolving needs of the field and establishing a clear method for future LLM-based evaluation. △ Less

Submitted 27 March, 2024; originally announced March 2024.

Comments: HEAL at CHI 2024

arXiv:2403.07036 [pdf, other]

A Converting Autoencoder Toward Low-latency and Energy-efficient DNN Inference at the Edge

Authors: Hasanul Mahmud, Peng Kang, Kevin Desai, Palden Lama, Sushil Prasad

Abstract: Reducing inference time and energy usage while maintaining prediction accuracy has become a significant concern for deep neural networks (DNN) inference on resource-constrained edge devices. To address this problem, we propose a novel approach based on "converting" autoencoder and lightweight DNNs. This improves upon recent work such as early-exiting framework and DNN partitioning. Early-exiting f… ▽ More Reducing inference time and energy usage while maintaining prediction accuracy has become a significant concern for deep neural networks (DNN) inference on resource-constrained edge devices. To address this problem, we propose a novel approach based on "converting" autoencoder and lightweight DNNs. This improves upon recent work such as early-exiting framework and DNN partitioning. Early-exiting frameworks spend different amounts of computation power for different input data depending upon their complexity. However, they can be inefficient in real-world scenarios that deal with many hard image samples. On the other hand, DNN partitioning algorithms that utilize the computation power of both the cloud and edge devices can be affected by network delays and intermittent connections between the cloud and the edge. We present CBNet, a low-latency and energy-efficient DNN inference framework tailored for edge devices. It utilizes a "converting" autoencoder to efficiently transform hard images into easy ones, which are subsequently processed by a lightweight DNN for inference. To the best of our knowledge, such autoencoder has not been proposed earlier. Our experimental results using three popular image-classification datasets on a Raspberry Pi 4, a Google Cloud instance, and an instance with Nvidia Tesla K80 GPU show that CBNet achieves up to 4.8x speedup in inference latency and 79% reduction in energy usage compared to competing techniques while maintaining similar or higher accuracy. △ Less

Submitted 11 March, 2024; originally announced March 2024.

Comments: 8 Pages, 8 Figures

arXiv:2312.16071 [pdf, other]

Event-based Shape from Polarization with Spiking Neural Networks

Authors: Peng Kang, Srutarshi Banerjee, Henry Chopp, Aggelos Katsaggelos, Oliver Cossairt

Abstract: Recent advances in event-based shape determination from polarization offer a transformative approach that tackles the trade-off between speed and accuracy in capturing surface geometries. In this paper, we investigate event-based shape from polarization using Spiking Neural Networks (SNNs), introducing the Single-Timestep and Multi-Timestep Spiking UNets for effective and efficient surface normal… ▽ More Recent advances in event-based shape determination from polarization offer a transformative approach that tackles the trade-off between speed and accuracy in capturing surface geometries. In this paper, we investigate event-based shape from polarization using Spiking Neural Networks (SNNs), introducing the Single-Timestep and Multi-Timestep Spiking UNets for effective and efficient surface normal estimation. Specificially, the Single-Timestep model processes event-based shape as a non-temporal task, updating the membrane potential of each spiking neuron only once, thereby reducing computational and energy demands. In contrast, the Multi-Timestep model exploits temporal dynamics for enhanced data extraction. Extensive evaluations on synthetic and real-world datasets demonstrate that our models match the performance of state-of-the-art Artifical Neural Networks (ANNs) in estimating surface normals, with the added advantage of superior energy efficiency. Our work not only contributes to the advancement of SNNs in event-based sensing but also sets the stage for future explorations in optimizing SNN architectures, integrating multi-modal data, and scaling for applications on neuromorphic hardware. △ Less

Submitted 26 December, 2023; originally announced December 2023.

Comments: 25 pages

arXiv:2312.04982 [pdf, other]

Boosting Prompt-Based Self-Training With Mapping-Free Automatic Verbalizer for Multi-Class Classification

Authors: Yookyung Kho, Jaehee Kim, Pilsung Kang

Abstract: Recently, prompt-based fine-tuning has garnered considerable interest as a core technique for few-shot text classification task. This approach reformulates the fine-tuning objective to align with the Masked Language Modeling (MLM) objective. Leveraging unlabeled data, prompt-based self-training has shown greater effectiveness in binary and three-class classification. However, prompt-based self-tra… ▽ More Recently, prompt-based fine-tuning has garnered considerable interest as a core technique for few-shot text classification task. This approach reformulates the fine-tuning objective to align with the Masked Language Modeling (MLM) objective. Leveraging unlabeled data, prompt-based self-training has shown greater effectiveness in binary and three-class classification. However, prompt-based self-training for multi-class classification has not been adequately investigated, despite its significant applicability to real-world scenarios. Moreover, extending current methods to multi-class classification suffers from the verbalizer that extracts the predicted value of manually pre-defined single label word for each class from MLM predictions. Consequently, we introduce a novel, efficient verbalizer structure, named Mapping-free Automatic Verbalizer (MAV). Comprising two fully connected layers, MAV serves as a trainable verbalizer that automatically extracts the requisite word features for classification by capitalizing on all available information from MLM predictions. Experimental results on five multi-class classification datasets indicate MAV's superior self-training efficacy. △ Less

Submitted 8 December, 2023; originally announced December 2023.

Comments: EMNLP 2023 findings

arXiv:2311.05160 [pdf, other]

RAPID: Training-free Retrieval-based Log Anomaly Detection with PLM considering Token-level information

Authors: Gunho No, Yukyung Lee, Hyeongwon Kang, Pilsung Kang

Abstract: As the IT industry advances, system log data becomes increasingly crucial. Many computer systems rely on log texts for management due to restricted access to source code. The need for log anomaly detection is growing, especially in real-world applications, but identifying anomalies in rapidly accumulating logs remains a challenging task. Traditional deep learning-based anomaly detection models req… ▽ More As the IT industry advances, system log data becomes increasingly crucial. Many computer systems rely on log texts for management due to restricted access to source code. The need for log anomaly detection is growing, especially in real-world applications, but identifying anomalies in rapidly accumulating logs remains a challenging task. Traditional deep learning-based anomaly detection models require dataset-specific training, leading to corresponding delays. Notably, most methods only focus on sequence-level log information, which makes the detection of subtle anomalies harder, and often involve inference processes that are difficult to utilize in real-time. We introduce RAPID, a model that capitalizes on the inherent features of log data to enable anomaly detection without training delays, ensuring real-time capability. RAPID treats logs as natural language, extracting representations using pre-trained language models. Given that logs can be categorized based on system context, we implement a retrieval-based technique to contrast test logs with the most similar normal logs. This strategy not only obviates the need for log-specific training but also adeptly incorporates token-level information, ensuring refined and robust detection, particularly for unseen logs. We also propose the core set technique, which can reduce the computational cost needed for comparison. Experimental results show that even without training on log data, RAPID demonstrates competitive performance compared to prior models and achieves the best performance on certain datasets. Through various research questions, we verified its capability for real-time detection without delay. △ Less

Submitted 9 November, 2023; originally announced November 2023.

arXiv:2311.03754 [pdf, other]

Which is better? Exploring Prompting Strategy For LLM-based Metrics

Authors: Joonghoon Kim, Saeran Park, Kiyoon Jeong, Sangmin Lee, Seung Hun Han, Jiyoon Lee, Pilsung Kang

Abstract: This paper describes the DSBA submissions to the Prompting Large Language Models as Explainable Metrics shared task, where systems were submitted to two tracks: small and large summarization tracks. With advanced Large Language Models (LLMs) such as GPT-4, evaluating the quality of Natural Language Generation (NLG) has become increasingly paramount. Traditional similarity-based metrics such as BLE… ▽ More This paper describes the DSBA submissions to the Prompting Large Language Models as Explainable Metrics shared task, where systems were submitted to two tracks: small and large summarization tracks. With advanced Large Language Models (LLMs) such as GPT-4, evaluating the quality of Natural Language Generation (NLG) has become increasingly paramount. Traditional similarity-based metrics such as BLEU and ROUGE have shown to misalign with human evaluation and are ill-suited for open-ended generation tasks. To address this issue, we explore the potential capability of LLM-based metrics, especially leveraging open-source LLMs. In this study, wide range of prompts and prompting techniques are systematically analyzed with three approaches: prompting strategy, score aggregation, and explainability. Our research focuses on formulating effective prompt templates, determining the granularity of NLG quality scores and assessing the impact of in-context examples on LLM-based evaluation. Furthermore, three aggregation strategies are compared to identify the most reliable method for aggregating NLG quality scores. To examine explainability, we devise a strategy that generates rationales for the scores and analyzes the characteristics of the explanation produced by the open-source LLMs. Extensive experiments provide insights regarding evaluation capabilities of open-source LLMs and suggest effective prompting strategies. △ Less

Submitted 7 November, 2023; originally announced November 2023.

Comments: Eval4NLP 2023 shared task winner on both Small and Large model Track for Summarization

arXiv:2306.02043 [pdf, other]

Painsight: An Extendable Opinion Mining Framework for Detecting Pain Points Based on Online Customer Reviews

Authors: Yukyung Lee, Jaehee Kim, Doyoon Kim, Yookyung Kho, Younsun Kim, Pilsung Kang

Abstract: As the e-commerce market continues to expand and online transactions proliferate, customer reviews have emerged as a critical element in shaping the purchasing decisions of prospective buyers. Previous studies have endeavored to identify key aspects of customer reviews through the development of sentiment analysis models and topic models. However, extracting specific dissatisfaction factors remain… ▽ More As the e-commerce market continues to expand and online transactions proliferate, customer reviews have emerged as a critical element in shaping the purchasing decisions of prospective buyers. Previous studies have endeavored to identify key aspects of customer reviews through the development of sentiment analysis models and topic models. However, extracting specific dissatisfaction factors remains a challenging task. In this study, we delineate the pain point detection problem and propose Painsight, an unsupervised framework for automatically extracting distinct dissatisfaction factors from customer reviews without relying on ground truth labels. Painsight employs pre-trained language models to construct sentiment analysis and topic models, leveraging attribution scores derived from model gradients to extract dissatisfaction factors. Upon application of the proposed methodology to customer review data spanning five product categories, we successfully identified and categorized dissatisfaction factors within each group, as well as isolated factors for each type. Notably, Painsight outperformed benchmark methods, achieving substantial performance enhancements and exceptional results in human evaluations. △ Less

Submitted 3 June, 2023; originally announced June 2023.

Comments: WASSA at ACL 2023

arXiv:2303.01034 [pdf, other]

Multi-Task Self-Supervised Time-Series Representation Learning

Authors: Heejeong Choi, Pilsung Kang

Abstract: Time-series representation learning can extract representations from data with temporal dynamics and sparse labels. When labeled data are sparse but unlabeled data are abundant, contrastive learning, i.e., a framework to learn a latent space where similar samples are close to each other while dissimilar ones are far from each other, has shown outstanding performance. This strategy can encourage va… ▽ More Time-series representation learning can extract representations from data with temporal dynamics and sparse labels. When labeled data are sparse but unlabeled data are abundant, contrastive learning, i.e., a framework to learn a latent space where similar samples are close to each other while dissimilar ones are far from each other, has shown outstanding performance. This strategy can encourage varied consistency of time-series representations depending on the positive pair selection and contrastive loss. We propose a new time-series representation learning method by combining the advantages of self-supervised tasks related to contextual, temporal, and transformation consistency. It allows the network to learn general representations for various downstream tasks and domains. Specifically, we first adopt data preprocessing to generate positive and negative pairs for each self-supervised task. The model then performs contextual, temporal, and transformation contrastive learning and is optimized jointly using their contrastive losses. We further investigate an uncertainty weighting approach to enable effective multi-task learning by considering the contribution of each consistency. We evaluate the proposed framework on three downstream tasks: time-series classification, forecasting, and anomaly detection. Experimental results show that our method not only outperforms the benchmark models on these downstream tasks, but also shows efficiency in cross-domain transfer learning. △ Less

Submitted 2 March, 2023; originally announced March 2023.

arXiv:2210.04277 [pdf, other]

Boost Event-Driven Tactile Learning with Location Spiking Neurons

Authors: Peng Kang, Srutarshi Banerjee, Henry Chopp, Aggelos Katsaggelos, Oliver Cossairt

Abstract: Tactile sensing is essential for a variety of daily tasks. And recent advances in event-driven tactile sensors and Spiking Neural Networks (SNNs) spur the research in related fields. However, SNN-enabled event-driven tactile learning is still in its infancy due to the limited representation abilities of existing spiking neurons and high spatio-temporal complexity in the event-driven tactile data.… ▽ More Tactile sensing is essential for a variety of daily tasks. And recent advances in event-driven tactile sensors and Spiking Neural Networks (SNNs) spur the research in related fields. However, SNN-enabled event-driven tactile learning is still in its infancy due to the limited representation abilities of existing spiking neurons and high spatio-temporal complexity in the event-driven tactile data. In this paper, to improve the representation capability of existing spiking neurons, we propose a novel neuron model called "location spiking neuron", which enables us to extract features of event-based data in a novel way. Specifically, based on the classical Time Spike Response Model (TSRM), we develop the Location Spike Response Model (LSRM). In addition, based on the most commonly-used Time Leaky Integrate-and-Fire (TLIF) model, we develop the Location Leaky Integrate-and-Fire (LLIF) model. Moreover, to demonstrate the representation effectiveness of our proposed neurons and capture the complex spatio-temporal dependencies in the event-driven tactile data, we exploit the location spiking neurons to propose two hybrid models for event-driven tactile learning. Specifically, the first hybrid model combines a fully-connected SNN with TSRM neurons and a fully-connected SNN with LSRM neurons. And the second hybrid model fuses the spatial spiking graph neural network with TLIF neurons and the temporal spiking graph neural network with LLIF neurons. Extensive experiments demonstrate the significant improvements of our models over the state-of-the-art methods on event-driven tactile learning. Moreover, compared to the counterpart artificial neural networks (ANNs), our SNN models are 10x to 100x energy-efficient, which shows the superior energy efficiency of our models and may bring new opportunities to the spike-based learning community and neuromorphic engineering. △ Less

Submitted 19 December, 2022; v1 submitted 9 October, 2022; originally announced October 2022.

Comments: Under review. Please note that this paper is a journal extension of our previous conference paper: arXiv:2209.01080. Please check what we added in the introduction part

arXiv:2209.01080 [pdf, other]

Event-Driven Tactile Learning with Location Spiking Neurons

Authors: Peng Kang, Srutarshi Banerjee, Henry Chopp, Aggelos Katsaggelos, Oliver Cossairt

Abstract: The sense of touch is essential for a variety of daily tasks. New advances in event-based tactile sensors and Spiking Neural Networks (SNNs) spur the research in event-driven tactile learning. However, SNN-enabled event-driven tactile learning is still in its infancy due to the limited representative abilities of existing spiking neurons and high spatio-temporal complexity in the data. In this pap… ▽ More The sense of touch is essential for a variety of daily tasks. New advances in event-based tactile sensors and Spiking Neural Networks (SNNs) spur the research in event-driven tactile learning. However, SNN-enabled event-driven tactile learning is still in its infancy due to the limited representative abilities of existing spiking neurons and high spatio-temporal complexity in the data. In this paper, to improve the representative capabilities of existing spiking neurons, we propose a novel neuron model called "location spiking neuron", which enables us to extract features of event-based data in a novel way. Moreover, based on the classical Time Spike Response Model (TSRM), we develop a specific location spiking neuron model - Location Spike Response Model (LSRM) that serves as a new building block of SNNs. Furthermore, we propose a hybrid model which combines an SNN with TSRM neurons and an SNN with LSRM neurons to capture the complex spatio-temporal dependencies in the data. Extensive experiments demonstrate the significant improvements of our models over other works on event-driven tactile learning and show the superior energy efficiency of our models and location spiking neurons, which may unlock their potential on neuromorphic hardware. △ Less

Submitted 23 July, 2022; originally announced September 2022.

Comments: accepted by IJCNN 2022 (oral), the source code is available at https://github.com/pkang2017/TactileLocNeurons

arXiv:2208.13022 [pdf, other]

Parity-Check Matrix Partitioning for Efficient Layered Decoding of QC-LDPC Codes

Authors: Teng Lu, Xuan He, Peng Kang, Jiongyue Xing, Xiaohu Tang

Abstract: In this paper, we consider how to partition the parity-check matrices (PCMs) to reduce the hardware complexity and computation delay for the row layered decoding of quasi-cyclic low-density parity-check (QC-LDPC) codes. First, we formulate the PCM partitioning as an optimization problem, which targets to minimize the maximum column weight of each layer while maintaining a block cyclic shift proper… ▽ More In this paper, we consider how to partition the parity-check matrices (PCMs) to reduce the hardware complexity and computation delay for the row layered decoding of quasi-cyclic low-density parity-check (QC-LDPC) codes. First, we formulate the PCM partitioning as an optimization problem, which targets to minimize the maximum column weight of each layer while maintaining a block cyclic shift property among different layers. As a result, we derive all the feasible solutions for the problem and propose a tight lower bound $ω_{LB}$ on the minimum possible maximum column weight to evaluate a solution. Second, we define a metric called layer distance to measure the data dependency between consecutive layers and further illustrate how to identify the solutions with desired layer distance from those achieving the minimum value of $ω_{LB}=1$, which is preferred to reduce computation delay. Next, we demonstrate that up-to-now, finding an optimal solution for the optimization problem with polynomial time complexity is unachievable. Therefore, both enumerative and greedy partition algorithms are proposed instead. After that, we modify the quasi-cyclic progressive edge-growth (QC-PEG) algorithm to directly construct PCMs that have a straightforward partition scheme to achieve $ω_{LB} $ or the desired layer distance. Simulation results showed that the constructed codes have better error correction performance and smaller average number of iterations than the underlying 5G LDPC code. △ Less

Submitted 27 August, 2022; originally announced August 2022.

arXiv:2207.03858 [pdf, other]

DSTEA: Improving Dialogue State Tracking via Entity Adaptive Pre-training

Authors: Yukyung Lee, Takyoung Kim, Hoonsang Yoon, Pilsung Kang, Junseong Bang, Misuk Kim

Abstract: Dialogue State Tracking (DST) is critical for comprehensively interpreting user and system utterances, thereby forming the cornerstone of efficient dialogue systems. Despite past research efforts focused on enhancing DST performance through alterations to the model structure or integrating additional features like graph relations, they often require additional pre-training with external dialogue c… ▽ More Dialogue State Tracking (DST) is critical for comprehensively interpreting user and system utterances, thereby forming the cornerstone of efficient dialogue systems. Despite past research efforts focused on enhancing DST performance through alterations to the model structure or integrating additional features like graph relations, they often require additional pre-training with external dialogue corpora. In this study, we propose DSTEA, improving Dialogue State Tracking via Entity Adaptive pre-training, which can enhance the encoder through by intensively training key entities in dialogue utterances. DSTEA identifies these pivotal entities from input dialogues utilizing four different methods: ontology information, named-entity recognition, the spaCy, and the flair library. Subsequently, it employs selective knowledge masking to train the model effectively. Remarkably, DSTEA only requires pre-training without the direct infusion of extra knowledge into the DST model. This approach resulted in substantial performance improvements of four robust DST models on MultiWOZ 2.0, 2.1, and 2.2, with joint goal accuracy witnessing an increase of up to 2.69% (from 52.41% to 55.10%). Further validation of DSTEA's efficacy was provided through comparative experiments considering various entity types and different entity adaptive pre-training configurations such as masking strategy and masking rate. △ Less

Submitted 23 July, 2023; v1 submitted 8 July, 2022; originally announced July 2022.

Journal ref: KnowledgeNLP@KDD2023

arXiv:2206.09178 [pdf, other]

REVECA -- Rich Encoder-decoder framework for Video Event CAptioner

Authors: Jaehyuk Heo, YongGi Jeong, Sunwoo Kim, Jaehee Kim, Pilsung Kang

Abstract: We describe an approach used in the Generic Boundary Event Captioning challenge at the Long-Form Video Understanding Workshop held at CVPR 2022. We designed a Rich Encoder-decoder framework for Video Event CAptioner (REVECA) that utilizes spatial and temporal information from the video to generate a caption for the corresponding the event boundary. REVECA uses frame position embedding to incorpora… ▽ More We describe an approach used in the Generic Boundary Event Captioning challenge at the Long-Form Video Understanding Workshop held at CVPR 2022. We designed a Rich Encoder-decoder framework for Video Event CAptioner (REVECA) that utilizes spatial and temporal information from the video to generate a caption for the corresponding the event boundary. REVECA uses frame position embedding to incorporate information before and after the event boundary. Furthermore, it employs features extracted using the temporal segment network and temporal-based pairwise difference method to learn temporal information. A semantic segmentation mask for the attentional pooling process is adopted to learn the subject of an event. Finally, LoRA is applied to fine-tune the image encoder to enhance the learning efficiency. REVECA yielded an average score of 50.97 on the Kinetics-GEBC test data, which is an improvement of 10.17 over the baseline method. Our code is available in https://github.com/TooTouch/REVECA. △ Less

Submitted 18 June, 2022; originally announced June 2022.

Comments: The IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR). LOng-form VidEo Understanding (LOVEU) workshop

arXiv:2203.10808 [pdf, other]

AnoViT: Unsupervised Anomaly Detection and Localization with Vision Transformer-based Encoder-Decoder

Authors: Yunseung Lee, Pilsung Kang

Abstract: Image anomaly detection problems aim to determine whether an image is abnormal, and to detect anomalous areas. These methods are actively used in various fields such as manufacturing, medical care, and intelligent information. Encoder-decoder structures have been widely used in the field of anomaly detection because they can easily learn normal patterns in an unsupervised learning environment and… ▽ More Image anomaly detection problems aim to determine whether an image is abnormal, and to detect anomalous areas. These methods are actively used in various fields such as manufacturing, medical care, and intelligent information. Encoder-decoder structures have been widely used in the field of anomaly detection because they can easily learn normal patterns in an unsupervised learning environment and calculate a score to identify abnormalities through a reconstruction error indicating the difference between input and reconstructed images. Therefore, current image anomaly detection methods have commonly used convolutional encoder-decoders to extract normal information through the local features of images. However, they are limited in that only local features of the image can be utilized when constructing a normal representation owing to the characteristics of convolution operations using a filter of fixed size. Therefore, we propose a vision transformer-based encoder-decoder model, named AnoViT, designed to reflect normal information by additionally learning the global relationship between image patches, which is capable of both image anomaly detection and localization. The proposed approach constructs a feature map that maintains the existing location information of individual patches by using the embeddings of all patches passed through multiple self-attention layers. The proposed AnoViT model performed better than the convolution-based model on three benchmark datasets. In MVTecAD, which is a representative benchmark dataset for anomaly localization, it showed improved results on 10 out of 15 classes compared with the baseline. Furthermore, the proposed method showed good performance regardless of the class and type of the anomalous area when localization results were evaluated qualitatively. △ Less

Submitted 21 March, 2022; originally announced March 2022.

arXiv:2203.03123 [pdf, other]

Mismatch between Multi-turn Dialogue and its Evaluation Metric in Dialogue State Tracking

Authors: Takyoung Kim, Hoonsang Yoon, Yukyung Lee, Pilsung Kang, Misuk Kim

Abstract: Dialogue state tracking (DST) aims to extract essential information from multi-turn dialogue situations and take appropriate actions. A belief state, one of the core pieces of information, refers to the subject and its specific content, and appears in the form of domain-slot-value. The trained model predicts "accumulated" belief states in every turn, and joint goal accuracy and slot accuracy are m… ▽ More Dialogue state tracking (DST) aims to extract essential information from multi-turn dialogue situations and take appropriate actions. A belief state, one of the core pieces of information, refers to the subject and its specific content, and appears in the form of domain-slot-value. The trained model predicts "accumulated" belief states in every turn, and joint goal accuracy and slot accuracy are mainly used to evaluate the prediction; however, we specify that the current evaluation metrics have a critical limitation when evaluating belief states accumulated as the dialogue proceeds, especially in the most used MultiWOZ dataset. Additionally, we propose relative slot accuracy to complement existing metrics. Relative slot accuracy does not depend on the number of predefined slots, and allows intuitive evaluation by assigning relative scores according to the turn of each dialogue. This study also encourages not solely the reporting of joint goal accuracy, but also various complementary metrics in DST tasks for the sake of a realistic evaluation. △ Less

Submitted 31 March, 2022; v1 submitted 6 March, 2022; originally announced March 2022.

Comments: ACL 2022 (short)

arXiv:2202.10001 [pdf, other]

Recurrent Auto-Encoder With Multi-Resolution Ensemble and Predictive Coding for Multivariate Time-Series Anomaly Detection

Authors: Heejeong Choi, Subin Kim, Pilsung Kang

Abstract: As large-scale time-series data can easily be found in real-world applications, multivariate time-series anomaly detection has played an essential role in diverse industries. It enables productivity improvement and maintenance cost reduction by preventing malfunctions and detecting anomalies based on time-series data. However, multivariate time-series anomaly detection is challenging because real-… ▽ More As large-scale time-series data can easily be found in real-world applications, multivariate time-series anomaly detection has played an essential role in diverse industries. It enables productivity improvement and maintenance cost reduction by preventing malfunctions and detecting anomalies based on time-series data. However, multivariate time-series anomaly detection is challenging because real-world time-series data exhibit complex temporal dependencies. For this task, it is crucial to learn a rich representation that effectively contains the nonlinear temporal dynamics of normal behavior. In this study, we propose an unsupervised multivariate time-series anomaly detection model named RAE-MEPC which learns informative normal representations based on multi-resolution ensemble and predictive coding. We introduce multi-resolution ensemble encoding to capture the multi-scale dependency from the input time series. The encoder hierarchically aggregates the temporal features extracted from the sub-encoders with different encoding lengths. From these encoded features, the reconstruction decoder reconstructs the input time series based on multi-resolution ensemble decoding where lower-resolution information helps to decode sub-decoders with higher-resolution outputs. Predictive coding is further introduced to encourage the model to learn the temporal dependencies of the time series. Experiments on real-world benchmark datasets show that the proposed model outperforms the benchmark models for multivariate time-series anomaly detection. △ Less

Submitted 21 February, 2022; originally announced February 2022.

arXiv:2201.06071 [pdf, ps, other]

Memory Efficient Mutual Information-Maximizing Quantized Min-Sum Decoding for Rate-Compatible LDPC Codes

Authors: Peng Kang, Kui Cai, Xuan He, Jinhong Yuan

Abstract: In this letter, we propose a two-stage design method to construct memory efficient mutual information-maximizing quantized min-sum (MIM-QMS) decoder for rate-compatible low-density parity-check (LDPC) codes. We first develop a modified density evolution to design a unique set of lookup tables (LUTs) that can be used for rate-compatible LDPC codes. The constructed LUTs are optimized based on their… ▽ More In this letter, we propose a two-stage design method to construct memory efficient mutual information-maximizing quantized min-sum (MIM-QMS) decoder for rate-compatible low-density parity-check (LDPC) codes. We first develop a modified density evolution to design a unique set of lookup tables (LUTs) that can be used for rate-compatible LDPC codes. The constructed LUTs are optimized based on their discrepancy values and a merge function to reduce the memory requirement. Numerical results show that the proposed rate-compatible MIM-QMS decoder can reduce the memory requirement for decoding by up to 94.92% compared to the benchmark rate-compatible LUT-based decoder with generally faster convergence speed. In addition, the proposed decoder can approach the performance of the floating-pointing belief propagation decoder within 0.15 dB. △ Less

Submitted 16 January, 2022; originally announced January 2022.

Comments: This paper is an extended version of the manuscript submitted to IEEE Communications Letters

arXiv:2111.09564 [pdf, other]

LAnoBERT: System Log Anomaly Detection based on BERT Masked Language Model

Authors: Yukyung Lee, Jina Kim, Pilsung Kang

Abstract: The system log generated in a computer system refers to large-scale data that are collected simultaneously and used as the basic data for determining errors, intrusion and abnormal behaviors. The aim of system log anomaly detection is to promptly identify anomalies while minimizing human intervention, which is a critical problem in the industry. Previous studies performed anomaly detection through… ▽ More The system log generated in a computer system refers to large-scale data that are collected simultaneously and used as the basic data for determining errors, intrusion and abnormal behaviors. The aim of system log anomaly detection is to promptly identify anomalies while minimizing human intervention, which is a critical problem in the industry. Previous studies performed anomaly detection through algorithms after converting various forms of log data into a standardized template using a parser. Particularly, a template corresponding to a specific event should be defined in advance for all the log data using which the information within the log key may get lost. In this study, we propose LAnoBERT, a parser free system log anomaly detection method that uses the BERT model, exhibiting excellent natural language processing performance. The proposed method, LAnoBERT, learns the model through masked language modeling, which is a BERT-based pre-training method, and proceeds with unsupervised learning-based anomaly detection using the masked language modeling loss function per log key during the test process. In addition, we also propose an efficient inference process to establish a practically applicable pipeline to the actual system. Experiments on three well-known log datasets, i.e., HDFS, BGL, and Thunderbird, show that not only did LAnoBERT yield a higher anomaly detection performance compared to unsupervised learning-based benchmark models, but also it resulted in a comparable performance with supervised learning-based benchmark models. △ Less

Submitted 23 July, 2023; v1 submitted 18 November, 2021; originally announced November 2021.

arXiv:2110.05172 [pdf, other]

K-Wav2vec 2.0: Automatic Speech Recognition based on Joint Decoding of Graphemes and Syllables

Authors: Jounghee Kim, Pilsung Kang

Abstract: Wav2vec 2.0 is an end-to-end framework of self-supervised learning for speech representation that is successful in automatic speech recognition (ASR), but most of the work on the topic has been developed with a single language: English. Therefore, it is unclear whether the self-supervised framework is effective in recognizing other languages with different writing systems, such as Korean which use… ▽ More Wav2vec 2.0 is an end-to-end framework of self-supervised learning for speech representation that is successful in automatic speech recognition (ASR), but most of the work on the topic has been developed with a single language: English. Therefore, it is unclear whether the self-supervised framework is effective in recognizing other languages with different writing systems, such as Korean which uses the Hangul having a unique writing system. In this paper, we present K-Wav2Vec 2.0, which is a modified version of Wav2vec 2.0 designed for Korean automatic speech recognition by exploring and optimizing various factors of the original Wav2vec 2.0. In fine-tuning, we propose a multi-task hierarchical architecture to reflect the Korean writing structure. Moreover, a joint decoder is applied to alleviate the problem of words existing outside of the vocabulary. In pre-training, we attempted the cross-lingual transfer of the pre-trained model by further pre-training the English Wav2vec 2.0 on a Korean dataset, considering limited resources. Our experimental results demonstrate that the proposed method yields the best performance on both Korean ASR datasets: Ksponspeech (a large-scale Korean speech corpus) and Clovacall (a call-based dialog corpus). Further pre-training is also effective in language adaptation, leading to large improvements without additional data. △ Less

Submitted 11 October, 2021; originally announced October 2021.

Comments: 13 pages, 4 figures

arXiv:2108.12637 [pdf, other]

Oh My Mistake!: Toward Realistic Dialogue State Tracking including Turnback Utterances

Authors: Takyoung Kim, Yukyung Lee, Hoonsang Yoon, Pilsung Kang, Junseong Bang, Misuk Kim

Abstract: The primary purpose of dialogue state tracking (DST), a critical component of an end-to-end conversational system, is to build a model that responds well to real-world situations. Although we often change our minds from time to time during ordinary conversations, current benchmark datasets do not adequately reflect such occurrences and instead consist of over-simplified conversations, in which no… ▽ More The primary purpose of dialogue state tracking (DST), a critical component of an end-to-end conversational system, is to build a model that responds well to real-world situations. Although we often change our minds from time to time during ordinary conversations, current benchmark datasets do not adequately reflect such occurrences and instead consist of over-simplified conversations, in which no one changes their mind during a conversation. As the main question inspiring the present study, "Are current benchmark datasets sufficiently diverse to handle casual conversations in which one changes their mind after a certain topic is over?" We found that the answer is "No" because DST models cannot refer to previous user preferences when template-based turnback utterances are injected into the dataset. Even in the the simplest mind-changing (turnback) scenario, the performance of DST models significantly degenerated. However, we found that this performance degeneration can be recovered when the turnback scenarios are explicitly designed in the training set, implying that the problem is not with the DST models but rather with the construction of the benchmark dataset. △ Less

Submitted 12 October, 2022; v1 submitted 28 August, 2021; originally announced August 2021.

Comments: SereTOD Workshop at EMNLP 2022

arXiv:2107.10474 [pdf, other]

Back-Translated Task Adaptive Pretraining: Improving Accuracy and Robustness on Text Classification

Authors: Junghoon Lee, Jounghee Kim, Pilsung Kang

Abstract: Language models (LMs) pretrained on a large text corpus and fine-tuned on a downstream text corpus and fine-tuned on a downstream task becomes a de facto training strategy for several natural language processing (NLP) tasks. Recently, an adaptive pretraining method retraining the pretrained language model with task-relevant data has shown significant performance improvements. However, current adap… ▽ More Language models (LMs) pretrained on a large text corpus and fine-tuned on a downstream text corpus and fine-tuned on a downstream task becomes a de facto training strategy for several natural language processing (NLP) tasks. Recently, an adaptive pretraining method retraining the pretrained language model with task-relevant data has shown significant performance improvements. However, current adaptive pretraining methods suffer from underfitting on the task distribution owing to a relatively small amount of data to re-pretrain the LM. To completely use the concept of adaptive pretraining, we propose a back-translated task-adaptive pretraining (BT-TAPT) method that increases the amount of task-specific data for LM re-pretraining by augmenting the task data using back-translation to generalize the LM to the target task domain. The experimental results show that the proposed BT-TAPT yields improved classification accuracy on both low- and high-resource data and better robustness to noise than the conventional adaptive pretraining method. △ Less

Submitted 22 July, 2021; originally announced July 2021.

arXiv:2105.02942 [pdf, other]

Understanding Catastrophic Overfitting in Adversarial Training

Authors: Peilin Kang, Seyed-Mohsen Moosavi-Dezfooli

Abstract: Recently, FGSM adversarial training is found to be able to train a robust model which is comparable to the one trained by PGD but an order of magnitude faster. However, there is a failure mode called catastrophic overfitting (CO) that the classifier loses its robustness suddenly during the training and hardly recovers by itself. In this paper, we find CO is not only limited to FGSM, but also happe… ▽ More Recently, FGSM adversarial training is found to be able to train a robust model which is comparable to the one trained by PGD but an order of magnitude faster. However, there is a failure mode called catastrophic overfitting (CO) that the classifier loses its robustness suddenly during the training and hardly recovers by itself. In this paper, we find CO is not only limited to FGSM, but also happens in $\mbox{DF}^{\infty}$-1 adversarial training. Then, we analyze the geometric properties for both FGSM and $\mbox{DF}^{\infty}$-1 and find they have totally different decision boundaries after CO. For FGSM, a new decision boundary is generated along the direction of perturbation and makes the small perturbation more effective than the large one. While for $\mbox{DF}^{\infty}$-1, there is no new decision boundary generated along the direction of perturbation, instead the perturbation generated by $\mbox{DF}^{\infty}$-1 becomes smaller after CO and thus loses its effectiveness. We also experimentally analyze three hypotheses on potential factors causing CO. And then based on the empirical analysis, we modify the RS-FGSM by not projecting perturbation back to the $l_\infty$ ball. By this small modification, we could achieve $47.56 \pm 0.37\% $ PGD-50-10 accuracy on CIFAR10 with $ε=8/255$ in contrast to $43.57 \pm 0.30\% $ by RS-FGSM and also further extend the working range of $ε$ from 8/255 to 11/255 on CIFAR10 without CO occurring. △ Less

Submitted 6 May, 2021; originally announced May 2021.

arXiv:2104.11421 [pdf, other]

A Framework for Recognizing and Estimating Human Concentration Levels

Authors: Woodo Lee, Jakyung Koo, Nokyung Park, Pilgu Kang, Jeakwon Shim

Abstract: One of the major tasks in online education is to estimate the concentration levels of each student. Previous studies have a limitation of classifying the levels using discrete states only. The purpose of this paper is to estimate the subtle levels as specified states by using the minimum amount of body movement data. This is done by a framework composed of a Deep Neural Network and Kalman Filter.… ▽ More One of the major tasks in online education is to estimate the concentration levels of each student. Previous studies have a limitation of classifying the levels using discrete states only. The purpose of this paper is to estimate the subtle levels as specified states by using the minimum amount of body movement data. This is done by a framework composed of a Deep Neural Network and Kalman Filter. Using this framework, we successfully extracted the concentration levels, which can be used to aid lecturers and expand to other areas. △ Less

Submitted 23 April, 2021; originally announced April 2021.

arXiv:2012.08888 [pdf, ps, other]

Solving the Travelling Thief Problem based on Item Selection Weight and Reverse Order Allocation

Authors: Lei Yang, Zitong Zhang, Xiaotian Jia, Peipei Kang, Wensheng Zhang, Dongya Wang

Abstract: The Travelling Thief Problem (TTP) is a challenging combinatorial optimization problem that attracts many scholars. The TTP interconnects two well-known NP-hard problems: the Travelling Salesman Problem (TSP) and the 0-1 Knapsack Problem (KP). Increasingly algorithms have been proposed for solving this novel problem that combines two interdependent sub-problems. In this paper, TTP is investigated… ▽ More The Travelling Thief Problem (TTP) is a challenging combinatorial optimization problem that attracts many scholars. The TTP interconnects two well-known NP-hard problems: the Travelling Salesman Problem (TSP) and the 0-1 Knapsack Problem (KP). Increasingly algorithms have been proposed for solving this novel problem that combines two interdependent sub-problems. In this paper, TTP is investigated theoretically and empirically. An algorithm based on the score value calculated by our proposed formulation in picking items and sorting items in the reverse order in the light of the scoring value is proposed to solve the problem. Different approaches for solving the TTP are compared and analyzed; the experimental investigations suggest that our proposed approach is very efficient in meeting or beating current state-of-the-art heuristic solutions on a comprehensive set of benchmark TTP instances. △ Less

Submitted 16 December, 2020; originally announced December 2020.

arXiv:2011.13147 [pdf, other]

Generalized Mutual Information-Maximizing Quantized Decoding of LDPC Codes with Layered Scheduling

Authors: Peng Kang, Kui Cai, Xuan He, Shuangyang Li, Jinhong Yuan

Abstract: In this paper, we propose a framework of the mutual information-maximizing (MIM) quantized decoding for low-density parity-check (LDPC) codes by using simple mappings and fixed-point additions. Our decoding method is generic in the sense that it can be applied to LDPC codes with arbitrary degree distributions, and can be implemented based on either the belief propagation (BP) algorithm or the min-… ▽ More In this paper, we propose a framework of the mutual information-maximizing (MIM) quantized decoding for low-density parity-check (LDPC) codes by using simple mappings and fixed-point additions. Our decoding method is generic in the sense that it can be applied to LDPC codes with arbitrary degree distributions, and can be implemented based on either the belief propagation (BP) algorithm or the min-sum (MS) algorithm. In particular, we propose the MIM density evolution (MIM-DE) to construct the lookup tables (LUTs) for the node updates. The computational complexity and memory requirements are discussed and compared to the LUT decoder variants. For applications with low-latency requirement, we consider the layered schedule to accelerate the convergence speed of decoding quasi-cyclic LDPC codes. In particular, we develop the layered MIM-DE to design the LUTs based on the MS algorithm, leading to the MIM layered quantized MS (MIM-LQMS) decoder. An optimization method is further introduced to reduce the memory requirement for storing the LUTs. Simulation results show that the MIM quantized decoders outperform the state-of-the-art LUT decoders in the waterfall region with both 3-bit and 4-bit precision over the additive white Gaussian noise channels. For small decoding iterations, the MIM quantized decoders also achieve a faster convergence speed compared to the benchmarks. Moreover, the 4-bit MIM-LQMS decoder can approach the error performance of the floating-point layered BP decoder within 0.3 dB in the moderate-to-high SNR regions, over both the AWGN channels and the fast fading channels. △ Less

Submitted 12 February, 2022; v1 submitted 26 November, 2020; originally announced November 2020.

Comments: This paper is an extended version of the manuscript submitted to the IEEE Transactions on Vehicular Technology

arXiv:2009.08128 [pdf, other]

doi 10.18653/v1/2020.findings-emnlp.99

Multi$^2$OIE: Multilingual Open Information Extraction Based on Multi-Head Attention with BERT

Authors: Youngbin Ro, Yukyung Lee, Pilsung Kang

Abstract: In this paper, we propose Multi$^2$OIE, which performs open information extraction (open IE) by combining BERT with multi-head attention. Our model is a sequence-labeling system with an efficient and effective argument extraction method. We use a query, key, and value setting inspired by the Multimodal Transformer to replace the previously used bidirectional long short-term memory architecture wit… ▽ More In this paper, we propose Multi$^2$OIE, which performs open information extraction (open IE) by combining BERT with multi-head attention. Our model is a sequence-labeling system with an efficient and effective argument extraction method. We use a query, key, and value setting inspired by the Multimodal Transformer to replace the previously used bidirectional long short-term memory architecture with multi-head attention. Multi$^2$OIE outperforms existing sequence-labeling systems with high computational efficiency on two benchmark evaluation datasets, Re-OIE2016 and CaRB. Additionally, we apply the proposed method to multilingual open IE using multilingual BERT. Experimental results on new benchmark datasets introduced for two languages (Spanish and Portuguese) demonstrate that our model outperforms other multilingual systems without training data for the target languages. △ Less

Submitted 7 October, 2020; v1 submitted 17 September, 2020; originally announced September 2020.

Comments: 11 pages, Findings of EMNLP 2020

arXiv:1906.09217 [pdf, other]

Hierarchical Gating Networks for Sequential Recommendation

Authors: Chen Ma, Peng Kang, Xue Liu

Abstract: The chronological order of user-item interactions is a key feature in many recommender systems, where the items that users will interact may largely depend on those items that users just accessed recently. However, with the tremendous increase of users and items, sequential recommender systems still face several challenging problems: (1) the hardness of modeling the long-term user interests from s… ▽ More The chronological order of user-item interactions is a key feature in many recommender systems, where the items that users will interact may largely depend on those items that users just accessed recently. However, with the tremendous increase of users and items, sequential recommender systems still face several challenging problems: (1) the hardness of modeling the long-term user interests from sparse implicit feedback; (2) the difficulty of capturing the short-term user interests given several items the user just accessed. To cope with these challenges, we propose a hierarchical gating network (HGN), integrated with the Bayesian Personalized Ranking (BPR) to capture both the long-term and short-term user interests. Our HGN consists of a feature gating module, an instance gating module, and an item-item product module. In particular, our feature gating and instance gating modules select what item features can be passed to the downstream layers from the feature and instance levels, respectively. Our item-item product module explicitly captures the item relations between the items that users accessed in the past and those items users will access in the future. We extensively evaluate our model with several state-of-the-art methods and different validation metrics on five real-world datasets. The experimental results demonstrate the effectiveness of our model on Top-N sequential recommendation. △ Less

Submitted 21 June, 2019; originally announced June 2019.

Comments: Accepted by the 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2019 Research Track), code available:https://github.com/allenjack/HGN

arXiv:1904.06666 [pdf, other]

Mutual Information-Maximizing Quantized Belief Propagation Decoding of Regular LDPC Codes

Authors: Xuan He, Kui Cai, Zhen Mei, Peng Kang, Xiaohu Tang

Abstract: In this paper, we propose a class of finite alphabet iterative decoder (FAID), called mutual information-maximizing quantized belief propagation (MIM-QBP) decoder, for decoding regular low-density parity-check (LDPC) codes. Our decoder follows the reconstruction-calculation-quantization (RCQ) decoding architecture that is widely used in FAIDs. We present the first complete and systematic design fr… ▽ More In this paper, we propose a class of finite alphabet iterative decoder (FAID), called mutual information-maximizing quantized belief propagation (MIM-QBP) decoder, for decoding regular low-density parity-check (LDPC) codes. Our decoder follows the reconstruction-calculation-quantization (RCQ) decoding architecture that is widely used in FAIDs. We present the first complete and systematic design framework for the RCQ parameters, and prove that our design with sufficient precision at node update is able to maximize the mutual information between coded bits and exchanged messages. Simulation results show that the MIM-QBP decoder can always considerably outperform the state-of-the-art mutual information-maximizing FAIDs that adopt two-input single-output lookup tables for decoding. Furthermore, with only 3 bits being used for each exchanged message, the MIM-QBP decoder can outperform the floating-point belief propagation decoder at the high signal-to-noise ratio regions when testing on high-rate LDPC codes with a maximum of 10 and 30 iterations. △ Less

Submitted 16 December, 2022; v1 submitted 14 April, 2019; originally announced April 2019.

arXiv:1901.05219 [pdf, other]

Sentence transition matrix: An efficient approach that preserves sentence semantics

Authors: Myeongjun Jang, Pilsung Kang

Abstract: Sentence embedding is a significant research topic in the field of natural language processing (NLP). Generating sentence embedding vectors reflecting the intrinsic meaning of a sentence is a key factor to achieve an enhanced performance in various NLP tasks such as sentence classification and document summarization. Therefore, various sentence embedding models based on supervised and unsupervised… ▽ More Sentence embedding is a significant research topic in the field of natural language processing (NLP). Generating sentence embedding vectors reflecting the intrinsic meaning of a sentence is a key factor to achieve an enhanced performance in various NLP tasks such as sentence classification and document summarization. Therefore, various sentence embedding models based on supervised and unsupervised learning have been proposed after the advent of researches regarding the distributed representation of words. They were evaluated through semantic textual similarity (STS) tasks, which measure the degree of semantic preservation of a sentence and neural network-based supervised embedding models generally yielded state-of-the-art performance. However, these models have a limitation in that they have multiple parameters to update, thereby requiring a tremendous amount of labeled training data. In this study, we propose an efficient approach that learns a transition matrix that refines a sentence embedding vector to reflect the latent semantic meaning of a sentence. The proposed method has two practical advantages; (1) it can be applied to any sentence embedding method, and (2) it can achieve robust performance in STS tasks irrespective of the number of training examples. △ Less

Submitted 16 January, 2019; originally announced January 2019.

Comments: 11 pages

arXiv:1901.04268 [pdf, other]

Learning Shared Semantic Space with Correlation Alignment for Cross-modal Event Retrieval

Authors: Zhenguo Yang, Zehang Lin, Peipei Kang, Jianming Lv, Qing Li, Wenyin Liu

Abstract: In this paper, we propose to learn shared semantic space with correlation alignment (${S}^{3}CA$) for multimodal data representations, which aligns nonlinear correlations of multimodal data distributions in deep neural networks designed for heterogeneous data. In the context of cross-modal (event) retrieval, we design a neural network with convolutional layers and fully-connected layers to extract… ▽ More In this paper, we propose to learn shared semantic space with correlation alignment (${S}^{3}CA$) for multimodal data representations, which aligns nonlinear correlations of multimodal data distributions in deep neural networks designed for heterogeneous data. In the context of cross-modal (event) retrieval, we design a neural network with convolutional layers and fully-connected layers to extract features for images, including images on Flickr-like social media. Simultaneously, we exploit a fully-connected neural network to extract semantic features for texts, including news articles from news media. In particular, nonlinear correlations of layer activations in the two neural networks are aligned with correlation alignment during the joint training of the networks. Furthermore, we project the multimodal data into a shared semantic space for cross-modal (event) retrieval, where the distances between heterogeneous data samples can be measured directly. In addition, we contribute a Wiki-Flickr Event dataset, where the multimodal data samples are not describing each other in pairs like the existing paired datasets, but all of them are describing semantic events. Extensive experiments conducted on both paired and unpaired datasets manifest the effectiveness of ${S}^{3}CA$, outperforming the state-of-the-art methods. △ Less

Submitted 21 May, 2019; v1 submitted 14 January, 2019; originally announced January 2019.

Comments: 22 pages, submitted to ACM Transactions on Multimedia Computing Communications and Applications(ACM TOMM)

arXiv:1812.02869 [pdf, other]

Gated Attentive-Autoencoder for Content-Aware Recommendation

Authors: Chen Ma, Peng Kang, Bin Wu, Qinglong Wang, Xue Liu

Abstract: The rapid growth of Internet services and mobile devices provides an excellent opportunity to satisfy the strong demand for the personalized item or product recommendation. However, with the tremendous increase of users and items, personalized recommender systems still face several challenging problems: (1) the hardness of exploiting sparse implicit feedback; (2) the difficulty of combining hetero… ▽ More The rapid growth of Internet services and mobile devices provides an excellent opportunity to satisfy the strong demand for the personalized item or product recommendation. However, with the tremendous increase of users and items, personalized recommender systems still face several challenging problems: (1) the hardness of exploiting sparse implicit feedback; (2) the difficulty of combining heterogeneous data. To cope with these challenges, we propose a gated attentive-autoencoder (GATE) model, which is capable of learning fused hidden representations of items' contents and binary ratings, through a neural gating structure. Based on the fused representations, our model exploits neighboring relations between items to help infer users' preferences. In particular, a word-level and a neighbor-level attention module are integrated with the autoencoder. The word-level attention learns the item hidden representations from items' word sequences, while favoring informative words by assigning larger attention weights. The neighbor-level attention learns the hidden representation of an item's neighborhood by considering its neighbors in a weighted manner. We extensively evaluate our model with several state-of-the-art methods and different validation metrics on four real-world datasets. The experimental results not only demonstrate the effectiveness of our model on top-N recommendation but also provide interpretable results attributed to the attention modules. △ Less

Submitted 6 December, 2018; originally announced December 2018.

Comments: Accepted by the 12th ACM International Conference on Web Search and Data Mining (WSDM 2019), code available: https://github.com/allenjack/GATE

arXiv:1810.13111 [pdf, ps, other]

Enhanced Quasi-Maximum Likelihood Decoding of Short LDPC Codes based on Saturation

Authors: Peng Kang, Yixuan Xie, Lei Yang, Chen Zheng, Jinhong Yuan, Yuejun Wei

Abstract: In this paper, we propose an enhanced quasi-maximum likelihood (EQML) decoder for LDPC codes with short block lengths. After the failure of the conventional belief propagation (BP) decoding, the proposed EQML decoder selects unreliable variable nodes (VNs) and saturates their associated channel output values to generate a list of decoder input sequences. Each decoder input sequence in the list is… ▽ More In this paper, we propose an enhanced quasi-maximum likelihood (EQML) decoder for LDPC codes with short block lengths. After the failure of the conventional belief propagation (BP) decoding, the proposed EQML decoder selects unreliable variable nodes (VNs) and saturates their associated channel output values to generate a list of decoder input sequences. Each decoder input sequence in the list is then decoded by the conventional BP decoder to obtain the most likely codeword. To improve the accuracy of selecting unreliable VNs, we propose an edge-wise selection method based on the sign fluctuation of VNs' extrinsic messages. A partial pruning stopping (PPS) rule is also presented to reduce the decoding latency. Simulation results show that the proposed EQML decoder outperforms the conventional BP decoder and the augmented BP decoder for short LDPC codes. It even approaches the performance of ML decoding within 0.3 dB in terms of frame error rate. In addition, the proposed PPS rule achieves a lower decoding latency compared to the list decoding stopping rule. △ Less

Submitted 30 January, 2019; v1 submitted 31 October, 2018; originally announced October 2018.

arXiv:1808.05505 [pdf, other]

Paraphrase Thought: Sentence Embedding Module Imitating Human Language Recognition

Authors: Myeongjun Jang, Pilsung Kang

Abstract: Sentence embedding is an important research topic in natural language processing. It is essential to generate a good embedding vector that fully reflects the semantic meaning of a sentence in order to achieve an enhanced performance for various natural language processing tasks, such as machine translation and document classification. Thus far, various sentence embedding models have been proposed,… ▽ More Sentence embedding is an important research topic in natural language processing. It is essential to generate a good embedding vector that fully reflects the semantic meaning of a sentence in order to achieve an enhanced performance for various natural language processing tasks, such as machine translation and document classification. Thus far, various sentence embedding models have been proposed, and their feasibility has been demonstrated through good performances on tasks following embedding, such as sentiment analysis and sentence classification. However, because the performances of sentence classification and sentiment analysis can be enhanced by using a simple sentence representation method, it is not sufficient to claim that these models fully reflect the meanings of sentences based on good performances for such tasks. In this paper, inspired by human language recognition, we propose the following concept of semantic coherence, which should be satisfied for a good sentence embedding method: similar sentences should be located close to each other in the embedding space. Then, we propose the Paraphrase-Thought (P-thought) model to pursue semantic coherence as much as possible. Experimental results on two paraphrase identification datasets (MS COCO and STS benchmark) show that the P-thought models outperform the benchmarked sentence embedding methods. △ Less

Submitted 14 October, 2018; v1 submitted 16 August, 2018; originally announced August 2018.

Comments: 10 pages

arXiv:1806.06190

Recurrent neural network-based user authentication for freely typed keystroke data

Authors: Junhong Kim, Pilsung Kang

Abstract: Keystroke dynamics-based user authentication (KDA) based on long and freely typed text is an enhanced user authentication method that can not only identify the validity of current users during login but also continuously monitors the consistency of typing behavior after the login process. Previous long and freely typed text-based KDA methods had difficulty incorporating the key sequence informatio… ▽ More Keystroke dynamics-based user authentication (KDA) based on long and freely typed text is an enhanced user authentication method that can not only identify the validity of current users during login but also continuously monitors the consistency of typing behavior after the login process. Previous long and freely typed text-based KDA methods had difficulty incorporating the key sequence information and handling variable lengths of keystrokes, which in turn resulted in lower authentication performance compared to KDA methods based on short and fixed-length text. To overcome these limitations, we propose a recurrent neural network (RNN)-based KDA model. As the RNN model can process an arbitrary length of input and target sequences, our proposed model takes two consecutive keys as the input sequence and actual typing time for the corresponding key sequence as the target sequence. Based on experimental results involving 120 participants, our proposed RNN-KDA model yielded the best authentication performance for all training and test length combinations in terms of equal error rate (EER). It achieved a 5%-6% EER using only 10 test keystrokes while the EERs of other benchmark methods were above 20%. In addition, its performance steadily and more rapidly improves compared to the benchmark methods when the length of training keystrokes increases. △ Less

Submitted 25 April, 2019; v1 submitted 16 June, 2018; originally announced June 2018.

Comments: We found a code error

MSC Class: 68T99

arXiv:1805.02902 [pdf, ps, other]

Reliability-Based Windowed Decoding for Spatially-Coupled LDPC Codes

Authors: Peng Kang, Yixuan Xie, Lei Yang, Jinhong Yuan

Abstract: In this letter, we propose a reliability-based windowed decoding scheme for spatially-coupled (SC) low-density parity-check (LDPC) codes. To mitigate the error propagation along the sliding windowed decoder of the SC LDPC codes, a partial message reservation (PMR) method is proposed where only the reliable messages generated in the previous decoding window are reserved for the next decoding window… ▽ More In this letter, we propose a reliability-based windowed decoding scheme for spatially-coupled (SC) low-density parity-check (LDPC) codes. To mitigate the error propagation along the sliding windowed decoder of the SC LDPC codes, a partial message reservation (PMR) method is proposed where only the reliable messages generated in the previous decoding window are reserved for the next decoding window. We also propose a partial syndrome check (PSC) stopping rule for each decoding window, in which only the complete VNs are checked. Simulation results show that our proposed scheme significantly improves the error floor performance compared to the sliding windowed decoder with the conventional weighted bit-flipping (WBF) algorithm. △ Less

Submitted 8 May, 2018; v1 submitted 8 May, 2018; originally announced May 2018.

arXiv:1802.03238 [pdf, other]

Recurrent Neural Network-Based Semantic Variational Autoencoder for Sequence-to-Sequence Learning

Authors: Myeongjun Jang, Seungwan Seo, Pilsung Kang

Abstract: Sequence-to-sequence (Seq2seq) models have played an important role in the recent success of various natural language processing methods, such as machine translation, text summarization, and speech recognition. However, current Seq2seq models have trouble preserving global latent information from a long sequence of words. Variational autoencoder (VAE) alleviates this problem by learning a continuo… ▽ More Sequence-to-sequence (Seq2seq) models have played an important role in the recent success of various natural language processing methods, such as machine translation, text summarization, and speech recognition. However, current Seq2seq models have trouble preserving global latent information from a long sequence of words. Variational autoencoder (VAE) alleviates this problem by learning a continuous semantic space of the input sentence. However, it does not solve the problem completely. In this paper, we propose a new recurrent neural network (RNN)-based Seq2seq model, RNN semantic variational autoencoder (RNN--SVAE), to better capture the global latent information of a sequence of words. To reflect the meaning of words in a sentence properly, without regard to its position within the sentence, we construct a document information vector using the attention information between the final state of the encoder and every prior hidden state. Then, the mean and standard deviation of the continuous semantic space are learned by using this vector to take advantage of the variational method. By using the document information vector to find the semantic space of the sentence, it becomes possible to better capture the global latent feature of the sentence. Experimental results of three natural language tasks (i.e., language modeling, missing word imputation, paraphrase identification) confirm that the proposed RNN--SVAE yields higher performance than two benchmark models. △ Less

Submitted 2 June, 2018; v1 submitted 9 February, 2018; originally announced February 2018.

Comments: 14 pages

arXiv:1709.09885 [pdf, other]

Sentiment Classification with Word Attention based on Weakly Supervised Learning with a Convolutional Neural Network

Authors: Gichang Lee, Jaeyun Jeong, Seungwan Seo, CzangYeob Kim, Pilsung Kang

Abstract: In order to maximize the applicability of sentiment analysis results, it is necessary to not only classify the overall sentiment (positive/negative) of a given document but also to identify the main words that contribute to the classification. However, most datasets for sentiment analysis only have the sentiment label for each document or sentence. In other words, there is no information about whi… ▽ More In order to maximize the applicability of sentiment analysis results, it is necessary to not only classify the overall sentiment (positive/negative) of a given document but also to identify the main words that contribute to the classification. However, most datasets for sentiment analysis only have the sentiment label for each document or sentence. In other words, there is no information about which words play an important role in sentiment classification. In this paper, we propose a method for identifying key words discriminating positive and negative sentences by using a weakly supervised learning method based on a convolutional neural network (CNN). In our model, each word is represented as a continuous-valued vector and each sentence is represented as a matrix whose rows correspond to the word vector used in the sentence. Then, the CNN model is trained using these sentence matrices as inputs and the sentiment labels as the output. Once the CNN model is trained, we implement the word attention mechanism that identifies high-contributing words to classification results with a class activation map, using the weights from the fully connected layer at the end of the learned CNN model. In order to verify the proposed methodology, we evaluated the classification accuracy and inclusion rate of polarity words using two movie review datasets. Experimental result show that the proposed model can not only correctly classify the sentence polarity but also successfully identify the corresponding words with high polarity scores. △ Less

Submitted 28 September, 2017; v1 submitted 28 September, 2017; originally announced September 2017.

Comments: 16 pages

arXiv:1709.00845 [pdf, other]

Semi-supervised Learning with Deep Generative Models for Asset Failure Prediction

Authors: Andre S. Yoon, Taehoon Lee, Yongsub Lim, Deokwoo Jung, Philgyun Kang, Dongwon Kim, Keuntae Park, Yongjin Choi

Abstract: This work presents a novel semi-supervised learning approach for data-driven modeling of asset failures when health status is only partially known in historical data. We combine a generative model parameterized by deep neural networks with non-linear embedding technique. It allows us to build prognostic models with the limited amount of health status information for the precise prediction of futur… ▽ More This work presents a novel semi-supervised learning approach for data-driven modeling of asset failures when health status is only partially known in historical data. We combine a generative model parameterized by deep neural networks with non-linear embedding technique. It allows us to build prognostic models with the limited amount of health status information for the precise prediction of future asset reliability. The proposed method is evaluated on a publicly available dataset for remaining useful life (RUL) estimation, which shows significant improvement even when a fraction of the data with known health status is as sparse as 1% of the total. Our study suggests that the non-linear embedding based on a deep generative model can efficiently regularize a complex model with deep architectures while achieving high prediction accuracy that is far less sensitive to the availability of health status information. △ Less

Submitted 4 September, 2017; originally announced September 2017.

Comments: 9 pages, 6 figures, 1 table, KDD17 Workshop on Machine Learning for Prognostics and Health Management.August 13-17, 2017, Halifax, Nova Scotia - Canada

ACM Class: I.2.6; H.2.8

Showing 1–42 of 42 results for author: Kang, P