OurDB: Ouroboric Domain Bridging for Multi-Target Domain Adaptive Semantic Segmentation

Authors: Seungbeom Woo, Geonwoo Baek, Taehoon Kim, Jaemin Na, Joong-won Hwang, Wonjun Hwang

Abstract: Multi-target domain adaptation (MTDA) for semantic segmentation poses a significant challenge, as it involves multiple target domains with varying distributions. The goal of MTDA is to minimize the domain discrepancies among a single source and multi-target domains, aiming to train a single model that excels across all target domains. Previous MTDA approaches typically employ multiple teacher architectures, where each teacher specializes in one target domain to simplify the task. However, these architectures hinder the student model from fully assimilating comprehensive knowledge from all target-specific teachers and escalate training costs with increasing target domains. In this paper, we propose an ouroboric domain bridging (OurDB) framework, offering an efficient solution to the MTDA problem using a single teacher architecture. This framework dynamically cycles through multiple target domains, aligning each domain individually to restrain the biased alignment problem, and utilizes Fisher information to minimize the forgetting of knowledge from previous target domains. We also propose a context-guided class-wise mixup (CGMix) that leverages contextual information tailored to diverse target contexts in MTDA. Experimental evaluations conducted on four urban driving datasets (i.e., GTA5, Cityscapes, IDD, and Mapillary) demonstrate the superiority of our method over existing state-of-the-art approaches.

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.11537 [pdf, other]

Semantic Prompting with Image-Token for Continual Learning

Authors: Jisu Han, Jaemin Na, Wonjun Hwang

Abstract: Continual learning aims to refine model parameters for new tasks while retaining knowledge from previous tasks. Recently, prompt-based learning has emerged to leverage pre-trained models to be prompted to learn subsequent tasks without the reliance on the rehearsal buffer. Although this approach has demonstrated outstanding results, existing methods depend on a preceding task-selection process to choose appropriate prompts. However, imperfect task selection may degrade performance, particularly in scenarios where the number of tasks is large or task distributions are imbalanced. To address this issue, we introduce I-Prompt, a task-agnostic approach that focuses on the visual semantic information of image tokens to eliminate task prediction. Our method consists of semantic prompt matching, which determines prompts based on similarities between tokens, and image token-level prompting, which applies prompts directly to image tokens in the intermediate layers. Consequently, our method achieves competitive performance on four benchmarks while significantly reducing training time compared to state-of-the-art methods. Moreover, we demonstrate the superiority of our method across various scenarios through extensive experiments.

Submitted 18 March, 2024; originally announced March 2024.

arXiv:2403.09359 [pdf, other]

D3T: Distinctive Dual-Domain Teacher Zigzagging Across RGB-Thermal Gap for Domain-Adaptive Object Detection

Authors: Dinh Phat Do, Taehoon Kim, Jaemin Na, Jiwon Kim, Keonho Lee, Kyunghwan Cho, Wonjun Hwang

Abstract: Domain adaptation for object detection typically entails transferring knowledge from one visible domain to another visible domain. However, there are limited studies on adapting from the visible to the thermal domain, because the domain gap between the visible and thermal domains is much larger than expected, and traditional domain adaptation cannot successfully facilitate learning in this situation. To overcome this challenge, we propose a Distinctive Dual-Domain Teacher (D3T) framework that employs distinct training paradigms for each domain. Specifically, we segregate the source and target training sets for building dual-teachers and successively deploy exponential moving average to the student model to individual teachers of each domain. The framework further incorporates a zigzag learning method between dual teachers, facilitating a gradual transition from the visible to thermal domains during training. We validate the superiority of our method through newly designed experimental protocols with well-known thermal datasets, i.e., FLIR and KAIST. Source code is available at https://github.com/EdwardDo69/D3T.

Submitted 14 March, 2024; originally announced March 2024.

Comments: Accepted by CVPR 2024. Link: https://github.com/EdwardDo69/D3T

arXiv:2403.06537 [pdf, other]

On the Consideration of AI Openness: Can Good Intent Be Abused?

Authors: Yeeun Kim, Eunkyung Choi, Hyunjun Kim, Hongseok Oh, Hyunseo Shin, Wonseok Hwang

Abstract: Openness is critical for the advancement of science. In particular, recent rapid progress in AI has been made possible only by various open-source models, datasets, and libraries. However, this openness also means that technologies can be freely used for socially harmful purposes. Can open-source models or datasets be used for malicious purposes? If so, how easy is it to adapt technology for such goals? Here, we conduct a case study in the legal domain, a realm where individual decisions can have profound social consequences. To this end, we build EVE, a dataset consisting of 200 examples of questions and corresponding answers about criminal activities based on 200 Korean precedents. We found that a widely accepted open-source LLM, which initially refuses to answer unethical questions, can be easily tuned with EVE to provide unethical and informative answers about criminal activities. This implies that although open-source technologies contribute to scientific progress, some care must be taken to mitigate possible malicious use cases. Warning: This paper contains contents that some may find unethical.

Submitted 11 March, 2024; originally announced March 2024.

Comments: 10 pages

arXiv:2402.12806 [pdf, other]

SymBa: Symbolic Backward Chaining for Multi-step Natural Language Reasoning

Authors: Jinu Lee, Wonseok Hwang

Abstract: Large Language Models (LLMs) have recently demonstrated remarkable reasoning ability as in Chain-of-thought prompting, but faithful multi-step reasoning remains a challenge. We specifically focus on backward chaining, where the query is recursively decomposed using logical rules until proven. To address the limitations of current backward chaining implementations, we propose SymBa (Symbolic Backward Chaining). In SymBa, the symbolic top-down solver controls the entire proof process and the LLM is called to generate a single reasoning step only when the solver encounters a dead end. By this novel solver-LLM integration, while being able to produce an interpretable, structured proof, SymBa achieves significant improvement in performance, proof faithfulness, and efficiency in diverse multi-step reasoning benchmarks (ProofWriter, Birds-Electricity, GSM8k, CLUTRR-TF, ECtHR Article 6) compared to backward chaining baselines.

Submitted 20 February, 2024; originally announced February 2024.

Comments: 22 pages (8 pages for main text), 9 figures

arXiv:2310.18640 [pdf, other]

Switching Temporary Teachers for Semi-Supervised Semantic Segmentation

Authors: Jaemin Na, Jung-Woo Ha, Hyung Jin Chang, Dongyoon Han, Wonjun Hwang

Abstract: The teacher-student framework, prevalent in semi-supervised semantic segmentation, mainly employs the exponential moving average (EMA) to update a single teacher's weights based on the student's. However, EMA updates raise a problem in that the weights of the teacher and student are getting coupled, causing a potential performance bottleneck. Furthermore, this problem may become more severe when training with more complicated labels such as segmentation masks but with few annotated data. This paper introduces Dual Teacher, a simple yet effective approach that employs dual temporary teachers aiming to alleviate the coupling problem for the student. The temporary teachers work in shifts and are progressively improved, and so consistently prevent the teacher and student from becoming excessively close. Specifically, the temporary teachers periodically take turns generating pseudo-labels to train a student model and maintain the distinct characteristics of the student model for each epoch. Consequently, Dual Teacher achieves competitive performance on the PASCAL VOC, Cityscapes, and ADE20K benchmarks with remarkably shorter training times than state-of-the-art methods. Moreover, we demonstrate that our approach is model-agnostic and compatible with both CNN- and Transformer-based models. Code is available at https://github.com/naver-ai/dual-teacher.

Submitted 28 October, 2023; originally announced October 2023.

Comments: NeurIPS-2023

arXiv:2310.10549 [pdf, other]

Applications of Distributed Machine Learning for the Internet-of-Things: A Comprehensive Survey

Authors: Mai Le, Thien Huynh-The, Tan Do-Duy, Thai-Hoc Vu, Won-Joo Hwang, Quoc-Viet Pham

Abstract: The emergence of new services and applications in emerging wireless networks (e.g., beyond 5G and 6G) has shown a growing demand for the usage of artificial intelligence (AI) in the Internet of Things (IoT). However, the proliferation of massive IoT connections and the availability of computing resources distributed across future IoT systems have strongly demanded the development of distributed AI for better IoT services and applications. Therefore, existing AI-enabled IoT systems can be enhanced by implementing distributed machine learning (aka distributed learning) approaches. This work aims to provide a comprehensive survey on distributed learning for IoT services and applications in emerging networks. In particular, we first provide a background of machine learning and present a preliminary to typical distributed learning approaches, such as federated learning, multi-agent reinforcement learning, and distributed inference. Then, we provide an extensive review of distributed learning for critical IoT services (e.g., data sharing and computation offloading, localization, mobile crowdsensing, and security and privacy) and IoT applications (e.g., smart healthcare, smart grid, autonomous vehicle, aerial IoT networks, and smart industry). From the reviewed literature, we also present critical challenges of distributed learning for IoT and propose several promising solutions and research directions in this emerging area.

Submitted 16 October, 2023; originally announced October 2023.

arXiv:2309.14587 [pdf, other]

Joint Communication and Computation Framework for Goal-Oriented Semantic Communication with Distortion Rate Resilience

Authors: Minh-Duong Nguyen, Quang-Vinh Do, Zhaohui Yang, Quoc-Viet Pham, Won-Joo Hwang

Abstract: Recent research efforts on semantic communication have mostly considered accuracy as a main problem for optimizing goal-oriented communication systems. However, these approaches introduce a paradox: the accuracy of artificial intelligence (AI) tasks should naturally emerge through training rather than being dictated by network constraints. Acknowledging this dilemma, this work introduces an innovative approach that leverages the rate-distortion theory to analyze distortions induced by communication and semantic compression, thereby analyzing the learning process. Specifically, we examine the distribution shift between the original data and the distorted data, thus assessing its impact on the AI model's performance. Building upon this analysis, we can preemptively estimate the empirical accuracy of AI tasks, making the goal-oriented semantic communication problem feasible. To achieve this objective, we present the theoretical foundation of our approach, accompanied by simulations and experiments that demonstrate its effectiveness. The experimental results indicate that our proposed method enables accurate AI task performance while adhering to network constraints, establishing it as a valuable contribution to the field of signal processing. Furthermore, this work advances research in goal-oriented semantic communication and highlights the significance of data-driven approaches in optimizing the performance of intelligent systems.

Submitted 25 September, 2023; originally announced September 2023.

Comments: 15 pages; 11 figures, 2 tables

MSC Class: 68T05

ACM Class: F.1.3

arXiv:2309.04146 [pdf, other]

NESTLE: a No-Code Tool for Statistical Analysis of Legal Corpus

Authors: Kyoungyeon Cho, Seungkum Han, Young Rok Choi, Wonseok Hwang

Abstract: The statistical analysis of a large-scale legal corpus can provide valuable legal insights. For such analysis one needs to (1) select a subset of the corpus using document retrieval tools, (2) structure text using information extraction (IE) systems, and (3) visualize the data for the statistical analysis. Each process demands either specialized tools or programming skills, whereas no comprehensive unified "no-code" tools have been available. Here we provide NESTLE, a no-code tool for large-scale statistical analysis of legal corpus. Powered by a Large Language Model (LLM) and the internal custom end-to-end IE system, NESTLE can extract any type of information that has not been predefined in the IE system, opening up the possibility of unlimited customizable statistical analysis of the corpus without writing a single line of code. We validate our system on 15 Korean precedent IE tasks and 3 legal text classification tasks from LexGLUE. The comprehensive experiments reveal NESTLE can achieve GPT-4 comparable performance by training the internal IE module with 4 human-labeled and 192 LLM-labeled examples.

Submitted 5 February, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

Comments: EACL 2024 System Demonstration Track

arXiv:2308.04953 [pdf, other]

Wirelessly Powered Federated Learning Networks: Joint Power Transfer, Data Sensing, Model Training, and Resource Allocation

Authors: Mai Le, Dinh Thai Hoang, Diep N. Nguyen, Won-Joo Hwang, Quoc-Viet Pham

Abstract: Federated learning (FL) has found many successes in wireless networks; however, the implementation of FL has been hindered by the energy limitation of mobile devices (MDs) and the availability of training data at MDs. How to integrate wireless power transfer and mobile crowdsensing towards sustainable FL solutions is a research topic entirely missing from the open literature. This work for the first time investigates a resource allocation problem in collaborative sensing-assisted sustainable FL (S2FL) networks with the goal of minimizing the total completion time. We investigate a practical harvesting-sensing-training-transmitting protocol in which energy-limited MDs first harvest energy from RF signals, use it to gain a reward for user participation, sense the training data from the environment, train the local models at MDs, and transmit the model updates to the server. The total completion time minimization problem of jointly optimizing power transfer, transmit power allocation, data sensing, bandwidth allocation, local model training, and data transmission is complicated due to the non-convex objective function, highly non-convex constraints, and strongly coupled variables. We propose a computationally-efficient path-following algorithm to obtain the optimal solution via the decomposition technique. In particular, inner convex approximations are developed for the resource allocation subproblem, and the subproblems are solved alternately in an iterative fashion. Simulation results are provided to evaluate the effectiveness of the proposed S2FL algorithm in reducing the completion time by up to 21.45% in comparison with other benchmark schemes. Further, we investigate an extension of our work from frequency division multiple access (FDMA) to non-orthogonal multiple access (NOMA) and show that NOMA can reduce the total completion time of the considered FL system by 8.36% on average.

Submitted 9 August, 2023; originally announced August 2023.

arXiv:2308.00558 [pdf, other]

Gradient Scaling on Deep Spiking Neural Networks with Spike-Dependent Local Information

Authors: Seongsik Park, Jeonghee Jo, Jongkil Park, Yeonjoo Jeong, Jaewook Kim, Suyoun Lee, Joon Young Kwak, Inho Kim, Jong-Keuk Park, Kyeong Seok Lee, Gye Weon Hwang, Hyun Jae Jang

Abstract: Deep spiking neural networks (SNNs) are promising neural networks for their model capacity from deep neural network architecture and energy efficiency from SNNs' operations. To train deep SNNs, recently, spatio-temporal backpropagation (STBP) with surrogate gradient was proposed. Although deep SNNs have been successfully trained with STBP, they cannot fully utilize spike information. In this work, we propose gradient scaling with local spike information, which is the relation between pre- and post-synaptic spikes. Considering the causality between spikes, we could enhance the training performance of deep SNNs. According to our experiments, we could achieve higher accuracy with fewer spikes by adopting the gradient scaling on image classification tasks, such as CIFAR10 and CIFAR100.

Submitted 1 August, 2023; originally announced August 2023.

Comments: ICML-23 Localized Learning Workshop: Decentralized Model Updates via Non-Global Objectives

arXiv:2307.04312 [pdf, other]

Robust Feature Learning Against Noisy Labels

Authors: Tsung-Ming Tai, Yun-Jie Jhang, Wen-Jyi Hwang

Abstract: Supervised learning of deep neural networks heavily relies on large-scale datasets annotated by high-quality labels. In contrast, mislabeled samples can significantly degrade the generalization of models and result in memorizing samples, further learning erroneous associations of data contents to incorrect annotations. To this end, this paper proposes an efficient approach to tackle noisy labels by learning robust feature representation based on unsupervised augmentation restoration and cluster regularization. In addition, progressive self-bootstrapping is introduced to minimize the negative impact of supervision from noisy labels. Our proposed design is generic and flexible in applying to existing classification architectures with minimal overheads. Experimental results show that our proposed method can efficiently and effectively enhance model robustness under severely noisy labels.

Submitted 9 July, 2023; originally announced July 2023.

arXiv:2306.09707 [pdf, ps, other]

Representation and decomposition of functions in DAG-DNNs and structural network pruning

Authors: Wen-Liang Hwang

Abstract: The conclusions provided by deep neural networks (DNNs) must be carefully scrutinized to determine whether they are universal or architecture dependent. The term DAG-DNN refers to a graphical representation of a DNN in which the architecture is expressed as a directed acyclic graph (DAG), on which arcs are associated with functions. The level of a node denotes the maximum number of hops between the input node and the node of interest. In the current study, we demonstrate that DAG-DNNs can be used to derive all functions defined on various sub-architectures of the DNN. We also demonstrate that the functions defined in a DAG-DNN can be derived via a sequence of lower-triangular matrices, each of which provides the transition of functions defined in sub-graphs up to nodes at a specified level. The lifting structure associated with lower-triangular matrices makes it possible to perform the structural pruning of a network in a systematic manner. The fact that decomposition is universally applicable to all DNNs means that network pruning could theoretically be applied to any DNN, regardless of the underlying architecture. We demonstrate that it is possible to obtain the winning ticket (sub-network and initialization) for a weak version of the lottery ticket hypothesis, based on the fact that the sub-network with initialization can achieve training performance on par with that of the original network using the same number of iterations or fewer.

Submitted 16 June, 2023; originally announced June 2023.

arXiv:2305.05175 [pdf, other]

SRIL: Selective Regularization for Class-Incremental Learning

Authors: Jisu Han, Jaemin Na, Wonjun Hwang

Abstract: Human intelligence gradually accepts new information and accumulates knowledge throughout the lifespan. However, deep learning models suffer from a catastrophic forgetting phenomenon, where they forget previous knowledge when acquiring new information. Class-Incremental Learning aims to create an integrated model that balances plasticity and stability to overcome this challenge. In this paper, we propose a selective regularization method that accepts new knowledge while maintaining previous knowledge. We first introduce an asymmetric feature distillation method for old and new classes inspired by cognitive science, using the gradient of classification and knowledge distillation losses to determine whether to perform pattern completion or pattern separation. We also propose a method to selectively interpolate the weight of the previous model for a balance between stability and plasticity, and we adjust whether to transfer through model confidence to ensure the performance of the previous class and enable exploratory learning. We validate the effectiveness of the proposed method, which surpasses the performance of existing methods through extensive experimental protocols using CIFAR-100, ImageNet-Subset, and ImageNet-Full.

Submitted 9 May, 2023; originally announced May 2023.

Comments: 10 pages, 7 figures

arXiv:2211.01692 [pdf, other]

Data-efficient End-to-end Information Extraction for Statistical Legal Analysis

Authors: Wonseok Hwang, Saehee Eom, Hanuhl Lee, Hai Jin Park, Minjoon Seo

Abstract: Legal practitioners often face a vast amount of documents. Lawyers, for instance, search for appropriate precedents favorable to their clients, while the number of legal precedents is ever-growing. Although legal search engines can assist in finding individual target documents and narrowing down the number of candidates, retrieved information is often presented as unstructured text and users have to examine each document thoroughly, which could lead to information overload. This also makes their statistical analysis challenging. Here, we present an end-to-end information extraction (IE) system for legal documents. By formulating IE as a generation task, our system can be easily applied to various tasks without domain-specific engineering effort. The experimental results of four IE tasks on Korean precedents show that our IE system can achieve competent scores (-2.3 on average) compared to the rule-based baseline with as few as 50 training examples per task and higher score (+5.4 on average) with 200 examples. Finally, our statistical analysis on two case categories (drunk driving and fraud) with 35k precedents reveals that the resulting structured information from our IE system faithfully reflects the macroscopic features of the Korean legal system.

Submitted 3 November, 2022; originally announced November 2022.

Comments: NLLP workshop @ EMNLP 2022

arXiv:2209.14520 [pdf, other]

Label driven Knowledge Distillation for Federated Learning with non-IID Data

Authors: Minh-Duong Nguyen, Quoc-Viet Pham, Dinh Thai Hoang, Long Tran-Thanh, Diep N. Nguyen, Won-Joo Hwang

Abstract: In real-world applications, Federated Learning (FL) meets two challenges: (1) scalability, especially when applied to massive IoT networks; and (2) how to be robust against an environment with heterogeneous data. Realizing the first problem, we aim to design a novel FL framework named Full-stack FL (F2L). More specifically, F2L utilizes a hierarchical network architecture, making extending the FL network accessible without reconstructing the whole network system. Moreover, leveraging the advantages of hierarchical network design, we propose a new label-driven knowledge distillation (LKD) technique at the global server to address the second problem. As opposed to current knowledge distillation techniques, LKD is capable of training a student model, which consists of good knowledge from all teachers' models. Therefore, our proposed algorithm can effectively extract the knowledge of the regions' data distribution (i.e., the regional aggregated models) to reduce the divergence between clients' models when operating under the FL system with non-independent identically distributed data. Extensive experiment results reveal that: (i) our F2L method can significantly improve the overall FL efficiency in all global distillations, and (ii) F2L rapidly achieves convergence as global distillation stages occur instead of increasing on each communication cycle.

Submitted 29 September, 2022; v1 submitted 28 September, 2022; originally announced September 2022.

Comments: 28 pages, 5 figures, 10 tables

MSC Class: 19A22

ACM Class: I.2.11

arXiv:2207.05381 [pdf, ps, other]

Deriving RIP sensing matrices for sparsifying dictionaries

Authors: Jinn Ho, Wen-Liang Hwang

Abstract: Compressive sensing involves the inversion of a mapping $SD \in \mathbb{R}^{m \times n}$, where $m < n$, $S$ is a sensing matrix, and $D$ is a sparsifying dictionary. The restricted isometry property is a powerful sufficient condition for the inversion that guarantees the recovery of high-dimensional sparse vectors from their low-dimensional embedding into a Euclidean space via convex optimization. However, determining whether $SD$ has the restricted isometry property for a given sparsifying dictionary is an NP-hard problem, hampering the application of compressive sensing. This paper provides a novel approach to resolving this problem. We demonstrate that it is possible to derive a sensing matrix for any sparsifying dictionary with a high probability of retaining the restricted isometry property. In numerical experiments with sensing matrices for K-SVD, Parseval K-SVD, and wavelets, our recovery performance was comparable to that of benchmarks obtained using Gaussian and Bernoulli random sensing matrices for sparse vectors.

Submitted 12 July, 2022; originally announced July 2022.

arXiv:2206.06976 [pdf, other]

Resource Allocation for Compression-aided Federated Learning with High Distortion Rate

Authors: Xuan-Tung Nguyen, Minh-Duong Nguyen, Quoc-Viet Pham, Vinh-Quang Do, Won-Joo Hwang

Abstract: Recently, a considerable amount of work has been done to tackle the communication burden in federated learning (FL) (e.g., model quantization, data sparsification, and model compression). However, the existing methods that boost the communication efficiency in FL result in a considerable trade-off between communication efficiency and global convergence rate. We formulate an optimization problem for compression-aided FL, which captures the relationship between the distortion rate, number of participating IoT devices, and convergence rate. Following that, the objective function is to minimize the total transmission time for FL convergence. Because the problem is non-convex, we propose to decompose it into sub-problems. Based on the property of an FL model, we first determine the number of IoT devices participating in the FL process. Then, the communication between IoT devices and the server is optimized by efficiently allocating wireless resources based on a coalition game. Our theoretical analysis shows that, by actively controlling the number of participating IoT devices, we can avoid the training divergence of compression-aided FL while maintaining the communication efficiency.

Submitted 2 June, 2022; originally announced June 2022.

Comments: 6 pages, 4 figures, conference

MSC Class: 60F05; 41-06; 65D99 ACM Class: F.2.2; I.2.11

arXiv:2206.05997 [pdf, ps, other]

Analysis of function approximation and stability of general DNNs in directed acyclic graphs using un-rectifying analysis

Authors: Wen-Liang Hwang, Shih-Shuo Tung

Abstract: A general lack of understanding pertaining to deep feedforward neural networks (DNNs) can be attributed partly to a lack of tools with which to analyze the composition of non-linear functions, and partly to a lack of mathematical models applicable to the diversity of DNN architectures. In this paper, we made a number of basic assumptions pertaining to activation functions, non-linear transformations, and DNN architectures in order to use the un-rectifying method to analyze DNNs via directed acyclic graphs (DAGs). DNNs that satisfy these assumptions are referred to as general DNNs. Our construction of an analytic graph was based on an axiomatic method in which DAGs are built from the bottom-up through the application of atomic operations to basic elements in accordance with regulatory rules. This approach allows us to derive the properties of general DNNs via mathematical induction. We show that, using the proposed approach, some properties that hold true for general DNNs can be derived. This analysis advances our understanding of network functions and could promote further theoretical insights if the host of analytical tools for graphs can be leveraged.

Submitted 13 June, 2022; originally announced June 2022.

Comments: 26 pages, 14 figures

arXiv:2206.05224 [pdf, other]

A Multi-Task Benchmark for Korean Legal Language Understanding and Judgement Prediction

Authors: Wonseok Hwang, Dongjun Lee, Kyoungyeon Cho, Hanuhl Lee, Minjoon Seo

Abstract: The recent advances of deep learning have dramatically changed how machine learning, especially in the domain of natural language processing, can be applied to the legal domain. However, this shift to data-driven approaches calls for larger and more diverse datasets, which are nevertheless still small in number, especially in non-English languages. Here we present the first large-scale benchmark of Korean legal AI datasets, LBOX OPEN, that consists of one legal corpus, two classification tasks, two legal judgement prediction (LJP) tasks, and one summarization task. The legal corpus consists of 147k Korean precedents (259M tokens), of which 63k are sentenced in the last 4 years and 96k are from the first and the second level courts in which factual issues are reviewed. The two classification tasks are case names (11.3k) and statutes (2.8k) prediction from the factual description of individual cases. The LJP tasks consist of (1) 10.5k criminal examples where the model is asked to predict fine amount, imprisonment with labor, and imprisonment without labor ranges for the given facts, and (2) 4.7k civil examples where the inputs are facts and claim for relief and outputs are the degrees of claim acceptance. The summarization task consists of the Supreme Court precedents and the corresponding summaries (20k). We also release realistic variants of the datasets by extending the domain (1) to infrequent case categories in case name (31k examples) and statute (17.7k) classification tasks, and (2) to long input sequences in the summarization task (51k). Finally, we release LCUBE, the first Korean legal language model trained on the legal corpus from this study. Given the uniqueness of the Law of South Korea and the diversity of the legal tasks covered in this work, we believe that LBOX OPEN contributes to the multilinguality of global legal research. LBOX OPEN and LCUBE will be publicly available.

Submitted 5 October, 2022; v1 submitted 10 June, 2022; originally announced June 2022.

Comments: Accepted at NeurIPS 2022 Datasets and Benchmarks track

arXiv:2206.01186 [pdf, other]

ORC: Network Group-based Knowledge Distillation using Online Role Change

Authors: Junyong Choi, Hyeon Cho, Seokhwa Cheung, Wonjun Hwang

Abstract: In knowledge distillation, since a single, omnipotent teacher network cannot solve all problems, multiple teacher-based knowledge distillations have been studied recently. However, sometimes their improvements are not as good as expected because some immature teachers may transfer the false knowledge to the student. In this paper, to overcome this limitation and take the efficacy of the multiple networks, we divide the multiple networks into teacher and student groups, respectively. That is, the student group is a set of immature networks that require learning the teacher's knowledge, while the teacher group consists of the selected networks that are capable of teaching successfully. We propose our online role change strategy where the top-ranked networks in the student group are able to promote to the teacher group at every iteration. After training the teacher group using the error samples of the student group to refine the teacher group's knowledge, we transfer the collaborative knowledge from the teacher group to the student group successfully. We verify the superiority of the proposed method on CIFAR-10, CIFAR-100, and ImageNet which achieves high performance. We further show the generality of our method with various backbone architectures such as ResNet, WRN, VGG, Mobilenet, and Shufflenet.

Submitted 8 August, 2023; v1 submitted 1 June, 2022; originally announced June 2022.

Comments: Accepted at ICCV 2023; Supplementary material can be found at CVF Open Access

arXiv:2205.15531 [pdf, other]

itKD: Interchange Transfer-based Knowledge Distillation for 3D Object Detection

Authors: Hyeon Cho, Junyong Choi, Geonwoo Baek, Wonjun Hwang

Abstract: Point-cloud based 3D object detectors recently have achieved remarkable progress. However, most studies are limited to the development of network architectures for improving only their accuracy without consideration of the computational efficiency. In this paper, we first propose an autoencoder-style framework comprising channel-wise compression and decompression via interchange transfer-based knowledge distillation. To learn the map-view feature of a teacher network, the features from teacher and student networks are independently passed through the shared autoencoder; here, we use a compressed representation loss that binds the channel-wise compression knowledge from both student and teacher networks as a kind of regularization. The decompressed features are transferred in opposite directions to reduce the gap in the interchange reconstructions. Lastly, we present a head attention loss to match the 3D object detection information drawn by the multi-head self-attention mechanism. Through extensive experiments, we verify that our method can train the lightweight model that is well-aligned with the 3D point cloud detection task and we demonstrate its superiority using the well-known public datasets; e.g., Waymo and nuScenes.

Submitted 27 March, 2023; v1 submitted 31 May, 2022; originally announced May 2022.

Comments: Accepted at CVPR 2023

arXiv:2205.08833 [pdf, other]

Speckle Image Restoration without Clean Data

Authors: Tsung-Ming Tai, Yun-Jie Jhang, Wen-Jyi Hwang, Chau-Jern Cheng

Abstract: Speckle noise is an inherent disturbance in coherent imaging systems such as digital holography, synthetic aperture radar, optical coherence tomography, or ultrasound systems. These systems usually produce only a single observation per view angle of the same object of interest, which makes it difficult to leverage statistics across observations. We propose a novel image restoration algorithm that can perform speckle noise removal without clean data and does not require multiple noisy observations in the same view angle. Our proposed method can also be applied to the situation without knowing the noise distribution as prior. We demonstrate that our method is especially well-suited for spectral images by first validating it on a synthetic dataset and then applying it to real-world digital holography samples. The results are superior in both quantitative measurement and visual inspection compared to several widely applied baselines. Our method even shows promising results across different speckle noise strengths, without the clean data needed.

Submitted 18 May, 2022; originally announced May 2022.

arXiv:2204.06760 [pdf, other]

HCFL: A High Compression Approach for Communication-Efficient Federated Learning in Very Large Scale IoT Networks

Authors: Minh-Duong Nguyen, Sang-Min Lee, Quoc-Viet Pham, Dinh Thai Hoang, Diep N. Nguyen, Won-Joo Hwang

Abstract: Federated learning (FL) is a new artificial intelligence concept that enables Internet-of-Things (IoT) devices to learn a collaborative model without sending the raw data to centralized nodes for processing. Despite numerous advantages, low computing resources at IoT devices and high communication costs for exchanging model parameters make applications of FL in massive IoT networks very limited. In this work, we develop a novel compression scheme for FL, called high-compression federated learning (HCFL), for very large scale IoT networks. HCFL can reduce the data load for FL processes without changing their structure and hyperparameters. In this way, we not only can significantly reduce communication costs, but also make intensive learning processes more adaptable on low-computing resource IoT devices. Furthermore, we investigate a relationship between the number of IoT devices and the convergence level of the FL model and thereby better assess the quality of the FL process. We demonstrate our HCFL scheme in both simulations and mathematical analyses. Our proposed theoretical research can be used as a minimum level of satisfaction, proving that the FL process can achieve good performance when a determined configuration is met. Therefore, we show that HCFL is applicable in any FL-integrated networks with numerous IoT devices.

Submitted 21 June, 2022; v1 submitted 14 April, 2022; originally announced April 2022.

Comments: 14 pages, 12 figures, 3 tables

MSC Class: 62D05 68T04 ACM Class: I.2; E.4

arXiv:2204.02005 [pdf, other]

doi 10.1109/JIOT.2022.3160691

Aerial Computing: A New Computing Paradigm, Applications, and Challenges

Authors: Quoc-Viet Pham, Rukhsana Ruby, Fang Fang, Dinh C. Nguyen, Zhaohui Yang, Mai Le, Zhiguo Ding, Won-Joo Hwang

Abstract: In existing computing systems, such as edge computing and cloud computing, several emerging applications and practical scenarios are mostly unavailable or only partially implemented. To overcome the limitations that restrict such applications, the development of a comprehensive computing paradigm has garnered attention in both academia and industry. However, a gap exists in the literature owing to the scarce research, and a comprehensive computing paradigm is yet to be systematically designed and reviewed. This study introduces a novel concept, called aerial computing, via the amalgamation of aerial radio access networks and edge computing, which attempts to bridge the gap. Specifically, first, we propose a novel comprehensive computing architecture that is composed of low-altitude computing, high-altitude computing, and satellite computing platforms, along with conventional computing systems. We determine that aerial computing offers several desirable attributes: global computing service, better mobility, higher scalability and availability, and simultaneity. Second, we comprehensively discuss key technologies that facilitate aerial computing, including energy refilling, edge computing, network softwarization, frequency spectrum, multi-access techniques, artificial intelligence, and big data. In addition, we discuss vertical domain applications (e.g., smart cities, smart vehicles, smart factories, and smart grids) supported by aerial computing. Finally, we highlight several challenges that need to be addressed and their possible solutions.

Submitted 5 April, 2022; originally announced April 2022.

Comments: Accepted to IEEE Internet of Things Journal

arXiv:2202.13959 [pdf, other]

Semi-Structured Query Grounding for Document-Oriented Databases with Deep Retrieval and Its Application to Receipt and POI Matching

Authors: Geewook Kim, Wonseok Hwang, Minjoon Seo, Seunghyun Park

Abstract: Semi-structured query systems for document-oriented databases have many real applications. One particular application that we are interested in is matching each financial receipt image with its corresponding place of interest (POI, e.g., restaurant) in the nationwide database. The problem is especially challenging in the real production environment where many similar or incomplete entries exist in the database and queries are noisy (e.g., errors in optical character recognition). In this work, we aim to address practical challenges when using embedding-based retrieval for the query grounding problem in semi-structured data. Leveraging recent advancements in deep language encoding for retrieval, we conduct extensive experiments to find the most effective combination of modules for the embedding and retrieval of both query and database entries without any manually engineered component. The proposed model significantly outperforms the conventional manual pattern-based model while requiring much less development and maintenance cost. We also discuss some core observations in our experiments, which could be helpful for practitioners working on a similar problem in other domains.

Submitted 23 February, 2022; originally announced February 2022.

Comments: To appear in AAAI-22 Workshop on Knowledge Discovery from Unstructured Data in Financial Services

arXiv:2202.11508 [pdf, ps, other]

AI-enabled mm-Waveform Configuration for Autonomous Vehicles with Integrated Communication and Sensing

Authors: Nam H. Chu, Diep N. Nguyen, Dinh Thai Hoang, Quoc-Viet Pham, Khoa T. Phan, Won-Joo Hwang, Eryk Dutkiewicz

Abstract: Integrated Communications and Sensing (ICS) has recently emerged as an enabling technology for ubiquitous sensing and IoT applications. For ICS application to Autonomous Vehicles (AVs), optimizing the waveform structure is one of the most challenging tasks due to strong influences between sensing and data communication functions. Specifically, the preamble of a data communication frame is typically leveraged for the sensing function. As such, the higher the number of preambles in a Coherent Processing Interval (CPI), the better the sensing task's performance. In contrast, communication efficiency is inversely proportional to the number of preambles. Moreover, surrounding radio environments are usually dynamic with high uncertainties due to their high mobility, making the ICS's waveform optimization problem even more challenging. To that end, this paper develops a novel ICS framework established on the Markov decision process and recent advanced techniques in deep reinforcement learning. By doing so, without requiring complete knowledge of the surrounding environment in advance, the ICS-AV can adaptively optimize its waveform structure (i.e., number of frames in the CPI) to maximize sensing and data communication performance under the surrounding environment's dynamics and uncertainty. Extensive simulations show that our proposed approach can improve the joint communication and sensing performance up to 46.26% compared with other baseline methods.

Submitted 31 October, 2022; v1 submitted 23 February, 2022; originally announced February 2022.

Comments: Typos, channel model updates

arXiv:2111.15664 [pdf, other]

OCR-free Document Understanding Transformer

Authors: Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park

Abstract: Understanding document images (e.g., invoices) is a core but challenging task since it requires complex functions such as reading text and a holistic understanding of the document. Current Visual Document Understanding (VDU) methods outsource the task of reading text to off-the-shelf Optical Character Recognition (OCR) engines and focus on the understanding task with the OCR outputs. Although such OCR-based approaches have shown promising performance, they suffer from 1) high computational costs for using OCR; 2) inflexibility of OCR models on languages or types of document; 3) OCR error propagation to the subsequent process. To address these issues, in this paper, we introduce a novel OCR-free VDU model named Donut, which stands for Document understanding transformer. As the first step in OCR-free VDU research, we propose a simple architecture (i.e., Transformer) with a pre-training objective (i.e., cross-entropy loss). Donut is conceptually simple yet effective. Through extensive experiments and analyses, we show a simple OCR-free VDU model, Donut, achieves state-of-the-art performances on various VDU tasks in terms of both speed and accuracy. In addition, we offer a synthetic data generator that helps the model pre-training to be flexible in various languages and domains. The code, trained model and synthetic data are available at https://github.com/clovaai/donut.

Submitted 6 October, 2022; v1 submitted 30 November, 2021; originally announced November 2021.

Comments: ECCV 2022. (v5) update table 2 and figures; add LayoutLM and update scores with the latest test script at https://github.com/clovaai/donut

arXiv:2111.14173 [pdf, other]

CDGNet: Class Distribution Guided Network for Human Parsing

Authors: Kunliang Liu, Ouk Choi, Jianming Wang, Wonjun Hwang

Abstract: The objective of human parsing is to partition a human in an image into constituent parts. This task involves labeling each pixel of the human image according to the classes. Since the human body comprises hierarchically structured parts, each body part of an image can have its sole position distribution characteristic. Probably, a human head is less likely to be under the feet, and arms are more likely to be near the torso. Inspired by this observation, we make instance class distributions by accumulating the original human parsing label in the horizontal and vertical directions, which can be utilized as supervision signals. Using these horizontal and vertical class distribution labels, the network is guided to exploit the intrinsic position distribution of each class. We combine two guided features to form a spatial guidance map, which is then superimposed onto the baseline network by multiplication and concatenation to distinguish the human parts precisely. We conducted extensive experiments to demonstrate the effectiveness and superiority of our method on three well-known benchmarks: LIP, ATR, and CIHP databases.

Submitted 16 March, 2022; v1 submitted 28 November, 2021; originally announced November 2021.

Comments: Accepted at CVPR 2022

arXiv:2111.13353 [pdf, other]

Contrastive Vicinal Space for Unsupervised Domain Adaptation

Authors: Jaemin Na, Dongyoon Han, Hyung Jin Chang, Wonjun Hwang

Abstract: Recent unsupervised domain adaptation methods have utilized vicinal space between the source and target domains. However, the equilibrium collapse of labels, a problem where the source labels are dominant over the target labels in the predictions of vicinal instances, has never been addressed. In this paper, we propose an instance-wise minimax strategy that minimizes the entropy of high uncertainty instances in the vicinal space to tackle the stated problem. We divide the vicinal space into two subspaces through the solution of the minimax problem: contrastive space and consensus space. In the contrastive space, inter-domain discrepancy is mitigated by constraining instances to have contrastive views and labels, and the consensus space reduces the confusion between intra-domain categories. The effectiveness of our method is demonstrated on public benchmarks, including Office-31, Office-Home, and VisDA-C, achieving state-of-the-art performances. We further show that our method outperforms the current state-of-the-art methods on PACS, which indicates that our instance-wise approach works well for multi-source domain adaptation as well. Code is available at https://github.com/NaJaeMin92/CoVi.

Submitted 18 July, 2022; v1 submitted 26 November, 2021; originally announced November 2021.

Comments: 10 pages, 7 figures, 5 tables

arXiv:2111.08834 [pdf, other]

Federated Learning for Smart Healthcare: A Survey

Authors: Dinh C. Nguyen, Quoc-Viet Pham, Pubudu N. Pathirana, Ming Ding, Aruna Seneviratne, Zihuai Lin, Octavia A. Dobre, Won-Joo Hwang

Abstract: Recent advances in communication technologies and Internet-of-Medical-Things have transformed smart healthcare enabled by artificial intelligence (AI). Traditionally, AI techniques require centralized data collection and processing that may be infeasible in realistic healthcare scenarios due to the high scalability of modern healthcare networks and growing data privacy concerns. Federated Learning (FL), as an emerging distributed collaborative AI paradigm, is particularly attractive for smart healthcare, by coordinating multiple clients (e.g., hospitals) to perform AI training without sharing raw data. Accordingly, we provide a comprehensive survey on the use of FL in smart healthcare. First, we present the recent advances in FL, the motivations, and the requirements of using FL in smart healthcare. The recent FL designs for smart healthcare are then discussed, ranging from resource-aware FL, secure and privacy-aware FL to incentive FL and personalized FL. Subsequently, we provide a state-of-the-art review on the emerging applications of FL in key healthcare domains, including health data management, remote health monitoring, medical imaging, and COVID-19 detection. Several recent FL-based smart healthcare projects are analyzed, and the key lessons learned from the survey are also highlighted. Finally, we discuss interesting research challenges and possible directions for future FL research in smart healthcare.

Submitted 16 November, 2021; originally announced November 2021.

Comments: Accepted at ACM Computing Surveys, 35 pages

arXiv:2110.05022 [pdf, other]

Blockchain for Edge of Things: Applications, Opportunities, and Challenges

Authors: Thippa Reddy Gadekallu, Quoc-Viet Pham, Dinh C. Nguyen, Praveen Kumar Reddy Maddikunta, N Deepa, Prabadevi B, Pubudu N. Pathirana, Jun Zhao, Won-Joo Hwang

Abstract: In recent years, blockchain networks have attracted significant attention in many research areas beyond cryptocurrency, one of them being the Edge of Things (EoT) that is enabled by the combination of edge computing and the Internet of Things (IoT). In this context, blockchain networks enabled with unique features such as decentralization, immutability, and traceability, have the potential to reshape and transform the conventional EoT systems with higher security levels. Particularly, the convergence of blockchain and EoT leads to a new paradigm, called BEoT that has been regarded as a promising enabler for future services and applications. In this paper, we present a state-of-the-art review of recent developments in BEoT technology and discover its great opportunities in many application domains. We start our survey by providing an updated introduction to blockchain and EoT along with their recent advances. Subsequently, we discuss the use of BEoT in a wide range of industrial applications, from smart transportation, smart city, smart healthcare to smart home and smart grid. Security challenges in BEoT paradigm are also discussed and analyzed, with some key services such as access authentication, data privacy preservation, attack detection, and trust management. Finally, some key research challenges and future directions are also highlighted to instigate further research in this promising area.

Submitted 11 October, 2021; originally announced October 2021.

Comments: The paper is accepted for publication in IEEE IoTJ

arXiv:2108.04539 [pdf, other]

BROS: A Pre-trained Language Model Focusing on Text and Layout for Better Key Information Extraction from Documents

Authors: Teakgyu Hong, Donghyun Kim, Mingi Ji, Wonseok Hwang, Daehyun Nam, Sungrae Park

Abstract: Key information extraction (KIE) from document images requires understanding the contextual and spatial semantics of texts in two-dimensional (2D) space. Many recent studies try to solve the task by developing pre-trained language models focusing on combining visual features from document images with texts and their layout. On the other hand, this paper tackles the problem by going back to the basics: effective combination of text and layout. Specifically, we propose a pre-trained language model, named BROS (BERT Relying On Spatiality), that encodes relative positions of texts in 2D space and learns from unlabeled documents with an area-masking strategy. With this optimized training scheme for understanding texts in 2D space, BROS shows comparable or better performance compared to previous methods on four KIE benchmarks (FUNSD, SROIE*, CORD, and SciTSR) without relying on visual features. This paper also reveals two real-world challenges in KIE tasks, (1) minimizing the error from incorrect text ordering and (2) efficient learning from fewer downstream examples, and demonstrates the superiority of BROS over previous methods. Code is available at https://github.com/clovaai/bros.

Submitted 5 April, 2022; v1 submitted 10 August, 2021; originally announced August 2021.

Comments: AAAI 2022 - Main Technical Track

arXiv:2107.14270 [pdf, other]

Secure Swarm UAV-assisted Communications with Cooperative Friendly Jamming

Authors: Hanh Dang-Ngoc, Diep N. Nguyen, Khuong Ho-Van, Dinh Thai Hoang, Eryk Dutkiewicz, Quoc-Viet Pham, Won-Joo Hwang

Abstract: This article proposes a cooperative friendly jamming framework for swarm unmanned aerial vehicle (UAV)-assisted amplify-and-forward (AF) relaying networks with wireless energy harvesting. Due to the limited energy of the UAVs, we develop a collaborative time-switching relaying protocol which allows the UAVs to collaborate to harvest wireless energy, relay information, and jam the eavesdropper. To evaluate the secrecy rate, we derive the secrecy outage probability (SOP) for two popular detection techniques at the eavesdropper, i.e., selection combining and maximum-ratio combining. Monte Carlo simulations are then used to validate the theoretical SOP derivation and to show the effectiveness of the proposed framework in terms of SOP as compared with the conventional amplify-and-forward relaying system. Using the derived SOP, one can obtain engineering insights to optimize the energy harvesting time and the number of UAVs in the swarm to achieve a given secrecy protection level. The analytical SOP derived in this work can also be helpful in future UAV secure-communications optimizations (e.g., trajectory, locations of UAVs). As an example, we present a case study to find the optimal corridor to locate the swarm so as to minimize the system SOP.

Submitted 29 July, 2021; originally announced July 2021.

Comments: 30 pages, 7 figures, journal

arXiv:2107.14040 [pdf, other]

doi 10.1109/ACCESS.2020.3009328

Artificial Intelligence (AI) and Big Data for Coronavirus (COVID-19) Pandemic: A Survey on the State-of-the-Arts

Authors: Quoc-Viet Pham, Dinh C. Nguyen, Thien Huynh-The, Won-Joo Hwang, Pubudu N Pathirana

Abstract: The very first infected novel coronavirus case (COVID-19) was found in Hubei, China in Dec. 2019. The COVID-19 pandemic has spread over 214 countries and areas in the world, and has significantly affected every aspect of our daily lives. At the time of writing this article, the numbers of infected cases and deaths still increase significantly and have no sign of a well-controlled situation, e.g., as of 13 July 2020, from a total number of around 13.1 million positive cases, 571,527 deaths were reported in the world. Motivated by recent advances and applications of artificial intelligence (AI) and big data in various areas, this paper aims at emphasizing their importance in responding to the COVID-19 outbreak and preventing the severe effects of the COVID-19 pandemic. We firstly present an overview of AI and big data, then identify the applications aimed at fighting against COVID-19, next highlight challenges and issues associated with state-of-the-art solutions, and finally come up with recommendations for the communications to effectively control the COVID-19 situation. It is expected that this paper provides researchers and communities with new insights into the ways AI and big data improve the COVID-19 situation, and drives further studies in stopping the COVID-19 outbreak.

Submitted 17 July, 2021; originally announced July 2021.

Comments: Accepted at IEEE Access Journal, 19 pages

arXiv:2107.02905 [pdf]

doi 10.1016/j.imu.2023.101387

An in silico drug repurposing pipeline to identify drugs with the potential to inhibit SARS-CoV-2 replication

Authors: Méabh MacMahon, Woochang Hwang, Soorin Yim, Eoghan MacMahon, Alexandre Abraham, Justin Barton, Mukunthan Tharmakulasingam, Paul Bilokon, Vasanthi Priyadarshini Gaddi, Namshik Han

Abstract: Drug repurposing provides an opportunity to redeploy drugs, which ideally are already approved for use in humans, for the treatment of other diseases. For example, the repurposing of dexamethasone and baricitinib has played a crucial role in saving patient lives during the ongoing SARS-CoV-2 pandemic. There remains a need to expand therapeutic approaches to prevent life-threatening complications in patients with COVID-19. Using an in silico approach based on structural similarity to drugs already in clinical trials for COVID-19, potential drugs were predicted for repurposing. For a subset of identified drugs with different targets to their corresponding COVID-19 clinical trial drug, a mechanism of action analysis was applied to establish whether they might have a role in inhibiting the replication of SARS-CoV-2. Of sixty drugs predicted in this study, two with the potential to inhibit SARS-CoV-2 replication were identified using mechanism of action analysis. Triamcinolone is a corticosteroid that is structurally similar to dexamethasone; gallopamil is a calcium channel blocker that is structurally similar to verapamil. In silico approaches indicate possible mechanisms of action for both drugs in inhibiting SARS-CoV-2 replication. The identification of these drugs as potentially useful for patients with COVID-19 who are at a higher risk of developing severe disease supports the use of in silico approaches to facilitate quick and cost-effective drug repurposing.
Such drugs could expand the number of treatments available to patients who are not protected by vaccination.

Submitted 23 November, 2022; v1 submitted 5 July, 2021; originally announced July 2021.

Comments: 23 pages, 4 figures

Journal ref: Informatics in Medicine Unlocked (2023): 101387

arXiv:2106.09261 [pdf, other]

Federated Learning Framework with Straggling Mitigation and Privacy-Awareness for AI-based Mobile Application Services

Authors: Yuris Mulya Saputra, Diep N. Nguyen, Dinh Thai Hoang, Quoc-Viet Pham, Eryk Dutkiewicz, Won-Joo Hwang

Abstract: In this work, we propose a novel framework to address straggling and privacy issues for federated learning (FL)-based mobile application services, taking into account limited computing/communications resources at mobile users (MUs)/mobile application provider (MAP), privacy cost, the rationality and incentive competition among MUs in contributing data to the MAP. Particularly, the MAP first determines a set of the best MUs for the FL process based on the MUs' provided information/features. To mitigate straggling problems with privacy-awareness, each selected MU can then encrypt part of local data and upload the encrypted data to the MAP for an encrypted training process, in addition to the local training process. For that, each selected MU can propose a contract to the MAP according to its expected trainable local data and privacy-protected encrypted data. To find the optimal contracts that can maximize utilities of the MAP and all the participating MUs while maintaining high learning quality of the whole system, we first develop a multi-principal one-agent contract-based problem leveraging FL-based multiple utility functions. These utility functions account for the MUs' privacy cost, the MAP's limited computing resources, and asymmetric information between the MAP and MUs. Then, we transform the problem into an equivalent low-complexity problem and develop a light-weight iterative algorithm to effectively find the optimal solutions.
Experiments with a real-world dataset show that our framework can speed up training time up to 49% and improve prediction accuracy up to 4.6 times while enhancing the network's social welfare, i.e., total utility of all participating entities, up to 114% under the privacy cost consideration compared with those of baseline methods.

Submitted 3 November, 2021; v1 submitted 17 June, 2021; originally announced June 2021.

Comments: 18 pages (submitted to an IEEE journal)

arXiv:2104.08041 [pdf, other]

Cost-effective End-to-end Information Extraction for Semi-structured Document Images

Authors: Wonseok Hwang, Hyunji Lee, Jinyeong Yim, Geewook Kim, Minjoon Seo

Abstract: A real-world information extraction (IE) system for semi-structured document images often involves a long pipeline of multiple modules, whose complexity dramatically increases its development and maintenance cost. One can instead consider an end-to-end model that directly maps the input to the target output and simplifies the entire process. However, such a generation approach is known to lead to unstable performance if not designed carefully. Here we present our recent effort on transitioning from our existing pipeline-based IE system to an end-to-end system, focusing on practical challenges that are associated with replacing and deploying the system in real, large-scale production. By carefully formulating document IE as a sequence generation task, we show that a single end-to-end IE system can be built and still achieve competent performance.

Submitted 30 August, 2021; v1 submitted 16 April, 2021; originally announced April 2021.

Comments: Accepted at EMNLP 2021

arXiv:2103.11073 [pdf, other]

doi 10.1109/TVT.2021.3065084

UAV Communications for Sustainable Federated Learning

Authors: Quoc-Viet Pham, Ming Zeng, Rukhsana Ruby, Thien Huynh-The, Won-Joo Hwang

Abstract: Federated learning (FL), invented by Google in 2016, has become a hot research trend. However, enabling FL in wireless networks has to overcome the limited battery challenge of mobile users. In this regard, we propose to apply unmanned aerial vehicle (UAV)-empowered wireless power transfer to enable sustainable FL-based wireless networks. The objective is to maximize the UAV transmit power efficiency, via a joint optimization of transmission time and bandwidth allocation, power control, and the UAV placement. Directly solving the formulated problem is challenging, due to the coupling of variables. Hence, we leverage the decomposition technique and a successive convex approximation approach to develop an efficient algorithm, namely UAV for sustainable FL (UAV-SFL). Finally, simulations illustrate the potential of our proposed UAV-SFL approach in providing a sustainable solution for FL-based wireless networks, and in reducing the UAV transmit power by 32.95%, 63.18%, and 78.81% compared with the benchmarks.

Submitted 19 March, 2021; originally announced March 2021.

Comments: Accepted by IEEE Vehicular Technology correspondence 2021

arXiv:2102.10185 [pdf, other]

Cornus: Atomic Commit for a Cloud DBMS with Storage Disaggregation (Extended Version)

Authors: Zhihan Guo, Xinyu Zeng, Kan Wu, Wuh-Chwen Hwang, Ziwei Ren, Xiangyao Yu, Mahesh Balakrishnan, Philip A. Bernstein

Abstract: Two-phase commit (2PC) is widely used in distributed databases to ensure the atomicity of distributed transactions. However, 2PC has two limitations. First, it requires two eager log writes on the critical path, which incurs significant latency. Second, when a coordinator fails, a participant may be blocked waiting for the coordinator's decision, leading to indefinitely long latency and low throughput. 2PC was originally designed for a shared-nothing architecture. We observe that the two problems above can be addressed in an emerging storage disaggregation architecture which provides compare-and-swap capability in the storage layer. We propose Cornus, an optimized 2PC protocol for Cloud DBMS with Storage Disaggregation. We present Cornus in detail with proofs and show how it addresses the two limitations in 2PC. We also deploy it on real storage services including Azure Blob Storage and Redis. Empirical evaluations show that Cornus can achieve up to 1.9x speedup in latency over 2PC.

Submitted 12 October, 2022; v1 submitted 19 February, 2021; originally announced February 2021.

arXiv:2102.07572 [pdf, other]

Transfer Learning for Future Wireless Networks: A Comprehensive Survey

Authors: Cong T. Nguyen, Nguyen Van Huynh, Nam H. Chu, Yuris Mulya Saputra, Dinh Thai Hoang, Diep N. Nguyen, Quoc-Viet Pham, Dusit Niyato, Eryk Dutkiewicz, Won-Joo Hwang

Abstract: With outstanding features, Machine Learning (ML) has been the backbone of numerous applications in wireless networks. However, the conventional ML approaches have been facing many challenges in practical implementation, such as the lack of labeled data, the constantly changing wireless environments, the long training process, and the limited capacity of wireless devices. These challenges, if not addressed, will impede the effectiveness and applicability of ML in future wireless networks. To address these problems, Transfer Learning (TL) has recently emerged to be a very promising solution. The core idea of TL is to leverage and synthesize distilled knowledge from similar tasks as well as from valuable experiences accumulated from the past to facilitate the learning of new problems. Doing so, TL techniques can reduce the dependence on labeled data, improve the learning speed, and enhance the ML methods' robustness to different wireless environments. This article aims to provide a comprehensive survey on applications of TL in wireless networks. Particularly, we first provide an overview of TL including formal definitions, classification, and various types of TL techniques. We then discuss diverse TL approaches proposed to address emerging issues in wireless networks. The issues include spectrum management, localization, signal recognition, security, human activity recognition and caching, which are all important to next-generation networks such as 5G and beyond.
Finally, we highlight important challenges, open issues, and future research directions of TL in future wireless networks.

Submitted 8 August, 2021; v1 submitted 15 February, 2021; originally announced February 2021.

arXiv:2101.08013 [pdf, other]

Deep Learning for Intelligent Demand Response and Smart Grids: A Comprehensive Survey

Authors: Prabadevi B, Quoc-Viet Pham, Madhusanka Liyanage, N Deepa, Mounik VVSS, Shivani Reddy, Praveen Kumar Reddy Maddikunta, Neelu Khare, Thippa Reddy Gadekallu, Won-Joo Hwang

Abstract: Electricity is one of the mandatory commodities for mankind today. To address challenges and issues in the transmission of electricity through the traditional grid, the concepts of smart grids and demand response have been developed. In such systems, a large amount of data is generated daily from various sources such as power generation (e.g., wind turbines), transmission and distribution (microgrids and fault detectors), and load management (smart meters and smart electric appliances). Thanks to recent advancements in big data and computing technologies, Deep Learning (DL) can be leveraged to learn the patterns from the generated data and predict the demand for electricity and peak hours. Motivated by the advantages of deep learning in smart grids, this paper sets out to provide a comprehensive survey on the application of DL for intelligent smart grids and demand response. Firstly, we present the fundamentals of DL, smart grids, demand response, and the motivation behind the use of DL. Secondly, we review the state-of-the-art applications of DL in smart grids and demand response, including electric load forecasting, state estimation, energy theft detection, energy sharing and trading. Furthermore, we illustrate the practicality of DL via various use cases and projects. Finally, we highlight the challenges presented in existing research works and highlight important issues and potential directions in the use of DL for smart grids and demand response.

Submitted 20 January, 2021; originally announced January 2021.

Comments: This work has been submitted for possible publication. Any comments and suggestions are appreciated

arXiv:2101.06940 [pdf, ps, other]

Learning DNN networks using un-rectifying ReLU with compressed sensing application

Authors: Wen-Liang Hwang, Shih-Shuo Tung

Abstract: The un-rectifying technique expresses a non-linear point-wise activation function as a data-dependent variable, which means that the activation variable along with its input and output can all be employed in optimization. In this study, the ReLU network was un-rectified, meaning that the activation functions could be replaced with data-dependent activation variables in the form of equations and constraints. The discrete nature of activation variables associated with un-rectifying ReLUs allows the reformulation of deep learning problems as problems of combinatorial optimization. However, we demonstrate that the optimal solution to a combinatorial optimization problem can be preserved by relaxing the discrete domains of activation variables to closed intervals. This makes it easier to learn a network using methods developed for real-domain constrained optimization. We also demonstrate that by introducing data-dependent slack variables as constraints, it is possible to optimize a network based on the augmented Lagrangian approach. This means that our method could theoretically achieve global convergence and all limit points are critical points of the learning problem. In experiments, our novel approach to solving the compressed sensing recovery problem achieved state-of-the-art performance when applied to the MNIST database and natural images.

Submitted 18 January, 2021; originally announced January 2021.

Comments: 35 pages, 6 figures

arXiv:2101.03498 [pdf, other]

doi 10.1109/JIOT.2020.2988930

Sum-Rate Maximization for UAV-assisted Visible Light Communications using NOMA: Swarm Intelligence meets Machine Learning

Authors: Quoc-Viet Pham, Thien Huynh-The, Mamoun Alazab, Jun Zhao, Won-Joo Hwang

Abstract: As the integration of unmanned aerial vehicles (UAVs) into visible light communications (VLC) can offer many benefits for massive-connectivity applications and services in 5G and beyond, this work considers a UAV-assisted VLC using non-orthogonal multiple-access. More specifically, we formulate a joint problem of power allocation and UAV's placement to maximize the sum rate of all users, subject to constraints on power allocation, quality of service of users, and UAV's position. Since the problem is non-convex and NP-hard in general, it is difficult to solve optimally. Moreover, the problem is not easy to solve by conventional approaches, e.g., coordinate descent algorithms, due to channel modeling in VLC. Therefore, we propose using the Harris hawks optimization (HHO) algorithm to solve the formulated problem and obtain an efficient solution. We then use the HHO algorithm together with artificial neural networks to propose a design which can be used in real-time applications and avoid falling into the "local minima" trap in conventional trainers. Numerical results are provided to verify the effectiveness of the proposed algorithm and further demonstrate that the proposed algorithm/HHO trainer is superior to several alternative schemes and existing metaheuristic algorithms.

Submitted 10 January, 2021; originally announced January 2021.

Comments: Published in IEEE Internet of Things Journal (IoTJ) 2020

arXiv:2011.13509 [pdf, other]

Tractable loss function and color image generation of multinary restricted Boltzmann machine

Authors: Juno Hwang, Wonseok Hwang, Junghyo Jo

Abstract: The restricted Boltzmann machine (RBM) is a representative generative model based on the concept of statistical mechanics. In spite of the strong merit of interpretability, unavailability of backpropagation makes it less competitive than other generative models. Here we derive differentiable loss functions for both binary and multinary RBMs. Then we demonstrate their learnability and performance by generating colored face images.

Submitted 26 November, 2020; originally announced November 2020.

Comments: NeurIPS 2020 DiffCVGP workshop paper

arXiv:2011.09230 [pdf, other]

FixBi: Bridging Domain Spaces for Unsupervised Domain Adaptation

Authors: Jaemin Na, Heechul Jung, Hyung Jin Chang, Wonjun Hwang

Abstract: Unsupervised domain adaptation (UDA) methods for learning domain invariant representations have achieved remarkable progress. However, most of the studies were based on direct adaptation from the source domain to the target domain and have suffered from large domain discrepancies. In this paper, we propose a UDA method that effectively handles such large domain discrepancies. We introduce a fixed ratio-based mixup to augment multiple intermediate domains between the source and target domain. From the augmented domains, we train the source-dominant model and the target-dominant model that have complementary characteristics. Using our confidence-based learning methodologies, e.g., bidirectional matching with high-confidence predictions and self-penalization using low-confidence predictions, the models can learn from each other or from their own results. Through our proposed methods, the models gradually transfer domain knowledge from the source to the target domain. Extensive experiments demonstrate the superiority of our proposed method on three public benchmarks: Office-31, Office-Home, and VisDA-2017.

Submitted 25 March, 2021; v1 submitted 18 November, 2020; originally announced November 2020.

Comments: Accepted to CVPR 2021

arXiv:2009.08825 [pdf, other]

Densely Guided Knowledge Distillation using Multiple Teacher Assistants

Authors: Wonchul Son, Jaemin Na, Junyong Choi, Wonjun Hwang

Abstract: With the success of deep neural networks, knowledge distillation which guides the learning of a small student network from a large teacher network is being actively studied for model compression and transfer learning. However, few studies have been performed to resolve the poor learning issue of the student network when the student and teacher model sizes significantly differ. In this paper, we propose a densely guided knowledge distillation using multiple teacher assistants that gradually decreases the model size to efficiently bridge the large gap between the teacher and student networks. To stimulate more efficient learning of the student network, we guide each teacher assistant to every other smaller teacher assistant iteratively. Specifically, when teaching a smaller teacher assistant at the next step, the existing larger teacher assistants from the previous step are used as well as the teacher network. Moreover, we design stochastic teaching where, for each mini-batch, a teacher or teacher assistants are randomly dropped. This acts as a regularizer to improve the efficiency of teaching of the student network. Thus, the student can always learn salient distilled knowledge from the multiple sources. We verified the effectiveness of the proposed method for a classification task using CIFAR-10, CIFAR-100, and ImageNet. We also achieved significant performance improvements with various backbone architectures such as ResNet, WideResNet, and VGG.

Submitted 9 August, 2021; v1 submitted 18 September, 2020; originally announced September 2020.

Comments: Accepted at ICCV 2021

arXiv:2008.08264 [pdf, other]

Intelligent Radio Signal Processing: A Survey

Authors: Quoc-Viet Pham, Nhan Thanh Nguyen, Thien Huynh-The, Long Bao Le, Kyungchun Lee, Won-Joo Hwang

Abstract: Intelligent signal processing for wireless communications is a vital task in modern wireless systems, but it faces new challenges because of network heterogeneity, diverse service requirements, a massive number of connections, and various radio characteristics. Owing to recent advancements in big data and computing technologies, artificial intelligence (AI) has become a useful tool for radio signal processing and has enabled the realization of intelligent radio signal processing. This survey covers four intelligent signal processing topics for the wireless physical layer, including modulation classification, signal detection, beamforming, and channel estimation. In particular, each theme is presented in a dedicated section, starting with the most fundamental principles, followed by a review of up-to-date studies and a summary. To provide the necessary background, we first present a brief overview of AI techniques such as machine learning, deep learning, and federated learning. Finally, we highlight a number of research challenges and future directions in the area of intelligent radio signal processing. We expect this survey to be a good source of information for anyone interested in intelligent radio signal processing, and the perspectives we provide therein will stimulate many more novel ideas and contributions in the future.

Submitted 3 June, 2021; v1 submitted 19 August, 2020; originally announced August 2020.

Comments: Accepted for publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2007.15430 [pdf, other]

Clustering and Power Allocation for UAV-assisted NOMA-VLC Systems: A Swarm Intelligence Approach

Authors: Quoc-Viet Pham, Nhu-Ngoc Dao, Thien Huynh-The, Jun Zhao, Won-Joo Hwang

Abstract: Integrating unmanned aerial vehicles (UAVs) into non-orthogonal multiple access (NOMA) visible light communications (VLC) offers many potential advantages over VLC and NOMA-VLC systems. In this circumstance, user grouping is of importance to reduce the NOMA decoding complexity when the number of users is large; however, this issue has not been considered in existing studies. In this paper, we aim to maximize the weighted sum-rate of all the users by jointly optimizing UAV placement, user grouping, and power allocation in downlink NOMA-VLC systems. We first consider an efficient user clustering strategy, then apply a swarm intelligence approach, namely Harris Hawk Optimization (HHO), to solve the joint UAV placement and power allocation problem. Simulation results show that the proposed algorithm outperforms four alternatives: OMA, NOMA without pairing, NOMA-VLC with fixed UAV placement, and random user clustering.

Submitted 12 July, 2020; originally announced July 2020.

arXiv:2007.15221 [pdf, other]

Swarm Intelligence for Next-Generation Wireless Networks: Recent Advances and Applications

Authors: Quoc-Viet Pham, Dinh C. Nguyen, Seyedali Mirjalili, Dinh Thai Hoang, Diep N. Nguyen, Pubudu N. Pathirana, Won-Joo Hwang

Abstract: Due to the proliferation of smart devices and emerging applications, many next-generation technologies have received attention for the development of wireless networks. Even though commercial 5G has only just been widely deployed in some countries, there have already been initial efforts from the academic and industrial communities toward 6G systems. In such a network, a very large number of devices and applications are emerging, along with heterogeneity of technologies, architectures, mobile data, etc., and optimizing such a network is of utmost importance. Besides convex optimization and game theory, swarm intelligence (SI) has recently appeared as a promising optimization tool for wireless networks. As a new subdivision of artificial intelligence, SI is inspired by the collective behaviors of societies of biological species. In SI, simple agents with limited capabilities can achieve intelligent strategies for high-dimensional and challenging problems, so it has recently found many applications in next-generation wireless networks (NGN). However, researchers may not be completely aware of the full potential of SI techniques. In this work, our primary focus will be the integration of these two domains: NGN and SI. Firstly, we provide an overview of SI techniques from fundamental concepts to well-known optimizers. Secondly, we review the applications of SI to address emerging issues in NGN, including spectrum management and resource allocation, wireless caching and edge computing, network security, and several other miscellaneous issues. Finally, we highlight open challenges and issues in the literature, and introduce some interesting directions for future research.

Submitted 30 July, 2020; originally announced July 2020.

Comments: Submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Showing 1–50 of 68 results for author: Hwang, W