subscribe to arXiv mailings

Multibeam Satellite Communications with Massive MIMO: Asymptotic Performance Analysis and Design Insights

Authors: Seyong Kim, Jinseok Choi, Wonjae Shin, Namyoon Lee, Jeonghun Park

Abstract: To achieve high performance without substantial overheads associated with channel state information (CSI) of ground users, we consider a fixed-beam precoding approach, where a satellite forms multiple fixed-beams without relying on CSI, then select a suitable user set for each beam. Upon this precoding method, we put forth a satellite equipped with massive multiple-input multiple-output (MIMO), by… ▽ More To achieve high performance without substantial overheads associated with channel state information (CSI) of ground users, we consider a fixed-beam precoding approach, where a satellite forms multiple fixed-beams without relying on CSI, then select a suitable user set for each beam. Upon this precoding method, we put forth a satellite equipped with massive multiple-input multiple-output (MIMO), by which inter-beam interference is efficiently mitigated by narrowing corresponding beam width. By modeling the ground users' locations via a Poisson point process, we rigorously analyze the achievable performance of the presented multibeam satellite system. In particular, we investigate the asymptotic scaling laws that reveal the interplay between the user density, the number of beams, and the number of antennas. Our analysis offers critical design insights for the multibeam satellite with massive MIMO: i) If the user density scales in power with the number of antennas, the considered precoding can achieve a linear fraction of the optimal rate in the asymptotic regime. ii) A certain additional scaling factor for the user density is needed as the number of beams increases to maintain the asymptotic optimality. △ Less

Submitted 15 July, 2024; originally announced July 2024.

arXiv:2407.08923 [pdf, other]

A Bistatic ISAC Framework for LEO Satellite Systems: A Rate-Splitting Approach

Authors: Juha Park, Jaehyup Seong, Jaehak Ryu, Yijie Mao, Wonjae Shin

Abstract: Aiming to achieve ubiquitous global connectivity and target detection on the same platform with improved spectral/energy efficiency and reduced onboard hardware cost, low Earth orbit (LEO) satellite systems capable of simultaneously performing communications and radar have attracted significant attention. Designing such a joint system should address not only the challenges of integrating two funct… ▽ More Aiming to achieve ubiquitous global connectivity and target detection on the same platform with improved spectral/energy efficiency and reduced onboard hardware cost, low Earth orbit (LEO) satellite systems capable of simultaneously performing communications and radar have attracted significant attention. Designing such a joint system should address not only the challenges of integrating two functions but also the unique propagation characteristics of the satellites. To overcome severe echo signal path loss due to the high altitude of the satellite, we put forth a bistatic integrated sensing and communication (ISAC) framework with a radar receiver separated from the satellite. For robust and effective interference management, we employ rate-splitting multiple access (RSMA), which splits and encodes users messages into private and common streams. We optimize the dual-functional precoders to maximize the minimum rate among all users while satisfying the Cramer-Rao bound (CRB) constraints. Given the challenge of acquiring instantaneous channel state information (iCSI) for LEO satellites, we exploit the geometrical and statistical characteristics of the satellite channel. To develop an efficient optimization algorithm, semidefinite relaxation (SDR), sequential rank-1 constraint relaxation (SROCR), and successive convex approximation (SCA) are utilized. Numerical results show that the proposed framework efficiently performs both communication and radar, demonstrating superior interference control capabilities. Furthermore, it is validated that the common stream plays three vital roles: i) beamforming towards the radar target, ii) interference management between communications and radar, and iii) interference management among communication users. △ Less

Submitted 11 July, 2024; originally announced July 2024.

Comments: 33 pages, 8 figures, 2 tables

arXiv:2406.19135 [pdf, other]

DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability

Authors: Hyun Joon Park, Jin Sob Kim, Wooseok Shin, Sung Won Han

Abstract: Expressive Text-to-Speech (TTS) using reference speech has been studied extensively to synthesize natural speech, but there are limitations to obtaining well-represented styles and improving model generalization ability. In this study, we present Diffusion-based EXpressive TTS (DEX-TTS), an acoustic model designed for reference-based speech synthesis with enhanced style representations. Based on a… ▽ More Expressive Text-to-Speech (TTS) using reference speech has been studied extensively to synthesize natural speech, but there are limitations to obtaining well-represented styles and improving model generalization ability. In this study, we present Diffusion-based EXpressive TTS (DEX-TTS), an acoustic model designed for reference-based speech synthesis with enhanced style representations. Based on a general diffusion TTS framework, DEX-TTS includes encoders and adapters to handle styles extracted from reference speech. Key innovations contain the differentiation of styles into time-invariant and time-variant categories for effective style extraction, as well as the design of encoders and adapters with high generalization ability. In addition, we introduce overlapping patchify and convolution-frequency patch embedding strategies to improve DiT-based diffusion networks for TTS. DEX-TTS yields outstanding performance in terms of objective and subjective evaluation in English multi-speaker and emotional multi-speaker datasets, without relying on pre-training strategies. Lastly, the comparison results for the general TTS on a single-speaker dataset verify the effectiveness of our enhanced diffusion backbone. Demos are available here. △ Less

Submitted 27 June, 2024; originally announced June 2024.

Comments: Preprint

arXiv:2406.11504 [pdf, other]

On the Feasibility of Fidelity$^-$ for Graph Pruning

Authors: Yong-Min Shin, Won-Yong Shin

Abstract: As one of popular quantitative metrics to assess the quality of explanation of graph neural networks (GNNs), fidelity measures the output difference after removing unimportant parts of the input graph. Fidelity has been widely used due to its straightforward interpretation that the underlying model should produce similar predictions when features deemed unimportant from the explanation are removed… ▽ More As one of popular quantitative metrics to assess the quality of explanation of graph neural networks (GNNs), fidelity measures the output difference after removing unimportant parts of the input graph. Fidelity has been widely used due to its straightforward interpretation that the underlying model should produce similar predictions when features deemed unimportant from the explanation are removed. This raises a natural question: "Does fidelity induce a global (soft) mask for graph pruning?" To solve this, we aim to explore the potential of the fidelity measure to be used for graph pruning, eventually enhancing the GNN models for better efficiency. To this end, we propose Fidelity$^-$-inspired Pruning (FiP), an effective framework to construct global edge masks from local explanations. Our empirical observations using 7 edge attribution methods demonstrate that, surprisingly, general eXplainable AI methods outperform methods tailored to GNNs in terms of graph pruning performance. △ Less

Submitted 17 June, 2024; originally announced June 2024.

Comments: 6 pages, 3 figures, 2 tables; IJCAI Workshop on Explainable AI (XAI 2024) (to appear) (Please cite our workshop version.)

arXiv:2406.05602 [pdf, other]

Can Prompt Modifiers Control Bias? A Comparative Analysis of Text-to-Image Generative Models

Authors: Philip Wootaek Shin, Jihyun Janice Ahn, Wenpeng Yin, Jack Sampson, Vijaykrishnan Narayanan

Abstract: It has been shown that many generative models inherit and amplify societal biases. To date, there is no uniform/systematic agreed standard to control/adjust for these biases. This study examines the presence and manipulation of societal biases in leading text-to-image models: Stable Diffusion, DALL-E 3, and Adobe Firefly. Through a comprehensive analysis combining base prompts with modifiers and t… ▽ More It has been shown that many generative models inherit and amplify societal biases. To date, there is no uniform/systematic agreed standard to control/adjust for these biases. This study examines the presence and manipulation of societal biases in leading text-to-image models: Stable Diffusion, DALL-E 3, and Adobe Firefly. Through a comprehensive analysis combining base prompts with modifiers and their sequencing, we uncover the nuanced ways these AI technologies encode biases across gender, race, geography, and region/culture. Our findings reveal the challenges and potential of prompt engineering in controlling biases, highlighting the critical need for ethical AI development promoting diversity and inclusivity. This work advances AI ethics by not only revealing the nuanced dynamics of bias in text-to-image generation models but also by offering a novel framework for future research in controlling bias. Our contributions-panning comparative analyses, the strategic use of prompt modifiers, the exploration of prompt sequencing effects, and the introduction of a bias sensitivity taxonomy-lay the groundwork for the development of common metrics and standard analyses for evaluating whether and how future AI models exhibit and respond to requests to adjust for inherent biases. △ Less

Submitted 8 June, 2024; originally announced June 2024.

arXiv:2406.04612 [pdf, other]

Revisiting Attention Weights as Interpretations of Message-Passing Neural Networks

Authors: Yong-Min Shin, Siqing Li, Xin Cao, Won-Yong Shin

Abstract: The self-attention mechanism has been adopted in several widely-used message-passing neural networks (MPNNs) (e.g., GATs), which adaptively controls the amount of information that flows along the edges of the underlying graph. This usage of attention has made such models a baseline for studies on explainable AI (XAI) since interpretations via attention have been popularized in various domains (e.g… ▽ More The self-attention mechanism has been adopted in several widely-used message-passing neural networks (MPNNs) (e.g., GATs), which adaptively controls the amount of information that flows along the edges of the underlying graph. This usage of attention has made such models a baseline for studies on explainable AI (XAI) since interpretations via attention have been popularized in various domains (e.g., natural language processing and computer vision). However, existing studies often use naive calculations to derive attribution scores from attention, and do not take the precise and careful calculation of edge attribution into consideration. In our study, we aim to fill the gap between the widespread usage of attention-enabled MPNNs and their potential in largely under-explored explainability, a topic that has been actively investigated in other areas. To this end, as the first attempt, we formalize the problem of edge attribution from attention weights in GNNs. Then, we propose GATT, an edge attribution calculation method built upon the computation tree. Through comprehensive experiments, we demonstrate the effectiveness of our proposed method when evaluating attributions from GATs. Conversely, we empirically validate that simply averaging attention weights over graph attention layers is insufficient to interpret the GAT model's behavior. Code is publicly available at https://github.com/jordan7186/GAtt/tree/main. △ Less

Submitted 6 June, 2024; originally announced June 2024.

Comments: 11 pages, 3 figures, 5 tables

arXiv:2405.20610 [pdf, other]

Revisiting and Maximizing Temporal Knowledge in Semi-supervised Semantic Segmentation

Authors: Wooseok Shin, Hyun Joon Park, Jin Sob Kim, Sung Won Han

Abstract: In semi-supervised semantic segmentation, the Mean Teacher- and co-training-based approaches are employed to mitigate confirmation bias and coupling problems. However, despite their high performance, these approaches frequently involve complex training pipelines and a substantial computational burden, limiting the scalability and compatibility of these methods. In this paper, we propose a PrevMatc… ▽ More In semi-supervised semantic segmentation, the Mean Teacher- and co-training-based approaches are employed to mitigate confirmation bias and coupling problems. However, despite their high performance, these approaches frequently involve complex training pipelines and a substantial computational burden, limiting the scalability and compatibility of these methods. In this paper, we propose a PrevMatch framework that effectively mitigates the aforementioned limitations by maximizing the utilization of the temporal knowledge obtained during the training process. The PrevMatch framework relies on two core strategies: (1) we reconsider the use of temporal knowledge and thus directly utilize previous models obtained during training to generate additional pseudo-label guidance, referred to as previous guidance. (2) we design a highly randomized ensemble strategy to maximize the effectiveness of the previous guidance. Experimental results on four benchmark semantic segmentation datasets confirm that the proposed method consistently outperforms existing methods across various evaluation protocols. In particular, with DeepLabV3+ and ResNet-101 network settings, PrevMatch outperforms the existing state-of-the-art method, Diverse Co-training, by +1.6 mIoU on Pascal VOC with only 92 annotated images, while achieving 2.4 times faster training. Furthermore, the results indicate that PrevMatch induces stable optimization, particularly in benefiting classes that exhibit poor performance. Code is available at https://github.com/wooseok-shin/PrevMatch △ Less

Submitted 30 May, 2024; originally announced May 2024.

Comments: 14 pages, 5 figures, submitted to IEEE TPAMI. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

arXiv:2404.14243 [pdf, other]

Turbo-CF: Matrix Decomposition-Free Graph Filtering for Fast Recommendation

Authors: Jin-Duk Park, Yong-Min Shin, Won-Yong Shin

Abstract: A series of graph filtering (GF)-based collaborative filtering (CF) showcases state-of-the-art performance on the recommendation accuracy by using a low-pass filter (LPF) without a training process. However, conventional GF-based CF approaches mostly perform matrix decomposition on the item-item similarity graph to realize the ideal LPF, which results in a non-trivial computational cost and thus m… ▽ More A series of graph filtering (GF)-based collaborative filtering (CF) showcases state-of-the-art performance on the recommendation accuracy by using a low-pass filter (LPF) without a training process. However, conventional GF-based CF approaches mostly perform matrix decomposition on the item-item similarity graph to realize the ideal LPF, which results in a non-trivial computational cost and thus makes them less practical in scenarios where rapid recommendations are essential. In this paper, we propose Turbo-CF, a GF-based CF method that is both training-free and matrix decomposition-free. Turbo-CF employs a polynomial graph filter to circumvent the issue of expensive matrix decompositions, enabling us to make full use of modern computer hardware components (i.e., GPU). Specifically, Turbo-CF first constructs an item-item similarity graph whose edge weights are effectively regulated. Then, our own polynomial LPFs are designed to retain only low-frequency signals without explicit matrix decompositions. We demonstrate that Turbo-CF is extremely fast yet accurate, achieving a runtime of less than 1 second on real-world benchmark datasets while achieving recommendation accuracies comparable to best competitors. △ Less

Submitted 22 April, 2024; originally announced April 2024.

Comments: 5 pages, 4 figures, 4 tables; 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2024) (to appear) (Please cite our conference version.)

arXiv:2404.14240 [pdf, other]

Collaborative Filtering Based on Diffusion Models: Unveiling the Potential of High-Order Connectivity

Authors: Yu Hou, Jin-Duk Park, Won-Yong Shin

Abstract: A recent study has shown that diffusion models are well-suited for modeling the generative process of user-item interactions in recommender systems due to their denoising nature. However, existing diffusion model-based recommender systems do not explicitly leverage high-order connectivities that contain crucial collaborative signals for accurate recommendations. Addressing this gap, we propose CF-… ▽ More A recent study has shown that diffusion models are well-suited for modeling the generative process of user-item interactions in recommender systems due to their denoising nature. However, existing diffusion model-based recommender systems do not explicitly leverage high-order connectivities that contain crucial collaborative signals for accurate recommendations. Addressing this gap, we propose CF-Diff, a new diffusion model-based collaborative filtering (CF) method, which is capable of making full use of collaborative signals along with multi-hop neighbors. Specifically, the forward-diffusion process adds random noise to user-item interactions, while the reverse-denoising process accommodates our own learning model, named cross-attention-guided multi-hop autoencoder (CAM-AE), to gradually recover the original user-item interactions. CAM-AE consists of two core modules: 1) the attention-aided AE module, responsible for precisely learning latent representations of user-item interactions while preserving the model's complexity at manageable levels, and 2) the multi-hop cross-attention module, which judiciously harnesses high-order connectivity information to capture enhanced collaborative signals. Through comprehensive experiments on three real-world datasets, we demonstrate that CF-Diff is (a) Superior: outperforming benchmark recommendation methods, achieving remarkable gains up to 7.29% compared to the best competitor, (b) Theoretically-validated: reducing computations while ensuring that the embeddings generated by our model closely approximate those from the original cross-attention, and (c) Scalable: proving the computational efficiency that scales linearly with the number of users or items. △ Less

Submitted 22 April, 2024; originally announced April 2024.

Comments: 10 pages, 6 figures, 4 tables; 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2024) (to appear) (Please cite our conference version.)

arXiv:2404.01954 [pdf, other]

HyperCLOVA X Technical Report

Authors: Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, Jaewook Kang, Hyunwook Kim, Kyung-Min Kim, Munhyong Kim, Sungju Kim, Donghyun Kwak, Hanock Kwak, Se Jung Kwon, Bado Lee, Dongsoo Lee, Gichang Lee, Jooho Lee, Baeseong Park, Seongjin Shin, Joonsang Yu, Seolki Baek, Sumin Byeon, Eungsup Cho, Dooseok Choe, Jeesung Han , et al. (371 additional authors not shown)

Abstract: We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t… ▽ More We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs. △ Less

Submitted 13 April, 2024; v1 submitted 2 April, 2024; originally announced April 2024.

Comments: 44 pages; updated authors list and fixed author names

arXiv:2403.15048 [pdf, other]

Cartoon Hallucinations Detection: Pose-aware In Context Visual Learning

Authors: Bumsoo Kim, Wonseop Shin, Kyuchul Lee, Sanghyun Seo

Abstract: Large-scale Text-to-Image (TTI) models have become a common approach for generating training data in various generative fields. However, visual hallucinations, which contain perceptually critical defects, remain a concern, especially in non-photorealistic styles like cartoon characters. We propose a novel visual hallucination detection system for cartoon character images generated by TTI models. O… ▽ More Large-scale Text-to-Image (TTI) models have become a common approach for generating training data in various generative fields. However, visual hallucinations, which contain perceptually critical defects, remain a concern, especially in non-photorealistic styles like cartoon characters. We propose a novel visual hallucination detection system for cartoon character images generated by TTI models. Our approach leverages pose-aware in-context visual learning (PA-ICVL) with Vision-Language Models (VLMs), utilizing both RGB images and pose information. By incorporating pose guidance from a fine-tuned pose estimator, we enable VLMs to make more accurate decisions. Experimental results demonstrate significant improvements in identifying visual hallucinations compared to baseline methods relying solely on RGB images. This research advances TTI models by mitigating visual hallucinations, expanding their potential in non-photorealistic domains. △ Less

Submitted 24 March, 2024; v1 submitted 22 March, 2024; originally announced March 2024.

Comments: 11 pages, 12 figures, 1 table, Project page: https://gh-bumsookim.github.io/Cartoon-Hallucinations-Detection/

arXiv:2403.14155 [pdf, other]

Harmonizing Visual and Textual Embeddings for Zero-Shot Text-to-Image Customization

Authors: Yeji Song, Jimyeong Kim, Wonhark Park, Wonsik Shin, Wonjong Rhee, Nojun Kwak

Abstract: In a surge of text-to-image (T2I) models and their customization methods that generate new images of a user-provided subject, current works focus on alleviating the costs incurred by a lengthy per-subject optimization. These zero-shot customization methods encode the image of a specified subject into a visual embedding which is then utilized alongside the textual embedding for diffusion guidance.… ▽ More In a surge of text-to-image (T2I) models and their customization methods that generate new images of a user-provided subject, current works focus on alleviating the costs incurred by a lengthy per-subject optimization. These zero-shot customization methods encode the image of a specified subject into a visual embedding which is then utilized alongside the textual embedding for diffusion guidance. The visual embedding incorporates intrinsic information about the subject, while the textual embedding provides a new, transient context. However, the existing methods often 1) are significantly affected by the input images, eg., generating images with the same pose, and 2) exhibit deterioration in the subject's identity. We first pin down the problem and show that redundant pose information in the visual embedding interferes with the textual embedding containing the desired pose information. To address this issue, we propose orthogonal visual embedding which effectively harmonizes with the given textual embedding. We also adopt the visual-only embedding and inject the subject's clear features utilizing a self-attention swap. Our results demonstrate the effectiveness and robustness of our method, which offers highly flexible zero-shot generation while effectively maintaining the subject's identity. △ Less

Submitted 21 March, 2024; originally announced March 2024.

Comments: Project page: https://ldynx.github.io/harmony-zero-t2i/

arXiv:2402.11925 [pdf, other]

Energy-Efficient Edge Learning via Joint Data Deepening-and-Prefetching

Authors: Sujin Kook, Won-Yong Shin, Seong-Lyun Kim, Seung-Woo Ko

Abstract: The vision of pervasive artificial intelligence (AI) services can be realized by training an AI model on time using real-time data collected by internet of things (IoT) devices. To this end, IoT devices require offloading their data to an edge server in proximity. However, transmitting high-dimensional and voluminous data from energy-constrained IoT devices poses a significant challenge. To addres… ▽ More The vision of pervasive artificial intelligence (AI) services can be realized by training an AI model on time using real-time data collected by internet of things (IoT) devices. To this end, IoT devices require offloading their data to an edge server in proximity. However, transmitting high-dimensional and voluminous data from energy-constrained IoT devices poses a significant challenge. To address this limitation, we propose a novel offloading architecture, called joint data deepening-and-prefetching (JD2P), which is feature-by-feature offloading comprising two key techniques. The first one is data deepening, where each data sample's features are sequentially offloaded in the order of importance determined by the data embedding technique such as principle component analysis (PCA). Offloading is terminated once the already transmitted features are sufficient for accurate data classification, resulting in a reduction in the amount of transmitted data. The criteria to offload data are derived for binary and multi-class classifiers, which are designed based on support vector machine (SVM) and deep neural network (DNN), respectively. The second one is data prefetching, where some features potentially required in the future are offloaded in advance, thus achieving high efficiency via precise prediction and parameter optimization. We evaluate the effectiveness of JD2P through experiments using the MNIST dataset, and the results demonstrate its significant reduction in expected energy consumption compared to several benchmarks without degrading learning accuracy. △ Less

Submitted 19 February, 2024; originally announced February 2024.

Comments: accepted for publication in IEEE Transactions on Wireless Communications. arXiv admin note: text overlap with arXiv:2211.07146

arXiv:2402.10781 [pdf, other]

Towards 6G Evolution: Three Enhancements, Three Innovations, and Three Major Challenges

Authors: Rohit Singh, Aryan Kaushik, Wonjae Shin, Marco Di Renzo, Vincenzo Sciancalepore, Doohwan Lee, Hirofumi Sasaki, Arman Shojaeifard, Octavia A. Dobre

Abstract: Over the past few decades, wireless communication has witnessed remarkable growth, experiencing several transformative changes. This article aims to provide a comprehensive overview of wireless communication technologies, from the foundations to the recent wireless advances. Specifically, we take a neutral look at the state-of-the-art technologies for 5G and the ongoing evolutions towards 6G, revi… ▽ More Over the past few decades, wireless communication has witnessed remarkable growth, experiencing several transformative changes. This article aims to provide a comprehensive overview of wireless communication technologies, from the foundations to the recent wireless advances. Specifically, we take a neutral look at the state-of-the-art technologies for 5G and the ongoing evolutions towards 6G, reviewing the recommendations of the International Mobile Communication vision for 2030 (IMT-2030). We first highlight specific features of IMT 2030, including three IMT-2020 extensions (URLLC+, eMBB+, and mMTC+) and three new innovations (Ubiquitous connectivity and integrating the new capabilities of sensing & AI with communication functionality). Then, we delve into three major challenges in implementing 6G, along with global standardization efforts. Besides, a proof of concept is provided by demonstrating terahertz (THz) signal transmission using Orbital Angular Momentum (OAM) multiplexing, which is one of the potential candidates for 6G and beyond. To inspire further potential research, we conclude by identifying research opportunities and future visions on IMT-2030 recommendations. △ Less

Submitted 16 February, 2024; originally announced February 2024.

Comments: 8 pages, 4 figures, 1 table

arXiv:2402.07381 [pdf, other]

RIS-Empowered LEO Satellite Networks for 6G: Promising Usage Scenarios and Future Directions

Authors: Mesut Toka, Byungju Lee, Jaehyup Seong, Aryan Kaushik, Juhwan Lee, Jungwoo Lee, Namyoon Lee, Wonjae Shin, H. Vincent Poor

Abstract: Low-Earth orbit (LEO) satellite systems have been deemed a promising key enabler for current 5G and the forthcoming 6G wireless networks. Such LEO satellite constellations can provide worldwide three-dimensional coverage, high data rate, and scalability, thus enabling truly ubiquitous connectivity. On the other hand, another promising technology, reconfigurable intelligent surfaces (RISs), has eme… ▽ More Low-Earth orbit (LEO) satellite systems have been deemed a promising key enabler for current 5G and the forthcoming 6G wireless networks. Such LEO satellite constellations can provide worldwide three-dimensional coverage, high data rate, and scalability, thus enabling truly ubiquitous connectivity. On the other hand, another promising technology, reconfigurable intelligent surfaces (RISs), has emerged with favorable features, such as flexible deployment, cost & power efficiency, less transmission delay, noise-free nature, and in-band full-duplex structure. LEO satellite networks have many practical imperfections and limitations; however, exploiting RISs has been shown to be a potential solution to overcome these challenges. Particularly, RISs can enhance link quality, reduce the Doppler shift effect, and mitigate inter-/intra beam interference. In this article, we delve into exploiting RISs in LEO satellite networks. First, we present a holistic overview of LEO satellite communication and RIS technology, highlighting potential benefits and challenges. Second, we describe promising usage scenarios and applications in detail. Finally, we discuss potential future directions and challenges on RIS-empowered LEO networks, offering futuristic visions of the upcoming 6G era. △ Less

Submitted 11 February, 2024; originally announced February 2024.

Comments: 18 pages, 5 figures, Paper accepted by IEEE Communications Magazine

arXiv:2402.05448 [pdf, other]

Minecraft-ify: Minecraft Style Image Generation with Text-guided Image Editing for In-Game Application

Authors: Bumsoo Kim, Sanghyun Byun, Yonghoon Jung, Wonseop Shin, Sareer UI Amin, Sanghyun Seo

Abstract: In this paper, we first present the character texture generation system \textit{Minecraft-ify}, specified to Minecraft video game toward in-game application. Ours can generate face-focused image for texture mapping tailored to 3D virtual character having cube manifold. While existing projects or works only generate texture, proposed system can inverse the user-provided real image, or generate aver… ▽ More In this paper, we first present the character texture generation system \textit{Minecraft-ify}, specified to Minecraft video game toward in-game application. Ours can generate face-focused image for texture mapping tailored to 3D virtual character having cube manifold. While existing projects or works only generate texture, proposed system can inverse the user-provided real image, or generate average/random appearance from learned distribution. Moreover, it can be manipulated with text-guidance using StyleGAN and StyleCLIP. These features provide a more extended user experience with enlarged freedom as a user-friendly AI-tool. Project page can be found at https://gh-bumsookim.github.io/Minecraft-ify/ △ Less

Submitted 3 March, 2024; v1 submitted 8 February, 2024; originally announced February 2024.

Comments: 2 pages, 2 figures. Accepted as Spotlight to NeurIPS 2023 Workshop on Machine Learning for Creativity and Design

arXiv:2402.00729 [pdf, other]

Profiling and Modeling of Power Characteristics of Leadership-Scale HPC System Workloads

Authors: Ahmad Maroof Karimi, Naw Safrin Sattar, Woong Shin, Feiyi Wang

Abstract: In the exascale era in which application behavior has large power & energy footprints, per-application job-level awareness of such impression is crucial in taking steps towards achieving efficiency goals beyond performance, such as energy efficiency, and sustainability. To achieve these goals, we have developed a novel low-latency job power profiling machine learning pipeline that can group job-… ▽ More In the exascale era in which application behavior has large power & energy footprints, per-application job-level awareness of such impression is crucial in taking steps towards achieving efficiency goals beyond performance, such as energy efficiency, and sustainability. To achieve these goals, we have developed a novel low-latency job power profiling machine learning pipeline that can group job-level power profiles based on their shapes as they complete. This pipeline leverages a comprehensive feature extraction and clustering pipeline powered by a generative adversarial network (GAN) model to handle the feature-rich time series of job-level power measurements. The output is then used to train a classification model that can predict whether an incoming job power profile is similar to a known group of profiles or is completely new. With extensive evaluations, we demonstrate the effectiveness of each component in our pipeline. Also, we provide a preliminary analysis of the resulting clusters that depict the power profile landscape of the Summit supercomputer from more than 60K jobs sampled from the year 2021. △ Less

Submitted 1 February, 2024; originally announced February 2024.

arXiv:2312.09511 [pdf, other]

MONET: Modality-Embracing Graph Convolutional Network and Target-Aware Attention for Multimedia Recommendation

Authors: Yungi Kim, Taeri Kim, Won-Yong Shin, Sang-Wook Kim

Abstract: In this paper, we focus on multimedia recommender systems using graph convolutional networks (GCNs) where the multimodal features as well as user-item interactions are employed together. Our study aims to exploit multimodal features more effectively in order to accurately capture users' preferences for items. To this end, we point out following two limitations of existing GCN-based multimedia reco… ▽ More In this paper, we focus on multimedia recommender systems using graph convolutional networks (GCNs) where the multimodal features as well as user-item interactions are employed together. Our study aims to exploit multimodal features more effectively in order to accurately capture users' preferences for items. To this end, we point out following two limitations of existing GCN-based multimedia recommender systems: (L1) although multimodal features of interacted items by a user can reveal her preferences on items, existing methods utilize GCN designed to focus only on capturing collaborative signals, resulting in insufficient reflection of the multimodal features in the final user/item embeddings; (L2) although a user decides whether to prefer the target item by considering its multimodal features, existing methods represent her as only a single embedding regardless of the target item's multimodal features and then utilize her embedding to predict her preference for the target item. To address the above issues, we propose a novel multimedia recommender system, named MONET, composed of following two core ideas: modality-embracing GCN (MeGCN) and target-aware attention. Through extensive experiments using four real-world datasets, we demonstrate i) the significant superiority of MONET over seven state-of-the-art competitors (up to 30.32% higher accuracy in terms of recall@20, compared to the best competitor) and ii) the effectiveness of the two core ideas in MONET. All MONET codes are available at https://github.com/Kimyungi/MONET. △ Less

Submitted 14 December, 2023; originally announced December 2023.

Comments: Accepted by WSDM 2024

arXiv:2312.02503 [pdf, other]

SAVE: Protagonist Diversification with Structure Agnostic Video Editing

Authors: Yeji Song, Wonsik Shin, Junsoo Lee, Jeesoo Kim, Nojun Kwak

Abstract: Driven by the upsurge progress in text-to-image (T2I) generation models, text-to-video (T2V) generation has experienced a significant advance as well. Accordingly, tasks such as modifying the object or changing the style in a video have been possible. However, previous works usually work well on trivial and consistent shapes, and easily collapse on a difficult target that has a largely different b… ▽ More Driven by the upsurge progress in text-to-image (T2I) generation models, text-to-video (T2V) generation has experienced a significant advance as well. Accordingly, tasks such as modifying the object or changing the style in a video have been possible. However, previous works usually work well on trivial and consistent shapes, and easily collapse on a difficult target that has a largely different body shape from the original one. In this paper, we spot the bias problem in the existing video editing method that restricts the range of choices for the new protagonist and attempt to address this issue using the conventional image-level personalization method. We adopt motion personalization that isolates the motion from a single source video and then modifies the protagonist accordingly. To deal with the natural discrepancy between image and video, we propose a motion word with an inflated textual embedding to properly represent the motion in a source video. We also regulate the motion word to attend to proper motion-related areas by introducing a novel pseudo optical flow, efficiently computed from the pre-calculated attention maps. Finally, we decouple the motion from the appearance of the source video with an additional pseudo word. Extensive experiments demonstrate the editing capability of our method, taking a step toward more diverse and extensive video editing. △ Less

Submitted 5 December, 2023; originally announced December 2023.

Comments: Project website: https://ldynx.github.io/SAVE/

arXiv:2311.17781 [pdf, other]

Propagate & Distill: Towards Effective Graph Learners Using Propagation-Embracing MLPs

Authors: Yong-Min Shin, Won-Yong Shin

Abstract: Recent studies attempted to utilize multilayer perceptrons (MLPs) to solve semisupervised node classification on graphs, by training a student MLP by knowledge distillation from a teacher graph neural network (GNN). While previous studies have focused mostly on training the student MLP by matching the output probability distributions between the teacher and student models during distillation, it h… ▽ More Recent studies attempted to utilize multilayer perceptrons (MLPs) to solve semisupervised node classification on graphs, by training a student MLP by knowledge distillation from a teacher graph neural network (GNN). While previous studies have focused mostly on training the student MLP by matching the output probability distributions between the teacher and student models during distillation, it has not been systematically studied how to inject the structural information in an explicit and interpretable manner. Inspired by GNNs that separate feature transformation $T$ and propagation $Π$, we re-frame the distillation process as making the student MLP learn both $T$ and $Π$. Although this can be achieved by applying the inverse propagation $Π^{-1}$ before distillation from the teacher, it still comes with a high computational cost from large matrix multiplications during training. To solve this problem, we propose Propagate & Distill (P&D), which propagates the output of the teacher before distillation, which can be interpreted as an approximate process of the inverse propagation. We demonstrate that P&D can readily improve the performance of the student MLP. △ Less

Submitted 29 November, 2023; originally announced November 2023.

Comments: 17 pages, 2 figures, 8 tables; 2nd Learning on Graphs Conference (LoG 2023) (Please cite our conference version.). arXiv admin note: substantial text overlap with arXiv:2311.11759

arXiv:2311.11759 [pdf, other]

Unveiling the Unseen Potential of Graph Learning through MLPs: Effective Graph Learners Using Propagation-Embracing MLPs

Authors: Yong-Min Shin, Won-Yong Shin

Abstract: Recent studies attempted to utilize multilayer perceptrons (MLPs) to solve semi-supervised node classification on graphs, by training a student MLP by knowledge distillation (KD) from a teacher graph neural network (GNN). While previous studies have focused mostly on training the student MLP by matching the output probability distributions between the teacher and student models during KD, it has n… ▽ More Recent studies attempted to utilize multilayer perceptrons (MLPs) to solve semi-supervised node classification on graphs, by training a student MLP by knowledge distillation (KD) from a teacher graph neural network (GNN). While previous studies have focused mostly on training the student MLP by matching the output probability distributions between the teacher and student models during KD, it has not been systematically studied how to inject the structural information in an explicit and interpretable manner. Inspired by GNNs that separate feature transformation $T$ and propagation $Π$, we re-frame the KD process as enabling the student MLP to explicitly learn both $T$ and $Π$. Although this can be achieved by applying the inverse propagation $Π^{-1}$ before distillation from the teacher GNN, it still comes with a high computational cost from large matrix multiplications during training. To solve this problem, we propose Propagate & Distill (P&D), which propagates the output of the teacher GNN before KD and can be interpreted as an approximate process of the inverse propagation $Π^{-1}$. Through comprehensive evaluations using real-world benchmark datasets, we demonstrate the effectiveness of P&D by showing further performance boost of the student MLP. △ Less

Submitted 20 November, 2023; originally announced November 2023.

Comments: 35 pages, 5 figures, 8 tables

arXiv:2311.07223 [pdf, other]

Wasm SpecTec: Engineering a Formal Language Standard

Authors: Joachim Breitner, Philippa Gardner, Jaehyun Lee, Sam Lindley, Matija Pretnar, Xiaojia Rao, Andreas Rossberg, Sukyoung Ryu, Wonho Shin, Conrad Watt, Dongjun Youn

Abstract: WebAssembly (Wasm) is a low-level bytecode language and virtual machine, intended as a compilation target for a wide range of programming languages, which is seeing increasing adoption across diverse ecosystems. As a young technology, Wasm continues to evolve -- it reached version 2.0 last year and another major update is expected soon. For a new feature to be standardised in Wasm, four key arte… ▽ More WebAssembly (Wasm) is a low-level bytecode language and virtual machine, intended as a compilation target for a wide range of programming languages, which is seeing increasing adoption across diverse ecosystems. As a young technology, Wasm continues to evolve -- it reached version 2.0 last year and another major update is expected soon. For a new feature to be standardised in Wasm, four key artefacts must be presented: a formal (mathematical) specification of the feature, an accompanying prose pseudocode description, an implementation in the official reference interpreter, and a suite of unit tests. This rigorous process helps to avoid errors in the design and implementation of new Wasm features, and Wasm's distinctive formal specification in particular has facilitated machine-checked proofs of various correctness properties for the language. However, manually crafting all of these artefacts requires expert knowledge combined with repetitive and tedious labor, which is a burden on the language's standardization process and authoring of the specification. This paper presents Wasm SpecTec, a technology to express the formal specification of Wasm through a domain-specific language. This DSL allows all of Wasm's currently handwritten specification artefacts to be error-checked and generated automatically from a single source of truth, and is designed to be easy to write, read, compare, and review. We believe that Wasm SpecTec's automation and meta-level error checking will significantly ease the current burden of the language's specification authors. We demonstrate the current capabilities of Wasm SpecTec by showcasing its proficiency in generating various artefacts, and describe our work towards replacing the manually written official Wasm specification document with specifications generated by Wasm SpecTec. △ Less

Submitted 13 November, 2023; originally announced November 2023.

Comments: 5 pages, 7 figures

arXiv:2310.16371 [pdf, other]

Synergizing Airborne Non-Terrestrial Networks and Reconfigurable Intelligent Surfaces-Aided 6G IoT

Authors: Muhammad Ali Jamshed, Aryan Kaushik, Mesut Toka, Wonjae Shin, Muhammad Zeeshan Shakir, Soumya P. Dash, Davide Dardari

Abstract: On the one hand, Reconfigurable Intelligent Surfaces (RISs) emerge as a promising solution to meet the demand for higher data rates, improved coverage, and efficient spectrum utilization. On the other hand, Non-Terrestrial Networks (NTNs) offer unprecedented possibilities for global connectivity. Moreover, the NTN can also support the upsurge in the number of Internet of Things (IoT) devices by pr… ▽ More On the one hand, Reconfigurable Intelligent Surfaces (RISs) emerge as a promising solution to meet the demand for higher data rates, improved coverage, and efficient spectrum utilization. On the other hand, Non-Terrestrial Networks (NTNs) offer unprecedented possibilities for global connectivity. Moreover, the NTN can also support the upsurge in the number of Internet of Things (IoT) devices by providing reliable and ubiquitous connectivity. Although NTNs have shown promising results, there are several challenges associated with their usage, such as signal propagation delays, interference, security, etc. In this article, we have discussed the possibilities of integrating RIS with an NTN platform to overcome the issues associated with NTN. Furthermore, through experimental validation, we have demonstrated that the RIS-assisted NTN can play a pivotal role in improving the performance of the entire communication system. △ Less

Submitted 25 October, 2023; originally announced October 2023.

Comments: 15 pages, 5 figures

arXiv:2310.00244 [pdf, other]

Coordinated Rate-Splitting Multiple Access for Integrated Satellite-Terrestrial Networks with Super-Common Message

Authors: Juhwan Lee, Jungwoo Lee, Longfei Yin, Wonjae Shin, Bruno Clerckx

Abstract: Rate-splitting multiple access (RSMA) is an emerging multiple access technique for multi-antenna networks that splits messages into common and private parts for flexible interference mitigation. Motivated by its robustness and scalability, it is promising to employ RSMA in integrated satellite-terrestrial networks (ISTN), where a satellite serves satellite users (SUs) broadly with a multibeam mult… ▽ More Rate-splitting multiple access (RSMA) is an emerging multiple access technique for multi-antenna networks that splits messages into common and private parts for flexible interference mitigation. Motivated by its robustness and scalability, it is promising to employ RSMA in integrated satellite-terrestrial networks (ISTN), where a satellite serves satellite users (SUs) broadly with a multibeam multicast transmission while terrestrial base station (BS) serves cellular users (CUs) with a unicast transmission, operating in the same frequency band. To avoid the data exchange between satellite/cellular networks via backhaul, we assume a coordinated ISTN relying on imperfect channel state information. We put forth a coordinated RSMA framework tailored to the coordinated ISTN by applying inter-network rate-splitting (RS) with a super-common message on top of intra-network RS with common/private messages. With the unified RS design for inter- and intra-networks, we jointly optimize the precoding and power allocation of the private/common/super-common messages to achieve max-min fairness among all SUs and CUs through successive convex approximation. By doing so, the power of the super-common message can be adjusted according to interference levels of the satellite towards CUs, thereby potentially mitigating inter-network interference. Simulation results demonstrate the superiority and robustness of our approach to cope with various interference and propagation conditions. △ Less

Submitted 30 September, 2023; originally announced October 2023.

Comments: 16 pages, 3 figures

arXiv:2309.15433 [pdf, other]

Cardinality Estimation of Subgraph Matching: A Filtering-Sampling Approach

Authors: Wonseok Shin, Siwoo Song, Kunsoo Park, Wook-Shin Han

Abstract: Subgraph counting is a fundamental problem in understanding and analyzing graph structured data, yet computationally challenging. This calls for an accurate and efficient algorithm for Subgraph Cardinality Estimation, which is to estimate the number of all isomorphic embeddings of a query graph in a data graph. We present FaSTest, a novel algorithm that combines (1) a powerful filtering technique… ▽ More Subgraph counting is a fundamental problem in understanding and analyzing graph structured data, yet computationally challenging. This calls for an accurate and efficient algorithm for Subgraph Cardinality Estimation, which is to estimate the number of all isomorphic embeddings of a query graph in a data graph. We present FaSTest, a novel algorithm that combines (1) a powerful filtering technique to significantly reduce the sample space, (2) an adaptive tree sampling algorithm for accurate and efficient estimation, and (3) a worst-case optimal stratified graph sampling algorithm for difficult instances. Extensive experiments on real-world datasets show that FaSTest outperforms state-of-the-art sampling-based methods by up to two orders of magnitude and GNN-based methods by up to three orders of magnitude in terms of accuracy. △ Less

Submitted 15 April, 2024; v1 submitted 27 September, 2023; originally announced September 2023.

arXiv:2309.13542 [pdf, other]

Integrated Sensing and Communications for IoT: Synergies with Key 6G Technology Enablers

Authors: Aryan Kaushik, Rohit Singh, Ming Li, Honghao Luo, Shalanika Dayarathna, Rajitha Senanayake, Xueli An, Richard A. Stirling-Gallacher, Wonjae Shin, Marco Di Renzo

Abstract: The Internet of Things (IoT) and wireless generations have been evolving simultaneously for the past few decades. Built upon wireless communication and sensing technologies, IoT networks are usually evaluated based on metrics that measure the device ability to sense information and effectively share it with the network, which makes Integrated Sensing and Communication (ISAC) a pivotal candidate fo… ▽ More The Internet of Things (IoT) and wireless generations have been evolving simultaneously for the past few decades. Built upon wireless communication and sensing technologies, IoT networks are usually evaluated based on metrics that measure the device ability to sense information and effectively share it with the network, which makes Integrated Sensing and Communication (ISAC) a pivotal candidate for the sixth-generation (6G) IoT standards. This paper reveals several innovative aspects of ISAC from an IoT perspective in 6G, empowering various modern IoT use cases and key technology enablers. Moreover, we address the challenges and future potential of ISAC-enabled IoT, including synergies with Reconfigurable Intelligent Surfaces (RIS), Artificial Intelligence (AI), and key updates of ISAC-IoT in 6G standardization. Furthermore, several evolutionary concepts are introduced to open future research in 6G ISAC-IoT, including the interplay with Non-Terrestrial Networks (NTN) and Orthogonal Time-Frequency Space (OTFS) modulation. △ Less

Submitted 23 September, 2023; originally announced September 2023.

Comments: 7 pages, 6 figures

arXiv:2309.06325 [pdf, other]

Distributed Precoding for Satellite-Terrestrial Integrated Networks Without Sharing CSIT: A Rate-Splitting Approach

Authors: Doseon Kim, Sungyoon Cho, Wonjae Shin, Jeonghun Park, Dong Ku Kim

Abstract: Satellite-terrestrial integrated networks (STINs) are promising architecture for providing global coverage. In STINs, full frequency reuse between a satellite and a terrestrial base station (BS) is encouraged for aggressive spectrum reuse, which induces non-negligible amount of interference. To address the interference management problem in STINs, this paper proposes a novel distributed precoding… ▽ More Satellite-terrestrial integrated networks (STINs) are promising architecture for providing global coverage. In STINs, full frequency reuse between a satellite and a terrestrial base station (BS) is encouraged for aggressive spectrum reuse, which induces non-negligible amount of interference. To address the interference management problem in STINs, this paper proposes a novel distributed precoding method. Key features of our method are: i) a rate-splitting (RS) strategy is incorporated for efficient interference management and ii) the precoders are designed in a distributed way without sharing channel state information between a satellite and a terrestrial BS. Specifically, to design the precoders in a distributed fashion, we put forth a spectral efficiency decoupling technique, that disentangles the total spectral efficiency function into two distinct terms, each of which is dependent solely on the satellite's precoder and the terrestrial BS's precoder, respectively. Then, to resolve the non-smoothness raised by the RS strategy, we approximate the spectral efficiency expression as a smooth function by using the LogSumExp technique; thereafter we develop a generalized power iteration inspired optimization algorithm built based on the first-order optimality condition. Simulation results demonstrate that the proposed method offers considerable spectral efficiency gains compared to the existing methods. △ Less

Submitted 1 April, 2024; v1 submitted 12 September, 2023; originally announced September 2023.

Comments: 13 pages, 3 figures

arXiv:2309.01961 [pdf, other]

NICE: CVPR 2023 Challenge on Zero-shot Image Captioning

Authors: Taehoon Kim, Pyunghwan Ahn, Sangyun Kim, Sihaeng Lee, Mark Marsden, Alessandra Sala, Seung Hwan Kim, Bohyung Han, Kyoung Mu Lee, Honglak Lee, Kyounghoon Bae, Xiangyu Wu, Yi Gao, Hailiang Zhang, Yang Yang, Weili Guo, Jianfeng Lu, Youngtaek Oh, Jae Won Cho, Dong-jin Kim, In So Kweon, Junmo Kim, Wooyoung Kang, Won Young Jhoo, Byungseok Roh , et al. (17 additional authors not shown)

Abstract: In this report, we introduce NICE (New frontiers for zero-shot Image Captioning Evaluation) project and share the results and outcomes of 2023 challenge. This project is designed to challenge the computer vision community to develop robust image captioning models that advance the state-of-the-art both in terms of accuracy and fairness. Through the challenge, the image captioning models were tested… ▽ More In this report, we introduce NICE (New frontiers for zero-shot Image Captioning Evaluation) project and share the results and outcomes of 2023 challenge. This project is designed to challenge the computer vision community to develop robust image captioning models that advance the state-of-the-art both in terms of accuracy and fairness. Through the challenge, the image captioning models were tested using a new evaluation dataset that includes a large variety of visual concepts from many domains. There was no specific training data provided for the challenge, and therefore the challenge entries were required to adapt to new types of image descriptions that had not been seen during training. This report includes information on the newly proposed NICE dataset, evaluation methods, challenge results, and technical details of top-ranking entries. We expect that the outcomes of the challenge will contribute to the improvement of AI models on various vision-language tasks. △ Less

Submitted 10 September, 2023; v1 submitted 5 September, 2023; originally announced September 2023.

Comments: Tech report, project page https://nice.lgresearch.ai/

arXiv:2308.01227 [pdf, other]

Towards Integrated Sensing and Communications for 6G: A Standardization Perspective

Authors: Aryan Kaushik, Rohit Singh, Shalanika Dayarathna, Rajitha Senanayake, Marco Di Renzo, Miguel Dajer, Hyoungju Ji, Younsun Kim, Vincenzo Sciancalepore, Alessio Zappone, Wonjae Shin

Abstract: The radio communication division of the International Telecommunication Union (ITU-R) has recently adopted Integrated Sensing and Communication (ISAC) among the key usage scenarios for IMT-2030/6G. ISAC is envisioned to play a vital role in the upcoming wireless generation standards. In this work, we bring together several paramount and innovative aspects of ISAC technology from a global 6G standa… ▽ More The radio communication division of the International Telecommunication Union (ITU-R) has recently adopted Integrated Sensing and Communication (ISAC) among the key usage scenarios for IMT-2030/6G. ISAC is envisioned to play a vital role in the upcoming wireless generation standards. In this work, we bring together several paramount and innovative aspects of ISAC technology from a global 6G standardization perspective, including both industrial and academic progress. Specifically, this article provides 6G requirements and ISAC-enabled vision, including various aspects of 6G standardization, benefits of ISAC co-existence, and integration challenges. Moreover, we present key enabling technologies, including intelligent metasurface-aided ISAC, as well as Orthogonal Time Frequency Space (OTFS) waveform design and interference management for ISAC. Finally, future aspects are discussed to open various research opportunities and challenges on the ISAC technology towards 6G wireless communications. △ Less

Submitted 2 August, 2023; originally announced August 2023.

Comments: 7 pages, 5 figures

arXiv:2307.07382 [pdf, other]

Distributed Rate-Splitting Multiple Access for Multilayer Satellite Communications

Authors: Yunnuo Xu, Longfei Yin, Yijie Mao, Wonjae Shin, Bruno Clerckx

Abstract: Future wireless networks, in particular, 5G and beyond, are anticipated to deploy dense Low Earth Orbit (LEO) satellites to provide global coverage and broadband connectivity. However, the limited frequency band and the coexistence of multiple constellations bring new challenges for interference management. In this paper, we propose a robust multilayer interference management scheme for spectrum s… ▽ More Future wireless networks, in particular, 5G and beyond, are anticipated to deploy dense Low Earth Orbit (LEO) satellites to provide global coverage and broadband connectivity. However, the limited frequency band and the coexistence of multiple constellations bring new challenges for interference management. In this paper, we propose a robust multilayer interference management scheme for spectrum sharing in heterogeneous satellite networks with statistical channel state information (CSI) at the transmitter (CSIT) and receivers (CSIR). In the proposed scheme, Rate-Splitting Multiple Access (RSMA), as a general and powerful framework for interference management and multiple access strategies, is implemented distributedly at GEO and LEO satellites, coined Distributed-RSMA (D-RSMA). By doing so, D-RSMA aims to mitigate the interference and boost the user fairness of the overall multilayer satellite system. Specifically, we study the problem of jointly optimizing the GEO/LEO precoders and message splits to maximize the minimum rate among User Terminals (UTs) subject to a transmit power constraint at all satellites. A robust algorithm is proposed to solve the original non-convex optimization problem. Numerical results demonstrate the effectiveness and robustness towards network load and CSI uncertainty of our proposed D-RSMA scheme. Benefiting from the interference management capability, D-RSMA provides significant max-min fairness performance gains compared to several benchmark schemes. △ Less

Submitted 2 May, 2024; v1 submitted 14 July, 2023; originally announced July 2023.

arXiv:2306.12978 [pdf, other]

Rate-Splitting Multiple Access for 6G Networks: Ten Promising Scenarios and Applications

Authors: Jeonghun Park, Byungju Lee, Jinseok Choi, Hoon Lee, Namyoon Lee, Seok-Hwan Park, Kyoung-Jae Lee, Junil Choi, Sung Ho Chae, Sang-Woon Jeon, Kyung Sup Kwak, Bruno Clerckx, Wonjae Shin

Abstract: In the upcoming 6G era, multiple access (MA) will play an essential role in achieving high throughput performances required in a wide range of wireless applications. Since MA and interference management are closely related issues, the conventional MA techniques are limited in that they cannot provide near-optimal performance in universal interference regimes. Recently, rate-splitting multiple acce… ▽ More In the upcoming 6G era, multiple access (MA) will play an essential role in achieving high throughput performances required in a wide range of wireless applications. Since MA and interference management are closely related issues, the conventional MA techniques are limited in that they cannot provide near-optimal performance in universal interference regimes. Recently, rate-splitting multiple access (RSMA) has been gaining much attention. RSMA splits an individual message into two parts: a common part, decodable by every user, and a private part, decodable only by the intended user. Each user first decodes the common message and then decodes its private message by applying successive interference cancellation (SIC). By doing so, RSMA not only embraces the existing MA techniques as special cases but also provides significant performance gains by efficiently mitigating inter-user interference in a broad range of interference regimes. In this article, we first present the theoretical foundation of RSMA. Subsequently, we put forth four key benefits of RSMA: spectral efficiency, robustness, scalability, and flexibility. Upon this, we describe how RSMA can enable ten promising scenarios and applications along with future research directions to pave the way for 6G. △ Less

Submitted 22 June, 2023; originally announced June 2023.

Comments: 17 pages, 6 figures, submitted to IEEE Network Magazine

arXiv:2306.12977 [pdf, other]

Sum-Rate Maximization of RSMA-based Aerial Communications with Energy Harvesting: A Reinforcement Learning Approach

Authors: Jaehyup Seong, Mesut Toka, Wonjae Shin

Abstract: In this letter, we investigate a joint power and beamforming design problem for rate-splitting multiple access (RSMA)-based aerial communications with energy harvesting, where a self-sustainable aerial base station serves multiple users by utilizing the harvested energy. Considering maximizing the sum-rate from the long-term perspective, we utilize a deep reinforcement learning (DRL) approach, nam… ▽ More In this letter, we investigate a joint power and beamforming design problem for rate-splitting multiple access (RSMA)-based aerial communications with energy harvesting, where a self-sustainable aerial base station serves multiple users by utilizing the harvested energy. Considering maximizing the sum-rate from the long-term perspective, we utilize a deep reinforcement learning (DRL) approach, namely the soft actor-critic algorithm, to restrict the maximum transmission power at each time based on the stochastic property of the channel environment, harvested energy, and battery power information. Moreover, for designing precoders and power allocation among all the private/common streams of the RSMA, we employ sequential least squares programming (SLSQP) using the Han-Powell quasi-Newton method to maximize the sum-rate for the given transmission power via DRL. Numerical results show the superiority of the proposed scheme over several baseline methods in terms of the average sum-rate performance. △ Less

Submitted 22 June, 2023; originally announced June 2023.

Comments: 13 pages, 4 figures, submitted to IEEE Wireless Communications Letters

arXiv:2305.18885 [pdf, other]

Criteria Tell You More than Ratings: Criteria Preference-Aware Light Graph Convolution for Effective Multi-Criteria Recommendation

Authors: Jin-Duk Park, Siqing Li, Xin Cao, Won-Yong Shin

Abstract: The multi-criteria (MC) recommender system, which leverages MC rating information in a wide range of e-commerce areas, is ubiquitous nowadays. Surprisingly, although graph neural networks (GNNs) have been widely applied to develop various recommender systems due to GNN's high expressive capability in learning graph representations, it has been still unexplored how to design MC recommender systems… ▽ More The multi-criteria (MC) recommender system, which leverages MC rating information in a wide range of e-commerce areas, is ubiquitous nowadays. Surprisingly, although graph neural networks (GNNs) have been widely applied to develop various recommender systems due to GNN's high expressive capability in learning graph representations, it has been still unexplored how to design MC recommender systems with GNNs. In light of this, we make the first attempt towards designing a GNN-aided MC recommender system. Specifically, rather than straightforwardly adopting existing GNN-based recommendation methods, we devise a novel criteria preference-aware light graph convolution CPA-LGC method, which is capable of precisely capturing the criteria preference of users as well as the collaborative signal in complex high-order connectivities. To this end, we first construct an MC expansion graph that transforms user--item MC ratings into an expanded bipartite graph to potentially learn from the collaborative signal in MC ratings. Next, to strengthen the capability of criteria preference awareness, CPA-LGC incorporates newly characterized embeddings, including user-specific criteria-preference embeddings and item-specific criterion embeddings, into our graph convolution model. Through comprehensive evaluations using four real-world datasets, we demonstrate (a) the superiority over benchmark MC recommendation methods and benchmark recommendation methods using GNNs with tremendous gains, (b) the effectiveness of core components in CPA-LGC, and (c) the computational efficiency. △ Less

Submitted 6 June, 2023; v1 submitted 30 May, 2023; originally announced May 2023.

Comments: 12 pages, 10 figures, 5 tables; 29th ACM SIGKDD Conference on Knowledge Discovery & Data (KDD 2023) (to appear) (Please cite our conference version.)

arXiv:2304.12751 [pdf, other]

Node Feature Augmentation Vitaminizes Network Alignment

Authors: Jin-Duk Park, Cong Tran, Won-Yong Shin, Xin Cao

Abstract: Network alignment (NA) is the task of discovering node correspondences across multiple networks. Although NA methods have achieved remarkable success in a myriad of scenarios, their effectiveness is not without additional information such as prior anchor links and/or node features, which may not always be available due to privacy concerns or access restrictions. To tackle this challenge, we propos… ▽ More Network alignment (NA) is the task of discovering node correspondences across multiple networks. Although NA methods have achieved remarkable success in a myriad of scenarios, their effectiveness is not without additional information such as prior anchor links and/or node features, which may not always be available due to privacy concerns or access restrictions. To tackle this challenge, we propose Grad-Align+, a novel NA method built upon a recent state-of-the-art NA method, the so-called Grad-Align, that gradually discovers a part of node pairs until all node pairs are found. In designing Grad-Align+, we account for how to augment node features in the sense of performing the NA task and how to design our NA method by maximally exploiting the augmented node features. To achieve this goal, Grad-Align+ consists of three key components: 1) centrality-based node feature augmentation (CNFA), 2) graph neural network (GNN)-aided embedding similarity calculation alongside the augmented node features, and 3) gradual NA with similarity calculation using aligned cross-network neighbor-pairs (ACNs). Through comprehensive experiments, we demonstrate that Grad-Align+ exhibits (a) the superiority over benchmark NA methods, (b) empirical validations as well as our theoretical findings to see the effectiveness of CNFA, (c) the influence of each component, (d) the robustness to network noises, and (e) the computational efficiency. △ Less

Submitted 17 May, 2024; v1 submitted 25 April, 2023; originally announced April 2023.

Comments: 18 pages, 12 figures, 5 tables; its conference version was presented at the ACM International Conference on Information and Knowledge Management (CIKM 2022)

arXiv:2304.04497 [pdf, other]

A Unified Framework for Exploratory Learning-Aided Community Detection Under Topological Uncertainty

Authors: Yu Hou, Cong Tran, Ming Li, Won-Yong Shin

Abstract: In social networks, the discovery of community structures has received considerable attention as a fundamental problem in various network analysis tasks. However, due to privacy concerns or access restrictions, the network structure is often uncertain, thereby rendering established community detection approaches ineffective without costly network topology acquisition. To tackle this challenge, we… ▽ More In social networks, the discovery of community structures has received considerable attention as a fundamental problem in various network analysis tasks. However, due to privacy concerns or access restrictions, the network structure is often uncertain, thereby rendering established community detection approaches ineffective without costly network topology acquisition. To tackle this challenge, we present META-CODE, a unified framework for detecting overlapping communities via exploratory learning aided by easy-to-collect node metadata when networks are topologically unknown (or only partially known). Specifically, META-CODE consists of three iterative steps in addition to the initial network inference step: 1) node-level community-affiliation embeddings based on graph neural networks (GNNs) trained by our new reconstruction loss, 2) network exploration via community-affiliation-based node queries, and 3) network inference using an edge connectivity-based Siamese neural network model from the explored network. Through extensive experiments on five real-world datasets including two large networks, we demonstrated: (a) the superiority of META-CODE over benchmark community detection methods, achieving remarkable gains up to 151.27% compared to the best existing competitor, (b) the impact of each module in META-CODE, (c) the effectiveness of node queries in META-CODE based on empirical evaluations and theoretical findings, (d) the convergence of the inferred network, and (e) the computational efficiency of META-CODE. △ Less

Submitted 5 March, 2024; v1 submitted 10 April, 2023; originally announced April 2023.

Comments: 17 pages, 9 figures, 6 tables; its conference version was presented at the ACM International Conference on Information and Knowledge Management (CIKM 2022)

arXiv:2303.17057 [pdf, other]

doi 10.1109/TMECH.2023.3331357

Avian-Inspired Claws Enable Robot Perching or Walking

Authors: Mohammad Askari, Won Dong Shin, Damian Lenherr, William Stewart, Dario Floreano

Abstract: Multimodal UAVs (Unmanned Aerial Vehicles) are rarely capable of more than two modalities, i.e., flying and walking or flying and perching. However, being able to fly, perch, and walk could further improve their usefulness by expanding their operating envelope. For instance, an aerial robot could fly a long distance, perch in a high place to survey the surroundings, then walk to avoid obstacles th… ▽ More Multimodal UAVs (Unmanned Aerial Vehicles) are rarely capable of more than two modalities, i.e., flying and walking or flying and perching. However, being able to fly, perch, and walk could further improve their usefulness by expanding their operating envelope. For instance, an aerial robot could fly a long distance, perch in a high place to survey the surroundings, then walk to avoid obstacles that could potentially inhibit flight. Birds are capable of these three tasks, and so offer a practical example of how a robot might be developed to do the same. In this paper, we present a specialized avian-inspired claw design to enable UAVs to perch passively or walk. The key innovation is the combination of a Hoberman linkage leg with Fin Ray claw that uses the weight of the UAV to wrap the claw around a perch, or hyperextend it in the opposite direction to form a curved-up shape for stable terrestrial locomotion. Because the design uses the weight of the vehicle, the underactuated design is lightweight and low power. With the inclusion of talons, the 45g claws are capable of holding a 700g UAV to an almost 20-degree angle on a perch. In scenarios where cluttered environments impede flight and long mission times are required, such a combination of flying, perching, and walking is critical. △ Less

Submitted 2 February, 2024; v1 submitted 29 March, 2023; originally announced March 2023.

Comments: 15 pages, 12 figures

arXiv:2303.09057 [pdf, other]

TriAAN-VC: Triple Adaptive Attention Normalization for Any-to-Any Voice Conversion

Authors: Hyun Joon Park, Seok Woo Yang, Jin Sob Kim, Wooseok Shin, Sung Won Han

Abstract: Voice Conversion (VC) must be achieved while maintaining the content of the source speech and representing the characteristics of the target speaker. The existing methods do not simultaneously satisfy the above two aspects of VC, and their conversion outputs suffer from a trade-off problem between maintaining source contents and target characteristics. In this study, we propose Triple Adaptive Att… ▽ More Voice Conversion (VC) must be achieved while maintaining the content of the source speech and representing the characteristics of the target speaker. The existing methods do not simultaneously satisfy the above two aspects of VC, and their conversion outputs suffer from a trade-off problem between maintaining source contents and target characteristics. In this study, we propose Triple Adaptive Attention Normalization VC (TriAAN-VC), comprising an encoder-decoder and an attention-based adaptive normalization block, that can be applied to non-parallel any-to-any VC. The proposed adaptive normalization block extracts target speaker representations and achieves conversion while minimizing the loss of the source content with siamese loss. We evaluated TriAAN-VC on the VCTK dataset in terms of the maintenance of the source content and target speaker similarity. Experimental results for one-shot VC suggest that TriAAN-VC achieves state-of-the-art performance while mitigating the trade-off problem encountered in the existing VC methods. △ Less

Submitted 15 March, 2023; originally announced March 2023.

Comments: To appear in ICASSP 2023

arXiv:2302.13563 [pdf, other]

Deep Imbalanced Time-series Forecasting via Local Discrepancy Density

Authors: Junwoo Park, Jungsoo Lee, Youngin Cho, Woncheol Shin, Dongmin Kim, Jaegul Choo, Edward Choi

Abstract: Time-series forecasting models often encounter abrupt changes in a given period of time which generally occur due to unexpected or unknown events. Despite their scarce occurrences in the training set, abrupt changes incur loss that significantly contributes to the total loss. Therefore, they act as noisy training samples and prevent the model from learning generalizable patterns, namely the normal… ▽ More Time-series forecasting models often encounter abrupt changes in a given period of time which generally occur due to unexpected or unknown events. Despite their scarce occurrences in the training set, abrupt changes incur loss that significantly contributes to the total loss. Therefore, they act as noisy training samples and prevent the model from learning generalizable patterns, namely the normal states. Based on our findings, we propose a reweighting framework that down-weights the losses incurred by abrupt changes and up-weights those by normal states. For the reweighting framework, we first define a measurement termed Local Discrepancy (LD) which measures the degree of abruptness of a change in a given period of time. Since a training set is mostly composed of normal states, we then consider how frequently the temporal changes appear in the training set based on LD. Our reweighting framework is applicable to existing time-series forecasting models regardless of the architectures. Through extensive experiments on 12 time-series forecasting models over eight datasets with various in-output sequence lengths, we demonstrate that applying our reweighting framework reduces MSE by 10.1% on average and by up to 18.6% in the state-of-the-art model. △ Less

Submitted 22 September, 2023; v1 submitted 27 February, 2023; originally announced February 2023.

Comments: Accepted at European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD) 2023

arXiv:2302.07476 [pdf, other]

Indexed Multiple Access with Reconfigurable Intelligent Surfaces: The Reflection Tuning Potential

Authors: Rohit Singh, Aryan Kaushik, Wonjae Shin, George C. Alexandropoulos, Mesut Toka, Marco Di Renzo

Abstract: Indexed modulation (IM) is an evolving technique that has become popular due to its ability of parallel data communication over distinct combinations of transmission entities. In this article, we first provide a comprehensive survey of IM-enabled multiple access (MA) techniques, emphasizing the shortcomings of existing non-indexed MA schemes. Theoretical comparisons are presented to show how the n… ▽ More Indexed modulation (IM) is an evolving technique that has become popular due to its ability of parallel data communication over distinct combinations of transmission entities. In this article, we first provide a comprehensive survey of IM-enabled multiple access (MA) techniques, emphasizing the shortcomings of existing non-indexed MA schemes. Theoretical comparisons are presented to show how the notion of indexing eliminates the limitations of non-indexed solutions. We also discuss the benefits that the utilization of a reconfigurable intelligent surface (RIS) can offer when deployed as an indexing entity. In particular, we propose an RIS-indexed multiple access (RIMA) transmission scheme that utilizes dynamic phase tuning to embed multi-user information over a single carrier. The performance of the proposed RIMA is assessed in light of simulation results that confirm its performance gains. The article further includes a list of relevant open technical issues and research directions. △ Less

Submitted 15 February, 2023; originally announced February 2023.

Comments: 7 pages, 5 figures, 1 table

arXiv:2301.11111 [pdf, other]

Towards 6G Hyper-Connectivity: Vision, Challenges, and Key Enabling Technologies

Authors: Howon Lee, Byungju Lee, Heecheol Yang, Junghyun Kim, Seungnyun Kim, Wonjae Shin, Byonghyo Shim, H. Vincent Poor

Abstract: Technology forecasts anticipate a new era in which massive numbers of humans, machines, and things are connected to wireless networks to sense, process, act, and communicate with the surrounding environment in a real-time manner. To make the visions come true, the sixth generation (6G) wireless networks should be hyper-connected, implying that there are no constraints on the data rate, coverage, a… ▽ More Technology forecasts anticipate a new era in which massive numbers of humans, machines, and things are connected to wireless networks to sense, process, act, and communicate with the surrounding environment in a real-time manner. To make the visions come true, the sixth generation (6G) wireless networks should be hyper-connected, implying that there are no constraints on the data rate, coverage, and computing. In this article, we first identify the main challenges for 6G hyper-connectivity, including terabits-per-second (Tbps) data rates for immersive user experiences, zero coverage-hole networks, and pervasive computing for connected intelligence. To overcome these challenges, we highlight key enabling technologies for 6G such as distributed and intelligence-aware cell-free massive multi-input multioutput (MIMO) networks, boundless and fully integrated terrestrial and non-terrestrial networks, and communication-aware distributed computing for computation-intensive applications. We further illustrate and discuss the hyper-connected 6G network architecture along with open issues and future research directions. △ Less

Submitted 26 January, 2023; originally announced January 2023.

Comments: 16 pages, 6 figures

arXiv:2301.07695 [pdf, other]

EHRSQL: A Practical Text-to-SQL Benchmark for Electronic Health Records

Authors: Gyubok Lee, Hyeonji Hwang, Seongsu Bae, Yeonsu Kwon, Woncheol Shin, Seongjun Yang, Minjoon Seo, Jong-Yeup Kim, Edward Choi

Abstract: We present a new text-to-SQL dataset for electronic health records (EHRs). The utterances were collected from 222 hospital staff members, including physicians, nurses, and insurance review and health records teams. To construct the QA dataset on structured EHR data, we conducted a poll at a university hospital and used the responses to create seed questions. We then manually linked these questions… ▽ More We present a new text-to-SQL dataset for electronic health records (EHRs). The utterances were collected from 222 hospital staff members, including physicians, nurses, and insurance review and health records teams. To construct the QA dataset on structured EHR data, we conducted a poll at a university hospital and used the responses to create seed questions. We then manually linked these questions to two open-source EHR databases, MIMIC-III and eICU, and included various time expressions and held-out unanswerable questions in the dataset, which were also collected from the poll. Our dataset poses a unique set of challenges: the model needs to 1) generate SQL queries that reflect a wide range of needs in the hospital, including simple retrieval and complex operations such as calculating survival rate, 2) understand various time expressions to answer time-sensitive questions in healthcare, and 3) distinguish whether a given question is answerable or unanswerable. We believe our dataset, EHRSQL, can serve as a practical benchmark for developing and assessing QA models on structured EHR data and take a step further towards bridging the gap between text-to-SQL research and its real-life deployment in healthcare. EHRSQL is available at https://github.com/glee4810/EHRSQL. △ Less

Submitted 25 December, 2023; v1 submitted 16 January, 2023; originally announced January 2023.

Comments: Published as a conference paper at NeurIPS 2022 (Track on Datasets and Benchmarks)

arXiv:2212.03454 [pdf]

Fallen Angel Bonds Investment and Bankruptcy Predictions Using Manual Models and Automated Machine Learning

Authors: Harrison Mateika, Juannan Jia, Linda Lillard, Noah Cronbaugh, Will Shin

Abstract: The primary aim of this research was to find a model that best predicts which fallen angel bonds would either potentially rise up back to investment grade bonds and which ones would fall into bankruptcy. To implement the solution, we thought that the ideal method would be to create an optimal machine learning model that could predict bankruptcies. Among the many machine learning models out there w… ▽ More The primary aim of this research was to find a model that best predicts which fallen angel bonds would either potentially rise up back to investment grade bonds and which ones would fall into bankruptcy. To implement the solution, we thought that the ideal method would be to create an optimal machine learning model that could predict bankruptcies. Among the many machine learning models out there we decided to pick four classification methods: logistic regression, KNN, SVM, and NN. We also utilized an automated methods of Google Cloud's machine learning. The results of our model comparisons showed that the models did not predict bankruptcies very well on the original data set with the exception of Google Cloud's machine learning having a high precision score. However, our over-sampled and feature selection data set did perform very well. This could likely be due to the model being over-fitted to match the narrative of the over-sampled data (as in, it does not accurately predict data outside of this data set quite well). Therefore, we were not able to create a model that we are confident that would predict bankruptcies. However, we were able to find value out of this project in two key ways. The first is that Google Cloud's machine learning model in every metric and in every data set either outperformed or performed on par with the other models. The second is that we found that utilizing feature selection did not reduce predictive power that much. This means that we can reduce the amount of data to collect for future experimentation regarding predicting bankruptcies. △ Less

Submitted 8 December, 2022; v1 submitted 6 December, 2022; originally announced December 2022.

Comments: 8 pages, 3 figures

arXiv:2211.09988 [pdf, ps, other]

Exploring WavLM on Speech Enhancement

Authors: Hyungchan Song, Sanyuan Chen, Zhuo Chen, Yu Wu, Takuya Yoshioka, Min Tang, Jong Won Shin, Shujie Liu

Abstract: There is a surge in interest in self-supervised learning approaches for end-to-end speech encoding in recent years as they have achieved great success. Especially, WavLM showed state-of-the-art performance on various speech processing tasks. To better understand the efficacy of self-supervised learning models for speech enhancement, in this work, we design and conduct a series of experiments with… ▽ More There is a surge in interest in self-supervised learning approaches for end-to-end speech encoding in recent years as they have achieved great success. Especially, WavLM showed state-of-the-art performance on various speech processing tasks. To better understand the efficacy of self-supervised learning models for speech enhancement, in this work, we design and conduct a series of experiments with three resource conditions by combining WavLM and two high-quality speech enhancement systems. Also, we propose a regression-based WavLM training objective and a noise-mixing data configuration to further boost the downstream enhancement performance. The experiments on the DNS challenge dataset and a simulation dataset show that the WavLM benefits the speech enhancement task in terms of both speech quality and speech recognition accuracy, especially for low fine-tuning resources. For the high fine-tuning resource condition, only the word error rate is substantially improved. △ Less

Submitted 17 November, 2022; originally announced November 2022.

Comments: Accepted by IEEE SLT 2022

arXiv:2211.07146 [pdf, other]

Joint Data Deepening-and-Prefetching for Energy-Efficient Edge Learning

Authors: Sujin Kook, Won-Yong Shin, Seong-Lyun Kim, Seung-Woo Ko

Abstract: The vision of pervasive machine learning (ML) services can be realized by training an ML model on time using real-time data collected by internet of things (IoT) devices. To this end, IoT devices require offloading their data to an edge server in proximity. On the other hand, high dimensional data with a heavy volume causes a significant burden to an IoT device with a limited energy budget. To cop… ▽ More The vision of pervasive machine learning (ML) services can be realized by training an ML model on time using real-time data collected by internet of things (IoT) devices. To this end, IoT devices require offloading their data to an edge server in proximity. On the other hand, high dimensional data with a heavy volume causes a significant burden to an IoT device with a limited energy budget. To cope with the limitation, we propose a novel offloading architecture, called joint data deepening and prefetching (JD2P), which is feature-by-feature offloading comprising two key techniques. The first one is data deepening, where each data sample's features are sequentially offloaded in the order of importance determined by the data embedding technique such as principle component analysis (PCA). No more features are offloaded when the features offloaded so far are enough to classify the data, resulting in reducing the amount of offloaded data. The second one is data prefetching, where some features potentially required in the future are offloaded in advance, thus achieving high efficiency via precise prediction and parameter optimization. To verify the effectiveness of JD2P, we conduct experiments using the MNIST and fashion-MNIST dataset. Experimental results demonstrate that the JD2P can significantly reduce the expected energy consumption compared with several benchmarks without degrading learning accuracy. △ Less

Submitted 14 November, 2022; originally announced November 2022.

Comments: This work has been submitted to the IEEE for possible publication

arXiv:2211.06648 [pdf, other]

DATa: Domain Adaptation-Aided Deep Table Detection Using Visual-Lexical Representations

Authors: Hyebin Kwon, Joungbin An, Dongwoo Lee, Won-Yong Shin

Abstract: Considerable research attention has been paid to table detection by developing not only rule-based approaches reliant on hand-crafted heuristics but also deep learning approaches. Although recent studies successfully perform table detection with enhanced results, they often experience performance degradation when they are used for transferred domains whose table layout features might differ from t… ▽ More Considerable research attention has been paid to table detection by developing not only rule-based approaches reliant on hand-crafted heuristics but also deep learning approaches. Although recent studies successfully perform table detection with enhanced results, they often experience performance degradation when they are used for transferred domains whose table layout features might differ from the source domain in which the underlying model has been trained. To overcome this problem, we present DATa, a novel Domain Adaptation-aided deep Table detection method that guarantees satisfactory performance in a specific target domain where few trusted labels are available. To this end, we newly design lexical features and an augmented model used for re-training. More specifically, after pre-training one of state-of-the-art vision-based models as our backbone network, we re-train our augmented model, consisting of the vision-based model and the multilayer perceptron (MLP) architecture. Using new confidence scores acquired based on the trained MLP architecture as well as an initial prediction of bounding boxes and their confidence scores, we calculate each confidence score more accurately. To validate the superiority of DATa, we perform experimental evaluations by adopting a real-world benchmark dataset in a source domain and another dataset in our target domain consisting of materials science articles. Experimental results demonstrate that the proposed DATa method substantially outperforms competing methods that only utilize visual representations in the target domain. Such gains are possible owing to the capability of eliminating high false positives or false negatives according to the setting of a confidence score threshold. △ Less

Submitted 12 November, 2022; originally announced November 2022.

Comments: 28 pages, 5 figures, 2 tables, published in the Knowledge-Based Systems (Please cite our journal version.)

arXiv:2211.01213 [pdf, other]

FiFo: Fishbone Forwarding in Massive IoT Networks

Authors: Hayoung Seong, Junseon Kim, Won-Yong Shin, Howon Lee

Abstract: Massive Internet of Things (IoT) networks have a wide range of applications, including but not limited to the rapid delivery of emergency and disaster messages. Although various benchmark algorithms have been developed to date for message delivery in such applications, they pose several practical challenges such as insufficient network coverage and/or highly redundant transmissions to expand the c… ▽ More Massive Internet of Things (IoT) networks have a wide range of applications, including but not limited to the rapid delivery of emergency and disaster messages. Although various benchmark algorithms have been developed to date for message delivery in such applications, they pose several practical challenges such as insufficient network coverage and/or highly redundant transmissions to expand the coverage area, resulting in considerable energy consumption for each IoT device. To overcome this problem, we first characterize a new performance metric, forwarding efficiency, which is defined as the ratio of the coverage probability to the average number of transmissions per device, to evaluate the data dissemination performance more appropriately. Then, we propose a novel and effective forwarding method, fishbone forwarding (FiFo), which aims to improve the forwarding efficiency with acceptable computational complexity. Our FiFo method completes two tasks: 1) it clusters devices based on the unweighted pair group method with the arithmetic average; and 2) it creates the main axis and sub axes of each cluster using both the expectation-maximization algorithm for the Gaussian mixture model and principal component analysis. We demonstrate the superiority of FiFo by using a real-world dataset. Through intensive and comprehensive simulations, we show that the proposed FiFo method outperforms benchmark algorithms in terms of the forwarding efficiency. △ Less

Submitted 2 November, 2022; originally announced November 2022.

Comments: 13 pages, 16 figures, 5 tables; to appear in the IEEE Internet of Things Journal (Please cite our journal version that will appear in an upcoming issue.)

arXiv:2210.17159 [pdf, other]

PAGE: Prototype-Based Model-Level Explanations for Graph Neural Networks

Authors: Yong-Min Shin, Sun-Woo Kim, Won-Yong Shin

Abstract: Aside from graph neural networks (GNNs) attracting significant attention as a powerful framework revolutionizing graph representation learning, there has been an increasing demand for explaining GNN models. Although various explanation methods for GNNs have been developed, most studies have focused on instance-level explanations, which produce explanations tailored to a given graph instance. In ou… ▽ More Aside from graph neural networks (GNNs) attracting significant attention as a powerful framework revolutionizing graph representation learning, there has been an increasing demand for explaining GNN models. Although various explanation methods for GNNs have been developed, most studies have focused on instance-level explanations, which produce explanations tailored to a given graph instance. In our study, we propose Prototype-bAsed GNN-Explainer (PAGE), a novel model-level GNN explanation method that explains what the underlying GNN model has learned for graph classification by discovering human-interpretable prototype graphs. Our method produces explanations for a given class, thus being capable of offering more concise and comprehensive explanations than those of instance-level explanations. First, PAGE selects embeddings of class-discriminative input graphs on the graph-level embedding space after clustering them. Then, PAGE discovers a common subgraph pattern by iteratively searching for high matching node tuples using node-level embeddings via a prototype scoring function, thereby yielding a prototype graph as our explanation. Using six graph classification datasets, we demonstrate that PAGE qualitatively and quantitatively outperforms the state-of-the-art model-level explanation method. We also carry out systematic experimental studies by demonstrating the relationship between PAGE and instance-level explanation methods, the robustness of PAGE to input data scarce environments, and the computational efficiency of the proposed prototype scoring function in PAGE. △ Less

Submitted 19 March, 2024; v1 submitted 31 October, 2022; originally announced October 2022.

Comments: 18 pages, 12 figures, 5 tables; to appear in the IEEE Transactions on Pattern Analysis and Machine Intelligence (Please cite our journal version that will appear in an upcoming issue. Its two-page extended summary was presented in the AAAI-22 Student Abstract and Poster Program.)

arXiv:2209.14930 [pdf, other]

Graph Anomaly Detection with Graph Neural Networks: Current Status and Challenges

Authors: Hwan Kim, Byung Suk Lee, Won-Yong Shin, Sungsu Lim

Abstract: Graphs are used widely to model complex systems, and detecting anomalies in a graph is an important task in the analysis of complex systems. Graph anomalies are patterns in a graph that do not conform to normal patterns expected of the attributes and/or structures of the graph. In recent years, graph neural networks (GNNs) have been studied extensively and have successfully performed difficult mac… ▽ More Graphs are used widely to model complex systems, and detecting anomalies in a graph is an important task in the analysis of complex systems. Graph anomalies are patterns in a graph that do not conform to normal patterns expected of the attributes and/or structures of the graph. In recent years, graph neural networks (GNNs) have been studied extensively and have successfully performed difficult machine learning tasks in node classification, link prediction, and graph classification thanks to the highly expressive capability via message passing in effectively learning graph representations. To solve the graph anomaly detection problem, GNN-based methods leverage information about the graph attributes (or features) and/or structures to learn to score anomalies appropriately. In this survey, we review the recent advances made in detecting graph anomalies using GNN models. Specifically, we summarize GNN-based methods according to the graph type (i.e., static and dynamic), the anomaly type (i.e., node, edge, subgraph, and whole graph), and the network architecture (e.g., graph autoencoder, graph convolutional network). To the best of our knowledge, this survey is the first comprehensive review of graph anomaly detection methods based on GNNs. △ Less

Submitted 4 October, 2022; v1 submitted 29 September, 2022; originally announced September 2022.

Comments: 9 pages, 2 figures, 1 tables; to appear in the IEEE Access (Please cite our journal version.)

arXiv:2208.11025 [pdf, other]

Grad-Align+: Empowering Gradual Network Alignment Using Attribute Augmentation

Authors: Jin-Duk Park, Cong Tran, Won-Yong Shin, Xin Cao

Abstract: Network alignment (NA) is the task of discovering node correspondences across different networks. Although NA methods have achieved remarkable success in a myriad of scenarios, their satisfactory performance is not without prior anchor link information and/or node attributes, which may not always be available. In this paper, we propose Grad-Align+, a novel NA method using node attribute augmentati… ▽ More Network alignment (NA) is the task of discovering node correspondences across different networks. Although NA methods have achieved remarkable success in a myriad of scenarios, their satisfactory performance is not without prior anchor link information and/or node attributes, which may not always be available. In this paper, we propose Grad-Align+, a novel NA method using node attribute augmentation that is quite robust to the absence of such additional information. Grad-Align+ is built upon a recent state-of-the-art NA method, the so-called Grad-Align, that gradually discovers only a part of node pairs until all node pairs are found. Specifically, Grad-Align+ is composed of the following key components: 1) augmenting node attributes based on nodes' centrality measures, 2) calculating an embedding similarity matrix extracted from a graph neural network into which the augmented node attributes are fed, and 3) gradually discovering node pairs by calculating similarities between cross-network nodes with respect to the aligned cross-network neighbor-pair. Experimental results demonstrate that Grad-Align+ exhibits (a) superiority over benchmark NA methods, (b) empirical validation of our theoretical findings, and (c) the effectiveness of our attribute augmentation module. △ Less

Submitted 24 August, 2022; v1 submitted 23 August, 2022; originally announced August 2022.

Comments: 31st ACM International Conference on Information and Knowledge Management (CIKM 2022) (to appear) (Please cite our conference version.)

arXiv:2208.11015 [pdf, other]

META-CODE: Community Detection via Exploratory Learning in Topologically Unknown Networks

Authors: Yu Hou, Cong Tran, Won-Yong Shin

Abstract: The discovery of community structures in social networks has gained considerable attention as a fundamental problem for various network analysis tasks. However, due to privacy concerns or access restrictions, the network structure is often unknown, thereby rendering established community detection approaches ineffective without costly data acquisition. To tackle this challenge, we present META-COD… ▽ More The discovery of community structures in social networks has gained considerable attention as a fundamental problem for various network analysis tasks. However, due to privacy concerns or access restrictions, the network structure is often unknown, thereby rendering established community detection approaches ineffective without costly data acquisition. To tackle this challenge, we present META-CODE, a novel end-to-end solution for detecting overlapping communities in networks with unknown topology via exploratory learning aided by easy-to-collect node metadata. Specifically, META-CODE consists of three steps: 1) initial network inference, 2) node-level community-affiliation embedding based on graph neural networks (GNNs) trained by our new reconstruction loss, and 3) network exploration via community-affiliation-based node queries, where Steps 2 and 3 are performed iteratively. Experimental results demonstrate that META-CODE exhibits (a) superiority over benchmark methods for overlapping community detection, (b) the effectiveness of our training model, and (c) fast network exploration. △ Less

Submitted 23 August, 2022; originally announced August 2022.

Comments: 31st ACM International Conference on Information and Knowledge Management (CIKM 2022) (to appear) (Please cite our conference version.)

Showing 1–50 of 126 results for author: Shin, W