Global decomposition of networks into multiple cores formed by local hubs

Authors: Wonhee Jeong, Unjong Yu, Sang Hoon Lee

Abstract: Networks are ubiquitous in various fields, representing systems where nodes and their interconnections constitute their intricate structures. We introduce a network decomposition scheme to reveal multiscale core-periphery structures lurking inside, using the concept of locally defined nodal hub centrality and edge-pruning techniques built upon it. We demonstrate that the hub-centrality-based edge pruning reveals a series of breaking points in network decomposition, which effectively separates a network into its backbone and shell structures. Our local-edge decomposition method iteratively identifies and removes locally least important nodes, and uncovers an onion-like hierarchical structure as a result. Compared with the conventional $k$-core decomposition method, our method based on relative information residing in local structures exhibits a clear advantage in terms of discovering locally crucial substructures. Furthermore, we introduce the core-periphery score to properly separate the core and periphery with our decomposition scheme. By extending the method combined with the network community structure, we successfully detect multiple core-periphery structures by decomposition inside each community. Moreover, the application of our decomposition to supernode networks defined from the communities reveals the intricate relation between the two representative mesoscale structures.

Submitted 29 June, 2024; originally announced July 2024.

Comments: 10 pages, 8 figures, 1 table

arXiv:2404.17563 [pdf, other]

An exactly solvable model for emergence and scaling laws

Authors: Yoonsoo Nam, Nayara Fonseca, Seok Hyeong Lee, Chris Mingard, Ard A. Louis

Abstract: Deep learning models can exhibit what appears to be a sudden ability to solve a new problem as training time, training data, or model size increases, a phenomenon known as emergence. In this paper, we present a framework where each new ability (a skill) is represented as a basis function. We solve a simple multi-linear model in this skill-basis, finding analytic expressions for the emergence of new skills, as well as for scaling laws of the loss with training time, data size, model size, and optimal compute ($C$). We compare our detailed calculations to direct simulations of a two-layer neural network trained on multitask sparse parity, where the tasks in the dataset are distributed according to a power-law. Our simple model captures, using a single fit parameter, the sigmoidal emergence of multiple new skills as training time, data size or model size increases in the neural network.

Submitted 14 July, 2024; v1 submitted 26 April, 2024; originally announced April 2024.

arXiv:2404.04096 [pdf, other]

Machine Learning-Aided Cooperative Localization under Dense Urban Environment

Authors: Hoon Lee, Hong Ki Kim, Seung Hyun Oh, Sang Hyun Lee

Abstract: Future wireless network technology provides automobiles with the connectivity feature to consolidate the concept of vehicular networks that collaborate on conducting cooperative driving tasks. The full potential of connected vehicles, which promises road safety and quality driving experience, can be leveraged if machine learning models guarantee the robustness in performing core functions including localization and controls. Location awareness, in particular, lends itself to the deployment of location-specific services and the improvement of the operation performance. The localization entails direct communication to the network infrastructure, and the resulting centralized positioning solutions readily become intractable as the network scales up. As an alternative to the centralized solutions, this article addresses decentralized principle of vehicular localization reinforced by machine learning techniques in dense urban environments with frequent inaccessibility to reliable measurement. As such, the collaboration of multiple vehicles enhances the positioning performance of machine learning approaches. A virtual testbed is developed to validate this machine learning model for real-map vehicular networks. Numerical results demonstrate universal feasibility of cooperative localization, in particular, for dense urban area configurations.

Submitted 5 April, 2024; originally announced April 2024.

arXiv:2403.02639 [pdf, other]

False Positive Sampling-based Data Augmentation for Enhanced 3D Object Detection Accuracy

Authors: Jiyong Oh, Junhaeng Lee, Woongchan Byun, Minsang Kong, Sang Hun Lee

Abstract: Recent studies have focused on enhancing the performance of 3D object detection models. Among various approaches, ground-truth sampling has been proposed as an augmentation technique to address the challenges posed by limited ground-truth data. However, an inherent issue with ground-truth sampling is its tendency to increase false positives. Therefore, this study aims to overcome the limitations of ground-truth sampling and improve the performance of 3D object detection models by developing a new augmentation technique called false-positive sampling. False-positive sampling involves retraining the model using point clouds that are identified as false positives in the model's predictions. We propose an algorithm that utilizes both ground-truth and false-positive sampling and an algorithm for building the false-positive sample database. Additionally, we analyze the principles behind the performance enhancement due to false-positive sampling. Our experiments demonstrate that models utilizing false-positive sampling show a reduction in false positives and exhibit improved object detection performance. On the KITTI and Waymo Open datasets, models with false-positive sampling surpass the baseline models by a large margin.

Submitted 19 May, 2024; v1 submitted 4 March, 2024; originally announced March 2024.

arXiv:2402.16598 [pdf, other]

PCR-99: A Practical Method for Point Cloud Registration with 99% Outliers

Authors: Seong Hun Lee, Javier Civera, Patrick Vandewalle

Abstract: We propose a robust method for point cloud registration that can handle both unknown scales and extreme outlier ratios. Our method, dubbed PCR-99, uses a deterministic 3-point sampling approach with two novel mechanisms that significantly boost the speed: (1) an improved ordering of the samples based on pairwise scale consistency, prioritizing the point correspondences that are more likely to be inliers, and (2) an efficient outlier rejection scheme based on triplet scale consistency, prescreening bad samples and reducing the number of hypotheses to be tested. Our evaluation shows that, up to 98% outlier ratio, the proposed method achieves comparable performance to the state of the art. At 99% outlier ratio, however, it outperforms the state of the art for both known-scale and unknown-scale problems. Especially for the latter, we observe a clear superiority in terms of robustness and speed.

Submitted 12 July, 2024; v1 submitted 26 February, 2024; originally announced February 2024.

arXiv:2402.06662 [pdf, other]

Sign Rank Limitations for Inner Product Graph Decoders

Authors: Su Hyeong Lee, Qingqi Zhang, Risi Kondor

Abstract: Inner product-based decoders are among the most influential frameworks used to extract meaningful data from latent embeddings. However, such decoders have shown limitations in representation capacity in numerous works within the literature, which have been particularly notable in graph reconstruction problems. In this paper, we provide the first theoretical elucidation of this pervasive phenomenon in graph data, and suggest straightforward modifications to circumvent this issue without deviating from the inner product framework.

Submitted 18 May, 2024; v1 submitted 5 February, 2024; originally announced February 2024.

Comments: Accepted by ICML 2024

arXiv:2401.05675 [pdf, other]

Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation

Authors: Seung Hyun Lee, Yinxiao Li, Junjie Ke, Innfarn Yoo, Han Zhang, Jiahui Yu, Qifei Wang, Fei Deng, Glenn Entis, Junfeng He, Gang Li, Sangpil Kim, Irfan Essa, Feng Yang

Abstract: Recent works have demonstrated that using reinforcement learning (RL) with multiple quality rewards can improve the quality of generated images in text-to-image (T2I) generation. However, manually adjusting reward weights poses challenges and may cause over-optimization in certain metrics. To solve this, we propose Parrot, which addresses the issue through multi-objective optimization and introduces an effective multi-reward optimization strategy to approximate Pareto optimal. Utilizing batch-wise Pareto optimal selection, Parrot automatically identifies the optimal trade-off among different rewards. We use the novel multi-reward optimization algorithm to jointly optimize the T2I model and a prompt expansion network, resulting in significant improvement of image quality and also allow to control the trade-off of different rewards using a reward related prompt during inference. Furthermore, we introduce original prompt-centered guidance at inference time, ensuring fidelity to user input after prompt expansion. Extensive experiments and a user study validate the superiority of Parrot over several baselines across various quality criteria, including aesthetics, human preference, text-image alignment, and image sentiment.

Submitted 15 July, 2024; v1 submitted 11 January, 2024; originally announced January 2024.

arXiv:2401.00496 [pdf, other]

SAR-RARP50: Segmentation of surgical instrumentation and Action Recognition on Robot-Assisted Radical Prostatectomy Challenge

Authors: Dimitrios Psychogyios, Emanuele Colleoni, Beatrice Van Amsterdam, Chih-Yang Li, Shu-Yu Huang, Yuchong Li, Fucang Jia, Baosheng Zou, Guotai Wang, Yang Liu, Maxence Boels, Jiayu Huo, Rachel Sparks, Prokar Dasgupta, Alejandro Granados, Sebastien Ourselin, Mengya Xu, An Wang, Yanan Wu, Long Bai, Hongliang Ren, Atsushi Yamada, Yuriko Harai, Yuto Ishikawa, Kazuyuki Hayashi, et al. (25 additional authors not shown)

Abstract: Surgical tool segmentation and action recognition are fundamental building blocks in many computer-assisted intervention applications, ranging from surgical skills assessment to decision support systems. Nowadays, learning-based action recognition and segmentation approaches outperform classical methods, relying, however, on large, annotated datasets. Furthermore, action recognition and tool segmentation algorithms are often trained and make predictions in isolation from each other, without exploiting potential cross-task relationships. With the EndoVis 2022 SAR-RARP50 challenge, we release the first multimodal, publicly available, in-vivo, dataset for surgical action recognition and semantic instrumentation segmentation, containing 50 suturing video segments of Robotic Assisted Radical Prostatectomy (RARP). The aim of the challenge is twofold. First, to enable researchers to leverage the scale of the provided dataset and develop robust and highly accurate single-task action recognition and tool segmentation approaches in the surgical domain. Second, to further explore the potential of multitask-based learning approaches and determine their comparative advantage against their single-task counterparts. A total of 12 teams participated in the challenge, contributing 7 action recognition methods, 9 instrument segmentation techniques, and 4 multitask approaches that integrated both action recognition and instrument segmentation.
The complete SAR-RARP50 dataset is available at: https://rdr.ucl.ac.uk/projects/SARRARP50_Segmentation_of_surgical_instrumentation_and_Action_Recognition_on_Robot-Assisted_Radical_Prostatectomy_Challenge/191091

Submitted 23 January, 2024; v1 submitted 31 December, 2023; originally announced January 2024.

arXiv:2311.09354 [pdf]

doi 10.1063/5.0189222

Nondestructive, quantitative viability analysis of 3D tissue cultures using machine learning image segmentation

Authors: Kylie J. Trettner, Jeremy Hsieh, Weikun Xiao, Jerry S. H. Lee, Andrea M. Armani

Abstract: Ascertaining the collective viability of cells in different cell culture conditions has typically relied on averaging colorimetric indicators and is often reported out in simple binary readouts. Recent research has combined viability assessment techniques with image-based deep-learning models to automate the characterization of cellular properties. However, further development of viability measurements to assess the continuity of possible cellular states and responses to perturbation across cell culture conditions is needed. In this work, we demonstrate an image processing algorithm for quantifying cellular viability in 3D cultures without the need for assay-based indicators. We show that our algorithm performs similarly to a pair of human experts in whole-well images over a range of days and culture matrix compositions. To demonstrate potential utility, we perform a longitudinal study investigating the impact of a known therapeutic on pancreatic cancer spheroids. Using images taken with a high content imaging system, the algorithm successfully tracks viability at the individual spheroid and whole-well level. The method we propose reduces analysis time by 97% in comparison to the experts. Because the method is independent of the microscope or imaging system used, this approach lays the foundation for accelerating progress in and for improving the robustness and reproducibility of 3D culture analysis across biological and clinical research.

Submitted 11 March, 2024; v1 submitted 15 November, 2023; originally announced November 2023.

Comments: 52 total pages, Main text and SI included, 35 figures (5 main text, 30 supplemental), 9 tables, 6 datasets (provided on linked GitHub), linked image files on Zenodo

arXiv:2310.16401 [pdf, other]

Graph Neural Networks with a Distribution of Parametrized Graphs

Authors: See Hian Lee, Feng Ji, Kelin Xia, Wee Peng Tay

Abstract: Traditionally, graph neural networks have been trained using a single observed graph. However, the observed graph represents only one possible realization. In many applications, the graph may encounter uncertainties, such as having erroneous or missing edges, as well as edge weights that provide little informative value. To address these challenges and capture additional information previously absent in the observed graph, we introduce latent variables to parameterize and generate multiple graphs. We obtain the maximum likelihood estimate of the network parameters in an Expectation-Maximization (EM) framework based on the multiple graphs. Specifically, we iteratively determine the distribution of the graphs using a Markov Chain Monte Carlo (MCMC) method, incorporating the principles of PAC-Bayesian theory. Numerical experiments demonstrate improvements in performance against baseline models on node classification for heterogeneous graphs and graph regression on chemistry datasets.

Submitted 2 February, 2024; v1 submitted 25 October, 2023; originally announced October 2023.

arXiv:2309.05388 [pdf, other]

Robust Single Rotation Averaging Revisited

Authors: Seong Hun Lee, Javier Civera

Abstract: In this work, we propose a novel method for robust single rotation averaging that can efficiently handle an extremely large fraction of outliers. Our approach is to minimize the total truncated least unsquared deviations (TLUD) cost of geodesic distances. The proposed algorithm consists of three steps: First, we consider each input rotation as a potential initial solution and choose the one that yields the least sum of truncated chordal deviations. Next, we obtain the inlier set using the initial solution and compute its chordal $L_2$-mean. Finally, starting from this estimate, we iteratively compute the geodesic $L_1$-mean of the inliers using the Weiszfeld algorithm on $SO(3)$. An extensive evaluation shows that our method is robust against up to 99% outliers given a sufficient number of accurate inliers, outperforming the current state of the art.

Submitted 28 February, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

arXiv:2309.04655 [pdf]

Intelligent upper-limb exoskeleton integrated with soft wearable bioelectronics and deep-learning for human intention-driven strength augmentation based on sensory feedback

Authors: Jinwoo Lee, Kangkyu Kwon, Ira Soltis, Jared Matthews, Yoonjae Lee, Hojoong Kim, Lissette Romero, Nathan Zavanelli, Youngjin Kwon, Shinjae Kwon, Jimin Lee, Yewon Na, Sung Hoon Lee, Ki Jun Yu, Minoru Shinohara, Frank L. Hammond, Woon-Hong Yeo

Abstract: The age and stroke-associated decline in musculoskeletal strength degrades the ability to perform daily human tasks using the upper extremities. Although there are a few examples of exoskeletons, they need manual operations due to the absence of sensor feedback and no intention prediction of movements. Here, we introduce an intelligent upper-limb exoskeleton system that uses cloud-based deep learning to predict human intention for strength augmentation. The embedded soft wearable sensors provide sensory feedback by collecting real-time muscle signals, which are simultaneously computed to determine the user's intended movement. The cloud-based deep-learning predicts four upper-limb joint motions with an average accuracy of 96.2% at a 200-250 millisecond response rate, suggesting that the exoskeleton operates just by human intention. In addition, an array of soft pneumatics assists the intended movements by providing 897 newton of force and 78.7 millimeter of displacement at maximum. Collectively, the intent-driven exoskeleton can augment human strength by 5.15 times on average compared to the unassisted exoskeleton. This report demonstrates an exoskeleton robot that augments the upper-limb joint movements by human intention based on a machine-learning cloud computing and sensory feedback.

Submitted 26 January, 2024; v1 submitted 8 September, 2023; originally announced September 2023.

Comments: 15 pages, 6 figures, 1 table, published in npj flexible electronics journals

MSC Class: 68T40 (Primary) 92C55; 68T99 (Secondary)

arXiv:2308.02596 [pdf, other]

doi 10.1007/s40042-023-00921-8

Revisiting small-world network models: Exploring technical realizations and the equivalence of the Newman-Watts and Harary models

Authors: Seora Son, Eun Ji Choi, Sang Hoon Lee

Abstract: We address the relatively less known facts on the equivalence and technical realizations surrounding two network models showing the "small-world" property, namely the Newman-Watts and the Harary models. We provide the most accurate (in terms of faithfulness to the original literature) versions of these models to clarify the deviation from them existing in their variants adopted in one of the most popular network analysis packages. The difference in technical realizations of those models could be conceived as minor details, but we discover significantly notable changes caused by the possibly inadvertent modification. For the Harary model, the stochasticity in the original formulation allows a much wider range of the clustering coefficient and the average shortest path length. For the Newman-Watts model, due to the drastically different degree distributions, the clustering coefficient can also be affected, which is verified by our higher-order analytic derivation. During the process, we discover the equivalence of the Newman-Watts (better known in the network science or physics community) and the Harary (better known in the graph theory or mathematics community) models under a specific condition of restricted parity in variables, which would bridge the two relatively independently developed models in different fields.
Our result highlights the importance of each detailed step in constructing network models and the possibility of deeply related models, even if they might initially appear distinct in terms of the time period or the academic disciplines from which they emerged.

Submitted 12 December, 2023; v1 submitted 3 August, 2023; originally announced August 2023.

Comments: 11 pages, 5 figures, 1 table

Journal ref: J. Korean Phys. Soc. 83, 879 (2023)

arXiv:2305.00139 [pdf, other]

Leveraging Label Non-Uniformity for Node Classification in Graph Neural Networks

Authors: Feng Ji, See Hian Lee, Hanyang Meng, Kai Zhao, Jielong Yang, Wee Peng Tay

Abstract: In node classification using graph neural networks (GNNs), a typical model generates logits for different class labels at each node. A softmax layer often outputs a label prediction based on the largest logit. We demonstrate that it is possible to infer hidden graph structural information from the dataset using these logits. We introduce the key notion of label non-uniformity, which is derived from the Wasserstein distance between the softmax distribution of the logits and the uniform distribution. We demonstrate that nodes with small label non-uniformity are harder to classify correctly. We theoretically analyze how the label non-uniformity varies across the graph, which provides insights into boosting the model performance: increasing training samples with high non-uniformity or dropping edges to reduce the maximal cut size of the node set of small non-uniformity. These mechanisms can be easily added to a base GNN model. Experimental results demonstrate that our approach improves the performance of many benchmark base models.

Submitted 28 April, 2023; originally announced May 2023.

arXiv:2304.06818 [pdf, other]

Soundini: Sound-Guided Diffusion for Natural Video Editing

Authors: Seung Hyun Lee, Sieun Kim, Innfarn Yoo, Feng Yang, Donghyeon Cho, Youngseo Kim, Huiwen Chang, Jinkyu Kim, Sangpil Kim

Abstract: We propose a method for adding sound-guided visual effects to specific regions of videos with a zero-shot setting. Animating the appearance of the visual effect is challenging because each frame of the edited video should have visual changes while maintaining temporal consistency. Moreover, existing video editing solutions focus on temporal consistency across frames, ignoring the visual style variations over time, e.g., thunderstorm, wave, fire crackling. To overcome this limitation, we utilize temporal sound features for the dynamic style. Specifically, we guide denoising diffusion probabilistic models with an audio latent representation in the audio-visual latent space. To the best of our knowledge, our work is the first to explore sound-guided natural video editing from various sound sources with sound-specialized properties, such as intensity, timbre, and volume. Additionally, we design optical flow-based guidance to generate temporally consistent video frames, capturing the pixel-wise relationship between adjacent frames. Experimental results show that our method outperforms existing video editing techniques, producing more realistic visual effects that reflect the properties of sound. Please visit our page: https://kuai-lab.github.io/soundini-gallery/.

Submitted 13 April, 2023; originally announced April 2023.

arXiv:2304.03507 [pdf, other]

Distributional Signals for Node Classification in Graph Neural Networks

Authors: Feng Ji, See Hian Lee, Kai Zhao, Wee Peng Tay, Jielong Yang

Abstract: In graph neural networks (GNNs), both node features and labels are examples of graph signals, a key notion in graph signal processing (GSP). While it is common in GSP to impose signal smoothness constraints in learning and estimation tasks, it is unclear how this can be done for discrete node labels. We bridge this gap by introducing the concept of distributional graph signals. In our framework, we work with the distributions of node labels instead of their values and propose notions of smoothness and non-uniformity of such distributional graph signals. We then propose a general regularization method for GNNs that allows us to encode distributional smoothness and non-uniformity of the model output in semi-supervised node classification tasks. Numerical experiments demonstrate that our method can significantly improve the performance of most base GNN models in different problem settings.

Submitted 7 April, 2023; originally announced April 2023.

arXiv:2303.01724 [pdf, other]

Node-Specific Space Selection via Localized Geometric Hyperbolicity in Graph Neural Networks

Authors: See Hian Lee, Feng Ji, Wee Peng Tay

Abstract: Many graph neural networks have been developed to learn graph representations in either Euclidean or hyperbolic space, with all nodes' representations embedded in a single space. However, a graph can have hyperbolic and Euclidean geometries at different regions of the graph. Thus, it is sub-optimal to indifferently embed an entire graph into a single space. In this paper, we explore and analyze two notions of local hyperbolicity, describing the underlying local geometry: geometric (Gromov) and model-based, to determine the preferred space of embedding for each node. The two hyperbolicities' distributions are aligned using the Wasserstein metric such that the calculated geometric hyperbolicity guides the choice of the learned model hyperbolicity. As such our model Joint Space Graph Neural Network (JSGNN) can leverage both Euclidean and hyperbolic spaces during learning by allowing node-specific geometry space selection. We evaluate our model on both node classification and link prediction tasks and observe promising performance compared to baseline models.

Submitted 3 March, 2023; originally announced March 2023.

arXiv:2302.14684 [pdf, other]

doi 10.1088/2632-072X/acef9d

Exploring 3D community inconsistency in human chromosome contact networks

Authors: Dolores Bernenko, Sang Hoon Lee, Ludvig Lizana

Abstract: Researchers developed chromosome capture methods such as Hi-C to better understand DNA's 3D folding in nuclei. The Hi-C method captures contact frequencies between DNA segment pairs across the genome. When analyzing Hi-C data sets, it is common to group these pairs using standard bioinformatics methods (e.g., PCA). Other approaches handle Hi-C data as weighted networks, where connected nodes represent DNA segments in 3D proximity. In this representation, one can leverage community detection techniques developed in complex network theory to group nodes into mesoscale communities containing similar connection patterns. While there are several successful attempts to analyze Hi-C data in this way, it is common to report and study the most typical community structure. But in reality, there are often several valid candidates. Therefore, depending on algorithm design, different community detection methods focusing on slightly different connectivity features may have differing views on the ideal node groupings. In fact, even the same community detection method may yield different results if using a stochastic algorithm. This ambiguity is fundamental to community detection and shared by most complex networks whenever interactions span all scales in the network. This is known as community inconsistency. This paper explores this inconsistency of 3D communities in Hi-C data for all human chromosomes. We base our analysis on two inconsistency metrics, one local and one global, and quantify the network scales where the community separation is most variable. For example, we find that TADs are less reliable than A/B compartments and that nodes with highly variable node-community memberships are associated with open chromatin. Overall, our study provides a helpful framework for data-driven researchers and increases awareness of some inherent challenges when clustering Hi-C data into 3D communities.

Submitted 28 February, 2023; originally announced February 2023.

Comments: 10 pages, 7 figures

Journal ref: J. Phys. Complex. 4 035004 (2023)

arXiv:2212.05376 [pdf, other]

What's Wrong with the Absolute Trajectory Error?

Authors: Seong Hun Lee, Javier Civera

Abstract: One of the limitations of the commonly used Absolute Trajectory Error (ATE) is that it is highly sensitive to outliers. As a result, in the presence of just a few outliers, it often fails to reflect the varying accuracy as the inlier trajectory error or the number of outliers varies. In this work, we propose an alternative error metric for evaluating the accuracy of the reconstructed camera trajectory. Our metric, named Discernible Trajectory Error (DTE), is computed in five steps: (1) Shift the ground-truth and estimated trajectories such that both of their geometric medians are located at the origin. (2) Rotate the estimated trajectory such that it minimizes the sum of geodesic distances between the corresponding camera orientations. (3) Scale the estimated trajectory such that the median distance of the cameras to their geometric median is the same as that of the ground truth. (4) Compute, winsorize and normalize the distances between the corresponding cameras. (5) Obtain the DTE by taking the average of the mean and the root-mean-square (RMS) of the resulting distances. This metric is an attractive alternative to the ATE, in that it is capable of discerning the varying trajectory accuracy as the inlier trajectory error or the number of outliers varies. Using a similar idea, we also propose a novel rotation error metric, named Discernible Rotation Error (DRE), which has similar advantages to the DTE. Furthermore, we propose a simple yet effective method for calibrating the camera-to-marker rotation, which is needed for the computation of our metrics. Our methods are verified through extensive simulations.

Submitted 9 July, 2024; v1 submitted 10 December, 2022; originally announced December 2022.

arXiv:2211.11381 [pdf, other]

LISA: Localized Image Stylization with Audio via Implicit Neural Representation

Authors: Seung Hyun Lee, Chanyoung Kim, Wonmin Byeon, Sang Ho Yoon, Jinkyu Kim, Sangpil Kim

Abstract: We present a novel framework, Localized Image Stylization with Audio (LISA), which performs audio-driven localized image stylization. Sound often provides information about the specific context of the scene and is closely related to a certain part of the scene or object. However, existing image stylization works have focused on stylizing the entire image using an image or text input. Stylizing a particular part of the image based on audio input is natural but challenging. In this work, we propose a framework in which a user provides one audio input to localize the sound source in the input image and another for locally stylizing the target object or scene. LISA first produces a delicate localization map with an audio-visual localization network by leveraging CLIP embedding space. We then utilize implicit neural representation (INR) along with the predicted localization map to stylize the target object or scene based on sound information. The proposed INR can manipulate the localized pixel values to be semantically consistent with the provided audio input. Through a series of experiments, we show that the proposed framework outperforms the other audio-guided stylization methods. Moreover, LISA constructs concise localization maps and naturally manipulates the target object or scene in accordance with the given audio input.

Submitted 21 November, 2022; originally announced November 2022.

arXiv:2208.14114 [pdf, other]

Robust Sound-Guided Image Manipulation

Authors: Seung Hyun Lee, Gyeongrok Oh, Wonmin Byeon, Sang Ho Yoon, Jinkyu Kim, Sangpil Kim

Abstract: Recent successes suggest that an image can be manipulated by a text prompt, e.g., a landscape scene on a sunny day is manipulated into the same scene on a rainy day driven by a text input "raining". These approaches often utilize a StyleCLIP-based image generator, which leverages multi-modal (text and image) embedding space. However, we observe that such text inputs are often bottlenecked in providing and synthesizing rich semantic cues, e.g., differentiating heavy rain from rain with thunderstorms. To address this issue, we advocate leveraging an additional modality, sound, which has notable advantages in image manipulation as it can convey more diverse semantic cues (vivid emotions or dynamic expressions of the natural world) than texts. In this paper, we propose a novel approach that first extends the image-text joint embedding space with sound and applies a direct latent optimization method to manipulate a given image based on audio input, e.g., the sound of rain. Our extensive experiments show that our sound-guided image manipulation approach produces semantically and visually more plausible manipulation results than the state-of-the-art text and sound-guided image manipulation methods, which are further confirmed by our human evaluations. Our downstream task evaluations also show that our learned image-text-sound joint embedding space effectively encodes sound inputs.

Submitted 24 April, 2023; v1 submitted 30 August, 2022; originally announced August 2022.

Comments: arXiv admin note: text overlap with arXiv:2112.00007

arXiv:2207.11761 [pdf, other]

SGAT: Simplicial Graph Attention Network

Authors: See Hian Lee, Feng Ji, Wee Peng Tay

Abstract: Heterogeneous graphs have multiple node and edge types and are semantically richer than homogeneous graphs. To learn such complex semantics, many graph neural network approaches for heterogeneous graphs use metapaths to capture multi-hop interactions between nodes. Typically, features from non-target nodes are not incorporated into the learning procedure. However, there can be nonlinear, high-order interactions involving multiple nodes or edges. In this paper, we present Simplicial Graph Attention Network (SGAT), a simplicial complex approach to represent such high-order interactions by placing features from non-target nodes on the simplices. We then use attention mechanisms and upper adjacencies to generate representations. We empirically demonstrate the efficacy of our approach with node classification tasks on heterogeneous graph datasets and further show SGAT's ability in extracting structural information by employing random node features. Numerical experiments indicate that SGAT performs better than other current state-of-the-art heterogeneous graph learning methods.

Submitted 24 July, 2022; originally announced July 2022.

Comments: Accepted in the 31st International Joint Conference on Artificial Intelligence (IJCAI-ECAI), 2022

arXiv:2206.02570 [pdf, other]

RODIAN: Robustified Median

Authors: Seong Hun Lee, Javier Civera

Abstract: We propose a robust method for averaging numbers contaminated by a large proportion of outliers. Our method, dubbed RODIAN, is inspired by the key idea of MINPRAN [1]: We assume that the outliers are uniformly distributed within the range of the data and we search for the region that is least likely to contain outliers only. The median of the data within this region is then taken as RODIAN. Our approach can accurately estimate the true mean of data with more than 50% outliers and runs in time $O(n\log n)$. Unlike other robust techniques, it is completely deterministic and does not rely on a known inlier error bound. Our extensive evaluation shows that RODIAN is much more robust than the median and the least-median-of-squares. This result also holds in the case of non-uniform outlier distributions.

Submitted 18 November, 2022; v1 submitted 3 June, 2022; originally announced June 2022.

arXiv:2205.09185 [pdf, other]

doi 10.1016/j.nima.2022.167748

AI-assisted Optimization of the ECCE Tracking System at the Electron Ion Collider

Authors: C. Fanelli, Z. Papandreou, K. Suresh, J. K. Adkins, Y. Akiba, A. Albataineh, M. Amaryan, I. C. Arsene, C. Ayerbe Gayoso, J. Bae, X. Bai, M. D. Baker, M. Bashkanov, R. Bellwied, F. Benmokhtar, V. Berdnikov, J. C. Bernauer, F. Bock, W. Boeglin, M. Borysova, E. Brash, P. Brindza, W. J. Briscoe, M. Brooks, S. Bueltmann , et al. (258 additional authors not shown)

Abstract: The Electron-Ion Collider (EIC) is a cutting-edge accelerator facility that will study the nature of the "glue" that binds the building blocks of the visible matter in the universe. The proposed experiment will be realized at Brookhaven National Laboratory in approximately 10 years from now, with detector design and R&D currently ongoing. Notably, EIC is one of the first large-scale facilities to leverage Artificial Intelligence (AI) already starting from the design and R&D phases. The EIC Comprehensive Chromodynamics Experiment (ECCE) is a consortium that proposed a detector design based on a 1.5T solenoid. The EIC detector proposal review concluded that the ECCE design will serve as the reference design for an EIC detector. Herein we describe a comprehensive optimization of the ECCE tracker using AI. The work required a complex parametrization of the simulated detector system. Our approach dealt with an optimization problem in a multidimensional design space driven by multiple objectives that encode the detector performance, while satisfying several mechanical constraints. We describe our strategy and show results obtained for the ECCE tracking system. The AI-assisted design is agnostic to the simulation framework and can be extended to other sub-detectors or to a system of sub-detectors to further optimize the performance of the EIC detector.

Submitted 19 May, 2022; v1 submitted 18 May, 2022; originally announced May 2022.

Comments: 16 pages, 18 figures, 2 appendices, 3 tables

arXiv:2204.09273 [pdf, other]

Sound-Guided Semantic Video Generation

Authors: Seung Hyun Lee, Gyeongrok Oh, Wonmin Byeon, Chanyoung Kim, Won Jeong Ryoo, Sang Ho Yoon, Hyunjun Cho, Jihyun Bae, Jinkyu Kim, Sangpil Kim

Abstract: The recent success in StyleGAN demonstrates that pre-trained StyleGAN latent space is useful for realistic video generation. However, the generated motion in the video is usually not semantically meaningful due to the difficulty of determining the direction and magnitude in the StyleGAN latent space. In this paper, we propose a framework to generate realistic videos by leveraging multimodal (sound-image-text) embedding space. As sound provides the temporal contexts of the scene, our framework learns to generate a video that is semantically consistent with sound. First, our sound inversion module maps the audio directly into the StyleGAN latent space. We then incorporate the CLIP-based multimodal embedding space to further provide the audio-visual relationships. Finally, the proposed frame generator learns to find the trajectory in the latent space which is coherent with the corresponding sound and generates a video in a hierarchical manner. We provide the new high-resolution landscape video dataset (audio-visual pair) for the sound-guided video generation task. The experiments show that our model outperforms the state-of-the-art methods in terms of video quality. We further show several applications including image and video editing to verify the effectiveness of our method.

Submitted 21 October, 2022; v1 submitted 20 April, 2022; originally announced April 2022.

arXiv:2112.13492 [pdf, other]

Vision Transformer for Small-Size Datasets

Authors: Seung Hoon Lee, Seunghyun Lee, Byung Cheol Song

Abstract: Recently, the Vision Transformer (ViT), which applied the transformer structure to the image classification task, has outperformed convolutional neural networks. However, the high performance of the ViT results from pre-training using a large-size dataset such as JFT-300M, and its dependence on a large dataset is interpreted as due to low locality inductive bias. This paper proposes Shifted Patch Tokenization (SPT) and Locality Self-Attention (LSA), which effectively solve the lack of locality inductive bias and enable it to learn from scratch even on small-size datasets. Moreover, SPT and LSA are generic and effective add-on modules that are easily applicable to various ViTs. Experimental results show that when both SPT and LSA were applied to the ViTs, the performance improved by an average of 2.96% in Tiny-ImageNet, which is a representative small-size dataset. Especially, Swin Transformer achieved an overwhelming performance improvement of 4.08% thanks to the proposed SPT and LSA.

Submitted 26 December, 2021; originally announced December 2021.

arXiv:2112.00007 [pdf, other]

Sound-Guided Semantic Image Manipulation

Authors: Seung Hyun Lee, Wonseok Roh, Wonmin Byeon, Sang Ho Yoon, Chan Young Kim, Jinkyu Kim, Sangpil Kim

Abstract: The recent success of the generative model shows that leveraging the multi-modal embedding space can manipulate an image using text information. However, manipulating an image with other sources rather than text, such as sound, is not easy due to the dynamic characteristics of the sources. Especially, sound can convey vivid emotions and dynamic expressions of the real world. Here, we propose a framework that directly encodes sound into the multi-modal (image-text) embedding space and manipulates an image from the space. Our audio encoder is trained to produce a latent representation from an audio input, which is forced to be aligned with image and text representations in the multi-modal embedding space. We use a direct latent optimization method based on aligned embeddings for sound-guided image manipulation. We verify the effectiveness of our sound-guided image manipulation quantitatively and qualitatively. We also show that our method can mix different modalities, i.e., text and audio, which enrich the variety of the image modification. The experiments on zero-shot audio classification and semantic-level image classification show that our proposed model outperforms other text and sound-guided state-of-the-art methods.

Submitted 30 November, 2021; originally announced December 2021.

arXiv:2111.08831 [pdf, other]

HARA: A Hierarchical Approach for Robust Rotation Averaging

Authors: Seong Hun Lee, Javier Civera

Abstract: We propose a novel hierarchical approach for multiple rotation averaging, dubbed HARA. Our method incrementally initializes the rotation graph based on a hierarchy of triplet support. The key idea is to build a spanning tree by prioritizing the edges with many strong triplet supports and gradually adding those with weaker and fewer supports. This reduces the risk of adding outliers in the spanning tree. As a result, we obtain a robust initial solution that enables us to filter outliers prior to nonlinear optimization. With minimal modification, our approach can also integrate the knowledge of the number of valid 2D-2D correspondences. We perform extensive evaluations on both synthetic and real datasets, demonstrating state-of-the-art results.

Submitted 29 March, 2022; v1 submitted 16 November, 2021; originally announced November 2021.

Comments: Accepted to CVPR2022

arXiv:2111.06521 [pdf, other]

doi 10.1007/s40042-021-00352-3

Refinement for community structures of bipartite networks

Authors: Sang Hoon Lee

Abstract: Bipartite networks composed of dichotomous node sets are ubiquitous in nature and society. Partly for simplicity's sake, many studies have focused on their projection onto their unipartite versions where one only needs to care about a single type of node. When it comes to mesoscale structures such as communities, however, properly incorporating a priori structural restrictions such as bipartivity is ever more important. In this paper, as a case study, we take the community structure of bipartite networks in various scales to examine the amount of information of bipartivity encoded in the community detection procedure. In particular, we report the robustness in reliability of detected community based on consistency by comparing the detection algorithm with or without the consideration of bipartivity. From the analysis with model networks embedding prescribed communities and real networks, we find that the community detection tailored to take the bipartivity into account clearly yields more robust community structures than the one without such structural information. This demonstrates the necessity for customizing the community detection algorithm by encoding whatever information is known about networks of interest and, at the same time, raises an interesting question on the possibility of estimating the quantitative amount of information from such a customization.

Submitted 30 December, 2021; v1 submitted 11 November, 2021; originally announced November 2021.

Comments: 8 pages, 7 figures

Journal ref: J. Korean Phys. Soc. 79, 1190 (2021)

arXiv:2106.07984 [pdf, ps, other]

Learning Autonomy in Management of Wireless Random Networks

Authors: Hoon Lee, Sang Hyun Lee, Tony Q. S. Quek

Abstract: This paper presents a machine learning strategy that tackles a distributed optimization task in a wireless network with an arbitrary number of randomly interconnected nodes. Individual nodes decide their optimal states with distributed coordination among other nodes through randomly varying backhaul links. This poses a technical challenge in distributed universal optimization policy robust to a random topology of the wireless network, which has not been properly addressed by conventional deep neural networks (DNNs) with rigid structural configurations. We develop a flexible DNN formalism termed distributed message-passing neural network (DMPNN) with forward and backward computations independent of the network topology. A key enabler of this approach is an iterative message-sharing strategy through arbitrarily connected backhaul links. The DMPNN provides a convergent solution for iterative coordination by learning numerous random backhaul interactions. The DMPNN is investigated for various configurations of the power control in wireless networks, and intensive numerical results prove its universality and viability over conventional optimization and DNN approaches.

Submitted 15 June, 2021; originally announced June 2021.

Comments: to appear in IEEE TWC

arXiv:2105.06314 [pdf, other]

Explainable Machine Learning for Fraud Detection

Authors: Ismini Psychoula, Andreas Gutmann, Pradip Mainali, S. H. Lee, Paul Dunphy, Fabien A. P. Petitcolas

Abstract: The application of machine learning to support the processing of large datasets holds promise in many industries, including financial services. However, practical issues for the full adoption of machine learning remain with the focus being on understanding and being able to explain the decisions and predictions made by complex models. In this paper, we explore explainability methods in the domain of real-time fraud detection by investigating the selection of appropriate background datasets and runtime trade-offs on both supervised and unsupervised models.

Submitted 13 May, 2021; originally announced May 2021.

Comments: To be published in IEEE Computer Special Issue on Explainable AI and Machine Learning, 12 pages, 7 figures

arXiv:2104.11589 [pdf, ps, other]

doi 10.1109/CVPRW53098.2021.00457

SBNet: Segmentation-based Network for Natural Language-based Vehicle Search

Authors: Sangrok Lee, Taekang Woo, Sang Hun Lee

Abstract: Natural language-based vehicle retrieval is a task to find a target vehicle within a given image based on a natural language description as a query. This technology can be applied to various areas including police searching for a suspect vehicle. However, it is challenging due to the ambiguity of language descriptions and the difficulty of processing multi-modal data. To tackle this problem, we propose a deep neural network called SBNet that performs natural language-based segmentation for vehicle retrieval. We also propose two task-specific modules to improve performance: a substitution module that helps features from different domains to be embedded in the same space and a future prediction module that learns temporal information. SBNet has been trained using the CityFlow-NL dataset that contains 2,498 tracks of vehicles with three unique natural language descriptions each and tested on 530 unique vehicle tracks and their corresponding query sets. SBNet achieved a significant improvement over the baseline in the natural language-based vehicle tracking track in the AI City Challenge 2021.

Submitted 22 April, 2021; originally announced April 2021.

Comments: 7 pages, 4 figures, CVPR Workshop Paper

ACM Class: I.2.10; I.5.1; I.4.8

Journal ref: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 4049-4055

arXiv:2104.10401 [pdf]

doi 10.1093/jcde/qwad014

Multi-Attention-Based Soft Partition Network for Vehicle Re-Identification

Authors: Sangrok Lee, Taekang Woo, Sang Hun Lee

Abstract: Vehicle re-identification helps in distinguishing between images of the same and other vehicles. It is a challenging process because of significant intra-instance differences between identical vehicles from different views and subtle inter-instance differences between similar vehicles. To solve this issue, researchers have extracted view-aware or part-specific features via spatial attention mechanisms, which usually result in noisy attention maps or otherwise require expensive additional annotation for metadata, such as key points, to improve the quality. Meanwhile, based on the researchers' insights, various handcrafted multi-attention architectures for specific viewpoints or vehicle parts have been proposed. However, this approach does not guarantee that the number and nature of attention branches will be optimal for real-world re-identification tasks. To address these problems, we proposed a new vehicle re-identification network based on a multiple soft attention mechanism for capturing various discriminative regions from different viewpoints more efficiently. Furthermore, this model can significantly reduce the noise in spatial attention maps by devising a new method for creating an attention map for insignificant regions and then excluding it from generating the final result. We also combined a channel-wise attention mechanism with a spatial attention mechanism for the efficient selection of important semantic attributes for vehicle re-identification. Our experiments showed that our proposed model achieved a state-of-the-art performance among the attention-based methods without metadata and was comparable to the approaches using metadata for the VehicleID and VERI-Wild datasets.

Submitted 2 August, 2023; v1 submitted 21 April, 2021; originally announced April 2021.

Comments: 15 pages, 6 figures

ACM Class: I.2.10; I.5.1; I.4.8

Journal ref: J. Comput. Des. Eng. 10 (2023) 488-502

arXiv:2103.15532 [pdf, other]

doi 10.1109/ICASSP39728.2021.9413417

Learning on heterogeneous graphs using high-order relations

Authors: See Hian Lee, Feng Ji, Wee Peng Tay

Abstract: A heterogeneous graph consists of different vertices and edges types. Learning on heterogeneous graphs typically employs meta-paths to deal with the heterogeneity by reducing the graph to a homogeneous network, guide random walks or capture semantics. These methods are however sensitive to the choice of meta-paths, with suboptimal paths leading to poor performance. In this paper, we propose an approach for learning on heterogeneous graphs without using meta-paths. Specifically, we decompose a heterogeneous graph into different homogeneous relation-type graphs, which are then combined to create higher-order relation-type representations. These representations preserve the heterogeneity of edges and retain their edge directions while capturing the interaction of different vertex types multiple hops apart. This is then complemented with attention mechanisms to distinguish the importance of the relation-type based neighbors and the relation-types themselves. Experiments demonstrate that our model generally outperforms other state-of-the-art baselines in the vertex classification task on three commonly studied heterogeneous graph datasets.

Submitted 3 March, 2023; v1 submitted 29 March, 2021; originally announced March 2021.

arXiv:2102.10497 [pdf]

doi 10.1093/jcde/qwaa052

User interface for in-vehicle systems with on-wheel finger spreading gestures and head-up displays

Authors: Sang Hun Lee, Se-One Yoon

Abstract: Interacting with an in-vehicle system through a central console is known to induce visual and biomechanical distractions, thereby delaying the danger recognition and response times of the driver and significantly increasing the risk of an accident. To address this problem, various hand gestures have been developed. Although such gestures can reduce visual demand, they are limited in number, lack passive feedback, and can be vague and imprecise, difficult to understand and remember, and culture-bound. To overcome these limitations, we developed a novel on-wheel finger spreading gestural interface combined with a head-up display (HUD) allowing the user to choose a menu displayed in the HUD with a gesture. This interface displays audio and air conditioning functions on the central console of a HUD and enables their control using a specific number of fingers while keeping both hands on the steering wheel. We compared the effectiveness of the newly proposed hybrid interface against a traditional tactile interface for a central console using objective measurements and subjective evaluations regarding both the vehicle and driver behaviour. A total of 32 subjects were recruited to conduct experiments on a driving simulator equipped with the proposed interface under various scenarios. The results showed that the proposed interface was approximately 20% faster in emergency response than the traditional interface, whereas its performance in maintaining vehicle speed and lane was not significantly different from that of the traditional one.

Submitted 20 February, 2021; originally announced February 2021.

Comments: This paper was published in the Journal of Computational Design and Engineering (Oxford University Press) in December 2020 https://academic.oup.com/jcde/article/7/6/700/5859941

ACM Class: H.5.2

Journal ref: Journal of Computational Design and Engineering 7 (2020) 700-721

arXiv:2011.11724 [pdf, other]

Rotation-Only Bundle Adjustment

Authors: Seong Hun Lee, Javier Civera

Abstract: We propose a novel method for estimating the global rotations of the cameras independently of their positions and the scene structure. When two calibrated cameras observe five or more of the same points, their relative rotation can be recovered independently of the translation. We extend this idea to multiple views, thereby decoupling the rotation estimation from the translation and structure estimation. Our approach provides several benefits such as complete immunity to inaccurate translations and structure, and the accuracy improvement when used with rotation averaging. We perform extensive evaluations on both synthetic and real datasets, demonstrating consistent and significant gains in accuracy when used with the state-of-the-art rotation averaging method.

Submitted 27 March, 2021; v1 submitted 23 November, 2020; originally announced November 2020.

Comments: Accepted to CVPR 2021

arXiv:2008.11047 [pdf, other]

doi 10.1103/PhysRevResearch.3.043136

Uncovering hidden dependency in weighted networks via information entropy

Authors: Mi Jin Lee, Eun Lee, Byunghwee Lee, Hawoong Jeong, Deok-Sun Lee, Sang Hoon Lee

Abstract: Interactions between elements, which are usually represented by networks, have to delineate potentially unequal relationships in terms of their relative importance or direction. The intrinsic unequal relationships of such kind, however, are opaque or hidden in numerous real systems. For instance, when a node in a network with limited interaction capacity spends its capacity to its neighboring nodes, the allocation of the total amount of interactions to them can be vastly diverse. Even if such potentially heterogeneous interactions epitomized by weighted networks are observable, as a result of the aforementioned ego-centric allocation of interactions, the relative importance or dependency between two interacting nodes can only be implicitly accessible. In this work, we precisely pinpoint such relative dependency by proposing the framework to discover hidden dependent relations extracted from weighted networks. For a given weighted network, we provide a systematic criterion to select the most essential interactions for individual nodes based on the concept of information entropy. The criterion is symbolized by assigning the effective number of neighbors or the effective out-degree to each node, and the resultant directed subnetwork decodes the hidden dependent relations by leaving only the most essential directed interactions. We apply our methodology to two time-stamped empirical network data, namely the international trade relations between nations in the world trade web (WTW) and the network of people in the historical record of Korea, Annals of the Joseon Dynasty (AJD). Based on the data analysis, we discover that the properties of mutual dependency encoded in the two systems are vastly different.
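As a rough illustration of the entropy-based idea in this abstract, the sketch below computes an effective number of neighbors for one node as the exponential of the Shannon entropy of its normalized edge weights. The abstract does not state the exact formula, so this choice, along with the function name `effective_neighbors`, is an assumption for illustration rather than the authors' definition.

```python
import math


def effective_neighbors(weights):
    """Effective number of neighbors of one node.

    Assumption (not taken from the abstract): the value is exp(H), where H is
    the Shannon entropy of the node's edge weights normalized to sum to 1.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    # Zero-weight edges contribute nothing to the entropy, so skip them.
    probs = [w / total for w in weights if w > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)


if __name__ == "__main__":
    print(effective_neighbors([1, 1, 1, 1]))  # evenly spread weights
    print(effective_neighbors([8, 1, 1]))     # one dominant neighbor
```

Under this assumed definition, evenly spread weights give a value equal to the actual number of neighbors, and the value approaches 1 as a single neighbor dominates.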

Submitted 29 November, 2021; v1 submitted 25 August, 2020; originally announced August 2020.

Comments: 20 pages, 15 figures

Journal ref: Phys. Rev. Res. 3, 043136 (2021)

arXiv:2008.01258 [pdf, other]

Robust Uncertainty-Aware Multiview Triangulation

Authors: Seong Hun Lee, Javier Civera

Abstract: We propose a robust and efficient method for multiview triangulation and uncertainty estimation. Our contribution is threefold: First, we propose an outlier rejection scheme using two-view RANSAC with the midpoint method. By prescreening the two-view samples prior to triangulation, we achieve the state-of-the-art efficiency. Second, we compare different local optimization methods for refining the initial solution and the inlier set. With an iterative update of the inlier set, we show that the optimization provides significant improvement in accuracy and robustness. Third, we model the uncertainty of a triangulated point as a function of three factors: the number of cameras, the mean reprojection error and the maximum parallax angle. Learning this model allows us to quickly interpolate the uncertainty at test time. We validate our method through an extensive evaluation.
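For readers unfamiliar with the midpoint method mentioned in this abstract, here is a minimal sketch of classic two-view midpoint triangulation: it finds the closest points on two backprojected rays and returns their midpoint. The function name `midpoint_triangulate` and the tuple-based vectors are hypothetical, and the sketch covers neither the RANSAC prescreening nor the uncertainty model described above.

```python
def _dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def midpoint_triangulate(c0, d0, c1, d1, eps=1e-12):
    """Midpoint of the shortest segment between two rays.

    c0, c1: camera centers; d0, d1: ray directions (need not be unit length).
    The rays are treated as infinite lines, so no cheirality check is done.
    """
    w = tuple(a - b for a, b in zip(c0, c1))
    a, b, c = _dot(d0, d0), _dot(d0, d1), _dot(d1, d1)
    d, e = _dot(d0, w), _dot(d1, w)
    denom = a * c - b * b
    if abs(denom) < eps:
        raise ValueError("rays are (nearly) parallel")
    # Parameters of the closest points along each ray.
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p = tuple(ci + s * di for ci, di in zip(c0, d0))
    q = tuple(ci + t * di for ci, di in zip(c1, d1))
    return tuple((pi + qi) / 2.0 for pi, qi in zip(p, q))


if __name__ == "__main__":
    # Two rays that intersect at (1, 1, 0).
    print(midpoint_triangulate((0, 0, 0), (1, 1, 0), (2, 0, 0), (-1, 1, 0)))
```

Because the closed form divides by a term that vanishes for parallel rays, the sketch raises an error in that case instead of returning an unstable point.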

Submitted 5 August, 2020; v1 submitted 3 August, 2020; originally announced August 2020.

arXiv:2008.01254 [pdf, other]

Geometric Interpretations of the Normalized Epipolar Error

Authors: Seong Hun Lee, Javier Civera

Abstract: In this work, we provide geometric interpretations of the normalized epipolar error. Most notably, we show that it is directly related to the following quantities: (1) the shortest distance between the two backprojected rays, (2) the dihedral angle between the two bounding epipolar planes, and (3) the $L_1$-optimal angular reprojection error.

Submitted 30 December, 2020; v1 submitted 3 August, 2020; originally announced August 2020.

arXiv:2004.12032 [pdf, other]

doi 10.1109/CVPRW50498.2020.00312

StRDAN: Synthetic-to-Real Domain Adaptation Network for Vehicle Re-Identification

Authors: Sangrok Lee, Eunsoo Park, Hongsuk Yi, Sang Hun Lee

Abstract: Vehicle re-identification aims to obtain the same vehicles from vehicle images. This is challenging but essential for analyzing and predicting traffic flow in the city. Although deep learning methods have achieved enormous progress for this task, their large data requirement is a critical shortcoming. Therefore, we propose a synthetic-to-real domain adaptation network (StRDAN) framework, which can be trained with inexpensive large-scale synthetic and real data to improve performance. The StRDAN training method combines domain adaptation and semi-supervised learning methods and their associated losses. StRDAN offers significant improvement over the baseline model, which can only be trained using real data, for VeRi and CityFlow-ReID datasets, achieving 3.1% and 12.9% improved mean average precision, respectively.

Submitted 17 July, 2020; v1 submitted 24 April, 2020; originally announced April 2020.

Comments: 7 pages, 2 figures, CVPR Workshop Paper (Revised)

ACM Class: I.2.10; I.5.1; I.4.8

Journal ref: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 2590-2597

arXiv:2004.00732 [pdf, other]

Robust Single Rotation Averaging

Authors: Seong Hun Lee, Javier Civera

Abstract: We propose a novel method for single rotation averaging using the Weiszfeld algorithm. Our contribution is threefold: First, we propose a robust initialization based on the elementwise median of the input rotation matrices. Our initial solution is more accurate and robust than the commonly used chordal $L_2$-mean. Second, we propose an outlier rejection scheme that can be incorporated in the Weiszfeld algorithm to improve the robustness of $L_1$ rotation averaging. Third, we propose a method for approximating the chordal $L_1$-mean using the Weiszfeld algorithm. An extensive evaluation shows that both our method and the state of the art perform equally well with the proposed outlier rejection scheme, but ours is $2-4$ times faster.

Submitted 4 November, 2020; v1 submitted 1 April, 2020; originally announced April 2020.

arXiv:2001.06875 [pdf, other]

doi 10.1007/978-3-030-28603-3_6

RGB-D Odometry and SLAM

Authors: Javier Civera, Seong Hun Lee

Abstract: The emergence of modern RGB-D sensors had a significant impact in many application fields, including robotics, augmented reality (AR) and 3D scanning. They are low-cost, low-power and low-size alternatives to traditional range sensors such as LiDAR. Moreover, unlike RGB cameras, RGB-D sensors provide the additional depth information that removes the need of frame-by-frame triangulation for 3D scene reconstruction. These merits have made them very popular in mobile robotics and AR, where it is of great interest to estimate ego-motion and 3D scene structure. Such spatial understanding can enable robots to navigate autonomously without collisions and allow users to insert virtual entities consistent with the image stream. In this chapter, we review common formulations of odometry and Simultaneous Localization and Mapping (known by its acronym SLAM) using RGB-D stream input. The two topics are closely related, as the former aims to track the incremental camera motion with respect to a local map of the scene, and the latter to jointly estimate the camera trajectory and the global map with consistency. In both cases, the standard approaches minimize a cost function using nonlinear optimization techniques. This chapter consists of three main parts: In the first part, we introduce the basic concept of odometry and SLAM and motivate the use of RGB-D sensors. We also give mathematical preliminaries relevant to most odometry and SLAM algorithms. In the second part, we detail the three main components of SLAM systems: camera pose tracking, scene mapping and loop closing. For each component, we describe different approaches proposed in the literature. In the final part, we provide a brief discussion on advanced research topics with the references to the state-of-the-art.

Submitted 19 January, 2020; originally announced January 2020.

Comments: This is the pre-submission version of the manuscript that was later edited and published as a chapter in RGB-D Image Analysis and Processing

arXiv:1910.12048 [pdf, other]

A Deep Learning Approach to Universal Binary Visible Light Communication Transceiver

Authors: Hoon Lee, Tony Q. S. Quek, Sang Hyun Lee

Abstract: This paper studies a deep learning (DL) framework for the design of binary modulated visible light communication (VLC) transceiver with universal dimming support. The dimming control for the optical binary signal boils down to a combinatorial codebook design so that the average Hamming weight of binary codewords matches with arbitrary dimming target. An unsupervised DL technique is employed for obtaining a neural network to replace the encoder-decoder pair that recovers the message from the optically transmitted signal. In such a task, a novel stochastic binarization method is developed to generate the set of binary codewords from continuous-valued neural network outputs. For universal support of arbitrary dimming target, the DL-based VLC transceiver is trained with multiple dimming constraints, which turns out to be a constrained training optimization that is very challenging to handle with existing DL methods. We develop a new training algorithm that addresses the dimming constraints through a dual formulation of the optimization. Based on the developed algorithm, the resulting VLC transceiver can be optimized via the end-to-end training procedure. Numerical results verify that the proposed codebook outperforms theoretically best constant weight codebooks under various VLC setups.

Submitted 26 October, 2019; originally announced October 2019.

Comments: to appear in IEEE Trans. Wireless Commun

arXiv:1908.11024 [pdf, other]

Metric-based Regularization and Temporal Ensemble for Multi-task Learning using Heterogeneous Unsupervised Tasks

Authors: Dae Ha Kim, Seung Hyun Lee, Byung Cheol Song

Abstract: One of the ways to improve the performance of a target task is to learn the transfer of abundant knowledge of a pre-trained network. However, learning of the pre-trained network requires high computation capability and large-scale labeled dataset. To mitigate the burden of large-scale labeling, learning in un/self-supervised manner can be a solution. In addition, using unsupervised multi-task learning, a generalized feature representation can be learned. However, unsupervised multi-task learning can be biased to a specific task. To overcome this problem, we propose the metric-based regularization term and temporal task ensemble (TTE) for multi-task learning. Since these two techniques prevent the entire network from learning in a state deviated to a specific task, it is possible to learn a generalized feature representation that appropriately reflects the characteristics of each task without biasing. Experimental results for three target tasks such as classification, object detection and embedding clustering prove that the TTE-based multi-task framework is more effective than the state-of-the-art (SOTA) method in improving the performance of a target task.

Submitted 28 August, 2019; originally announced August 2019.

Comments: 11 pages. To Appear in the IEEE International Conference on Computer Vision Workshops (ICCVW) 2019

arXiv:1907.11917 [pdf, other]

Triangulation: Why Optimize?

Authors: Seong Hun Lee, Javier Civera

Abstract: For decades, it has been widely accepted that the gold standard for two-view triangulation is to minimize the cost based on reprojection errors. In this work, we challenge this idea. We propose a novel alternative to the classic midpoint method that leads to significantly lower 2D errors and parallax errors. It provides a numerically stable closed-form solution based solely on a pair of backprojected rays. Since our solution is rotationally invariant, it can also be applied for fisheye and omnidirectional cameras. We show that for small parallax angles, our method outperforms the state-of-the-art in terms of combined 2D, 3D and parallax accuracy, while achieving comparable speed.

Submitted 23 August, 2019; v1 submitted 27 July, 2019; originally announced July 2019.

Comments: Accepted to BMVC2019 (oral presentation)

arXiv:1905.13378 [pdf, other]

Deep Learning for Distributed Optimization: Applications to Wireless Resource Management

Authors: Hoon Lee, Sang Hyun Lee, Tony Q. S. Quek

Abstract: This paper studies a deep learning (DL) framework to solve distributed non-convex constrained optimizations in wireless networks where multiple computing nodes, interconnected via backhaul links, desire to determine an efficient assignment of their states based on local observations. Two different configurations are considered: First, an infinite-capacity backhaul enables nodes to communicate in a lossless way, thereby obtaining the solution by centralized computations. Second, a practical finite-capacity backhaul leads to the deployment of distributed solvers equipped along with quantizers for communication through capacity-limited backhaul. The distributed nature and the nonconvexity of the optimizations render the identification of the solution unwieldy. To handle them, deep neural networks (DNNs) are introduced to approximate an unknown computation for the solution accurately. In consequence, the original problems are transformed to training tasks of the DNNs subject to non-convex constraints where existing DL libraries fail to extend straightforwardly. A constrained training strategy is developed based on the primal-dual method. For distributed implementation, a novel binarization technique at the output layer is developed for quantization at each node. Our proposed distributed DL framework is examined in various network configurations of wireless resource management. Numerical results verify the effectiveness of our proposed approach over existing optimization techniques.

Submitted 30 May, 2019; originally announced May 2019.

Comments: to appear in IEEE J. Sel. Areas Commun

arXiv:1904.05523 [pdf, other]

doi 10.1103/PhysRevE.100.022311

Relational flexibility of network elements based on inconsistent community detection

Authors: Heetae Kim, Sang Hoon Lee

Abstract: Community identification of network components enables us to understand the mesoscale clustering structure of networks. A number of algorithms have been developed to determine the most likely community structures in networks. Such a probabilistic or stochastic nature of this problem can naturally involve the ambiguity in resultant community structures. More specifically, stochastic algorithms can result in different community structures for each realization in principle. In this study, instead of trying to "solve" this community degeneracy problem, we turn the tables by taking the degeneracy as a chance to quantify how strong companionship each node has with other nodes. For that purpose, we define the concept of companionship inconsistency that indicates how inconsistently a node is identified as a member of a community regarding the other nodes. Analyzing model and real networks, we show that companionship inconsistency discloses unique characteristics of nodes, thus we suggest it as a new type of node centrality. In social networks, for example, companionship inconsistency can classify outsider nodes without firm community membership and promiscuous nodes with multiple connections to several communities. In infrastructure networks such as power grids, it can diagnose how the connection structure is evenly balanced in terms of power transmission. Companionship inconsistency, therefore, abstracts individual nodes' intrinsic property on its relationship to a higher-order organization of the network.

Submitted 19 August, 2019; v1 submitted 11 April, 2019; originally announced April 2019.

Comments: 10 pages, 5 figures, 2 tables

Journal ref: Phys. Rev. E 100, 022311 (2019)

arXiv:1903.09115 [pdf, other]

Closed-Form Optimal Two-View Triangulation Based on Angular Errors

Authors: Seong Hun Lee, Javier Civera

Abstract: In this paper, we study closed-form optimal solutions to two-view triangulation with known internal calibration and pose. By formulating the triangulation problem as $L_1$ and $L_\infty$ minimization of angular reprojection errors, we derive the exact closed-form solutions that guarantee global optimality under respective cost functions. To the best of our knowledge, we are the first to present such solutions. Since the angular error is rotationally invariant, our solutions can be applied for any type of central cameras, be it perspective, fisheye or omnidirectional. Our methods also require significantly less computation than the existing optimal methods. Experimental results on synthetic and real datasets validate our theoretical derivations.

Submitted 29 July, 2019; v1 submitted 21 March, 2019; originally announced March 2019.

Comments: Accepted to ICCV2019

arXiv:1812.05227 [pdf, ps, other]

Deep Learning Framework for Wireless Systems: Applications to Optical Wireless Communications

Authors: Hoon Lee, Sang Hyun Lee, Tony Q. S. Quek, Inkyu Lee

Abstract: Optical wireless communication (OWC) is a promising technology for future wireless communications owing to its potentials for cost-effective network deployment and high data rate. There are several implementation issues in the OWC which have not been encountered in radio frequency wireless communications. First, practical OWC transmitters need an illumination control on color, intensity, and luminance, etc., which poses complicated modulation design challenges. Furthermore, signal-dependent properties of optical channels raise non-trivial challenges both in modulation and demodulation of the optical signals. To tackle such difficulties, deep learning (DL) technologies can be applied for optical wireless transceiver design. This article addresses recent efforts on DL-based OWC system designs. A DL framework for emerging image sensor communication is proposed and its feasibility is verified by simulation. Finally, technical challenges and implementation issues for the DL-based optical wireless technology are discussed.

Submitted 12 December, 2018; originally announced December 2018.

Comments: To appear in IEEE Communications Magazine, Special Issue on Applications of Artificial Intelligence in Wireless Communications

arXiv:1807.10073 [pdf, other]

doi 10.1109/LRA.2018.2889156

Loosely-Coupled Semi-Direct Monocular SLAM

Authors: Seong Hun Lee, Javier Civera

Abstract: We propose a novel semi-direct approach for monocular simultaneous localization and mapping (SLAM) that combines the complementary strengths of direct and feature-based methods. The proposed pipeline loosely couples direct odometry and feature-based SLAM to perform three levels of parallel optimizations: (1) photometric bundle adjustment (BA) that jointly optimizes the local structure and motion, (2) geometric BA that refines keyframe poses and associated feature map points, and (3) pose graph optimization to achieve global map consistency in the presence of loop closures. This is achieved in real-time by limiting the feature-based operations to marginalized keyframes from the direct odometry module. Exhaustive evaluation on two benchmark datasets demonstrates that our system outperforms the state-of-the-art monocular odometry and SLAM systems in terms of overall accuracy and robustness.

Submitted 6 January, 2019; v1 submitted 26 July, 2018; originally announced July 2018.

Comments: Accepted for publication in IEEE Robotics and Automation Letters. Watch video demo at: https://youtu.be/j7WnU7ZpZ8c

Showing 1–50 of 67 results for author: Lee, S H