-
HyperCLOVA X Technical Report
Authors:
Kang Min Yoo,
Jaegeun Han,
Sookyo In,
Heewon Jeon,
Jisu Jeong,
Jaewook Kang,
Hyunwook Kim,
Kyung-Min Kim,
Munhyong Kim,
Sungju Kim,
Donghyun Kwak,
Hanock Kwak,
Se Jung Kwon,
Bado Lee,
Dongsoo Lee,
Gichang Lee,
Jooho Lee,
Baeseong Park,
Seongjin Shin,
Joonsang Yu,
Seolki Baek,
Sumin Byeon,
Eungsup Cho,
Dooseok Choe,
Jeesung Han
, et al. (371 additional authors not shown)
Abstract:
We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment t…
▽ More
We introduce HyperCLOVA X, a family of large language models (LLMs) tailored to the Korean language and culture, along with competitive capabilities in English, math, and coding. HyperCLOVA X was trained on a balanced mix of Korean, English, and code data, followed by instruction-tuning with high-quality human-annotated datasets while abiding by strict safety guidelines reflecting our commitment to responsible AI. The model is evaluated across various benchmarks, including comprehensive reasoning, knowledge, commonsense, factuality, coding, math, chatting, instruction-following, and harmlessness, in both Korean and English. HyperCLOVA X exhibits strong reasoning capabilities in Korean backed by a deep understanding of the language and cultural nuances. Further analysis of the inherent bilingual nature and its extension to multilingualism highlights the model's cross-lingual proficiency and strong generalization ability to untargeted languages, including machine translation between several language pairs and cross-lingual inference tasks. We believe that HyperCLOVA X can provide helpful guidance for regions or countries in developing their sovereign LLMs.
△ Less
Submitted 13 April, 2024; v1 submitted 2 April, 2024;
originally announced April 2024.
-
Neural 5G Indoor Localization with IMU Supervision
Authors:
Aleksandr Ermolov,
Shreya Kadambi,
Maximilian Arnold,
Mohammed Hirzallah,
Roohollah Amiri,
Deepak Singh Mahendar Singh,
Srinivas Yerramalli,
Daniel Dijkman,
Fatih Porikli,
Taesang Yoo,
Bence Major
Abstract:
Radio signals are well suited for user localization because they are ubiquitous, can operate in the dark and maintain privacy. Many prior works learn mappings between channel state information (CSI) and position fully-supervised. However, that approach relies on position labels which are very expensive to acquire. In this work, this requirement is relaxed by using pseudo-labels during deployment,…
▽ More
Radio signals are well suited for user localization because they are ubiquitous, can operate in the dark and maintain privacy. Many prior works learn mappings between channel state information (CSI) and position fully-supervised. However, that approach relies on position labels which are very expensive to acquire. In this work, this requirement is relaxed by using pseudo-labels during deployment, which are calculated from an inertial measurement unit (IMU). We propose practical algorithms for IMU double integration and training of the localization system. We show decimeter-level accuracy on simulated and challenging real data of 5G measurements. Our IMU-supervised method performs similarly to fully-supervised, but requires much less effort to deploy.
△ Less
Submitted 15 February, 2024;
originally announced February 2024.
-
A Deep Learning Approach to Radar-based QPE
Authors:
Ting-Shuo Yo,
Shih-Hao Su,
Jung-Lien Chu,
Chiao-Wei Chang,
Hung-Chi Kuo
Abstract:
In this study, we propose a volume-to-point framework for quantitative precipitation estimation (QPE) based on the Quantitative Precipitation Estimation and Segregation Using Multiple Sensor (QPESUMS) Mosaic Radar data set. With a data volume consisting of the time series of gridded radar reflectivities over the Taiwan area, we used machine learning algorithms to establish a statistical model for…
▽ More
In this study, we propose a volume-to-point framework for quantitative precipitation estimation (QPE) based on the Quantitative Precipitation Estimation and Segregation Using Multiple Sensor (QPESUMS) Mosaic Radar data set. With a data volume consisting of the time series of gridded radar reflectivities over the Taiwan area, we used machine learning algorithms to establish a statistical model for QPE in weather stations. The model extracts spatial and temporal features from the input data volume and then associates these features with the location-specific precipitations. In contrast to QPE methods based on the Z-R relation, we leverage the machine learning algorithms to automatically detect the evolution and movement of weather systems and associate these patterns to a location with specific topographic attributes. Specifically, we evaluated this framework with the hourly precipitation data of 45 weather stations in Taipei during 2013-2016. In comparison to the operational QPE scheme used by the Central Weather Bureau, the volume-to-point framework performed comparably well in general cases and excelled in detecting heavy-rainfall events. By using the current results as the reference benchmark, the proposed method can integrate the heterogeneous data sources and potentially improve the forecast in extreme precipitation scenarios.
△ Less
Submitted 15 February, 2024;
originally announced February 2024.
-
Transformer-Based Neural Surrogate for Link-Level Path Loss Prediction from Variable-Sized Maps
Authors:
Thomas M. Hehn,
Tribhuvanesh Orekondy,
Ori Shental,
Arash Behboodi,
Juan Bucheli,
Akash Doshi,
June Namgoong,
Taesang Yoo,
Ashwin Sampath,
Joseph B. Soriaga
Abstract:
Estimating path loss for a transmitter-receiver location is key to many use-cases including network planning and handover. Machine learning has become a popular tool to predict wireless channel properties based on map data. In this work, we present a transformer-based neural network architecture that enables predicting link-level properties from maps of various dimensions and from sparse measureme…
▽ More
Estimating path loss for a transmitter-receiver location is key to many use-cases including network planning and handover. Machine learning has become a popular tool to predict wireless channel properties based on map data. In this work, we present a transformer-based neural network architecture that enables predicting link-level properties from maps of various dimensions and from sparse measurements. The map contains information about buildings and foliage. The transformer model attends to the regions that are relevant for path loss prediction and, therefore, scales efficiently to maps of different size. Further, our approach works with continuous transmitter and receiver coordinates without relying on discretization. In experiments, we show that the proposed model is able to efficiently learn dominant path losses from sparse training data and generalizes well when tested on novel maps.
△ Less
Submitted 10 October, 2023; v1 submitted 6 October, 2023;
originally announced October 2023.
-
Neural RF SLAM for unsupervised positioning and mapping with channel state information
Authors:
Shreya Kadambi,
Arash Behboodi,
Joseph B. Soriaga,
Max Welling,
Roohollah Amiri,
Srinivas Yerramalli,
Taesang Yoo
Abstract:
We present a neural network architecture for jointly learning user locations and environment mapping up to isometry, in an unsupervised way, from channel state information (CSI) values with no location information. The model is based on an encoder-decoder architecture. The encoder network maps CSI values to the user location. The decoder network models the physics of propagation by parametrizing t…
▽ More
We present a neural network architecture for jointly learning user locations and environment mapping up to isometry, in an unsupervised way, from channel state information (CSI) values with no location information. The model is based on an encoder-decoder architecture. The encoder network maps CSI values to the user location. The decoder network models the physics of propagation by parametrizing the environment using virtual anchors. It aims at reconstructing, from the encoder output and virtual anchor location, the set of time of flights (ToFs) that are extracted from CSI using super-resolution methods. The neural network task is set prediction and is accordingly trained end-to-end. The proposed model learns an interpretable latent, i.e., user location, by just enforcing a physics-based decoder. It is shown that the proposed model achieves sub-meter accuracy on synthetic ray tracing based datasets with single anchor SISO setup while recovering the environment map up to 4cm median error in a 2D environment and 15cm in a 3D environment
△ Less
Submitted 15 March, 2022;
originally announced March 2022.
-
A Deep Reinforcement Learning Framework for Contention-Based Spectrum Sharing
Authors:
Akash Doshi,
Srinivas Yerramalli,
Lorenzo Ferrari,
Taesang Yoo,
Jeffrey G. Andrews
Abstract:
The increasing number of wireless devices operating in unlicensed spectrum motivates the development of intelligent adaptive approaches to spectrum access. We consider decentralized contention-based medium access for base stations (BSs) operating on unlicensed shared spectrum, where each BS autonomously decides whether or not to transmit on a given resource. The contention decision attempts to max…
▽ More
The increasing number of wireless devices operating in unlicensed spectrum motivates the development of intelligent adaptive approaches to spectrum access. We consider decentralized contention-based medium access for base stations (BSs) operating on unlicensed shared spectrum, where each BS autonomously decides whether or not to transmit on a given resource. The contention decision attempts to maximize not its own downlink throughput, but rather a network-wide objective. We formulate this problem as a decentralized partially observable Markov decision process with a novel reward structure that provides long term proportional fairness in terms of throughput. We then introduce a two-stage Markov decision process in each time slot that uses information from spectrum sensing and reception quality to make a medium access decision. Finally, we incorporate these features into a distributed reinforcement learning framework for contention-based spectrum access. Our formulation provides decentralized inference, online adaptability and also caters to partial observability of the environment through recurrent Q-learning. Empirically, we find its maximization of the proportional fairness metric to be competitive with a genie-aided adaptive energy detection threshold, while being robust to channel fading and small contention windows.
△ Less
Submitted 4 October, 2021;
originally announced October 2021.
-
Identification of synoptic weather types over Taiwan area with multiple classifiers
Authors:
Shih-Hao Su,
Jung-Lien Chu,
Ting-Shuo Yo,
Lee-Yaw Lin
Abstract:
In this study, a novel machine learning approach was used to classify three types of synoptic weather events in Taiwan area from 2001 to 2010. We used reanalysis data with three machine learning algorithms to recognize weather systems and evaluated their performance. Overall, the classifiers successfully identified 52-83% of weather events (hit rate), which is higher than the performance of tradit…
▽ More
In this study, a novel machine learning approach was used to classify three types of synoptic weather events in Taiwan area from 2001 to 2010. We used reanalysis data with three machine learning algorithms to recognize weather systems and evaluated their performance. Overall, the classifiers successfully identified 52-83% of weather events (hit rate), which is higher than the performance of traditional objective methods. The results showed that the machine learning approach gave low false alarm rate in general, while the support vector machine (SVM) with more principal components of reanalysis data had higher hit rate on all tested weather events. The sensitivity tests of grid data resolution indicated that the differences between the high- and low-resolution datasets are limited, which implied that the proposed method can achieve reasonable performance in weather forecasting with minimal resources. By identifying daily weather systems in historical reanalysis data, this method can be used to study long-term weather changes, to monitor climatological-scale variations, and to provide a better estimate of climate projections. Furthermore, this method can also serve as an alternative to model output statistics and potentially be used for synoptic weather forecasting.
△ Less
Submitted 21 May, 2019;
originally announced May 2019.
-
A realistic and robust model for Chinese word segmentation
Authors:
Chu-Ren Huang,
Ting-Shuo Yo,
Petr Simon,
Shu-Kai Hsieh
Abstract:
A realistic Chinese word segmentation tool must adapt to textual variations with minimal training input and yet robust enough to yield reliable segmentation result for all variants. Various lexicon-driven approaches to Chinese segmentation, e.g. [1,16], achieve high f-scores yet require massive training for any variation. Text-driven approach, e.g. [12], can be easily adapted for domain and genre…
▽ More
A realistic Chinese word segmentation tool must adapt to textual variations with minimal training input and yet robust enough to yield reliable segmentation result for all variants. Various lexicon-driven approaches to Chinese segmentation, e.g. [1,16], achieve high f-scores yet require massive training for any variation. Text-driven approach, e.g. [12], can be easily adapted for domain and genre changes yet has difficulty matching the high f-scores of the lexicon-driven approaches. In this paper, we refine and implement an innovative text-driven word boundary decision (WBD) segmentation model proposed in [15]. The WBD model treats word segmentation simply and efficiently as a binary decision on whether to realize the natural textual break between two adjacent characters as a word boundary. The WBD model allows simple and quick training data preparation converting characters as contextual vectors for learning the word boundary decision. Machine learning experiments with four different classifiers show that training with 1,000 vectors and 1 million vectors achieve comparable and reliable results. In addition, when applied to SigHAN Bakeoff 3 competition data, the WBD model produces OOV recall rates that are higher than all published results. Unlike all previous work, our OOV recall rate is comparable to our own F-score. Both experiments support the claim that the WBD model is a realistic model for Chinese word segmentation as it can be easily adapted for new variants with the robust result. In conclusion, we will discuss linguistic ramifications as well as future implications for the WBD approach.
△ Less
Submitted 21 May, 2019;
originally announced May 2019.
-
A comparison of evaluation methods in coevolution
Authors:
Ting-Shuo Yo,
Edwin de Jong
Abstract:
In this research, we compare four different evaluation methods in coevolution on the Majority Function problem. The size of the problem is selected such that evaluation against all possible test cases is feasible. Two measures are used for the comparisons, i.e., the objective fitness derived from evaluating solutions against all test cases, and the objective fitness correlation (OFC), which is def…
▽ More
In this research, we compare four different evaluation methods in coevolution on the Majority Function problem. The size of the problem is selected such that evaluation against all possible test cases is feasible. Two measures are used for the comparisons, i.e., the objective fitness derived from evaluating solutions against all test cases, and the objective fitness correlation (OFC), which is defined as the correlation coefficient between subjective and objective fitness. The results of our experiments suggest that a combination of average score and weighted informativeness may provide a more accurate evaluation in coevolution. In order to confirm this difference, a series of t-tests on the preference between each pair of the evaluation methods is performed. The resulting significance is affirmative, and the tests for two quality measures show similar preference on four evaluation methods. %This study is the first time OFC is actually computed on a real problem. Experiments on Majority Function problems with larger sizes and Parity problems are in progress, and their results will be added in the final version.
△ Less
Submitted 21 May, 2019;
originally announced May 2019.
-
Scalable Simple Linear Iterative Clustering (SSLIC) Using a Generic and Parallel Approach
Authors:
Bradley C. Lowekamp,
David T. Chen,
Ziv Yaniv,
Terry S. Yoo
Abstract:
Superpixel algorithms have proven to be a useful initial step for segmentation and subsequent processing of images, reducing computational complexity by replacing the use of expensive per-pixel primitives with a higher-level abstraction, superpixels. They have been successfully applied both in the context of traditional image analysis and deep learning based approaches. In this work, we present a…
▽ More
Superpixel algorithms have proven to be a useful initial step for segmentation and subsequent processing of images, reducing computational complexity by replacing the use of expensive per-pixel primitives with a higher-level abstraction, superpixels. They have been successfully applied both in the context of traditional image analysis and deep learning based approaches. In this work, we present a generalized implementation of the simple linear iterative clustering (SLIC) superpixel algorithm that has been generalized for n-dimensional scalar and multi-channel images. Additionally, the standard iterative implementation is replaced by a parallel, multi-threaded one. We describe the implementation details and analyze its scalability using a strong scaling formulation. Quantitative evaluation is performed using a 3D image, the Visible Human cryosection dataset, and a 2D image from the same dataset. Results show good scalability with runtime gains even when using a large number of threads that exceeds the physical number of available cores (hyperthreading).
△ Less
Submitted 23 July, 2018; v1 submitted 22 June, 2018;
originally announced June 2018.
-
Inference of Personal Attributes from Tweets Using Machine Learning
Authors:
Take Yo,
Kazutoshi Sasahara
Abstract:
Using machine learning algorithms, including deep learning, we studied the prediction of personal attributes from the text of tweets, such as gender, occupation, and age groups. We applied word2vec to construct word vectors, which were then used to vectorize tweet blocks. The resulting tweet vectors were used as inputs for training models, and the prediction accuracy of those models was examined a…
▽ More
Using machine learning algorithms, including deep learning, we studied the prediction of personal attributes from the text of tweets, such as gender, occupation, and age groups. We applied word2vec to construct word vectors, which were then used to vectorize tweet blocks. The resulting tweet vectors were used as inputs for training models, and the prediction accuracy of those models was examined as a function of the dimension of the tweet vectors and the size of the tweet blacks. The results showed that the machine learning algorithms could predict the three personal attributes of interest with 60-70% accuracy.
△ Less
Submitted 24 December, 2017; v1 submitted 28 September, 2017;
originally announced September 2017.