-
Attention-based sequential recommendation system using multimodal data
Authors:
Hyungtaik Oh,
Wonkeun Jo,
Dongil Kim
Abstract:
Sequential recommendation systems that model dynamic preferences based on a user's past behavior are crucial to e-commerce. Recent studies on these systems have considered various types of information such as images and texts. However, multimodal data have not yet been utilized directly to recommend products to users. In this study, we propose an attention-based sequential recommendation method that employs multimodal data of items such as images, texts, and categories. First, we extract image and text features using pre-trained VGG and BERT models and convert categories into multi-label form. Subsequently, attention operations are performed independently on the item sequence and on each multimodal representation. Finally, the individual attention outputs are integrated through an attention fusion function. In addition, we apply a multitask learning loss for each modality to improve generalization performance. Experimental results on the Amazon datasets show that the proposed method outperforms conventional sequential recommendation systems.
Submitted 28 May, 2024;
originally announced May 2024.
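The abstract states that the per-branch attention outputs are combined by an "attention fusion function" but does not define it. The sketch below assumes one common choice and is not the authors' exact function: each branch vector is scored by a scaled dot product with a query vector (a hypothetical stand-in for a learned parameter), the scores are softmax-normalized, and the branches are summed with those weights. All branch values are made up for illustration.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fusion(branches: List[Vector], query: Vector) -> Vector:
    """Fuse equal-length branch outputs into one vector.

    `query` stands in for a learned parameter (hypothetical); each branch is
    scored by a scaled dot product with it, and the softmax-normalized scores
    weight a sum of the branch vectors.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * x for q, x in zip(query, vec)) / scale for vec in branches]
    weights = softmax(scores)
    fused = [0.0] * len(query)
    for w, vec in zip(weights, branches):
        for i, x in enumerate(vec):
            fused[i] += w * x
    return fused


# Hypothetical 4-d outputs of the item-sequence, image, text, and category branches.
branches = [
    [0.2, 0.1, 0.0, 0.4],
    [0.5, 0.3, 0.1, 0.0],
    [0.1, 0.4, 0.2, 0.3],
    [0.0, 0.0, 1.0, 0.0],
]
print(attention_fusion(branches, query=[1.0, 0.5, 0.0, 0.5]))
```

Because the weights form a convex combination, every coordinate of the fused vector lies between the smallest and largest value of that coordinate across branches.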
-
NFCL: Simply interpretable neural networks for a short-term multivariate forecasting
Authors:
Wonkeun Jo,
Dongil Kim
Abstract:
Multivariate time-series forecasting (MTSF) stands as a compelling field within the machine learning community. Diverse neural network-based methodologies deployed in MTSF applications have demonstrated commendable efficacy. Despite these advances in model performance, the rationale behind the models' behavior remains difficult to comprehend. Our proposed model, the Neural ForeCasting Layer (NFCL), employs a straightforward combination of neural networks. This simple integration ensures that each neural network processes its own input and contributes its prediction independently, without interference from other inputs. Consequently, our model enables a transparent explanation of forecast results. This paper introduces NFCL along with its diverse extensions. Empirical findings show NFCL's superior performance compared to nine benchmark models across 15 open datasets. Notably, NFCL not only surpasses competitors but also explains its predictions. In addition, rigorous experimentation involving diverse model structures supports the justification of NFCL's unique configuration.
Submitted 22 May, 2024;
originally announced May 2024.
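The independence property described in the NFCL abstract can be illustrated with a minimal additive sketch. The assumptions here are ours, not the paper's: each input variable gets its own small one-hidden-layer network (the hypothetical `TinyMLP`, randomly initialized and untrained) that maps only that variable's lookback window to a multi-step forecast, and the final forecast is the sum of the per-variable outputs. Because no network sees another variable, each output is a directly readable contribution. The variable names and window sizes are illustrative.

```python
import random
from typing import Dict, List, Tuple


class TinyMLP:
    """One-hidden-layer ReLU network: lookback window -> forecast horizon."""

    def __init__(self, lookback: int, hidden: int, horizon: int, seed: int) -> None:
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(lookback)] for _ in range(hidden)]
        self.w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(horizon)]

    def __call__(self, window: List[float]) -> List[float]:
        hidden = [max(0.0, sum(w * x for w, x in zip(row, window))) for row in self.w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]


def additive_forecast(
    nets: Dict[str, TinyMLP], windows: Dict[str, List[float]]
) -> Tuple[List[float], Dict[str, List[float]]]:
    # Each network sees only its own variable's window, so its output is an
    # isolated contribution that can be read off directly as an explanation.
    contributions = {name: net(windows[name]) for name, net in nets.items()}
    horizon = len(next(iter(contributions.values())))
    forecast = [sum(c[t] for c in contributions.values()) for t in range(horizon)]
    return forecast, contributions


# Hypothetical variables: 8-step lookback, 3-step horizon.
nets = {"temperature": TinyMLP(8, 16, 3, seed=0), "load": TinyMLP(8, 16, 3, seed=1)}
windows = {"temperature": [float(i) for i in range(8)], "load": [1.0] * 8}
forecast, parts = additive_forecast(nets, windows)
print(forecast)
print(parts)
```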
-
Accuracy of a Large Language Model in Distinguishing Anti- And Pro-vaccination Messages on Social Media: The Case of Human Papillomavirus Vaccination
Authors:
Soojong Kim,
Kwanho Kim,
Claire Wonjeong Jo
Abstract:
Objective. Vaccination has engendered a spectrum of public opinions, with social media acting as a crucial platform for health-related discussions. The emergence of artificial intelligence technologies, such as large language models (LLMs), offers a novel opportunity to efficiently investigate public discourses. This research assesses the accuracy of ChatGPT, a widely used and freely available service built upon an LLM, for sentiment analysis to discern different stances toward Human Papillomavirus (HPV) vaccination. Methods. Messages related to HPV vaccination were collected from social media platforms supporting different message formats: Facebook (long format) and Twitter (short format). A selection of 1,000 human-evaluated messages was input into the LLM, which generated multiple response instances containing its classification results. Accuracy was measured for each message as the level of concurrence between human and machine decisions, ranging between 0 and 1. Results. Average accuracy was notably high when 20 response instances were used to determine the machine decision of each message: .882 (SE = .021) and .750 (SE = .029) for anti- and pro-vaccination long-form; .773 (SE = .027) and .723 (SE = .029) for anti- and pro-vaccination short-form, respectively. Using only three or even one instance did not lead to a severe decrease in accuracy. However, for long-form messages, the language model exhibited significantly lower accuracy in categorizing pro-vaccination messages than anti-vaccination ones. Conclusions. ChatGPT shows potential in analyzing public opinions on HPV vaccination using social media content. However, understanding the characteristics and limitations of a language model within specific public health contexts remains imperative.
Submitted 10 April, 2024;
originally announced April 2024.
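The abstract does not spell out how the 20 (or 3, or 1) response instances are turned into one machine decision. One plausible reading, sketched below as an assumption, is a majority vote over the first k instances, scored 1 if it matches the human label and 0 otherwise, then averaged over messages with the standard error of the mean. The messages and labels are hypothetical toy data, not the study's.

```python
import math
import statistics
from collections import Counter
from typing import Sequence, Tuple


def majority_label(instances: Sequence[str]) -> str:
    # Most frequent label; Counter.most_common orders ties by first occurrence.
    return Counter(instances).most_common(1)[0][0]


def accuracy_with_se(
    human: Sequence[str], machine_runs: Sequence[Sequence[str]], k: int
) -> Tuple[float, float]:
    """Mean agreement with human labels (and its standard error) using the first k instances."""
    scores = [
        1.0 if majority_label(runs[:k]) == label else 0.0
        for label, runs in zip(human, machine_runs)
    ]
    mean = sum(scores) / len(scores)
    se = statistics.stdev(scores) / math.sqrt(len(scores)) if len(scores) > 1 else 0.0
    return mean, se


# Hypothetical toy data: four messages, three response instances each.
human = ["anti", "pro", "pro", "anti"]
runs = [
    ["anti", "anti", "pro"],
    ["pro", "anti", "anti"],
    ["pro", "pro", "pro"],
    ["pro", "anti", "anti"],
]
print(accuracy_with_se(human, runs, k=3))
print(accuracy_with_se(human, runs, k=1))
```

With an odd k the vote cannot tie for two labels, which is one reason to prefer k = 1, 3, or 21 over even counts under this reading.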
-
A 28.6 mJ/iter Stable Diffusion Processor for Text-to-Image Generation with Patch Similarity-based Sparsity Augmentation and Text-based Mixed-Precision
Authors:
Jiwon Choi,
Wooyoung Jo,
Seongyon Hong,
Beomseok Kwon,
Wonhoon Park,
Hoi-Jun Yoo
Abstract:
This paper presents an energy-efficient stable diffusion processor for text-to-image generation. While stable diffusion has attracted attention for its high-quality image synthesis results, its inherent characteristics hinder its deployment on mobile platforms. The proposed processor achieves high throughput and energy efficiency through three key features: 1) Patch similarity-based sparsity augmentation (PSSA), which reduces the external memory access (EMA) energy of self-attention scores by 60.3%, leading to a 37.8% reduction in total EMA energy. 2) Text-based important pixel spotting (TIPS), which allows 44.8% of the FFN layer workload to be processed with low-precision activations. 3) A dual-mode bit-slice core (DBSC) architecture, which enhances energy efficiency in FFN layers by 43.0%. The proposed processor is implemented in 28 nm CMOS technology and achieves 3.84 TOPS peak throughput with 225.6 mW average power consumption. In sum, the processor achieves highly energy-efficient text-to-image generation at 28.6 mJ/iteration on the MS-COCO dataset.
Submitted 14 March, 2024; v1 submitted 7 March, 2024;
originally announced March 2024.
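As a back-of-the-envelope check on the stable diffusion processor figures, and assuming the per-iteration energy equals the average power multiplied by the time per iteration (the abstract does not state this relation explicitly), the reported numbers imply roughly 0.127 s per iteration, or about 7.9 iterations per second:

```latex
t_{\text{iter}} \approx \frac{E_{\text{iter}}}{P_{\text{avg}}}
  = \frac{28.6\ \text{mJ}}{225.6\ \text{mW}}
  \approx 0.127\ \text{s},
\qquad
\frac{1}{t_{\text{iter}}} \approx 7.9\ \text{iterations/s}.
```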
-
Make Prompts Adaptable: Bayesian Modeling for Vision-Language Prompt Learning with Data-Dependent Prior
Authors:
Youngjae Cho,
HeeSun Bae,
Seungjae Shin,
Yeo Dong Youn,
Weonyoung Joo,
Il-Chul Moon
Abstract:
Recent Vision-Language Pretrained (VLP) models have become the backbone for many downstream tasks, but they are used as frozen models without further training. Prompt learning is a method to improve a pre-trained VLP model by adding a learnable context vector to the inputs of the text encoder. In a few-shot learning scenario of the downstream task, MLE training can lead the context vector to overfit dominant image features in the training data. This overfitting can harm generalization ability, especially in the presence of a distribution shift between the training and test datasets. This paper presents a Bayesian framework for prompt learning, which could alleviate overfitting in few-shot learning applications and increase the adaptability of prompts to unseen instances. Specifically, modeling a data-dependent prior enhances the adaptability of text features for both seen and unseen image features without a performance trade-off between them. Based on the Bayesian framework, we utilize the Wasserstein Gradient Flow to estimate our target posterior distribution, which enables our prompt to flexibly capture the complex modes of image features. We demonstrate the effectiveness of our method on benchmark datasets across several experiments, showing statistically significant performance improvements over existing methods. The code is available at https://github.com/youngjae-cho/APP.
Submitted 9 January, 2024;
originally announced January 2024.
-
Overcoming Overconfidence for Active Learning
Authors:
Yujin Hwang,
Won Jo,
Juyoung Hong,
Yukyung Choi
Abstract:
It is not an exaggeration to say that the recent progress in artificial intelligence technology depends on large-scale and high-quality data. At the same time, a prevalent issue exists everywhere: the budget for data labeling is constrained. Active learning is a prominent approach for addressing this issue, where valuable data for labeling is selected through a model and utilized to iteratively adjust the model. However, due to the limited amount of data in each iteration, the model is vulnerable to bias; thus, it is more likely to yield overconfident predictions. In this paper, we present two novel methods to address the problem of overconfidence that arises in the active learning scenario. The first is an augmentation strategy named Cross-Mix-and-Mix (CMaM), which aims to calibrate the model by expanding the limited training distribution. The second is a selection strategy named Ranked Margin Sampling (RankedMS), which prevents choosing data that leads to overly confident predictions. Through various experiments and analyses, we demonstrate that our proposals facilitate efficient data selection by alleviating overconfidence, while remaining readily applicable.
Submitted 21 August, 2023;
originally announced August 2023.
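The abstract does not give the RankedMS selection rule, so the sketch below shows only the classical margin-sampling criterion that the name refers to, not the paper's method: rank unlabeled samples by the gap between their top-1 and top-2 predicted class probabilities and query the smallest gaps first. An overconfident model inflates these gaps, which is the failure mode the paper targets. The probability vectors are hypothetical.

```python
from typing import List, Sequence


def margin(probs: Sequence[float]) -> float:
    # Gap between the two most probable classes; small gaps mean the model is unsure.
    top1, top2 = sorted(probs, reverse=True)[:2]
    return top1 - top2


def margin_sampling(pool_probs: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Indices of the `budget` unlabeled samples with the smallest top-1/top-2 margin."""
    order = sorted(range(len(pool_probs)), key=lambda i: margin(pool_probs[i]))
    return order[:budget]


# Hypothetical softmax outputs for four unlabeled samples over three classes.
pool = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.60, 0.30, 0.10],
    [0.50, 0.49, 0.01],
]
print(margin_sampling(pool, budget=2))  # -> [3, 1]
```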
-
Affective Workload Allocation for Multi-human Multi-robot Teams
Authors:
Wonse Jo,
Ruiqi Wang,
Baijian Yang,
Dan Foti,
Mo Rastgaar,
Byung-Cheol Min
Abstract:
The interaction and collaboration between humans and multiple robots represent a novel field of research known as human multi-robot systems. Adequately designed systems within this field allow teams composed of both humans and robots to work together effectively on tasks such as monitoring, exploration, and search and rescue operations. This paper presents a deep reinforcement learning-based affective workload allocation controller specifically for multi-human multi-robot teams. The proposed controller can dynamically reallocate workloads based on the performance of the operators during collaborative missions with multi-robot systems. The operators' performances are evaluated through the scores of a self-reported questionnaire (i.e., subjective measurement) and the results of a deep learning-based cognitive workload prediction algorithm that uses physiological and behavioral data (i.e., objective measurement). To evaluate the effectiveness of the proposed controller, we use a multi-human multi-robot CCTV monitoring task as an example and carry out comprehensive real-world experiments with 32 human subjects for both quantitative measurement and qualitative analysis. Our results demonstrate the performance and effectiveness of the proposed controller and highlight the importance of incorporating both subjective and objective measurements of the operators' cognitive workload as well as seeking consent for workload transitions, to enhance the performance of multi-human multi-robot teams.
Submitted 18 March, 2023;
originally announced March 2023.
-
VVS: Video-to-Video Retrieval with Irrelevant Frame Suppression
Authors:
Won Jo,
Geuntaek Lim,
Gwangjin Lee,
Hyunwoo Kim,
Byungsoo Ko,
Yukyung Choi
Abstract:
In content-based video retrieval (CBVR) over large-scale collections, efficiency is as important as accuracy; thus, several video-level feature-based studies have actively been conducted. Nevertheless, owing to the severe difficulty of embedding a lengthy and untrimmed video into a single feature, these studies have been insufficient for accurate retrieval compared to frame-level feature-based studies. In this paper, we show that appropriate suppression of irrelevant frames can provide insight into the current obstacles of the video-level approaches. Furthermore, we propose a Video-to-Video Suppression network (VVS) as a solution. VVS is an end-to-end framework that consists of an easy distractor elimination stage to identify which frames to remove and a suppression weight generation stage to determine the extent to which the remaining frames are suppressed. This structure is intended to effectively describe an untrimmed video with varying content and meaningless information. Its efficacy is demonstrated through extensive experiments, and we show that our approach is not only state-of-the-art among video-level approaches but also has a fast inference time despite possessing retrieval capabilities close to those of frame-level approaches. Code is available at https://github.com/sejong-rcv/VVS
Submitted 19 December, 2023; v1 submitted 15 March, 2023;
originally announced March 2023.
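The two VVS stages described above can be illustrated with a simplified sketch. In VVS both the distractor scores and the suppression weights are produced by learned networks; here they are given as plain hypothetical inputs, and the threshold of 0.5 is an assumption. Frames whose distractor score reaches the threshold are dropped, the rest are summed with their suppression weights, and the result is L2-normalized so two videos can be compared with a dot product.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def video_descriptor(
    frames: Sequence[Sequence[float]],
    distractor_scores: Sequence[float],
    suppression_weights: Sequence[float],
    threshold: float = 0.5,
) -> Vector:
    """Drop likely distractor frames, then weight and pool the rest into one L2-normalized vector."""
    kept = [
        (f, w)
        for f, d, w in zip(frames, distractor_scores, suppression_weights)
        if d < threshold  # stage 1: hard elimination of easy distractors
    ]
    if not kept:
        raise ValueError("every frame was eliminated")
    pooled = [0.0] * len(kept[0][0])
    for f, w in kept:  # stage 2: soft suppression of the remaining frames
        for i, x in enumerate(f):
            pooled[i] += w * x
    return l2_normalize(pooled)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Both inputs are assumed to be L2-normalized already.
    return sum(x * y for x, y in zip(a, b))


# Hypothetical 2-d frame features, distractor scores, and suppression weights.
query = video_descriptor([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.1, 0.9, 0.2], [1.0, 1.0, 0.0])
print(query)  # [1.0, 0.0]
```

Pooling to a single vector per video is what keeps retrieval cheap: comparing two videos costs one dot product, regardless of their lengths.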
-
Loss-Curvature Matching for Dataset Selection and Condensation
Authors:
Seungjae Shin,
Heesun Bae,
Donghyeok Shin,
Weonyoung Joo,
Il-Chul Moon
Abstract:
Training neural networks on a large dataset requires substantial computational costs. Dataset reduction selects or synthesizes data instances based on the large dataset while minimizing the degradation in generalization performance relative to the full dataset. Existing methods utilize the neural network during the dataset reduction procedure, so the model parameters become an important factor in preserving performance after reduction. Building on this importance of parameters, this paper introduces a new reduction objective, coined LCMat, which Matches the Loss Curvatures of the original dataset and the reduced dataset over the model parameter space, rather than only at a single parameter point. This new objective induces better adaptation of the reduced dataset to the perturbed parameter region than exact point matching does. In particular, we identify the worst case of the loss curvature gap within the local parameter region, and we derive an implementable upper bound of this worst case with theoretical analyses. Our experiments on both coreset selection and condensation benchmarks show that LCMat achieves better generalization performance than existing baselines.
Submitted 8 March, 2023;
originally announced March 2023.
-
Implications of Personality on Cognitive Workload, Affect, and Task Performance in Remote Robot Control
Authors:
Go-Eum Cha,
Wonse Jo,
Byung-Cheol Min
Abstract:
This paper explores how the personality traits of robot operators can influence their task performance during remote control of robots. It is essential to explore the impact of personal dispositions on information processing, both directly and indirectly, when working with robots on specific tasks. To investigate this relationship, we utilize the open-access multi-modal dataset MOCAS to examine the robot operators' personality traits, affect, cognitive load, and task performance. Our objective is to determine whether personality traits have a total effect, including both direct and indirect effects, that could significantly impact the performance levels of operators. Specifically, we examine the relationship between personality traits such as extroversion, conscientiousness, and agreeableness, and task performance. We conduct a correlation analysis between cognitive load, self-ratings of workload and affect, and quantified individual personality traits along with their experimental scores. The findings show that personality traits do not have a total effect on task performance.
Submitted 1 August, 2023; v1 submitted 8 March, 2023;
originally announced March 2023.
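The correlation analysis mentioned in the abstract above is not specified further; a minimal Pearson correlation sketch is given here for reference, assuming paired per-participant scores. The trait and performance values are hypothetical. Note that a correlation coefficient alone does not separate direct from indirect effects; estimating a total effect through mediators such as cognitive load requires a mediation-style model on top of this.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical per-participant scores (not from MOCAS).
conscientiousness = [3.2, 4.1, 2.8, 3.9, 4.5]
task_score = [61.0, 74.0, 58.0, 70.0, 80.0]
print(round(pearson_r(conscientiousness, task_score), 3))
```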
-
Beacon-based Distributed Structure Formation in Multi-agent Systems
Authors:
Tamzidul Mina,
Wonse Jo,
Shyam S. Kannan,
Byung-Cheol Min
Abstract:
Autonomous shape and structure formation is an important problem in the domain of large-scale multi-agent systems. In this paper, we propose a 3D structure representation method and a distributed structure formation strategy where settled agents guide free-moving agents to a prescribed location to settle in the structure. Agents at the structure formation frontier looking for neighbors to settle act as beacons, generating a surface gradient throughout the formed structure propagated by settled agents. Free-moving agents follow the surface gradient along the formed structure surface to the formation frontier, where they eventually reach the closest beacon and settle to continue the structure formation following a local bidding process. Agent behavior is governed by a finite state machine implementation, along with potential field-based motion control laws. We also discuss appropriate rules for recovering from stagnation points. Simulation experiments are presented to show planar and 3D structure formations with continuous and discontinuous boundaries/surfaces, which validate the proposed strategy, followed by a scalability analysis.
Submitted 28 July, 2023; v1 submitted 1 March, 2023;
originally announced March 2023.
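One simple way to picture the beacon-generated surface gradient is a hop-count field: every settled agent stores its number of hops to the nearest beacon, and a free-moving agent steps toward the neighbor with the lowest value. The sketch below computes that field with a centralized multi-source breadth-first search, which yields the same values a distributed "take the minimum neighbor value plus one" relaxation converges to. The hop-count metric and the neighbor graph are assumptions for illustration, not the paper's exact gradient definition.

```python
from collections import deque
from typing import Deque, Dict, Hashable, Iterable, List


def propagate_gradient(
    neighbors: Dict[Hashable, List[Hashable]], beacons: Iterable[Hashable]
) -> Dict[Hashable, int]:
    """Hop distance from every reachable settled agent to its nearest beacon (multi-source BFS)."""
    dist: Dict[Hashable, int] = {}
    queue: Deque[Hashable] = deque()
    for b in beacons:
        dist[b] = 0
        queue.append(b)
    while queue:
        node = queue.popleft()
        for nb in neighbors.get(node, []):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


def next_hop(
    node: Hashable, neighbors: Dict[Hashable, List[Hashable]], dist: Dict[Hashable, int]
) -> Hashable:
    """Neighbor with the lowest gradient value, i.e. one step closer to a beacon."""
    return min(neighbors[node], key=lambda n: dist.get(n, float("inf")))


# Hypothetical settled structure: a chain a-b-c-d with a side branch c-e; d is a frontier beacon.
neighbors = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d", "e"],
    "d": ["c"],
    "e": ["c"],
}
dist = propagate_gradient(neighbors, beacons=["d"])
print(dist)  # {'d': 0, 'c': 1, 'b': 2, 'e': 2, 'a': 3}
print(next_hop("a", neighbors, dist))  # b
```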
-
DaDe: Delay-adaptive Detector for Streaming Perception
Authors:
Wonwoo Jo,
Kyungshin Lee,
Jaewon Baik,
Sangsun Lee,
Dongho Choi,
Hyunkyoo Park
Abstract:
Recognizing the surrounding environment at low latency is critical in autonomous driving. In a real-time setting, the surrounding environment has already changed by the time processing finishes. Current detection models are incapable of dealing with changes in the environment that occur while a frame is being processed. Streaming perception was proposed to assess both the latency and the accuracy of real-time video perception. However, additional problems arise in real-world applications due to limited hardware resources, high temperatures, and other factors. In this study, we develop a model that can reflect processing delays in real time and produce the most reasonable results. By incorporating the proposed feature queue and feature select module, the system gains the ability to forecast specific time steps without any additional computational cost. Our method is tested on the Argoverse-HD dataset. It achieves higher performance than the state-of-the-art methods (as of December 2022) in various delayed environments. The code is available at https://github.com/danjos95/DADE
Submitted 22 December, 2022; v1 submitted 22 December, 2022;
originally announced December 2022.
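The DaDe abstract names a feature queue and a feature select module but does not detail them. The sketch below is a hypothetical reading for illustration only, not the paper's design: recent frame features are kept in a fixed-length buffer, the measured processing delay is converted into a number of elapsed frames, and the select step returns the newest feature together with the one that many frames older, so any motion cue derived from the pair spans the same horizon as the delay. All names and the 1/30 s frame interval are assumptions.

```python
import math
from collections import deque
from typing import Any, Deque, Tuple


class FeatureQueue:
    """Fixed-length buffer of (timestamp, feature) pairs from recent frames."""

    def __init__(self, maxlen: int) -> None:
        self.buffer: Deque[Tuple[float, Any]] = deque(maxlen=maxlen)

    def push(self, timestamp: float, feature: Any) -> None:
        # The oldest entry is dropped automatically once maxlen is reached.
        self.buffer.append((timestamp, feature))


def steps_ahead(delay_s: float, frame_interval_s: float, max_steps: int) -> int:
    """Frames that elapse while one inference runs, clamped to [1, max_steps]."""
    steps = math.ceil(delay_s / frame_interval_s)
    return max(1, min(steps, max_steps))


def select_features(queue: FeatureQueue, steps: int) -> Tuple[Any, Any]:
    """Newest feature plus the one `steps` frames older (or the oldest available)."""
    items = list(queue.buffer)
    if not items:
        raise ValueError("feature queue is empty")
    gap = min(steps, len(items) - 1)
    return items[-1][1], items[-1 - gap][1]


q = FeatureQueue(maxlen=5)
for i in range(7):
    q.push(timestamp=i / 30, feature=f"feat_{i}")
k = steps_ahead(delay_s=0.07, frame_interval_s=1 / 30, max_steps=4)
print(k, select_features(q, k))
```

Selecting among already-computed features costs no extra network evaluation, which is consistent in spirit with the abstract's "no additional computational costs" claim.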
-
MOCAS: A Multimodal Dataset for Objective Cognitive Workload Assessment on Simultaneous Tasks
Authors:
Wonse Jo,
Ruiqi Wang,
Su Sun,
Revanth Krishna Senthilkumaran,
Daniel Foti,
Byung-Cheol Min
Abstract:
This paper presents MOCAS, a multimodal dataset dedicated to human cognitive workload (CWL) assessment. In contrast to existing datasets based on virtual game stimuli, the data in MOCAS was collected from realistic closed-circuit television (CCTV) monitoring tasks, increasing its applicability to real-world scenarios. To build MOCAS, two off-the-shelf wearable sensors and one webcam were utilized to collect physiological signals and behavioral features from 21 human subjects. After each task, participants reported their CWL by completing the NASA-Task Load Index (NASA-TLX) and Instantaneous Self-Assessment (ISA). Personal background (e.g., personality and prior experience) was surveyed using demographic and Big Five Factor personality questionnaires, and two domains of subjective emotion information (i.e., arousal and valence) were obtained from the Self-Assessment Manikin (SAM), which could serve as potential indicators for improving CWL recognition performance. Technical validation was conducted to demonstrate that target CWL levels were elicited during simultaneous CCTV monitoring tasks; its results support the high quality of the collected multimodal signals.
Submitted 10 June, 2024; v1 submitted 6 October, 2022;
originally announced October 2022.
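MOCAS records NASA-TLX responses after each task; the abstract does not say whether the raw or the weighted overall score is used. For reference, the sketch below computes both standard variants from the six subscale ratings (each 0 to 100): the raw TLX is the plain mean, and the weighted TLX uses tallies from the 15 pairwise subscale comparisons, which sum to 15. The subscale keys and the example ratings are hypothetical.

```python
from typing import Dict, Optional

SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def nasa_tlx(ratings: Dict[str, float], weights: Optional[Dict[str, int]] = None) -> float:
    """Overall NASA-TLX workload from six 0-100 subscale ratings.

    Without weights this is the unweighted "raw TLX" mean; with weights it is the
    weighted score, where each weight is the number of times that subscale was
    chosen in the 15 pairwise comparisons.
    """
    missing = [s for s in SUBSCALES if s not in ratings]
    if missing:
        raise ValueError(f"missing subscales: {missing}")
    if weights is None:
        return sum(ratings[s] for s in SUBSCALES) / len(SUBSCALES)
    if sum(weights.get(s, 0) for s in SUBSCALES) != 15:
        raise ValueError("weights must sum to 15 (one per pairwise comparison)")
    return sum(ratings[s] * weights.get(s, 0) for s in SUBSCALES) / 15


# Hypothetical responses from one participant after one task.
ratings = {"mental": 80, "physical": 10, "temporal": 50, "performance": 30, "effort": 70, "frustration": 20}
weights = {"mental": 5, "physical": 0, "temporal": 3, "performance": 2, "effort": 4, "frustration": 1}
print(round(nasa_tlx(ratings), 2), round(nasa_tlx(ratings, weights), 2))
```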
-
Husformer: A Multi-Modal Transformer for Multi-Modal Human State Recognition
Authors:
Ruiqi Wang,
Wonse Jo,
Dezhong Zhao,
Weizheng Wang,
Baijian Yang,
Guohua Chen,
Byung-Cheol Min
Abstract:
Human state recognition is a critical topic with pervasive and important applications in human-machine systems. Multi-modal fusion, the combination of metrics from multiple data sources, has been shown to be a sound method for improving recognition performance. However, while promising results have been reported by recent multi-modal-based models, they generally fail to leverage sophisticated fusion strategies that would model sufficient cross-modal interactions when producing the fusion representation; instead, current methods rely on lengthy and inconsistent data preprocessing and feature crafting. To address this limitation, we propose an end-to-end multi-modal transformer framework for multi-modal human state recognition called Husformer. Specifically, we propose to use cross-modal transformers, which enable one modality to reinforce itself by directly attending to latent relevance revealed in other modalities, to fuse different modalities while ensuring sufficient awareness of the cross-modal interactions introduced. Subsequently, we utilize a self-attention transformer to further prioritize contextual information in the fusion representation. Using these two attention mechanisms enables effective and adaptive adjustments to noise and interruptions in multi-modal signals during the fusion process and in relation to high-level features. Extensive experiments on two human emotion corpora (DEAP and WESAD) and two cognitive workload datasets (MOCAS and CogLoad) demonstrate that, in the recognition of human state, our Husformer outperforms both state-of-the-art multi-modal baselines and the use of a single modality by a large margin, especially when dealing with raw multi-modal signals. We also conducted an ablation study to show the benefits of each component in Husformer.
Submitted 10 April, 2023; v1 submitted 29 September, 2022;
originally announced September 2022.
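The cross-modal attention idea in the Husformer abstract, where one modality reinforces itself by attending to another, can be shown with a single-head, stripped-down sketch. Simplifying assumptions not taken from the paper: no learned query/key/value projections (identity maps), no multi-head split, no layer normalization, and a plain residual addition; both modalities must therefore share the same feature dimension. The "EEG" and "ECG" feature values are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_modal_attention(target: Matrix, source: Matrix) -> Matrix:
    """Target-modality steps (queries) attend over source-modality steps (keys/values).

    Identity projections and a plain residual connection are simplifying
    assumptions; both modalities must share the feature dimension.
    """
    d = len(target[0])
    out: Matrix = []
    for q in target:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in source]
        weights = softmax(scores)
        attended = [sum(w * v[i] for w, v in zip(weights, source)) for i in range(d)]
        # Residual: the target modality is reinforced by what it found in the source.
        out.append([qi + ai for qi, ai in zip(q, attended)])
    return out


# Hypothetical 2-d features: 3 steps of one signal attending over 2 steps of another.
eeg = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
ecg = [[2.0, 3.0], [0.1, -0.2]]
print(cross_modal_attention(eeg, ecg))
```

The output keeps the target's sequence length, so the result can be passed on to a self-attention stage over the fused representation.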
-
Neural Additive Models for Nowcasting
Authors:
Wonkeun Jo,
Dongil Kim
Abstract:
Deep neural networks (DNNs) are one of the most prominent methods in machine learning. However, as DNNs are black-box models, they lack explanatory power for their predictions. Recently, neural additive models (NAMs) have been proposed to provide this power while maintaining high prediction performance. In this paper, we propose a novel NAM approach for multivariate nowcasting (NC) problems, which are an important focus area of machine learning. For the multivariate time-series data used in NC problems, explanations should be considered for every input value of each variable at distinguishable time steps. By employing generalized additive models, the proposed NAM-NC successfully explains each input value's importance for multiple variables and time steps. Experimental results involving a toy example and two real-world datasets show that NAM-NC predicts multivariate time-series data as accurately as state-of-the-art neural networks, while also providing the explanatory importance of each input value. We also examine parameter-sharing networks using NAM-NC to decrease their complexity, and NAM-NC's hard-tied feature net extracted explanations with good performance.
Submitted 20 May, 2022;
originally announced May 2022.
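The per-value explanation in the NAM-NC abstract follows the generalized additive form: the prediction is a bias plus one shape-function output per (variable, time step) input cell, and each cell's output is its importance. The sketch below shows only that additive structure under simplifying assumptions: a single scalar target, and fixed functions in place of the small trained networks a NAM would learn. Reusing one function across a variable's time steps loosely illustrates parameter sharing; it is not the paper's hard-tied feature net.

```python
import math
from typing import Callable, Dict, List, Tuple

ShapeFn = Callable[[float], float]
Cell = Tuple[int, int]  # (variable index, time-step index)


def nam_nc_predict(
    x: List[List[float]], shape_fns: Dict[Cell, ShapeFn], bias: float = 0.0
) -> Tuple[float, Dict[Cell, float]]:
    """Additive prediction: bias + sum over cells of f_{v,t}(x[v][t])."""
    contributions: Dict[Cell, float] = {}
    for v, series in enumerate(x):
        for t, value in enumerate(series):
            contributions[(v, t)] = shape_fns[(v, t)](value)
    return bias + sum(contributions.values()), contributions


def g0(z: float) -> float:
    # Hypothetical shape function shared by both time steps of variable 0.
    return 0.5 * z


shape_fns: Dict[Cell, ShapeFn] = {
    (0, 0): g0,
    (0, 1): g0,
    (1, 0): math.tanh,  # variable 1 shares tanh across its time steps
    (1, 1): math.tanh,
}
x = [[1.0, 2.0], [0.5, -1.0]]  # 2 variables x 2 time steps
y_hat, contrib = nam_nc_predict(x, shape_fns, bias=0.1)
most_important = max(contrib, key=lambda c: abs(contrib[c]))
print(round(y_hat, 4), most_important)
```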
-
SMARTmBOT: A ROS2-based Low-cost and Open-source Mobile Robot Platform
Authors:
Wonse Jo,
Jaeeun Kim,
Ruiqi Wang,
Jeremy Pan,
Revanth Krishna Senthilkumaran,
Byung-Cheol Min
Abstract:
This paper introduces SMARTmBOT, an open-source mobile robot platform based on Robot Operating System 2 (ROS2). The characteristics of SMARTmBOT, including its low-cost, modular, customizable, and expandable design, make it an easily achievable and effective robot platform to support broad robotics research and education involving either single-robot or multi-robot systems. The total cost per robot is approximately $210, and most hardware components can be fabricated by a generic 3D printer, allowing users to build the robots or replace any broken parts conveniently. SMARTmBOT is also equipped with a rich range of sensors, making it competent for general task scenarios, such as point-to-point navigation and obstacle avoidance. We validated the mobility and function of SMARTmBOT through various robot navigation experiments and applications with tasks including go-to-goal, pure-pursuit, line following, and swarming. All source code necessary for reading sensors, streaming from an embedded camera, and controlling the robot, including robot navigation controllers, is available through an online repository at https://github.com/SMARTlab-Purdue/SMARTmBOT.
Submitted 16 March, 2022;
originally announced March 2022.
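Pure pursuit is one of the navigation tasks used to validate SMARTmBOT. The sketch below shows the textbook pure-pursuit steering law, not code from the SMARTmBOT repository, under a unicycle-model assumption: express the lookahead point in the robot frame, take the curvature of the arc through it as 2y / L_d^2 (y is the lateral offset, L_d the distance to the point), and set the angular velocity to the linear speed times that curvature. Function and parameter names are hypothetical.

```python
import math
from typing import Tuple


def pure_pursuit_omega(
    pose: Tuple[float, float, float], lookahead_pt: Tuple[float, float], v: float
) -> float:
    """Angular velocity that drives a unicycle-model robot along an arc through the lookahead point."""
    x, y, theta = pose
    dx, dy = lookahead_pt[0] - x, lookahead_pt[1] - y
    # Lateral offset of the lookahead point in the robot frame (positive = to the left).
    y_r = -math.sin(theta) * dx + math.cos(theta) * dy
    ld_sq = dx * dx + dy * dy
    if ld_sq == 0.0:
        return 0.0  # already at the lookahead point; no steering needed
    curvature = 2.0 * y_r / ld_sq
    return v * curvature


# Robot at the origin facing +x, driving at a hypothetical 0.2 m/s.
print(pure_pursuit_omega((0.0, 0.0, 0.0), (1.0, 1.0), v=0.2))
```

A point straight ahead yields zero angular velocity, and points to the left or right yield positive (counter-clockwise) or negative turns, respectively.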
-
See further upon the giants: Quantifying intellectual lineage in science
Authors:
Woo Seong Jo,
Lu Liu,
Dashun Wang
Abstract:
Newton's centuries-old wisdom of standing on the shoulders of giants raises a crucial yet underexplored question: Out of all the prior works cited by a discovery, which one is its giant? Here, we develop a novel, discipline-independent method to identify the giant for any individual paper, allowing us to systematically examine the role and characteristics of giants in science. We find that across disciplines, about 95% of papers stand on the shoulders of giants, yet the weight of scientific progress rests on relatively few shoulders. Defining a new measure of giant index, we find that, while papers with high citations are more likely to be giants, for papers with the same citations, their giant index sharply predicts a paper's future impact and prize-winning probabilities. Giants tend to originate from both small and large teams, being either highly disruptive or highly developmental. And papers that did not have a giant but later became a giant tend to be home-run papers that are highly disruptive to science. Given the crucial importance of citation-based measures in science, the developed concept of giants may offer a useful new dimension in assessing scientific impact that goes beyond sheer citation counts.
Submitted 14 March, 2022; v1 submitted 16 February, 2022;
originally announced February 2022.
-
Toward a Wearable Biosensor Ecosystem on ROS 2 for Real-time Human-Robot Interaction Systems
Authors:
Wonse Jo,
Robert Wilson,
Jaeeun Kim,
Steve McGuire,
Byung-Cheol Min
Abstract:
Wearable biosensors can enable continuous human data capture, facilitating the development of real-world Human-Robot Interaction (HRI) systems. However, a lack of standardized libraries and implementations adds extraneous complexity to HRI system designs and precludes collaboration across disciplines and institutions. Here, we introduce a novel wearable biosensor package for the Robot Operating System 2 (ROS 2). ROS 2 officially supports real-time computing and multi-robot systems, and thus provides easy-to-use and reliable streaming of data from multiple nodes. The package standardizes biosensor HRI integration, lowers the technical barrier to entry, and expands the biosensor ecosystem into the robotics field. Each biosensor node follows a generalized node and topic structure focused on ease of use. The current package capabilities, listed by biosensor, highlight this standardization. Collected example data demonstrate the full integration of each biosensor into ROS 2. We expect that standardizing this biosensor package for ROS 2 will greatly simplify use and cross-collaboration across many disciplines. The wearable biosensor package is publicly available on GitHub at https://github.com/SMARTlab-Purdue/ros2-foxy-wearable-biosensors.
Submitted 7 October, 2021;
originally announced October 2021.
-
Neural Posterior Regularization for Likelihood-Free Inference
Authors:
Dongjun Kim,
Kyungwoo Song,
Seungjae Shin,
Wanmo Kang,
Il-Chul Moon,
Weonyoung Joo
Abstract:
A simulation is useful when the phenomenon of interest is either expensive to regenerate or irreproducible in the same context. Recently, Bayesian inference on the distribution of the simulation input parameters has been performed sequentially to minimize the simulation budget required to validate a simulation against the real world. However, Bayesian inference remains challenging when the ground-truth posterior is multi-modal and the simulation output is high-dimensional. This paper introduces a regularization technique, Neural Posterior Regularization (NPR), which encourages the model to explore the input parameter space effectively. We then provide a closed-form solution of the regularized optimization that enables analysis of the regularization's effect. We empirically validate that NPR attains statistically significant gains in benchmark performance across diverse simulation tasks.
Submitted 3 November, 2022; v1 submitted 15 February, 2021;
originally announced February 2021.
-
GST: Group-Sparse Training for Accelerating Deep Reinforcement Learning
Authors:
Juhyoung Lee,
Sangyeob Kim,
Sangjin Kim,
Wooyoung Jo,
Hoi-Jun Yoo
Abstract:
Deep reinforcement learning (DRL) has shown remarkable success in sequential decision-making problems but suffers from the long training time needed to reach such performance. Many parallel and distributed DRL training approaches have been proposed to solve this problem, but they are difficult to use on resource-limited devices. To accelerate DRL on real-world edge devices, the memory bandwidth bottleneck caused by large weight transactions has to be resolved. However, previous iterative pruning not only shows a low compression ratio at the beginning of training but also makes DRL training unstable. To overcome these shortcomings, we propose a novel weight compression method for DRL training acceleration, named group-sparse training (GST). GST selectively utilizes block-circulant compression to maintain a high weight compression ratio during all iterations of DRL training and dynamically adapts the target sparsity through reward-aware pruning for stable training. Thanks to these features, GST achieves a 25 to 41.5 percentage point (%p) higher average compression ratio than the iterative pruning method without a reward drop in the MuJoCo HalfCheetah-v2 and Humanoid-v2 environments with TD3 training.
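Block-circulant compression stores each b×b block of a weight matrix as a single length-b vector, so a dense layer's storage shrinks by a factor of b. The sketch below assumes the usual convention that entry (i, j) of a circulant block is c[(i - j) mod b]; it computes the matrix-vector product directly from the compressed vectors and expands them to a dense matrix for comparison. It is a generic illustration, not GST itself: practical implementations typically use FFTs for the per-block product, and GST's selective use of the format and its reward-aware pruning are not shown. All function names are hypothetical.

```python
def circulant_matvec(c, x):
    """Multiply the b x b circulant matrix defined by vector c with vector x.

    Entry (i, j) of the block is c[(i - j) % b], so only b numbers are stored.
    """
    b = len(c)
    return [sum(c[(i - j) % b] * x[j] for j in range(b)) for i in range(b)]


def block_circulant_matvec(blocks, x, b):
    """blocks[p][q] is the length-b vector defining block (p, q) of the weight matrix."""
    y = [0.0] * (len(blocks) * b)
    for p, row_blocks in enumerate(blocks):
        for q, c in enumerate(row_blocks):
            partial = circulant_matvec(c, x[q * b:(q + 1) * b])
            for i in range(b):
                y[p * b + i] += partial[i]
    return y


def to_dense(blocks, b):
    """Expand the compressed blocks into the full (uncompressed) weight matrix."""
    rows, cols = len(blocks) * b, len(blocks[0]) * b
    dense = [[0.0] * cols for _ in range(rows)]
    for p, row_blocks in enumerate(blocks):
        for q, c in enumerate(row_blocks):
            for i in range(b):
                for j in range(b):
                    dense[p * b + i][q * b + j] = c[(i - j) % b]
    return dense


# Usage: a 2 x 4 weight matrix stored as two circulant blocks of size b = 2.
blocks = [[[1.0, 2.0], [3.0, 4.0]]]
x = [1.0, 2.0, 3.0, 4.0]
print(block_circulant_matvec(blocks, x, 2))  # [30.0, 28.0]
stored = sum(len(c) for row in blocks for c in row)
print(stored, "stored values vs", 2 * 4, "dense")  # 4 stored values vs 8 dense
```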
Submitted 24 January, 2021;
originally announced January 2021.
-
Counterfactual Fairness with Disentangled Causal Effect Variational Autoencoder
Authors:
Hyemi Kim,
Seungjae Shin,
JoonHo Jang,
Kyungwoo Song,
Weonyoung Joo,
Wanmo Kang,
Il-Chul Moon
Abstract:
The problem of fair classification can be mitigated if we develop a method to remove the embedded sensitive information from the classification features. This separation of sensitive information is developed through causal inference, which enables counterfactual generation to contrast the what-if case of the opposite sensitive attribute. Along with this causal separation, a frequent assumption in deep latent causal models defines a single latent variable to absorb the entire exogenous uncertainty of the causal graph. However, we claim that such a structure cannot distinguish 1) information caused by the intervention (i.e., the sensitive variable) from 2) information correlated with the intervention in the data. Therefore, this paper proposes the Disentangled Causal Effect Variational Autoencoder (DCEVAE) to resolve this limitation by disentangling the exogenous uncertainty into two latent variables: one 1) independent of interventions and one 2) correlated with interventions without causality. In particular, our disentangling approach preserves the latent variable correlated with interventions when generating counterfactual examples. We show that our method estimates the total effect and the counterfactual effect without a complete causal graph. By adding a fairness regularization, DCEVAE generates a counterfactually fair dataset while losing less original information. DCEVAE also generates natural counterfactual images by flipping only the sensitive information. Additionally, we theoretically show the differences in the covariance structures of DCEVAE and prior works from the perspective of latent disentanglement.
Submitted 9 December, 2020; v1 submitted 23 November, 2020;
originally announced November 2020.
-
Sequential Likelihood-Free Inference with Neural Proposal
Authors:
Dongjun Kim,
Kyungwoo Song,
YoonYeong Kim,
Yongjin Shin,
Wanmo Kang,
Il-Chul Moon,
Weonyoung Joo
Abstract:
Bayesian inference without likelihood evaluation, or likelihood-free inference, has been a key research topic in simulation studies for obtaining quantitatively validated simulation models on real-world datasets. Because the likelihood is inaccessible, previous papers train an amortized neural network to estimate the ground-truth posterior for the simulation of interest. Training the network and accumulating the dataset alternately in a sequential manner could save orders of magnitude in the total simulation budget. In the data accumulation phase, new simulation inputs are chosen within a portion of the total simulation budget and added to the collected dataset. This newly accumulated data degenerates because the set of simulation inputs is poorly mixed, and this degenerate data collection process ruins the posterior inference. This paper introduces a new sampling approach for the simulation input, called Neural Proposal (NP), that resolves this biased data collection by guaranteeing i.i.d. sampling. The experiments show the improved performance of our sampler, especially for simulations with multi-modal posteriors.
Submitted 4 November, 2022; v1 submitted 15 October, 2020;
originally announced October 2020.
-
Adaptive Workload Allocation for Multi-human Multi-robot Teams for Independent and Homogeneous Tasks
Authors:
Tamzidul Mina,
Shyam Sundar Kannan,
Wonse Jo,
Byung-Cheol Min
Abstract:
Multi-human multi-robot (MH-MR) systems can combine the potential advantages of robotic systems with those of having humans in the loop. Robotic systems contribute precise performance and long, tireless operation on repetitive tasks, while humans in the loop improve situational awareness and enhance decision-making. A system's ability to adapt the allocated workload to changing conditions and to the performance of each individual (human and robot) during the mission is vital to maintaining overall system performance. Previous work, including market-based and optimization approaches, has addressed the task/workload allocation problem with a focus on maximizing system output without regard to individual agent conditions; these approaches also lack real-time processing and have mostly focused exclusively on multi-robot systems. Given the variety of possible team combinations (autonomous robots and human-operated robots: any number of human operators operating any number of robots at a time) and the operational scale of MH-MR systems, developing a generalized workload allocation framework has been particularly challenging. In this paper, we present such a framework for independent homogeneous missions, capable of adaptively allocating the system workload in real time according to the health conditions and work performance of human-operated and autonomous robots. The framework consists of removable modular function blocks, ensuring its applicability to different MH-MR scenarios. A new workload transition function block ensures a smooth transition so that workload changes do not adversely affect individual agents. The effectiveness and scalability of the system's workload adaptability are validated by experiments applying the proposed framework to an MH-MR patrolling scenario with changing human and robot conditions and failing robots.
Submitted 27 July, 2020;
originally announced July 2020.
-
Evaluation of Sampling Methods for Robotic Sediment Sampling Systems
Authors:
Jun Han Bae,
Wonse Jo,
Jee Hwan Park,
Richard M. Voyles,
Sara K. McMillan,
Byung-Cheol Min
Abstract:
Analysis of sediments from rivers, lakes, reservoirs, wetlands and other constructed surface water impoundments is an important tool to characterize the function and health of these systems, but is generally carried out manually. This is costly and can be hazardous and difficult for humans due to inaccessibility, contamination, or availability of required equipment. Robotic sampling systems can ease these burdens, but little work has examined the efficiency of such sampling means and no prior work has investigated the quality of the resulting samples. This paper presents an experimental study that evaluates and optimizes sediment sampling patterns applied to a robot sediment sampling system that allows collection of minimally-disturbed sediment cores from natural and man-made water bodies for various sediment types. To meet this need, we developed and tested a robotic sampling platform in the laboratory to test functionality under a range of sediment types and operating conditions. Specifically, we focused on three patterns by which a cylindrical coring device was driven into the sediment (linear, helical, and zig-zag) for three sediment types (coarse sand, medium sand, and silt). The results show that the optimal sampling pattern varies depending on the type of sediment and can be optimized based on the sampling objective. We examined two sampling objectives: maximizing the mass of minimally disturbed sediment and minimizing the power per mass of sample. This study provides valuable data to aid in the selection of optimal sediment coring methods for various applications and builds a solid foundation for future field testing under a range of environmental conditions.
Submitted 23 June, 2020;
originally announced June 2020.
-
ROSbag-based Multimodal Affective Dataset for Emotional and Cognitive States
Authors:
Wonse Jo,
Shyam Sundar Kannan,
Go-Eum Cha,
Ahreum Lee,
Byung-Cheol Min
Abstract:
This paper introduces a new ROSbag-based multimodal affective dataset for emotional and cognitive states, generated using the Robot Operating System (ROS). We used images and sounds from the International Affective Picture System (IAPS) and the International Affective Digitized Sounds (IADS) to stimulate targeted emotions (happiness, sadness, anger, fear, surprise, disgust, and neutral), and a dual N-back game to stimulate different levels of cognitive workload. Thirty human subjects participated in the user study; their physiological data were collected using the latest commercial wearable sensors, behavioral data were collected using hardware devices such as cameras, and subjective assessments were carried out through questionnaires. All data were stored in single ROSbag files rather than in conventional comma-separated values (CSV) files. This not only ensures synchronization of signals and videos in a data set, but also allows researchers to easily analyze and verify their algorithms by connecting directly to this dataset through ROS. The generated affective dataset consists of 1,602 ROSbag files and is about 787 GB in size. The dataset is publicly available. We expect that our dataset can be a great resource for many researchers in the fields of affective computing, HCI, and HRI.
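The dual N-back task mentioned above follows a simple rule: two stimulus streams (commonly a visual position and an auditory letter) are presented in parallel, and a trial is a target in a stream when its stimulus matches the one presented n trials earlier; raising n raises the workload. The sketch below assumes that standard rule, and its function names and example streams are hypothetical; it is not the game used to build the dataset.

```python
def n_back_targets(stream, n):
    """True at trial t when stream[t] matches the stimulus shown n trials earlier."""
    return [t >= n and stream[t] == stream[t - n] for t in range(len(stream))]


def dual_n_back_targets(positions, letters, n):
    """Target flags for the visual and auditory streams of one dual N-back block."""
    if len(positions) != len(letters):
        raise ValueError("both stimulus streams must have the same length")
    return n_back_targets(positions, n), n_back_targets(letters, n)


def accuracy(responses, targets):
    """Fraction of trials where the participant's yes/no response matched the target flag."""
    return sum(r == t for r, t in zip(responses, targets)) / len(targets)


# Usage: a 2-back block with grid positions and spoken letters.
visual, audio = dual_n_back_targets([1, 4, 1, 4, 2], ["A", "B", "C", "B", "C"], n=2)
print(visual)  # [False, False, True, True, False]
print(audio)   # [False, False, False, True, True]
```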
Submitted 20 October, 2020; v1 submitted 9 June, 2020;
originally announced June 2020.
-
Investigating the Effect of Deictic Movements of a Multi-robot
Authors:
Ahreum Lee,
Wonse Jo,
Shyam Sundar Kannan,
Byung-Cheol Min
Abstract:
Multi-robot systems are made up of a team of multiple robots, which provides the advantage of performing complex tasks with high efficiency, flexibility, and robustness. Although research on human-robot interaction is ongoing as robots become more readily available and easier to use, the study of interactions between a human and multiple robots represents a relatively new field of research. In particular, how multi-robot systems could be used by everyday users has not been extensively explored. Additionally, the impact of the characteristics of multiple robots on human perception and cognition in human multi-robot interaction needs further exploration. In this paper, we focus specifically on the benefits of physical affordances generated by the movements of multiple robots, and investigate the effects of their deictic movements on information retrieval by conducting a delayed free recall task.
Submitted 6 June, 2020;
originally announced June 2020.
-
A ROS-based Framework for Monitoring Human and Robot Conditions in a Human-Multi-robot Team
Authors:
Wonse Jo,
Shyam Sundar Kannan,
Go-Eum Cha,
Ahreum Lee,
Byung-Cheol Min
Abstract:
This paper presents a framework for monitoring human and robot conditions in human multi-robot interactions. The proposed framework consists of four modules: 1) a human and robot conditions monitoring interface, 2) a synchronization time filter, 3) a data feature extraction interface, and 4) a condition monitoring interface. The framework is based on the Robot Operating System (ROS), and it supports physiological and behavioral sensors and devices, robot systems, and custom programs. Furthermore, it allows the monitored conditions to be synchronized and shared simultaneously. To validate the proposed framework, we present experimental results and analysis obtained from a user study with 30 human subjects and from simulated robot experiments.
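The synchronization time filter aligns sensor streams that are published at different rates. In ROS this role is commonly filled by the message_filters package (for example, its ApproximateTimeSynchronizer); the abstract does not say how the framework implements it. The stdlib-only sketch below shows the underlying idea under that assumption: each timestamp in one sorted stream is paired with the nearest timestamp in another, and pairs further apart than a tolerance (`slop`) are discarded. The function is hypothetical, and unlike a full synchronizer it may reuse the same message of the second stream in several pairs.

```python
import bisect


def sync_by_timestamp(stamps_a, stamps_b, slop):
    """Pair each time in stamps_a with the nearest time in stamps_b.

    Both lists must be sorted ascending (seconds). Pairs whose time
    difference exceeds `slop` are dropped. Returns (index_a, index_b) pairs.
    """
    pairs = []
    for i, t in enumerate(stamps_a):
        k = bisect.bisect_left(stamps_b, t)
        # Only the neighbours just before and just after t can be nearest.
        candidates = [j for j in (k - 1, k) if 0 <= j < len(stamps_b)]
        if not candidates:
            continue
        best = min(candidates, key=lambda idx: abs(stamps_b[idx] - t))
        if abs(stamps_b[best] - t) <= slop:
            pairs.append((i, best))
    return pairs


# Usage: a wearable-sensor stream matched against a camera stream.
pairs = sync_by_timestamp([0.0, 1.0, 2.0], [0.05, 0.98, 2.5], slop=0.1)
print(pairs)  # [(0, 0), (1, 1)]
```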
Submitted 6 June, 2020;
originally announced June 2020.
-
Adversarial Likelihood-Free Inference on Black-Box Generator
Authors:
Dongjun Kim,
Weonyoung Joo,
Seungjae Shin,
Kyungwoo Song,
Il-Chul Moon
Abstract:
A Generative Adversarial Network (GAN) can be viewed as an implicit estimator of a data distribution, and this perspective motivates using the adversarial concept to estimate the true input parameters of black-box generators. While previous works on likelihood-free inference introduce an implicit proposal distribution on the generator input, this paper analyzes the theoretical limitations of the proposal distribution approach. On top of that, we introduce a new algorithm, Adversarial Likelihood-Free Inference (ALFI), to mitigate the analyzed limitations, so that ALFI is able to find the posterior distribution on the input parameters of black-box generative models. We evaluated ALFI on diverse simulation models as well as pre-trained statistical models, and found that ALFI achieves the best parameter estimation accuracy with a limited simulation budget.
Submitted 11 June, 2020; v1 submitted 13 April, 2020;
originally announced April 2020.
-
Neutralizing Gender Bias in Word Embedding with Latent Disentanglement and Counterfactual Generation
Authors:
Seungjae Shin,
Kyungwoo Song,
JoonHo Jang,
Hyemi Kim,
Weonyoung Joo,
Il-Chul Moon
Abstract:
Recent research demonstrates that word embeddings trained on human-generated corpora have strong gender biases in their embedding spaces, and these biases can produce discriminatory results in various downstream tasks. Whereas previous methods project word embeddings into a linear subspace for debiasing, we introduce a Latent Disentanglement method with a siamese auto-encoder structure and an adapted gradient reversal layer. Our structure separates the semantic latent information and the gender latent information of a given word into disjoint latent dimensions. Afterwards, we introduce a Counterfactual Generation step to convert the gender information of words, so that the original and the modified embeddings can produce a gender-neutralized word embedding after geometric alignment regularization, without loss of semantic information. In various quantitative and qualitative debiasing experiments, our method outperforms existing methods in debiasing word embeddings. In addition, our method shows the ability to preserve semantic information during debiasing by minimizing semantic information loss on extrinsic NLP downstream tasks.
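A gradient reversal layer, in its standard form, is the identity on the forward pass and multiplies the incoming gradient by -λ on the backward pass, so the encoder feeding it is pushed to learn features that an adversarial head placed after the layer cannot exploit. The abstract describes an adapted variant whose details are not given here, so the sketch below shows only the standard layer, with hand-written forward and backward methods instead of an autograd framework. The class name and the λ value are illustrative.

```python
class GradientReversal:
    """Identity on the forward pass; scales gradients by -lam on the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        # Features pass through unchanged.
        return list(x)

    def backward(self, grad_output):
        # The upstream encoder receives the negated, scaled gradient, so a descent
        # step on its parameters increases the loss of the adversarial head.
        return [-self.lam * g for g in grad_output]


# Usage: the forward pass is untouched, the backward pass flips and scales.
grl = GradientReversal(lam=0.5)
print(grl.forward([1.0, -2.0]))   # [1.0, -2.0]
print(grl.backward([2.0, -4.0]))  # [-1.0, 2.0]
```

In an autograd framework the same behaviour is usually obtained with a custom function whose backward rule returns the negated gradient.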
Submitted 3 November, 2020; v1 submitted 7 April, 2020;
originally announced April 2020.
-
Generalized Gumbel-Softmax Gradient Estimator for Generic Discrete Random Variables
Authors:
Weonyoung Joo,
Dongjun Kim,
Seungjae Shin,
Il-Chul Moon
Abstract:
Estimating the gradients of stochastic nodes in stochastic computational graphs is one of the crucial research questions in the deep generative modeling community, since it enables gradient descent optimization of neural network parameters. Stochastic gradient estimators for discrete random variables have been widely explored, for example, the Gumbel-Softmax reparameterization trick for Bernoulli and categorical distributions. Meanwhile, other discrete distributions, such as the Poisson, geometric, binomial, multinomial, and negative binomial, have not been explored. This paper proposes a generalized version of the Gumbel-Softmax estimator that can reparameterize generic discrete distributions, not restricted to the Bernoulli and the categorical. The proposed estimator utilizes the truncation of discrete random variables, the Gumbel-Softmax trick, and a special form of linear transformation. Our experiments consist of (1) synthetic examples and applications on VAEs, which show the efficacy of our methods, and (2) topic models, which demonstrate the value of the proposed estimator in practice.
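The three ingredients named above can be combined as in the following sketch for a Poisson variable: truncate the support to {0, ..., K} and renormalize, draw a relaxed one-hot vector with the Gumbel-Softmax trick at temperature τ, and apply a linear map (a dot product with the support values 0..K) to turn that vector into a continuous surrogate count. The truncation level, temperature, renormalization step, and function names are assumptions for illustration; the paper's exact construction may differ, and in practice gradients would come from an autodiff framework rather than this pure-Python version.

```python
import math
import random


def truncated_poisson_probs(rate, k_max):
    """P(X = k) for k = 0..k_max, renormalized after dropping the tail beyond k_max."""
    probs = [math.exp(-rate) * rate ** k / math.factorial(k) for k in range(k_max + 1)]
    total = sum(probs)
    return [p / total for p in probs]


def gumbel_softmax(probs, temperature, rng):
    """Relaxed one-hot sample: softmax((log p_k + g_k) / temperature), g_k ~ Gumbel(0, 1)."""
    scores = []
    for p in probs:
        u = rng.random()
        while u == 0.0:  # random() lies in [0, 1); exclude 0 so both logs are finite
            u = rng.random()
        gumbel = -math.log(-math.log(u))
        scores.append((math.log(p) + gumbel) / temperature)
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def relaxed_poisson_sample(rate, rng, k_max=20, temperature=0.5):
    """Continuous surrogate of a Poisson(rate) draw (hypothetical illustration)."""
    weights = gumbel_softmax(truncated_poisson_probs(rate, k_max), temperature, rng)
    # Linear transformation: a dot product with the support values 0..k_max maps
    # the relaxed one-hot vector to a value in [0, k_max].
    return sum(k * w for k, w in enumerate(weights))


rng = random.Random(0)
print(relaxed_poisson_sample(3.0, rng))
```

Lowering the temperature pushes the relaxed vector toward a one-hot vector, so the surrogate approaches an integer count at the cost of higher-variance gradients.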
Submitted 21 February, 2023; v1 submitted 3 March, 2020;
originally announced March 2020.
-
Extracting hierarchical backbones from bipartite networks
Authors:
Woo Seong Jo,
Jaehyuk Park,
Arthur Luhur,
Beom Jun Kim,
Yong-Yeol Ahn
Abstract:
We propose a method for extracting hierarchical backbones from a bipartite network. Our method leverages the observation that a hierarchical relationship between two nodes in a bipartite network is often manifested as an asymmetry in the conditional probability of observing connections to them from the other node set. Our method estimates both the importance and the direction of the hierarchical relationship between a pair of nodes, thereby providing a flexible way to identify the essential part of the network. Using semi-synthetic benchmarks, we show that our method outperforms existing methods at identifying planted hierarchy while offering more flexibility. Applying our method to empirical datasets, namely a bipartite network of skills and individuals and the network between gene products and Gene Ontology (GO) terms, demonstrates the possibility of automatically extracting or augmenting an ontology from data.
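A toy version of the asymmetry idea: for two nodes a and b on the same side of a bipartite network, estimate P(a | b) as the fraction of b's neighbours that are also neighbours of a. If P(a | b) is much larger than P(b | a), b's connections are largely nested inside a's, suggesting that a is the broader node. The sketch below scores pairs by this difference against a fixed threshold; the scoring rule, threshold, and example data are hypothetical, and the paper's method, which also estimates the importance of each relationship, is not reproduced here.

```python
def conditional_prob(neighbors, a, b):
    """Estimate P(connected to a | connected to b) from shared neighbours."""
    nb = neighbors[b]
    if not nb:
        return 0.0
    return len(neighbors[a] & nb) / len(nb)


def hierarchy_edges(neighbors, threshold=0.5):
    """Return (broader, narrower, asymmetry) for pairs whose asymmetry reaches threshold."""
    nodes = sorted(neighbors)
    edges = []
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            asym = conditional_prob(neighbors, a, b) - conditional_prob(neighbors, b, a)
            if asym >= threshold:
                edges.append((a, b, asym))   # b's neighbours are mostly a's
            elif -asym >= threshold:
                edges.append((b, a, -asym))  # a's neighbours are mostly b's
    return edges


# Usage: skills (keys) linked to the individuals (ids) who list them.
skills = {
    "programming": {1, 2, 3, 4},
    "python": {1, 2},
    "cooking": {5, 6},
}
print(hierarchy_edges(skills))  # [('programming', 'python', 0.5)]
```

Here every individual listing python also lists programming, but only half of the programming individuals list python, so programming is kept as the broader node.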
Submitted 18 March, 2020; v1 submitted 17 February, 2020;
originally announced February 2020.
-
Sequential Recommendation with Relation-Aware Kernelized Self-Attention
Authors:
Mingi Ji,
Weonyoung Joo,
Kyungwoo Song,
Yoon-Yeong Kim,
Il-Chul Moon
Abstract:
Recent studies identified that sequential recommendation is improved by the attention mechanism. Following this development, we propose Relation-Aware Kernelized Self-Attention (RKSA), which adopts the self-attention mechanism of the Transformer augmented with a probabilistic model. The original self-attention of the Transformer is a deterministic measure without relation-awareness. Therefore, we introduce a latent space to the self-attention, and the latent space models the recommendation context from relations as a multivariate skew-normal distribution with a kernelized covariance matrix built from co-occurrences, item characteristics, and user information. This work merges the self-attention of the Transformer and sequential recommendation by adding a probabilistic model of the recommendation task specifics. We evaluated RKSA on benchmark datasets, and RKSA shows significant improvements compared to recent baseline models. RKSA is also able to produce a latent space model that explains the reasons for its recommendations.
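For reference, the deterministic baseline that RKSA augments is scaled dot-product self-attention. The sketch below implements it in plain Python with a causal mask, so position t attends only to items at positions up to t, as is usual in sequential recommendation (an assumption here, not stated in the abstract). RKSA's latent skew-normal relation model and kernelized covariance are not shown, and the function name is hypothetical.

```python
import math


def causal_self_attention(queries, keys, values):
    """Scaled dot-product attention in which position t attends only to positions <= t."""
    d = len(queries[0])
    outputs = []
    for t, q in enumerate(queries):
        # Similarity with the current and past items only (causal mask).
        scores = [sum(qi * ki for qi, ki in zip(q, keys[s])) / math.sqrt(d)
                  for s in range(t + 1)]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(x - top) for x in scores]
        z = sum(weights)
        weights = [w / z for w in weights]
        # Output is the attention-weighted average of the value vectors.
        outputs.append([sum(w * values[s][j] for s, w in enumerate(weights))
                        for j in range(len(values[0]))])
    return outputs


# Usage: two items in a user's history, 2-dimensional embeddings.
queries = keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0]]
out = causal_self_attention(queries, keys, values)
print(out[0])  # [1.0, 2.0]
```

The first position can only attend to itself, so its output equals its own value vector; later positions mix in earlier items according to query-key similarity.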
Submitted 14 November, 2019;
originally announced November 2019.
-
Dirichlet Variational Autoencoder
Authors:
Weonyoung Joo,
Wonsung Lee,
Sungrae Park,
Il-Chul Moon
Abstract:
This paper proposes the Dirichlet Variational Autoencoder (DirVAE), which uses a Dirichlet prior for a continuous latent variable that exhibits the characteristics of categorical probabilities. To infer the parameters of DirVAE, we utilize the stochastic gradient method by approximating the Gamma distribution, which is a component of the Dirichlet distribution, with the inverse Gamma CDF approximation. Additionally, we revisit the component collapsing issue by investigating two problem sources, decoder weight collapsing and latent value collapsing, and we show that DirVAE has no component collapsing, whereas the Gaussian VAE exhibits decoder weight collapsing and the Stick-Breaking VAE shows latent value collapsing. The experimental results show that 1) DirVAE models the latent representation with the best log-likelihood compared to the baselines; and 2) DirVAE produces more interpretable latent values without the collapsing issues from which the baseline models suffer. We also show that the latent representation learned by DirVAE achieves the best classification accuracy in the semi-supervised and supervised classification tasks on MNIST, OMNIGLOT, and SVHN compared to the baseline VAEs. Finally, we demonstrate that DirVAE-augmented topic models show better performance in most cases.
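The sampling path described above can be sketched as follows, assuming the closed-form approximation F^-1(u; α, β) ≈ (u · α · Γ(α))^(1/α) / β to the inverse Gamma CDF, which is accurate mainly for small α. Because u is drawn independently of α, each approximate Gamma sample is a deterministic, differentiable function of α, and normalizing independent draws yields a point on the simplex, as for a Dirichlet sample. The function names are hypothetical, and this is an illustration rather than the DirVAE implementation.

```python
import math
import random


def approx_gamma_sample(alpha, beta, u):
    """Approximate inverse-CDF draw from Gamma(alpha, rate=beta) for uniform u in (0, 1).

    Uses F^-1(u) ~= (u * alpha * Gamma(alpha)) ** (1 / alpha) / beta, which is
    accurate mainly for small alpha.
    """
    return (u * alpha * math.gamma(alpha)) ** (1.0 / alpha) / beta


def approx_dirichlet_sample(alphas, rng, beta=1.0):
    """Normalize independent approximate Gamma draws to obtain a point on the simplex."""
    draws = []
    for a in alphas:
        u = rng.random()
        while u == 0.0:  # keep u strictly positive
            u = rng.random()
        draws.append(approx_gamma_sample(a, beta, u))
    total = sum(draws)
    return [g / total for g in draws]


rng = random.Random(0)
sample = approx_dirichlet_sample([0.3, 0.3, 0.3], rng)
print(sample)
```

With a shared rate β, the 1/β factor cancels in the normalization, so β affects only the individual Gamma draws, not the resulting simplex point.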
Submitted 9 January, 2019;
originally announced January 2019.
-
Material Mapping in Unknown Environments using Tapping Sound
Authors:
Shyam Sundar Kannan,
Wonse Jo,
Ramviyas Parasuraman,
Byung-Cheol Min
Abstract:
In this paper, we propose an autonomous exploration and tapping-mechanism-based material mapping system for a mobile robot in unknown environments. The goal of the proposed system is to integrate simultaneous localization and mapping (SLAM) modules and sound-based material classification so that a mobile robot can explore an unknown environment autonomously while identifying the various objects and materials in it. This creates a material map that localizes the various materials in the environment, which has potential applications in search and rescue scenarios. A tapping mechanism and machine-learning-based tapping audio signal processing are exploited for the robot to identify the objects and materials. We demonstrate the proposed system through experiments using a mobile robot platform equipped with a Velodyne LiDAR, a linear solenoid, and microphones in an exploration-like scenario with various materials. Experimental results demonstrate that the proposed system can create useful material maps in unknown environments.
Submitted 3 August, 2020; v1 submitted 13 December, 2018;
originally announced December 2018.