-
ConTEXTure: Consistent Multiview Images to Texture
Authors:
Jaehoon Ahn,
Sumin Cho,
Harim Jung,
Kibeom Hong,
Seonghoon Ban,
Moon-Ryul Jung
Abstract:
We introduce ConTEXTure, a generative network designed to create a texture map/atlas for a given 3D mesh using images from multiple viewpoints. The process begins with generating a front-view image from a text prompt, such as 'Napoleon, front view', describing the 3D mesh. Additional images from different viewpoints are derived from this front-view image and camera poses relative to it. ConTEXTure…
▽ More
We introduce ConTEXTure, a generative network designed to create a texture map/atlas for a given 3D mesh using images from multiple viewpoints. The process begins with generating a front-view image from a text prompt, such as 'Napoleon, front view', describing the 3D mesh. Additional images from different viewpoints are derived from this front-view image and camera poses relative to it. ConTEXTure builds upon the TEXTure network, which uses text prompts for six viewpoints (e.g., 'Napoleon, front view', 'Napoleon, left view', etc.). However, TEXTure often generates images for non-front viewpoints that do not accurately represent those viewpoints.To address this issue, we employ Zero123++, which generates multiple view-consistent images for the six specified viewpoints simultaneously, conditioned on the initial front-view image and the depth maps of the mesh for the six viewpoints. By utilizing these view-consistent images, ConTEXTure learns the texture atlas from all viewpoint images concurrently, unlike previous methods that do so sequentially. This approach ensures that the rendered images from various viewpoints, including back, side, bottom, and top, are free from viewpoint irregularities.
△ Less
Submitted 15 July, 2024;
originally announced July 2024.
-
ERR@HRI 2024 Challenge: Multimodal Detection of Errors and Failures in Human-Robot Interactions
Authors:
Micol Spitale,
Maria Teresa Parreira,
Maia Stiber,
Minja Axelsson,
Neval Kara,
Garima Kankariya,
Chien-Ming Huang,
Malte Jung,
Wendy Ju,
Hatice Gunes
Abstract:
Despite the recent advancements in robotics and machine learning (ML), the deployment of autonomous robots in our everyday lives is still an open challenge. This is due to multiple reasons among which are their frequent mistakes, such as interrupting people or having delayed responses, as well as their limited ability to understand human speech, i.e., failure in tasks like transcribing speech to t…
▽ More
Despite the recent advancements in robotics and machine learning (ML), the deployment of autonomous robots in our everyday lives is still an open challenge. This is due to multiple reasons among which are their frequent mistakes, such as interrupting people or having delayed responses, as well as their limited ability to understand human speech, i.e., failure in tasks like transcribing speech to text. These mistakes may disrupt interactions and negatively influence human perception of these robots. To address this problem, robots need to have the ability to detect human-robot interaction (HRI) failures. The ERR@HRI 2024 challenge tackles this by offering a benchmark multimodal dataset of robot failures during human-robot interactions (HRI), encouraging researchers to develop and benchmark multimodal machine learning models to detect these failures. We created a dataset featuring multimodal non-verbal interaction data, including facial, speech, and pose features from video clips of interactions with a robotic coach, annotated with labels indicating the presence or absence of robot mistakes, user awkwardness, and interaction ruptures, allowing for the training and evaluation of predictive models. Challenge participants have been invited to submit their multimodal ML models for detection of robot errors and to be evaluated against various performance metrics such as accuracy, precision, recall, F1 score, with and without a margin of error reflecting the time-sensitivity of these metrics. The results of this challenge will help the research field in better understanding the robot failures in human-robot interactions and designing autonomous robots that can mitigate their own errors after successfully detecting them.
△ Less
Submitted 8 July, 2024;
originally announced July 2024.
-
ConPR: Ongoing Construction Site Dataset for Place Recognition
Authors:
Dongjae Lee,
Minwoo Jung,
Ayoung Kim
Abstract:
Place recognition, an essential challenge in computer vision and robotics, involves identifying previously visited locations. Despite algorithmic progress, challenges related to appearance change persist, with existing datasets often focusing on seasonal and weather variations but overlooking terrain changes. Understanding terrain alterations becomes critical for effective place recognition, given…
▽ More
Place recognition, an essential challenge in computer vision and robotics, involves identifying previously visited locations. Despite algorithmic progress, challenges related to appearance change persist, with existing datasets often focusing on seasonal and weather variations but overlooking terrain changes. Understanding terrain alterations becomes critical for effective place recognition, given the aging infrastructure and ongoing city repairs. For real-world applicability, the comprehensive evaluation of algorithms must consider spatial dynamics. To address existing limitations, we present a novel multi-session place recognition dataset acquired from an active construction site. Our dataset captures ongoing construction progress through multiple data collections, facilitating evaluation in dynamic environments. It includes camera images, LiDAR point cloud data, and IMU data, enabling visual and LiDAR-based place recognition techniques, and supporting sensor fusion. Additionally, we provide ground truth information for range-based place recognition evaluation. Our dataset aims to advance place recognition algorithms in challenging and dynamic settings. Our dataset is available at https://github.com/dongjae0107/ConPR.
△ Less
Submitted 4 July, 2024;
originally announced July 2024.
-
Eliciting New Perspectives in RtD Studies through Annotated Portfolios: A Case Study of Robotic Artefacts
Authors:
Marius Hoggenmuller,
Wen-Ying Lee,
Luke Hespanhol,
Malte Jung,
Martin Tomitsch
Abstract:
In this paper, we investigate how to elicit new perspectives in research-through-design (RtD) studies through annotated portfolios. Situating the usage in human-robot interaction (HRI), we used two robotic artefacts as a case study: we first created our own annotated portfolio and subsequently ran online workshops during which we asked HRI experts to annotate our robotic artefacts. We report on th…
▽ More
In this paper, we investigate how to elicit new perspectives in research-through-design (RtD) studies through annotated portfolios. Situating the usage in human-robot interaction (HRI), we used two robotic artefacts as a case study: we first created our own annotated portfolio and subsequently ran online workshops during which we asked HRI experts to annotate our robotic artefacts. We report on the different aspects revealed about the value, use, and further improvements of the robotic artefacts through using the annotated portfolio technique ourselves versus using it with experts. We suggest that annotated portfolios - when performed by external experts - allow design researchers to obtain a form of creative and generative peer critique. Our paper offers methodological considerations for conducting expert annotation sessions. Further, we discuss the use of annotated portfolios to unveil designerly HRI knowledge in RtD studies.
△ Less
Submitted 16 June, 2024;
originally announced June 2024.
-
Navigating Conflicting Views: Harnessing Trust for Learning
Authors:
Jueqing Lu,
Lan Du,
Wray Buntine,
Myong Chol Jung,
Joanna Dipnall,
Belinda Gabbe
Abstract:
Resolving conflicts is essential to make the decisions of multi-view classification more reliable. Much research has been conducted on learning consistent informative representations among different views, assuming that all views are identically important and strictly aligned. However, real-world multi-view data may not always conform to these assumptions, as some views may express distinct inform…
▽ More
Resolving conflicts is essential to make the decisions of multi-view classification more reliable. Much research has been conducted on learning consistent informative representations among different views, assuming that all views are identically important and strictly aligned. However, real-world multi-view data may not always conform to these assumptions, as some views may express distinct information. To address this issue, we develop a computational trust-based discounting method to enhance the existing trustworthy framework in scenarios where conflicts between different views may arise. Its belief fusion process considers the trustworthiness of predictions made by individual views via an instance-wise probability-sensitive trust discounting mechanism. We evaluate our method on six real-world datasets, using Top-1 Accuracy, AUC-ROC for Uncertainty-Aware Prediction, Fleiss' Kappa, and a new metric called Multi-View Agreement with Ground Truth that takes into consideration the ground truth labels. The experimental results show that computational trust can effectively resolve conflicts, paving the way for more reliable multi-view classification models in real-world applications.
△ Less
Submitted 2 June, 2024;
originally announced June 2024.
-
TotalVibeSegmentator: Full Torso Segmentation for the NAKO and UK Biobank in Volumetric Interpolated Breath-hold Examination Body Images
Authors:
Robert Graf,
Paul-Sören Platzek,
Evamaria Olga Riedel,
Constanze Ramschütz,
Sophie Starck,
Hendrik Kristian Möller,
Matan Atad,
Henry Völzke,
Robin Bülow,
Carsten Oliver Schmidt,
Julia Rüdebusch,
Matthias Jung,
Marco Reisert,
Jakob Weiss,
Maximilian Löffler,
Fabian Bamberg,
Bene Wiestler,
Johannes C. Paetzold,
Daniel Rueckert,
Jan Stefan Kirschke
Abstract:
Objectives: To present a publicly available torso segmentation network for large epidemiology datasets on volumetric interpolated breath-hold examination (VIBE) images. Materials & Methods: We extracted preliminary segmentations from TotalSegmentator, spine, and body composition networks for VIBE images, then improved them iteratively and retrained a nnUNet network. Using subsets of NAKO (85 subje…
▽ More
Objectives: To present a publicly available torso segmentation network for large epidemiology datasets on volumetric interpolated breath-hold examination (VIBE) images. Materials & Methods: We extracted preliminary segmentations from TotalSegmentator, spine, and body composition networks for VIBE images, then improved them iteratively and retrained a nnUNet network. Using subsets of NAKO (85 subjects) and UK Biobank (16 subjects), we evaluated with Dice-score on a holdout set (12 subjects) and existing organ segmentation approach (1000 subjects), generating 71 semantic segmentation types for VIBE images. We provide an additional network for the vertebra segments 22 individual vertebra types. Results: We achieved an average Dice score of 0.89 +- 0.07 overall 71 segmentation labels. We scored > 0.90 Dice-score on the abdominal organs except for the pancreas with a Dice of 0.70. Conclusion: Our work offers a detailed and refined publicly available full torso segmentation on VIBE images.
△ Less
Submitted 31 May, 2024;
originally announced June 2024.
-
TotalSegmentator MRI: Sequence-Independent Segmentation of 59 Anatomical Structures in MR images
Authors:
Tugba Akinci D'Antonoli,
Lucas K. Berger,
Ashraya K. Indrakanti,
Nathan Vishwanathan,
Jakob Weiß,
Matthias Jung,
Zeynep Berkarda,
Alexander Rau,
Marco Reisert,
Thomas Küstner,
Alexandra Walter,
Elmar M. Merkle,
Martin Segeroth,
Joshy Cyriac,
Shan Yang,
Jakob Wasserthal
Abstract:
Purpose: To develop an open-source and easy-to-use segmentation model that can automatically and robustly segment most major anatomical structures in MR images independently of the MR sequence.
Materials and Methods: In this study we extended the capabilities of TotalSegmentator to MR images. 298 MR scans and 227 CT scans were used to segment 59 anatomical structures (20 organs, 18 bones, 11 mus…
▽ More
Purpose: To develop an open-source and easy-to-use segmentation model that can automatically and robustly segment most major anatomical structures in MR images independently of the MR sequence.
Materials and Methods: In this study we extended the capabilities of TotalSegmentator to MR images. 298 MR scans and 227 CT scans were used to segment 59 anatomical structures (20 organs, 18 bones, 11 muscles, 7 vessels, 3 tissue types) relevant for use cases such as organ volumetry, disease characterization, and surgical planning. The MR and CT images were randomly sampled from routine clinical studies and thus represent a real-world dataset (different ages, pathologies, scanners, body parts, sequences, contrasts, echo times, repetition times, field strengths, slice thicknesses and sites). We trained an nnU-Net segmentation algorithm on this dataset and calculated Dice similarity coefficients (Dice) to evaluate the model's performance.
Results: The model showed a Dice score of 0.824 (CI: 0.801, 0.842) on the test set, which included a wide range of clinical data with major pathologies. The model significantly outperformed two other publicly available segmentation models (Dice score, 0.824 versus 0.762; p<0.001 and 0.762 versus 0.542; p<0.001). On the CT image test set of the original TotalSegmentator paper it almost matches the performance of the original TotalSegmentator (Dice score, 0.960 versus 0.970; p<0.001).
Conclusion: Our proposed model extends the capabilities of TotalSegmentator to MR images. The annotated dataset (https://zenodo.org/doi/10.5281/zenodo.11367004) and open-source toolkit (https://www.github.com/wasserth/TotalSegmentator) are publicly available.
△ Less
Submitted 29 May, 2024;
originally announced May 2024.
-
Enhancing Near OOD Detection in Prompt Learning: Maximum Gains, Minimal Costs
Authors:
Myong Chol Jung,
He Zhao,
Joanna Dipnall,
Belinda Gabbe,
Lan Du
Abstract:
Prompt learning has shown to be an efficient and effective fine-tuning method for vision-language models like CLIP. While numerous studies have focused on the generalisation of these models in few-shot classification, their capability in near out-of-distribution (OOD) detection has been overlooked. A few recent works have highlighted the promising performance of prompt learning in far OOD detectio…
▽ More
Prompt learning has shown to be an efficient and effective fine-tuning method for vision-language models like CLIP. While numerous studies have focused on the generalisation of these models in few-shot classification, their capability in near out-of-distribution (OOD) detection has been overlooked. A few recent works have highlighted the promising performance of prompt learning in far OOD detection. However, the more challenging task of few-shot near OOD detection has not yet been addressed. In this study, we investigate the near OOD detection capabilities of prompt learning models and observe that commonly used OOD scores have limited performance in near OOD detection. To enhance the performance, we propose a fast and simple post-hoc method that complements existing logit-based scores, improving near OOD detection AUROC by up to 11.67% with minimal computational cost. Our method can be easily applied to any prompt learning model without change in architecture or re-training the models. Comprehensive empirical evaluations across 13 datasets and 8 models demonstrate the effectiveness and adaptability of our method.
△ Less
Submitted 25 May, 2024;
originally announced May 2024.
-
NeuroNet: A Novel Hybrid Self-Supervised Learning Framework for Sleep Stage Classification Using Single-Channel EEG
Authors:
Cheol-Hui Lee,
Hakseung Kim,
Hyun-jee Han,
Min-Kyung Jung,
Byung C. Yoon,
Dong-Joo Kim
Abstract:
The classification of sleep stages is a pivotal aspect of diagnosing sleep disorders and evaluating sleep quality. However, the conventional manual scoring process, conducted by clinicians, is time-consuming and prone to human bias. Recent advancements in deep learning have substantially propelled the automation of sleep stage classification. Nevertheless, challenges persist, including the need fo…
▽ More
The classification of sleep stages is a pivotal aspect of diagnosing sleep disorders and evaluating sleep quality. However, the conventional manual scoring process, conducted by clinicians, is time-consuming and prone to human bias. Recent advancements in deep learning have substantially propelled the automation of sleep stage classification. Nevertheless, challenges persist, including the need for large datasets with labels and the inherent biases in human-generated annotations. This paper introduces NeuroNet, a self-supervised learning (SSL) framework designed to effectively harness unlabeled single-channel sleep electroencephalogram (EEG) signals by integrating contrastive learning tasks and masked prediction tasks. NeuroNet demonstrates superior performance over existing SSL methodologies through extensive experimentation conducted across three polysomnography (PSG) datasets. Additionally, this study proposes a Mamba-based temporal context module to capture the relationships among diverse EEG epochs. Combining NeuroNet with the Mamba-based temporal context module has demonstrated the capability to achieve, or even surpass, the performance of the latest supervised learning methodologies, even with a limited amount of labeled data. This study is expected to establish a new benchmark in sleep stage classification, promising to guide future research and applications in the field of sleep analysis.
△ Less
Submitted 13 May, 2024; v1 submitted 10 April, 2024;
originally announced April 2024.
-
Toward Robust LiDAR based 3D Object Detection via Density-Aware Adaptive Thresholding
Authors:
Eunho Lee,
Minwoo Jung,
Ayoung Kim
Abstract:
Robust 3D object detection is a core challenge for autonomous mobile systems in field robotics. To tackle this issue, many researchers have demonstrated improvements in 3D object detection performance in datasets. However, real-world urban scenarios with unstructured and dynamic situations can still lead to numerous false positives, posing a challenge for robust 3D object detection models. This pa…
▽ More
Robust 3D object detection is a core challenge for autonomous mobile systems in field robotics. To tackle this issue, many researchers have demonstrated improvements in 3D object detection performance in datasets. However, real-world urban scenarios with unstructured and dynamic situations can still lead to numerous false positives, posing a challenge for robust 3D object detection models. This paper presents a post-processing algorithm that dynamically adjusts object detection thresholds based on the distance from the ego-vehicle. 3D object detection models usually perform well in detecting nearby objects but may exhibit suboptimal performance for distant ones. While conventional perception algorithms typically employ a single threshold in post-processing, the proposed algorithm addresses this issue by employing adaptive thresholds based on the distance from the ego-vehicle, minimizing false negatives and reducing false positives in urban scenarios. The results show performance enhancements in 3D object detection models across a range of scenarios, not only in dynamic urban road conditions but also in scenarios involving adverse weather conditions.
△ Less
Submitted 21 April, 2024;
originally announced April 2024.
-
Generalized Contrastive Learning for Multi-Modal Retrieval and Ranking
Authors:
Tianyu Zhu,
Myong Chol Jung,
Jesse Clark
Abstract:
Contrastive learning has gained widespread adoption for retrieval tasks due to its minimal requirement for manual annotations. However, popular contrastive frameworks typically learn from binary relevance, making them ineffective at incorporating direct fine-grained rankings. In this paper, we curate a large-scale dataset featuring detailed relevance scores for each query-document pair to facilita…
▽ More
Contrastive learning has gained widespread adoption for retrieval tasks due to its minimal requirement for manual annotations. However, popular contrastive frameworks typically learn from binary relevance, making them ineffective at incorporating direct fine-grained rankings. In this paper, we curate a large-scale dataset featuring detailed relevance scores for each query-document pair to facilitate future research and evaluation. Subsequently, we propose Generalized Contrastive Learning for Multi-Modal Retrieval and Ranking (GCL), which is designed to learn from fine-grained rankings beyond binary relevance scores. Our results show that GCL achieves a 94.5% increase in NDCG@10 for in-domain and 26.3 to 48.8% increases for cold-start evaluations, all relative to the CLIP baseline and involving ground truth rankings.
△ Less
Submitted 12 April, 2024;
originally announced April 2024.
-
Formation-Controlled Dimensionality Reduction
Authors:
Taeuk Jeong,
Yoon Mo Jung
Abstract:
Dimensionality reduction represents the process of generating a low dimensional representation of high dimensional data. Motivated by the formation control of mobile agents, we propose a nonlinear dynamical system for dimensionality reduction. The system consists of two parts; the control of neighbor points, addressing local structures, and the control of remote points, accounting for global struc…
▽ More
Dimensionality reduction represents the process of generating a low dimensional representation of high dimensional data. Motivated by the formation control of mobile agents, we propose a nonlinear dynamical system for dimensionality reduction. The system consists of two parts; the control of neighbor points, addressing local structures, and the control of remote points, accounting for global structures. We also include a brief mathematical observation of the model and its numerical procedure. Numerical experiments are performed on both synthetic and real datasets and comparisons with existing models demonstrate the soundness and effectiveness of the proposed model.
△ Less
Submitted 10 April, 2024;
originally announced April 2024.
-
ContrastCAD: Contrastive Learning-based Representation Learning for Computer-Aided Design Models
Authors:
Minseop Jung,
Minseong Kim,
Jibum Kim
Abstract:
The success of Transformer-based models has encouraged many researchers to learn CAD models using sequence-based approaches. However, learning CAD models is still a challenge, because they can be represented as complex shapes with long construction sequences. Furthermore, the same CAD model can be expressed using different CAD construction sequences. We propose a novel contrastive learning-based a…
▽ More
The success of Transformer-based models has encouraged many researchers to learn CAD models using sequence-based approaches. However, learning CAD models is still a challenge, because they can be represented as complex shapes with long construction sequences. Furthermore, the same CAD model can be expressed using different CAD construction sequences. We propose a novel contrastive learning-based approach, named ContrastCAD, that effectively captures semantic information within the construction sequences of the CAD model. ContrastCAD generates augmented views using dropout techniques without altering the shape of the CAD model. We also propose a new CAD data augmentation method, called a Random Replace and Extrude (RRE) method, to enhance the learning performance of the model when training an imbalanced training CAD dataset. Experimental results show that the proposed RRE augmentation method significantly enhances the learning performance of Transformer-based autoencoders, even for complex CAD models having very long construction sequences. The proposed ContrastCAD model is shown to be robust to permutation changes of construction sequences and performs better representation learning by generating representation spaces where similar CAD models are more closely clustered. Our codes are available at https://github.com/cm8908/ContrastCAD.
△ Less
Submitted 2 April, 2024;
originally announced April 2024.
-
Imaging radar and LiDAR image translation for 3-DOF extrinsic calibration
Authors:
Sangwoo Jung,
Hyesu Jang,
Minwoo Jung,
Ayoung Kim,
Myung-Hwan Jeon
Abstract:
The integration of sensor data is crucial in the field of robotics to take full advantage of the various sensors employed. One critical aspect of this integration is determining the extrinsic calibration parameters, such as the relative transformation, between each sensor. The use of data fusion between complementary sensors, such as radar and LiDAR, can provide significant benefits, particularly…
▽ More
The integration of sensor data is crucial in the field of robotics to take full advantage of the various sensors employed. One critical aspect of this integration is determining the extrinsic calibration parameters, such as the relative transformation, between each sensor. The use of data fusion between complementary sensors, such as radar and LiDAR, can provide significant benefits, particularly in harsh environments where accurate depth data is required. However, noise included in radar sensor data can make the estimation of extrinsic calibration challenging. To address this issue, we present a novel framework for the extrinsic calibration of radar and LiDAR sensors, utilizing CycleGAN as amethod of image-to-image translation. Our proposed method employs translating radar bird-eye-view images into LiDAR-style images to estimate the 3-DOF extrinsic parameters. The use of image registration techniques, as well as deskewing based on sensor odometry and B-spline interpolation, is employed to address the rolling shutter effect commonly present in spinning sensors. Our method demonstrates a notable improvement in extrinsic calibration compared to filter-based methods using the MulRan dataset.
△ Less
Submitted 27 March, 2024;
originally announced March 2024.
-
QCEDA: Using Quantum Computers for EDA
Authors:
Matthias Jung,
Sven O. Krumke,
Christof Schroth,
Elisabeth Lobe,
Wolfgang Mauerer
Abstract:
The field of Electronic Design Automation (EDA) is crucial for microelectronics, but the increasing complexity of Integrated Circuits (ICs) poses challenges for conventional EDA: Corresponding problems are often NP-hard and are therefore in general solved by heuristics, not guaranteeing optimal solutions. Quantum computers may offer better solutions due to their potential for optimization through…
▽ More
The field of Electronic Design Automation (EDA) is crucial for microelectronics, but the increasing complexity of Integrated Circuits (ICs) poses challenges for conventional EDA: Corresponding problems are often NP-hard and are therefore in general solved by heuristics, not guaranteeing optimal solutions. Quantum computers may offer better solutions due to their potential for optimization through entanglement, superposition, and interference. Most of the works in the area of EDA and quantum computers focus on how to use EDA for building quantum circuits. However, almost no research focuses on exploiting quantum computers for solving EDA problems. Therefore, this paper investigates the feasibility and potential of quantum computing for a typical EDA optimization problem broken down to the Min-$k$-Union problem. The problem is mathematically transformed into a Quadratic Unconstrained Binary Optimization (QUBO) problem, which was successfully solved on an IBM quantum computer and a D-Wave quantum annealer.
△ Less
Submitted 6 March, 2024;
originally announced March 2024.
-
LodeStar: Maritime Radar Descriptor for Semi-Direct Radar Odometry
Authors:
Hyesu Jang,
Minwoo Jung,
Myung-Hwan Jeon,
Ayoung Kim
Abstract:
Maritime radars are prevalently adopted to capture the vessel's omnidirectional data as imagery. Nevertheless, inherent challenges persist with marine radars, including limited frequency, suboptimal resolution, and indeterminate detections. Additionally, the scarcity of discernible landmarks in the vast marine expanses remains a challenge, resulting in consecutive scenes that often lack matching f…
▽ More
Maritime radars are prevalently adopted to capture the vessel's omnidirectional data as imagery. Nevertheless, inherent challenges persist with marine radars, including limited frequency, suboptimal resolution, and indeterminate detections. Additionally, the scarcity of discernible landmarks in the vast marine expanses remains a challenge, resulting in consecutive scenes that often lack matching feature points. In this context, we introduce a resilient maritime radar scan representation LodeStar, and an enhanced feature extraction technique tailored for marine radar applications. Moreover, we embark on estimating marine radar odometry utilizing a semi-direct approach. LodeStar-based approach markedly attenuates the errors in odometry estimation, and our assertion is corroborated through meticulous experimental validation.
△ Less
Submitted 5 March, 2024;
originally announced March 2024.
-
Understanding Entrainment in Human Groups: Optimising Human-Robot Collaboration from Lessons Learned during Human-Human Collaboration
Authors:
Eike Schneiders,
Christopher Fourie,
Stanley Celestin,
Julie Shah,
Malte Jung
Abstract:
Successful entrainment during collaboration positively affects trust, willingness to collaborate, and likeability towards collaborators. In this paper, we present a mixed-method study to investigate characteristics of successful entrainment leading to pair and group-based synchronisation. Drawing inspiration from industrial settings, we designed a fast-paced, short-cycle repetitive task. Using mot…
▽ More
Successful entrainment during collaboration positively affects trust, willingness to collaborate, and likeability towards collaborators. In this paper, we present a mixed-method study to investigate characteristics of successful entrainment leading to pair and group-based synchronisation. Drawing inspiration from industrial settings, we designed a fast-paced, short-cycle repetitive task. Using motion tracking, we investigated entrainment in both dyadic and triadic task completion. Furthermore, we utilise audio-video recordings and semi-structured interviews to contextualise participants' experiences. This paper contributes to the Human-Computer/Robot Interaction (HCI/HRI) literature using a human-centred approach to identify characteristics of entrainment during pair- and group-based collaboration. We present five characteristics related to successful entrainment. These are related to the occurrence of entrainment, leader-follower patterns, interpersonal communication, the importance of the point-of-assembly, and the value of acoustic feedback. Finally, we present three design considerations for future research and design on collaboration with robots.
△ Less
Submitted 23 February, 2024;
originally announced February 2024.
-
LiDAR Odometry Survey: Recent Advancements and Remaining Challenges
Authors:
Dongjae Lee,
Minwoo Jung,
Wooseong Yang,
Ayoung Kim
Abstract:
Odometry is crucial for robot navigation, particularly in situations where global positioning methods like global positioning system (GPS) are unavailable. The main goal of odometry is to predict the robot's motion and accurately determine its current location. Various sensors, such as wheel encoder, inertial measurement unit (IMU), camera, radar, and Light Detection and Ranging (LiDAR), are used…
▽ More
Odometry is crucial for robot navigation, particularly in situations where global positioning methods like global positioning system (GPS) are unavailable. The main goal of odometry is to predict the robot's motion and accurately determine its current location. Various sensors, such as wheel encoder, inertial measurement unit (IMU), camera, radar, and Light Detection and Ranging (LiDAR), are used for odometry in robotics. LiDAR, in particular, has gained attention for its ability to provide rich three-dimensional (3D) data and immunity to light variations. This survey aims to examine advancements in LiDAR odometry thoroughly. We start by exploring LiDAR technology and then scrutinize LiDAR odometry works, categorizing them based on their sensor integration approaches. These approaches include methods relying solely on LiDAR, those combining LiDAR with IMU, strategies involving multiple LiDARs, and methods fusing LiDAR with other sensor modalities. In conclusion, we address existing challenges and outline potential future directions in LiDAR odometry. Additionally, we analyze public datasets and evaluation methods for LiDAR odometry. To our knowledge, this survey is the first comprehensive exploration of LiDAR odometry.
△ Less
Submitted 29 December, 2023;
originally announced December 2023.
-
What Predicts Interpersonal Affect? Preliminary Analyses from Retrospective Evaluations
Authors:
Maria Teresa Parreira,
Michael J. Sack,
Malte Jung
Abstract:
While the field of affective computing has contributed to greatly improving the seamlessness of human-robot interactions, the focus has primarily been on the emotional processing of the self, rather than the perception of the other. To address this gap, in a user study with 30 participant dyads, we collected the users' retrospective ratings of the interpersonal perception of the other interactant,…
▽ More
While the field of affective computing has contributed to greatly improving the seamlessness of human-robot interactions, the focus has primarily been on the emotional processing of the self, rather than the perception of the other. To address this gap, in a user study with 30 participant dyads, we collected the users' retrospective ratings of the interpersonal perception of the other interactant, after a short interaction. We made use of CORAE, a novel web-based open-source tool for COntinuous Retrospective Affect Evaluation. In this work, we analyze how these interpersonal ratings correlate with different aspects of the interaction, namely personality traits, participation balance, and sentiment analysis. Notably, we discovered that conversational imbalance has a significant effect on the retrospective ratings, among other findings. By employing these analyses and methodologies, we lay the groundwork for enhanced human-robot interactions, wherein affect is understood as a highly dynamic and context-dependent outcome of interaction history.
△ Less
Submitted 15 November, 2023;
originally announced November 2023.
-
NITEC: Versatile Hand-Annotated Eye Contact Dataset for Ego-Vision Interaction
Authors:
Thorsten Hempel,
Magnus Jung,
Ahmed A. Abdelrahman,
Ayoub Al-Hamadi
Abstract:
Eye contact is a crucial non-verbal interaction modality and plays an important role in our everyday social life. While humans are very sensitive to eye contact, the capabilities of machines to capture a person's gaze are still mediocre. We tackle this challenge and present NITEC, a hand-annotated eye contact dataset for ego-vision interaction. NITEC exceeds existing datasets for ego-vision eye co…
▽ More
Eye contact is a crucial non-verbal interaction modality and plays an important role in our everyday social life. While humans are very sensitive to eye contact, the capabilities of machines to capture a person's gaze are still mediocre. We tackle this challenge and present NITEC, a hand-annotated eye contact dataset for ego-vision interaction. NITEC exceeds existing datasets for ego-vision eye contact in size and variety of demographics, social contexts, and lighting conditions, making it a valuable resource for advancing ego-vision-based eye contact research. Our extensive evaluations on NITEC demonstrate strong cross-dataset performance, emphasizing its effectiveness and adaptability in various scenarios, that allows seamless utilization to the fields of computer vision, human-computer interaction, and social robotics. We make our NITEC dataset publicly available to foster reproducibility and further exploration in the field of ego-vision interaction. https://github.com/thohemp/nitec
△ Less
Submitted 8 November, 2023;
originally announced November 2023.
-
(Social) Trouble on the Road: Understanding and Addressing Social Discomfort in Shared Car Trips
Authors:
Alexandra Bremers,
Natalie Friedman,
Sam Lee,
Tong Wu,
Eric Laurier,
Malte Jung,
Jorge Ortiz,
Wendy Ju
Abstract:
Unpleasant social interactions on the road can negatively affect driving safety. At the same time, researchers have attempted to address social discomfort by exploring Conversational User Interfaces (CUIs) as social mediators. Before knowing whether CUIs could reduce social discomfort in a car, it is necessary to understand the nature of social discomfort in shared rides. To this end, we recorded…
▽ More
Unpleasant social interactions on the road can negatively affect driving safety. At the same time, researchers have attempted to address social discomfort by exploring Conversational User Interfaces (CUIs) as social mediators. Before knowing whether CUIs could reduce social discomfort in a car, it is necessary to understand the nature of social discomfort in shared rides. To this end, we recorded nine families going on drives and performed interaction analysis on this data. We define three strategies to address social discomfort: contextual mediation, social mediation, and social support. We discuss considerations for engineering and design, and explore the limitations of current large language models in addressing social discomfort on the road.
△ Less
Submitted 24 May, 2024; v1 submitted 7 November, 2023;
originally announced November 2023.
-
Toward Reinforcement Learning-based Rectilinear Macro Placement Under Human Constraints
Authors:
Tuyen P. Le,
Hieu T. Nguyen,
Seungyeol Baek,
Taeyoun Kim,
Jungwoo Lee,
Seongjung Kim,
Hyunjin Kim,
Misu Jung,
Daehoon Kim,
Seokyong Lee,
Daewoo Choi
Abstract:
Macro placement is a critical phase in chip design, which becomes more intricate when involving general rectilinear macros and layout areas. Furthermore, macro placement that incorporates human-like constraints, such as design hierarchy and peripheral bias, has the potential to significantly reduce the amount of additional manual labor required from designers. This study proposes a methodology tha…
▽ More
Macro placement is a critical phase in chip design, which becomes more intricate when involving general rectilinear macros and layout areas. Furthermore, macro placement that incorporates human-like constraints, such as design hierarchy and peripheral bias, has the potential to significantly reduce the amount of additional manual labor required from designers. This study proposes a methodology that leverages an approach suggested by Google's Circuit Training (G-CT) to provide a learning-based macro placer that not only supports placing rectilinear cases, but also adheres to crucial human-like design principles. Our experimental results demonstrate the effectiveness of our framework in achieving power-performance-area (PPA) metrics and in obtaining placements of high quality, comparable to those produced with human intervention. Additionally, our methodology shows potential as a generalized model to address diverse macro shapes and layout areas.
△ Less
Submitted 2 November, 2023;
originally announced November 2023.
-
Re-Scoring Using Image-Language Similarity for Few-Shot Object Detection
Authors:
Min Jae Jung,
Seung Dae Han,
Joohee Kim
Abstract:
Few-shot object detection, which focuses on detecting novel objects with few labels, is an emerging challenge in the community. Recent studies show that adapting a pre-trained model or modified loss function can improve performance. In this paper, we explore leveraging the power of Contrastive Language-Image Pre-training (CLIP) and hard negative classification loss in low data setting. Specificall…
▽ More
Few-shot object detection, which focuses on detecting novel objects with few labels, is an emerging challenge in the community. Recent studies show that adapting a pre-trained model or modified loss function can improve performance. In this paper, we explore leveraging the power of Contrastive Language-Image Pre-training (CLIP) and hard negative classification loss in low data setting. Specifically, we propose Re-scoring using Image-language Similarity for Few-shot object detection (RISF) which extends Faster R-CNN by introducing Calibration Module using CLIP (CM-CLIP) and Background Negative Re-scale Loss (BNRL). The former adapts CLIP, which performs zero-shot classification, to re-score the classification scores of a detector using image-class similarities, the latter is modified classification loss considering the punishment for fake backgrounds as well as confusing categories on a generalized few-shot object detection dataset. Extensive experiments on MS-COCO and PASCAL VOC show that the proposed RISF substantially outperforms the state-of-the-art approaches. The code will be available.
△ Less
Submitted 1 November, 2023;
originally announced November 2023.
-
PGA: Personalizing Grasping Agents with Single Human-Robot Interaction
Authors:
Junghyun Kim,
Gi-Cheon Kang,
Jaein Kim,
Seoyun Yang,
Minjoon Jung,
Byoung-Tak Zhang
Abstract:
Language-Conditioned Robotic Grasping (LCRG) aims to develop robots that comprehend and grasp objects based on natural language instructions. While the ability to understand personal objects like my wallet facilitates more natural interaction with human users, current LCRG systems only allow generic language instructions, e.g., the black-colored wallet next to the laptop. To this end, we introduce…
▽ More
Language-Conditioned Robotic Grasping (LCRG) aims to develop robots that comprehend and grasp objects based on natural language instructions. While the ability to understand personal objects like my wallet facilitates more natural interaction with human users, current LCRG systems only allow generic language instructions, e.g., the black-colored wallet next to the laptop. To this end, we introduce a task scenario GraspMine alongside a novel dataset aimed at pinpointing and grasping personal objects given personal indicators via learning from a single human-robot interaction, rather than a large labeled dataset. Our proposed method, Personalized Grasping Agent (PGA), addresses GraspMine by leveraging the unlabeled image data of the user's environment, called Reminiscence. Specifically, PGA acquires personal object information by a user presenting a personal object with its associated indicator, followed by PGA inspecting the object by rotating it. Based on the acquired information, PGA pseudo-labels objects in the Reminiscence by our proposed label propagation algorithm. Harnessing the information acquired from the interactions and the pseudo-labeled objects in the Reminiscence, PGA adapts the object grounding model to grasp personal objects. This results in significant efficiency while previous LCRG systems rely on resource-intensive human annotations -- necessitating hundreds of labeled data to learn my wallet. Moreover, PGA outperforms baseline methods across all metrics and even shows comparable performance compared to the fully-supervised method, which learns from 9k annotated data samples. We further validate PGA's real-world applicability by employing a physical robot to execute GrsapMine. Code and data are publicly available at https://github.com/JHKim-snu/PGA.
△ Less
Submitted 19 March, 2024; v1 submitted 19 October, 2023;
originally announced October 2023.
-
HeLiPR: Heterogeneous LiDAR Dataset for inter-LiDAR Place Recognition under Spatiotemporal Variations
Authors:
Minwoo Jung,
Wooseong Yang,
Dongjae Lee,
Hyeonjae Gil,
Giseop Kim,
Ayoung Kim
Abstract:
Place recognition is crucial for robot localization and loop closure in simultaneous localization and mapping (SLAM). Light Detection and Ranging (LiDAR), known for its robust sensing capabilities and measurement consistency even in varying illumination conditions, has become pivotal in various fields, surpassing traditional imaging sensors in certain applications. Among various types of LiDAR, sp…
▽ More
Place recognition is crucial for robot localization and loop closure in simultaneous localization and mapping (SLAM). Light Detection and Ranging (LiDAR), known for its robust sensing capabilities and measurement consistency even in varying illumination conditions, has become pivotal in various fields, surpassing traditional imaging sensors in certain applications. Among various types of LiDAR, spinning LiDARs are widely used, while non-repetitive scanning patterns have recently been utilized in robotics applications. Some LiDARs provide additional measurements such as reflectivity, Near Infrared (NIR), and velocity from Frequency modulated continuous wave (FMCW) LiDARs. Despite these advances, there is a lack of comprehensive datasets reflecting the broad spectrum of LiDAR configurations for place recognition. To tackle this issue, our paper proposes the HeLiPR dataset, curated especially for place recognition with heterogeneous LiDARs, embodying spatiotemporal variations. To the best of our knowledge, the HeLiPR dataset is the first heterogeneous LiDAR dataset supporting inter-LiDAR place recognition with both non-repetitive and spinning LiDARs, accommodating different field of view (FOV)s and varying numbers of rays. The dataset covers diverse environments, from urban cityscapes to high-dynamic freeways, over a month, enhancing adaptability and robustness across scenarios. Notably, HeLiPR includes trajectories parallel to MulRan sequences, making it valuable for research in heterogeneous LiDAR place recognition and long-term studies. The dataset is accessible at https://sites.google.com/view/heliprdataset.
△ Less
Submitted 19 March, 2024; v1 submitted 25 September, 2023;
originally announced September 2023.
-
Learning Whole-body Manipulation for Quadrupedal Robot
Authors:
Seunghun Jeon,
Moonkyu Jung,
Suyoung Choi,
Beomjoon Kim,
Jemin Hwangbo
Abstract:
We propose a learning-based system for enabling quadrupedal robots to manipulate large, heavy objects using their whole body. Our system is based on a hierarchical control strategy that uses the deep latent variable embedding which captures manipulation-relevant information from interactions, proprioception, and action history, allowing the robot to implicitly understand object properties. We eval…
▽ More
We propose a learning-based system for enabling quadrupedal robots to manipulate large, heavy objects using their whole body. Our system is based on a hierarchical control strategy that uses the deep latent variable embedding which captures manipulation-relevant information from interactions, proprioception, and action history, allowing the robot to implicitly understand object properties. We evaluate our framework in both simulation and real-world scenarios. In the simulation, it achieves a success rate of 93.6 % in accurately re-positioning and re-orienting various objects within a tolerance of 0.03 m and 5 °. Real-world experiments demonstrate the successful manipulation of objects such as a 19.2 kg water-filled drum and a 15.3 kg plastic box filled with heavy objects while the robot weighs 27 kg. Unlike previous works that focus on manipulating small and light objects using prehensile manipulation, our framework illustrates the possibility of using quadrupeds for manipulating large and heavy objects that are ungraspable with the robot's entire body. Our method does not require explicit object modeling and offers significant computational efficiency compared to optimization-based methods. The video can be found at https://youtu.be/fO_PVr27QxU.
△ Less
Submitted 6 September, 2023; v1 submitted 31 August, 2023;
originally announced August 2023.
-
Not Only Rewards But Also Constraints: Applications on Legged Robot Locomotion
Authors:
Yunho Kim,
Hyunsik Oh,
Jeonghyun Lee,
Jinhyeok Choi,
Gwanghyeon Ji,
Moonkyu Jung,
Donghoon Youm,
Jemin Hwangbo
Abstract:
Several earlier studies have shown impressive control performance in complex robotic systems by designing the controller using a neural network and training it with model-free reinforcement learning. However, these outstanding controllers with natural motion style and high task performance are developed through extensive reward engineering, which is a highly laborious and time-consuming process of…
▽ More
Several earlier studies have shown impressive control performance in complex robotic systems by designing the controller using a neural network and training it with model-free reinforcement learning. However, these outstanding controllers with natural motion style and high task performance are developed through extensive reward engineering, which is a highly laborious and time-consuming process of designing numerous reward terms and determining suitable reward coefficients. In this work, we propose a novel reinforcement learning framework for training neural network controllers for complex robotic systems consisting of both rewards and constraints. To let the engineers appropriately reflect their intent to constraints and handle them with minimal computation overhead, two constraint types and an efficient policy optimization algorithm are suggested. The learning framework is applied to train locomotion controllers for several legged robots with different morphology and physical attributes to traverse challenging terrains. Extensive simulation and real-world experiments demonstrate that performant controllers can be trained with significantly less reward engineering, by tuning only a single reward coefficient. Furthermore, a more straightforward and intuitive engineering process can be utilized, thanks to the interpretability and generalizability of constraints. The summary video is available at https://youtu.be/KAlm3yskhvM.
△ Less
Submitted 6 May, 2024; v1 submitted 23 August, 2023;
originally announced August 2023.
-
ProtoFL: Unsupervised Federated Learning via Prototypical Distillation
Authors:
Hansol Kim,
Youngjun Kwak,
Minyoung Jung,
Jinho Shin,
Youngsung Kim,
Changick Kim
Abstract:
Federated learning (FL) is a promising approach for enhancing data privacy preservation, particularly for authentication systems. However, limited round communications, scarce representation, and scalability pose significant challenges to its deployment, hindering its full potential. In this paper, we propose 'ProtoFL', Prototypical Representation Distillation based unsupervised Federated Learning…
▽ More
Federated learning (FL) is a promising approach for enhancing data privacy preservation, particularly for authentication systems. However, limited round communications, scarce representation, and scalability pose significant challenges to its deployment, hindering its full potential. In this paper, we propose 'ProtoFL', Prototypical Representation Distillation based unsupervised Federated Learning to enhance the representation power of a global model and reduce round communication costs. Additionally, we introduce a local one-class classifier based on normalizing flows to improve performance with limited data. Our study represents the first investigation of using FL to improve one-class classification performance. We conduct extensive experiments on five widely used benchmarks, namely MNIST, CIFAR-10, CIFAR-100, ImageNet-30, and Keystroke-Dynamics, to demonstrate the superior performance of our proposed framework over previous methods in the literature.
△ Less
Submitted 7 August, 2023; v1 submitted 23 July, 2023;
originally announced July 2023.
-
TRansPose: Large-Scale Multispectral Dataset for Transparent Object
Authors:
Jeongyun Kim,
Myung-Hwan Jeon,
Sangwoo Jung,
Wooseong Yang,
Minwoo Jung,
Jaeho Shin,
Ayoung Kim
Abstract:
Transparent objects are encountered frequently in our daily lives, yet recognizing them poses challenges for conventional vision sensors due to their unique material properties, not being well perceived from RGB or depth cameras. Overcoming this limitation, thermal infrared cameras have emerged as a solution, offering improved visibility and shape information for transparent objects. In this paper…
▽ More
Transparent objects are encountered frequently in our daily lives, yet recognizing them poses challenges for conventional vision sensors due to their unique material properties, not being well perceived from RGB or depth cameras. Overcoming this limitation, thermal infrared cameras have emerged as a solution, offering improved visibility and shape information for transparent objects. In this paper, we present TRansPose, the first large-scale multispectral dataset that combines stereo RGB-D, thermal infrared (TIR) images, and object poses to promote transparent object research. The dataset includes 99 transparent objects, encompassing 43 household items, 27 recyclable trashes, 29 chemical laboratory equivalents, and 12 non-transparent objects. It comprises a vast collection of 333,819 images and 4,000,056 annotations, providing instance-level segmentation masks, ground-truth poses, and completed depth information. The data was acquired using a FLIR A65 thermal infrared (TIR) camera, two Intel RealSense L515 RGB-D cameras, and a Franka Emika Panda robot manipulator. Spanning 87 sequences, TRansPose covers various challenging real-life scenarios, including objects filled with water, diverse lighting conditions, heavy clutter, non-transparent or translucent containers, objects in plastic bags, and multi-stacked objects. TRansPose dataset can be accessed from the following link: https://sites.google.com/view/transpose-dataset
△ Less
Submitted 10 November, 2023; v1 submitted 11 July, 2023;
originally announced July 2023.
-
RaPlace: Place Recognition for Imaging Radar using Radon Transform and Mutable Threshold
Authors:
Hyesu Jang,
Minwoo Jung,
Ayoung Kim
Abstract:
Due to the robustness in sensing, radar has been highlighted, overcoming harsh weather conditions such as fog and heavy snow. In this paper, we present a novel radar-only place recognition that measures the similarity score by utilizing Radon-transformed sinogram images and cross-correlation in frequency domain. Doing so achieves rigid transform invariance during place recognition, while ignoring…
▽ More
Due to the robustness in sensing, radar has been highlighted, overcoming harsh weather conditions such as fog and heavy snow. In this paper, we present a novel radar-only place recognition that measures the similarity score by utilizing Radon-transformed sinogram images and cross-correlation in frequency domain. Doing so achieves rigid transform invariance during place recognition, while ignoring the effects of radar multipath and ring noises. In addition, we compute the radar similarity distance using mutable threshold to mitigate variability of the similarity score, and reduce the time complexity of processing a copious radar data with hierarchical retrieval. We demonstrate the matching performance for both intra-session loop-closure detection and global place recognition using a publicly available imaging radar datasets. We verify reliable performance compared to existing stable radar place recognition method. Furthermore, codes for the proposed imaging radar place recognition is released for community.
△ Less
Submitted 9 July, 2023;
originally announced July 2023.
-
"How Did They Come Across?" Lessons Learned from Continuous Affective Ratings
Authors:
Maria Teresa Parreira,
Michael J. Sack,
Hifza Javed,
Nawid Jamali,
Malte Jung
Abstract:
Social distance, or perception of the other, is recognized as a dynamic dimension of an interaction, but yet to be widely explored or understood. Through CORAE, a novel web-based open-source tool for COntinuous Retrospective Affect Evaluation, we collected retrospective ratings of interpersonal perceptions between 12 participant dyads. In this work, we explore how different aspects of these intera…
▽ More
Social distance, or perception of the other, is recognized as a dynamic dimension of an interaction, but yet to be widely explored or understood. Through CORAE, a novel web-based open-source tool for COntinuous Retrospective Affect Evaluation, we collected retrospective ratings of interpersonal perceptions between 12 participant dyads. In this work, we explore how different aspects of these interactions reflect on the ratings collected, through a discourse analysis of individual and social behavior of the interactants. We found that different events observed in the ratings can be mapped to complex interaction phenomena, shedding light on relevant interaction features that may play a role in interpersonal understanding and grounding. This paves the way for better, more seamless human-robot interactions, where affect is interpreted as highly dynamic and contingent on interaction history.
△ Less
Submitted 7 July, 2023;
originally announced July 2023.
-
CORAE: A Tool for Intuitive and Continuous Retrospective Evaluation of Interactions
Authors:
Michael J. Sack,
Maria Teresa Parreira,
Jenny Fu,
Asher Lipman,
Hifza Javed,
Nawid Jamali,
Malte Jung
Abstract:
This paper introduces CORAE, a novel web-based open-source tool for COntinuous Retrospective Affect Evaluation, designed to capture continuous affect data about interpersonal perceptions in dyadic interactions. Grounded in behavioral ecology perspectives of emotion, this approach replaces valence as the relevant rating dimension with approach and withdrawal, reflecting the degree to which behavior…
▽ More
This paper introduces CORAE, a novel web-based open-source tool for COntinuous Retrospective Affect Evaluation, designed to capture continuous affect data about interpersonal perceptions in dyadic interactions. Grounded in behavioral ecology perspectives of emotion, this approach replaces valence as the relevant rating dimension with approach and withdrawal, reflecting the degree to which behavior is perceived as increasing or decreasing social distance. We conducted a study to experimentally validate the efficacy of our platform with 24 participants. The tool's effectiveness was tested in the context of dyadic negotiation, revealing insights about how interpersonal dynamics evolve over time. We find that the continuous affect rating method is consistent with individuals' perception of the overall interaction. This paper contributes to the growing body of research on affective computing and offers a valuable tool for researchers interested in investigating the temporal dynamics of affect and emotion in social interactions.
△ Less
Submitted 28 June, 2023;
originally announced June 2023.
-
Overcoming Weak Visual-Textual Alignment for Video Moment Retrieval
Authors:
Minjoon Jung,
Youwon Jang,
Seongho Choi,
Joochan Kim,
Jin-Hwa Kim,
Byoung-Tak Zhang
Abstract:
Video moment retrieval (VMR) identifies a specific moment in an untrimmed video for a given natural language query. This task is prone to suffer the weak visual-textual alignment problem innate in video datasets. Due to the ambiguity, a query does not fully cover the relevant details of the corresponding moment, or the moment may contain misaligned and irrelevant frames, potentially limiting furth…
▽ More
Video moment retrieval (VMR) identifies a specific moment in an untrimmed video for a given natural language query. This task is prone to suffer the weak visual-textual alignment problem innate in video datasets. Due to the ambiguity, a query does not fully cover the relevant details of the corresponding moment, or the moment may contain misaligned and irrelevant frames, potentially limiting further performance gains. To tackle this problem, we propose a background-aware moment detection transformer (BM-DETR). Our model adopts a contrastive approach, carefully utilizing the negative queries matched to other moments in the video. Specifically, our model learns to predict the target moment from the joint probability of each frame given the positive query and the complement of negative queries. This leads to effective use of the surrounding background, improving moment sensitivity and enhancing overall alignments in videos. Extensive experiments on four benchmarks demonstrate the effectiveness of our approach.
△ Less
Submitted 19 November, 2023; v1 submitted 5 June, 2023;
originally announced June 2023.
-
GraphTensor: Comprehensive GNN-Acceleration Framework for Efficient Parallel Processing of Massive Datasets
Authors:
Junhyeok Jang,
Miryeong Kwon,
Donghyun Gouk,
Hanyeoreum Bae,
Myoungsoo Jung
Abstract:
We present GraphTensor, a comprehensive open-source framework that supports efficient parallel neural network processing on large graphs. GraphTensor offers a set of easy-to-use programming primitives that appreciate both graph and neural network execution behaviors from the beginning (graph sampling) to the end (dense data processing). Our framework runs diverse graph neural network (GNN) models…
▽ More
We present GraphTensor, a comprehensive open-source framework that supports efficient parallel neural network processing on large graphs. GraphTensor offers a set of easy-to-use programming primitives that appreciate both graph and neural network execution behaviors from the beginning (graph sampling) to the end (dense data processing). Our framework runs diverse graph neural network (GNN) models in a destination-centric, feature-wise manner, which can significantly shorten training execution times in a GPU. In addition, GraphTensor rearranges multiple GNN kernels based on their system hyperparameters in a self-governing manner, thereby reducing the processing dimensionality and the latencies further. From the end-to-end execution viewpoint, GraphTensor significantly shortens the service-level GNN latency by applying pipeline parallelism for efficient graph dataset preprocessing. Our evaluation shows that GraphTensor exhibits 1.4x better training performance than emerging GNN frameworks under the execution of large-scale, real-world graph workloads. For the end-to-end services, GraphTensor reduces training latencies of an advanced version of the GNN frameworks (optimized for multi-threaded graph sampling) by 2.4x, on average.
△ Less
Submitted 27 May, 2023;
originally announced May 2023.
-
Asynchronous Multiple LiDAR-Inertial Odometry using Point-wise Inter-LiDAR Uncertainty Propagation
Authors:
Minwoo Jung,
Sangwoo Jung,
Ayoung Kim
Abstract:
In recent years, multiple Light Detection and Ranging (LiDAR) systems have grown in popularity due to their enhanced accuracy and stability from the increased field of view (FOV). However, integrating multiple LiDARs can be challenging, attributable to temporal and spatial discrepancies. Common practice is to transform points among sensors while requiring strict time synchronization or approximati…
▽ More
In recent years, multiple Light Detection and Ranging (LiDAR) systems have grown in popularity due to their enhanced accuracy and stability from the increased field of view (FOV). However, integrating multiple LiDARs can be challenging, attributable to temporal and spatial discrepancies. Common practice is to transform points among sensors while requiring strict time synchronization or approximating transformation among sensor frames. Unlike existing methods, we elaborate the inter-sensor transformation using continuous-time (CT) inertial measurement unit (IMU) modeling and derive associated ambiguity as a point-wise uncertainty. This uncertainty, modeled by combining the state covariance with the acquisition time and point range, allows us to alleviate the strict time synchronization and to overcome FOV difference. The proposed method has been validated on both public and our datasets and is compatible with various LiDAR manufacturers and scanning patterns. We open-source the code for public access at https://github.com/minwoo0611/MA-LIO.
△ Less
Submitted 7 November, 2023; v1 submitted 26 May, 2023;
originally announced May 2023.
-
A Lightweight CNN-Transformer Model for Learning Traveling Salesman Problems
Authors:
Minseop Jung,
Jaeseung Lee,
Jibum Kim
Abstract:
Several studies have attempted to solve traveling salesman problems (TSPs) using various deep learning techniques. Among them, Transformer-based models show state-of-the-art performance even for large-scale Traveling Salesman Problems (TSPs). However, they are based on fully-connected attention models and suffer from large computational complexity and GPU memory usage. Our work is the first CNN-Tr…
▽ More
Several studies have attempted to solve traveling salesman problems (TSPs) using various deep learning techniques. Among them, Transformer-based models show state-of-the-art performance even for large-scale Traveling Salesman Problems (TSPs). However, they are based on fully-connected attention models and suffer from large computational complexity and GPU memory usage. Our work is the first CNN-Transformer model based on a CNN embedding layer and partial self-attention for TSP. Our CNN-Transformer model is able to better learn spatial features from input data using a CNN embedding layer compared with the standard Transformer-based models. It also removes considerable redundancy in fully-connected attention models using the proposed partial self-attention. Experimental results show that the proposed CNN embedding layer and partial self-attention are very effective in improving performance and computational complexity. The proposed model exhibits the best performance in real-world datasets and outperforms other existing state-of-the-art (SOTA) Transformer-based models in various aspects. Our code is publicly available at https://github.com/cm8908/CNN_Transformer3.
△ Less
Submitted 5 March, 2024; v1 submitted 3 May, 2023;
originally announced May 2023.
-
Beyond Unimodal: Generalising Neural Processes for Multimodal Uncertainty Estimation
Authors:
Myong Chol Jung,
He Zhao,
Joanna Dipnall,
Lan Du
Abstract:
Uncertainty estimation is an important research area to make deep neural networks (DNNs) more trustworthy. While extensive research on uncertainty estimation has been conducted with unimodal data, uncertainty estimation for multimodal data remains a challenge. Neural processes (NPs) have been demonstrated to be an effective uncertainty estimation method for unimodal data by providing the reliabili…
▽ More
Uncertainty estimation is an important research area to make deep neural networks (DNNs) more trustworthy. While extensive research on uncertainty estimation has been conducted with unimodal data, uncertainty estimation for multimodal data remains a challenge. Neural processes (NPs) have been demonstrated to be an effective uncertainty estimation method for unimodal data by providing the reliability of Gaussian processes with efficient and powerful DNNs. While NPs hold significant potential for multimodal uncertainty estimation, the adaptation of NPs for multimodal data has not been carefully studied. To bridge this gap, we propose Multimodal Neural Processes (MNPs) by generalising NPs for multimodal uncertainty estimation. Based on the framework of NPs, MNPs consist of several novel and principled mechanisms tailored to the characteristics of multimodal data. In extensive empirical evaluation, our method achieves state-of-the-art multimodal uncertainty estimation performance, showing its appealing robustness against noisy samples and reliability in out-of-distribution detection with faster computation time compared to the current state-of-the-art multimodal uncertainty estimation method.
△ Less
Submitted 22 October, 2023; v1 submitted 4 April, 2023;
originally announced April 2023.
-
Asynchronous Persistence with ASAP
Authors:
Ahmed Abulila,
Izzat El Hajj,
Myoungsoo Jung,
Nam Sung Kim
Abstract:
Supporting atomic durability of updates for persistent memories is typically achieved with Write-Ahead Logging (WAL). WAL flushes log entries to persistent memory before making the actual data persistent to ensure that a consistent state can be recovered if a crash occurs. Performing WAL in hardware is attractive because it makes most aspects of log management transparent to software, and it compl…
▽ More
Supporting atomic durability of updates for persistent memories is typically achieved with Write-Ahead Logging (WAL). WAL flushes log entries to persistent memory before making the actual data persistent to ensure that a consistent state can be recovered if a crash occurs. Performing WAL in hardware is attractive because it makes most aspects of log management transparent to software, and it completes log persist operations (LPs) and data persist operations (DPs) in the background, overlapping them with the execution of other instructions.
Prior hardware logging solutions commit atomic regions synchronously. Once the end of a region is reached, all outstanding persist operations required for the region to commit must be completed before instruction execution may proceed. For undo logging, LPs and DPs are both performed synchronously to ensure that the region commits synchronously. For redo logging, DPs can be performed asynchronously, but LPs are performed synchronously to ensure that the region commits synchronously. In both cases, waiting for synchronous persist operations (LP or DP) at the end of an atomic region causes atomic regions to incur high latency.
To tackle this limitation, we propose ASAP, a hardware logging solution that allows atomic regions to commit asynchronously. That is, once the end of an atomic region is reached, instruction execution may proceed without waiting for outstanding persist operations to complete. As such, both LPs and DPs can be performed asynchronously. The challenge with allowing atomic regions to commit asynchronously is that it can lead to control and data dependence violations in the commit order of the atomic regions, leaving data in an unrecoverable state in case of a crash. To address this issue, ASAP tracks and enforces control and data dependencies between atomic regions in hardware to ensure that the regions commit in the proper order.
△ Less
Submitted 26 February, 2023;
originally announced February 2023.
-
Liveness score-based regression neural networks for face anti-spoofing
Authors:
Youngjun Kwak,
Minyoung Jung,
Hunjae Yoo,
JinHo Shin,
Changick Kim
Abstract:
Previous anti-spoofing methods have used either pseudo maps or user-defined labels, and the performance of each approach depends on the accuracy of the third party networks generating pseudo maps and the way in which the users define the labels. In this paper, we propose a liveness score-based regression network for overcoming the dependency on third party networks and users. First, we introduce a…
▽ More
Previous anti-spoofing methods have used either pseudo maps or user-defined labels, and the performance of each approach depends on the accuracy of the third party networks generating pseudo maps and the way in which the users define the labels. In this paper, we propose a liveness score-based regression network for overcoming the dependency on third party networks and users. First, we introduce a new labeling technique, called pseudo-discretized label encoding for generating discretized labels indicating the amount of information related to real images. Secondly, we suggest the expected liveness score based on a regression network for training the difference between the proposed supervision and the expected liveness score. Finally, extensive experiments were conducted on four face anti-spoofing benchmarks to verify our proposed method on both intra-and cross-dataset tests. The experimental results show our approach outperforms previous methods.
△ Less
Submitted 20 March, 2023; v1 submitted 18 February, 2023;
originally announced February 2023.
-
Failure Tolerant Training with Persistent Memory Disaggregation over CXL
Authors:
Miryeong Kwon,
Junhyeok Jang,
Hanjin Choi,
Sangwon Lee,
Myoungsoo Jung
Abstract:
This paper proposes TRAININGCXL that can efficiently process large-scale recommendation datasets in the pool of disaggregated memory while making training fault tolerant with low overhead. To this end, i) we integrate persistent memory (PMEM) and GPU into a cache-coherent domain as Type-2. Enabling CXL allows PMEM to be directly placed in GPU's memory hierarchy, such that GPU can access PMEM witho…
▽ More
This paper proposes TRAININGCXL that can efficiently process large-scale recommendation datasets in the pool of disaggregated memory while making training fault tolerant with low overhead. To this end, i) we integrate persistent memory (PMEM) and GPU into a cache-coherent domain as Type-2. Enabling CXL allows PMEM to be directly placed in GPU's memory hierarchy, such that GPU can access PMEM without software intervention. TRAININGCXL introduces computing and checkpointing logic near the CXL controller, thereby training data and managing persistency in an active manner. Considering PMEM's vulnerability, ii) we utilize the unique characteristics of recommendation models and take the checkpointing overhead off the critical path of their training. Lastly, iii) TRAININGCXL employs an advanced checkpointing technique that relaxes the updating sequence of model parameters and embeddings across training batches. The evaluation shows that TRAININGCXL achieves 5.2x training performance improvement and 76% energy savings, compared to the modern PMEM-based recommendation systems.
△ Less
Submitted 19 January, 2023; v1 submitted 14 January, 2023;
originally announced January 2023.
-
Universal convex covering problems under translation and discrete rotations
Authors:
Mook Kwon Jung,
Sang Duk Yoon,
Hee-Kap Ahn,
Takeshi Tokuyama
Abstract:
We consider the smallest-area universal covering of planar objects of perimeter 2 (or equivalently closed curves of length 2) allowing translation and discrete rotations. In particular, we show that the solution is an equilateral triangle of height 1 when translation and discrete rotation of $π$ are allowed. Our proof is purely geometric and elementary. We also give convex coverings of closed curv…
▽ More
We consider the smallest-area universal covering of planar objects of perimeter 2 (or equivalently closed curves of length 2) allowing translation and discrete rotations. In particular, we show that the solution is an equilateral triangle of height 1 when translation and discrete rotation of $π$ are allowed. Our proof is purely geometric and elementary. We also give convex coverings of closed curves of length 2 under translation and discrete rotations of multiples of $π/2$ and $2π/3$. We show a minimality of the covering for discrete rotation of multiples of $π/2$, which is an equilateral triangle of height smaller than 1, and conjecture that the covering is the smallest-area convex covering. Finally, we give the smallest-area convex coverings of all unit segments under translation and discrete rotations $2π/k$ for all integers $k\ge 3$.
△ Less
Submitted 27 November, 2022;
originally announced November 2022.
-
Blank Collapse: Compressing CTC emission for the faster decoding
Authors:
Minkyu Jung,
Ohhyeok Kwon,
Seunghyun Seo,
Soonshin Seo
Abstract:
Connectionist Temporal Classification (CTC) model is a very efficient method for modeling sequences, especially for speech data. In order to use CTC model as an Automatic Speech Recognition (ASR) task, the beam search decoding with an external language model like n-gram LM is necessary to obtain reasonable results. In this paper we analyze the blank label in CTC beam search deeply and propose a ve…
▽ More
Connectionist Temporal Classification (CTC) model is a very efficient method for modeling sequences, especially for speech data. In order to use CTC model as an Automatic Speech Recognition (ASR) task, the beam search decoding with an external language model like n-gram LM is necessary to obtain reasonable results. In this paper we analyze the blank label in CTC beam search deeply and propose a very simple method to reduce the amount of calculation resulting in faster beam search decoding speed. With this method, we can get up to 78% faster decoding speed than ordinary beam search decoding with a very small loss of accuracy in LibriSpeech datasets. We prove this method is effective not only practically by experiments but also theoretically by mathematical reasoning. We also observe that this reduction is more obvious if the accuracy of the model is higher.
△ Less
Submitted 26 June, 2023; v1 submitted 30 October, 2022;
originally announced October 2022.
-
AdaMS: Deep Metric Learning with Adaptive Margin and Adaptive Scale for Acoustic Word Discrimination
Authors:
Myunghun Jung,
Hoirin Kim
Abstract:
Many recent loss functions in deep metric learning are expressed with logarithmic and exponential forms, and they involve margin and scale as essential hyper-parameters. Since each data class has an intrinsic characteristic, several previous works have tried to learn embedding space close to the real distribution by introducing adaptive margins. However, there was no work on adaptive scales at all…
▽ More
Many recent loss functions in deep metric learning are expressed with logarithmic and exponential forms, and they involve margin and scale as essential hyper-parameters. Since each data class has an intrinsic characteristic, several previous works have tried to learn embedding space close to the real distribution by introducing adaptive margins. However, there was no work on adaptive scales at all. We argue that both margin and scale should be adaptively adjustable during the training. In this paper, we propose a method called Adaptive Margin and Scale (AdaMS), where hyper-parameters of margin and scale are replaced with learnable parameters of adaptive margins and adaptive scales for each class. Our method is evaluated on Wall Street Journal dataset, and we achieve outperforming results for word discrimination tasks.
△ Less
Submitted 23 May, 2023; v1 submitted 26 October, 2022;
originally announced October 2022.
-
Modal-specific Pseudo Query Generation for Video Corpus Moment Retrieval
Authors:
Minjoon Jung,
Seongho Choi,
Joochan Kim,
Jin-Hwa Kim,
Byoung-Tak Zhang
Abstract:
Video corpus moment retrieval (VCMR) is the task to retrieve the most relevant video moment from a large video corpus using a natural language query. For narrative videos, e.g., dramas or movies, the holistic understanding of temporal dynamics and multimodal reasoning is crucial. Previous works have shown promising results; however, they relied on the expensive query annotations for VCMR, i.e., th…
▽ More
Video corpus moment retrieval (VCMR) is the task to retrieve the most relevant video moment from a large video corpus using a natural language query. For narrative videos, e.g., dramas or movies, the holistic understanding of temporal dynamics and multimodal reasoning is crucial. Previous works have shown promising results; however, they relied on the expensive query annotations for VCMR, i.e., the corresponding moment intervals. To overcome this problem, we propose a self-supervised learning framework: Modal-specific Pseudo Query Generation Network (MPGN). First, MPGN selects candidate temporal moments via subtitle-based moment sampling. Then, it generates pseudo queries exploiting both visual and textual information from the selected temporal moments. Through the multimodal information in the pseudo queries, we show that MPGN successfully learns to localize the video corpus moment without any explicit annotation. We validate the effectiveness of MPGN on the TVR dataset, showing competitive results compared with both supervised models and unsupervised setting models.
△ Less
Submitted 23 October, 2022;
originally announced October 2022.
-
TSP-Bot: Robotic TSP Pen Art using High-DoF Manipulators
Authors:
Daeun Song,
Eunjung Lim,
Jiyoon Park,
Minjung Jung,
Young J. Kim
Abstract:
TSP art is an art form for drawing an image using piecewise-continuous line segments. We present TSP-Bot, a robotic pen drawing system capable of creating complicated TSP pen art on a planar surface using multiple colors. The system begins by converting a colored raster image into a set of points that represent the image's tone, which can be controlled by adjusting the point density. Next, the sys…
▽ More
TSP art is an art form for drawing an image using piecewise-continuous line segments. We present TSP-Bot, a robotic pen drawing system capable of creating complicated TSP pen art on a planar surface using multiple colors. The system begins by converting a colored raster image into a set of points that represent the image's tone, which can be controlled by adjusting the point density. Next, the system finds a piecewise-continuous linear path that visits each point exactly once, which is equivalent to solving a Traveling Salesman Problem (TSP). The path is simplified with fewer points using bounded approximation and smoothed and optimized using Bezier spline curves with bounded curvature. Our robotic drawing system consisting of single or dual manipulators with fingered grippers and a mobile platform performs the drawing task by following the resulting complex and sophisticated path composed of thousands of TSP sites. As a result, our system can draw complicated and visually pleasing TSP pen art.
△ Less
Submitted 10 April, 2024; v1 submitted 14 October, 2022;
originally announced October 2022.
-
Uncertainty Estimation for Multi-view Data: The Power of Seeing the Whole Picture
Authors:
Myong Chol Jung,
He Zhao,
Joanna Dipnall,
Belinda Gabbe,
Lan Du
Abstract:
Uncertainty estimation is essential to make neural networks trustworthy in real-world applications. Extensive research efforts have been made to quantify and reduce predictive uncertainty. However, most existing works are designed for unimodal data, whereas multi-view uncertainty estimation has not been sufficiently investigated. Therefore, we propose a new multi-view classification framework for…
▽ More
Uncertainty estimation is essential to make neural networks trustworthy in real-world applications. Extensive research efforts have been made to quantify and reduce predictive uncertainty. However, most existing works are designed for unimodal data, whereas multi-view uncertainty estimation has not been sufficiently investigated. Therefore, we propose a new multi-view classification framework for better uncertainty estimation and out-of-domain sample detection, where we associate each view with an uncertainty-aware classifier and combine the predictions of all the views in a principled way. The experimental results with real-world datasets demonstrate that our proposed approach is an accurate, reliable, and well-calibrated classifier, which predominantly outperforms the multi-view baselines tested in terms of expected calibration error, robustness to noise, and accuracy for the in-domain sample classification and the out-of-domain sample detection tasks.
△ Less
Submitted 6 October, 2022;
originally announced October 2022.
-
Unveiling the Real Performance of LPDDR5 Memories
Authors:
Lukas Steiner,
Matthias Jung,
Michael Huonker,
Norbert Wehn
Abstract:
LPDDR5 is the latest low-power DRAM standard and expected to be used in various application fields. The vendors have published promising peak bandwidths up to 50 % higher than those of the predecessor LPDDR4. In this paper we evaluate the best-case and worst-case real bandwidth utilization of different LPDDR5 configurations and compare the results to corresponding LPDDR4 configurations. We also sh…
▽ More
LPDDR5 is the latest low-power DRAM standard and expected to be used in various application fields. The vendors have published promising peak bandwidths up to 50 % higher than those of the predecessor LPDDR4. In this paper we evaluate the best-case and worst-case real bandwidth utilization of different LPDDR5 configurations and compare the results to corresponding LPDDR4 configurations. We also show that an upgrade from LPDDR4 to LPDDR5 does not always bring a bandwidth advantage and that some LPDDR5 configurations should be avoided for specific workloads.
△ Less
Submitted 28 September, 2022;
originally announced September 2022.
-
A Framework for Formal Verification of DRAM Controllers
Authors:
Lukas Steiner,
Chirag Sudarshan,
Matthias Jung,
Dominik Stoffel,
Norbert Wehn
Abstract:
The large number of recent JEDEC DRAM standard releases and their increasing feature set makes it difficult for designers to rapidly upgrade the memory controller IPs to each new standard. Especially the hardware verification is challenging due to the higher protocol complexity of standards like DDR5, LPDDR5 or HBM3 in comparison with their predecessors. With traditional simulation-based verificat…
▽ More
The large number of recent JEDEC DRAM standard releases and their increasing feature set makes it difficult for designers to rapidly upgrade the memory controller IPs to each new standard. Especially the hardware verification is challenging due to the higher protocol complexity of standards like DDR5, LPDDR5 or HBM3 in comparison with their predecessors. With traditional simulation-based verification it is laborious to guarantee the coverage of all possible states, especially for control flow rich memory controllers. This has a direct impact on the time-to-market. A promising alternative is formal verification because it allows to ensure protocol compliance based on mathematical proofs. However, with regard to memory controllers no fully-automated verification process has been presented in the state-of-the-art yet, which means there is still a potential risk of human error. In this paper we present a framework that automatically generates SystemVerilog Assertions for a DRAM protocol. In addition, we show how the framework can be used efficiently for different tasks of memory controller development.
△ Less
Submitted 28 September, 2022;
originally announced September 2022.
-
An Empirical Evaluation of Four Off-the-Shelf Proprietary Visual-Inertial Odometry Systems
Authors:
Jungha Kim,
Minkyeong Song,
Yeoeun Lee,
Moonkyeong Jung,
Pyojin Kim
Abstract:
Commercial visual-inertial odometry (VIO) systems have been gaining attention as cost-effective, off-the-shelf six degrees of freedom (6-DoF) ego-motion tracking methods for estimating accurate and consistent camera pose data, in addition to their ability to operate without external localization from motion capture or global positioning systems. It is unclear from existing results, however, which…
▽ More
Commercial visual-inertial odometry (VIO) systems have been gaining attention as cost-effective, off-the-shelf six degrees of freedom (6-DoF) ego-motion tracking methods for estimating accurate and consistent camera pose data, in addition to their ability to operate without external localization from motion capture or global positioning systems. It is unclear from existing results, however, which commercial VIO platforms are the most stable, consistent, and accurate in terms of state estimation for indoor and outdoor robotic applications. We assess four popular proprietary VIO systems (Apple ARKit, Google ARCore, Intel RealSense T265, and Stereolabs ZED 2) through a series of both indoor and outdoor experiments where we show their positioning stability, consistency, and accuracy. We present our complete results as a benchmark comparison for the research community.
△ Less
Submitted 14 July, 2022;
originally announced July 2022.
-
Human-Robot Commensality: Bite Timing Prediction for Robot-Assisted Feeding in Groups
Authors:
Jan Ondras,
Abrar Anwar,
Tong Wu,
Fanjun Bu,
Malte Jung,
Jorge Jose Ortiz,
Tapomayukh Bhattacharjee
Abstract:
We develop data-driven models to predict when a robot should feed during social dining scenarios. Being able to eat independently with friends and family is considered one of the most memorable and important activities for people with mobility limitations. While existing robotic systems for feeding people with mobility limitations focus on solitary dining, commensality, the act of eating together,…
▽ More
We develop data-driven models to predict when a robot should feed during social dining scenarios. Being able to eat independently with friends and family is considered one of the most memorable and important activities for people with mobility limitations. While existing robotic systems for feeding people with mobility limitations focus on solitary dining, commensality, the act of eating together, is often the practice of choice. Sharing meals with others introduces the problem of socially appropriate bite timing for a robot, i.e. the appropriate timing for the robot to feed without disrupting the social dynamics of a shared meal. Our key insight is that bite timing strategies that take into account the delicate balance of social cues can lead to seamless interactions during robot-assisted feeding in a social dining scenario. We approach this problem by collecting a Human-Human Commensality Dataset (HHCD) containing 30 groups of three people eating together. We use this dataset to analyze human-human commensality behaviors and develop bite timing prediction models in social dining scenarios. We also transfer these models to human-robot commensality scenarios. Our user studies show that prediction improves when our algorithm uses multimodal social signaling cues between diners to model bite timing. The HHCD dataset, videos of user studies, and code are available at https://emprise.cs.cornell.edu/hrcom/
△ Less
Submitted 16 November, 2022; v1 submitted 7 July, 2022;
originally announced July 2022.