-
SumRec: A Framework for Recommendation using Open-Domain Dialogue
Authors:
Ryutaro Asahara,
Masaki Takahashi,
Chiho Iwahashi,
Michimasa Inaba
Abstract:
Chat dialogues contain considerable useful information about a speaker's interests, preferences, and experiences. Thus, knowledge from open-domain chat dialogue can be used to personalize various systems and offer recommendations for advanced information. This study proposes SumRec, a novel framework for recommending information from open-domain chat dialogue. The study also examines the framework using ChatRec, a newly constructed dataset for training and evaluation. To extract the speaker and item characteristics, the SumRec framework employs a large language model (LLM) to generate a summary of the speaker information from a dialogue and to recommend information about an item according to the type of user. The speaker and item information are then input into a score estimation model, generating a recommendation score. Experimental results show that the SumRec framework provides better recommendations than the baseline method of using dialogues and item descriptions in their original form. Our dataset and code are publicly available at https://github.com/Ryutaro-A/SumRec
Submitted 6 February, 2024;
originally announced February 2024.
-
Exploring the Potential of Multi-Modal AI for Driving Hazard Prediction
Authors:
Korawat Charoenpitaks,
Van-Quang Nguyen,
Masanori Suganuma,
Masahiro Takahashi,
Ryoma Niihara,
Takayuki Okatani
Abstract:
This paper addresses the problem of predicting hazards that drivers may encounter while driving a car. We formulate it as a task of anticipating impending accidents using a single input image captured by car dashcams. Unlike existing approaches to driving hazard prediction that rely on computational simulations or anomaly detection from videos, this study focuses on high-level inference from static images. The problem requires predicting and reasoning about future events based on uncertain observations, which falls under visual abductive reasoning. To enable research in this understudied area, we create a new dataset, the DHPR (Driving Hazard Prediction and Reasoning) dataset. The dataset consists of 15K dashcam images of street scenes, and each image is associated with a tuple containing car speed, a hypothesized hazard description, and visual entities present in the scene. These are annotated by human annotators, who identify risky scenes and provide descriptions of potential accidents that could occur a few seconds later. We present several baseline methods and evaluate their performance on our dataset, identifying remaining issues and discussing future directions. This study contributes to the field by introducing a novel problem formulation and dataset, enabling researchers to explore the potential of multi-modal AI for driving hazard prediction.
Submitted 1 July, 2024; v1 submitted 6 October, 2023;
originally announced October 2023.
-
Switching Head-Tail Funnel UNITER for Dual Referring Expression Comprehension with Fetch-and-Carry Tasks
Authors:
Ryosuke Korekata,
Motonari Kambara,
Yu Yoshida,
Shintaro Ishikawa,
Yosuke Kawasaki,
Masaki Takahashi,
Komei Sugiura
Abstract:
This paper describes a domestic service robot (DSR) that fetches everyday objects and carries them to specified destinations according to free-form natural language instructions. Given an instruction such as "Move the bottle on the left side of the plate to the empty chair," the DSR is expected to identify the bottle and the chair from multiple candidates in the environment and carry the target object to the destination. Most of the existing multimodal language understanding methods are impractical in terms of computational complexity because they require inferences for all combinations of target object candidates and destination candidates. We propose Switching Head-Tail Funnel UNITER, which solves the task by predicting the target object and the destination individually using a single model. Our method is validated on a newly-built dataset consisting of object manipulation instructions and semi photo-realistic images captured in a standard Embodied AI simulator. The results show that our method outperforms the baseline method in terms of language comprehension accuracy. Furthermore, we conduct physical experiments in which a DSR delivers standardized everyday objects in a standardized domestic environment as requested by instructions with referring expressions. The experimental results show that the object grasping and placing actions are achieved with success rates of more than 90%.
Submitted 14 July, 2023;
originally announced July 2023.
-
Motion Capture Dataset for Practical Use of AI-based Motion Editing and Stylization
Authors:
Makito Kobayashi,
Chen-Chieh Liao,
Keito Inoue,
Sentaro Yojima,
Masafumi Takahashi
Abstract:
In this work, we propose a new style-diverse dataset for the domain of motion style transfer. The motion dataset uses an industry-standard human bone structure and is thus industry-ready for use with 3D characters in many projects. We outline the challenges in motion style transfer and encourage future work in this domain by releasing the proposed motion dataset both to the public and the market. We conduct a comprehensive study on motion style transfer using the state-of-the-art method, and the results show the proposed dataset's validity for the motion style transfer task.
Submitted 9 July, 2023; v1 submitted 15 June, 2023;
originally announced June 2023.
-
Sketch-based Medical Image Retrieval
Authors:
Kazuma Kobayashi,
Lin Gu,
Ryuichiro Hataya,
Takaaki Mizuno,
Mototaka Miyake,
Hirokazu Watanabe,
Masamichi Takahashi,
Yasuyuki Takamizawa,
Yukihiro Yoshida,
Satoshi Nakamura,
Nobuji Kouno,
Amina Bolatkan,
Yusuke Kurose,
Tatsuya Harada,
Ryuji Hamamoto
Abstract:
The number of medical images stored in hospitals is increasing faster than ever; however, use of the accumulated images has been limited. This is because existing content-based medical image retrieval (CBMIR) systems usually require example images to construct query vectors, yet example images cannot always be prepared. In addition, some images have rare characteristics that make it difficult to find similar example images; we call these isolated samples. Here, we introduce a novel sketch-based medical image retrieval (SBMIR) system that enables users to find images of interest without example images. The key idea lies in feature decomposition of medical images, whereby the entire feature of a medical image can be decomposed into and reconstructed from normal and abnormal features. By extending this idea, our SBMIR system provides an easy-to-use two-step graphical user interface: users first select a template image to specify a normal feature and then draw a semantic sketch of the disease on the template image to represent an abnormal feature. Subsequently, it integrates the two kinds of input to construct a query vector and retrieves reference images with the closest reference vectors. Using two datasets, ten healthcare professionals with various clinical backgrounds participated in the user test for evaluation. As a result, our SBMIR system enabled users to overcome previous challenges, including image retrieval based on fine-grained image characteristics, image retrieval without example images, and image retrieval for isolated samples. Our SBMIR system achieves flexible medical image retrieval on demand, thereby expanding the utility of medical image databases.
Submitted 6 March, 2023;
originally announced March 2023.
-
UnrealEgo: A New Dataset for Robust Egocentric 3D Human Motion Capture
Authors:
Hiroyasu Akada,
Jian Wang,
Soshi Shimada,
Masaki Takahashi,
Christian Theobalt,
Vladislav Golyanik
Abstract:
We present UnrealEgo, a new large-scale naturalistic dataset for egocentric 3D human pose estimation. UnrealEgo is based on an advanced concept of eyeglasses equipped with two fisheye cameras that can be used in unconstrained environments. We design their virtual prototype and attach them to 3D human models for stereo view capture. We next generate a large corpus of human motions. As a consequence, UnrealEgo is the first dataset to provide in-the-wild stereo images with the largest variety of motions among existing egocentric datasets. Furthermore, we propose a new benchmark method with a simple but effective idea of devising a 2D keypoint estimation module for stereo inputs to improve 3D human pose estimation. Extensive experiments show that our approach outperforms the previous state-of-the-art methods qualitatively and quantitatively. UnrealEgo and our source code are available on our project web page.
Submitted 2 August, 2022;
originally announced August 2022.
-
Independent set reconfiguration on directed graphs
Authors:
Takehiro Ito,
Yuni Iwamasa,
Yasuaki Kobayashi,
Yu Nakahata,
Yota Otachi,
Masahiro Takahashi,
Kunihiro Wasa
Abstract:
Directed Token Sliding asks, given a directed graph and two sets of pairwise nonadjacent vertices, whether one can reach the second set from the first by repeatedly applying a local operation that exchanges a vertex in the current set with one of its out-neighbors, while keeping the nonadjacency. It can be seen as a reconfiguration process where a token is placed on each vertex in the current set, and the local operation slides a token along an arc respecting its direction. Previously, such a problem was extensively studied on undirected graphs, where the edges have no directions and thus the local operation is symmetric. Directed Token Sliding is a generalization of its undirected variant since an undirected edge can be simulated by two arcs of opposite directions.
In this paper, we initiate the algorithmic study of Directed Token Sliding. We first observe that the problem is PSPACE-complete even if we forbid parallel arcs in opposite directions and that the problem on directed acyclic graphs is NP-complete and W[1]-hard parameterized by the size of the sets in consideration. We then show our main result: a linear-time algorithm for the problem on directed graphs whose underlying undirected graphs are trees, which are called polytrees. Such a result is also known for the undirected variant of the problem on trees [Demaine et al. TCS 2015], but the techniques used here are quite different because of the asymmetric nature of the directed problem. We present a characterization of yes-instances based on the existence of a certain set of directed paths, and then derive simple equivalent conditions from it by some observations, which admit an efficient algorithm. For the polytree case, we also present a quadratic-time algorithm that outputs, if the input is a yes-instance, one of the shortest reconfiguration sequences.
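To make the problem statement above concrete, the following is a minimal brute-force sketch in Python. It is not the linear-time polytree algorithm from the paper; it simply runs a breadth-first search over token configurations, which takes exponential time in general. The function name `can_reconfigure` and the arc-list input format are assumptions made for illustration, and both input sets are assumed to be independent already.

```python
from collections import deque


def can_reconfigure(arcs, start, target):
    """Brute-force check for Directed Token Sliding (illustrative only).

    arcs   : iterable of (u, v) pairs, each a directed arc u -> v
    start  : iterable of vertices holding tokens initially (assumed independent)
    target : iterable of vertices that should hold tokens at the end
    """
    start, target = frozenset(start), frozenset(target)
    if len(start) != len(target):
        return False  # sliding never changes the number of tokens

    out_neighbors = {}
    adjacent = set()  # adjacency in the underlying undirected graph
    for u, v in arcs:
        out_neighbors.setdefault(u, set()).add(v)
        adjacent.add(frozenset((u, v)))

    def stays_independent(tokens, moved):
        # Only the token that just moved can create a new adjacency.
        return all(frozenset((moved, t)) not in adjacent
                   for t in tokens if t != moved)

    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == target:
            return True
        for u in state:
            for v in out_neighbors.get(u, ()):
                if v in state:
                    continue  # destination already holds a token
                nxt = (state - {u}) | {v}  # slide the token along arc u -> v
                if nxt not in seen and stays_independent(nxt, v):
                    seen.add(nxt)
                    queue.append(nxt)
    return False
```

On the directed path a -> b -> c -> d, for example, the tokens on {a, c} can reach {b, d} by first sliding c to d and then a to b, whereas the reverse direction is impossible because arcs can only be followed forward.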
Submitted 24 March, 2022;
originally announced March 2022.
-
Spatial Data Augmentation with Simulated Room Impulse Responses for Sound Event Localization and Detection
Authors:
Yuichiro Koyama,
Kazuhide Shigemi,
Masafumi Takahashi,
Kazuki Shimada,
Naoya Takahashi,
Emiru Tsunoo,
Shusuke Takahashi,
Yuki Mitsufuji
Abstract:
Recording and annotating real sound events for a sound event localization and detection (SELD) task is time-consuming, and data augmentation techniques are often favored when the amount of data is limited. However, how to augment the spatial information in a dataset, including unlabeled directional interference events, remains an open research question. Furthermore, directional interference events make it difficult to accurately extract spatial characteristics from target sound events. To address this problem, we propose an impulse response simulation framework (IRS) that augments spatial characteristics using simulated room impulse responses (RIR). RIRs corresponding to a microphone array assumed to be placed in various rooms are accurately simulated, and the source signals of the target sound events are extracted from a mixture. The simulated RIRs are then convolved with the extracted source signals to obtain an augmented multi-channel training dataset. Evaluation results obtained using the TAU-NIGENS Spatial Sound Events 2021 dataset show that the IRS contributes to improving the overall SELD performance. Additionally, we conducted an ablation study to discuss the contribution of and need for each component within the IRS.
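The convolution step described above can be sketched as follows. This is a minimal pure-Python illustration, assuming the per-microphone RIRs and the extracted single-channel source signal are already available as lists of samples; the RIR simulation and source extraction stages of the IRS are not reproduced here, and the names `convolve` and `spatialize` are hypothetical.

```python
def convolve(signal, kernel):
    """Full linear convolution of two 1-D sample sequences."""
    if not signal or not kernel:
        return []
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def spatialize(source, rirs):
    """Convolve one source signal with one RIR per microphone channel.

    source : list of samples of the extracted (single-channel) source signal
    rirs   : list of RIRs, one list of samples per microphone
    Returns a multi-channel signal as a list of channels.
    """
    return [convolve(source, rir) for rir in rirs]
```

With a two-microphone array whose second RIR is a one-sample delay, `spatialize([1.0, 2.0, 3.0], [[1.0, 0.0], [0.0, 1.0]])` yields the original signal on the first channel and a delayed copy on the second. A practical pipeline would more likely use an FFT-based convolution for long RIRs.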
Submitted 28 April, 2022; v1 submitted 13 October, 2021;
originally announced October 2021.
-
Ensemble of ACCDOA- and EINV2-based Systems with D3Nets and Impulse Response Simulation for Sound Event Localization and Detection
Authors:
Kazuki Shimada,
Naoya Takahashi,
Yuichiro Koyama,
Shusuke Takahashi,
Emiru Tsunoo,
Masafumi Takahashi,
Yuki Mitsufuji
Abstract:
This report describes our systems submitted to the DCASE2021 challenge task 3: sound event localization and detection (SELD) with directional interference. Our previous system based on activity-coupled Cartesian direction of arrival (ACCDOA) representation enables us to solve a SELD task with a single target. This ACCDOA-based system with efficient network architecture called RD3Net and data augmentation techniques outperformed state-of-the-art SELD systems in terms of localization and location-dependent detection. Using the ACCDOA-based system as a base, we perform model ensembles by averaging outputs of several systems trained with different conditions such as input features, training folds, and model architectures. We also use the event independent network v2 (EINV2)-based system to increase the diversity of the model ensembles. To generalize the models, we further propose impulse response simulation (IRS), which generates simulated multi-channel signals by convolving simulated room impulse responses (RIRs) with source signals extracted from the original dataset. Our systems significantly improved over the baseline system on the development dataset.
Submitted 20 June, 2021;
originally announced June 2021.
-
Decomposing Normal and Abnormal Features of Medical Images into Discrete Latent Codes for Content-Based Image Retrieval
Authors:
Kazuma Kobayashi,
Ryuichiro Hataya,
Yusuke Kurose,
Mototaka Miyake,
Masamichi Takahashi,
Akiko Nakagawa,
Tatsuya Harada,
Ryuji Hamamoto
Abstract:
In medical imaging, the characteristics purely derived from a disease should reflect the extent to which abnormal findings deviate from the normal features. Indeed, physicians often need corresponding images without abnormal findings of interest or, conversely, images that contain similar abnormal findings regardless of normal anatomical context. This is called comparative diagnostic reading of medical images, which is essential for a correct diagnosis. To support comparative diagnostic reading, content-based image retrieval (CBIR) that can selectively utilize normal and abnormal features in medical images as two separable semantic components will be useful. Therefore, we propose a neural network architecture to decompose the semantic components of medical images into two latent codes: normal anatomy code and abnormal anatomy code. The normal anatomy code represents normal anatomies that should have existed if the sample were healthy, whereas the abnormal anatomy code represents abnormal changes that reflect deviation from the normal baseline. These latent codes are discretized through vector quantization to enable binary hashing, which can reduce the computational burden at the time of similarity search. By calculating the similarity based on either normal or abnormal anatomy codes or the combination of the two codes, our algorithm can retrieve images according to the selected semantic component from a dataset consisting of brain magnetic resonance images of gliomas. Our CBIR system qualitatively and quantitatively achieves remarkable results.
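The idea of searching by a selected semantic component over binary codes can be illustrated with a small Python sketch. It assumes each image has already been encoded into two binary hash codes stored as integers and ranks images by Hamming distance; the encoder, the vector quantization, and the exact similarity measure used in the paper are not reproduced, and all names here (`hamming`, `retrieve`, the `mode` values) are hypothetical.

```python
def hamming(a, b):
    """Number of differing bits between two binary codes stored as ints."""
    return bin(a ^ b).count("1")


def retrieve(query, database, mode="both", top_k=3):
    """Rank database images by Hamming distance to the query codes.

    query    : (normal_code, abnormal_code) as ints
    database : dict mapping image id -> (normal_code, abnormal_code)
    mode     : "normal", "abnormal", or "both" (sum of the two distances)
    """
    if mode not in ("normal", "abnormal", "both"):
        raise ValueError("mode must be 'normal', 'abnormal', or 'both'")

    def distance(codes):
        d_normal = hamming(query[0], codes[0])
        d_abnormal = hamming(query[1], codes[1])
        if mode == "normal":
            return d_normal
        if mode == "abnormal":
            return d_abnormal
        return d_normal + d_abnormal

    # Ties are broken by image id so the ranking is deterministic.
    ranked = sorted(database.items(), key=lambda kv: (distance(kv[1]), kv[0]))
    return [image_id for image_id, _ in ranked[:top_k]]
```

Switching `mode` changes which images count as similar: two images with the same anatomy but different lesions are close under "normal" and far apart under "abnormal".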
Submitted 23 March, 2021;
originally announced March 2021.
-
Expandable YOLO: 3D Object Detection from RGB-D Images
Authors:
Masahiro Takahashi,
Alessandro Moro,
Yonghoon Ji,
Kazunori Umeda
Abstract:
This paper aims to construct a lightweight object detector that takes a depth image and a color image from a stereo camera as input. Specifically, by extending the network architecture of YOLOv3 to 3D in the middle, it is possible to output in the depth direction. In addition, Intersection over Union (IoU) in 3D space is introduced to confirm the accuracy of region extraction results. In the field of deep learning, object detectors that use distance information as input are actively studied for use in automated driving. However, conventional detectors have large network structures, which impairs real-time performance. The effectiveness of the detector constructed as described above is verified using datasets. In this experiment, the proposed model is able to output 3D bounding boxes and detect people whose bodies are partially hidden. Further, the processing speed of the model is 44.35 fps.
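As an illustration of IoU in 3D space, the sketch below computes the ratio of intersection volume to union volume for two axis-aligned boxes. The abstract does not specify the box parameterization, so the axis-aligned assumption and the `Box3D` type are hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box3D:
    """Axis-aligned 3D box given by its minimum and maximum corners."""
    x_min: float
    y_min: float
    z_min: float
    x_max: float
    y_max: float
    z_max: float

    def volume(self):
        return (max(0.0, self.x_max - self.x_min)
                * max(0.0, self.y_max - self.y_min)
                * max(0.0, self.z_max - self.z_min))


def iou_3d(a, b):
    """Intersection over Union of two axis-aligned 3D boxes."""
    # Overlap length along each axis; non-positive means no overlap.
    dx = min(a.x_max, b.x_max) - max(a.x_min, b.x_min)
    dy = min(a.y_max, b.y_max) - max(a.y_min, b.y_min)
    dz = min(a.z_max, b.z_max) - max(a.z_min, b.z_min)
    intersection = dx * dy * dz if min(dx, dy, dz) > 0 else 0.0
    union = a.volume() + b.volume() - intersection
    return intersection / union if union > 0 else 0.0
```

For two cubes of side 2 offset by 1 along every axis, the intersection is a unit cube and the union has volume 15, so the IoU is 1/15.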
Submitted 26 June, 2020;
originally announced June 2020.
-
Learning Global and Local Features of Normal Brain Anatomy for Unsupervised Abnormality Detection
Authors:
Kazuma Kobayashi,
Ryuichiro Hataya,
Yusuke Kurose,
Amina Bolatkan,
Mototaka Miyake,
Hirokazu Watanabe,
Masamichi Takahashi,
Jun Itami,
Tatsuya Harada,
Ryuji Hamamoto
Abstract:
In real-world clinical practice, overlooking unanticipated findings can result in serious consequences. However, supervised learning, which is the foundation for the current success of deep learning, only encourages models to identify abnormalities that are defined in datasets in advance. Therefore, abnormality detection must be implemented in medical images that are not limited to a specific disease category. In this study, we demonstrate an unsupervised learning framework for pixel-wise abnormality detection in brain magnetic resonance imaging captured from a patient population with metastatic brain tumor. Our concept is as follows: If an image reconstruction network can faithfully reproduce the global features of normal anatomy, then the abnormal lesions in unseen images can be identified based on the local difference from those reconstructed as normal by a discriminative network. Both networks are trained on a dataset comprising only normal images without labels. In addition, we devise a metric to evaluate the anatomical fidelity of the reconstructed images and confirm that the overall detection performance is improved when the image reconstruction network achieves a higher score. For evaluation, clinically significant abnormalities are comprehensively segmented. The results show that the area under the receiver operating characteristics curve values for metastatic brain tumors, extracranial metastatic tumors, postoperative cavities, and structural changes are 0.78, 0.61, 0.91, and 0.60, respectively.
Submitted 8 May, 2021; v1 submitted 26 May, 2020;
originally announced May 2020.
-
AptaTRACE: Elucidating Sequence-Structure Binding Motifs by Uncovering Selection Trends in HT-SELEX Experiments
Authors:
Phuong Dao,
Jan Hoinka,
Yijie Wang,
Mayumi Takahashi,
Jiehua Zhou,
Fabrizio Costa,
John Rossi,
John Burnett,
Rolf Backofen,
Teresa M. Przytycka
Abstract:
Aptamers, short synthetic RNA/DNA molecules binding specific targets with high affinity and specificity, are utilized in an increasing spectrum of bio-medical applications. Aptamers are identified in vitro via the Systematic Evolution of Ligands by Exponential Enrichment (SELEX) protocol. SELEX selects binders through an iterative process that, starting from a pool of random ssDNA/RNA sequences, amplifies target-affine species through a series of selection cycles. HT-SELEX, which combines SELEX with high throughput sequencing, has recently transformed aptamer development and has opened the field to even more applications. HT-SELEX is capable of generating over half a billion data points, challenging computational scientists with the task of identifying aptamer properties such as sequence structure motifs that determine binding. While currently available motif finding approaches suggest partial solutions to this question, none possess the generality or scalability required for HT-SELEX data, and they do not take advantage of important properties of the experimental procedure.
We present AptaTRACE, a novel approach for the identification of sequence-structure binding motifs in HT-SELEX derived aptamers. Our approach leverages the experimental design of the SELEX protocol and identifies sequence-structure motifs that show a signature of selection. Because of its unique approach, AptaTRACE can uncover motifs even when these are present in only a minuscule fraction of the pool. Due to these features, our method can help to reduce the number of selection cycles required to produce aptamers with the desired properties, thus reducing cost and time of this rather expensive procedure. The performance of the method on simulated and real data indicates that AptaTRACE can detect sequence-structure motifs even in highly challenging data.
Submitted 5 April, 2016;
originally announced April 2016.