
Showing 1–42 of 42 results for author: Shimada, K

  1. arXiv:2402.04542

    cs.CL

    Share What You Already Know: Cross-Language-Script Transfer and Alignment for Sentiment Detection in Code-Mixed Data

    Authors: Niraj Pahari, Kazutaka Shimada

    Abstract: Code-switching entails mixing multiple languages. It is an increasingly occurring phenomenon in social media texts. Usually, code-mixed texts are written in a single script, even though the languages involved have different scripts. Pre-trained multilingual models primarily utilize the data in the native script of the language. In existing studies, the code-switched texts are utilized as they are.…

    Submitted 6 February, 2024; originally announced February 2024.

  2. arXiv:2401.00365

    cs.LG cs.AI cs.CV

    HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes

    Authors: Yuhta Takida, Yukara Ikemiya, Takashi Shibuya, Kazuki Shimada, Woosung Choi, Chieh-Hsin Lai, Naoki Murata, Toshimitsu Uesaka, Kengo Uchida, Wei-Hsiang Liao, Yuki Mitsufuji

    Abstract: Vector quantization (VQ) is a technique to deterministically learn features with discrete codebook representations. It is commonly performed with a variational autoencoding model, VQ-VAE, which can be further extended to hierarchical structures for making high-fidelity reconstructions. However, such hierarchical extensions of VQ-VAE often suffer from the codebook/layer collapse issue, where the co…

    Submitted 28 March, 2024; v1 submitted 30 December, 2023; originally announced January 2024.

    Comments: 34 pages with 17 figures, accepted for TMLR

  3. arXiv:2309.09223

    cs.SD eess.AS

    Zero- and Few-shot Sound Event Localization and Detection

    Authors: Kazuki Shimada, Kengo Uchida, Yuichiro Koyama, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji, Tatsuya Kawahara

    Abstract: Sound event localization and detection (SELD) systems estimate direction-of-arrival (DOA) and temporal activation for sets of target classes. Neural network (NN)-based SELD systems have performed well in various sets of target classes, but they only output the DOA and temporal activation of preset classes trained before inference. To customize target classes after training, we tackle zero- and few…

    Submitted 17 January, 2024; v1 submitted 17 September, 2023; originally announced September 2023.

    Comments: 5 pages, 4 figures, accepted for publication in IEEE ICASSP 2024

  4. arXiv:2309.09121

    cs.RO

    Heuristic-based Incremental Probabilistic Roadmap for Efficient UAV Exploration in Dynamic Environments

    Authors: Zhefan Xu, Christopher Suzuki, Xiaoyang Zhan, Kenji Shimada

    Abstract: Autonomous exploration in dynamic environments necessitates a planner that can proactively respond to changes and make efficient and safe decisions for robots. Although plenty of sampling-based works have shown success in exploring static environments, their inherent sampling randomness and limited utilization of previous samples often result in sub-optimal exploration efficiency. Additionally, mo…

    Submitted 16 September, 2023; originally announced September 2023.

  5. arXiv:2309.08544

    cs.RO

    Quadcopter Trajectory Time Minimization and Robust Collision Avoidance via Optimal Time Allocation

    Authors: Zhefan Xu, Kenji Shimada

    Abstract: Autonomous navigation requires robots to generate trajectories for collision avoidance efficiently. Although plenty of previous works have proven successful in generating smooth and spatially collision-free trajectories, their solutions often suffer from suboptimal time efficiency and potential unsafety, particularly when accounting for uncertainties in robot perception and control. To address thi…

    Submitted 15 September, 2023; originally announced September 2023.

  6. arXiv:2306.09126

    cs.SD cs.CV cs.MM eess.AS eess.IV

    STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events

    Authors: Kazuki Shimada, Archontis Politis, Parthasaarathy Sudarsanam, Daniel Krause, Kengo Uchida, Sharath Adavanne, Aapo Hakala, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Tuomas Virtanen, Yuki Mitsufuji

    Abstract: While direction of arrival (DOA) of sound events is generally estimated from multichannel audio data recorded in a microphone array, sound events usually derive from visually perceptible source objects, e.g., sounds of footsteps come from the feet of a walker. This paper proposes an audio-visual sound event localization and detection (SELD) task, which uses multichannel audio and video information…

    Submitted 14 November, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: 27 pages, 9 figures, accepted for publication in NeurIPS 2023 Track on Datasets and Benchmarks

  7. arXiv:2305.10734

    cs.SD cs.CL eess.AS

    Diffusion-Based Speech Enhancement with Joint Generative and Predictive Decoders

    Authors: Hao Shi, Kazuki Shimada, Masato Hirano, Takashi Shibuya, Yuichiro Koyama, Zhi Zhong, Shusuke Takahashi, Tatsuya Kawahara, Yuki Mitsufuji

    Abstract: Diffusion-based generative speech enhancement (SE) has recently received attention, but reverse diffusion remains time-consuming. One solution is to initialize the reverse diffusion process with enhanced features estimated by a predictive SE system. However, the pipeline structure currently does not consider a combined use of generative and predictive decoders. The predictive decoder allows us…

    Submitted 28 February, 2024; v1 submitted 18 May, 2023; originally announced May 2023.

  8. arXiv:2305.06701

    cs.SD eess.AS

    Extending Audio Masked Autoencoders Toward Audio Restoration

    Authors: Zhi Zhong, Hao Shi, Masato Hirano, Kazuki Shimada, Kazuya Tateishi, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji

    Abstract: Audio classification and restoration are among major downstream tasks in audio signal processing. However, restoration derives less of a benefit from pretrained models compared to the overwhelming success of pretrained models in classification tasks. Due to such unbalanced benefits, there has been rising interest in how to improve the performance of pretrained models for restoration tasks, e.g., s…

    Submitted 17 August, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

    Comments: WASPAA 2023. Copyright 2023 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

  9. arXiv:2305.05857

    eess.AS cs.SD

    Diffusion-based Signal Refiner for Speech Separation

    Authors: Masato Hirano, Kazuki Shimada, Yuichiro Koyama, Shusuke Takahashi, Yuki Mitsufuji

    Abstract: We have developed a diffusion-based speech refiner that improves the reference-free perceptual quality of the audio predicted by preceding single-channel speech separation models. Although modern deep neural network-based speech separation models have shown high performance in reference-based metrics, they often produce perceptually unnatural artifacts. The recent advancements made to diffusion mod…

    Submitted 12 May, 2023; v1 submitted 9 May, 2023; originally announced May 2023.

    Comments: Under review

  10. Onboard dynamic-object detection and tracking for autonomous robot navigation with RGB-D camera

    Authors: Zhefan Xu, Xiaoyang Zhan, Yumeng Xiu, Christopher Suzuki, Kenji Shimada

    Abstract: Deploying autonomous robots in crowded indoor environments usually requires them to have accurate dynamic obstacle perception. Although plenty of previous works in the autonomous driving field have investigated the 3D object detection problem, the usage of dense point clouds from a heavy Light Detection and Ranging (LiDAR) sensor and their high computation cost for learning-based data processing m…

    Submitted 23 November, 2023; v1 submitted 28 February, 2023; originally announced March 2023.

    Comments: 8 pages, 10 figures, 2 tables

    Journal ref: IEEE Robotics and Automation Letters, Volume: 9, Issue: 1, January 2024. Page(s): 651 - 658

  11. arXiv:2302.08136

    cs.SD eess.AS

    An Attention-based Approach to Hierarchical Multi-label Music Instrument Classification

    Authors: Zhi Zhong, Masato Hirano, Kazuki Shimada, Kazuya Tateishi, Shusuke Takahashi, Yuki Mitsufuji

    Abstract: Although music is typically multi-label, many works have studied hierarchical music tagging with simplified settings such as single-label data. Moreover, there is no framework to describe various joint training methods under the multi-label setting. In order to discuss the above topics, we introduce the hierarchical multi-label music instrument classification task. The task provides a realistic sett…

    Submitted 16 February, 2023; originally announced February 2023.

    Comments: To appear at ICASSP 2023

  12. A vision-based autonomous UAV inspection framework for unknown tunnel construction sites with dynamic obstacles

    Authors: Zhefan Xu, Baihan Chen, Xiaoyang Zhan, Yumeng Xiu, Christopher Suzuki, Kenji Shimada

    Abstract: Tunnel construction using the drill-and-blast method requires the 3D measurement of the excavation front to evaluate underbreak locations. Considering the inspection and measurement task's safety, cost, and efficiency, deploying lightweight autonomous robots, such as unmanned aerial vehicles (UAV), becomes more necessary and popular. Most of the previous works use a prior map for inspection viewpo…

    Submitted 12 January, 2024; v1 submitted 19 January, 2023; originally announced January 2023.

    Comments: 8 pages, 8 figures

    Journal ref: IEEE Robotics and Automation Letters, Volume: 8, Issue: 8, June 2023. Page(s): 4983 - 4990

  13. Component Segmentation of Engineering Drawings Using Graph Convolutional Networks

    Authors: Wentai Zhang, Joe Joseph, Yue Yin, Liuyue Xie, Tomotake Furuhata, Soji Yamakawa, Kenji Shimada, Levent Burak Kara

    Abstract: We present a data-driven framework to automate the vectorization and machine interpretation of 2D engineering part drawings. In industrial settings, most manufacturing engineers still rely on manual reads to identify the topological and manufacturing requirements from drawings submitted by designers. The interpretation process is laborious and time-consuming, which severely inhibits the efficiency…

    Submitted 14 March, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

    Comments: Preprint accepted to Computers in Industry

  14. A real-time dynamic obstacle tracking and mapping system for UAV navigation and collision avoidance with an RGB-D camera

    Authors: Zhefan Xu, Xiaoyang Zhan, Baihan Chen, Yumeng Xiu, Chenhao Yang, Kenji Shimada

    Abstract: Real-time dynamic environment perception has become vital for autonomous robots in crowded spaces. Although the popular voxel-based mapping methods can efficiently represent 3D obstacles with arbitrarily complex shapes, they can hardly distinguish between static and dynamic obstacles, leading to limited obstacle avoidance performance. While plenty of sophisticated learning-based dynamic…

    Submitted 12 January, 2024; v1 submitted 17 September, 2022; originally announced September 2022.

    Journal ref: 2023 IEEE International Conference on Robotics and Automation (ICRA)

  15. Vision-aided UAV navigation and dynamic obstacle avoidance using gradient-based B-spline trajectory optimization

    Authors: Zhefan Xu, Yumeng Xiu, Xiaoyang Zhan, Baihan Chen, Kenji Shimada

    Abstract: Navigating dynamic environments requires the robot to generate collision-free trajectories and actively avoid moving obstacles. Most previous works designed path planning algorithms based on one single map representation, such as the geometric, occupancy, or ESDF map. Although they have shown success in static environments, due to the limitation of map representation, those methods cannot reliably…

    Submitted 12 January, 2024; v1 submitted 14 September, 2022; originally announced September 2022.

    Journal ref: 2023 IEEE International Conference on Robotics and Automation (ICRA)

  16. Robotic Depowdering for Additive Manufacturing Via Pose Tracking

    Authors: Zhenwei Liu, Junyi Geng, Xikai Dai, Tomasz Swierzewski, Kenji Shimada

    Abstract: With the rapid development of powder-based additive manufacturing, depowdering, a process of removing unfused powder that covers 3D-printed parts, has become a major bottleneck to further improve its productiveness. Traditional manual depowdering is extremely time-consuming and costly, and some prior automated systems either require pre-depowdering or lack adaptability to different 3D-printed part…

    Submitted 4 September, 2022; v1 submitted 9 July, 2022; originally announced July 2022.

    Comments: Video link: https://www.youtube.com/watch?v=AUIkyULAhqM

    Journal ref: 2022 IEEE Robotics and Automation Letters

  17. arXiv:2206.01948

    eess.AS cs.SD

    STARSS22: A dataset of spatial recordings of real scenes with spatiotemporal annotations of sound events

    Authors: Archontis Politis, Kazuki Shimada, Parthasaarathy Sudarsanam, Sharath Adavanne, Daniel Krause, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Yuki Mitsufuji, Tuomas Virtanen

    Abstract: This report presents the Sony-TAu Realistic Spatial Soundscapes 2022 (STARSS22) dataset for sound event localization and detection, comprised of spatial recordings of real scenes collected in various interiors of two different sites. The dataset is captured with a high resolution spherical microphone array and delivered in two 4-channel formats, first-order Ambisonics and tetrahedral microphone arr…

    Submitted 2 September, 2022; v1 submitted 4 June, 2022; originally announced June 2022.

  18. LCP-dropout: Compression-based Multiple Subword Segmentation for Neural Machine Translation

    Authors: Keita Nonaka, Kazutaka Yamanouchi, Tomohiro I, Tsuyoshi Okita, Kazutaka Shimada, Hiroshi Sakamoto

    Abstract: In this study, we propose a simple and effective preprocessing method for subword segmentation based on a data compression algorithm. Compression-based subword segmentation has recently attracted significant attention as a preprocessing method for training data in Neural Machine Translation. Among them, BPE/BPE-dropout is one of the fastest and most effective methods compared to conventional approa…

    Submitted 19 March, 2022; v1 submitted 28 February, 2022; originally announced February 2022.

    Comments: 12 pages

    Journal ref: Electronics 11(7), Article number 1014, 2022

  19. Analysis of Leading Communities Contributing to arXiv Information Distribution on Twitter

    Authors: Kyosuke Shimada, Kazuhiro Kazama, Mitsuo Yoshida, Ikki Ohmukai, Sho Sato

    Abstract: To analyze the impact that arXiv is having on the world, in this paper we propose an arXiv information distribution model on Twitter, which has a three-layer structure: arXiv papers, information spreaders, and information collectors. First, we use the HITS algorithm to analyze the arXiv information diffusion network with users as nodes, which is created from three types of behavior on Twitter rega…

    Submitted 15 December, 2021; originally announced December 2021.

    Comments: The 20th IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT '21)

  20. arXiv:2110.07124

    eess.AS cs.SD

    Multi-ACCDOA: Localizing and Detecting Overlapping Sounds from the Same Class with Auxiliary Duplicating Permutation Invariant Training

    Authors: Kazuki Shimada, Yuichiro Koyama, Shusuke Takahashi, Naoya Takahashi, Emiru Tsunoo, Yuki Mitsufuji

    Abstract: Sound event localization and detection (SELD) involves identifying the direction-of-arrival (DOA) and the event class. The SELD methods with a class-wise output format make the model predict activities of all sound event classes and corresponding locations. The class-wise methods can output activity-coupled Cartesian DOA (ACCDOA) vectors, which enable us to solve a SELD task with a single target u…

    Submitted 27 March, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: 5 pages, 3 figures, accepted for publication in IEEE ICASSP 2022

  21. arXiv:2110.06501

    cs.SD eess.AS

    Spatial Data Augmentation with Simulated Room Impulse Responses for Sound Event Localization and Detection

    Authors: Yuichiro Koyama, Kazuhide Shigemi, Masafumi Takahashi, Kazuki Shimada, Naoya Takahashi, Emiru Tsunoo, Shusuke Takahashi, Yuki Mitsufuji

    Abstract: Recording and annotating real sound events for a sound event localization and detection (SELD) task is time consuming, and data augmentation techniques are often favored when the amount of data is limited. However, how to augment the spatial information in a dataset, including unlabeled directional interference events, remains an open research question. Furthermore, directional interference events…

    Submitted 28 April, 2022; v1 submitted 13 October, 2021; originally announced October 2021.

    Comments: 5 pages, 2 figures, accepted for publication in IEEE ICASSP 2022

  22. Spatial mixup: Directional loudness modification as data augmentation for sound event localization and detection

    Authors: Ricardo Falcon-Perez, Kazuki Shimada, Yuichiro Koyama, Shusuke Takahashi, Yuki Mitsufuji

    Abstract: Data augmentation methods have shown great importance in diverse supervised learning problems where labeled data is scarce or costly to obtain. For sound event localization and detection (SELD) tasks several augmentation methods have been proposed, with most borrowing ideas from other domains such as images, speech, or monophonic audio. However, only a few exploit the spatial properties of a full…

    Submitted 12 October, 2021; originally announced October 2021.

    Comments: 5 pages, 2 figures, 4 tables. Submitted to the 2022 International Conference on Acoustics, Speech, & Signal Processing (ICASSP)

  23. arXiv:2109.07024

    cs.RO cs.AI

    DPMPC-Planner: A real-time UAV trajectory planning framework for complex static environments with dynamic obstacles

    Authors: Zhefan Xu, Di Deng, Yiping Dong, Kenji Shimada

    Abstract: Safe UAV navigation is challenging due to the complex environment structures, dynamic obstacles, and uncertainties from measurement noises and unpredictable moving obstacle behaviors. Although plenty of recent works achieve safe navigation in complex static environments with sophisticated mapping algorithms, such as occupancy map and ESDF map, these methods cannot reliably handle dynamic environme…

    Submitted 12 March, 2022; v1 submitted 14 September, 2021; originally announced September 2021.

    Comments: 7 pages, 8 figures

    Journal ref: 2022 IEEE International Conference on Robotics and Automation (ICRA)

  24. arXiv:2106.10806

    eess.AS cs.SD

    Ensemble of ACCDOA- and EINV2-based Systems with D3Nets and Impulse Response Simulation for Sound Event Localization and Detection

    Authors: Kazuki Shimada, Naoya Takahashi, Yuichiro Koyama, Shusuke Takahashi, Emiru Tsunoo, Masafumi Takahashi, Yuki Mitsufuji

    Abstract: This report describes our systems submitted to the DCASE2021 challenge task 3: sound event localization and detection (SELD) with directional interference. Our previous system based on activity-coupled Cartesian direction of arrival (ACCDOA) representation enables us to solve a SELD task with a single target. This ACCDOA-based system with efficient network architecture called RD3Net and data augme…

    Submitted 20 June, 2021; originally announced June 2021.

    Comments: 5 pages, 3 figures, submitted to DCASE2021 task3

  25. arXiv:2105.05163

    cs.IT

    An Efficient Bayes Coding Algorithm for the Non-Stationary Source in Which Context Tree Model Varies from Interval to Interval

    Authors: Koshi Shimada, Shota Saito, Toshiyasu Matsushima

    Abstract: The context tree source is a source model in which the occurrence probability of symbols is determined from a finite past sequence, and is a broader class of sources that includes i.i.d. and Markov sources. In the source model proposed in this paper, the subsequence in each interval is generated from a different context tree model. The Bayes code for such sources requires weighting of th…

    Submitted 13 May, 2021; v1 submitted 11 May, 2021; originally announced May 2021.

  26. Airfoil GAN: Encoding and Synthesizing Airfoils for Aerodynamic Shape Optimization

    Authors: Yuyang Wang, Kenji Shimada, Amir Barati Farimani

    Abstract: The current design of aerodynamic shapes, like airfoils, involves computationally intensive simulations to explore the possible design space. Usually, such design relies on the prior definition of design parameters and places restrictions on synthesizing novel shapes. In this work, we propose a data-driven shape encoding and generating method, which automatically learns representations from existi…

    Submitted 6 July, 2023; v1 submitted 12 January, 2021; originally announced January 2021.

    Comments: Published in Journal of Computational Design and Engineering. 13 pages, 13 figures, 1 table

  27. arXiv:2011.05323

    cs.RO

    Robotic Exploration of Unknown 2D Environment Using a Frontier-based Automatic-Differentiable Information Gain Measure

    Authors: Di Deng, Runlin Duan, Jiahong Liu, Kuangjie Sheng, Kenji Shimada

    Abstract: At the heart of path-planning methods for autonomous robotic exploration is a heuristic which encourages exploring unknown regions of the environment. Such heuristics are typically computed using frontier-based or information-theoretic methods. Frontier-based methods define the information gain of an exploration path as the number of boundary cells, or frontiers, which are visible from the path. H…

    Submitted 10 November, 2020; originally announced November 2020.

  28. arXiv:2011.05288

    cs.RO

    Frontier-based Automatic-differentiable Information Gain Measure for Robotic Exploration of Unknown 3D Environments

    Authors: Di Deng, Zhefan Xu, Wenbo Zhao, Kenji Shimada

    Abstract: The path planning problem for autonomous exploration of an unknown region by a robotic agent typically employs frontier-based or information-theoretic heuristics. Frontier-based heuristics typically evaluate the information gain of a viewpoint by the number of visible frontier voxels, which is a discrete measure that can only be optimized by sampling. On the other hand, information-theoretic heuri…

    Submitted 10 November, 2020; originally announced November 2020.

  29. arXiv:2011.05275

    cs.RO

    Coordinated Aerial-Ground Robot Exploration via Monte-Carlo View Quality Rendering

    Authors: Di Deng, Zhefan Xu, Wenbo Zhao, Kenji Shimada

    Abstract: We present a framework for a ground-aerial robotic team to explore large, unstructured, and unknown environments. In such exploration problems, the effectiveness of existing exploration-boosting heuristics often scales poorly with the environments' size and complexity. This work proposes a novel framework combining incremental frontier distribution, goal selection with Monte-Carlo view quality ren…

    Submitted 10 November, 2020; originally announced November 2020.

  30. arXiv:2010.15306

    eess.AS cs.SD

    ACCDOA: Activity-Coupled Cartesian Direction of Arrival Representation for Sound Event Localization and Detection

    Authors: Kazuki Shimada, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Yuki Mitsufuji

    Abstract: Neural-network (NN)-based methods show high performance in sound event localization and detection (SELD). Conventional NN-based methods use two branches for a sound event detection (SED) target and a direction-of-arrival (DOA) target. The two-branch representation with a single network has to decide how to balance the two objectives during optimization. Using two networks dedicated to each task in…

    Submitted 14 February, 2021; v1 submitted 28 October, 2020; originally announced October 2020.

    Comments: 5 pages, 5 figures, accepted for publication in IEEE ICASSP 2021

  31. Autonomous UAV Exploration of Dynamic Environments via Incremental Sampling and Probabilistic Roadmap

    Authors: Zhefan Xu, Di Deng, Kenji Shimada

    Abstract: Autonomous exploration requires robots to generate informative trajectories iteratively. Although sampling-based methods are highly efficient in unmanned aerial vehicle exploration, many of these methods do not effectively utilize the sampled information from the previous planning iterations, leading to redundant computation and longer exploration time. Also, few have explicitly shown their explor…

    Submitted 20 March, 2021; v1 submitted 14 October, 2020; originally announced October 2020.

    Comments: 8 Pages, 9 Figures, and 5 Tables. Video Link: https://youtu.be/ileyP4DRBjU. Github Link: https://github.com/Zhefan-Xu/DEP

    Journal ref: IEEE Robotics and Automation Letters, Volume: 6, Issue: 2, April 2021. Page(s): 2729 - 2736

  32. arXiv:2009.08924

    cs.CV

    Multi-Resolution Graph Neural Network for Large-Scale Pointcloud Segmentation

    Authors: Liuyue Xie, Tomotake Furuhata, Kenji Shimada

    Abstract: In this paper, we propose a multi-resolution deep-learning architecture to semantically segment dense large-scale pointclouds. Dense pointcloud data require a computationally expensive feature encoding process before semantic segmentation. Previous work has used different approaches to drastically downsample from the original pointcloud so common computing hardware can be utilized. While these app…

    Submitted 18 September, 2020; originally announced September 2020.

    Journal ref: Conference on Robot Learning, 2020, 184

  33. arXiv:2007.13065

    cs.RO

    Multi-UAV Coverage Path Planning for the Inspection of Large and Complex Structures

    Authors: Wei Jing, Di Deng, Yan Wu, Kenji Shimada

    Abstract: We present a multi-UAV Coverage Path Planning (CPP) framework for the inspection of large-scale, complex 3D structures. In the proposed sampling-based coverage path planning method, we formulate the multi-UAV inspection applications as a multi-agent coverage path planning problem. By combining two NP-hard problems: Set Covering Problem (SCP) and Vehicle Routing Problem (VRP), a Set-Covering Vehicl…

    Submitted 26 July, 2020; originally announced July 2020.

    Comments: Accepted by IROS2020

  34. arXiv:2006.12014

    eess.AS cs.SD

    Sound Event Localization and Detection Using Activity-Coupled Cartesian DOA Vector and RD3net

    Authors: Kazuki Shimada, Naoya Takahashi, Shusuke Takahashi, Yuki Mitsufuji

    Abstract: Our systems submitted to the DCASE2020 task 3: Sound Event Localization and Detection (SELD) are described in this report. We consider two systems: a single-stage system that solves sound event localization (SEL) and sound event detection (SED) simultaneously, and a two-stage system that first handles the SED and SEL tasks individually and later combines those results. As the single-stage system, w…

    Submitted 7 October, 2020; v1 submitted 22 June, 2020; originally announced June 2020.

    Comments: Submitted to DCASE2020 task3

  35. arXiv:1911.09864

    cs.RO

    Constrained Heterogeneous Vehicle Path Planning for Large-area Coverage

    Authors: Di Deng, Wei Jing, Yuhe Fu, Ziyin Huang, Jiahong Liu, Kenji Shimada

    Abstract: There is a strong demand for covering a large area autonomously by multiple UAVs (Unmanned Aerial Vehicles) supported by a ground vehicle. Limited by UAVs' battery life and communication distance, complete coverage of large areas typically involves multiple take-offs and landings to recharge batteries, and the transportation of UAVs between operation areas by a ground vehicle. In this paper, we in…

    Submitted 22 November, 2019; originally announced November 2019.

  36. arXiv:1910.13724

    eess.AS cs.LG cs.SD

    Metric Learning with Background Noise Class for Few-shot Detection of Rare Sound Events

    Authors: Kazuki Shimada, Yuichiro Koyama, Akira Inoue

    Abstract: Few-shot learning systems for sound event recognition have gained interest since they require only a few examples to adapt to new target classes without fine-tuning. However, such systems have only been applied to chunks of sounds for classification or verification. In this paper, we aim to achieve few-shot detection of rare sound events, from query sequences that contain not only the target event…

    Submitted 18 February, 2020; v1 submitted 30 October, 2019; originally announced October 2019.

    Comments: 5 pages, 5 figures, accepted for publication in IEEE ICASSP 2020

  37. arXiv:1908.02901

    cs.RO

    Coverage Path Planning using Path Primitive Sampling and Primitive Coverage Graph for Visual Inspection

    Authors: Wei Jing, Di Deng, Zhe Xiao, Yong Liu, Kenji Shimada

    Abstract: Planning the path to gather the surface information of the target objects is crucial for improving the efficiency and reducing the overall cost of visual inspection applications with Unmanned Aerial Vehicles (UAVs). The Coverage Path Planning (CPP) problem is often formulated for these inspection applications because of the coverage requirement. Traditionally, researchers usually plan and optimize the…

    Submitted 7 August, 2019; originally announced August 2019.

    Comments: Accepted by IROS 2019, 8 pages

  38. arXiv:1906.08809

    cs.LG cs.AI stat.ML

    A Deep Reinforcement Learning Approach for Global Routing

    Authors: Haiguang Liao, Wentai Zhang, Xuliang Dong, Barnabas Poczos, Kenji Shimada, Levent Burak Kara

    Abstract: Global routing has been a historically challenging problem in electronic circuit design, where the challenge is to connect a large and arbitrary number of circuit components with wires without violating the design rules for the printed circuit boards or integrated circuits. Similar routing problems also exist in the design of complex hydraulic systems, pipe systems and logistic networks. Existing…

    Submitted 20 June, 2019; originally announced June 2019.

    Comments: Preprint submitted to ASME JMD

  39. arXiv:1904.07964

    cs.LG cs.CG cs.NE stat.ML

    3D Shape Synthesis for Conceptual Design and Optimization Using Variational Autoencoders

    Authors: Wentai Zhang, Zhangsihao Yang, Haoliang Jiang, Suyash Nigam, Soji Yamakawa, Tomotake Furuhata, Kenji Shimada, Levent Burak Kara

    Abstract: We propose a data-driven 3D shape design method that can learn a generative model from a corpus of existing designs, and use this model to produce a wide range of new designs. The approach learns an encoding of the samples in the training corpus using an unsupervised variational autoencoder-decoder architecture, without the need for an explicit parametric representation of the original designs. To…

    Submitted 16 April, 2019; originally announced April 2019.

    Comments: Preprint accepted by ASME IDETC/CIE 2019

  40. arXiv:1903.09341

    cs.SD cs.LG eess.AS stat.ML

    Unsupervised Speech Enhancement Based on Multichannel NMF-Informed Beamforming for Noise-Robust Automatic Speech Recognition

    Authors: Kazuki Shimada, Yoshiaki Bando, Masato Mimura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara

    Abstract: This paper describes multichannel speech enhancement for improving automatic speech recognition (ASR) in noisy environments. Recently, minimum variance distortionless response (MVDR) beamforming has been widely used because it works well if the steering vector of speech and the spatial covariance matrix (SCM) of noise are given. To estimate such spatial information, conventional studies take…

    Submitted 31 March, 2019; v1 submitted 21 March, 2019; originally announced March 2019.

  41. Data-driven Upsampling of Point Clouds

    Authors: Wentai Zhang, Haoliang Jiang, Zhangsihao Yang, Soji Yamakawa, Kenji Shimada, Levent Burak Kara

    Abstract: High quality upsampling of sparse 3D point clouds is critically useful for a wide range of geometric operations such as reconstruction, rendering, meshing, and analysis. In this paper, we propose a data-driven algorithm that enables an upsampling of 3D point clouds without the need for hard-coded rules. Our approach uses a deep network with Chamfer distance as the loss function, capable of learnin…

    Submitted 27 December, 2018; v1 submitted 7 July, 2018; originally announced July 2018.

    Comments: Preprint submitted to CAD

    Journal ref: Computer-Aided Design, Volume 112, Pages 1-13, 2019

  42. arXiv:1803.02723

    cs.RO

    Heterogeneous Vehicles Routing for Water Canal Damage Assessment

    Authors: Di Deng, Tao Pang, Prasanth Palli, Fang Shu, Kenji Shimada

    Abstract: In Japan, inspection of irrigation water canals has been mostly conducted manually. However, the huge demand for more regular inspections as infrastructure ages, coupled with the limited time window available for inspection, has rendered manual inspection increasingly insufficient. With shortened inspection time and reduced labor cost, automated inspection using a combination of unmanned aerial ve…

    Submitted 7 March, 2018; originally announced March 2018.