Towards the Development of a Tendon-Actuated Galvanometer for Endoscopic Surgical Laser Scanning

Authors: Kent K. Yamamoto, Tanner J. Zachem, Behnam Moradkhani, Yash Chitalia, Patrick J. Codd

Abstract: There is a need for precision pathological sensing, imaging, and tissue manipulation in neurosurgical procedures, such as brain tumor resection. Precise tumor margin identification and resection can prevent further growth and protect critical structures. Surgical lasers with small laser diameters and steering capabilities can allow for new minimally invasive procedures by traversing through complex anatomy, then providing energy to sense, visualize, and affect tissue. In this paper, we present the design of a small-scale tendon-actuated galvanometer (TAG) that can serve as an end-effector tool for a steerable surgical laser. The galvanometer sensor design, fabrication, and kinematic modeling are presented and derived. It can accurately rotate up to 30.14 degrees (or a laser reflection angle of 60.28 degrees). A kinematic mapping of input tendon stroke to output galvanometer angle change and a forward-kinematics model relating the end of the continuum joint to the laser end-point are derived and validated.

Submitted 5 June, 2024; originally announced June 2024.

Comments: 6 pages, 7 figures, conference paper at the 2024 International Symposium on Medical Robotics

arXiv:2405.20509 [pdf, ps, other]

An FBG-based Stiffness Estimation Sensor for In-vivo Diagnostics

Authors: Behnam Moradkhani, Pejman Kheradmand, Harshith Jella, Kent K. Yamamoto, Alireza Tofangchi, Patrick J. Codd, Yash Chitalia

Abstract: In-vivo tissue stiffness identification can be useful in pulmonary fibrosis diagnostics and minimally invasive tumor identification, among many other applications. In this work, we propose a palpation-based method for tissue stiffness estimation that uses a sensorized beam buckled onto the surface of a tissue. Fiber Bragg Gratings (FBGs) are used in our sensor as a shape-estimation modality to get real-time beam shape, even while the device is not visually monitored. A mechanical model is developed to predict the behavior of a buckling beam and is validated using finite element analysis and bench-top testing with phantom tissue samples (made of PDMS and PA-Gel). Bench-top estimations were conducted and the results were compared with the actual stiffness values. Mean RMSE and standard deviation (from the actual stiffnesses) values of 413.86 kPa and 313.82 kPa were obtained. Estimations for softer samples were relatively closer to the actual values. Ultimately, we used the stiffness sensor within a mock concentric tube robot as a demonstration of in-vivo sensor feasibility. Bench-top trials with and without the robot demonstrate the effectiveness of this unique sensing modality in in-vivo applications.

Submitted 30 May, 2024; originally announced May 2024.

Comments: 6 pages (excluding the references), 5 figures

arXiv:2404.03161 [pdf, other]

BioVL-QR: Egocentric Biochemical Video-and-Language Dataset Using Micro QR Codes

Authors: Taichi Nishimura, Koki Yamamoto, Yuto Haneji, Keiya Kajimura, Chihiro Nishiwaki, Eriko Daikoku, Natsuko Okuda, Fumihito Ono, Hirotaka Kameko, Shinsuke Mori

Abstract: This paper introduces a biochemical vision-and-language dataset, which consists of 24 egocentric experiment videos, corresponding protocols, and video-and-language alignments. The key challenge in the wet-lab domain is that detecting equipment, reagents, and containers is difficult because the lab environment is cluttered with objects on the table and some objects are indistinguishable. Therefore, previous studies assume that objects are manually annotated and given for downstream tasks, but this is costly and time-consuming. To address this issue, this study focuses on Micro QR Codes to detect objects automatically. From our preliminary study, we found that detecting objects using only Micro QR Codes is still difficult because the researchers manipulate objects, frequently causing blur and occlusion. To address this, we also propose a novel object labeling method that combines a Micro QR Code detector and an off-the-shelf hand object detector. As one of the applications of our dataset, we conduct the task of generating protocols from experiment videos and find that our approach can generate accurate protocols.

Submitted 3 April, 2024; originally announced April 2024.

Comments: 6 pages

arXiv:2401.16971 [pdf, other]

Autonomy Loops for Monitoring, Operational Data Analytics, Feedback, and Response in HPC Operations

Authors: Francieli Boito, Jim Brandt, Valeria Cardellini, Philip Carns, Florina M. Ciorba, Hilary Egan, Ahmed Eleliemy, Ann Gentile, Thomas Gruber, Jeff Hanson, Utz-Uwe Haus, Kevin Huck, Thomas Ilsche, Thomas Jakobsche, Terry Jones, Sven Karlsson, Abdullah Mueen, Michael Ott, Tapasya Patki, Ivy Peng, Krishnan Raghavan, Stephen Simms, Kathleen Shoga, Michael Showerman, Devesh Tiwari, et al. (2 additional authors not shown)

Abstract: Many High Performance Computing (HPC) facilities have developed and deployed frameworks in support of continuous monitoring and operational data analytics (MODA) to help improve efficiency and throughput. Because of the complexity and scale of systems and workflows and the need for low-latency response to address dynamic circumstances, automated feedback and response have the potential to be more effective than current human-in-the-loop approaches which are laborious and error prone. Progress has been limited, however, by factors such as the lack of infrastructure and feedback hooks, and successful deployment is often site- and case-specific. In this position paper we report on the outcomes and plans from a recent Dagstuhl Seminar, seeking to carve a path for community progress in the development of autonomous feedback loops for MODA, based on the established formalism of similar (MAPE-K) loops in autonomous computing and self-adaptive systems. By defining and developing such loops for significant cases experienced across HPC sites, we seek to extract commonalities and develop conventions that will facilitate interoperability and interchangeability with system hardware, software, and applications across different sites, and will motivate vendors and others to provide telemetry interfaces and feedback hooks to enable community development and pervasive deployment of MODA autonomy loops.

Submitted 30 January, 2024; originally announced January 2024.

arXiv:2401.08821 [pdf]

Surface-Enhanced Raman Spectroscopy and Transfer Learning Toward Accurate Reconstruction of the Surgical Zone

Authors: Ashutosh Raman, Ren A. Odion, Kent K. Yamamoto, Weston Ross, Tuan Vo-Dinh, Patrick J. Codd

Abstract: Raman spectroscopy, a photonic modality based on the inelastic backscattering of coherent light, is a valuable asset to the intraoperative sensing space, offering non-ionizing potential and highly-specific molecular fingerprint-like spectroscopic signatures that can be used for diagnosis of pathological tissue in the dynamic surgical field. Though Raman suffers from weakness in intensity, Surface-Enhanced Raman Spectroscopy (SERS), which uses metal nanostructures to amplify Raman signals, can achieve detection sensitivities that rival traditional photonic modalities. In this study, we outline a robotic Raman system that can reliably pinpoint the location and boundaries of a tumor embedded in healthy tissue, modeled here as a tissue-mimicking phantom with selectively infused Gold Nanostar regions. Further, due to the relative dearth of collected biological SERS or Raman data, we implement transfer learning to achieve 100% validation classification accuracy for Gold Nanostars compared to Control Agarose, thus providing a proof-of-concept for Raman-based deep learning training pipelines. We reconstruct a surgical field of 30x60 mm in 10.2 minutes, and achieve 98.2% accuracy, preserving relative measurements between features in the phantom. We also achieve an 84.3% Intersection-over-Union score, which is the extent of overlap between the ground truth and predicted reconstructions. Lastly, we also demonstrate that the Raman system and classification algorithm do not discern based on sample color, but instead on presence of SERS agents. This study provides a crucial step in the translation of intelligent Raman systems in intraoperative oncological spaces.
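Note: the Intersection-over-Union score mentioned in this abstract can be illustrated with a minimal sketch. The function name, the nested-list mask format, and the toy masks below are assumptions made for illustration; they are not taken from the paper.

```python
def intersection_over_union(truth, pred):
    """IoU of two same-shaped binary masks given as nested lists of 0/1."""
    intersection = 0
    union = 0
    for truth_row, pred_row in zip(truth, pred):
        for t, p in zip(truth_row, pred_row):
            if t and p:
                intersection += 1
            if t or p:
                union += 1
    # Two empty masks overlap perfectly by convention (an assumption of this sketch).
    return intersection / union if union else 1.0


# Hypothetical 3x3 ground-truth and predicted masks.
truth = [[0, 1, 1],
         [0, 1, 1],
         [0, 0, 0]]
pred = [[0, 1, 1],
        [0, 0, 1],
        [0, 0, 1]]
iou = intersection_over_union(truth, pred)
```

Here three cells are marked in both masks and five cells are marked in at least one, so the score is 3/5.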

Submitted 16 January, 2024; originally announced January 2024.

Comments: Accepted to Hamlyn Symposium on Medical Robotics, 2023

arXiv:2312.13787 [pdf, other]

User-adaptive Tourist Information Dialogue System with Yes/No Classifier and Sentiment Estimator

Authors: Ryo Yanagimoto, Yunosuke Kubo, Miki Oshio, Mikio Nakano, Kenta Yamamoto, Kazunori Komatani

Abstract: We introduce our system developed for Dialogue Robot Competition 2023 (DRC2023). First, rule-based utterance selection and utterance generation using a large language model (LLM) are combined. We ensure the quality of system utterances while also being able to respond to unexpected user utterances. Second, dialogue flow is controlled by considering the results of the BERT-based yes/no classifier and sentiment estimator. These allow the system to adapt state transitions and sightseeing plans to the user.

Submitted 21 December, 2023; originally announced December 2023.

Comments: This paper is part of the proceedings of the Dialogue Robot Competition 2023

arXiv:2311.04323 [pdf, other]

Incident Angle Study for Designing an Endoscopic Tool for Intraoperative Brain Tumor Detection

Authors: Kent Y. Yamamoto, Tanner J. Zachem, Weston A. Ross, Patrick J. Codd

Abstract: In neurosurgical procedures, maximizing the resection of tumor tissue while avoiding healthy tissue is of paramount importance and a difficult task due to many factors, such as surrounding eloquent brain. Swiftly identifying tumor tissue for removal could improve surgical outcomes. The TumorID is a laser-induced fluorescence spectroscopy device that utilizes endogenous fluorophores such as NADH and FAD to detect tumor regions. With the goal of creating an endoscopic tool for intraoperative tumor detection in mind, a study of the TumorID was conducted to assess how the angle of incidence (AoI) affects the collected spectral response of the scanned tumor. For this study, flat and convex NADH/FAD gellan gum phantoms were scanned at various AoI (a range of 36 degrees) to observe the spectral behavior. Results showed that spectral signature did not change significantly across flat and convex phantoms, and the Area under Curve (AUC) values calculated for each spectrum had a standard deviation of 0.02 and 0.01 for flat and convex phantoms, respectively. Therefore, the study showed that AoI will affect the intensity of the spectral response, but the peaks representative of the endogenous fluorophores are still observable and similar. Future work includes conducting an AoI study with a longer working-distance lens, then incorporating said lens to design an endoscopic, intraoperative tumor detection device for minimally invasive surgery, with first applications in endonasal endoscopic approaches for pituitary tumors.

Submitted 7 November, 2023; originally announced November 2023.

Comments: Accepted for publication in Hamlyn Symposium on Medical Robotics, 2023

arXiv:2310.12404 [pdf, other]

Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing

Authors: Yixiao Zhang, Akira Maezawa, Gus Xia, Kazuhiko Yamamoto, Simon Dixon

Abstract: Creating music is iterative, requiring varied methods at each stage. However, existing AI music systems fall short in orchestrating multiple subsystems for diverse needs. To address this gap, we introduce Loop Copilot, a novel system that enables users to generate and iteratively refine music through an interactive, multi-round dialogue interface. The system uses a large language model to interpret user intentions and select appropriate AI models for task execution. Each backend model is specialized for a specific task, and their outputs are aggregated to meet the user's requirements. To ensure musical coherence, essential attributes are maintained in a centralized table. We evaluate the effectiveness of the proposed system through semi-structured interviews and questionnaires, highlighting its utility not only in facilitating music creation but also its potential for broader applications.

Submitted 18 October, 2023; originally announced October 2023.

Comments: Source code and demo video are available at https://sites.google.com/view/loop-copilot

arXiv:2309.12547 [pdf, other]

Real-time Motion Generation and Data Augmentation for Grasping Moving Objects with Dynamic Speed and Position Changes

Authors: Kenjiro Yamamoto, Hiroshi Ito, Hideyuki Ichiwara, Hiroki Mori, Tetsuya Ogata

Abstract: While deep learning enables real robots to perform complex tasks that had been difficult to implement in the past, the challenge is the enormous amount of trial-and-error and motion teaching in a real environment. The manipulation of moving objects, due to their dynamic properties, requires learning a wide range of factors such as the object's position, movement speed, and grasping timing. We propose a data augmentation method for enabling a robot to grasp moving objects with different speeds and grasping timings at low cost. Specifically, the robot is taught to grasp an object moving at low speed using teleoperation, and multiple data with different speeds and grasping timings are generated by down-sampling and padding the robot sensor data in the time-series direction. By learning multiple sensor data in a time series, the robot can generate motions while adjusting the grasping timing for unlearned movement speeds and sudden speed changes. We have shown using a real robot that this data augmentation method facilitates learning the relationship between object position and velocity and enables the robot to perform robust grasping motions for unlearned positions and objects with dynamically changing positions and velocities.
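Note: the down-sampling and padding step described in this abstract might look roughly like the sketch below. The function name `augment`, its parameters, and the choice to pad by repeating the first and last samples are assumptions for illustration; the abstract does not specify these details.

```python
def augment(sequence, factor, lead=0):
    """Return a same-length variant of `sequence` that plays `factor` times faster.

    Every `factor`-th sample is kept (down-sampling). `lead` copies of the
    first sample are padded in front to shift the timing, and the last kept
    sample is repeated at the end to restore the original length.
    """
    if not sequence:
        raise ValueError("sequence must not be empty")
    if factor < 1 or lead < 0:
        raise ValueError("factor must be >= 1 and lead must be >= 0")
    sampled = sequence[::factor]
    body = ([sequence[0]] * lead + sampled)[:len(sequence)]
    tail = [body[-1]] * (len(sequence) - len(body))
    return body + tail


# Hypothetical one-dimensional sensor trace recorded at low speed.
slow = [0, 1, 2, 3, 4, 5, 6, 7]
fast = augment(slow, factor=2)             # same motion at twice the speed
delayed = augment(slow, factor=2, lead=2)  # same, starting two steps later
```

Varying `factor` and `lead` over one taught demonstration yields several training sequences with different speeds and timings.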

Submitted 21 September, 2023; originally announced September 2023.

arXiv:2307.13007 [pdf, other]

Sparse-firing regularization methods for spiking neural networks with time-to-first spike coding

Authors: Yusuke Sakemi, Kakei Yamamoto, Takeo Hosomi, Kazuyuki Aihara

Abstract: The training of multilayer spiking neural networks (SNNs) using the error backpropagation algorithm has made significant progress in recent years. Among the various training schemes, the error backpropagation method that directly uses the firing time of neurons has attracted considerable attention because it can realize ideal temporal coding. This method uses time-to-first spike (TTFS) coding, in which each neuron fires at most once, and this restriction on the number of firings enables information to be processed at a very low firing frequency. This low firing frequency increases the energy efficiency of information processing in SNNs, which is important not only because of its similarity with information processing in the brain, but also from an engineering point of view. However, only an upper limit has been provided for TTFS-coded SNNs, and the information-processing capability of SNNs at lower firing frequencies has not been fully investigated. In this paper, we propose two spike timing-based sparse-firing (SSR) regularization methods to further reduce the firing frequency of TTFS-coded SNNs. The first is the membrane potential-aware SSR (M-SSR) method, which has been derived as an extreme form of the loss function of the membrane potential value. The second is the firing condition-aware SSR (F-SSR) method, which is a regularization function obtained from the firing conditions. Both methods are characterized by the fact that they only require information about the firing timing and associated weights. The effects of these regularization methods were investigated on the MNIST, Fashion-MNIST, and CIFAR-10 datasets using multilayer perceptron networks and convolutional neural network structures.

Submitted 24 July, 2023; originally announced July 2023.

arXiv:2211.16113 [pdf, other]

Timing-Based Backpropagation in Spiking Neural Networks Without Single-Spike Restrictions

Authors: Kakei Yamamoto, Yusuke Sakemi, Kazuyuki Aihara

Abstract: We propose a novel backpropagation algorithm for training spiking neural networks (SNNs) that encodes information in the relative multiple spike timing of individual neurons without single-spike restrictions. The proposed algorithm inherits the advantages of conventional timing-based methods in that it computes accurate gradients with respect to spike timing, which promotes ideal temporal coding. Unlike conventional methods where each neuron fires at most once, the proposed algorithm allows each neuron to fire multiple times. This extension naturally improves the computational capacity of SNNs. Our SNN model outperformed comparable SNN models and achieved accuracy as high as that of non-convolutional artificial neural networks. The spike count property of our networks changed depending on the time constant of the postsynaptic current and the membrane potential. Moreover, we found that there existed an optimal time constant with the maximum test accuracy, which was not seen in conventional SNNs with single-spike restrictions on time-to-first-spike (TTFS) coding. This result demonstrates the computational properties of SNNs that biologically encode information into the multi-spike timing of individual neurons. Our code will be made publicly available.

Submitted 29 November, 2022; originally announced November 2022.

Comments: 10 pages, 5 figures

ACM Class: I.5.1

arXiv:2207.05902 [pdf, other]

Verifying Attention Robustness of Deep Neural Networks against Semantic Perturbations

Authors: Satoshi Munakata, Caterina Urban, Haruki Yokoyama, Koji Yamamoto, Kazuki Munakata

Abstract: It is known that deep neural networks (DNNs) classify an input image by paying particular attention to certain specific pixels; a graphical representation of the magnitude of attention to each pixel is called a saliency-map. Saliency-maps are used to check the validity of the classification decision basis, e.g., it is not a valid basis for classification if a DNN pays more attention to the background rather than the subject of an image. Semantic perturbations can significantly change the saliency-map. In this work, we propose the first verification method for attention robustness, i.e., the local robustness of the changes in the saliency-map against combinations of semantic perturbations. Specifically, our method determines the range of the perturbation parameters (e.g., the brightness change) that maintains the difference between the actual saliency-map change and the expected saliency-map change below a given threshold value. Our method is based on activation region traversals, focusing on the outermost robust boundary for scalability on larger DNNs. Experimental results demonstrate that our method can show the extent to which DNNs can classify with the same basis regardless of semantic perturbations and report on performance and performance factors of activation region traversals.

Submitted 12 July, 2022; originally announced July 2022.

Comments: 25 pages, 12 figures

ACM Class: D.2.4; I.1.4

arXiv:2112.09407 [pdf, other]

Communication-oriented Model Fine-tuning for Packet-loss Resilient Distributed Inference under Highly Lossy IoT Networks

Authors: Sohei Itahara, Takayuki Nishio, Yusuke Koda, Koji Yamamoto

Abstract: The distributed inference (DI) framework has gained traction as a technique for real-time applications empowered by cutting-edge deep machine learning (ML) on resource-constrained Internet of things (IoT) devices. In DI, computational tasks are offloaded from the IoT device to the edge server via lossy IoT networks. However, there is generally a communication system-level trade-off between communication latency and reliability; thus, to provide accurate DI results, a reliable and high-latency communication system must be adopted, which results in non-negligible end-to-end latency of the DI. This motivated us to improve the trade-off between communication latency and accuracy through ML techniques. Specifically, we have proposed communication-oriented model tuning (COMtune), which aims to achieve highly accurate DI with low-latency but unreliable communication links. In COMtune, the key idea is to fine-tune the ML model by emulating the effect of unreliable communication links through the application of the dropout technique. This enables the DI system to obtain robustness against unreliable communication links. Our ML experiments revealed that COMtune enables accurate predictions with low latency and under lossy networks.
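Note: the idea of emulating unreliable links with dropout, as described in this abstract, can be sketched as follows. This is an illustrative assumption, not the COMtune implementation: the function name, the per-element loss model, and the absence of rescaling for surviving values are all choices made for this sketch.

```python
import random


def emulate_packet_loss(features, loss_rate, rng):
    """Zero each transmitted feature independently with probability `loss_rate`.

    Applied to the intermediate features sent from device to server during
    fine-tuning, this behaves like a dropout layer whose rate stands in for
    the packet-loss rate of the link (an assumption of this sketch).
    """
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be in [0, 1]")
    return [0.0 if rng.random() < loss_rate else x for x in features]


# Hypothetical intermediate activations and a seeded generator for repeatability.
activations = [0.7, 1.3, 0.2, 0.9, 1.1, 0.4]
rng = random.Random(0)
received = emulate_packet_loss(activations, loss_rate=0.5, rng=rng)
```

Each entry of `received` is either the original value or zero, which is what the server-side part of the model would see if lost data were filled with zeros.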

Submitted 17 December, 2021; originally announced December 2021.

Comments: Submitted to IEEE Access

arXiv:2112.06695 [pdf, other]

Bi-directional Beamforming Feedback-based Firmware-agnostic WiFi Sensing: An Empirical Study

Authors: S. Kondo, S. Itahara, K. Yamashita, K. Yamamoto, Y. Koda, T. Nishio, A. Taya

Abstract: In the field of WiFi sensing, as an alternative sensing source of the channel state information (CSI) matrix, the use of a beamforming feedback matrix (BFM), which is a right singular matrix of the CSI matrix, has attracted significant interest owing to its wide availability regarding the underlying WiFi systems. In the IEEE 802.11ac/ax standard, the station (STA) transmits a BFM to an access point (AP), which uses the BFM for precoded multiple-input and multiple-output communications. In addition, in the same way, the AP transmits a BFM to the STA, and the STA uses the received BFM. Regarding BFM-based sensing, extensive real-world experiments were conducted as part of this study, and two key insights were reported: Firstly, this report identified a potential issue related to accuracy in existing uni-directional BFM-based sensing frameworks that leverage only BFMs transmitted for the AP or STA. Such uni-directionality introduces accuracy concerns when there is a sensing capability gap between the uni-directional BFMs for the AP and STA. Thus, this report experimentally evaluates the sensing ability disparity between the uni-directional BFMs, and shows that the BFMs transmitted for an AP achieve higher sensing accuracy compared to the BFMs transmitted from the STA when the sensing target values are estimated depending on the angle of departure of the AP. Secondly, to complement the sensing gap, this paper proposes a bi-directional sensing framework, which simultaneously leverages the BFMs transmitted from the AP and STA. The experimental evaluations reveal that bi-directional sensing achieves higher accuracy than uni-directional sensing in terms of the human localization task.

Submitted 27 February, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

Comments: 10 pages, 7 figures

arXiv:2112.06442 [pdf, ps, other]

Contact-Rich Manipulation of a Flexible Object based on Deep Predictive Learning using Vision and Tactility

Authors: Hideyuki Ichiwara, Hiroshi Ito, Kenjiro Yamamoto, Hiroki Mori, Tetsuya Ogata

Abstract: We achieved contact-rich flexible object manipulation, which was difficult to control with vision alone. In the unzipping task we chose as a validation task, the gripper grasps the puller, which hides the bag state such as the direction and amount of deformation behind it, making it difficult to obtain information to perform the task by vision alone. Additionally, the flexible fabric bag state constantly changes during operation, so the robot needs to dynamically respond to the change. However, the appropriate robot behavior for all bag states is difficult to prepare in advance. To solve this problem, we developed a model that can perform contact-rich flexible object manipulation by real-time prediction of vision with tactility. We introduced a point-based attention mechanism for extracting image features, softmax transformation for predicting motions, and convolutional neural network for extracting tactile features. The results of experiments using a real robot arm revealed that our method can realize motions responding to the deformation of the bag while reducing the load on the zipper. Furthermore, using tactility improved the success rate from 56.7% to 93.3% compared with vision alone, demonstrating the effectiveness and high performance of our method.

Submitted 10 May, 2022; v1 submitted 13 December, 2021; originally announced December 2021.

arXiv:2110.15660 [pdf, other]

doi 10.1109/CCNC49033.2022.9700520

Frame-Capture-Based CSI Recomposition Pertaining to Firmware-Agnostic WiFi Sensing

Authors: Ryosuke Hanahara, Sohei Itahara, Kota Yamashita, Yusuke Koda, Akihito Taya, Takayuki Nishio, Koji Yamamoto

Abstract: With regard to the implementation of WiFi sensing agnostic according to the availability of channel state information (CSI), we investigate the possibility of estimating a CSI matrix based on its compressed version, which is known as beamforming feedback matrix (BFM). Being different from the CSI matrix that is processed and discarded in physical layer components, the BFM can be captured using a medium-access-layer frame-capturing technique because this is exchanged among an access point (AP) and stations (STAs) over the air. This indicates that WiFi sensing that leverages the BFM matrix is more practical to implement using the pre-installed APs. However, the ability of BFM-based sensing has been evaluated in a few tasks, and more general insights into its performance should be provided. To fill this gap, we propose a CSI estimation method based on BFM, approximating the estimation function with a machine learning model. In addition, to improve the estimation accuracy, we leverage the inter-subcarrier dependency using the BFMs at multiple subcarriers in orthogonal frequency division multiplexing transmissions. Our simulation evaluation reveals that the estimated CSI matches the ground-truth amplitude. Moreover, compared to CSI estimation at each individual subcarrier, the effect of the BFMs at multiple subcarriers on the CSI estimation accuracy is validated.

Submitted 29 October, 2021; originally announced October 2021.

Journal ref: Proc. IEEE 19th Annual Consumer Communications & Networking Conference (CCNC 2022)

arXiv:2110.14211 [pdf, other]

Beamforming Feedback-based Model-Driven Angle of Departure Estimation Toward Legacy Support in WiFi Sensing: An Experimental Study

Authors: Sohei Itahara, Sota Kondo, Kota Yamashita, Takayuki Nishio, Koji Yamamoto, Yusuke Koda

Abstract: This study experimentally validated the possibility of angle of departure (AoD) estimation using multiple signal classification (MUSIC) with only WiFi control frames for beamforming feedback (BFF), defined in IEEE 802.11ac/ax. The examined BFF-based MUSIC is a model-driven algorithm, which does not require a pre-obtained database. This contrasts with most existing BFF-based sensing techniques, which are data-driven and require a pre-obtained database. Moreover, the BFF-based MUSIC affords an alternative AoD estimation method without access to channel state information (CSI). Specifically, the extensive experimental and numerical evaluations demonstrated that the BFF-based MUSIC successfully estimates the AoDs for multiple propagation paths. Moreover, the evaluations performed in this study revealed that the BFF-based MUSIC achieved a comparable error of AoD estimation to the CSI-based MUSIC, while BFF is a highly compressed version of CSI in IEEE 802.11ac/ax.

Submitted 2 February, 2022; v1 submitted 27 October, 2021; originally announced October 2021.

Comments: Submitted to IEEE Access

arXiv:2107.05043 [pdf, other]

A Projector-Camera System Using Hybrid Pixels with Projection and Capturing Capabilities

Authors: Kenta Yamamoto, Daisuke Iwai, Kosuke Sato

Abstract: We propose a novel projector-camera system (ProCams) in which each pixel has both projection and capturing capabilities. Our proposed ProCams solves the difficulty of obtaining precise pixel correspondence between the projector and the camera. We implemented a proof-of-concept ProCams prototype and demonstrated its applicability to a dynamic projection mapping.

Submitted 11 July, 2021; originally announced July 2021.

Comments: Author's version of a paper published at IDW (International Display Workshops) 2020

Journal ref: In Proceedings of the International Display Workshops, pp. 655-658, 2020

arXiv:2107.04770 [pdf, other]

Computer Vision-assisted Single-antenna and Single-anchor RSSI Localization Harnessing Dynamic Blockage Events

Authors: Tomoya Sunami, Sohei Itahara, Yusuke Koda, Takayuki Nishio, Koji Yamamoto

Abstract: This paper demonstrates the feasibility of single-antenna and single-RF (radio frequency)-anchor received power strength indicator (RSSI) localization (SARR-LOC) with the assistance of the computer vision (CV) technique. Generally, to perform radio frequency (RF)-based device localization, either 1) fine-grained channel state information or 2) RSSIs from multiple antenna elements or multiple RF anchors (e.g., access points) is required. Meanwhile, owing to the deficiency of single-antenna and single-anchor RSSI, which only indicates coarse-grained distance information between a receiver and a transmitter, realizing localization with single-antenna and single-anchor RSSI is challenging. Our key idea to address this challenge is to leverage the CV technique and to estimate the most likely first Fresnel zone (FFZ) between the receiver and transmitter, where the role of the RSSI is to detect blockage timings. Specifically, historical positions of an obstacle that dynamically blocks the FFZ are detected by the CV technique, and we estimate positions at which a blockage starts and ends via a time series of RSSI. These estimated obstacle positions, in principle, coincide with points on the FFZ boundaries, enabling the estimation of the FFZ and localization of the transmitter. The experimental evaluation revealed that the proposed SARR-LOC achieved a localization error of less than 1.0 m in an indoor environment, which is comparable to that of a conventional triangulation-based RSSI localization with multiple RF anchors.

Submitted 6 December, 2021; v1 submitted 10 July, 2021; originally announced July 2021.

Comments: Submitted to IEEE Internet of Things journal

arXiv:2104.13629 [pdf, other]

Packet-Loss-Tolerant Split Inference for Delay-Sensitive Deep Learning in Lossy Wireless Networks

Authors: Sohei Itahara, Takayuki Nishio, Koji Yamamoto

Abstract: The distributed inference framework is an emerging technology for real-time applications empowered by cutting-edge deep machine learning (ML) on resource-constrained Internet of things (IoT) devices. In distributed inference, computational tasks are offloaded from the IoT device to other devices or the edge server via lossy IoT networks. However, narrow-band and lossy IoT networks cause non-negligible packet losses and retransmissions, resulting in non-negligible communication latency. This study solves the problem of the incremental retransmission latency caused by packet loss in a lossy IoT network. We propose a split inference with no retransmissions (SI-NR) method that achieves high accuracy without any retransmissions, even when packet loss occurs. In SI-NR, the key idea is to train the ML model by emulating the packet loss by a dropout method, which randomly drops the output of hidden units in a DNN layer. This enables the SI-NR system to obtain robustness against packet losses. Our ML experimental evaluation reveals that SI-NR obtains accurate predictions without packet retransmission at a packet loss rate of 60%.

Submitted 28 April, 2021; originally announced April 2021.

arXiv:2104.11811 [pdf, other]

doi 10.1109/CCNC49033.2022.9700707

ACK-Less Rate Adaptation for IEEE 802.11bc Enhanced Broadcast Services Using Sim-to-Real Deep Reinforcement Learning

Authors: T. Kanda, Y. Koda, K. Yamamoto, T. Nishio

Abstract: In IEEE 802.11bc, the broadcast mode on wireless local area networks (WLANs), data rate control based on an acknowledgement (ACK) mechanism similar to the one in the current IEEE 802.11 WLANs is not applicable because the ACK mechanism is not implemented. This paper addresses this challenge by proposing ACK-less data rate adaptation methods that capture non-broadcast uplink frames of STAs. In IEEE 802.11bc, a use case is assumed in which some of the STAs among the broadcast recipients are also associated with non-broadcast APs, and such STAs periodically transmit uplink frames including ACK frames. The proposed method is based on the idea that by overhearing such uplink frames, the broadcast AP surveys channel conditions at partial STAs, thereby setting appropriate data rates for the STAs. Furthermore, in order to avoid reception failures in a large portion of STAs, this paper proposes a deep reinforcement learning (DRL)-based data rate adaptation framework that uses a sim-to-real approach. Therein, information on reception success/failure at broadcast recipient STAs, which could not be notified to the broadcast AP in real deployments, is made available by simulations beforehand, thereby forming data rate adaptation strategies. Numerical results show that utilizing overheard uplink frames of recipients makes it feasible to manage data rates in ACK-less broadcast WLANs, and using the sim-to-real DRL framework can decrease reception failures.

Submitted 23 April, 2021; originally announced April 2021.

Journal ref: Proc. IEEE 19th Annual Consumer Communications & Networking Conference (CCNC 2022)

arXiv:2104.00352 [pdf, other]

doi 10.1109/TSIPN.2022.3205549

Decentralized and Model-Free Federated Learning: Consensus-Based Distillation in Function Space

Authors: Akihito Taya, Takayuki Nishio, Masahiro Morikura, Koji Yamamoto

Abstract: This paper proposes a fully decentralized federated learning (FL) scheme for Internet of Everything (IoE) devices that are connected via multi-hop networks. Because FL algorithms hardly converge the parameters of machine learning (ML) models, this paper focuses on the convergence of ML models in function spaces. Considering that the representative loss functions of ML tasks, e.g., mean squared error (MSE) and Kullback-Leibler (KL) divergence, are convex functionals, algorithms that directly update functions in function spaces could converge to the optimal solution. The key concept of this paper is to tailor a consensus-based optimization algorithm to work in the function space and achieve the global optimum in a distributed manner. This paper first analyzes the convergence of the proposed algorithm in a function space, which is referred to as a meta-algorithm, and shows that the spectral graph theory can be applied to the function space in a manner similar to that of numerical vectors. Then, consensus-based multi-hop federated distillation (CMFD) is developed for a neural network (NN) to implement the meta-algorithm. CMFD leverages knowledge distillation to realize function aggregation among adjacent devices without parameter averaging. An advantage of CMFD is that it works even with different NN models among the distributed learners. Although CMFD does not perfectly reflect the behavior of the meta-algorithm, the discussion of the meta-algorithm's convergence property promotes an intuitive understanding of CMFD, and simulation evaluations show that NN models converge using CMFD for several tasks. The simulation results also show that CMFD achieves higher accuracy than parameter aggregation for weakly connected networks, and CMFD is more stable than parameter aggregation methods.

Submitted 3 October, 2022; v1 submitted 1 April, 2021; originally announced April 2021.

Journal ref: IEEE Transactions on Signal and Information Processing over Networks, vol. 8, pp. 799-814, 2022

arXiv:2103.07156 [pdf, other]

Learnable Companding Quantization for Accurate Low-bit Neural Networks

Authors: Kohei Yamamoto

Abstract: Quantizing deep neural networks is an effective method for reducing memory consumption and improving inference speed, and is thus useful for implementation in resource-constrained devices. However, it is still hard for extremely low-bit models to achieve accuracy comparable with that of full-precision models. To address this issue, we propose learnable companding quantization (LCQ) as a novel non-uniform quantization method for 2-, 3-, and 4-bit models. LCQ jointly optimizes model weights and learnable companding functions that can flexibly and non-uniformly control the quantization levels of weights and activations. We also present a new weight normalization technique that allows more stable training for quantization. Experimental results show that LCQ outperforms conventional state-of-the-art methods and narrows the gap between quantized and full-precision models for image classification and object detection tasks. Notably, the 2-bit ResNet-50 model on ImageNet achieves top-1 accuracy of 75.1% and reduces the gap to 1.7%, allowing LCQ to further exploit the potential of non-uniform quantization.

Submitted 12 March, 2021; originally announced March 2021.

Comments: Accepted at CVPR 2021

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 5029-5038

arXiv:2103.01598 [pdf, ps, other]

Spatial Attention Point Network for Deep-learning-based Robust Autonomous Robot Motion Generation

Authors: Hideyuki Ichiwara, Hiroshi Ito, Kenjiro Yamamoto, Hiroki Mori, Tetsuya Ogata

Abstract: Deep learning provides a powerful framework for automated acquisition of complex robotic motions. However, despite a certain degree of generalization, the need for vast amounts of training data depending on the work-object position is an obstacle to industrial applications. Therefore, a robot motion-generation model that can respond to a variety of work-object positions with a small amount of training data is necessary. In this paper, we propose a method robust to changes in object position by automatically extracting spatial attention points in the image for the robot task and generating motions on the basis of their positions. We demonstrate our method with an LBR iiwa 7R1400 robot arm on a picking task and a pick-and-place task at various positions in various situations. In each task, the spatial attention points are obtained for the work objects that are important to the task. Our method is robust to changes in object position. Further, it is robust to changes in background, lighting, and obstacles that are not important to the task because it only focuses on positions that are important to the task.

Submitted 2 March, 2021; originally announced March 2021.

arXiv:2102.08055 [pdf, other]

Zero-Shot Adaptation for mmWave Beam-Tracking on Overhead Messenger Wires through Robust Adversarial Reinforcement Learning

Authors: Masao Shinzaki, Yusuke Koda, Koji Yamamoto, Takayuki Nishio, Masahiro Morikura, Yushi Shirato, Daisei Uchida, Naoki Kita

Abstract: Millimeter wave (mmWave) beam-tracking based on machine learning enables the development of accurate tracking policies while obviating the need to periodically solve beam-optimization problems. However, its applicability is still arguable when training-test gaps exist in terms of environmental parameters that affect the node dynamics. From this skeptical point of view, the contribution of this study is twofold. First, by considering an example scenario, we confirm that the training-test gap adversely affects the beam-tracking performance. More specifically, we consider nodes placed on overhead messenger wires, where the node dynamics are affected by several environmental parameters, e.g., the wire mass and tension. Although these are particular scenarios, they yield insight into the validation of the training-test gap problems. Second, we demonstrate the feasibility of zero-shot adaptation as a solution, where a learning agent adapts to environmental parameters unseen during training. This is achieved by leveraging a robust adversarial reinforcement learning (RARL) technique, where such training-and-test gaps are regarded as disturbances by adversaries that are jointly trained with a legitimate beam-tracking agent. Numerical evaluations demonstrate that the beam-tracking policy learned via RARL can be applied to a wide range of environmental parameters without severely degrading the received power.

Submitted 10 July, 2021; v1 submitted 16 February, 2021; originally announced February 2021.

Comments: 13 pages, 13 figures, 3 tables, under submission for possible publication for IEEE

arXiv:2101.11326 [pdf, other]

See-Through Captions: Real-Time Captioning on Transparent Display for Deaf and Hard-of-Hearing People

Authors: Kenta Yamamoto, Ippei Suzuki, Akihisa Shitara, Yoichi Ochiai

Abstract: Real-time captioning is a useful technique for deaf and hard-of-hearing (DHH) people to talk to hearing people. With the improvement in device performance and the accuracy of automatic speech recognition (ASR), real-time captioning is becoming an important tool for helping DHH people in their daily lives. To realize higher-quality communication and overcome the limitations of mobile and augmented-reality devices, real-time captioning that can be used comfortably while maintaining nonverbal communication and preventing incorrect recognition is required. Therefore, we propose a real-time captioning system that uses a transparent display. In this system, the captions are presented on both sides of the display to address the problem of incorrect ASR, and the highly transparent display makes it possible to see both the body language and the captions.

Submitted 27 January, 2021; originally announced January 2021.

arXiv:2012.02431 [pdf]

doi 10.1038/s41598-021-91880-2

Acoustic Hologram Optimisation Using Automatic Differentiation

Authors: Tatsuki Fushimi, Kenta Yamamoto, Yoichi Ochiai

Abstract: Acoustic holograms are the keystone of modern acoustics. They encode three-dimensional acoustic fields in two dimensions, and their quality determines the performance of acoustic systems. Optimisation methods that control only the phase of an acoustic wave are considered inferior to methods that control both the amplitude and phase of the wave. In this paper, we present Diff-PAT, an acoustic hologram optimisation algorithm with automatic differentiation. We demonstrate that our method achieves higher accuracy than conventional methods. The performance of Diff-PAT was evaluated by randomly generating 1000 sets of up to 32 control points for single-sided arrays and single-axis arrays. The improved acoustic hologram can be used in a wide range of applications of PATs without introducing any changes to existing systems that control the PATs. In addition, we applied Diff-PAT to acoustic metamaterial and achieved an >8 dB increase in the peak noise-to-signal ratio of the acoustic hologram.

Submitted 4 December, 2020; originally announced December 2020.

Comments: 25 pages, 5 figures, manuscript

arXiv:2012.00982 [pdf, other]

Millimeter Wave Communications on Overhead Messenger Wire: Deep Reinforcement Learning-Based Predictive Beam Tracking

Authors: Yusuke Koda, Masao Shinzaki, Koji Yamamoto, Takayuki Nishio, Masahiro Morikura, Yushi Shirato, Daisei Uchida, Naoki Kita

Abstract: This paper discusses the feasibility of beam tracking against dynamics in millimeter wave (mmWave) nodes placed on overhead messenger wires, including wind-forced perturbations and disturbances caused by impulsive forces to wires. Our main contribution is to answer whether or not historical positions and velocities of a mmWave node are useful to track directional beams given the complicated on-wire dynamics. To this end, we implement beam-tracking based on deep reinforcement learning (DRL) to learn the complicated relationships between the historical positions/velocities and appropriate beam steering angles. Our numerical evaluations yielded the following key insights: Against wind perturbations, an appropriate beam-tracking policy can be learned from the historical positions and velocities of a node. Meanwhile, against impulsive forces to the wire, the use of the position and velocity of the node is not necessarily sufficient owing to the rapid displacement of the node. To solve this, we propose to take advantage of the positional interaction on the wire by leveraging the positions/velocities of several points on the wire as state information in DRL. The results confirmed that this approach avoids beam misalignment, which would not be possible by using only the position/velocity of the node.

Submitted 2 December, 2020; originally announced December 2020.

Comments: 12 pages, 18 figures

arXiv:2009.13879 [pdf, other]

MAB-based Client Selection for Federated Learning with Uncertain Resources in Mobile Networks

Authors: Naoya Yoshida, Takayuki Nishio, Masahiro Morikura, Koji Yamamoto

Abstract: This paper proposes a client selection method for federated learning (FL) when the computation and communication resources of clients cannot be estimated; the method trains a machine learning (ML) model using the rich data and computational resources of mobile clients without collecting their data in central systems. Conventional FL with client selection estimates the required time for an FL round from the given clients' computation power and throughput and determines a client set to reduce time consumption in FL rounds. However, it is difficult to obtain accurate resource information for all clients before the FL process is conducted because the available computation and communication resources change easily based on background computation tasks, background traffic, bottleneck links, etc. Consequently, the FL operator must select clients through exploration and exploitation processes. This paper proposes a multi-armed bandit (MAB)-based client selection method to solve the exploration and exploitation trade-off and reduce the time consumption for FL in mobile networks. The proposed method balances the selection of clients for which the amount of resources is uncertain and those known to have a large amount of resources. The simulation evaluation demonstrated that the proposed scheme requires less learning time than the conventional method in the resource-fluctuating scenario.

Submitted 29 September, 2020; originally announced September 2020.

arXiv:2009.13864 [pdf, other]

doi 10.1109/GCWkshps50303.2020.9367396

Online Trainable Wireless Link Quality Prediction System using Camera Imagery

Authors: Sohei Itahara, Takayuki Nishio, Masahiro Morikura, Koji Yamamoto

Abstract: Machine-learning-based prediction of future wireless link quality is an emerging technique that can potentially improve the reliability of wireless communications, especially at higher frequencies (e.g., millimeter-wave and terahertz technologies), through predictive handover and beamforming to solve the line-of-sight (LOS) blockage problem. In this study, a real-time online trainable wireless link quality prediction system was proposed; the system was implemented with commercially available laptops. The proposed system collects datasets, updates a model, and infers the received power in real-time. The experimental evaluation was conducted using 5 GHz Wi-Fi, where received signal strength could be degraded by 10 dB when the LOS path was blocked by large obstacles. The experimental results demonstrate that the prediction model is updated in real-time, adapts to the change in environment, and predicts the time-varying Wi-Fi received power accurately.

Submitted 29 September, 2020; originally announced September 2020.

arXiv:2008.06180 [pdf, other]

doi 10.1109/TMC.2021.3070013

Distillation-Based Semi-Supervised Federated Learning for Communication-Efficient Collaborative Training with Non-IID Private Data

Authors: Sohei Itahara, Takayuki Nishio, Yusuke Koda, Masahiro Morikura, Koji Yamamoto

Abstract: This study develops a federated learning (FL) framework overcoming largely incremental communication costs due to model sizes in typical frameworks without compromising model performance. To this end, based on the idea of leveraging an unlabeled open dataset, we propose a distillation-based semi-supervised FL (DS-FL) algorithm that exchanges the outputs of local models among mobile devices, instead of the model parameter exchange employed by the typical frameworks. In DS-FL, the communication cost depends only on the output dimensions of the models and does not scale up according to the model size. The exchanged model outputs are used to label each sample of the open dataset, which creates an additionally labeled dataset. Based on the new dataset, local models are further trained, and model performance is enhanced owing to the data augmentation effect. We further highlight that in DS-FL, the heterogeneity of the devices' datasets leads to ambiguity in each data sample and slows training convergence. To prevent this, we propose entropy reduction averaging, where the aggregated model outputs are intentionally sharpened. Moreover, extensive experiments show that DS-FL reduces communication costs by up to 99% relative to those of the FL benchmark while achieving similar or higher classification accuracy.

Submitted 20 January, 2021; v1 submitted 13 August, 2020; originally announced August 2020.

Journal ref: IEEE Transactions on Mobile Computing (2021) 1-15

arXiv:2008.01645 [pdf, other]

doi 10.1109/TVCG.2020.3028889

A Visual Analytics Framework for Reviewing Multivariate Time-Series Data with Dimensionality Reduction

Authors: Takanori Fujiwara, Shilpika, Naohisa Sakamoto, Jorji Nonaka, Keiji Yamamoto, Kwan-Liu Ma

Abstract: Data-driven problem solving in many real-world applications involves analysis of time-dependent multivariate data, for which dimensionality reduction (DR) methods are often used to uncover the intrinsic structure and features of the data. However, DR is usually applied to a subset of data that is either single-time-point multivariate or univariate time-series, resulting in the need to manually examine and correlate the DR results out of different data subsets. When the number of dimensions is large either in terms of the number of time points or attributes, this manual task becomes too tedious and infeasible. In this paper, we present MulTiDR, a new DR framework that enables processing of time-dependent multivariate data as a whole to provide a comprehensive overview of the data. With the framework, we employ DR in two steps. When treating the instances, time points, and attributes of the data as a 3D array, the first DR step reduces the three axes of the array to two, and the second DR step visualizes the data in a lower-dimensional space. In addition, by coupling with a contrastive learning method and interactive visualizations, our framework enhances analysts' ability to interpret DR results. We demonstrate the effectiveness of our framework with four case studies using real-world datasets.

Submitted 27 October, 2021; v1 submitted 2 August, 2020; originally announced August 2020.

Comments: This is the author's version of the article that has been published in IEEE Transactions on Visualization and Computer Graphics. The final version of this record is available at: 10.1109/TVCG.2020.3028889

arXiv:2007.08208 [pdf, other]

Distributed Heteromodal Split Learning for Vision Aided mmWave Received Power Prediction

Authors: Yusuke Koda, Jihong Park, Mehdi Bennis, Koji Yamamoto, Takayuki Nishio, Masahiro Morikura

Abstract: The goal of this work is the accurate prediction of millimeter-wave received power leveraging both radio frequency (RF) signals and heterogeneous visual data from multiple distributed cameras, in a communication and energy-efficient manner while preserving data privacy. To this end, firstly focusing on data privacy, we propose heteromodal split learning with feature aggregation (HetSLAgg) that splits neural network (NN) models into camera-side and base station (BS)-side segments. The BS-side NN segment fuses RF signals and uploaded image features without collecting raw images. However, the usage of multiple visual data leads to an increase in NN input dimensions, which gives rise to additional communication and energy costs. To overcome additional communication and energy costs due to image interpolation to blend different frame rates, we propose a novel BS-side manifold mixup technique that offloads the interpolation operations from cameras to a BS. Subsequently, we confront energy costs for operating a larger size of the BS-side NN segment due to concatenating image features across cameras and propose an energy-efficient aggregation method. This is done via a linear combination of image features instead of concatenating them, where the NN size is independent of the number of cameras. Comprehensive test-bed experiments with measured channels demonstrate that HetSLAgg reduces the prediction error by 44% compared to a baseline leveraging only RF received power. Moreover, the experiments show that the designed HetSLAgg achieves over 20% gains in terms of communication and energy cost reduction compared to several baseline designs within at most 1% of accuracy loss.

Submitted 16 July, 2020; originally announced July 2020.

Comments: 14 pages, 17 figures
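
The abstract above contrasts concatenating per-camera image features with aggregating them through a linear combination, so that the input size seen by the BS-side network does not grow with the number of cameras. The sketch below only illustrates that size argument. The function names and the fixed weights are hypothetical, and the abstract does not say how the paper chooses or learns the combination weights.

```python
def concatenate_features(camera_features):
    """Concatenation: output length grows with the number of cameras."""
    return [value for features in camera_features for value in features]


def combine_features(camera_features, weights):
    """Linear combination: output length equals one camera's feature length.

    `weights` is an illustrative assumption (one scalar per camera).
    """
    if len(camera_features) != len(weights):
        raise ValueError("need exactly one weight per camera")
    dim = len(camera_features[0])
    combined = [0.0] * dim
    for features, weight in zip(camera_features, weights):
        if len(features) != dim:
            raise ValueError("all cameras must produce features of equal length")
        for i, value in enumerate(features):
            combined[i] += weight * value
    return combined


if __name__ == "__main__":
    feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # three cameras, two features each
    print(len(concatenate_features(feats)))        # grows with the camera count
    print(len(combine_features(feats, [0.5, 0.25, 0.25])))  # stays at feature length
```

With concatenation the downstream layer must be resized whenever a camera is added, whereas the combined vector keeps a fixed length.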

arXiv:2006.01413 [pdf]

Resolving Class Imbalance in Object Detection with Weighted Cross Entropy Losses

Authors: Trong Huy Phan, Kazuma Yamamoto

Abstract: Object detection is an important task in computer vision which serves a lot of real-world applications such as autonomous driving, surveillance and robotics. Along with the rapid growth of large-scale data, numerous state-of-the-art generalized object detectors (e.g. Faster R-CNN, YOLO, SSD) were developed in the past decade. Despite continual efforts in model modification and improvement in training strategies to boost detection accuracy, there are still limitations in the performance of detectors when it comes to specialized datasets with uneven object class distributions. This originates from the common usage of the Cross Entropy loss function for the object classification sub-task, which simply ignores the frequency of appearance of each object class during training and thus results in lower accuracies for object classes with fewer samples. Class imbalance in general machine learning has been widely studied; however, little attention has been paid to the subject in object detection. In this paper, we propose to explore and overcome this problem by applying several weighted variants of Cross Entropy loss, for example Balanced Cross Entropy, Focal Loss and Class-Balanced Loss Based on Effective Number of Samples, to our object detector. Experiments with BDD100K (a highly class-imbalanced driving database acquired from on-vehicle cameras capturing mostly Car-class objects and other minority object classes such as Bus, Person and Motor) have shown better class-wise performance of the detector trained with the aforementioned loss functions.

Submitted 2 June, 2020; originally announced June 2020.
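
The weighted Cross Entropy variants named in the abstract above have standard per-sample forms, sketched here for a single example whose predicted probability for the true class is `p_true`. This is a minimal illustration of the general formulas, not the paper's detector code. The default `gamma`, `alpha` and `beta` values are illustrative assumptions.

```python
import math


def cross_entropy(p_true):
    """Plain cross entropy for the true class probability."""
    return -math.log(p_true)


def balanced_cross_entropy(p_true, class_weight):
    """Cross entropy scaled by a per-class weight (larger for rare classes)."""
    return -class_weight * math.log(p_true)


def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss: down-weights examples that are already well classified."""
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)


def class_balanced_weight(n_samples, beta=0.999):
    """Class-balanced weight from the effective number of samples."""
    return (1.0 - beta) / (1.0 - beta ** n_samples)


if __name__ == "__main__":
    print(cross_entropy(0.9), focal_loss(0.9))  # focal loss is smaller for an easy example
    print(class_balanced_weight(10), class_balanced_weight(1000))  # rare class weighs more
```

Setting `gamma=0` and `alpha=1` reduces the focal loss to plain cross entropy, which is a convenient sanity check.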

arXiv:2005.12027 [pdf, other]

A Preliminary Study for Identification of Additive Manufactured Objects with Transmitted Images

Authors: Kenta Yamamoto, Ryota Kawamura, Kazuki Takazawa, Hiroyuki Osone, Yoichi Ochiai

Abstract: Additive manufacturing has the potential to become a standard method for manufacturing products, and product information is indispensable for the item distribution system. While most products are given barcodes on their exterior surfaces, research on embedding barcodes inside products is underway. This is because additive manufacturing makes it possible to carry out manufacturing and information adding at the same time, and embedding information inside does not impair the exterior appearance of the product. However, products without embedded information cannot be identified, and embedded information cannot be rewritten later. In this study, we have developed a product identification system that does not require embedding barcodes inside. This system uses a transmission image of the product which contains information of each product such as different inner support structures and manufacturing errors. We have shown through experiments that if datasets of transmission images are available, objects can be identified with an accuracy of over 90%. This result suggests that our approach can be useful for identifying objects without embedded information.

Submitted 25 May, 2020; originally announced May 2020.

arXiv:2005.00833 [pdf, other]

Transfer Learning-Based Received Power Prediction with Ray-tracing Simulation and Small Amount of Measurement Data

Authors: Masahiro Iwasaki, Takayuki Nishio, Masahiro Morikura, Koji Yamamoto

Abstract: This paper proposes a method to predict received power in urban areas deterministically, which can learn a prediction model from a small amount of measurement data by simulation-aided transfer learning and data augmentation. Recent developments in machine learning such as artificial neural networks (ANNs) enable us to predict radio propagation and path loss accurately. However, training a high-performance ANN model requires a significant amount of data, which is difficult to obtain in real environments. The main motivation for this work was to facilitate accurate prediction using a small amount of measurement data. To this end, we propose a transfer learning-based prediction method with data augmentation. The proposed method pre-trains a prediction model using data generated from ray-tracing simulations, increases the amount of data using simulation-assisted data augmentation, and then fine-tunes a model using the augmented data to fit the target environment. Experiments using Wi-Fi devices were conducted, and the results demonstrate that the proposed method predicts received power with 50% (or less) of the RMS error of conventional methods.

Submitted 2 May, 2020; originally announced May 2020.

arXiv:2004.09817 [pdf, other]

doi 10.1109/VTC2020-Fall49728.2020.9348439

Lottery Hypothesis based Unsupervised Pre-training for Model Compression in Federated Learning

Authors: Sohei Itahara, Takayuki Nishio, Masahiro Morikura, Koji Yamamoto

Abstract: Federated learning (FL) enables a neural network (NN) to be trained using privacy-sensitive data on mobile devices while retaining all the data on their local storages. However, FL asks the mobile devices to perform heavy communication and computation tasks, i.e., devices are requested to upload and download large-volume NN models and train them. This paper proposes a novel unsupervised pre-training method adapted for FL, which aims to reduce both the communication and computation costs through model compression. Since the communication and computation costs are highly dependent on the volume of NN models, reducing the volume without decreasing model performance can reduce these costs. The proposed pre-training method leverages unlabeled data, which is expected to be obtained from the Internet or data repository much more easily than labeled data. The key idea of the proposed method is to obtain a "good" subnetwork from the original NN using the unlabeled data based on the lottery hypothesis. The proposed method trains an original model using a denoising auto encoder with the unlabeled data and then prunes small-magnitude parameters of the original model to generate a small but good subnetwork. The proposed method is evaluated using an image classification task. The results show that the proposed method requires 35% less traffic and computation time than previous methods when achieving a certain test accuracy.

Submitted 21 April, 2020; originally announced April 2020.
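
The pruning step mentioned in the abstract above (removing small-magnitude parameters to obtain a smaller subnetwork) can be illustrated on a flat list of weights. This is a generic magnitude-pruning sketch, not the paper's procedure. The function name and the `prune_fraction` parameter are hypothetical, and the pre-training with a denoising autoencoder is not shown.

```python
def prune_by_magnitude(weights, prune_fraction):
    """Zero out the smallest-magnitude fraction of weights.

    Returns the pruned weights and a 0/1 mask marking the surviving entries.
    """
    if not 0.0 <= prune_fraction <= 1.0:
        raise ValueError("prune_fraction must lie in [0, 1]")
    n_prune = int(len(weights) * prune_fraction)
    # Indices sorted from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_indices = set(order[:n_prune])
    mask = [0 if i in pruned_indices else 1 for i in range(len(weights))]
    pruned = [w if keep else 0.0 for w, keep in zip(weights, mask)]
    return pruned, mask


if __name__ == "__main__":
    pruned, mask = prune_by_magnitude([0.5, -0.01, 0.2, -0.9, 0.03], 0.4)
    print(pruned)
    print(mask)
```

In lottery-ticket style pruning the mask is typically kept and reused, so that only the surviving parameters are trained and exchanged afterwards.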

arXiv:2004.06337 [pdf, other]

doi 10.1109/GLOBECOM42002.2020.9322199

Differentially Private AirComp Federated Learning with Power Adaptation Harnessing Receiver Noise

Authors: Yusuke Koda, Koji Yamamoto, Takayuki Nishio, Masahiro Morikura

Abstract: Over-the-air computation (AirComp)-based federated learning (FL) enables low-latency uploads and the aggregation of machine learning models by exploiting simultaneous co-channel transmission and the resultant waveform superposition. This study aims at realizing secure AirComp-based FL against various privacy attacks where malicious central servers infer clients' private data from aggregated global models. To this end, a differentially private AirComp-based FL is designed in this study, where the key idea is to harness receiver noise perturbation injected to aggregated global models inherently, thereby preventing the inference of clients' private data. However, the variance of the inherent receiver noise is often uncontrollable, which renders the process of injecting an appropriate noise perturbation to achieve a desired privacy level quite challenging. Hence, this study designs transmit power control across clients, wherein the received signal level is adjusted intentionally to control the noise perturbation levels effectively, thereby achieving the desired privacy level. It is observed that a higher privacy level requires lower transmit power, which indicates the tradeoff between the privacy level and signal-to-noise ratio (SNR). To understand this tradeoff more fully, the closed-form expressions of SNR (with respect to the privacy level) are derived, and the tradeoff is analytically demonstrated. The analytical results also demonstrate that among the configurable parameters, the number of participating clients is a key parameter that enhances the received SNR under the aforementioned tradeoff. The analytical results are validated through numerical evaluations.

Submitted 14 April, 2020; originally announced April 2020.

Comments: 6 pages, 4 figures

arXiv:2004.00835 [pdf, other]

Adversarial Reinforcement Learning-based Robust Access Point Coordination Against Uncoordinated Interference

Authors: Yuto Kihira, Yusuke Koda, Koji Yamamoto, Takayuki Nishio, Masahiro Morikura

Abstract: This paper proposes a robust adversarial reinforcement learning (RARL)-based multi-access point (AP) coordination method that is robust even against unexpected decentralized operations of uncoordinated APs. Multi-AP coordination is a promising technique towards IEEE 802.11be, and there are studies that use RL for multi-AP coordination. Indeed, a simple RL-based multi-AP coordination method diminishes the collision probability among the APs; therefore, the method is a promising approach to improve time-resource efficiency. However, this method is vulnerable to frame transmissions of uncoordinated APs that are less aware of frame transmissions of other coordinated APs. To help the central agent experience even such unexpected frame transmissions, in addition to the central agent, the proposed method also competitively trains an adversarial AP that disturbs coordinated APs by causing frame collisions intensively. Besides, we propose to exploit a history of frame losses of a coordinated AP to promote reasonable competition between the central agent and adversarial AP. The simulation results indicate that the proposed method can avoid uncoordinated interference and thereby improve the minimum sum of the throughputs in the system compared to not considering the uncoordinated AP.

Submitted 2 April, 2020; originally announced April 2020.

arXiv:2003.10094 [pdf, other]

Penalized and Decentralized Contextual Bandit Learning for WLAN Channel Allocation with Contention-Driven Feature Extraction

Authors: Kota Yamashita, Shotaro Kamiya, Koji Yamamoto, Yusuke Koda, Takayuki Nishio, Masahiro Morikura

Abstract: In this study, a contextual multi-armed bandit (CMAB)-based decentralized channel exploration framework disentangling a channel utility function (i.e., reward) with respect to contending neighboring access points (APs) is proposed. The proposed framework enables APs to evaluate observed rewards compositionally for contending APs, allowing both robustness against reward fluctuation due to neighboring APs' varying channels and assessment of even unexplored channels. To realize this framework, we propose contention-driven feature extraction (CDFE), which extracts the adjacency relation among APs under contention and forms the basis for expressing reward functions in the disentangled form (that is, a linear combination of parameters associated with neighboring APs under contention). This allows the CMAB to be leveraged with a joint linear upper confidence bound (JLinUCB) exploration and to delve into the effectiveness of the proposed framework. Moreover, we address the problem of non-convergence (the channel exploration cycle) by proposing a penalized JLinUCB (P-JLinUCB) based on the key idea of introducing a discount parameter to the reward for exploiting a different channel before and after the learning round. Numerical evaluations confirm that the proposed method allows APs to assess the channel quality robustly against reward fluctuations by CDFE and achieves better convergence properties by P-JLinUCB.

Submitted 1 December, 2021; v1 submitted 23 March, 2020; originally announced March 2020.

Comments: 12 pages, 6 figures, 3 Tables

arXiv:2003.00645 [pdf, other]

Communication-Efficient Multimodal Split Learning for mmWave Received Power Prediction

Authors: Yusuke Koda, Jihong Park, Mehdi Bennis, Koji Yamamoto, Takayuki Nishio, Masahiro Morikura

Abstract: The goal of this study is to improve the accuracy of millimeter wave received power prediction by utilizing camera images and radio frequency (RF) signals, while gathering image inputs in a communication-efficient and privacy-preserving manner. To this end, we propose a distributed multimodal machine learning (ML) framework, coined multimodal split learning (MultSL), in which a large neural network (NN) is split into two wirelessly connected segments. The upper segment combines images and received powers for future received power prediction, whereas the lower segment extracts features from camera images and compresses its output to reduce communication costs and privacy leakage. Experimental evaluation corroborates that MultSL achieves higher accuracy than the baselines utilizing either images or RF signals. Remarkably, without compromising accuracy, compressing the lower segment output by 16x yields 16x lower communication latency and 2.8% less privacy leakage compared to the case without compression.

Submitted 2 March, 2020; v1 submitted 1 March, 2020; originally announced March 2020.

Comments: 5 pages, 7 figures, to be published at IEEE Communications Letters

arXiv:2001.00594 [pdf, ps, other]

Large-scale Gender/Age Prediction of Tumblr Users

Authors: Yao Zhan, Changwei Hu, Yifan Hu, Tejaswi Kasturi, Shanmugam Ramasamy, Matt Gillingham, Keith Yamamoto

Abstract: Tumblr, as a leading content provider and social media, attracts 371 million monthly visits, 280 million blogs and 53.3 million daily posts. The popularity of Tumblr provides great opportunities for advertisers to promote their products through sponsored posts. However, it is a challenging task to target specific demographic groups for ads, since Tumblr does not require user information like gender and age during registration. Hence, to promote ad targeting, it is essential to predict users' demography using rich content such as posts, images and social connections. In this paper, we propose graph based and deep learning models for age and gender predictions, which take into account user activities and content features. For graph based models, we come up with two approaches, network embedding and label propagation, to generate connection features as well as directly infer users' demography. For deep learning models, we leverage convolutional neural network (CNN) and multilayer perceptron (MLP) to predict users' age and gender. Experimental results on real Tumblr daily dataset, with hundreds of millions of active users and billions of following relations, demonstrate that our approaches significantly outperform the baseline model, by improving the accuracy relatively by 81% for age, and the AUC and accuracy by 5% for gender.

Submitted 2 January, 2020; originally announced January 2020.

Journal ref: IEEE ICMLA 2019

arXiv:1912.06352 [pdf, other]

doi 10.1109/LWC.2019.2921367

Random Access with Opportunity Detection in Wireless Networks

Authors: Jinho Choi, Seung-Woo Ko, Koji Yamamoto, Seong-Lyun Kim

Abstract: This letter proposes a novel random medium access control (MAC) based on a transmission opportunity prediction, which can be measured in a form of a conditional success probability given transmitter-side interference. A transmission probability depends on the opportunity prediction, preventing indiscriminate transmissions and reducing excessive interference causing collisions. Using stochastic geometry, we derive a fixed-point equation to provide the optimal transmission probability maximizing a proportionally fair throughput. Its approximated solution is given in closed form. The proposed MAC is applicable to full-duplex networks, leading to significant throughput improvement by allowing more nodes to transmit.

Submitted 13 December, 2019; originally announced December 2019.

Comments: 4 pages, 4 figures

Journal ref: IEEE Wireless Communications Letters ( Volume: 8 , Issue: 5 , Oct. 2019 )

arXiv:1912.03880 [pdf, other]

Video Motion Capture from the Part Confidence Maps of Multi-Camera Images by Spatiotemporal Filtering Using the Human Skeletal Model

Authors: Takuya Ohashi, Yosuke Ikegami, Kazuki Yamamoto, Wataru Takano, Yoshihiko Nakamura

Abstract: This paper discusses video motion capture, namely, 3D reconstruction of human motion from multi-camera images. After the Part Confidence Maps are computed from each camera image, the proposed spatiotemporal filter is applied to deliver the human motion data with accuracy and smoothness for human motion analysis. The spatiotemporal filter uses the human skeleton and mixes temporal smoothing in two-time inverse kinematics computations. The experimental results show that the mean per joint position error was 26.1 mm for regular motions and 38.8 mm for inverted motions.

Submitted 10 December, 2019; v1 submitted 9 December, 2019; originally announced December 2019.

Comments: International Conference on Intelligent Robots and Systems (IROS), 2018
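
The accuracy figure quoted in the abstract above, mean per joint position error, is conventionally the average Euclidean distance between estimated and reference joint positions. The sketch below assumes that convention. The function name and the sample coordinates are hypothetical, and the paper's exact evaluation protocol (alignment, joint set, frames averaged) is not given in the abstract.

```python
import math


def mean_per_joint_position_error(predicted, reference):
    """Average Euclidean distance between matching 3D joint positions."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two non-empty joint lists of equal length")
    total = sum(math.dist(p, r) for p, r in zip(predicted, reference))
    return total / len(predicted)


if __name__ == "__main__":
    # Two joints, coordinates in millimetres (made-up values for illustration).
    estimated = [(0.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
    ground_truth = [(30.0, 40.0, 0.0), (100.0, 0.0, 0.0)]
    print(mean_per_joint_position_error(estimated, ground_truth))
```

The result is in the same unit as the input coordinates, so positions given in millimetres yield an error in millimetres.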

arXiv:1911.01682 [pdf, other]

doi 10.1145/3360468.3368176

One Pixel Image and RF Signal Based Split Learning for mmWave Received Power Prediction

Authors: Yusuke Koda, Jihong Park, Mehdi Bennis, Koji Yamamoto, Takayuki Nishio, Masahiro Morikura

Abstract: Focusing on the received power prediction of millimeter-wave (mmWave) radio-frequency (RF) signals, we propose a multimodal split learning (SL) framework that integrates RF received signal powers and depth-images observed by physically separated entities. To improve its communication efficiency while preserving data privacy, we propose an SL neural network architecture that compresses the communication payload, i.e., images. Compared to a baseline solely utilizing RF signals, numerical results show that SL integrating only one pixel image with RF signals achieves higher prediction accuracy while maximizing both communication efficiency and privacy guarantees.

Submitted 5 November, 2019; originally announced November 2019.

Comments: 3 pages, Accepted in ACM CoNEXT 2019 Poster Session

arXiv:1906.05694 [pdf, other]

Cooperative Sensing in Deep RL-Based Image-to-Decision Proactive Handover for mmWave Networks

Authors: Yusuke Koda, Koji Yamamoto, Takayuki Nishio, Masahiro Morikura

Abstract: For reliable millimeter-wave (mmWave) networks, this paper proposes cooperative sensing with multi-camera operation in an image-to-decision proactive handover framework that directly maps images to a handover decision. In the framework, camera images are utilized to allow for the prediction of blockage effects in a mmWave link, whereby a network controller triggers a handover in a proactive fashion. Furthermore, direct mapping allows for the scalability of the number of pedestrians. This paper experimentally investigates the feasibility of adopting cooperative sensing with multiple cameras that can compensate for one another's blind spots. The optimal mapping is learned via deep reinforcement learning to resolve the high dimensionality of images from multiple cameras. An evaluation based on experimentally obtained images and received powers verifies that a mapping that enhances channel capacity can be learned in a multi-camera operation. The results indicate that our proposed framework with multi-camera operation outperforms a conventional framework with single-camera operation in terms of the average capacity.

Submitted 12 June, 2019; originally announced June 2019.

Comments: arXiv admin note: text overlap with arXiv:1904.04585

arXiv:1905.07210 [pdf, other]

Hybrid-FL for Wireless Networks: Cooperative Learning Mechanism Using Non-IID Data

Authors: Naoya Yoshida, Takayuki Nishio, Masahiro Morikura, Koji Yamamoto, Ryo Yonetani

Abstract: This paper proposes a cooperative mechanism for mitigating the performance degradation due to non-independent-and-identically-distributed (non-IID) data in collaborative machine learning (ML), namely federated learning (FL), which trains an ML model using the rich data and computational resources of mobile clients without gathering their data to central systems. The data of mobile clients is typically non-IID owing to diversity among mobile clients' interests and usage, and FL with non-IID data could degrade the model performance. Therefore, to mitigate the degradation induced by non-IID data, we assume that a limited number (e.g., less than 1%) of clients allow their data to be uploaded to a server, and we propose a hybrid learning mechanism referred to as Hybrid-FL, wherein the server updates the model using the data gathered from the clients and aggregates the model with the models trained by clients. The Hybrid-FL solves both client- and data-selection problems via heuristic algorithms, which try to select the optimal sets of clients who train models with their own data, clients who upload their data to the server, and data uploaded to the server. The algorithms increase the number of clients participating in FL and make more data gather in the server IID, thereby improving the prediction accuracy of the aggregated model. Evaluations, which consist of network simulations and ML experiments, demonstrate that the proposed scheme achieves a 13.5% higher classification accuracy than those of the previously proposed schemes for the non-IID case.

Submitted 5 March, 2020; v1 submitted 17 May, 2019; originally announced May 2019.

Journal ref: Proc. IEEE ICC 2019, Dublin, Ireland, June 2020

arXiv:1905.07144 [pdf, ps, other]

Deep Reinforcement Learning-Based Channel Allocation for Wireless LANs with Graph Convolutional Networks

Authors: Kota Nakashima, Shotaro Kamiya, Kazuki Ohtsu, Koji Yamamoto, Takayuki Nishio, Masahiro Morikura

Abstract: Last year, the IEEE 802.11 Extremely High Throughput Study Group (EHT Study Group) was established to initiate discussions on new IEEE 802.11 features. Coordinated control methods of the access points (APs) in wireless local area networks (WLANs) are discussed in the EHT Study Group. The present study proposes a deep reinforcement learning-based channel allocation scheme using graph convolutional networks (GCNs). As the deep reinforcement learning method, we use the well-known double deep Q-network. In densely deployed WLANs, the number of the available topologies of APs is extremely high, and thus we extract the features of the topological structures based on GCNs. We apply GCNs to a contention graph where APs within their carrier sensing ranges are connected to extract the features of carrier sensing relationships. Additionally, to improve the learning speed especially in an early stage of learning, we employ a game theory-based method to collect the training data independently of the neural network model. The simulation results indicate that the proposed method can appropriately control the channels when compared to extant methods.

Submitted 17 May, 2019; originally announced May 2019.

arXiv:1904.04585 [pdf, other]

Handover Management for mmWave Networks with Proactive Performance Prediction Using Camera Images and Deep Reinforcement Learning

Authors: Yusuke Koda, Kota Nakashima, Koji Yamamoto, Takayuki Nishio, Masahiro Morikura

Abstract: For millimeter-wave networks, this paper presents a paradigm shift for leveraging time-consecutive camera images in handover decision problems. While making handover decisions, it is important to predict future long-term performance (e.g., the cumulative sum of time-varying data rates) proactively to avoid making myopic decisions. However, this study experimentally observes that a time-variation in the received powers is not necessarily informative for proactively predicting the rapid degradation of data rates caused by moving obstacles. To overcome this challenge, this study proposes a proactive framework wherein handover timings are optimized while obstacle-caused data rate degradations are predicted before the degradations occur. The key idea is to expand a state space to involve time-consecutive camera images, which comprise informative features for predicting such data rate degradations. To overcome the difficulty in handling the large dimensionality of the expanded state space, we use deep reinforcement learning for deciding the handover timings. The evaluations performed based on the experimentally obtained camera images and received powers demonstrate that the expanded state space facilitates (i) the prediction of obstacle-caused data rate degradations from 500 ms before the degradations occur and (ii) superior performance to a handover framework without the state space expansion.

Submitted 17 July, 2020; v1 submitted 9 April, 2019; originally announced April 2019.

Comments: 14 pages, 19 figures, Published at IEEE Transactions on Cognitive Communications and Networking

arXiv:1904.02096 [pdf, other]

doi 10.1016/j.specom.2020.06.001

GEDI: Gammachirp Envelope Distortion Index for Predicting Intelligibility of Enhanced Speech

Authors: Katsuhiko Yamamoto, Toshio Irino, Shoko Araki, Keisuke Kinoshita, Tomohiro Nakatani

Abstract: In this study, we propose a new concept, the gammachirp envelope distortion index (GEDI), based on the signal-to-distortion ratio in the auditory envelope, SDRenv to predict the intelligibility of speech enhanced by nonlinear algorithms. The objective of GEDI is to calculate the distortion between enhanced and clean-speech representations in the domain of a temporal envelope extracted by the gammachirp auditory filterbank and modulation filterbank. We also extend GEDI with multi-resolution analysis (mr-GEDI) to predict the speech intelligibility of sounds under non-stationary noise conditions. We evaluate GEDI in terms of speech intelligibility predictions of speech sounds enhanced by a classic spectral subtraction and a Wiener filtering method. The predictions are compared with human results for various signal-to-noise ratio conditions with additive pink and babble noises. The results showed that mr-GEDI predicted the intelligibility curves better than short-time objective intelligibility (STOI) measure, extended-STOI (ESTOI) measure, and hearing-aid speech perception index (HASPI) under pink-noise conditions, and better than HASPI under babble-noise conditions. The mr-GEDI method does not present an overestimation tendency and is considered a more conservative approach than STOI and ESTOI. Therefore, the evaluation with mr-GEDI may provide additional information in the development of speech enhancement algorithms.

Submitted 19 July, 2020; v1 submitted 3 April, 2019; originally announced April 2019.

Comments: Preprint, 37 pages, 6 tables, 9 figures

Journal ref: Speech Communication, Vol. 123, pp. 43-58, 2020

Showing 1–50 of 69 results for author: Yamamoto, K