E2GS: Event Enhanced Gaussian Splatting

Authors: Hiroyuki Deguchi, Mana Masuda, Takuya Nakabayashi, Hideo Saito

Abstract: Event cameras, known for their high dynamic range, absence of motion blur, and low energy usage, have recently found a wide range of applications thanks to these attributes. In the past few years, the field of event-based 3D reconstruction saw remarkable progress, with the Neural Radiance Field (NeRF) based approach demonstrating photorealistic view synthesis results. However, the volume rendering paradigm of NeRF necessitates extensive training and rendering times. In this paper, we introduce Event Enhanced Gaussian Splatting (E2GS), a novel method that incorporates event data into Gaussian Splatting, which has recently made significant advances in the field of novel view synthesis. Our E2GS effectively utilizes both blurry images and event data, significantly improving image deblurring and producing high-quality novel view synthesis. Our comprehensive experiments on both synthetic and real-world datasets demonstrate our E2GS can generate visually appealing renderings while offering faster training and rendering speed (140 FPS). Our code is available at https://github.com/deguchihiroyuki/E2GS.

Submitted 21 June, 2024; originally announced June 2024.

Comments: 7 pages

arXiv:2406.03095 [pdf, other]

EgoSurgery-Tool: A Dataset of Surgical Tool and Hand Detection from Egocentric Open Surgery Videos

Authors: Ryo Fujii, Hideo Saito, Hiroki Kajita

Abstract: Surgical tool detection is a fundamental task for understanding egocentric open surgery videos. However, detecting surgical tools presents significant challenges due to their highly imbalanced class distribution, similar shapes and similar textures, and heavy occlusion. The lack of a comprehensive large-scale dataset compounds these challenges. In this paper, we introduce EgoSurgery-Tool, an extension of the existing EgoSurgery-Phase dataset, which contains real open surgery videos captured using an egocentric camera attached to the surgeon's head, along with phase annotations. EgoSurgery-Tool has been densely annotated with surgical tools and comprises over 49K surgical tool bounding boxes across 15 categories, constituting a large-scale surgical tool detection dataset. EgoSurgery-Tool also provides annotations for hand detection with over 46K hand bounding boxes, capturing hand-object interactions that are crucial for understanding activities in egocentric open surgery. EgoSurgery-Tool is superior to existing datasets due to its larger scale, greater variety of surgical tools, more annotations, and denser scenes. We conduct a comprehensive analysis of EgoSurgery-Tool using nine popular object detectors to assess their effectiveness in both surgical tool and hand detection. The dataset will be released at https://github.com/Fujiry0/EgoSurgery.

Submitted 6 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

arXiv:2405.20030 [pdf, other]

EMAG: Ego-motion Aware and Generalizable 2D Hand Forecasting from Egocentric Videos

Authors: Masashi Hatano, Ryo Hachiuma, Hideo Saito

Abstract: Predicting future human behavior from egocentric videos is a challenging but critical task for human intention understanding. Existing methods for forecasting 2D hand positions rely on visual representations and mainly focus on hand-object interactions. In this paper, we investigate the hand forecasting task and tackle two significant issues that persist in the existing methods: (1) 2D hand positions in future frames are severely affected by ego-motions in egocentric videos; (2) prediction based on visual information tends to overfit to background or scene textures, posing a challenge for generalization on novel scenes or human behaviors. To solve the aforementioned problems, we propose EMAG, an ego-motion-aware and generalizable 2D hand forecasting method. In response to the first problem, we propose a method that considers ego-motion, represented by a sequence of homography matrices of two consecutive frames. We further leverage modalities such as optical flow, trajectories of hands and interacting objects, and ego-motions, thereby alleviating the second issue. Extensive experiments on two large-scale egocentric video datasets, Ego4D and EPIC-Kitchens 55, verify the effectiveness of the proposed method. In particular, our model outperforms prior methods by 7.0% on cross-dataset evaluations. Project page: https://masashi-hatano.github.io/EMAG/

Submitted 30 May, 2024; originally announced May 2024.

arXiv:2405.19917 [pdf, other]

Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition

Authors: Masashi Hatano, Ryo Hachiuma, Ryo Fujii, Hideo Saito

Abstract: We address a novel cross-domain few-shot learning task (CD-FSL) with multimodal input and unlabeled target data for egocentric action recognition. This paper simultaneously tackles two critical challenges associated with egocentric action recognition in CD-FSL settings: (1) the extreme domain gap in egocentric videos (e.g., daily life vs. industrial domain) and (2) the computational cost for real-world applications. We propose MM-CDFSL, a domain-adaptive and computationally efficient approach designed to enhance adaptability to the target domain and reduce inference cost. To address the first challenge, we propose the incorporation of multimodal distillation into the student RGB model using teacher models. Each teacher model is trained independently on source and target data for its respective modality. Leveraging only unlabeled target data during multimodal distillation enhances the student model's adaptability to the target domain. We further introduce ensemble masked inference, a technique that reduces the number of input tokens through masking. In this approach, ensemble prediction mitigates the performance degradation caused by masking, effectively addressing the second issue. Our approach outperformed the state-of-the-art CD-FSL approaches by a substantial margin on multiple egocentric datasets, improving by an average of 6.12/6.10 points for 1-shot/5-shot settings while achieving 2.2 times faster inference speed. Project page: https://masashi-hatano.github.io/MM-CDFSL/

Submitted 16 July, 2024; v1 submitted 30 May, 2024; originally announced May 2024.

Comments: Accepted at ECCV'24

arXiv:2405.19644 [pdf, other]

EgoSurgery-Phase: A Dataset of Surgical Phase Recognition from Egocentric Open Surgery Videos

Authors: Ryo Fujii, Masashi Hatano, Hideo Saito, Hiroki Kajita

Abstract: Surgical phase recognition has gained significant attention due to its potential to offer solutions to numerous demands of the modern operating room. However, most existing methods concentrate on minimally invasive surgery (MIS), leaving surgical phase recognition for open surgery understudied. This discrepancy is primarily attributed to the scarcity of publicly available open surgery video datasets for surgical phase recognition. To address this issue, we introduce a new egocentric open surgery video dataset for phase recognition, named EgoSurgery-Phase. This dataset comprises 15 hours of real open surgery videos spanning 9 distinct surgical phases, all captured using an egocentric camera attached to the surgeon's head. In addition to video, EgoSurgery-Phase offers eye gaze. As far as we know, it is the first publicly available real open surgery video dataset for surgical phase recognition. Furthermore, inspired by the notable success of masked autoencoders (MAEs) in video understanding tasks (e.g., action recognition), we propose a gaze-guided masked autoencoder (GGMAE). Considering that the regions where surgeons' gaze focuses are often critical for surgical phase recognition (e.g., the surgical field), in our GGMAE the gaze information acts as an empirical semantic richness prior that guides the masking process, promoting better attention to semantically rich spatial regions. GGMAE significantly improves on the previous state-of-the-art recognition method (6.4% in Jaccard) and the masked autoencoder-based method (3.1% in Jaccard) on EgoSurgery-Phase. The dataset will be released at https://github.com/Fujiry0/EgoSurgery.

Submitted 29 May, 2024; originally announced May 2024.

Comments: Early accepted by MICCAI 2024

arXiv:2401.02791 [pdf, other]

Weakly Semi-supervised Tool Detection in Minimally Invasive Surgery Videos

Authors: Ryo Fujii, Ryo Hachiuma, Hideo Saito

Abstract: Surgical tool detection is essential for analyzing and evaluating minimally invasive surgery videos. Current approaches are mostly based on supervised methods that require large, fully instance-level labels (i.e., bounding boxes). However, large image datasets with instance-level labels are often limited because of the burden of annotation. Thus, surgical tool detection is important when providing image-level labels instead of instance-level labels since image-level annotations are considerably more time-efficient than instance-level annotations. In this work, we propose to strike a balance between the extremely costly annotation burden and detection performance. We further propose a co-occurrence loss, which leverages image-level labels by exploiting the fact that some tool pairs often co-occur in an image. Encapsulating the knowledge of co-occurrence using the co-occurrence loss helps to overcome the difficulty in classification that originates from the fact that some tools have similar shapes and textures. Extensive experiments conducted on the Endovis2018 dataset in various data settings show the effectiveness of our method.

Submitted 8 January, 2024; v1 submitted 5 January, 2024; originally announced January 2024.

Comments: Accepted to ICASSP 2024

arXiv:2305.07152 [pdf, other]

Surgical tool classification and localization: results and methods from the MICCAI 2022 SurgToolLoc challenge

Authors: Aneeq Zia, Kiran Bhattacharyya, Xi Liu, Max Berniker, Ziheng Wang, Rogerio Nespolo, Satoshi Kondo, Satoshi Kasai, Kousuke Hirasawa, Bo Liu, David Austin, Yiheng Wang, Michal Futrega, Jean-Francois Puget, Zhenqiang Li, Yoichi Sato, Ryo Fujii, Ryo Hachiuma, Mana Masuda, Hideo Saito, An Wang, Mengya Xu, Mobarakol Islam, Long Bai, Winnie Pang , et al. (46 additional authors not shown)

Abstract: The ability to automatically detect and track surgical instruments in endoscopic videos can enable transformational interventions. Assessing surgical performance and efficiency, identifying skilled tool use and choreography, and planning operational and logistical aspects of OR resources are just a few of the applications that could benefit. Unfortunately, obtaining the annotations needed to train machine learning models to identify and localize surgical tools is a difficult task. Annotating bounding boxes frame-by-frame is tedious and time-consuming, yet large amounts of data with a wide variety of surgical tools and surgeries must be captured for robust training. Moreover, ongoing annotator training is needed to stay up to date with surgical instrument innovation. In robotic-assisted surgery, however, potentially informative data like timestamps of instrument installation and removal can be programmatically harvested. The ability to rely on tool installation data alone would significantly reduce the workload to train robust tool-tracking models. With this motivation in mind, we invited the surgical data science community to participate in the challenge, SurgToolLoc 2022. The goal was to leverage tool presence data as weak labels for machine learning models trained to detect tools and localize them in video frames with bounding boxes. We present the results of this challenge along with many of the teams' efforts. We conclude by discussing these results in the broader context of machine learning and surgical data science. The training data used for this challenge, consisting of 24,695 video clips with tool presence labels, is also being released publicly and can be accessed at https://console.cloud.google.com/storage/browser/isi-surgtoolloc-2022.

Submitted 31 May, 2023; v1 submitted 11 May, 2023; originally announced May 2023.

arXiv:2305.04531 [pdf, ps, other]

A method for analyzing sampling jitter in audio equipment

Authors: Makoto Takeuchi, Haruo Saito

Abstract: A method for analyzing sampling jitter in audio equipment is proposed. The method is based on the time-domain analysis where the time fluctuations of zero-crossing points in recorded sinusoidal waves are employed to characterize jitter. This method enables the separate evaluation of jitter in an audio player from those in audio recorders when the same playback signal is simultaneously fed into two audio recorders. Experiments are conducted using commercially available portable devices with a maximum sampling rate of 192,000 samples per second. The results show jitter values of a few tens of picoseconds can be identified in an audio player. Moreover, the proposed method enables the separation of jitter from phase-independent noise utilizing the left and right channels of the audio equipment. As such, this method is applicable for performance evaluation of audio equipment, signal generators, and clock sources.

Submitted 8 May, 2023; originally announced May 2023.

Comments: 15 pages, 12 figures

arXiv:2304.04559 [pdf, other]

Event-based Camera Tracker by ∇t NeRF

Authors: Mana Masuda, Yusuke Sekikawa, Hideo Saito

Abstract: When a camera travels across a 3D world, only a fraction of the pixel values change; an event-based camera observes the change as sparse events. How can we utilize sparse events for efficient recovery of the camera pose? We show that we can recover the camera pose by minimizing the error between sparse events and the temporal gradient of the scene represented as a neural radiance field (NeRF). To enable the computation of the temporal gradient of the scene, we augment NeRF's camera pose as a time function. When the input pose to the NeRF coincides with the actual pose, the output of the temporal gradient of NeRF equals the observed intensity changes on the event's points. Using this principle, we propose an event-based camera pose tracking framework called TeGRA which realizes the pose update by using the sparse event's observation. To the best of our knowledge, this is the first camera pose estimation algorithm using the scene's implicit representation and the sparse intensity change from events.

Submitted 7 April, 2023; originally announced April 2023.

arXiv:2304.03420 [pdf, other]

Toward Unsupervised 3D Point Cloud Anomaly Detection using Variational Autoencoder

Authors: Mana Masuda, Ryo Hachiuma, Ryo Fujii, Hideo Saito, Yusuke Sekikawa

Abstract: In this paper, we present an end-to-end unsupervised anomaly detection framework for 3D point clouds. To the best of our knowledge, this is the first work to tackle the anomaly detection task on a general object represented by a 3D point cloud. We propose a deep variational autoencoder-based unsupervised anomaly detection network adapted to the 3D point cloud and an anomaly score specifically for 3D point clouds. To verify the effectiveness of the model, we conducted extensive experiments on the ShapeNet dataset. Through quantitative and qualitative evaluation, we demonstrate that the proposed method outperforms the baseline method. Our code is available at https://github.com/llien30/point_cloud_anomaly_detection.

Submitted 6 April, 2023; originally announced April 2023.

Comments: ICIP2021

arXiv:2303.15947 [pdf, other]

Deep Selection: A Fully Supervised Camera Selection Network for Surgery Recordings

Authors: Ryo Hachiuma, Tomohiro Shimizu, Hideo Saito, Hiroki Kajita, Yoshifumi Takatsume

Abstract: Recording surgery in operating rooms is an essential task for education and evaluation of medical treatment. However, recording the desired targets, such as the surgery field, surgical tools, or doctor's hands, is difficult because the targets are heavily occluded during surgery. We use a recording system in which multiple cameras are embedded in the surgical lamp, and we assume that at least one camera is recording the target without occlusion at any given time. As the embedded cameras obtain multiple video sequences, we address the task of selecting the camera with the best view of the surgery. Unlike the conventional method, which selects the camera based on the area size of the surgery field, we propose a deep neural network that predicts the camera selection probability from multiple video sequences by learning the supervision of the expert annotation. We created a dataset in which six different types of plastic surgery are recorded, and we provided the annotation of camera switching. Our experiments show that our approach successfully switched between cameras and outperformed three baseline methods.

Submitted 28 March, 2023; originally announced March 2023.

Comments: MICCAI 2020

arXiv:2303.13465 [pdf, other]

Deep RL with Hierarchical Action Exploration for Dialogue Generation

Authors: Itsugun Cho, Ryota Takahashi, Yusaku Yanase, Hiroaki Saito

Abstract: Traditionally, approximate dynamic programming is employed in dialogue generation with greedy policy improvement through action sampling, as the natural language action space is vast. However, this practice is inefficient for reinforcement learning (RL) due to the sparsity of eligible responses with high action values, which leads to weak improvement sustained by random sampling. This paper presents theoretical analysis and experiments that reveal the performance of the dialogue policy is positively correlated with the sampling size. To overcome this limitation, we introduce a novel dual-granularity Q-function that explores the most promising response category to intervene in the sampling process. Our approach extracts actions based on a grained hierarchy, thereby achieving the optimum with fewer policy iterations. Additionally, we use offline RL and learn from multiple reward functions designed to capture emotional nuances in human interactions. Empirical studies demonstrate that our algorithm outperforms baselines across automatic metrics and human evaluations. Further testing reveals that our algorithm exhibits both explainability and controllability and generates responses with higher expected rewards.

Submitted 15 May, 2023; v1 submitted 22 March, 2023; originally announced March 2023.

arXiv:2204.07372 [pdf, other]

A Personalized Dialogue Generator with Implicit User Persona Detection

Authors: Itsugun Cho, Dongyang Wang, Ryota Takahashi, Hiroaki Saito

Abstract: Current works in the generation of personalized dialogue primarily contribute to the agent presenting a consistent personality and driving a more informative response. However, we found that the generated responses from most previous models tend to be self-centered, with little care for the user in the dialogue. Moreover, we consider that human-like conversation is essentially built based on inferring information about the persona of the other party. Motivated by this, we propose a novel personalized dialogue generator by detecting an implicit user persona. Because it is hard to collect a large number of detailed personas for each user, we attempted to model the user's potential persona and its representation from dialogue history, with no external knowledge. The perception and fader variables were conceived using conditional variational inference. The two latent variables simulate the process of people being aware of each other's persona and producing a corresponding expression in conversation. Finally, posterior-discriminated regularization was presented to enhance the training procedure. Empirical studies demonstrate that, compared to state-of-the-art methods, our approach is more concerned with the user's persona and achieves a considerable boost across the evaluations.

Submitted 21 August, 2022; v1 submitted 15 April, 2022; originally announced April 2022.

Comments: 9 pages, 7 figures, Accepted by Coling2022

arXiv:2203.07098 [pdf, other]

A Two-Block RNN-based Trajectory Prediction from Incomplete Trajectory

Authors: Ryo Fujii, Jayakorn Vongkulbhisal, Ryo Hachiuma, Hideo Saito

Abstract: Trajectory prediction has gained great attention and significant progress has been made in recent years. However, most works rely on a key assumption that each video is successfully preprocessed by detection and tracking algorithms and the complete observed trajectory is always available. In complex real-world environments, though, we often encounter miss-detection of target agents (e.g., pedestrians, vehicles) caused by bad image conditions, such as occlusion by other agents. In this paper, we address the problem of trajectory prediction from an incomplete observed trajectory due to miss-detection, where the observed trajectory includes several missing data points. We introduce a two-block RNN model that approximates the inference steps of the Bayesian filtering framework and seeks the optimal estimation of the hidden state when miss-detection occurs. The model uses two RNNs depending on the detection result. One RNN approximates the inference step of the Bayesian filter with the new measurement when the detection succeeds, while the other does the approximation when the detection fails. Our experiments show that the proposed model improves the prediction accuracy compared to the three baseline imputation methods on publicly available datasets: ETH and UCY (9% and 7% improvement on the ADE and FDE metrics). We also show that our proposed method can achieve better prediction compared to the baselines when there is no miss-detection.

Submitted 16 March, 2022; v1 submitted 14 March, 2022; originally announced March 2022.

Comments: Accepted by IEEE Access

arXiv:2202.00210 [pdf, other]

INPUT Team Description Paper in 2022

Authors: Masaki Yasuhara, Tomoya Takahashi, Hiroki Maruta, Hiroyuki Saito, Shota Higuchi, Takaaki Nara, Keitaro Takeuchi, Yota Sakai, Kazuki Ishibashi

Abstract: INPUT is a team participating in the RoboCup Soccer Small League (SSL). It aims to show the world the technological capabilities of the Nagaoka region of Niigata Prefecture, which is where the team members are from. For this purpose, we are working on one of the projects from the Nagaoka Activation Zone of Energy (NAZE). Herein, we introduce two robots, v2019 and v2022, as well as AI systems that will be used in RoboCup 2022. In addition, we describe our efforts to develop robots in collaboration with companies in the Nagaoka area.

Submitted 31 January, 2022; originally announced February 2022.

arXiv:2111.03824 [pdf, other]

Neural Implicit Event Generator for Motion Tracking

Authors: Mana Masuda, Yusuke Sekikawa, Ryo Fujii, Hideo Saito

Abstract: We present a novel framework of motion tracking from event data using implicit expression. Our framework uses a pre-trained event generation MLP named implicit event generator (IEG) and does motion tracking by updating its state (position and velocity) based on the difference between the observed event and generated event from the current state estimate. The difference is computed implicitly by the IEG. Unlike the conventional explicit approach, which requires dense computation to evaluate the difference, our implicit approach realizes efficient state update directly from sparse event data. Our sparse algorithm is especially suitable for mobile robotics applications where computational resources and battery life are limited. To verify the effectiveness of our method on real-world data, we applied it to the AR marker tracking application. We have confirmed that our framework works well in real-world environments in the presence of noise and background clutter.

Submitted 6 November, 2021; originally announced November 2021.

Comments: Submitted to ICRA 2022

arXiv:2110.07413 [pdf, other]

RGB-D Image Inpainting Using Generative Adversarial Network with a Late Fusion Approach

Authors: Ryo Fujii, Ryo Hachiuma, Hideo Saito

Abstract: Diminished reality is a technology that aims to remove objects from video images and fill in the missing region with plausible pixels. Most conventional methods utilize different cameras that capture the same scene from different viewpoints to allow regions to be removed and restored. In this paper, we propose an RGB-D image inpainting method using a generative adversarial network, which does not require multiple cameras. Recently, an RGB image inpainting method has achieved outstanding results by employing a generative adversarial network. However, RGB inpainting methods aim to restore only the texture of the missing region and, therefore, do not recover geometric information (i.e., the 3D structure of the scene). We extend the conventional image inpainting method to RGB-D image inpainting to jointly restore the texture and geometry of missing regions from a pair of RGB and depth images. Inspired by other tasks that use RGB and depth images (e.g., semantic segmentation and object detection), we propose a late fusion approach in which RGB and depth information exploit each other's advantages. The experimental results verify the effectiveness of our proposed method.

Submitted 14 October, 2021; originally announced October 2021.

Comments: Accepted at AVR 2020

arXiv:2108.12971 [pdf, other]

HELMHOLTZ: A Verifier for Tezos Smart Contracts Based on Refinement Types

Authors: Yuki Nishida, Hiromasa Saito, Ran Chen, Akira Kawata, Jun Furuse, Kohei Suenaga, Atsushi Igarashi

Abstract: A smart contract is a program executed on a blockchain, based on which many cryptocurrencies are implemented, and is being used for automating transactions. Due to the large amount of money that smart contracts deal with, there is a surging demand for a method that can statically and formally verify them. This article describes our type-based static verification tool HELMHOLTZ for Michelson, which is a statically typed stack-based language for writing smart contracts that are executed on the blockchain platform Tezos. HELMHOLTZ is designed on top of our extension of Michelson's type system with refinement types. HELMHOLTZ takes a Michelson program annotated with a user-defined specification written in the form of a refinement type as input; it then typechecks the program against the specification based on the refinement type system, discharging the generated verification conditions with the SMT solver Z3. We briefly introduce our refinement type system for the core calculus Mini-Michelson of Michelson, which incorporates the characteristic features such as compound datatypes (e.g., lists and pairs), higher-order functions, and invocation of another contract. HELMHOLTZ successfully verifies several practical Michelson programs, including one that transfers money to an account and one that checks a digital signature.

Submitted 10 September, 2021; v1 submitted 29 August, 2021; originally announced August 2021.

arXiv:2105.00151 [pdf, other]

Theoretical Analysis for Determining Geographical Route of Cable Network with Various Disaster-Endurance Levels

Authors: Hiroshi Saito

Abstract: This paper theoretically analyzes cable network disconnection due to randomly occurring natural disasters, where the disaster-endurance (DE) levels of the network are determined by a network entity such as the type of shielding method used for a duct containing cables. The network operator can determine which parts have a high DE level. When a part of a network can be protected, the placement of that part can be specified to decrease the probability of disconnecting two given nodes. The maximum lower bound of the probability of connecting two given nodes is explicitly derived. Conditions decreasing (not decreasing) the probability of connecting two given nodes with a partially protected network are provided.

Submitted 30 April, 2021; originally announced May 2021.

arXiv:2010.06318 [pdf, other]

Audio-Visual Self-Supervised Terrain Type Discovery for Mobile Platforms

Authors: Akiyoshi Kurobe, Yoshikatsu Nakajima, Hideo Saito, Kris Kitani

Abstract: The ability to both recognize and discover terrain characteristics is an important function required for many autonomous ground robots such as social robots, assistive robots, autonomous vehicles, and ground exploration robots. Recognizing and discovering terrain characteristics is challenging because similar terrains may have very different appearances (e.g., carpet comes in many colors), while terrains with very similar appearance may have very different physical properties (e.g., mulch versus dirt). In order to address the inherent ambiguity in vision-based terrain recognition and discovery, we propose a multi-modal self-supervised learning technique that switches between audio features extracted from a mic attached to the underside of a mobile platform and image features extracted by a camera on the platform to cluster terrain types. The terrain cluster labels are then used to train an image-based convolutional neural network to predict changes in terrain types. Through experiments, we demonstrate that the proposed self-supervised terrain type discovery method achieves over 80% accuracy, which greatly outperforms several baselines and suggests strong potential for assistive applications.

Submitted 13 October, 2020; originally announced October 2020.

arXiv:2010.03341 [pdf, other]

doi 10.1016/j.compbiomed.2021.104596

Deep Learning in Diabetic Foot Ulcers Detection: A Comprehensive Evaluation

Authors: Moi Hoon Yap, Ryo Hachiuma, Azadeh Alavi, Raphael Brungel, Bill Cassidy, Manu Goyal, Hongtao Zhu, Johannes Ruckert, Moshe Olshansky, Xiao Huang, Hideo Saito, Saeed Hassanpour, Christoph M. Friedrich, David Ascher, Anping Song, Hiroki Kajita, David Gillespie, Neil D. Reeves, Joseph Pappachan, Claire O'Shea, Eibe Frank

Abstract: There has been a substantial amount of research involving computer methods and technology for the detection and recognition of diabetic foot ulcers (DFUs), but there is a lack of systematic comparisons of state-of-the-art deep learning object detection frameworks applied to this problem. DFUC2020 provided participants with a comprehensive dataset consisting of 2,000 images for training and 2,000 images for testing. This paper summarises the results of DFUC2020 by comparing the deep learning-based algorithms proposed by the winning teams: Faster R-CNN, three variants of Faster R-CNN and an ensemble method; YOLOv3; YOLOv5; EfficientDet; and a new Cascade Attention Network. For each deep learning method, we provide a detailed description of model architecture, parameter settings for training and additional stages including pre-processing, data augmentation and post-processing. We provide a comprehensive evaluation for each method. All the methods required a data augmentation stage to increase the number of images available for training and a post-processing stage to remove false positives. The best performance was obtained from Deformable Convolution, a variant of Faster R-CNN, with a mean average precision (mAP) of 0.6940 and an F1-Score of 0.7434. Finally, we demonstrate that the ensemble method based on different deep learning methods can enhance the F1-Score but not the mAP.

Submitted 24 May, 2021; v1 submitted 7 October, 2020; originally announced October 2020.

Comments: 19 pages, 18 figures, 10 tables

Journal ref: Computers in Biology and Medicine, Volume 135, 2021, 104596, ISSN 0010-4825

arXiv:1912.09650 [pdf, ps, other]

doi 10.1109/TMC.2019.2959990

Spatio-Temporal Correlation of Interference in MANET Under Spatially Correlated Shadowing Environment

Authors: Tatsuaki Kimura, Hiroshi Saito

Abstract: Correlation of interference affects spatio-temporal aspects of various wireless mobile systems, such as retransmission, multiple antennas and cooperative relaying. In this paper, we study the spatial and temporal correlation of interference in mobile ad-hoc networks under a correlated shadowing environment. By modeling the node locations as a Poisson point process with an i.i.d. mobility model and considering Gudmundson's (1991) spatially correlated shadowing model, we theoretically analyze the relationship between the correlation distance of log-normal shadowing and the spatial and temporal correlation coefficients of interference. Since the exact expressions of the correlation coefficients are intractable, we obtain their simple asymptotic expressions as the variance of log-normal shadowing increases. We found in our numerical examples that the asymptotic expansions can be used as tight approximate formulas and are useful for modeling general wireless systems under spatially correlated shadowing.

Submitted 20 December, 2019; originally announced December 2019.

Comments: to appear in IEEE Transactions on Mobile Computing

arXiv:1907.10008 [pdf, other]

Incremental Class Discovery for Semantic Segmentation with RGBD Sensing

Authors: Yoshikatsu Nakajima, Byeongkeun Kang, Hideo Saito, Kris Kitani

Abstract: This work addresses the task of open world semantic segmentation using RGBD sensing to discover new semantic classes over time. Although there are many types of objects in the real world, current semantic segmentation methods make a closed world assumption and are trained only to segment a limited number of object classes. Towards a more open world approach, we propose a novel method that incrementally learns new classes for image segmentation. The proposed system first segments each RGBD frame using both color and geometric information, and then aggregates that information to build a single segmented dense 3D map of the environment. The segmented 3D map representation is a key component of our approach as it is used to discover new object classes by identifying coherent regions in the 3D map that have no semantic label. The use of coherent regions in the 3D map as a primitive element, rather than traditional elements such as surfels or voxels, also significantly reduces the computational complexity and memory use of our method. It thus leads to semi-real-time performance at 10.7 Hz when incrementally updating the dense 3D map at every frame. Through experiments on the NYUDv2 dataset, we demonstrate that the proposed method is able to correctly cluster objects of both known and unseen classes. We also show the quantitative comparison with the state-of-the-art supervised methods, the processing time of each step, and the influences of each component.

Submitted 23 July, 2019; originally announced July 2019.

Comments: 10 pages, To appear at IEEE International Conference on Computer Vision (ICCV 2019)

arXiv:1907.09127 [pdf, other]

DetectFusion: Detecting and Segmenting Both Known and Unknown Dynamic Objects in Real-time SLAM

Authors: Ryo Hachiuma, Christian Pirchheim, Dieter Schmalstieg, Hideo Saito

Abstract: We present DetectFusion, an RGB-D SLAM system that runs in real-time and can robustly handle semantically known and unknown objects that can move dynamically in the scene. Our system detects, segments and assigns semantic class labels to known objects in the scene, while tracking and reconstructing them even when they move independently in front of the monocular camera. In contrast to related work, we achieve real-time computational performance on semantic instance segmentation with a novel method combining 2D object detection and 3D geometric segmentation. In addition, we propose a method for detecting and segmenting the motion of semantically unknown objects, thus further improving the accuracy of camera tracking and map reconstruction. We show that our method performs on par with or better than previous work in terms of localization and object reconstruction accuracy, while achieving about 20 FPS even if the objects are segmented in each frame.

Submitted 22 July, 2019; originally announced July 2019.

Comments: 12 pages, 4 figures, 4 tables, accepted by BMVC 2019 spotlight session

arXiv:1812.07045 [pdf, other]

EventNet: Asynchronous Recursive Event Processing

Authors: Yusuke Sekikawa, Kosuke Hara, Hideo Saito

Abstract: Event cameras are bio-inspired vision sensors that mimic retinas to asynchronously report per-pixel intensity changes rather than outputting an actual intensity image at regular intervals. This new paradigm of image sensor offers significant potential advantages; namely, sparse and non-redundant data representation. Unfortunately, however, most of the existing artificial neural network architectures, such as a CNN, require dense synchronous input data, and therefore, cannot make use of the sparseness of the data. We propose EventNet, a neural network designed for real-time processing of asynchronous event streams in a recursive and event-wise manner. EventNet models dependence of the output on tens of thousands of causal events recursively using a novel temporal coding scheme. As a result, at inference time, our network operates in an event-wise manner that is realized with very few sum-of-the-product operations (look-up table and temporal feature aggregation), which enables processing of 1 mega or more events per second on a standard CPU. In experiments using real data, we demonstrated the real-time performance and robustness of our framework.

Submitted 1 April, 2019; v1 submitted 7 December, 2018; originally announced December 2018.

arXiv:1803.02784 [pdf, other]

Fast and Accurate Semantic Mapping through Geometric-based Incremental Segmentation

Authors: Yoshikatsu Nakajima, Keisuke Tateno, Federico Tombari, Hideo Saito

Abstract: We propose an efficient and scalable method for incrementally building a dense, semantically annotated 3D map in real-time. The proposed method assigns class probabilities to each region, not each element (e.g., surfel and voxel), of the 3D map which is built up through a robust SLAM framework and incrementally segmented with a geometric-based segmentation method. Unlike all other approaches, our method is capable of running at over 30 Hz while performing all processing components, including SLAM, segmentation, 2D recognition, and updating class probabilities of each segmentation label at every incoming frame, thanks to the high efficiency that characterizes the computationally intensive stages of our framework. By utilizing a specifically designed CNN to improve the frame-wise segmentation result, we can also achieve high accuracy. We validate our method on the NYUv2 dataset by comparing with the state of the art in terms of accuracy and computational efficiency, and by means of an analysis in terms of time and space complexity.

Submitted 7 March, 2018; originally announced March 2018.

arXiv:1707.06128 [pdf, ps, other]

Geometric Analysis of Observability of Target Object Shape Using Location-Unknown Distance Sensors

Authors: Hiroshi Saito, Hirotada Honda

Abstract: We geometrically analyze the problem of estimating parameters related to the shape and size of a two-dimensional target object on the plane by using randomly distributed distance sensors whose locations are unknown. Based on the analysis using geometric probability, we discuss the observability of these parameters: which parameters we can estimate and what conditions are required to estimate them. For a convex target object, its size and perimeter length are observable, and other parameters are not observable. For a general polygon target object, convexity in addition to its size and perimeter length is observable. Parameters related to a concave vertex can be observable when some conditions are satisfied. We also propose a method for estimating the convexity of a target object and the perimeter length of the target object.

Submitted 15 May, 2017; originally announced July 2017.

Comments: under submission

arXiv:1706.09606 [pdf, ps, other]

Theoretical Performance Analysis of Vehicular Broadcast Communications at Intersection and their Optimization

Authors: Tatsuaki Kimura, Hiroshi Saito

Abstract: In this paper, we propose an optimization method for the broadcast rate in vehicle-to-vehicle (V2V) broadcast communications at an intersection on the basis of theoretical analysis. We consider a model in which locations of vehicles are modeled separately as queuing and running segments and derive key performance metrics of V2V broadcast communications via a stochastic geometry approach. Since these theoretical expressions are mathematically intractable, we developed closed-form approximate formulae for them. Using them, we optimize the broadcast rate such that the mean number of successful receivers per unit time is maximized. Because of the closed-form approximation, the optimal rate can be used as a guideline for a real-time control method, which is not achieved through time-consuming simulations. We evaluated our method through numerical examples and demonstrated the effectiveness of our method.

Submitted 29 March, 2019; v1 submitted 29 June, 2017; originally announced June 2017.

arXiv:1403.2486 [pdf, ps, other]

Theoretical Evaluation of Offloading through Wireless LANs

Authors: Hiroshi Saito, Ryoichi Kawahara

Abstract: Offloading of cellular traffic through a wireless local area network (WLAN) is theoretically evaluated. First, empirical data sets of the locations of WLAN internet access points are analyzed and an inhomogeneous Poisson process consisting of high, normal, and low density regions is proposed as a spatial point process model for these configurations. Second, performance metrics, such as mean available bandwidth for a user and the number of vertical handovers, are evaluated for the proposed model through geometric analysis. Explicit formulas are derived for the metrics, although they depend on many parameters such as the number of WLAN access points, the shape of each WLAN coverage region, the location of each WLAN access point, the available bandwidth (bps) of the WLAN, and the shape and available bandwidth (bps) of each subregion identified by the channel quality indicator in a cell of the cellular network. Explicit formulas strongly suggest that the bandwidth a user experiences does not depend on the user mobility. This is because the bandwidth available to a user who does not move and that available to a user who moves are the same or approximately the same as a probabilistic distribution. Numerical examples show that parameters, such as the size of regions where placement of WLAN access points is not allowed and the mean density of WLANs in high density regions, have a large impact on performance metrics. In particular, a homogeneous Poisson process model as the WLAN access point location model largely overestimates the mean available bandwidth for a user and the number of vertical handovers. The overestimated mean available bandwidth is, for example, about 50% in a certain condition.

Submitted 11 March, 2014; originally announced March 2014.

arXiv:1402.6835 [pdf, ps, other]

doi 10.1109/JLT.2014.2385100

Spatial Design of Physical Network Robust against Earthquakes

Authors: Hiroshi Saito

Abstract: This paper analyzes the survivability of a physical network against earthquakes and proposes spatial network design rules to make a network robust against earthquakes. The disaster area model used is fairly generic and bounded. The proposed design rules for physical networks include: (i) a shorter zigzag route can reduce the probability that a network intersects a disaster area, (ii) an additive performance metric, such as repair cost, is independent of the network shape if the route length is fixed, and (iii) additional routes within a ring network do not decrease the probability that all the routes between a given pair of nodes intersect the disaster area, but a wider detour route decreases it. Formulas for evaluating the probability of disconnecting two given nodes are also derived. An optimal server placement is shown as an application of the theoretical results. These analysis results are validated through empirical earthquake data.

Submitted 27 February, 2014; originally announced February 2014.

Comments: arXiv admin note: text overlap with arXiv:1312.7187

arXiv:1402.1637 [pdf]

Vertical Clustering of 3D Elliptical Helical Data

Authors: Wasantha Samarathunga, Masatoshi Seki, Hidenobu Saito, Ken Ichiryu, Yasuhiro Ohyama

Abstract: This research proposes an effective vertical clustering strategy of 3D data in an elliptical helical shape based on 2D geometry. The clustering object is an elliptical cross-sectioned metal pipe which has been bent into an elliptical helical shape and is used in wearable muscle support design for the welfare industry. The aim of this proposed method is to maximize the vertical clustering (vertical partitioning) ability of surface data in order to run the product evaluation process addressed in research [2]. The experiment results prove that the proposed method outperforms the existing threshold number of clusters that preserves the vertical shape than applying the conventional 3D data. This research also proposes a new product testing strategy that provides flexibility in computer-aided testing by not restricting the sequence-dependent measurements which apply weight on the measuring process. The clustering algorithms used for the experiments in this research are self-organizing map (SOM) and K-medoids.

Submitted 7 February, 2014; originally announced February 2014.

Journal ref: International Journal of Computer Trends and Technology, volume 6 number 2, Dec 2013

arXiv:1402.1635 [pdf]

Product Evaluation In Elliptical Helical Pipe Bending

Authors: Wasantha Samarathunga, Masatoshi Seki, Hidenobu Saito, Ken Ichiryu, Yasuhiro Ohyama

Abstract: This research proposes a computational approach to address the evaluation of end product machining accuracy in elliptical surfaced helical pipe bending using a 6-DOF parallel manipulator as a pipe bender. The target end product is wearable metal muscle supporters used in build-to-order welfare product manufacturing. This paper proposes a product testing model that mainly corrects the surface direction estimation errors of existing least squares ellipse fittings, followed by arc length and central angle evaluations. This post-machining modelling requires a combination of reverse rotations and translations to a specific location before accuracy evaluation takes place, i.e., the reverse of pre-machining product modelling. This specific location not only allows us to compute surface direction but also the amount of excessive surface twisting as a rotation angle about a specified axis, i.e., quantification of surface torsion. We first experimented with three ellipse fitting methods: two least-squares fitting methods with the Bookstein constraint and the Trace constraint, and one nonlinear least squares method using the Gauss-Newton algorithm. From the fitting results, we found that using the Trace constraint is more reliable and designed a correction filter for surface torsion observation. Finally, we apply a 2D total least squares line fitting method with a rectification filter for surface direction detection.

Submitted 7 February, 2014; originally announced February 2014.

Journal ref: International Journal of Computer Trends and Technology, volume 4 Issue 10 Oct 2013

arXiv:1312.7187 [pdf, ps, other]

Analysis of Geometric Disaster Evaluation Model for Physical Networks

Authors: Hiroshi Saito

Abstract: A geometric model of a physical network affected by a disaster is proposed and analyzed using integral geometry (geometric probability). This analysis provides a theoretical method of evaluating performance metrics, such as the probability of maintaining connectivity, and a network design rule that can make the network robust against disasters. The proposed model covers the case in which the disaster area is much larger than the part of the network in which we are interested. Performance metrics, such as the probability of maintaining connectivity, are explicitly given by linear functions of the perimeter length of convex hulls determined by physical routes. The derived network design rules include the following. (1) Reducing the convex hull of the physical route reduces the expected number of nodes that cannot connect to the destination. (2) The probability of maintaining the connectivity of two nodes on a loop cannot be changed by changing the physical route of that loop. (3) The effect of introducing a loop is identical to that of a single physical route implemented by the straight-line route.

Submitted 26 December, 2013; originally announced December 2013.

Comments: 12 pages

arXiv:0911.3842 [pdf, other]

doi 10.1088/1367-2630/12/5/053030

Musical Genres: Beating to the Rhythms of Different Drums

Authors: Debora C. Correa, Jose H. Saito, Luciano da F. Costa

Abstract: Online music databases have increased significantly as a consequence of the rapid growth of the Internet and digital audio, requiring the development of faster and more efficient tools for music content analysis. Musical genres are widely used to organize music collections. In this paper, the problem of automatic music genre classification is addressed by exploring rhythm-based features obtained from a respective complex network representation. A Markov model is built in order to analyse the temporal sequence of rhythmic notation events. Feature analysis is performed by using two multivariate statistical approaches: principal component analysis (unsupervised) and linear discriminant analysis (supervised). Similarly, two classifiers are applied in order to identify the category of rhythms: parametric Bayesian classifier under Gaussian hypothesis (supervised), and agglomerative hierarchical clustering (unsupervised). Qualitative results obtained by Kappa coefficient and the obtained clusters corroborated the effectiveness of the proposed method.

Submitted 19 November, 2009; originally announced November 2009.

Comments: 35 pages, 13 figures, 13 tables

Showing 1–34 of 34 results for author: Saito, H