doi 10.1109/ICCRE57112.2023.10155581

Task segmentation based on transition state clustering for surgical robot assistance

Authors: Yutaro Yamada, Jacinto Colan, Ana Davila, Yasuhisa Hasegawa

Abstract: Understanding surgical tasks represents an important challenge for autonomy in surgical robotic systems. To achieve this, we propose an online task segmentation framework that uses hierarchical transition state clustering to activate predefined robot assistance. Our approach involves performing a first clustering on visual features and a subsequent clustering on robot kinematic features for each visual cluster. This makes it possible to capture relevant task transition information on each modality independently. The approach is implemented for a pick-and-place task commonly found in surgical training. The validation of the transition segmentation showed high accuracy and fast computation time. We have integrated the transition recognition module with predefined robot-assisted tool positioning. The complete framework has shown benefits in reducing task completion time and cognitive workload.

Submitted 14 June, 2024; originally announced June 2024.

Comments: Accepted at 2023 International Conference on Control and Robotics Engineering (ICCRE)

Journal ref: 2023 International Conference on Control and Robotics Engineering (ICCRE), pp.260-264

arXiv:2402.09052 [pdf, other]

L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects

Authors: Yutaro Yamada, Khyathi Chandu, Yuchen Lin, Jack Hessel, Ilker Yildirim, Yejin Choi

Abstract: Diffusion-based image generation models such as DALL-E 3 and Stable Diffusion-XL demonstrate remarkable capabilities in generating images with realistic and unique compositions. Yet, these models are not robust in precisely reasoning about physical and spatial configurations of objects, especially when instructed with unconventional, and thereby out-of-distribution, descriptions such as "a chair with five legs". In this paper, we propose a language agent with chain-of-3D-thoughts (L3GO), an inference-time approach that can reason about part-based 3D mesh generation of unconventional objects that current data-driven diffusion models struggle with. More concretely, we use large language models as agents to compose a desired object via trial-and-error within the 3D simulation environment. To facilitate our investigation, we develop a new benchmark, Unconventionally Feasible Objects (UFO), as well as SimpleBlenv, a wrapper environment built on top of Blender where language agents can build and compose atomic building blocks via API calls. Human and automatic GPT-4V evaluations show that our approach surpasses the standard GPT-4 and other language agents (e.g., ReAct and Reflexion) for 3D mesh generation on ShapeNet. Moreover, when tested on our UFO benchmark, our approach outperforms other state-of-the-art text-to-2D image and text-to-3D models based on human evaluation.

Submitted 14 February, 2024; originally announced February 2024.

arXiv:2312.05751 [pdf, other]

Benchmarking of Query Strategies: Towards Future Deep Active Learning

Authors: Shiryu Ueno, Yusei Yamada, Shunsuke Nakatsuka, Kunihito Kato

Abstract: In this study, we benchmark query strategies for deep active learning (DAL). DAL reduces annotation costs by annotating only high-quality samples selected by query strategies. Existing research has two main problems: the experimental settings are not standardized, which makes the evaluation of existing methods difficult, and most experiments were conducted on the CIFAR or MNIST datasets. Therefore, we develop standardized experimental settings for DAL and investigate the effectiveness of various query strategies using six datasets, including those that contain medical and visual inspection images. In addition, since most current DAL approaches are model-based, we perform verification experiments using fully-trained models for querying to investigate the effectiveness of these approaches for the six datasets. Our code is available at https://github.com/ia-gu/Benchmarking-of-Query-Strategies-Towards-Future-Deep-Active-Learning

Submitted 9 December, 2023; originally announced December 2023.

Comments: 14 pages

arXiv:2310.14540 [pdf, other]

Evaluating Spatial Understanding of Large Language Models

Authors: Yutaro Yamada, Yihan Bao, Andrew K. Lampinen, Jungo Kasai, Ilker Yildirim

Abstract: Large language models (LLMs) show remarkable capabilities across a variety of tasks. Despite the models only seeing text in training, several recent studies suggest that LLM representations implicitly capture aspects of the underlying grounded concepts. Here, we explore LLM representations of a particularly salient kind of grounded knowledge -- spatial relationships. We design natural-language navigation tasks and evaluate the ability of LLMs, in particular GPT-3.5-turbo, GPT-4, and Llama2 series models, to represent and reason about spatial structures. These tasks reveal substantial variability in LLM performance across different spatial structures, including square, hexagonal, and triangular grids, rings, and trees. In extensive error analysis, we find that LLMs' mistakes reflect both spatial and non-spatial factors. These findings suggest that LLMs appear to capture certain aspects of spatial structure implicitly, but room for improvement remains.

Submitted 12 April, 2024; v1 submitted 22 October, 2023; originally announced October 2023.

Comments: Accepted to TMLR 2024. Our code and data are available at https://github.com/runopti/SpatialEvalLLM, https://huggingface.co/datasets/yyamada/SpatialEvalLLM

arXiv:2309.08623 [pdf]

doi 10.1109/ICDH60066.2023.00014

Balance Measures Derived from Insole Sensor Differentiate Prodromal Dementia with Lewy Bodies

Authors: Masatomo Kobayashi, Yasunori Yamada, Kaoru Shinkawa, Miyuki Nemoto, Miho Ota, Kiyotaka Nemoto, Tetsuaki Arai

Abstract: Dementia with Lewy bodies is the second most common type of neurodegenerative dementia, and identification at the prodromal stage, i.e., mild cognitive impairment due to Lewy bodies (MCI-LB), is important for providing appropriate care. However, MCI-LB is often underrecognized because of its diversity in clinical manifestations and similarities with other conditions such as mild cognitive impairment due to Alzheimer's disease (MCI-AD). In this study, we propose a machine learning-based automatic pipeline that helps identify MCI-LB by exploiting balance measures acquired with an insole sensor during a 30-s standing task. An experiment with 98 participants (14 MCI-LB, 38 MCI-AD, 46 cognitively normal) showed that the resultant models could discriminate MCI-LB from the other groups with up to 78.0% accuracy (AUC: 0.681), which was 6.8% better than the accuracy of a reference model based on demographic and clinical neuropsychological measures. Our findings may open up a new approach for timely identification of MCI-LB, enabling better care for patients.

Submitted 11 September, 2023; originally announced September 2023.

Journal ref: 2023 IEEE International Conference on Digital Health (ICDH)

arXiv:2309.05777 [pdf]

doi 10.1109/ICDH60066.2023.00015

Smartwatch-derived Acoustic Markers for Deficits in Cognitively Relevant Everyday Functioning

Authors: Yasunori Yamada, Kaoru Shinkawa, Masatomo Kobayashi, Miyuki Nemoto, Miho Ota, Kiyotaka Nemoto, Tetsuaki Arai

Abstract: Detection of subtle deficits in everyday functioning due to cognitive impairment is important for early detection of neurodegenerative diseases, particularly Alzheimer's disease. However, current standards for assessment of everyday functioning are based on qualitative, subjective ratings. Speech has been shown to provide good objective markers for cognitive impairments, but the association with cognition-relevant everyday functioning remains uninvestigated. In this study, we demonstrate the feasibility of using a smartwatch-based application to collect acoustic features as objective markers for detecting deficits in everyday functioning. We collected voice data during the performance of cognitive tasks and daily conversation, as possible application scenarios, from 54 older adults, along with a measure of everyday functioning. Machine learning models using acoustic features could detect individuals with deficits in everyday functioning with up to 77.8% accuracy, which was higher than the 68.5% accuracy with standard neuropsychological tests. We also identified common acoustic features for robustly discriminating deficits in everyday functioning across both types of voice data (cognitive tasks and daily conversation). Our results suggest that common acoustic features extracted from different types of voice data can be used as markers for deficits in everyday functioning.

Submitted 11 September, 2023; originally announced September 2023.

Journal ref: 2023 IEEE International Conference on Digital Health (ICDH)

arXiv:2307.10688 [pdf, other]

Bounded Combinatorial Reconfiguration with Answer Set Programming

Authors: Yuya Yamada, Mutsunori Banbara, Katsumi Inoue, Torsten Schaub

Abstract: We develop an approach called bounded combinatorial reconfiguration for solving combinatorial reconfiguration problems based on Answer Set Programming (ASP). The general task is to study the solution spaces of source combinatorial problems and to decide whether or not there are sequences of feasible solutions that have special properties. The resulting recongo solver covers all metrics of the solver track in the most recent international competition on combinatorial reconfiguration (CoRe Challenge 2022). recongo ranked first in the shortest metric of the single-engine solvers track. In this paper, we present the design and implementation of bounded combinatorial reconfiguration, and present an ASP encoding of the independent set reconfiguration problem, one of the most studied combinatorial reconfiguration problems. Finally, we present an empirical analysis considering all instances of CoRe Challenge 2022.

Submitted 20 July, 2023; originally announced July 2023.

Comments: 15 pages

arXiv:2303.18027 [pdf, other]

Evaluating GPT-4 and ChatGPT on Japanese Medical Licensing Examinations

Authors: Jungo Kasai, Yuhei Kasai, Keisuke Sakaguchi, Yutaro Yamada, Dragomir Radev

Abstract: As large language models (LLMs) gain popularity among speakers of diverse languages, we believe that it is crucial to benchmark them to better understand model behaviors, failures, and limitations in languages beyond English. In this work, we evaluate LLM APIs (ChatGPT, GPT-3, and GPT-4) on the Japanese national medical licensing examinations from the past five years, including the current year. Our team comprises native Japanese-speaking NLP researchers and a practicing cardiologist based in Japan. Our experiments show that GPT-4 outperforms ChatGPT and GPT-3 and passes all six years of the exams, highlighting LLMs' potential in a language that is typologically distant from English. However, our evaluation also exposes critical limitations of the current LLM APIs. First, LLMs sometimes select prohibited choices that should be strictly avoided in medical practice in Japan, such as suggesting euthanasia. Further, our analysis shows that the API costs are generally higher and the maximum context size is smaller for Japanese because of the way non-Latin scripts are currently tokenized in the pipeline. We release our benchmark as Igaku QA as well as all model outputs and exam metadata. We hope that our results and benchmark will spur progress on more diverse applications of LLMs. Our benchmark is available at https://github.com/jungokasai/IgakuQA.

Submitted 5 April, 2023; v1 submitted 31 March, 2023; originally announced March 2023.

Comments: Added results from the March 2023 exam

arXiv:2212.12043 [pdf, other]

When are Lemons Purple? The Concept Association Bias of Vision-Language Models

Authors: Yutaro Yamada, Yingtian Tang, Yoyo Zhang, Ilker Yildirim

Abstract: Large-scale vision-language models such as CLIP have shown impressive performance on zero-shot image classification and image-to-text retrieval. However, such performance is not realized in tasks that require a finer-grained correspondence between vision and language, such as Visual Question Answering (VQA). As a potential cause of the difficulty of applying these models to VQA and similar tasks, we report an interesting phenomenon of vision-language models, which we call the Concept Association Bias (CAB). We find that models with CAB tend to treat input as a bag of concepts and attempt to fill in the other missing concept crossmodally, leading to an unexpected zero-shot prediction. We demonstrate CAB by showing that CLIP's zero-shot classification performance greatly suffers when there is a strong concept association between an object (e.g. eggplant) and an attribute (e.g. color purple). We also show that the strength of CAB predicts the performance on VQA. We observe that CAB is prevalent in vision-language models trained with contrastive losses, even when autoregressive losses are jointly employed. However, a model that solely relies on autoregressive loss seems to exhibit minimal or no signs of CAB.

Submitted 13 April, 2024; v1 submitted 22 December, 2022; originally announced December 2022.

Comments: EMNLP 2023 main

arXiv:2211.08685 [pdf]

doi 10.1109/ICDH55609.2022.00008

Automated Analysis of Drawing Process for Detecting Prodromal and Clinical Dementia

Authors: Yasunori Yamada, Masatomo Kobayashi, Kaoru Shinkawa, Miyuki Nemoto, Miho Ota, Kiyotaka Nemoto, Tetsuaki Arai

Abstract: Early diagnosis of dementia, particularly in the prodromal stage (i.e., mild cognitive impairment, or MCI), has become a research and clinical priority but remains challenging. Automated analysis of the drawing process has been studied as a promising means for screening prodromal and clinical dementia, providing multifaceted information encompassing features such as drawing speed, pen posture, writing pressure, and pauses. We examined the feasibility of using these features not only for detecting prodromal and clinical dementia but also for predicting the severity of cognitive impairments assessed using the Mini-Mental State Examination (MMSE) as well as the severity of neuropathological changes assessed by medial temporal lobe (MTL) atrophy. We collected drawing data with a digitizing tablet and pen from 145 older adults who were cognitively normal (CN) or had MCI or dementia. The nested cross-validation results indicate that the combination of drawing features could be used to classify CN, MCI, and dementia with an AUC of 0.909 and 75.1% accuracy (CN vs. MCI: 82.4% accuracy; CN vs. dementia: 92.2% accuracy; MCI vs. dementia: 80.3% accuracy) and predict MMSE scores with an $R^2$ of 0.491 and severity of MTL atrophy with an $R^2$ of 0.293. Our findings suggest that automated analysis of the drawing process can provide information about cognitive impairments and neuropathological changes due to dementia, which can help identify prodromal and clinical dementia as a digital biomarker.

Submitted 16 November, 2022; originally announced November 2022.

Journal ref: 2022 IEEE International Conference on Digital Health (ICDH)

arXiv:2204.03934 [pdf, other]

Does Robustness on ImageNet Transfer to Downstream Tasks?

Authors: Yutaro Yamada, Mayu Otani

Abstract: As clean ImageNet accuracy nears its ceiling, the research community is increasingly concerned about robust accuracy under distributional shifts. While a variety of methods have been proposed to robustify neural networks, these techniques often target models trained on ImageNet classification. At the same time, it is a common practice to use ImageNet pretrained backbones for downstream tasks such as object detection, semantic segmentation, and image classification from different domains. This raises a question: Can these robust image classifiers transfer robustness to downstream tasks? For object detection and semantic segmentation, we find that a vanilla Swin Transformer, a variant of Vision Transformer tailored for dense prediction tasks, transfers robustness better than Convolutional Neural Networks that are trained to be robust to the corrupted version of ImageNet. For CIFAR10 classification, we find that models that are robustified for ImageNet do not retain robustness when fully fine-tuned. These findings suggest that current robustification techniques tend to emphasize ImageNet evaluations. Moreover, network architecture is a strong source of robustness when we consider transfer learning.

Submitted 8 April, 2022; originally announced April 2022.

Comments: CVPR 2022

arXiv:2201.12122 [pdf, other]

Can Wikipedia Help Offline Reinforcement Learning?

Authors: Machel Reid, Yutaro Yamada, Shixiang Shane Gu

Abstract: Fine-tuning reinforcement learning (RL) models has been challenging because of a lack of large-scale off-the-shelf datasets as well as high variance in transferability among different environments. Recent work has looked at tackling offline RL from the perspective of sequence modeling, with improved results as a result of the introduction of the Transformer architecture. However, when the model is trained from scratch, it suffers from slow convergence speeds. In this paper, we look to take advantage of this formulation of reinforcement learning as sequence modeling and investigate the transferability of pre-trained sequence models on other domains (vision, language) when finetuned on offline RL tasks (control, games). To this end, we also propose techniques to improve transfer between these domains. Results show consistent performance gains in terms of both convergence speed and reward on a variety of environments, accelerating training by 3-6x and achieving state-of-the-art performance in a variety of tasks using Wikipedia-pretrained and GPT2 language models. We hope that this work not only sheds light on the potential of leveraging generic sequence modeling techniques and pre-trained models for RL, but also inspires future work on sharing knowledge between generative modeling tasks of completely different domains.

Submitted 23 July, 2022; v1 submitted 28 January, 2022; originally announced January 2022.

arXiv:2201.10713 [pdf, other]

doi 10.1109/ACCESS.2022.3186479

Adaptive Resonance Theory-based Topological Clustering with a Divisive Hierarchical Structure Capable of Continual Learning

Authors: Naoki Masuyama, Narito Amako, Yuna Yamada, Yusuke Nojima, Hisao Ishibuchi

Abstract: Adaptive Resonance Theory (ART) is considered an effective approach for realizing continual learning thanks to its ability to handle the plasticity-stability dilemma. In general, however, the clustering performance of ART-based algorithms strongly depends on the specification of a similarity threshold, i.e., a vigilance parameter, which is data-dependent and specified by hand. This paper proposes an ART-based topological clustering algorithm with a mechanism that automatically estimates a similarity threshold from the distribution of data points. In addition, for improving information extraction performance, a divisive hierarchical clustering algorithm capable of continual learning is proposed by introducing a hierarchical structure to the proposed algorithm. Experimental results demonstrate that the proposed algorithm has high clustering performance comparable with recently-proposed state-of-the-art hierarchical clustering algorithms.

Submitted 7 July, 2022; v1 submitted 25 January, 2022; originally announced January 2022.

Comments: This paper is accepted in IEEE Access

Journal ref: IEEE Access, vol. 10, pp. 68042-68056, June 2022

arXiv:2110.15960 [pdf, other]

Support Recovery with Stochastic Gates: Theory and Application for Linear Models

Authors: Soham Jana, Henry Li, Yutaro Yamada, Ofir Lindenbaum

Abstract: Consider the problem of simultaneous estimation and support recovery of the coefficient vector in a linear data model with additive Gaussian noise. We study the problem of estimating the model coefficients based on a recently proposed non-convex regularizer, namely the stochastic gates (STG) [Yamada et al. 2020]. We suggest a new projection-based algorithm for solving the STG regularized minimization problem, and prove convergence and support recovery guarantees of the STG-estimator for a range of random and non-random design matrix setups. Our new algorithm has been shown to outperform the existing STG algorithm and other classical estimators for support recovery in various real and synthetic data analyses.

Submitted 12 November, 2022; v1 submitted 29 October, 2021; originally announced October 2021.

Comments: 15 pages, 7 figures, extended the theoretical studies and numerical analyses to other probabilistic models (subgaussian, moment constraints etc.)

arXiv:2103.06515 [pdf]

Physical Activity Analysis of College Students During the COVID-19 Pandemic Using Smartphones

Authors: Yuuki Nishiyama, Yuui Kakino, Enishi Naka, Yuka Noda, Satsuki Hashiba, Yusuke Yamada, Wataru Sasaki, Tadashi Okoshi, Jin Nakazawa, Masaki Mori, Hisashi Mizutori, Kotomi Shiota, Tomohisa Nagano, Yuko Tokairin, Takaaki Kato

Abstract: Owing to the pandemic caused by the coronavirus disease of 2019 (COVID-19), several universities have closed their campuses to prevent the spread of infection. Consequently, university classes are being held over the Internet, and students attend these classes from their homes. While the COVID-19 pandemic is expected to be prolonged, the online-centric lifestyle has raised concerns about secondary health issues caused by reduced physical activity (PA). However, the actual status of PA among university students has not yet been examined in Japan. Hence, in this study, we collected daily PA data (including the number of steps taken and data associated with six types of activities) using smartphones and analyzed the changes in the PA of university students. The PA data were collected over a period of ten weeks from 305 first-year university students who were attending a mandatory class of physical education at the university. The obtained results indicate that compared to the average number of steps taken before the COVID-19 pandemic (6474.87 steps), the average number of steps taken after the COVID-19 pandemic (3522.5 steps) has decreased by 45.6%. Furthermore, the decrease in commuting time (7 AM to 10 AM), classroom time, and extracurricular activity time (11 AM to 12 AM) has led to a decrease in PA on weekdays owing to reduced unplanned exercise opportunities and has caused an increase in the duration of being in the stationary state in the course of daily life.

Submitted 11 March, 2021; originally announced March 2021.

Comments: 12 pages, in Japanese, 16 figures and 2 tables

ACM Class: J.4; H.4.0; H.5.3

arXiv:2103.03209 [pdf, ps, other]

Requirement Analyses and Evaluations of Blockchain Platforms per Possible Use Cases

Authors: Kenji Saito, Akimitsu Shiseki, Mitsuyasu Takada, Hiroki Yamamoto, Masaaki Saitoh, Hiroaki Ohkawa, Hirofumi Andou, Naotake Miyamoto, Kazuaki Yamakawa, Kiyoshi Kurakawa, Tomohiro Yabushita, Yuji Yamada, Go Masuda, Kazuyuki Masuda

Abstract: It is said that blockchain will contribute to the digital transformation of society in a wide range of ways, from the management of public and private documents to the traceability in various industries, as well as digital currencies. A number of so-called blockchain platforms have been developed, and experiments and applications have been carried out on them. But are these platforms really conducive to practical use of the blockchain concept? To answer the question, we need to better understand what the technology called blockchain really is. We need to sort out the confusion we see in understanding what blockchain was invented for and what it means. We also need to clarify the structure of its applications. This document provides a generic model of understanding blockchain and its applications. We introduce design patterns to classify the platforms. We categorize possible use cases by identifying the structure among applications, and organize the functional, performance, operational and legal requirements for each such case. Based on the categorization and criteria, we evaluated and compared the following platforms: Hyperledger Fabric, Hyperledger Iroha, Hyperledger Indy, Ethereum, Quorum/Hyperledger Besu, Ethereum 2.0, Polkadot, Corda and BBc-1. We have tried to be fair in our evaluations and comparisons, but we also expect to provoke discussion. The intended reader of this document is anyone involved in the development of application systems who wants to understand blockchain and its platforms, including non-engineers and non-technologists.
The assessments in this document will allow readers to understand the technological requirements for the blockchain platforms, to question existing technologies, and to choose the appropriate platforms for the applications they envision. The comparisons hopefully will also be useful as a guide for designing new technologies.

Submitted 4 March, 2021; originally announced March 2021.

Comments: 50 pages, 3 figures

arXiv:2012.09407 [pdf, other]

Joint Search of Data Augmentation Policies and Network Architectures

Authors: Taiga Kashima, Yoshihiro Yamada, Shunta Saito

Abstract: The common pipeline of training deep neural networks consists of several building blocks such as data augmentation and network architecture selection. AutoML is a research field that aims at automatically designing those parts, but most methods explore each part independently because it is more challenging to simultaneously search all the parts. In this paper, we propose a joint optimization method for data augmentation policies and network architectures to bring more automation to the design of the training pipeline. The core idea of our approach is to make the whole part differentiable. The proposed method combines differentiable methods for augmentation policy search and network architecture search to jointly optimize them in an end-to-end manner. The experimental results show our method achieves competitive or superior performance to the independently searched results.

Submitted 12 January, 2021; v1 submitted 17 December, 2020; originally announced December 2020.

Comments: AAAI 2021 Workshop: Learning Network Architecture during Training

arXiv:2012.03598 [pdf, other]

Self-supervised Deep Learning for Reading Activity Classification

Authors: Md. Rabiul Islam, Shuji Sakamoto, Yoshihiro Yamada, Andrew Vargo, Motoi Iwata, Masakazu Iwamura, Koichi Kise

Abstract: Reading analysis can give important information about a user's confidence and habits and can be used to construct feedback to improve a user's reading behavior. A lack of labeled data inhibits the effective application of fully-supervised Deep Learning (DL) for automatic reading analysis. In this paper, we propose a self-supervised DL method for reading analysis and evaluate it on two classification tasks. We first evaluate the proposed self-supervised DL method on a four-class classification task on reading detection using electrooculography (EOG) glasses datasets, followed by an evaluation of a two-class classification task of confidence estimation on answers of multiple-choice questions (MCQs) using eye-tracking datasets. Fully-supervised DL and support vector machines (SVMs) are used to compare the performance of the proposed self-supervised DL method. The results show that the proposed self-supervised DL method is superior to the fully-supervised DL and SVM for both tasks, especially when training data is scarce. This result indicates that the proposed self-supervised DL method is the superior choice for reading analysis tasks. The results of this study are important for informing the design and implementation of automatic reading analysis platforms.

Submitted 7 December, 2020; originally announced December 2020.

Comments: 28 pages

arXiv:2002.02056 [pdf]

Design of the Inspection Process Using the GitHub Flow in Project Based Learning for Software Engineering and Its Practice

Authors: Yutsuki Miyashita, Yuki Yamada, Hiroaki Hashiura, Atsuo Hazeyama

Abstract: Project based learning (PBL) for software development (we call it software development PBL) has garnered attention as a practical educational method. A number of studies have reported on the introduction of social coding tools such as GitHub in software development PBL. In education, it is important to give feedback (advice, error corrections, and so on) to learners, especially in software development PBL, because almost all learners tackle practical software development from the viewpoint of technical and managerial aspects for the first time. This study regards inspection that is conducted in general software development activities as an opportunity to provide feedback and proposes an inspection process using the pull request on GitHub. By applying the proposed process to an actual software development PBL, we were able to give feedback at the precise locations in the artifacts the learners created.

Submitted 5 February, 2020; originally announced February 2020.

arXiv:1912.07801 [pdf]

An RSSI-based Wireless Sensor Node Localisation using Trilateration and Multilateration Methods for Outdoor Environment

Authors: Mohd Ismifaizul Mohd Ismail, Rudzidatul Akmam Dzyauddin, Shafiqa Samsul, Nur Aisyah Azmi, Yoshihide Yamada, Mohd Fitri Mohd Yakub, Noor Azurati Binti Ahmad Salleh

Abstract: Localisation can be defined as estimating or finding a position of the node. There are two techniques in localisation, which are range-based and range-free techniques. This paper focusses on the Received Signal Strength Indicator (RSSI) localisation method, which is categorised as a range-based technique along with the time of arrival, time difference of arrival and angle of arrival. Therefore, this study aims to compare the trilateration and multilateration methods for the RSSI-based technique for localising the transmitted (Tx) node. The wireless sensor module in the work used LOng-RAnge radio (LoRa) with 868MHz frequency. Nowadays, wireless networks have been a key technology for smart environments, monitoring, and object tracking due to low power consumption with long-range connectivity. The number of received (Rx) nodes is three and four for the trilateration and multilateration methods, respectively. The transmitted node is placed at 32 different coordinates within the 10x10 meter outdoor area. The results show that the error obtained for General Error Localisation (GER) for multilateration and trilateration is 1.83m and 2.30m, respectively. In addition, the error for multilateration and trilateration ranges from 1.00 to 5.28m and from 0.5 to 3.61m, respectively. The study concludes that the multilateration method is more accurate than trilateration. Therefore, as the number of Rx nodes increases, the accuracy of localisation of the Tx node increases.

Submitted 16 December, 2019; originally announced December 2019.

arXiv:1906.09786 [pdf, other]

Extending Attack Graphs to Represent Cyber-Attacks in Communication Protocols and Modern IT Networks

Authors: Orly Stan, Ron Bitton, Michal Ezrets, Moran Dadon, Masaki Inokuchi, Yoshinobu Ohta, Yoshiyuki Yamada, Tomohiko Yagyu, Yuval Elovici, Asaf Shabtai

Abstract: An attack graph is a method used to enumerate the possible paths that an attacker can execute in the organization network. MulVAL is a known open-source framework used to automatically generate attack graphs. MulVAL's default modeling has two main shortcomings. First, it lacks the representation of network protocol vulnerabilities, and thus it cannot be used to model common network attacks such as ARP poisoning, DNS spoofing, and SYN flooding. Second, it does not support advanced types of communication such as wireless and bus communication, and thus it cannot be used to model cyber-attacks on networks that include IoT devices or industrial components. In this paper, we present an extended network security model for MulVAL that: (1) considers the physical network topology, (2) supports short-range communication protocols (e.g., Bluetooth), (3) models vulnerabilities in the design of network protocols, and (4) models specific industrial communication architectures. Using the proposed extensions, we were able to model multiple attack techniques including: spoofing, man-in-the-middle, and denial of service, as well as attacks on advanced types of communication. We demonstrate the proposed model on a testbed implementing a simplified network architecture comprised of both IT and industrial components.

Submitted 24 June, 2019; originally announced June 2019.

arXiv:1812.10087 [pdf, other]

Classification of X-Ray Protein Crystallization Using Deep Convolutional Neural Networks with a Finder Module

Authors: Yusei Miura, Tetsuya Sakurai, Claus Aranha, Toshiya Senda, Ryuichi Kato, Yusuke Yamada

Abstract: Recently, deep convolutional neural networks have shown good results for image recognition. In this paper, we use convolutional neural networks with a finder module, which discovers the important region for recognition and extracts that region. We propose applying our method to the recognition of protein crystals for X-ray structural analysis. In this analysis, it is necessary to recognize states of protein crystallization from a large number of images. There are several methods that realize protein crystallization recognition by using convolutional neural networks. In each method, large-scale data sets are required to recognize with high accuracy. In our data set, the number of images is not sufficient for training a CNN. The amount of data for CNNs is a serious issue in various fields. Our method realizes high accuracy recognition with few images by discovering the region where the crystallization drop exists. We compared our crystallization image recognition method with a high precision method using Inception-V3. We demonstrate that our method is effective for crystallization images using several experiments. Our method achieved an AUC value about 5% higher than that of the compared method.

Submitted 25 December, 2018; originally announced December 2018.

Comments: 7 pages, 16 figures

arXiv:1810.04247 [pdf, other]

Feature Selection using Stochastic Gates

Authors: Yutaro Yamada, Ofir Lindenbaum, Sahand Negahban, Yuval Kluger

Abstract: Feature selection problems have been extensively studied for linear estimation, for instance, Lasso, but less emphasis has been placed on feature selection for non-linear functions. In this study, we propose a method for feature selection in high-dimensional non-linear function estimation problems. The new procedure is based on minimizing the $\ell_0$ norm of the vector of indicator variables that represent if a feature is selected or not. Our approach relies on the continuous relaxation of Bernoulli distributions, which allows our model to learn the parameters of the approximate Bernoulli distributions via gradient descent. This general framework simultaneously minimizes a loss function while selecting relevant features. Furthermore, we provide an information-theoretic justification of incorporating Bernoulli distribution into our approach and demonstrate the potential of the approach on synthetic and real-life applications.

Submitted 26 July, 2020; v1 submitted 9 October, 2018; originally announced October 2018.

Comments: Published in ICML 2020

Journal ref: Proceedings of Machine Learning and Systems 2020, pages 8952--8963

arXiv:1803.10840 [pdf, other]

Defending against Adversarial Images using Basis Functions Transformations

Authors: Uri Shaham, James Garritano, Yutaro Yamada, Ethan Weinberger, Alex Cloninger, Xiuyuan Cheng, Kelly Stanton, Yuval Kluger

Abstract: We study the effectiveness of various approaches that defend against adversarial attacks on deep networks via manipulations based on basis function representations of images. Specifically, we experiment with low-pass filtering, PCA, JPEG compression, low resolution wavelet approximation, and soft-thresholding. We evaluate these defense techniques using three types of popular attacks in black, gray and white-box settings. Our results show JPEG compression tends to outperform the other tested defenses in most of the settings considered, in addition to soft-thresholding, which performs well in specific cases, and yields a milder decrease in accuracy on benign examples. In addition, we also mathematically derive a novel white-box attack in which the adversarial perturbation is composed only of terms corresponding to a pre-determined subset of the basis functions, of which a "low frequency attack" is a special case.

Submitted 16 April, 2018; v1 submitted 28 March, 2018; originally announced March 2018.

Comments: added link to GitHub repository

arXiv:1802.02375 [pdf, other]

doi 10.1109/ACCESS.2019.2960566

ShakeDrop Regularization for Deep Residual Learning

Authors: Yoshihiro Yamada, Masakazu Iwamura, Takuya Akiba, Koichi Kise

Abstract: Overfitting is a crucial problem in deep neural networks, even in the latest network architectures. In this paper, to relieve the overfitting effect of ResNet and its improvements (i.e., Wide ResNet, PyramidNet, and ResNeXt), we propose a new regularization method called ShakeDrop regularization. ShakeDrop is inspired by Shake-Shake, which is an effective regularization method, but can be applied to ResNeXt only. ShakeDrop is more effective than Shake-Shake and can be applied not only to ResNeXt but also ResNet, Wide ResNet, and PyramidNet. An important key is to achieve stability of training. Because effective regularization often causes unstable training, we introduce a training stabilizer, which is an unusual use of an existing regularizer. Through experiments under various conditions, we demonstrate the conditions under which ShakeDrop works well.

Submitted 6 January, 2020; v1 submitted 7 February, 2018; originally announced February 2018.

Journal ref: IEEE Access, 7, 1, pp.186126-186136 (2019)

arXiv:1612.01230 [pdf, ps, other]

Deep Pyramidal Residual Networks with Separated Stochastic Depth

Authors: Yoshihiro Yamada, Masakazu Iwamura, Koichi Kise

Abstract: On general object recognition, Deep Convolutional Neural Networks (DCNNs) achieve high accuracy. In particular, ResNet and its improvements have broken the lowest error rate records. In this paper, we propose a method to successfully combine two ResNet improvements, ResDrop and PyramidNet. We confirmed that the proposed network outperformed the conventional methods; on CIFAR-100, the proposed network achieved an error rate of 16.18%, in contrast to PyramidNet at 18.29% and ResNeXt at 17.31%.

Submitted 4 December, 2016; originally announced December 2016.

arXiv:1609.03191 [pdf]

doi 10.1016/j.cognition.2016.09.001

When categorization-based stranger avoidance explains the uncanny valley: A comment on MacDorman & Chattopadhyay (2016)

Authors: Takahiro Kawabe, Kyoshiro Sasaki, Keiko Ihaya, Yuki Yamada

Abstract: Artificial objects often subjectively look eerie when their appearance to some extent resembles a human, which is known as the uncanny valley phenomenon. From a cognitive psychology perspective, several explanations of the phenomenon have been put forth, two of which are object categorization and realism inconsistency. Recently, MacDorman and Chattopadhyay (2016) reported experimental data as evidence in support of the latter. In our estimation, however, their results are still consistent with categorization-based stranger avoidance. In this Discussions paper, we try to describe why categorization-based stranger avoidance remains a viable explanation, despite the evidence of MacDorman and Chattopadhyay, and how it offers a more inclusive explanation of the impression of eeriness in the uncanny valley phenomenon.

Submitted 20 September, 2016; v1 submitted 11 September, 2016; originally announced September 2016.

Comments: published in Cognition

Journal ref: Cognition, 2016

arXiv:1602.04579 [pdf, other]

Secure Approximation Guarantee for Cryptographically Private Empirical Risk Minimization

Authors: Toshiyuki Takada, Hiroyuki Hanada, Yoshiji Yamada, Jun Sakuma, Ichiro Takeuchi

Abstract: Privacy concerns have become increasingly important in many machine learning (ML) problems. We study empirical risk minimization (ERM) problems under secure multi-party computation (MPC) frameworks. The main technical tools for MPC have been developed based on cryptography. One of the limitations of current cryptographically private ML is that it is computationally intractable to evaluate non-linear functions such as logarithmic functions or exponential functions. Therefore, for a class of ERM problems such as logistic regression in which non-linear function evaluations are required, one can only obtain approximate solutions. In this paper, we introduce a novel cryptographically private tool called the secure approximation guarantee (SAG) method. The key property of the SAG method is that, given an arbitrary approximate solution, it can provide a non-probabilistic assumption-free bound on the approximation quality under a cryptographically secure computation framework. We demonstrate the benefit of the SAG method by applying it to several problems including a practical privacy-preserving data analysis task on genomic and clinical information.

Submitted 15 February, 2016; originally announced February 2016.

arXiv:1511.05432 [pdf, other]

doi 10.1016/j.neucom.2018.04.027

Understanding Adversarial Training: Increasing Local Stability of Neural Nets through Robust Optimization

Authors: Uri Shaham, Yutaro Yamada, Sahand Negahban

Abstract: We propose a general framework for increasing local stability of Artificial Neural Nets (ANNs) using Robust Optimization (RO). We achieve this through an alternating minimization-maximization procedure, in which the loss of the network is minimized over perturbed examples that are generated at each parameter update. We show that adversarial training of ANNs is in fact robustification of the network optimization, and that our proposed framework generalizes previous approaches for increasing local stability of ANNs. Experimental results reveal that our approach increases the robustness of the network to existing adversarial examples, while making it harder to generate new ones. Furthermore, our algorithm also improves the accuracy of the network on the original test data.

Submitted 16 January, 2016; v1 submitted 17 November, 2015; originally announced November 2015.

Showing 1–29 of 29 results for author: Yamada, Y