Showing 1–14 of 14 results for author: Takano, R

  1. arXiv:2401.09284  [pdf, other]

    cs.NI

    A Fast Control Plane for a Large-Scale and High-Speed Optical Circuit Switch System

    Authors: Ryousei Takano, Kiyo Ishii, Toshiyuki Shimizu, Fumihiro Okazaki, Shu Namiki, Ken-ichi Sato

    Abstract: We experimentally verify a fast control plane with 100 microseconds of configuration time that can support more than 1000 racks, leveraged by a software-defined network controller and an industrial real-time Ethernet standard EtherCAT.

    Submitted 17 January, 2024; originally announced January 2024.

    Comments: 5 pages, 4 figures

  2. arXiv:2309.06565  [pdf, other]

    cs.AR cs.PF

    METICULOUS: An FPGA-based Main Memory Emulator for System Software Studies

    Authors: Takahiro Hirofuchi, Takaaki Fukai, Akram Ben Ahmed, Ryousei Takano, Kento Sato

    Abstract: Due to the scaling problem of the DRAM technology, non-volatile memory devices, which are based on a different principle of operation than DRAM, are now being intensively developed to expand the main memory of computers. Disaggregated memory is also drawing attention as an emerging technology to scale up the main memory. Although system software studies need to discuss management mechanisms for the…

    Submitted 7 September, 2023; originally announced September 2023.

  3. arXiv:2105.12301  [pdf, other]

    cs.DC cs.MS cs.PF

    kEDM: A Performance-portable Implementation of Empirical Dynamic Modeling using Kokkos

    Authors: Keichi Takahashi, Wassapon Watanakeesuntorn, Kohei Ichikawa, Joseph Park, Ryousei Takano, Jason Haga, George Sugihara, Gerald M. Pao

    Abstract: Empirical Dynamic Modeling (EDM) is a state-of-the-art non-linear time-series analysis framework. Despite its wide applicability, EDM was not scalable to large datasets due to its expensive computational cost. To overcome this obstacle, researchers have attempted and succeeded in accelerating EDM from both algorithmic and implementational aspects. In previous work, we developed a massively paralle…

    Submitted 25 May, 2021; originally announced May 2021.

    Comments: 8 pages, 9 figures, accepted at Practice & Experience in Advanced Research Computing (PEARC'21), corresponding authors: Keichi Takahashi, Gerald M. Pao

  4. An Oracle for Guiding Large-Scale Model/Hybrid Parallel Training of Convolutional Neural Networks

    Authors: Albert Njoroge Kahira, Truong Thao Nguyen, Leonardo Bautista Gomez, Ryousei Takano, Rosa M Badia, Mohamed Wahib

    Abstract: Deep Neural Network (DNN) frameworks use distributed training to enable faster time to convergence and alleviate memory capacity limitations when training large models and/or using high dimension inputs. With the steady increase in datasets and model sizes, model/hybrid parallelism is deemed to have an important role in the future of distributed training of DNNs. We analyze the compute, communicat…

    Submitted 19 April, 2021; originally announced April 2021.

    Comments: The International ACM Symposium on High-Performance Parallel and Distributed Computing 2021 (HPDC'21)

  5. arXiv:2011.11082  [pdf, other]

    cs.DC math.DS q-bio.QM

    Massively Parallel Causal Inference of Whole Brain Dynamics at Single Neuron Resolution

    Authors: Wassapon Watanakeesuntorn, Keichi Takahashi, Kohei Ichikawa, Joseph Park, George Sugihara, Ryousei Takano, Jason Haga, Gerald M. Pao

    Abstract: Empirical Dynamic Modeling (EDM) is a nonlinear time series causal inference framework. The latest implementation of EDM, cppEDM, has only been used for small datasets due to computational cost. With the growth of data collection capabilities, there is a great need to identify causal relationships in large datasets. We present mpEDM, a parallel distributed implementation of EDM optimized for moder…

    Submitted 22 November, 2020; originally announced November 2020.

    Comments: 10 pages, 10 figures, accepted at IEEE International Conference on Parallel and Distributed Systems (ICPADS) 2020, corresponding authors: Keichi Takahashi, Gerald M Pao

    ACM Class: K.6.3; G.4; J.3

  6. arXiv:2010.13594  [pdf, other]

    cs.OS

    Disaggregated Accelerator Management System for Cloud Data Centers

    Authors: Ryousei Takano, Kuniyasu Suzaki

    Abstract: A conventional data center that consists of monolithic servers is confronted with limitations including lack of operational flexibility, low resource utilization, low maintainability, etc. Resource disaggregation is a promising solution to address the above issues. We propose a concept of disaggregated cloud data center architecture called Flow-in-Cloud (FiC) that enables an existing cluster compu…

    Submitted 26 October, 2020; originally announced October 2020.

    Comments: To appear in IEICE Transactions on Information and Systems, 2020

  7. arXiv:2008.11421  [pdf, other]

    cs.DC cs.LG

    Scaling Distributed Deep Learning Workloads beyond the Memory Capacity with KARMA

    Authors: Mohamed Wahib, Haoyu Zhang, Truong Thao Nguyen, Aleksandr Drozd, Jens Domke, Lingqi Zhang, Ryousei Takano, Satoshi Matsuoka

    Abstract: The dedicated memory of hardware accelerators can be insufficient to store all weights and/or intermediate states of large deep learning models. Although model parallelism is a viable approach to reduce the memory pressure issue, significant modification of the source code and considerations for algorithms are required. An alternative solution is to use out-of-core methods instead of, or in additi…

    Submitted 26 August, 2020; originally announced August 2020.

    Comments: ACM/IEEE Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'20)

  8. A Prompt Report on the Performance of Intel Optane DC Persistent Memory Module

    Authors: Takahiro Hirofuchi, Ryousei Takano

    Abstract: In this prompt report, we present the basic performance evaluation of Intel Optane Data Center Persistent Memory Module (Optane DCPMM), which is the first commercially available, byte-addressable non-volatile memory module, released in April 2019. Since at the moment of writing only a few reports on its performance were published, this letter is intended to complement other performance studies. Th…

    Submitted 13 February, 2020; originally announced February 2020.

    Comments: To appear in IEICE Transactions on Information and Systems, 2020. arXiv admin note: substantial text overlap with arXiv:1907.12014

  9. iFDK: A Scalable Framework for Instant High-resolution Image Reconstruction

    Authors: Peng Chen, Mohamed Wahib, Shinichiro Takizawa, Ryousei Takano, Satoshi Matsuoka

    Abstract: Computed Tomography (CT) is a widely used technology that requires compute-intense algorithms for image reconstruction. We propose a novel back-projection algorithm that reduces the projection computation cost to 1/6 of the standard algorithm. We also propose an efficient implementation that takes advantage of the heterogeneity of GPU-accelerated systems by overlapping the filtering and back-proje…

    Submitted 6 September, 2019; originally announced September 2019.

    Comments: ACM/IEEE Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'19)

  10. A Software-based NVM Emulator Supporting Read/Write Asymmetric Latencies

    Authors: Atsushi Koshiba, Takahiro Hirofuchi, Ryousei Takano, Mitaro Namiki

    Abstract: Non-volatile memory (NVM) is a promising technology for low-energy and high-capacity main memory of computers. The characteristics of NVM devices, however, tend to be fundamentally different from those of DRAM (i.e., the memory device currently used for main memory), because of differences in principles of memory cells. Typically, the write latency of an NVM device such as PCM and ReRAM is much hi…

    Submitted 2 August, 2019; originally announced August 2019.

    Comments: To appear in IEICE Transactions on Information and Systems, December, 2019

  11. arXiv:1907.12014  [pdf, other]

    cs.OS cs.AR cs.PF

    The Preliminary Evaluation of a Hypervisor-based Virtualization Mechanism for Intel Optane DC Persistent Memory Module

    Authors: Takahiro Hirofuchi, Ryousei Takano

    Abstract: Non-volatile memory (NVM) technologies, being accessible in the same manner as DRAM, are considered indispensable for expanding main memory capacities. Intel Optane DCPMM is a long-awaited product that drastically increases main memory capacities. However, a substantial performance gap exists between DRAM and DCPMM. In our experiments, the read/write latencies of DCPMM were 400% and 407% higher th…

    Submitted 28 July, 2019; originally announced July 2019.

    ACM Class: D.4; B.3

  12. A Versatile Software Systolic Execution Model for GPU Memory-Bound Kernels

    Authors: Peng Chen, Mohamed Wahib, Shinichiro Takizawa, Ryousei Takano, Satoshi Matsuoka

    Abstract: This paper proposes a versatile high-performance execution model, inspired by systolic arrays, for memory-bound regular kernels running on CUDA-enabled GPUs. We formulate a systolic model that shifts partial sums by CUDA warp primitives for the computation. We also employ register files as a cache resource in order to operate the entire model efficiently. We demonstrate the effectiveness and versa…

    Submitted 6 September, 2019; v1 submitted 13 July, 2019; originally announced July 2019.

    Comments: ACM/IEEE Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'19)

  13. arXiv:1902.01514  [pdf]

    cs.LG stat.ML

    Perturbative GAN: GAN with Perturbation Layers

    Authors: Yuma Kishi, Tsutomu Ikegami, Shin-ichi O'uchi, Ryousei Takano, Wakana Nogami, Tomohiro Kudoh

    Abstract: Perturbative GAN, which replaces convolution layers of existing convolutional GANs (DCGAN, WGAN-GP, BIGGAN, etc.) with perturbation layers that add a fixed noise mask, is proposed. Compared with the convolutional GANs, the number of parameters to be trained is smaller, the convergence of training is faster, the inception score of generated images is higher, and the overall training cost is redu…

    Submitted 4 February, 2019; originally announced February 2019.

  14. arXiv:1509.06991  [pdf, other]

    cs.NI

    Feasibility Evaluation of 6LoWPAN over Bluetooth Low Energy

    Authors: Varat Chawathaworncharoen, Vasaka Visoottiviseth, Ryousei Takano

    Abstract: IPv6 over Low power Wireless Personal Area Network (6LoWPAN) is an emerging technology to enable ubiquitous IoT services. However, there are very few studies of the performance evaluation on real hardware environments. This paper demonstrates the feasibility of 6LoWPAN through conducting a preliminary performance evaluation of a commodity hardware environment, including Bluetooth Low Energy (BLE)…

    Submitted 23 September, 2015; originally announced September 2015.

    Comments: 4 pages, PRAGMA Workshop on International Clouds for Data Science (PRAGMA-ICDS 2015)