Showing 1–10 of 10 results for author: Yoshimura, T

  1. arXiv:2407.05467 [pdf, other]

    cs.DC cs.AI

    The infrastructure powering IBM's Gen AI model development

    Authors: Talia Gershon, Seetharami Seelam, Brian Belgodere, Milton Bonilla, Lan Hoang, Danny Barnett, I-Hsin Chung, Apoorve Mohan, Ming-Hung Chen, Lixiang Luo, Robert Walkup, Constantinos Evangelinos, Shweta Salaria, Marc Dombrowa, Yoonho Park, Apo Kayi, Liran Schour, Alim Alim, Ali Sydney, Pavlos Maniotis, Laurent Schares, Bernard Metzler, Bengi Karacali-Akyamac, Sophia Wen, Tatsuhiro Chiba, et al. (121 additional authors not shown)

    Abstract: AI Infrastructure plays a key role in the speed and cost-competitiveness of developing and deploying advanced AI models. The current demand for powerful AI infrastructure for model training is driven by the emergence of generative AI and foundational models, where on occasion thousands of GPUs must cooperate on a single training job for the model to be trained in a reasonable time. Delivering effi…

    Submitted 7 July, 2024; originally announced July 2024.

    Comments: Corresponding Authors: Talia Gershon, Seetharami Seelam, Brian Belgodere, Milton Bonilla

  2. arXiv:2309.01399 [pdf, other]

    cs.DC

    Objcache: An Elastic Filesystem over External Persistent Storage for Container Clusters

    Authors: Takeshi Yoshimura, Tatsuhiro Chiba, Sunyanan Choochotkaew, Seetharami Seelam, Hui-fang Wen, Jonas Pfefferle

    Abstract: Container virtualization enables emerging AI workloads such as model serving, highly parallelized training, machine learning pipelines, and so on, to be easily scaled on demand on the elastic cloud infrastructure. Particularly, AI workloads require persistent storage to store data such as training inputs, models, and checkpoints. An external storage system like cloud object storage is a common cho…

    Submitted 4 September, 2023; originally announced September 2023.

    Comments: 13 pages

  3. arXiv:2305.01828 [pdf, other]

    cs.IT

    ns-3 Implementation of Sub-Terahertz and Millimeter Wave Drop-based NYU Channel Model (NYUSIM)

    Authors: Hitesh Poddar, Tomoki Yoshimura, Matteo Pagin, Theodore S Rappaport, Art Ishii, Michele Zorzi

    Abstract: The next generation of wireless networks will use sub-THz frequencies alongside mmWave frequencies to enable multi-Gbps and low-latency applications. To enable different verticals and use cases, engineers must take a holistic approach to build, analyze, and study different parts of the network and the interplay among the lower and higher layers of the protocol stack. It is of paramount importance…

    Submitted 2 May, 2023; originally announced May 2023.

  4. arXiv:2302.12385 [pdf, ps, other]

    cs.IT

    Full-Stack End-To-End mmWave Simulations Using 3GPP and NYUSIM Channel Model in ns-3

    Authors: H. Poddar, T. Yoshimura, M. Pagin, T. S. Rappaport, A. Ishii, M. Zorzi

    Abstract: Accurate channel modeling and simulation tools are vital for studying sub-THz and millimeter-wave (mmWave) wideband communication system performance. To accurately design future high data rate, low latency wireless modems, the entire protocol stack must be appropriately modeled to understand how the physical layer impacts the end-to-end performance experienced by the end user. This paper presents a ful…

    Submitted 5 March, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

    Comments: ICC 2023 - 2023 IEEE International Conference on Communications

  5. arXiv:2211.11222 [pdf, other]

    eess.AS cs.CL cs.SD

    Embedding a Differentiable Mel-cepstral Synthesis Filter to a Neural Speech Synthesis System

    Authors: Takenori Yoshimura, Shinji Takaki, Kazuhiro Nakamura, Keiichiro Oura, Yukiya Hono, Kei Hashimoto, Yoshihiko Nankaku, Keiichi Tokuda

    Abstract: This paper integrates a classic mel-cepstral synthesis filter into a modern neural speech synthesis system towards end-to-end controllable speech synthesis. Since the mel-cepstral synthesis filter is explicitly embedded in neural waveform models in the proposed system, both voice characteristics and the pitch of synthesized speech are highly controlled via a frequency warping parameter and fundame…

    Submitted 21 November, 2022; originally announced November 2022.

    Comments: Submitted to ICASSP 2023

  6. arXiv:2110.07840 [pdf, other]

    cs.CL cs.SD eess.AS

    ESPnet2-TTS: Extending the Edge of TTS Research

    Authors: Tomoki Hayashi, Ryuichi Yamamoto, Takenori Yoshimura, Peter Wu, Jiatong Shi, Takaaki Saeki, Yooncheol Ju, Yusuke Yasuda, Shinnosuke Takamichi, Shinji Watanabe

    Abstract: This paper describes ESPnet2-TTS, an end-to-end text-to-speech (E2E-TTS) toolkit. ESPnet2-TTS extends our earlier version, ESPnet-TTS, by adding many new features, including: on-the-fly flexible pre-processing, joint training with neural vocoders, and state-of-the-art TTS models with extensions like full-band E2E text-to-waveform modeling, which simplify the training pipeline and further enhance T…

    Submitted 14 October, 2021; originally announced October 2021.

    Comments: Submitted to ICASSP 2022. Demo HP: https://espnet.github.io/icassp2022-tts/

  7. arXiv:2002.00551 [pdf, other]

    eess.AS cs.CL cs.SD

    End-to-End Automatic Speech Recognition Integrated With CTC-Based Voice Activity Detection

    Authors: Takenori Yoshimura, Tomoki Hayashi, Kazuya Takeda, Shinji Watanabe

    Abstract: This paper integrates a voice activity detection (VAD) function with end-to-end automatic speech recognition toward an online speech interface and transcribing very long audio recordings. We focus on connectionist temporal classification (CTC) and its extension of CTC/attention architectures. As opposed to an attention-based architecture, input-synchronous label prediction can be performed based o…

    Submitted 14 February, 2020; v1 submitted 2 February, 2020; originally announced February 2020.

    Comments: Submitted to ICASSP 2020

  8. arXiv:1910.10909 [pdf, ps, other]

    cs.CL eess.AS

    ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit

    Authors: Tomoki Hayashi, Ryuichi Yamamoto, Katsuki Inoue, Takenori Yoshimura, Shinji Watanabe, Tomoki Toda, Kazuya Takeda, Yu Zhang, Xu Tan

    Abstract: This paper introduces a new end-to-end text-to-speech (E2E-TTS) toolkit named ESPnet-TTS, which is an extension of the open-source speech processing toolkit ESPnet. The toolkit supports state-of-the-art E2E-TTS models, including Tacotron 2, Transformer TTS, and FastSpeech, and also provides recipes inspired by the Kaldi automatic speech recognition (ASR) toolkit. The recipes are based on the desig…

    Submitted 16 February, 2020; v1 submitted 24 October, 2019; originally announced October 2019.

    Comments: Accepted to ICASSP 2020. Demo HP: https://espnet.github.io/icassp2020-tts/

  9. A Comparative Study on Transformer vs RNN in Speech Applications

    Authors: Shigeki Karita, Nanxin Chen, Tomoki Hayashi, Takaaki Hori, Hirofumi Inaguma, Ziyan Jiang, Masao Someki, Nelson Enrique Yalta Soplin, Ryuichi Yamamoto, Xiaofei Wang, Shinji Watanabe, Takenori Yoshimura, Wangyou Zhang

    Abstract: Sequence-to-sequence models have been widely used in end-to-end speech processing, for example, automatic speech recognition (ASR), speech translation (ST), and text-to-speech (TTS). This paper focuses on an emergent sequence-to-sequence model called Transformer, which achieves state-of-the-art performance in neural machine translation and other natural language processing applications. We underto…

    Submitted 28 September, 2019; v1 submitted 13 September, 2019; originally announced September 2019.

    Comments: Accepted at ASRU 2019

    Journal ref: IEEE Automatic Speech Recognition and Understanding Workshop 2019

  10. arXiv:1703.01457 [pdf]

    cs.AR cs.LG

    Chain-NN: An Energy-Efficient 1D Chain Architecture for Accelerating Deep Convolutional Neural Networks

    Authors: Shihao Wang, Dajiang Zhou, Xushen Han, Takeshi Yoshimura

    Abstract: Deep convolutional neural networks (CNNs) have shown good performance in many computer vision tasks. However, the high computational complexity of CNNs involves a huge amount of data movement between the computational processor core and the memory hierarchy, which accounts for the majority of the power consumption. This paper presents Chain-NN, a novel energy-efficient 1D chain architecture for accelera…

    Submitted 4 March, 2017; originally announced March 2017.

    Comments: DATE 2017