SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond

Authors: Marco Comunità, Zhi Zhong, Akira Takahashi, Shiqi Yang, Mengjie Zhao, Koichi Saito, Yukara Ikemiya, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji

Abstract: Recent advances in generative models that iteratively synthesize audio clips sparked great success to text-to-audio synthesis (TTA), but with the cost of slow synthesis speed and heavy computation. Although there have been attempts to accelerate the iterative procedure, high-quality TTA systems remain inefficient due to hundreds of iterations required in the inference phase and large amount of model parameters. To address the challenges, we propose SpecMaskGIT, a light-weighted, efficient yet effective TTA model based on the masked generative modeling of spectrograms. First, SpecMaskGIT synthesizes a realistic 10s audio clip by less than 16 iterations, an order-of-magnitude less than previous iterative TTA methods. As a discrete model, SpecMaskGIT outperforms larger VQ-Diffusion and auto-regressive models in the TTA benchmark, while being real-time with only 4 CPU cores or even 30x faster with a GPU. Next, built upon a latent space of Mel-spectrogram, SpecMaskGIT has a wider range of applications (e.g., the zero-shot bandwidth extension) than similar methods built on the latent wave domain. Moreover, we interpret SpecMaskGIT as a generative extension to previous discriminative audio masked Transformers, and shed light on its audio representation learning potential. We hope our work inspires the exploration of masked audio modeling toward further diverse scenarios.

Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

Comments: 6 pages, 8 figures, 8 tables. Audio samples: https://zzaudio.github.io/SpecMaskGIT/index.html

arXiv:2203.04249 [pdf]

Evaluating feasibility of batteries for second-life applications using machine learning

Authors: Aki Takahashi, Anirudh Allam, Simona Onori

Abstract: This paper presents a combination of machine learning techniques to enable prompt evaluation of retired electric vehicle batteries as to either retain those batteries for a second-life application and extend their operation beyond the original and first intent or send them to recycle facilities. The proposed algorithm generates features from available battery current and voltage measurements with simple statistics, selects and ranks the features using correlation analysis, and employs Gaussian Process Regression enhanced with bagging. This approach is validated over publicly available aging datasets of more than 200 cells with slow and fast charging, with different cathode chemistries, and for diverse operating conditions. Promising results are observed based on multiple training-test partitions, wherein the mean of Root Mean Squared Percent Error and Mean Percent Error performance errors are found to be less than 1.48% and 1.29%, respectively, in the worst-case scenarios.

Submitted 7 April, 2023; v1 submitted 8 March, 2022; originally announced March 2022.

Comments: 23 pages

arXiv:2104.10953 [pdf, other]

doi 10.1016/j.jss.2021.110986

An Extensive Study on Smell-Aware Bug Localization

Authors: Aoi Takahashi, Natthawute Sae-Lim, Shinpei Hayashi, Motoshi Saeki

Abstract: Bug localization is an important aspect of software maintenance because it can locate modules that should be changed to fix a specific bug. Our previous study showed that the accuracy of the information retrieval (IR)-based bug localization technique improved when used in combination with code smell information. Although this technique showed promise, the study showed limited usefulness because of the small number of: 1) projects in the dataset, 2) types of smell information, and 3) baseline bug localization techniques used for assessment. This paper presents an extension of our previous experiments on Bench4BL, the largest bug localization benchmark dataset available for bug localization. In addition, we generalized the smell-aware bug localization technique to allow different configurations of smell information, which were combined with various bug localization techniques. Our results confirmed that our technique can improve the performance of IR-based bug localization techniques for the class level even when large datasets are processed. Furthermore, because of the optimized configuration of the smell information, our technique can enhance the performance of most state-of-the-art bug localization techniques.

Submitted 22 April, 2021; originally announced April 2021.

Comments: 19 pages, JSS

Journal ref: Journal of Systems and Software, 178(110986):1-17, 2021

arXiv:2103.08346 [pdf, ps, other]

Maximum Number of Steps of Topswops on 18 and 19 Cards

Authors: Kento Kimura, Atsuki Takahashi, Tetsuya Araki, Kazuyuki Amano

Abstract: Let $f(n)$ be the maximum number of steps of Topswops on $n$ cards. In this note, we report our computational experiments to determine the values of $f(18)$ and $f(19)$. By applying an algorithm developed by Knuth in a parallel fashion, we conclude that $f(18)=191$ and $f(19)=221$.

Submitted 15 March, 2021; originally announced March 2021.

Comments: 4 pages

arXiv:2008.11917 [pdf, ps, other]

Fingerprint Feature Extraction by Combining Texture, Minutiae, and Frequency Spectrum Using Multi-Task CNN

Authors: Ai Takahashi, Yoshinori Koda, Koichi Ito, Takafumi Aoki

Abstract: Although most fingerprint matching methods utilize minutia points and/or texture of fingerprint images as fingerprint features, the frequency spectrum is also a useful feature since a fingerprint is composed of ridge patterns with its inherent frequency band. We propose a novel CNN-based method for extracting fingerprint features from texture, minutiae, and frequency spectrum. In order to extract effective texture features from local regions around the minutiae, the minutia attention module is introduced to the proposed method. We also propose new data augmentation methods, which takes into account the characteristics of fingerprint images to increase the number of images during training since we use only a public dataset in training, which includes a few fingerprint classes. Through a set of experiments using FVC2004 DB1 and DB2, we demonstrated that the proposed method exhibits the efficient performance on fingerprint verification compared with a commercial fingerprint matching software and the conventional method.

Submitted 27 August, 2020; originally announced August 2020.

Comments: IJCB2020

arXiv:2006.08965 [pdf, other]

Efficient Path Algorithms for Clustered Lasso and OSCAR

Authors: Atsumori Takahashi, Shunichi Nomura

Abstract: In high dimensional regression, feature clustering by their effects on outcomes is often as important as feature selection. For that purpose, clustered Lasso and octagonal shrinkage and clustering algorithm for regression (OSCAR) are used to make feature groups automatically by pairwise $L_1$ norm and pairwise $L_\infty$ norm, respectively. This paper proposes efficient path algorithms for clustered Lasso and OSCAR to construct solution paths with respect to their regularization parameters. Despite too many terms in exhaustive pairwise regularization, their computational costs are reduced by using symmetry of those terms. Simple equivalent conditions to check subgradient equations in each feature group are derived by some graph theories. The proposed algorithms are shown to be more efficient than existing algorithms in numerical experiments.

Submitted 16 June, 2020; originally announced June 2020.

Showing 1–6 of 6 results for author: Takahashi, A