subscribe to arXiv mailings

The Sound Demixing Challenge 2023 $\unicode{x2013}$ Cinematic Demixing Track

Authors: Stefan Uhlich, Giorgio Fabbro, Masato Hirano, Shusuke Takahashi, Gordon Wichern, Jonathan Le Roux, Dipam Chakraborty, Sharada Mohanty, Kai Li, Yi Luo, Jianwei Yu, Rongzhi Gu, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Mikhail Sukhovei, Yuki Mitsufuji

Abstract: This paper summarizes the cinematic demixing (CDX) track of the Sound Demixing Challenge 2023 (SDX'23). We provide a comprehensive summary of the challenge setup, detailing the structure of the competition and the datasets used. Especially, we detail CDXDB23, a new hidden dataset constructed from real movies that was used to rank the submissions. The paper also offers insights into the most succes… ▽ More This paper summarizes the cinematic demixing (CDX) track of the Sound Demixing Challenge 2023 (SDX'23). We provide a comprehensive summary of the challenge setup, detailing the structure of the competition and the datasets used. Especially, we detail CDXDB23, a new hidden dataset constructed from real movies that was used to rank the submissions. The paper also offers insights into the most successful approaches employed by participants. Compared to the cocktail-fork baseline, the best-performing system trained exclusively on the simulated Divide and Remaster (DnR) dataset achieved an improvement of 1.8 dB in SDR, whereas the top-performing system on the open leaderboard, where any data could be used for training, saw a significant improvement of 5.7 dB. A significant source of this improvement was making the simulated data better match real cinematic audio, which we further investigate in detail. △ Less

Submitted 18 April, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

Comments: Accepted for Transactions of the International Society for Music Information Retrieval

arXiv:2308.06979 [pdf, other]

doi 10.5334/tismir.171

The Sound Demixing Challenge 2023 $\unicode{x2013}$ Music Demixing Track

Authors: Giorgio Fabbro, Stefan Uhlich, Chieh-Hsin Lai, Woosung Choi, Marco Martínez-Ramírez, Weihsiang Liao, Igor Gadelha, Geraldo Ramos, Eddie Hsu, Hugo Rodrigues, Fabian-Robert Stöter, Alexandre Défossez, Yi Luo, Jianwei Yu, Dipam Chakraborty, Sharada Mohanty, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Nabarun Goswami, Tatsuya Harada, Minseok Kim, Jun Hyung Lee, Yuanliang Dong, Xinran Zhang , et al. (2 additional authors not shown)

Abstract: This paper summarizes the music demixing (MDX) track of the Sound Demixing Challenge (SDX'23). We provide a summary of the challenge setup and introduce the task of robust music source separation (MSS), i.e., training MSS models in the presence of errors in the training data. We propose a formalization of the errors that can occur in the design of a training dataset for MSS systems and introduce t… ▽ More This paper summarizes the music demixing (MDX) track of the Sound Demixing Challenge (SDX'23). We provide a summary of the challenge setup and introduce the task of robust music source separation (MSS), i.e., training MSS models in the presence of errors in the training data. We propose a formalization of the errors that can occur in the design of a training dataset for MSS systems and introduce two new datasets that simulate such errors: SDXDB23_LabelNoise and SDXDB23_Bleeding. We describe the methods that achieved the highest scores in the competition. Moreover, we present a direct comparison with the previous edition of the challenge (the Music Demixing Challenge 2021): the best performing system achieved an improvement of over 1.6dB in signal-to-distortion ratio over the winner of the previous competition, when evaluated on MDXDB21. Besides relying on the signal-to-distortion ratio as objective metric, we also performed a listening test with renowned producers and musicians to study the perceptual quality of the systems and report here the results. Finally, we provide our insights into the organization of the competition and our prospects for future editions. △ Less

Submitted 19 April, 2024; v1 submitted 14 August, 2023; originally announced August 2023.

Comments: Published in Transactions of the International Society for Music Information Retrieval (https://transactions.ismir.net/articles/10.5334/tismir.171)

Journal ref: Transactions of the International Society for Music Information Retrieval, 7(1), pp.63-84, 2024

arXiv:2305.07489 [pdf, ps, other]

Benchmarks and leaderboards for sound demixing tasks

Authors: Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva

Abstract: Music demixing is the task of separating different tracks from the given single audio signal into components, such as drums, bass, and vocals from the rest of the accompaniment. Separation of sources is useful for a range of areas, including entertainment and hearing aids. In this paper, we introduce two new benchmarks for the sound source separation tasks and compare popular models for sound demi… ▽ More Music demixing is the task of separating different tracks from the given single audio signal into components, such as drums, bass, and vocals from the rest of the accompaniment. Separation of sources is useful for a range of areas, including entertainment and hearing aids. In this paper, we introduce two new benchmarks for the sound source separation tasks and compare popular models for sound demixing, as well as their ensembles, on these benchmarks. For the models' assessments, we provide the leaderboard at https://mvsep.com/quality_checker/, giving a comparison for a range of models. The new benchmark datasets are available for download. We also develop a novel approach for audio separation, based on the ensembling of different models that are suited best for the particular stem. The proposed solution was evaluated in the context of the Music Demixing Challenge 2023 and achieved top results in different tracks of the challenge. The code and the approach are open-sourced on GitHub. △ Less

Submitted 7 May, 2024; v1 submitted 12 May, 2023; originally announced May 2023.

arXiv:2104.01687 [pdf, other]

doi 10.1016/j.compbiomed.2021.105089

3D Convolutional Neural Networks for Stalled Brain Capillary Detection

Authors: Roman Solovyev, Alexandr A. Kalinin, Tatiana Gabruseva

Abstract: Adequate blood supply is critical for normal brain function. Brain vasculature dysfunctions such as stalled blood flow in cerebral capillaries are associated with cognitive decline and pathogenesis in Alzheimer's disease. Recent advances in imaging technology enabled generation of high-quality 3D images that can be used to visualize stalled blood vessels. However, localization of stalled vessels i… ▽ More Adequate blood supply is critical for normal brain function. Brain vasculature dysfunctions such as stalled blood flow in cerebral capillaries are associated with cognitive decline and pathogenesis in Alzheimer's disease. Recent advances in imaging technology enabled generation of high-quality 3D images that can be used to visualize stalled blood vessels. However, localization of stalled vessels in 3D images is often required as the first step for downstream analysis, which can be tedious, time-consuming and error-prone, when done manually. Here, we describe a deep learning-based approach for automatic detection of stalled capillaries in brain images based on 3D convolutional neural networks. Our networks employed custom 3D data augmentations and were used weight transfer from pre-trained 2D models for initialization. We used an ensemble of several 3D models to produce the winning submission to the Clog Loss: Advance Alzheimer's Research with Stall Catchers machine learning competition that challenged the participants with classifying blood vessels in 3D image stacks as stalled or flowing. In this setting, our approach outperformed other methods and demonstrated state-of-the-art results, achieving 0.85 Matthews correlation coefficient, 85% sensitivity, and 99.3% specificity. The source code for our solution is made publicly available. △ Less

Submitted 14 February, 2022; v1 submitted 4 April, 2021; originally announced April 2021.

Journal ref: Computers in biology and medicine. 2022

arXiv:2004.11482 [pdf, other]

Roof material classification from aerial imagery

Authors: Roman Solovyev

Abstract: This paper describes an algorithm for classification of roof materials using aerial photographs. Main advantages of the algorithm are proposed methods to improve prediction accuracy. Proposed methods includes: method of converting ImageNet weights of neural networks for using multi-channel images; special set of features of second level models that are used in addition to specific predictions of n… ▽ More This paper describes an algorithm for classification of roof materials using aerial photographs. Main advantages of the algorithm are proposed methods to improve prediction accuracy. Proposed methods includes: method of converting ImageNet weights of neural networks for using multi-channel images; special set of features of second level models that are used in addition to specific predictions of neural networks; special set of image augmentations that improve training accuracy. In addition, complete flow for solving this problem is proposed. The following content is available in open access: solution code, weight sets and architecture of the used neural networks. The proposed solution achieved second place in the competition "Open AI Caribbean Challenge". △ Less

Submitted 23 April, 2020; originally announced April 2020.

arXiv:1910.13302 [pdf, other]

doi 10.1016/j.imavis.2021.104117

Weighted boxes fusion: Ensembling boxes from different object detection models

Authors: Roman Solovyev, Weimin Wang, Tatiana Gabruseva

Abstract: In this work, we present a novel method for combining predictions of object detection models: weighted boxes fusion. Our algorithm utilizes confidence scores of all proposed bounding boxes to constructs the averaged boxes. We tested method on several datasets and evaluated it in the context of the Open Images and COCO Object Detection tracks, achieving top results in these challenges. The source c… ▽ More In this work, we present a novel method for combining predictions of object detection models: weighted boxes fusion. Our algorithm utilizes confidence scores of all proposed bounding boxes to constructs the averaged boxes. We tested method on several datasets and evaluated it in the context of the Open Images and COCO Object Detection tracks, achieving top results in these challenges. The source code is publicly available at https://github.com/ZFTurbo/Weighted-Boxes-Fusion △ Less

Submitted 6 February, 2021; v1 submitted 29 October, 2019; originally announced October 2019.

Journal ref: Image and Vision Computing (2021): 104117

arXiv:1908.02924 [pdf, other]

Bayesian Feature Pyramid Networks for Automatic Multi-Label Segmentation of Chest X-rays and Assessment of Cardio-Thoratic Ratio

Authors: Roman Solovyev, Iaroslav Melekhov, Timo Lesonen, Elias Vaattovaara, Osmo Tervonen, Aleksei Tiulpin

Abstract: Cardiothoratic ratio (CTR) estimated from chest radiographs is a marker indicative of cardiomegaly, the presence of which is in the criteria for heart failure diagnosis. Existing methods for automatic assessment of CTR are driven by Deep Learning-based segmentation. However, these techniques produce only point estimates of CTR but clinical decision making typically assumes the uncertainty. In this… ▽ More Cardiothoratic ratio (CTR) estimated from chest radiographs is a marker indicative of cardiomegaly, the presence of which is in the criteria for heart failure diagnosis. Existing methods for automatic assessment of CTR are driven by Deep Learning-based segmentation. However, these techniques produce only point estimates of CTR but clinical decision making typically assumes the uncertainty. In this paper, we propose a novel method for chest X-ray segmentation and CTR assessment in an automatic manner. In contrast to the previous art, we, for the first time, propose to estimate CTR with uncertainty bounds. Our method is based on Deep Convolutional Neural Network with Feature Pyramid Network (FPN) decoder. We propose two modifications of FPN: replace the batch normalization with instance normalization and inject the dropout which allows to obtain the Monte-Carlo estimates of the segmentation maps at test time. Finally, using the predicted segmentation mask samples, we estimate CTR with uncertainty. In our experiments we demonstrate that the proposed method generalizes well to three different test sets. Finally, we make the annotations produced by two radiologists for all our datasets publicly available. △ Less

Submitted 8 August, 2019; originally announced August 2019.

Comments: Roman Solovyev and Iaroslav Melekhov contributed equally. Timo Lesonen and Elias Vaattovaara contributed equally

arXiv:1810.02364 [pdf, other]

Deep Learning Approaches for Understanding Simple Speech Commands

Authors: Roman A. Solovyev, Maxim Vakhrushev, Alexander Radionov, Vladimir Aliev, Alexey A. Shvets

Abstract: Automatic classification of sound commands is becoming increasingly important, especially for mobile and embedded devices. Many of these devices contain both cameras and microphones, and companies that develop them would like to use the same technology for both of these classification tasks. One way of achieving this is to represent sound commands as images, and use convolutional neural networks w… ▽ More Automatic classification of sound commands is becoming increasingly important, especially for mobile and embedded devices. Many of these devices contain both cameras and microphones, and companies that develop them would like to use the same technology for both of these classification tasks. One way of achieving this is to represent sound commands as images, and use convolutional neural networks when classifying images as well as sounds. In this paper we consider several approaches to the problem of sound classification that we applied in TensorFlow Speech Recognition Challenge organized by Google Brain team on the Kaggle platform. Here we show different representation of sounds (Wave frames, Spectrograms, Mel-Spectrograms, MFCCs) and apply several 1D and 2D convolutional neural networks in order to get the best performance. Our experiments show that we found appropriate sound representation and corresponding convolutional neural networks. As a result we achieved good classification accuracy that allowed us to finish the challenge on 8-th place among 1315 teams. △ Less

Submitted 4 October, 2018; originally announced October 2018.

Comments: 12 page, 4 figures, 1 table

arXiv:1808.09945 [pdf]

doi 10.1109/EIConRus.2019.8656778

Fixed-Point Convolutional Neural Network for Real-Time Video Processing in FPGA

Authors: Roman Solovyev, Alexander Kustov, Dmitry Telpukhov, Vladimir Rukhlov, Alexandr Kalinin

Abstract: Modern mobile neural networks with a reduced number of weights and parameters do a good job with image classification tasks, but even they may be too complex to be implemented in an FPGA for video processing tasks. The article proposes neural network architecture for the practical task of recognizing images from a camera, which has several advantages in terms of speed. This is achieved by reducing… ▽ More Modern mobile neural networks with a reduced number of weights and parameters do a good job with image classification tasks, but even they may be too complex to be implemented in an FPGA for video processing tasks. The article proposes neural network architecture for the practical task of recognizing images from a camera, which has several advantages in terms of speed. This is achieved by reducing the number of weights, moving from a floating-point to a fixed-point arithmetic, and due to a number of hardware-level optimizations associated with storing weights in blocks, a shift register, and an adjustable number of convolutional blocks that work in parallel. The article also proposed methods for adapting the existing data set for solving a different task. As the experiments showed, the proposed neural network copes well with real-time video processing even on the cheap FPGAs. △ Less

Submitted 3 December, 2020; v1 submitted 29 August, 2018; originally announced August 2018.

Comments: 2019 IEEE Conference of Russian Young Researchers in Electrical and Electronic Engineering (EIConRus)

Showing 1–9 of 9 results for author: Solovyev, R