Showing 1–7 of 7 results for author: Comunità, M

  1. arXiv:2406.17672  [pdf, other]

    cs.SD eess.AS

    SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond

    Authors: Marco Comunità, Zhi Zhong, Akira Takahashi, Shiqi Yang, Mengjie Zhao, Koichi Saito, Yukara Ikemiya, Takashi Shibuya, Shusuke Takahashi, Yuki Mitsufuji

    Abstract: Recent advances in generative models that iteratively synthesize audio clips sparked great success to text-to-audio synthesis (TTA), but with the cost of slow synthesis speed and heavy computation. Although there have been attempts to accelerate the iterative procedure, high-quality TTA systems remain inefficient due to hundreds of iterations required in the inference phase and large amount of mod…

    Submitted 26 June, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

    Comments: 6 pages, 8 figures, 8 tables. Audio samples: https://zzaudio.github.io/SpecMaskGIT/index.html

  2. arXiv:2310.15247  [pdf, other]

    cs.SD cs.CV cs.LG cs.MM eess.AS

    SyncFusion: Multimodal Onset-synchronized Video-to-Audio Foley Synthesis

    Authors: Marco Comunità, Riccardo F. Gramaccioni, Emilian Postolache, Emanuele Rodolà, Danilo Comminiello, Joshua D. Reiss

    Abstract: Sound design involves creatively selecting, recording, and editing sound effects for various media like cinema, video games, and virtual/augmented reality. One of the most time-consuming steps when designing sound is synchronizing audio with video. In some cases, environmental recordings from video shoots are available, which can aid in the process. However, in video games and animations, no refer…

    Submitted 23 October, 2023; originally announced October 2023.

  3. arXiv:2305.13262  [pdf, other]

    cs.SD cs.LG eess.AS

    Modulation Extraction for LFO-driven Audio Effects

    Authors: Christopher Mitcheltree, Christian J. Steinmetz, Marco Comunità, Joshua D. Reiss

    Abstract: Low frequency oscillator (LFO) driven audio effects such as phaser, flanger, and chorus, modify an input signal using time-varying filters and delays, resulting in characteristic sweeping or widening effects. It has been shown that these effects can be modeled using neural networks when conditioned with the ground truth LFO signal. However, in most cases, the LFO signal is not accessible and measu…

    Submitted 22 May, 2023; originally announced May 2023.

    Comments: Accepted to DAFx 2023. Listening samples and plugins can be found at https://christhetree.github.io/mod_extraction/

  4. arXiv:2211.00497  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Modelling black-box audio effects with time-varying feature modulation

    Authors: Marco Comunità, Christian J. Steinmetz, Huy Phan, Joshua D. Reiss

    Abstract: Deep learning approaches for black-box modelling of audio effects have shown promise, however, the majority of existing work focuses on nonlinear effects with behaviour on relatively short time-scales, such as guitar amplifiers and distortion. While recurrent and convolutional architectures can theoretically be extended to capture behaviour at longer time scales, we show that simply scaling the wi…

    Submitted 9 May, 2023; v1 submitted 1 November, 2022; originally announced November 2022.

  5. arXiv:2110.09605  [pdf, other]

    cs.SD cs.AI cs.LG eess.AS

    Neural Synthesis of Footsteps Sound Effects with Generative Adversarial Networks

    Authors: Marco Comunità, Huy Phan, Joshua D. Reiss

    Abstract: Footsteps are among the most ubiquitous sound effects in multimedia applications. There is substantial research into understanding the acoustic features and developing synthesis models for footstep sound effects. In this paper, we present a first attempt at adopting neural synthesis for this task. We implemented two GAN-based architectures and compared the results with real recordings as well as s…

    Submitted 10 December, 2021; v1 submitted 18 October, 2021; originally announced October 2021.

  6. arXiv:2012.03216  [pdf, other]

    cs.SD cs.LG eess.AS

    Guitar Effects Recognition and Parameter Estimation with Convolutional Neural Networks

    Authors: Marco Comunità, Dan Stowell, Joshua D. Reiss

    Abstract: Despite the popularity of guitar effects, there is very little existing research on classification and parameter estimation of specific plugins or effect units from guitar recordings. In this paper, convolutional neural networks were used for classification and parameter estimation for 13 overdrive, distortion and fuzz guitar effects. A novel dataset of processed electric guitar samples was assemb…

    Submitted 6 December, 2020; originally announced December 2020.

    Journal ref: JAES Volume 69 Issue 7/8 pp. 594-604; July 2021

  7. arXiv:2008.04638  [pdf, other]

    cs.SD cs.HC cs.MM eess.AS

    PlugSonic: a web- and mobile-based platform for binaural audio and sonic narratives

    Authors: Marco Comunità, Andrea Gerino, Veranika Lim, Lorenzo Picinali

    Abstract: PlugSonic is a suite of web- and mobile-based applications for the curation and experience of binaural interactive soundscapes and sonic narratives. It was developed as part of the PLUGGY EU project (Pluggable Social Platform for Heritage Awareness and Participation) and consists of two main applications: PlugSonic Sample, to edit and apply audio effects, and PlugSonic Soundscape, to create and ex…

    Submitted 11 August, 2020; originally announced August 2020.

    Comments: 22 pages, 11 figures