Showing 1–32 of 32 results for author: Politis, A

  1. arXiv:2407.06847  [pdf, ps, other]

    eess.AS cs.GR cs.SD eess.SP

    Gaunt coefficients for complex and real spherical harmonics with applications to spherical array processing and Ambisonics

    Authors: Archontis Politis

    Abstract: Acoustical signal processing of directional representations of sound fields, including source, receiver, and scatterer transfer functions, is often expressed and modeled in the spherical harmonic domain (SHD). Certain such modeling operations, or applications of those models, involve multiplications of those directional quantities, which can also be expressed conveniently in the SHD through coupl…

    Submitted 9 July, 2024; originally announced July 2024.

  2. arXiv:2406.03228  [pdf, other]

    eess.AS

    Reference Channel Selection by Multi-Channel Masking for End-to-End Multi-Channel Speech Enhancement

    Authors: Wang Dai, Xiaofei Li, Archontis Politis, Tuomas Virtanen

    Abstract: In end-to-end multi-channel speech enhancement, the traditional approach of designating one microphone signal as the reference for processing may not always yield optimal results. The limitation is particularly evident in scenarios with large distributed microphone arrays with varying speaker-to-microphone distances or compact, highly directional microphone arrays where speaker or microphone positions cha…

    Submitted 11 June, 2024; v1 submitted 5 June, 2024; originally announced June 2024.

    Comments: Accepted by EUSIPCO 2024

  3. Speaker Distance Estimation in Enclosures from Single-Channel Audio

    Authors: Michael Neri, Archontis Politis, Daniel Krause, Marco Carli, Tuomas Virtanen

    Abstract: Distance estimation from audio plays a crucial role in various applications, such as acoustic scene analysis, sound source localization, and room modeling. Most studies predominantly center on employing a classification approach, where distances are discretized into distinct categories, enabling smoother model training and achieving higher accuracy but imposing restrictions on the precision of the…

    Submitted 26 March, 2024; originally announced March 2024.

    Comments: Accepted for publication in IEEE/ACM Transactions on Audio, Speech, and Language Processing

  4. arXiv:2403.11827  [pdf, other]

    cs.SD cs.LG eess.AS

    Sound Event Detection and Localization with Distance Estimation

    Authors: Daniel Aleksander Krause, Archontis Politis, Annamaria Mesaros

    Abstract: Sound Event Detection and Localization (SELD) is a combined task of identifying sound events and their corresponding direction-of-arrival (DOA). While this task has numerous applications and has been extensively researched in recent years, it fails to provide full information about the sound source position. In this paper, we overcome this problem by extending the task to Sound Event Detection, Lo…

    Submitted 12 June, 2024; v1 submitted 18 March, 2024; originally announced March 2024.

    Comments: This paper has been accepted for the 32nd European Signal Processing Conference EUSIPCO 2024 in Lyon

  5. arXiv:2401.13401  [pdf, other]

    eess.AS eess.SP

    Perceptually-motivated Spatial Audio Codec for Higher-Order Ambisonics Compression

    Authors: Christoph Hold, Leo McCormack, Archontis Politis, Ville Pulkki

    Abstract: Scene-based spatial audio formats, such as Ambisonics, are playback system agnostic and may therefore be favoured for delivering immersive audio experiences to a wide range of (potentially unknown) devices. The number of channels required to deliver high spatial resolution Ambisonic audio, however, can be prohibitive for low-bandwidth applications. Therefore, this paper proposes a compression code…

    Submitted 24 January, 2024; originally announced January 2024.

    Comments: Accepted for publication in Proceedings of the 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2024)

  6. arXiv:2401.05916  [pdf, other]

    eess.AS cs.SD

    Neural Ambisonics encoding for compact irregular microphone arrays

    Authors: Mikko Heikkinen, Archontis Politis, Tuomas Virtanen

    Abstract: Ambisonics encoding of microphone array signals can enable various spatial audio applications, such as virtual reality or telepresence, but it is typically designed for uniformly-spaced spherical microphone arrays. This paper proposes a method for Ambisonics encoding that uses a deep neural network (DNN) to estimate a signal transform from microphone inputs to Ambisonics signals. The approach uses…

    Submitted 11 January, 2024; originally announced January 2024.

    Comments: Accepted for publication in Proceedings of the 2024 IEEE International Conference on Acoustics, Speech and Signal Processing

  7. arXiv:2312.10756  [pdf, other]

    eess.AS cs.LG eess.SP

    Attention-Driven Multichannel Speech Enhancement in Moving Sound Source Scenarios

    Authors: Yuzhu Wang, Archontis Politis, Tuomas Virtanen

    Abstract: Current multichannel speech enhancement algorithms typically assume a stationary sound source, a common mismatch with reality that limits their performance in real-world scenarios. This paper focuses on attention-driven spatial filtering techniques designed for dynamic settings. Specifically, we study the application of linear and nonlinear attention-based methods for estimating time-varying spati…

    Submitted 17 December, 2023; originally announced December 2023.

  8. arXiv:2310.16550  [pdf, other]

    cs.SD eess.AS

    Dynamic Processing Neural Network Architecture For Hearing Loss Compensation

    Authors: Szymon Drgas, Lars Bramsløw, Archontis Politis, Gaurav Naithani, Tuomas Virtanen

    Abstract: This paper proposes neural networks for compensating sensorineural hearing loss. The aim of the hearing loss compensation task is to transform a speech signal to increase speech intelligibility after further processing by a person with a hearing impairment, which is modeled by a hearing loss model. We propose an interpretable model called dynamic processing network, which has a structure similar t…

    Submitted 25 October, 2023; originally announced October 2023.

  9. arXiv:2306.09126  [pdf, other]

    cs.SD cs.CV cs.MM eess.AS eess.IV

    STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events

    Authors: Kazuki Shimada, Archontis Politis, Parthasaarathy Sudarsanam, Daniel Krause, Kengo Uchida, Sharath Adavanne, Aapo Hakala, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Tuomas Virtanen, Yuki Mitsufuji

    Abstract: While direction of arrival (DOA) of sound events is generally estimated from multichannel audio data recorded in a microphone array, sound events usually derive from visually perceptible source objects, e.g., sounds of footsteps come from the feet of a walker. This paper proposes an audio-visual sound event localization and detection (SELD) task, which uses multichannel audio and video information…

    Submitted 14 November, 2023; v1 submitted 15 June, 2023; originally announced June 2023.

    Comments: 27 pages, 9 figures, accepted for publication in NeurIPS 2023 Track on Datasets and Benchmarks

  10. arXiv:2306.08510  [pdf, other]

    eess.AS cs.LG cs.SD eess.SP

    Permutation Invariant Recurrent Neural Networks for Sound Source Tracking Applications

    Authors: David Diaz-Guerra, Archontis Politis, Antonio Miguel, Jose R. Beltran, Tuomas Virtanen

    Abstract: Many multi-source localization and tracking models based on neural networks use one or several recurrent layers at their final stages to track the movement of the sources. Conventional recurrent neural networks (RNNs), such as the long short-term memories (LSTMs) or the gated recurrent units (GRUs), take a vector as their input and use another vector to store their state. However, this approach re…

    Submitted 14 June, 2023; originally announced June 2023.

    Comments: Accepted for publication at Forum Acusticum 2023

  11. arXiv:2303.07816  [pdf, other]

    eess.AS cs.SD

    Multi-Channel Masking with Learnable Filterbank for Sound Source Separation

    Authors: Wang Dai, Archontis Politis, Tuomas Virtanen

    Abstract: This work proposes a learnable filterbank based on a multi-channel masking framework for multi-channel source separation. The learnable filterbank is a 1D Conv layer, which transforms the raw waveform into a 2D representation. In contrast to the conventional single-channel masking method, we estimate a mask for each individual microphone channel. The estimated masks are then applied to the transfo…

    Submitted 14 March, 2023; originally announced March 2023.

  12. arXiv:2211.16958  [pdf, ps, other]

    cs.SD eess.AS

    How to (virtually) train your speaker localizer

    Authors: Prerak Srivastava, Antoine Deleforge, Archontis Politis, Emmanuel Vincent

    Abstract: Learning-based methods have become ubiquitous in speaker localization. Existing systems rely on simulated training sets for the lack of sufficiently large, diverse and annotated real datasets. Most room acoustics simulators used for this purpose rely on the image source method (ISM) because of its computational efficiency. This paper argues that carefully extending the ISM to incorporate more real…

    Submitted 25 May, 2023; v1 submitted 30 November, 2022; originally announced November 2022.

    Comments: Published in INTERSPEECH 2023

  13. arXiv:2210.14536  [pdf, ps, other]

    eess.AS cs.LG cs.SD eess.SP

    Position tracking of a varying number of sound sources with sliding permutation invariant training

    Authors: David Diaz-Guerra, Archontis Politis, Tuomas Virtanen

    Abstract: Recent data- and learning-based sound source localization (SSL) methods have shown strong performance in challenging acoustic scenarios. However, little work has been done on adapting such methods to track consistently multiple sources appearing and disappearing, as would occur in reality. In this paper, we present a new training strategy for deep learning SSL models with a straightforward impleme…

    Submitted 5 June, 2023; v1 submitted 26 October, 2022; originally announced October 2022.

    Comments: Accepted for publication at the 31st European Signal Processing Conference (EUSIPCO 2023)

  14. arXiv:2206.01948  [pdf, other]

    eess.AS cs.SD

    STARSS22: A dataset of spatial recordings of real scenes with spatiotemporal annotations of sound events

    Authors: Archontis Politis, Kazuki Shimada, Parthasaarathy Sudarsanam, Sharath Adavanne, Daniel Krause, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Yuki Mitsufuji, Tuomas Virtanen

    Abstract: This report presents the Sony-TAu Realistic Spatial Soundscapes 2022 (STARSS22) dataset for sound event localization and detection, comprised of spatial recordings of real scenes collected in various interiors of two different sites. The dataset is captured with a high resolution spherical microphone array and delivered in two 4-channel formats, first-order Ambisonics and tetrahedral microphone arr…

    Submitted 2 September, 2022; v1 submitted 4 June, 2022; originally announced June 2022.

  15. Self-supervised Learning of Audio Representations from Audio-Visual Data using Spatial Alignment

    Authors: Shanshan Wang, Archontis Politis, Annamaria Mesaros, Tuomas Virtanen

    Abstract: Learning from audio-visual data offers many possibilities to express correspondence between the audio and visual content, similar to the human perception that relates aural and visual information. In this work, we present a method for self-supervised representation learning based on audio-visual spatial alignment (AVSA), a more sophisticated alignment task than the audio-visual correspondence (AVC…

    Submitted 2 June, 2022; originally announced June 2022.

  16. arXiv:2111.00030  [pdf, other]

    eess.AS cs.SD

    Differentiable Tracking-Based Training of Deep Learning Sound Source Localizers

    Authors: Sharath Adavanne, Archontis Politis, Tuomas Virtanen

    Abstract: Data-based and learning-based sound source localization (SSL) has shown promising results in challenging conditions, and is commonly set as a classification or a regression problem. Regression-based approaches have certain advantages over classification-based, such as continuous direction-of-arrival estimation of static and moving sources. However, multi-source scenarios require multiple regressor…

    Submitted 29 October, 2021; originally announced November 2021.

    Comments: Submitted to IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA2021)

  17. arXiv:2107.12033  [pdf, other]

    cs.SD cs.LG eess.AS

    Joint Direction and Proximity Classification of Overlapping Sound Events from Binaural Audio

    Authors: Daniel Aleksander Krause, Archontis Politis, Annamaria Mesaros

    Abstract: Sound source proximity and distance estimation are of great interest in many practical applications, since they provide significant information for acoustic scene analysis. As both tasks share complementary qualities, ensuring efficient interaction between these two is crucial for a complete picture of an aural environment. In this paper, we aim to investigate several ways of performing joint prox…

    Submitted 26 July, 2021; originally announced July 2021.

  18. arXiv:2107.09388  [pdf, other]

    cs.SD eess.AS

    Assessment of Self-Attention on Learned Features For Sound Event Localization and Detection

    Authors: Parthasaarathy Sudarsanam, Archontis Politis, Konstantinos Drossos

    Abstract: Joint sound event localization and detection (SELD) is an emerging audio signal processing task adding spatial dimensions to acoustic scene analysis and sound event detection. A popular approach to modeling SELD jointly is using convolutional recurrent neural network (CRNN) models, where CNNs learn high-level features from multi-channel audio input and the RNNs learn temporal relationships from th…

    Submitted 27 September, 2021; v1 submitted 20 July, 2021; originally announced July 2021.

  19. arXiv:2106.14787  [pdf, other]

    eess.AS

    Mobile Microphone Array Speech Detection and Localization in Diverse Everyday Environments

    Authors: Pasi Pertilä, Emre Cakir, Aapo Hakala, Eemi Fagerlund, Tuomas Virtanen, Archontis Politis, Antti Eronen

    Abstract: Joint sound event localization and detection (SELD) is an integral part of developing context awareness into communication interfaces of mobile robots, smartphones, and home assistants. For example, an automatic audio focus for video capture on a mobile phone requires robust detection of relevant acoustic events around the device and their direction. Existing SELD approaches have been evaluated us…

    Submitted 28 June, 2021; originally announced June 2021.

    Comments: to be published in the proceedings of the 29th European Signal Processing Conference, EUSIPCO 2021

  20. arXiv:2106.11794  [pdf, other]

    eess.AS cs.SD

    Deep neural network Based Low-latency Speech Separation with Asymmetric analysis-Synthesis Window Pair

    Authors: Shanshan Wang, Gaurav Naithani, Archontis Politis, Tuomas Virtanen

    Abstract: Time-frequency masking or spectrum prediction computed via short symmetric windows are commonly used in low-latency deep neural network (DNN) based source separation. In this paper, we propose the usage of an asymmetric analysis-synthesis window pair which allows for training with targets with better frequency resolution, while retaining the low-latency during inference suitable for real-time spee…

    Submitted 22 June, 2021; originally announced June 2021.

    Comments: Accepted to EUSIPCO-2021

  21. arXiv:2106.06999  [pdf, other]

    eess.AS cs.SD

    A Dataset of Dynamic Reverberant Sound Scenes with Directional Interferers for Sound Event Localization and Detection

    Authors: Archontis Politis, Sharath Adavanne, Daniel Krause, Antoine Deleforge, Prerak Srivastava, Tuomas Virtanen

    Abstract: This report presents the dataset and baseline of Task 3 of the DCASE2021 Challenge on Sound Event Localization and Detection (SELD). The dataset is based on emulation of real recordings of static or moving sound events under real conditions of reverberation and ambient noise, using spatial room impulse responses captured in a variety of rooms and delivered in two spatial formats. The acoustical sy…

    Submitted 4 July, 2021; v1 submitted 13 June, 2021; originally announced June 2021.

  22. Overview and Evaluation of Sound Event Localization and Detection in DCASE 2019

    Authors: Archontis Politis, Annamaria Mesaros, Sharath Adavanne, Toni Heittola, Tuomas Virtanen

    Abstract: Sound event localization and detection is a novel area of research that emerged from the combined interest of analyzing the acoustic scene in terms of the spatial and temporal activity of sounds of interest. This paper presents an overview of the first international evaluation on sound event localization and detection, organized as a task of the DCASE 2019 Challenge. A large-scale realistic datase…

    Submitted 11 January, 2021; v1 submitted 6 September, 2020; originally announced September 2020.

  23. arXiv:2006.01919  [pdf, other]

    eess.AS cs.SD

    A Dataset of Reverberant Spatial Sound Scenes with Moving Sources for Sound Event Localization and Detection

    Authors: Archontis Politis, Sharath Adavanne, Tuomas Virtanen

    Abstract: This report presents the dataset and the evaluation setup of the Sound Event Localization & Detection (SELD) task for the DCASE 2020 Challenge. The SELD task refers to the problem of trying to simultaneously classify a known set of sound event classes, detect their temporal activations, and estimate their spatial directions or locations while they are active. To train and test SELD systems, datase…

    Submitted 27 June, 2020; v1 submitted 2 June, 2020; originally announced June 2020.

  24. arXiv:2005.08380  [pdf]

    q-bio.QM stat.AP

    Deuteros 2.0: Peptide-level significance testing of data from hydrogen deuterium exchange mass spectrometry

    Authors: Andy M. Lau, Jurgen Claesen, Kjetil Hansen, Argyris Politis

    Abstract: Summary: Hydrogen deuterium exchange mass spectrometry (HDX-MS) is becoming increasingly routine for monitoring changes in the structural dynamics of proteins. Differential HDX-MS allows comparison of individual protein states, such as in the absence or presence of a ligand. This can be used to attribute changes in conformation to binding events, allowing the mapping of entire conformational networ…

    Submitted 17 May, 2020; originally announced May 2020.

    Comments: Application note with 3 pages, 1 figure

  25. arXiv:2003.01162  [pdf, ps, other]

    eess.AS cs.SD

    Multichannel Singing Voice Separation by Deep Neural Network Informed DOA Constrained CNMF

    Authors: Antonio J. Muñoz-Montoro, Julio J. Carabias-Orti, Archontis Politis, Konstantinos Drossos

    Abstract: This work addresses the problem of multichannel source separation combining two powerful approaches, multichannel spectral factorization with recent monophonic deep-learning (DL) based spectrum inference. Individual source spectra at different channels are estimated with a Masker-Denoiser Twin Network (MaD TwinNet), able to model long-term temporal patterns of a musical piece. The monophonic sourc…

    Submitted 2 March, 2020; originally announced March 2020.

  26. arXiv:1905.08546  [pdf, other]

    cs.SD eess.AS

    A multi-room reverberant dataset for sound event localization and detection

    Authors: Sharath Adavanne, Archontis Politis, Tuomas Virtanen

    Abstract: This paper presents the sound event localization and detection (SELD) task setup for the DCASE 2019 challenge. The goal of the SELD task is to detect the temporal activities of a known set of sound event classes, and further localize them in space when active. As part of the challenge, a synthesized dataset with each sound event associated with a spatial coordinate represented using azimuth and el…

    Submitted 24 May, 2019; v1 submitted 21 May, 2019; originally announced May 2019.

  27. arXiv:1904.12769  [pdf, other]

    cs.SD cs.LG eess.AS

    Localization, Detection and Tracking of Multiple Moving Sound Sources with a Convolutional Recurrent Neural Network

    Authors: Sharath Adavanne, Archontis Politis, Tuomas Virtanen

    Abstract: This paper investigates the joint localization, detection, and tracking of sound events using a convolutional recurrent neural network (CRNN). We use a CRNN previously proposed for the localization and detection of stationary sources, and show that the recurrent layers enable the spatial tracking of moving sources when trained with dynamic scenes. The tracking performance of the CRNN is compared w…

    Submitted 29 April, 2019; originally announced April 2019.

  28. Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks

    Authors: Sharath Adavanne, Archontis Politis, Joonas Nikunen, Tuomas Virtanen

    Abstract: In this paper, we propose a convolutional recurrent neural network for joint sound event localization and detection (SELD) of multiple overlapping sound events in three-dimensional (3D) space. The proposed network takes a sequence of consecutive spectrogram time-frames as input and maps it to two outputs in parallel. As the first output, the sound event detection (SED) is performed as a multi-labe…

    Submitted 17 December, 2018; v1 submitted 30 June, 2018; originally announced July 2018.

    Comments: Published in Journal of Selected Topics in Signal Processing 2018

  29. arXiv:1801.09522  [pdf, other]

    cs.SD cs.LG eess.AS

    Multichannel Sound Event Detection Using 3D Convolutional Neural Networks for Learning Inter-channel Features

    Authors: Sharath Adavanne, Archontis Politis, Tuomas Virtanen

    Abstract: In this paper, we propose a stacked convolutional and recurrent neural network (CRNN) with a 3D convolutional neural network (CNN) in the first layer for the multichannel sound event detection (SED) task. The 3D CNN enables the network to simultaneously learn the inter- and intra-channel features from the input multichannel audio. In order to evaluate the proposed method, multichannel audio datase…

    Submitted 29 January, 2018; originally announced January 2018.

  30. arXiv:1710.10059  [pdf, other]

    cs.SD cs.LG eess.AS

    Direction of arrival estimation for multiple sound sources using convolutional recurrent neural network

    Authors: Sharath Adavanne, Archontis Politis, Tuomas Virtanen

    Abstract: This paper proposes a deep neural network for estimating the directions of arrival (DOA) of multiple sound sources. The proposed stacked convolutional and recurrent neural network (DOAnet) generates a spatial pseudo-spectrum (SPS) along with the DOA estimates in both azimuth and elevation. We avoid any explicit feature extraction step by using the magnitudes and phases of the spectrograms of all t…

    Submitted 5 August, 2018; v1 submitted 27 October, 2017; originally announced October 2017.

    Comments: EUSIPCO 2018

  31. arXiv:1609.03409  [pdf, ps, other]

    cs.SD

    Acoustic intensity, energy-density and diffuseness estimation in a directionally-constrained region

    Authors: Archontis Politis, Ville Pulkki

    Abstract: This work presents a method for estimation of the acoustic intensity, the energy density and the associated sound field diffuseness around the origin, when the sound field is weighted with a spatial filter. The method permits energetic DOA estimation and sound field characterization focused in a specific angular region determined by the beam pattern of the spatial filter. The formulation of the es…

    Submitted 13 September, 2016; v1 submitted 12 September, 2016; originally announced September 2016.

    Comments: 7 pages

  32. arXiv:1608.07713  [pdf, ps, other]

    cs.SD

    Diffuse-field coherence of sensors with arbitrary directional responses

    Authors: Archontis Politis

    Abstract: Knowledge of the diffuse-field coherence between array sensors is a basic assumption for a wide range of array processing applications. Explicit relations previously existed only for omnidirectional and first-order directional sensors, or a restricted arrangement of differential patterns. We present a closed-form formulation of the theoretical coherence function between arbitrary directionally ban…

    Submitted 27 August, 2016; originally announced August 2016.

    Comments: 5 pages