Computer Science > Sound

arXiv:1808.00606 (cs)

[Submitted on 2 Aug 2018 (v1), last revised 23 Aug 2018 (this version, v2)]

Title: AVA-Speech: A Densely Labeled Dataset of Speech Activity in Movies

Authors: Sourish Chaudhuri, Joseph Roth, Daniel P. W. Ellis, Andrew Gallagher, Liat Kaver, Radhika Marvin, Caroline Pantofaru, Nathan Reale, Loretta Guarino Reid, Kevin Wilson, Zhonghua Xi


Abstract: Speech activity detection (or endpointing) is an important processing step for applications such as speech recognition, language identification and speaker diarization. Both audio- and vision-based approaches have been used for this task in various settings, often tailored toward end applications. However, much of the prior work reports results in synthetic settings, on task-specific datasets, or on datasets that are not openly available. This makes it difficult to compare approaches and understand their strengths and weaknesses. In this paper, we describe a new dataset, which we will release publicly, containing densely labeled speech activity in YouTube videos, with the goal of creating a shared, available dataset for this task. The labels in the dataset annotate three different speech activity conditions: clean speech, speech co-occurring with music, and speech co-occurring with noise. These labels enable analysis of model performance in more challenging conditions based on the presence of overlapping noise. We report benchmark performance numbers on AVA-Speech using off-the-shelf, state-of-the-art audio and vision models that serve as a baseline to facilitate future research.
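
The per-condition analysis described in the abstract can be illustrated with a minimal sketch. It assumes segment-level annotations (start time, end time, label) that are expanded into fixed-hop frames, and a detector that emits one speech score per frame. The label strings, the 0.1 s hop, the `Segment` type, and the helpers `frame_labels`, `per_condition_recall` and `false_positive_rate` are hypothetical illustrations; they are not the released file format or the paper's evaluation protocol.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical label strings for the three speech conditions named in the
# abstract, plus a no-speech class; the released dataset may use other names.
SPEECH_LABELS = ("CLEAN_SPEECH", "SPEECH_WITH_MUSIC", "SPEECH_WITH_NOISE")
NO_SPEECH = "NO_SPEECH"


@dataclass
class Segment:
    start: float  # segment start, seconds
    end: float    # segment end (exclusive), seconds
    label: str


def frame_labels(segments: List[Segment], duration: float, hop: float = 0.1) -> List[str]:
    """Label each frame by the segment covering its centre; uncovered frames are NO_SPEECH.
    If segments overlap, the later segment in the list wins."""
    n = int(round(duration / hop))
    labels = [NO_SPEECH] * n
    for seg in segments:
        for i in range(n):
            centre = (i + 0.5) * hop
            if seg.start <= centre < seg.end:
                labels[i] = seg.label
    return labels


def per_condition_recall(labels: List[str], scores: List[float], threshold: float) -> Dict[str, float]:
    """Fraction of frames in each speech condition whose score reaches the threshold."""
    recall: Dict[str, float] = {}
    for cond in SPEECH_LABELS:
        idx = [i for i, lab in enumerate(labels) if lab == cond]
        if idx:  # skip conditions absent from this clip
            recall[cond] = sum(scores[i] >= threshold for i in idx) / len(idx)
    return recall


def false_positive_rate(labels: List[str], scores: List[float], threshold: float) -> float:
    """Fraction of no-speech frames wrongly flagged as speech."""
    idx = [i for i, lab in enumerate(labels) if lab == NO_SPEECH]
    return sum(scores[i] >= threshold for i in idx) / len(idx) if idx else 0.0


if __name__ == "__main__":
    segs = [
        Segment(0.0, 0.3, "CLEAN_SPEECH"),
        Segment(0.3, 0.5, "SPEECH_WITH_MUSIC"),
        Segment(0.5, 0.7, "SPEECH_WITH_NOISE"),
    ]
    labels = frame_labels(segs, duration=1.0)
    scores = [0.9, 0.8, 0.7, 0.6, 0.2, 0.4, 0.3, 0.1, 0.6, 0.2]  # made-up detector output
    print(per_condition_recall(labels, scores, threshold=0.5))
    # {'CLEAN_SPEECH': 1.0, 'SPEECH_WITH_MUSIC': 0.5, 'SPEECH_WITH_NOISE': 0.0}
    print(false_positive_rate(labels, scores, threshold=0.5))
```

Reporting recall separately per condition, rather than one pooled number, is what exposes a detector that handles clean speech well but misses speech under music or noise, which is the kind of comparison the three-way labeling is meant to support.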

Comments:	Interspeech, 2018
Subjects:	Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:1808.00606 [cs.SD]
	(or arXiv:1808.00606v2 [cs.SD] for this version)
	https://doi.org/10.48550/arXiv.1808.00606

Submission history

From: Sourish Chaudhuri
[v1] Thu, 2 Aug 2018 00:13:11 UTC (2,182 KB)
[v2] Thu, 23 Aug 2018 23:28:38 UTC (2,183 KB)