Showing 1–40 of 40 results for author: Villalba, J

  1. arXiv:2403.07891  [pdf]

    cs.CV cs.CR cs.LG

    Digital Video Manipulation Detection Technique Based on Compression Algorithms

    Authors: Edgar Gonzalez Fernandez, Ana Lucila Sandoval Orozco, Luis Javier Garcia Villalba

    Abstract: Digital images and videos play a very important role in everyday life. Nowadays, people have access to affordable mobile devices equipped with advanced integrated cameras and powerful image processing applications. Technological development facilitates not only the generation of multimedia content, but also its intentional modification, for either recreational or malicious purposes. This i…

    Submitted 3 February, 2024; originally announced March 2024.

    Journal ref: IEEE Transactions on Intelligent Transportation Systems, Vol. 23, No. 3, pp. 2596-2605, December 2021

  2. arXiv:2402.19355  [pdf, other]

    cs.SD cs.CR cs.LG eess.AS

    Unraveling Adversarial Examples against Speaker Identification -- Techniques for Attack Detection and Victim Model Classification

    Authors: Sonal Joshi, Thomas Thebaud, Jesús Villalba, Najim Dehak

    Abstract: Adversarial examples have proven to threaten speaker identification systems, and several countermeasures against them have been proposed. In this paper, we propose a method to detect the presence of adversarial examples, i.e., a binary classifier distinguishing between benign and adversarial examples. We build upon and extend previous work on attack type classification by exploring new architectur…

    Submitted 29 February, 2024; originally announced February 2024.

  3. Adaptive Artificial Immune Networks for Mitigating DoS flooding Attacks

    Authors: Jorge Maestre Vidal, Ana Lucila Sandoval Orozco, Luis Javier García Villalba

    Abstract: Denial-of-service attacks pose a constantly growing threat. This is mainly due to their tendency to gain in sophistication, ease of implementation, obfuscation and the recent improvements in occultation of fingerprints. On the other hand, progress towards self-organizing networks, and the different techniques involved in their development, such as software-defined networking, network-function virt…

    Submitted 12 February, 2024; originally announced February 2024.

    Journal ref: J. Maestre Vidal, A. L. Sandoval Orozco, L. J. García Villalba: Adaptive Artificial Immune Networks for Mitigating DoS Flooding Attacks. Swarm and Evolutionary Computation. Vol. 38, pp. 94-108, February 2018

  4. Compression effects and scene details on the source camera identification of digital videos

    Authors: Raquel Ramos López, Ana Lucila Sandoval Orozco, Luis Javier García Villalba

    Abstract: The continuous growth of technologies like 4G or 5G has led to a massive use of mobile devices such as smartphones and tablets. This phenomenon, combined with the fact that people use mobile phones for a longer period of time, results in mobile phones becoming the main source of creation of visual information. However, its reliability as a true representation of reality cannot be taken for granted…

    Submitted 7 February, 2024; originally announced February 2024.

    Journal ref: Expert Systems with Applications, Vol. 170, pp. 114515, May 2021

  5. arXiv:2402.06661  [pdf]

    cs.CR cs.LG cs.MM eess.IV

    Authentication and integrity of smartphone videos through multimedia container structure analysis

    Authors: Carlos Quinto Huamán, Ana Lucila Sandoval Orozco, Luis Javier García Villalba

    Abstract: Nowadays, mobile devices have become the natural substitute for the digital camera, as they capture everyday situations easily and quickly, encouraging users to express themselves through images and videos. These videos can be shared across different platforms exposing them to any kind of intentional manipulation by criminals who are aware of the weaknesses of forensic techniques to accuse an inno…

    Submitted 5 February, 2024; originally announced February 2024.

    Journal ref: C. Quinto Huamán, A. L. Sandoval Orozco, L. J. García Villalba: Authentication and Integrity of Smartphone Videos Through Multimedia Container Structure Analysis. Future Generation Computer Systems. Vol. 108, pp. 15-33, July 2020

  6. A novel pattern recognition system for detecting Android malware by analyzing suspicious boot sequences

    Authors: Jorge Maestre Vidal, Marco Antonio Sotelo Monge, Luis Javier García Villalba

    Abstract: This paper introduces a malware detection system for smartphones based on studying the dynamic behavior of suspicious applications. The main goal is to prevent the installation of malicious software on the victim systems. The approach focuses on identifying malware targeting the Android platform. For that purpose, only the system calls performed during the boot process of the recently…

    Submitted 5 February, 2024; originally announced February 2024.

    Journal ref: Knowledge-Based Systems. Vol. 150, pp. 198-217, June 2018

  7. A security framework for Ethereum smart contracts

    Authors: Antonio López Vivar, Ana Lucila Sandoval Orozco, Luis Javier García Villalba

    Abstract: The use of blockchain and smart contracts has not stopped growing in recent years. Like all software that begins to expand its use, it is also beginning to be targeted by hackers who will try to exploit vulnerabilities in both the underlying technology and the smart contract code itself. While many tools already exist for analyzing vulnerabilities in smart contracts, the heterogeneity and variety…

    Submitted 5 February, 2024; originally announced February 2024.

    Journal ref: Computer Communications. Vol. 172, pp. 119-129, April 2021

  8. arXiv:2402.02240  [pdf]

    cs.CR stat.CO

    Recommendations on Statistical Randomness Test Batteries for Cryptographic Purposes

    Authors: Elena Almaraz Luengo, Luis Javier García Villalba

    Abstract: Security in different applications is closely related to the goodness of the sequences generated for such purposes. Not only in Cryptography but also in other areas, it is necessary to obtain long sequences of numbers that are random or, at least, behave as such. To decide whether the generator used produces sequences that are random, unpredictable and independent, statistical checks are needed. Diffe…

    Submitted 3 February, 2024; originally announced February 2024.

    Journal ref: ACM Computing Surveys, Vol. 54, No. 80, pp. 12420, May 2021

  9. arXiv:2401.09464  [pdf]

    cs.AR

    Floating Point HUB Adder for RISC-V Sargantana Processor

    Authors: Gerardo Bandera, Javier Salamero, Miquel Moreto, Julio Villalba

    Abstract: The HUB format is an emerging technique for reducing hardware and time requirements when round-to-nearest is needed. On the other hand, RISC-V is an open-source ISA that many companies currently use in their designs. This paper presents a tailored floating point HUB adder implemented in the Sargantana RISC-V processor.

    Submitted 8 January, 2024; originally announced January 2024.

    Comments: RISC-V Summit Europe, Barcelona, 5-9th June 2023

  10. arXiv:2309.04628  [pdf, other]

    eess.AS cs.SD

    Leveraging Pretrained Image-text Models for Improving Audio-Visual Learning

    Authors: Saurabhchand Bhati, Jesús Villalba, Laureano Moro-Velazquez, Thomas Thebaud, Najim Dehak

    Abstract: Visually grounded speech systems learn from paired images and their spoken captions. Recently, there have been attempts to utilize the visually grounded models trained from images and their corresponding text captions, such as CLIP, to improve speech-based visually grounded models' performance. However, the majority of these models only utilize the pretrained image encoder. Cascaded SpeechCLIP att…

    Submitted 8 September, 2023; originally announced September 2023.

  11. arXiv:2303.04187  [pdf, other]

    cs.LG

    Stabilized training of joint energy-based models and their practical applications

    Authors: Martin Sustek, Samik Sadhu, Lukas Burget, Hynek Hermansky, Jesus Villalba, Laureano Moro-Velazquez, Najim Dehak

    Abstract: The recently proposed Joint Energy-based Model (JEM) interprets a discriminatively trained classifier $p(y|x)$ as an energy model, which is also trained as a generative model describing the distribution of the input observations $p(x)$. The JEM training relies on "positive examples" (i.e. examples from the training data set) as well as on "negative examples", which are samples from the modeled distr…

    Submitted 7 March, 2023; originally announced March 2023.

  12. arXiv:2208.05445  [pdf, other]

    eess.AS cs.AI cs.LG

    Non-Contrastive Self-supervised Learning for Utterance-Level Information Extraction from Speech

    Authors: Jaejin Cho, Jesús Villalba, Laureano Moro-Velazquez, Najim Dehak

    Abstract: In recent studies, self-supervised pre-trained models tend to outperform supervised pre-trained models in transfer learning. In particular, self-supervised learning (SSL) of utterance-level speech representation can be used in speech applications that require discriminative representation of consistent attributes within an utterance: speaker, language, emotion, and age. Existing frame-level self-s…

    Submitted 10 August, 2022; originally announced August 2022.

    Comments: EARLY ACCESS of IEEE JSTSP Special Issue on Self-Supervised Learning for Speech and Audio Processing

  13. arXiv:2208.05413  [pdf, other]

    eess.AS cs.LG

    Non-Contrastive Self-Supervised Learning of Utterance-Level Speech Representations

    Authors: Jaejin Cho, Raghavendra Pappagari, Piotr Żelasko, Laureano Moro-Velazquez, Jesús Villalba, Najim Dehak

    Abstract: Considering the abundance of unlabeled speech data and the high labeling costs, unsupervised learning methods can be essential for better system development. Among the most successful are contrastive self-supervised methods, which require negative sampling: sampling alternative samples to contrast with the current sample (anchor). However, it is hard to ensure that all the negative samples b…

    Submitted 10 August, 2022; originally announced August 2022.

    Comments: Accepted at Interspeech 2022

  14. arXiv:2204.03851  [pdf, other]

    eess.AS cs.CR cs.SD

    Defense against Adversarial Attacks on Hybrid Speech Recognition using Joint Adversarial Fine-tuning with Denoiser

    Authors: Sonal Joshi, Saurabh Kataria, Yiwen Shao, Piotr Zelasko, Jesus Villalba, Sanjeev Khudanpur, Najim Dehak

    Abstract: Adversarial attacks are a threat to automatic speech recognition (ASR) systems, and it becomes imperative to propose defenses to protect them. In this paper, we perform experiments to show that K2 conformer hybrid ASR is strongly affected by white-box adversarial attacks. We propose three defenses--denoiser pre-processor, adversarially fine-tuning ASR model, and adversarially fine-tuning joint mod…

    Submitted 8 April, 2022; originally announced April 2022.

    Comments: Submitted to Interspeech 2022

  15. arXiv:2204.03848  [pdf, ps, other]

    eess.AS cs.CR cs.SD

    AdvEst: Adversarial Perturbation Estimation to Classify and Detect Adversarial Attacks against Speaker Identification

    Authors: Sonal Joshi, Saurabh Kataria, Jesus Villalba, Najim Dehak

    Abstract: Adversarial attacks pose a severe security threat to the state-of-the-art speaker identification systems, thereby making it vital to propose countermeasures against them. Building on our previous work that used representation learning to classify and detect adversarial attacks, we propose an improvement to it using AdvEst, a method to estimate adversarial perturbation. First, we prove our claim th…

    Submitted 8 April, 2022; originally announced April 2022.

    Comments: Submitted to InterSpeech 2022

  16. arXiv:2203.16614  [pdf, other]

    eess.AS cs.SD

    Joint domain adaptation and speech bandwidth extension using time-domain GANs for speaker verification

    Authors: Saurabh Kataria, Jesús Villalba, Laureano Moro-Velázquez, Najim Dehak

    Abstract: Speech systems developed for a particular choice of acoustic domain and sampling frequency do not translate easily to others. The usual practice is to learn domain adaptation and bandwidth extension models independently. Contrary to this, we propose to learn both tasks together. Particularly, we learn to map narrowband conversational telephone speech to wideband microphone speech. We developed par…

    Submitted 30 March, 2022; originally announced March 2022.

    Comments: submitted to Interspeech 2022

  17. arXiv:2110.02345  [pdf, other]

    eess.AS cs.SD

    Unsupervised Speech Segmentation and Variable Rate Representation Learning using Segmental Contrastive Predictive Coding

    Authors: Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak

    Abstract: Typically, unsupervised segmentation of speech into phone-like and word-like units is treated as two separate tasks, often done via different methods that do not fully leverage the inter-dependence of the two tasks. Here, we unify them and propose a technique that can jointly perform both, showing that these two tasks indeed benefit from each other. Recent attempts employ self-supervised learn…

    Submitted 8 October, 2021; v1 submitted 5 October, 2021; originally announced October 2021.

    Comments: arXiv admin note: substantial text overlap with arXiv:2106.02170

  18. arXiv:2109.13425  [pdf, ps, other]

    eess.AS cs.LG cs.SD

    The JHU submission to VoxSRC-21: Track 3

    Authors: Jaejin Cho, Jesus Villalba, Najim Dehak

    Abstract: This technical report describes the Johns Hopkins University speaker recognition system submitted to the Voxceleb Speaker Recognition Challenge 2021 Track 3: Self-supervised speaker verification (closed). Our overall training process is similar to the one proposed by the first-place team in last year's VoxSRC2020 challenge. The main difference is a recently proposed non-contrastive self-supervised m…

    Submitted 27 September, 2021; originally announced September 2021.

  19. arXiv:2109.06112  [pdf, other]

    cs.CL cs.SD eess.AS

    Beyond Isolated Utterances: Conversational Emotion Recognition

    Authors: Raghavendra Pappagari, Piotr Żelasko, Jesús Villalba, Laureano Moro-Velazquez, Najim Dehak

    Abstract: Speech emotion recognition is the task of recognizing the speaker's emotional state given a recording of their utterance. While most of the current approaches focus on inferring emotion from isolated utterances, we argue that this is not sufficient to achieve conversational emotion recognition (CER) which deals with recognizing emotions in conversations. In this work, we propose several approaches…

    Submitted 13 September, 2021; originally announced September 2021.

    Comments: Accepted for ASRU 2021

  20. arXiv:2106.02170  [pdf, other]

    eess.AS cs.CL cs.LG cs.SD

    Segmental Contrastive Predictive Coding for Unsupervised Word Segmentation

    Authors: Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak

    Abstract: Automatic detection of phoneme or word-like units is one of the core objectives in zero-resource speech processing. Recent attempts employ self-supervised training methods, such as contrastive predictive coding (CPC), where the next frame is predicted given past context. However, CPC only looks at the audio signal's frame-level structure. We overcome this limitation with a segmental contrastive pr…

    Submitted 3 June, 2021; originally announced June 2021.

  21. arXiv:2103.17122  [pdf, ps, other]

    eess.AS cs.CR cs.SD

    Adversarial Attacks and Defenses for Speech Recognition Systems

    Authors: Piotr Żelasko, Sonal Joshi, Yiwen Shao, Jesus Villalba, Jan Trmal, Najim Dehak, Sanjeev Khudanpur

    Abstract: The ubiquitous presence of machine learning systems in our lives necessitates research into their vulnerabilities and appropriate countermeasures. In particular, we investigate the effectiveness of adversarial attacks and defenses against automatic speech recognition (ASR) systems. We select two ASR models - a thoroughly studied DeepSpeech model and a more recent Espresso framework Transformer enc…

    Submitted 31 March, 2021; originally announced March 2021.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  22. arXiv:2101.08909  [pdf, other]

    eess.AS cs.SD

    Study of Pre-processing Defenses against Adversarial Attacks on State-of-the-art Speaker Recognition Systems

    Authors: Sonal Joshi, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velázquez, Najim Dehak

    Abstract: Adversarial examples to speaker recognition (SR) systems are generated by adding a carefully crafted noise to the speech signal to make the system fail while being imperceptible to humans. Such attacks pose severe security risks, making it vital to deep-dive and understand how much the state-of-the-art SR systems are vulnerable to these attacks. Moreover, it is of greater importance to propose def…

    Submitted 25 June, 2021; v1 submitted 21 January, 2021; originally announced January 2021.

    Comments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

  23. arXiv:2011.02090  [pdf, other]

    eess.AS cs.SD

    Frustratingly Easy Noise-aware Training of Acoustic Models

    Authors: Desh Raj, Jesus Villalba, Daniel Povey, Sanjeev Khudanpur

    Abstract: Environmental noises and reverberation have a detrimental effect on the performance of automatic speech recognition (ASR) systems. Multi-condition training of neural network-based acoustic models is used to deal with this problem, but it requires many-fold data augmentation, resulting in increased training time. In this paper, we propose utterance-level noise vectors for noise-aware training of a…

    Submitted 2 February, 2021; v1 submitted 3 November, 2020; originally announced November 2020.

    Comments: 6 + 3 (Appendix) pages

  24. arXiv:2011.01210  [pdf, other]

    eess.AS cs.LG

    Focus on the present: a regularization method for the ASR source-target attention layer

    Authors: Nanxin Chen, Piotr Żelasko, Jesús Villalba, Najim Dehak

    Abstract: This paper introduces a novel method to diagnose the source-target attention in state-of-the-art end-to-end speech recognition models with joint connectionist temporal classification (CTC) and attention training. Our method is based on the fact that both CTC and source-target attention act on the same encoder representations. To understand the functionality of the attention, CTC is applie…

    Submitted 2 November, 2020; originally announced November 2020.

    Comments: submitted to ICASSP2021. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

  25. arXiv:2010.14602  [pdf, ps, other]

    cs.SD cs.LG eess.AS

    CopyPaste: An Augmentation Method for Speech Emotion Recognition

    Authors: Raghavendra Pappagari, Jesús Villalba, Piotr Żelasko, Laureano Moro-Velazquez, Najim Dehak

    Abstract: Data augmentation is a widely used strategy for training robust machine learning models. It partially alleviates the problem of limited data for tasks like speech emotion recognition (SER), where collecting data is expensive and challenging. This study proposes CopyPaste, a perceptually motivated novel augmentation procedure for SER. Assuming that the presence of emotions other than neutral dictat…

    Submitted 11 February, 2021; v1 submitted 27 October, 2020; originally announced October 2020.

    Comments: Accepted at ICASSP2021

  26. arXiv:2010.11860  [pdf, other]

    eess.AS cs.SD

    Perceptual Loss based Speech Denoising with an ensemble of Audio Pattern Recognition and Self-Supervised Models

    Authors: Saurabh Kataria, Jesús Villalba, Najim Dehak

    Abstract: Deep learning based speech denoising still suffers from the challenge of improving perceptual quality of enhanced signals. We introduce a generalized framework called Perceptual Ensemble Regularization Loss (PERL) built on the idea of perceptual losses. Perceptual loss discourages distortion to certain speech properties and we analyze it using six large-scale pre-trained models: speaker classifica…

    Submitted 22 October, 2020; originally announced October 2020.

  27. arXiv:2010.11221  [pdf, other]

    eess.AS cs.LG cs.SD

    Learning Speaker Embedding from Text-to-Speech

    Authors: Jaejin Cho, Piotr Zelasko, Jesus Villalba, Shinji Watanabe, Najim Dehak

    Abstract: Zero-shot multi-speaker Text-to-Speech (TTS) generates target speaker voices given an input text and the corresponding speaker embedding. In this work, we investigate the effectiveness of the TTS reconstruction objective to improve representation learning for speaker verification. We jointly trained end-to-end Tacotron 2 TTS and speaker embedding networks in a self-supervised fashion. We hypothesi…

    Submitted 21 October, 2020; originally announced October 2020.

  28. arXiv:2007.13033  [pdf, other]

    eess.AS cs.LG cs.SD

    Self-Expressing Autoencoders for Unsupervised Spoken Term Discovery

    Authors: Saurabhchand Bhati, Jesús Villalba, Piotr Żelasko, Najim Dehak

    Abstract: Unsupervised spoken term discovery consists of two tasks: finding the acoustic segment boundaries and labeling acoustically similar segments with the same labels. We perform segmentation based on the assumption that the frame feature vectors are more similar within a segment than across the segments. Therefore, for strong segmentation performance, it is crucial that the features represent the phon…

    Submitted 25 July, 2020; originally announced July 2020.

  29. arXiv:2005.08331  [pdf, ps, other]

    eess.AS cs.SD

    Single Channel Far Field Feature Enhancement For Speaker Verification In The Wild

    Authors: Phani Sankar Nidadavolu, Saurabh Kataria, Paola García-Perera, Jesús Villalba, Najim Dehak

    Abstract: We investigated an enhancement and a domain adaptation approach to make speaker verification systems robust to perturbations of far-field speech. In the enhancement approach, using paired (parallel) reverberant-clean speech, we trained a supervised Generative Adversarial Network (GAN) along with a feature mapping loss. For the domain adaptation approach, we trained a Cycle Consistent Generative Ad…

    Submitted 17 May, 2020; originally announced May 2020.

    Comments: submitted to INTERSPEECH 2020

  30. arXiv:2002.05039  [pdf, ps, other]

    eess.AS cs.LG cs.SD stat.ML

    x-vectors meet emotions: A study on dependencies between emotion and speaker recognition

    Authors: Raghavendra Pappagari, Tianzi Wang, Jesus Villalba, Nanxin Chen, Najim Dehak

    Abstract: In this work, we explore the dependencies between speaker recognition and emotion recognition. We first show that knowledge learned for speaker recognition can be reused for emotion recognition through transfer learning. Then, we show the effect of emotion on speaker recognition. For emotion recognition, we show that using a simple linear model is enough to obtain good performance on the features…

    Submitted 12 February, 2020; originally announced February 2020.

    Comments: 45th International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020

  31. arXiv:2002.00139  [pdf, other]

    eess.AS cs.SD

    Analysis of Deep Feature Loss based Enhancement for Speaker Verification

    Authors: Saurabh Kataria, Phani Sankar Nidadavolu, Jesús Villalba, Najim Dehak

    Abstract: Data augmentation is conventionally used to inject robustness in Speaker Verification systems. Several recently organized challenges focus on handling novel acoustic environments. Deep learning based speech enhancement is a modern solution for this. Recently, a study proposed to optimize the enhancement network in the activation space of a pre-trained auxiliary network. This methodology, called de…

    Submitted 27 April, 2020; v1 submitted 31 January, 2020; originally announced February 2020.

    Comments: 8 pages; accepted in Odyssey2020 workshop

  32. arXiv:1912.00938  [pdf]

    eess.AS cs.SD

    Speaker detection in the wild: Lessons learned from JSALT 2019

    Authors: Paola Garcia, Jesus Villalba, Herve Bredin, Jun Du, Diego Castan, Alejandrina Cristia, Latane Bullock, Ling Guo, Koji Okabe, Phani Sankar Nidadavolu, Saurabh Kataria, Sizhu Chen, Leo Galmant, Marvin Lavechin, Lei Sun, Marie-Philippe Gill, Bar Ben-Yair, Sajjad Abdoli, Xin Wang, Wassim Bouaziz, Hadrien Titeux, Emmanuel Dupoux, Kong Aik Lee, Najim Dehak

    Abstract: This paper presents the problems and solutions addressed at the JSALT workshop when using a single microphone for speaker detection in adverse scenarios. The main focus was to tackle a wide range of conditions that go from meetings to wild speech. We describe the research threads we explored and a set of modules that was successful for these scenarios. The ultimate goal was to explore speaker dete…

    Submitted 2 December, 2019; originally announced December 2019.

    Comments: Submitted to ICASSP 2020

  33. arXiv:1911.04908  [pdf, other]

    eess.AS cs.LG cs.SD stat.ML

    Listen and Fill in the Missing Letters: Non-Autoregressive Transformer for Speech Recognition

    Authors: Nanxin Chen, Shinji Watanabe, Jesús Villalba, Najim Dehak

    Abstract: Recently, very deep transformers have outperformed conventional bi-directional long short-term memory networks by a large margin in speech recognition. However, to put them into production, inference computation cost is still a serious concern in real scenarios. In this paper, we study two different non-autoregressive transformer structures for automatic speech recognition (ASR): A-CMLM and A-FM…

    Submitted 6 April, 2020; v1 submitted 10 November, 2019; originally announced November 2019.

  34. arXiv:1910.11915  [pdf, ps, other]

    eess.AS cs.SD

    Unsupervised Feature Enhancement for speaker verification

    Authors: Phani Sankar Nidadavolu, Saurabh Kataria, Jesús Villalba, Paola García-Perera, Najim Dehak

    Abstract: The task of making speaker verification systems robust to adverse scenarios remains a challenging and active area of research. We developed an unsupervised feature enhancement approach in log-filter bank domain with the end goal of improving speaker verification performance. We experimented with using both real speech recorded in adverse environments and degraded speech obtained by simulation to…

    Submitted 14 February, 2020; v1 submitted 25 October, 2019; originally announced October 2019.

    Comments: 5 pages; accepted in ICASSP 2020

  35. arXiv:1910.11909  [pdf, other]

    eess.AS cs.SD

    Low-Resource Domain Adaptation for Speaker Recognition Using Cycle-GANs

    Authors: Phani Sankar Nidadavolu, Saurabh Kataria, Jesús Villalba, Najim Dehak

    Abstract: Current speaker recognition technology provides great performance with the x-vector approach. However, performance decreases when the evaluation domain is different from the training domain, an issue usually addressed with domain adaptation approaches. Recently, unsupervised domain adaptation using cycle-consistent Generative Adversarial Networks (CycleGAN) has received a lot of attention. CycleGAN…

    Submitted 25 October, 2019; originally announced October 2019.

    Comments: 8 pages, accepted to ASRU 2019

  36. arXiv:1910.11905  [pdf, ps, other]

    eess.AS cs.SD

    Feature Enhancement with Deep Feature Losses for Speaker Verification

    Authors: Saurabh Kataria, Phani Sankar Nidadavolu, Jesús Villalba, Nanxin Chen, Paola García, Najim Dehak

    Abstract: Speaker Verification still suffers from the challenge of generalization to novel adverse environments. We leverage the recent advancements made by deep learning based speech enhancement and propose a feature-domain supervised denoising based solution. We propose to use Deep Feature Loss which optimizes the enhancement network in the hidden activation space of a pre-trained auxiliary speaker emb…

    Submitted 14 February, 2020; v1 submitted 25 October, 2019; originally announced October 2019.

    Comments: 5 pages, accepted in ICASSP 2020

  37. arXiv:1910.10781  [pdf, ps, other]

    cs.CL cs.LG stat.ML

    Hierarchical Transformers for Long Document Classification

    Authors: Raghavendra Pappagari, Piotr Żelasko, Jesús Villalba, Yishay Carmiel, Najim Dehak

    Abstract: BERT, which stands for Bidirectional Encoder Representations from Transformers, is a recently introduced language representation model based upon the transfer learning paradigm. We extend its fine-tuning procedure to address one of its major limitations - applicability to inputs longer than a few hundred words, such as transcripts of human call conversations. Our method is conceptually simple. We…

    Submitted 23 October, 2019; originally announced October 2019.

    Comments: 4 figures, 7 pages

    Journal ref: Automatic Speech Recognition and Understanding Workshop, 2019

  38. arXiv:1904.01120  [pdf, other]

    cs.CL cs.LG cs.SD eess.AS

    ASSERT: Anti-Spoofing with Squeeze-Excitation and Residual neTworks

    Authors: Cheng-I Lai, Nanxin Chen, Jesús Villalba, Najim Dehak

    Abstract: We present JHU's system submission to the ASVspoof 2019 Challenge: Anti-Spoofing with Squeeze-Excitation and Residual neTworks (ASSERT). Anti-spoofing has gathered more and more attention since the inauguration of the ASVspoof Challenges, and ASVspoof 2019 is dedicated to addressing attacks of all three major types: text-to-speech, voice conversion, and replay. Built upon previous research work on Dee…

    Submitted 1 April, 2019; originally announced April 2019.

    Comments: Submitted to Interspeech 2019, Graz, Austria

  39. arXiv:1811.02162  [pdf, other]

    eess.AS cs.SD

    Language model integration based on memory control for sequence to sequence speech recognition

    Authors: Jaejin Cho, Shinji Watanabe, Takaaki Hori, Murali Karthick Baskar, Hirofumi Inaguma, Jesus Villalba, Najim Dehak

    Abstract: In this paper, we explore several new schemes to train a seq2seq model to integrate a pre-trained LM. Our proposed fusion methods focus on the memory cell state and the hidden state in the seq2seq decoder long short-term memory (LSTM), and, unlike in prior studies, the memory cell state is updated by the LM. This means the memory retained by the main seq2seq would be adjusted by the external LM. Th…

    Submitted 5 November, 2018; originally announced November 2018.

    Comments: 4 pages, 1 figure, 5 tables, submitted to ICASSP 2019

  40. arXiv:1101.5411  [pdf, ps, other]

    cs.DM

    Efficient Algorithms for Searching Optimal Shortened Cyclic Single-Burst-Correcting Codes

    Authors: Luis Javier García Villalba, José René Fuentes Cortez, Ana Lucila Sandoval Orozco, Mario Blaum

    Abstract: In a previous work it was shown that the best measure for the efficiency of a single burst-correcting code is obtained using the Gallager bound as opposed to the Reiger bound. In this paper, an efficient algorithm that searches for the best (shortened) cyclic burst-correcting codes is presented. Using this algorithm, extensive tables that either tie existing constructions or improve them are obtai…

    Submitted 27 January, 2011; originally announced January 2011.