Showing 1–16 of 16 results for author: Avila, R

  1. arXiv:2403.08654  [pdf, other]

    eess.AS cs.SD

    An Efficient End-to-End Approach to Noise Invariant Speech Features via Multi-Task Learning

    Authors: Heitor R. Guimarães, Arthur Pimentel, Anderson R. Avila, Mehdi Rezagholizadeh, Boxing Chen, Tiago H. Falk

    Abstract: Self-supervised speech representation learning enables the extraction of meaningful features from raw waveforms. These features can then be efficiently used across multiple downstream tasks. However, two significant issues arise when considering the deployment of such methods "in-the-wild": (i) Their large size, which can be prohibitive for edge applications; and (ii) their robustness to detrimen…

    Submitted 13 March, 2024; originally announced March 2024.

    Comments: Under review on IEEE Transactions on Audio, Speech, and Language Processing (2024)

  2. arXiv:2402.03353  [pdf]

    q-fin.ST cs.LG math.FA math.NA

    Tweet Influence on Market Trends: Analyzing the Impact of Social Media Sentiment on Biotech Stocks

    Authors: C. Sarai R. Avila

    Abstract: This study investigates the relationship between tweet sentiment across diverse categories: news, company opinions, CEO opinions, competitor opinions, and stock market behavior in the biotechnology sector, with a focus on understanding the impact of social media discourse on investor sentiment and decision-making processes. We analyzed historical stock market data for ten of the largest and most i…

    Submitted 26 January, 2024; originally announced February 2024.

    Comments: This submission includes 51 pages and 24 figures

    MSC Class: 62P05; 91G70; 62H30; 91B84; 68T05

    ACM Class: I.2.7; I.2.6; K.4.1; A.0; J.1

  3. arXiv:2309.14462  [pdf, ps, other]

    eess.AS cs.SD

    On the Impact of Quantization and Pruning of Self-Supervised Speech Models for Downstream Speech Recognition Tasks "In-the-Wild"

    Authors: Arthur Pimentel, Heitor Guimarães, Anderson R. Avila, Mehdi Rezagholizadeh, Tiago H. Falk

    Abstract: Recent advances with self-supervised learning have allowed speech recognition systems to achieve state-of-the-art (SOTA) word error rates (WER) while requiring only a fraction of the labeled training data needed by their predecessors. Notwithstanding, while such models achieve SOTA performance in matched train/test conditions, their performance degrades substantially when tested in unseen conditions…

    Submitted 25 September, 2023; originally announced September 2023.

  4. arXiv:2306.06819  [pdf, other]

    cs.CL cs.LG eess.AS

    Multimodal Audio-textual Architecture for Robust Spoken Language Understanding

    Authors: Anderson R. Avila, Mehdi Rezagholizadeh, Chao Xing

    Abstract: Recent voice assistants are usually based on the cascade spoken language understanding (SLU) solution, which consists of an automatic speech recognition (ASR) engine and a natural language understanding (NLU) system. Because such an approach relies on the ASR output, it often suffers from the so-called ASR error propagation. In this work, we investigate impacts of this ASR error propagation on state-…

    Submitted 13 June, 2023; v1 submitted 11 June, 2023; originally announced June 2023.

  5. arXiv:2304.09655  [pdf, other]

    cs.CR

    How Secure is Code Generated by ChatGPT?

    Authors: Raphaël Khoury, Anderson R. Avila, Jacob Brunelle, Baba Mamadou Camara

    Abstract: In recent years, large language models have been responsible for great advances in the field of artificial intelligence (AI). ChatGPT in particular, an AI chatbot developed and recently released by OpenAI, has taken the field to the next level. The conversational model is able not only to process human-like text, but also to translate natural language into code. However, the safety of programs gen…

    Submitted 19 April, 2023; originally announced April 2023.

  6. arXiv:2303.08774  [pdf, other]

    cs.CL cs.AI

    GPT-4 Technical Report

    Authors: OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner, Lenny Bogdonoff, Oleg Boiko, et al. (256 additional authors not shown)

    Abstract: We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based mo…

    Submitted 4 March, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: 100 pages; updated authors list; fixed author names and added citation

  7. arXiv:2302.09437  [pdf, other]

    eess.AS cs.SD

    RobustDistiller: Compressing Universal Speech Representations for Enhanced Environment Robustness

    Authors: Heitor R. Guimarães, Arthur Pimentel, Anderson R. Avila, Mehdi Rezagholizadeh, Boxing Chen, Tiago H. Falk

    Abstract: Self-supervised speech pre-training enables deep neural network models to capture meaningful and disentangled factors from raw waveform signals. The learned universal speech representations can then be used across numerous downstream tasks. These representations, however, are sensitive to distribution shifts caused by environmental factors, such as noise and/or room reverberation. Their large size…

    Submitted 22 February, 2023; v1 submitted 18 February, 2023; originally announced February 2023.

    Comments: Accepted by ICASSP 2023

  8. arXiv:2211.06562  [pdf, ps, other]

    cs.SD cs.CL eess.AS

    Improving the Robustness of DistilHuBERT to Unseen Noisy Conditions via Data Augmentation, Curriculum Learning, and Multi-Task Enhancement

    Authors: Heitor R. Guimarães, Arthur Pimentel, Anderson R. Avila, Mehdi Rezagholizadeh, Tiago H. Falk

    Abstract: Self-supervised speech representation learning aims to extract meaningful factors from the speech signal that can later be used across different downstream tasks, such as speech and/or emotion recognition. Existing models, such as HuBERT, however, can be fairly large and thus may not be suitable for edge speech applications. Moreover, realistic applications typically involve speech corrupted by noise…

    Submitted 11 November, 2022; originally announced November 2022.

    Comments: ENLSP-II NeurIPS Workshop 2022, 6 pages

  9. arXiv:2207.07497  [pdf, other]

    cs.SD cs.LG eess.AS

    Low-bit Shift Network for End-to-End Spoken Language Understanding

    Authors: Anderson R. Avila, Khalil Bibi, Rui Heng Yang, Xinlin Li, Chao Xing, Xiao Chen

    Abstract: Deep neural networks (DNN) have achieved impressive success in multiple domains. Over the years, the accuracy of these models has increased with the proliferation of deeper and more complex architectures. Thus, state-of-the-art solutions are often computationally expensive, which makes them unfit to be deployed on edge computing platforms. In order to mitigate the high computation, memory, and pow…

    Submitted 15 July, 2022; originally announced July 2022.

    Comments: Accepted at INTERSPEECH 2022

  10. arXiv:2106.04660  [pdf, other]

    cs.CL cs.SD eess.AS

    Sequential End-to-End Intent and Slot Label Classification and Localization

    Authors: Yiran Cao, Nihal Potdar, Anderson R. Avila

    Abstract: Human-computer interaction (HCI) is significantly impacted by delayed responses from a spoken dialogue system. Hence, end-to-end (e2e) spoken language understanding (SLU) solutions have recently been proposed to decrease latency. Such approaches allow for the extraction of semantic information directly from the speech signal, thus bypassing the need for a transcript from an automatic speech recogn…

    Submitted 8 June, 2021; originally announced June 2021.

    Comments: Accepted at Interspeech 2021

  11. arXiv:2105.10042  [pdf, other]

    cs.CL cs.SD eess.AS

    A Streaming End-to-End Framework For Spoken Language Understanding

    Authors: Nihal Potdar, Anderson R. Avila, Chao Xing, Dong Wang, Yiran Cao, Xiao Chen

    Abstract: End-to-end spoken language understanding (SLU) has recently attracted increasing interest. Compared to the conventional tandem-based approach that combines speech recognition and language understanding as separate modules, the new approach extracts users' intentions directly from the speech signals, resulting in joint optimization and low latency. Such an approach, however, is typically designed t…

    Submitted 17 July, 2021; v1 submitted 20 May, 2021; originally announced May 2021.

    Comments: Accepted at IJCAI 2021

  12. arXiv:2104.02472  [pdf, other]

    cs.LG eess.IV eess.SP

    Depth Evaluation for Metal Surface Defects by Eddy Current Testing using Deep Residual Convolutional Neural Networks

    Authors: Tian Meng, Yang Tao, Ziqi Chen, Jorge R. Salas Avila, Qiaoye Ran, Yuchun Shao, Ruochen Huang, Yuedong Xie, Qian Zhao, Zhijie Zhang, Hujun Yin, Anthony J. Peyton, Wuliang Yin

    Abstract: Eddy current testing (ECT) is an effective technique in the evaluation of the depth of metal surface defects. However, in practice, the evaluation primarily relies on the experience of an operator and is often carried out by manual inspection. In this paper, we address the challenges of automatic depth evaluation of metal surface defects by virtue of state-of-the-art deep learning (DL) techniques…

    Submitted 8 March, 2021; originally announced April 2021.

  13. Advanced Join Patterns for the Actor Model based on CEP Techniques

    Authors: Humberto Rodriguez Avila, Joeri De Koster, Wolfgang De Meuter

    Abstract: Context: Actor-based programming languages offer many essential features for developing modern distributed reactive systems. These systems exploit the actor model's isolation property to fulfill their performance and scalability demands. Unfortunately, the reliance of the model on isolation as its most fundamental property requires programmers to express complex interaction patterns between their…

    Submitted 30 October, 2020; originally announced October 2020.

    Journal ref: The Art, Science, and Engineering of Programming, 2021, Vol. 5, Issue 2, Article 10

  14. arXiv:2007.15693  [pdf, other]

    cs.CV cs.LG eess.IV

    Deep learning for lithological classification of carbonate rock micro-CT images

    Authors: Carlos E. M. dos Anjos, Manuel R. V. Avila, Adna G. P. Vasconcelos, Aurea M. P. Neta, Lizianne C. Medeiros, Alexandre G. Evsukoff, Rodrigo Surmas

    Abstract: In addition to the ongoing development, pre-salt carbonate reservoir characterization remains a challenge, primarily due to inherent geological particularities. These challenges stimulate the use of well-established technologies, such as artificial intelligence algorithms, for image classification tasks. Therefore, this work intends to present an application of deep learning techniques to identify…

    Submitted 30 July, 2020; originally announced July 2020.

    Comments: 13 pages, 8 figures

  15. arXiv:2005.14181  [pdf, other]

    eess.AS cs.SD eess.SP stat.AP stat.ML

    Bayesian Restoration of Audio Degraded by Low-Frequency Pulses Modeled via Gaussian Process

    Authors: Hugo Tremonte de Carvalho, Flávio Rainho Ávila, Luiz Wagner Pereira Biscainho

    Abstract: A common defect found when reproducing old vinyl and gramophone recordings with mechanical devices is the long pulses with significant low-frequency content caused by the interaction of the arm-needle system with deep scratches or even breakages on the media surface. Previous approaches to their suppression on digital counterparts of the recordings depend on a prior estimation of the pulse locati…

    Submitted 26 September, 2020; v1 submitted 28 May, 2020; originally announced May 2020.

    Comments: 14 pages, 7 figures, 4 tables. Submitted to IEEE Journal of Selected Topics in Signal Processing - Special Issue "Reconstruction of audio from incomplete or highly degraded observations"

  16. arXiv:1903.06908  [pdf, other]

    eess.AS cs.SD

    Non-intrusive speech quality assessment using neural networks

    Authors: Anderson R. Avila, Hannes Gamper, Chandan Reddy, Ross Cutler, Ivan Tashev, Johannes Gehrke

    Abstract: Estimating the perceived quality of an audio signal is critical for many multimedia and audio processing systems. Providers strive to offer optimal and reliable services in order to increase the user quality of experience (QoE). In this work, we present an investigation of the applicability of neural networks for non-intrusive audio quality assessment. We propose three neural network-based approac…

    Submitted 16 March, 2019; originally announced March 2019.

    Comments: Accepted at ICASSP 2019