Showing 1–27 of 27 results for author: Araujo, G

  1. arXiv:2407.10730  [pdf, other]

    cs.CV cs.PF

    ConvBench: A Comprehensive Benchmark for 2D Convolution Primitive Evaluation

    Authors: Lucas Alvarenga, Victor Ferrari, Rafael Souza, Marcio Pereira, Guido Araujo

    Abstract: Convolution is a compute-intensive operation placed at the heart of Convolution Neural Networks (CNNs). It has led to the development of many high-performance algorithms, such as Im2col-GEMM, Winograd, and Direct-Convolution. However, the comparison of different convolution algorithms is an error-prone task as it requires specific data layouts and system resources. Failure to address these require…

    Submitted 15 July, 2024; originally announced July 2024.

    Comments: 5 pages, 3 figures, presented at the MLArchSys workshop of ISCA 2024

  2. arXiv:2406.17523  [pdf, other]

    cs.LG cs.AI

    On the consistency of hyper-parameter selection in value-based deep reinforcement learning

    Authors: Johan Obando-Ceron, João G. M. Araújo, Aaron Courville, Pablo Samuel Castro

    Abstract: Deep reinforcement learning (deep RL) has achieved tremendous success on various domains through a combination of algorithmic design and careful selection of hyper-parameters. Algorithmic improvements are often the result of iterative enhancements built upon prior approaches, while hyper-parameter choices are typically inherited from previous methods or fine-tuned specifically for the proposed tec…

    Submitted 2 July, 2024; v1 submitted 25 June, 2024; originally announced June 2024.

  3. arXiv:2406.17268  [pdf, other]

    cs.SE

    Search-based Trace Diagnostic

    Authors: Gabriel Araujo, Ricardo Caldas, Federico Formica, Genaína Rodrigues, Patrizio Pelliccione, Claudio Menghi

    Abstract: Cyber-physical systems (CPS) development requires verifying whether system behaviors violate their requirements. This analysis often considers system behaviors expressed by execution traces and requirements expressed by signal-based temporal properties. When an execution trace violates a requirement, engineers need to solve the trace diagnostic problem: They need to understand the cause of the bre…

    Submitted 25 June, 2024; originally announced June 2024.

    Comments: 14 pages plus two for references

  4. arXiv:2406.04267  [pdf, other]

    cs.CL cs.LG

    Transformers need glasses! Information over-squashing in language tasks

    Authors: Federico Barbero, Andrea Banino, Steven Kapturowski, Dharshan Kumaran, João G. M. Araújo, Alex Vitvitskyi, Razvan Pascanu, Petar Veličković

    Abstract: We study how information propagates in decoder-only Transformers, which are the architectural backbone of most existing frontier large language models (LLMs). We rely on a theoretical signal propagation analysis -- specifically, we analyse the representations of the last token in the final layer of the Transformer, as this is the representation used for next-token prediction. Our analysis reveals…

    Submitted 6 June, 2024; originally announced June 2024.

  5. arXiv:2403.13115  [pdf]

    cs.SE

    Professional Insights into Benefits and Limitations of Implementing MLOps Principles

    Authors: Gabriel Araujo, Marcos Kalinowski, Markus Endler, Fabio Calefato

    Abstract: Context: Machine Learning Operations (MLOps) has emerged as a set of practices that combines development, testing, and operations to deploy and maintain machine learning applications. Objective: In this paper, we assess the benefits and limitations of using the MLOps principles in online supervised learning. Method: We conducted two focus group sessions on the benefits and limitations of applying…

    Submitted 19 March, 2024; originally announced March 2024.

    Comments: Author version of paper accepted for publication at ICEIS 2024

  6. arXiv:2402.15332  [pdf, ps, other]

    cs.LG cs.AI math.CT math.RA stat.ML

    Position: Categorical Deep Learning is an Algebraic Theory of All Architectures

    Authors: Bruno Gavranović, Paul Lessard, Andrew Dudzik, Tamara von Glehn, João G. M. Araújo, Petar Veličković

    Abstract: We present our position on the elusive quest for a general-purpose framework for specifying and studying deep learning architectures. Our opinion is that the key attempts made so far lack a coherent bridge between specifying constraints which models must satisfy and specifying their implementations. Focusing on building such a bridge, we propose to apply category theory -- precisely, the univers…

    Submitted 5 June, 2024; v1 submitted 23 February, 2024; originally announced February 2024.

    Comments: To appear in ICML 2024. Comments welcome. More info at categoricaldeeplearning.com

  7. arXiv:2402.03046  [pdf, other]

    cs.LG

    Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning

    Authors: Shengyi Huang, Quentin Gallouédec, Florian Felten, Antonin Raffin, Rousslan Fernand Julien Dossa, Yanxiao Zhao, Ryan Sullivan, Viktor Makoviychuk, Denys Makoviichuk, Mohamad H. Danesh, Cyril Roumégous, Jiayi Weng, Chufan Chen, Md Masudur Rahman, João G. M. Araújo, Guorui Quan, Daniel Tan, Timo Klein, Rujikorn Charakorn, Mark Towers, Yann Berthelot, Kinal Mehta, Dipam Chakraborty, Arjun KG, Valentin Charraut, et al. (8 additional authors not shown)

    Abstract: In many Reinforcement Learning (RL) papers, learning curves are useful indicators to measure the effectiveness of RL algorithms. However, the complete raw data of the learning curves are rarely available. As a result, it is usually necessary to reproduce the experiments from scratch, which can be time-consuming and error-prone. We present Open RL Benchmark, a set of fully tracked RL experiments, i…

    Submitted 5 February, 2024; originally announced February 2024.

    Comments: Under review

  8. arXiv:2311.15497  [pdf, other]

    cs.CV cs.AI cs.LG

    Adaptive Image Registration: A Hybrid Approach Integrating Deep Learning and Optimization Functions for Enhanced Precision

    Authors: Gabriel De Araujo, Shanlin Sun, Xiaohui Xie

    Abstract: Image registration has traditionally been done using two distinct approaches: learning-based methods, relying on robust deep neural networks, and optimization-based methods, applying complex mathematical transformations to warp images accordingly. Of course, both paradigms offer advantages and disadvantages, and, in this work, we seek to combine their respective strengths into a single streamlined…

    Submitted 18 January, 2024; v1 submitted 26 November, 2023; originally announced November 2023.

  9. arXiv:2305.18236  [pdf, ps, other]

    cs.DC cs.PF

    Fast Matrix Multiplication via Compiler-only Layered Data Reorganization and Intrinsic Lowering

    Authors: Braedy Kuzma, Ivan Korostelev, João P. L. de Carvalho, José E. Moreira, Christopher Barton, Guido Araujo, José Nelson Amaral

    Abstract: The resurgence of machine learning has increased the demand for high-performance basic linear algebra subroutines (BLAS), which have long depended on libraries to achieve peak performance on commodity hardware. High-performance BLAS implementations rely on a layered approach that consists of tiling and packing layers, for data (re)organization, and micro kernels that perform the actual computation…

    Submitted 15 May, 2023; originally announced May 2023.

    ACM Class: C.4

  10. Tensor Slicing and Optimization for Multicore NPUs

    Authors: Rafael Sousa, Marcio Pereira, Yongin Kwon, Taeho Kim, Namsoon Jung, Chang Soo Kim, Michael Frank, Guido Araujo

    Abstract: Although code generation for Convolution Neural Network (CNN) models has been extensively studied, performing efficient data slicing and parallelization for highly-constrained Multicore Neural Processor Units (NPUs) is still a challenging problem. Given the size of convolutions' input/output tensors and the small footprint of NPU on-chip memories, minimizing memory transactions while maximizing…

    Submitted 6 April, 2023; originally announced April 2023.

    Journal ref: Journal of Parallel and Distributed Computing, Volume 175, May 2023, Pages 66-79

  11. arXiv:2303.04739  [pdf, other]

    cs.CV cs.AR cs.LG cs.PF

    Advancing Direct Convolution using Convolution Slicing Optimization and ISA Extensions

    Authors: Victor Ferrari, Rafael Sousa, Marcio Pereira, João P. L. de Carvalho, José Nelson Amaral, José Moreira, Guido Araujo

    Abstract: Convolution is one of the most computationally intensive operations that must be performed for machine-learning model inference. A traditional approach to compute convolutions is known as the Im2Col + BLAS method. This paper proposes SConv: a direct-convolution algorithm based on an MLIR/LLVM code-generation toolchain that can be integrated into machine-learning compilers. This algorithm introduce…

    Submitted 8 March, 2023; originally announced March 2023.

    Comments: 15 pages, 11 figures

  12. arXiv:2301.10835  [pdf, other]

    cs.LG

    When Layers Play the Lottery, all Tickets Win at Initialization

    Authors: Artur Jordao, George Correa de Araujo, Helena de Almeida Maia, Helio Pedrini

    Abstract: Pruning is a standard technique for reducing the computational cost of deep networks. Many advances in pruning leverage concepts from the Lottery Ticket Hypothesis (LTH). LTH reveals that inside a trained dense network there exist sparse subnetworks (tickets) able to achieve similar accuracy (i.e., win the lottery - winning tickets). Pruning at initialization focuses on finding winning tickets without…

    Submitted 19 March, 2024; v1 submitted 25 January, 2023; originally announced January 2023.

    Comments: Published at International Conference on Computer Vision Workshop (ICCV), 2023

  13. arXiv:2210.03743  [pdf, other]

    eess.IV cs.AI cs.CV cs.LG

    Single Image Super-Resolution Based on Capsule Neural Networks

    Authors: George Corrêa de Araújo, Helio Pedrini

    Abstract: Single image super-resolution (SISR) is the process of obtaining one high-resolution version of a low-resolution image by increasing the number of pixels per unit area. This method has been actively investigated by the research community, due to the wide variety of real-world problems where it can be applied, from aerial and satellite imaging to compressed image and video enhancement. Despite the…

    Submitted 6 October, 2022; originally announced October 2022.

    Comments: 19 pages, 13 figures

    ACM Class: I.2.10; I.4.3; I.5.1

  14. The OpenMP Cluster Programming Model

    Authors: Hervé Yviquel, Marcio Pereira, Emílio Francesquini, Guilherme Valarini, Gustavo Leite, Pedro Rosso, Rodrigo Ceccato, Carla Cusihualpa, Vitoria Dias, Sandro Rigo, Alan Souza, Guido Araujo

    Abstract: Despite the various research initiatives and proposed programming models, efficient solutions for parallel programming in HPC clusters still rely on a complex combination of different programming models (e.g., OpenMP and MPI), languages (e.g., C++ and CUDA), and specialized runtimes (e.g., Charm++ and Legion). On the other hand, task parallelism has been shown to be an efficient and seamless programmin…

    Submitted 13 August, 2022; v1 submitted 12 July, 2022; originally announced July 2022.

    Comments: 12 pages, 7 figures, 1 listing, to be published in the 51st International Conference on Parallel Processing Workshop Proceedings (ICPP Workshops 22)

    ACM Class: D.4.1; D.3.2

  15. arXiv:2207.02700  [pdf, other]

    cs.IT eess.SP

    Channel Estimation in RIS-Assisted MIMO Systems Operating Under Imperfections

    Authors: Paulo R. B. Gomes, Gilderlan T. de Araújo, Bruno Sokal, André L. F. de Almeida, Behrooz Makki, Gábor Fodor

    Abstract: Reconfigurable intelligent surface (RIS) is a potential technology component of future wireless networks due to its capability of shaping the wireless environment. The promising MIMO systems in terms of extended coverage and enhanced capacity are, however, critically dependent on the accuracy of the channel state information. Moreover, traditional channel estimation schemes are not applicable in RIS-assi…

    Submitted 6 July, 2022; originally announced July 2022.

    Comments: arXiv admin note: text overlap with arXiv:2206.03557

  16. arXiv:2205.10290  [pdf, other]

    cs.IT eess.SP

    Semi-Blind Joint Channel and Symbol Estimation for IRS-Assisted MIMO Systems

    Authors: Gilderlan Tavares de Araújo, André Lima Férrer de Almeida, Rémy Boyer, Gábor Fodor

    Abstract: Intelligent reflecting surface (IRS) is a promising technology for the 6th generation of wireless systems, realizing the smart radio environment concept. In this paper, we present a novel tensor-based receiver for IRS-assisted multiple-input multiple-output communications capable of jointly estimating the channels and the transmitted data streams in a semi-blind fashion. Assuming a fully passive I…

    Submitted 20 May, 2022; originally announced May 2022.

  17. arXiv:2204.06514  [pdf, other]

    cs.LG cs.CL

    Scalable Training of Language Models using JAX pjit and TPUv4

    Authors: Joanna Yoo, Kuba Perlin, Siddhartha Rao Kamalakara, João G. M. Araújo

    Abstract: Modern large language models require distributed training strategies due to their size. The challenges of efficiently and robustly training them are met with rapid developments on both software and hardware frontiers. In this technical report, we explore challenges and design decisions associated with developing a scalable training framework, and present a quantitative analysis of efficiency impro…

    Submitted 13 April, 2022; originally announced April 2022.

    Comments: 5 pages, 4 figures

  18. arXiv:2202.11087  [pdf, other]

    cs.IT eess.SP

    Semi-Blind Joint Channel and Symbol Estimation in IRS-Assisted Multi-User MIMO Networks

    Authors: Gilderlan Tavares de Araújo, Paulo Ricardo Brboza Gomes, André Lima Férrer de Almeida, Gabor Fodor, Behrooz Makki

    Abstract: Intelligent reflecting surface (IRS) is a promising technology for beyond-5th-generation wireless communications. In fully passive IRS-assisted systems, channel estimation is challenging and should be carried out only at the base station or at the terminals since the elements of the IRS are incapable of processing signals. In this letter, we formulate a tensor-based semi-blind receiver that…

    Submitted 4 May, 2022; v1 submitted 22 February, 2022; originally announced February 2022.

  19. arXiv:2202.04153  [pdf, other]

    cs.PL

    Source Matching and Rewriting

    Authors: Vinicius Couto, Luciano Zago, Hervé Yviquel, Guido Araújo

    Abstract: A typical compiler flow relies on a uni-directional sequence of translation/optimization steps that lower the program abstract representation, making it hard to preserve higher-level program information across each transformation step. On the other hand, modern ISA extensions and hardware accelerators can benefit from the compiler's ability to detect and raise program idioms to acceleration instru…

    Submitted 4 February, 2022; originally announced February 2022.

    Comments: 10 pages, 7 figures

  20. arXiv:2110.12609  [pdf, other]

    cs.CL cs.LG

    No News is Good News: A Critique of the One Billion Word Benchmark

    Authors: Helen Ngo, João G. M. Araújo, Jeffrey Hui, Nicholas Frosst

    Abstract: The One Billion Word Benchmark is a dataset derived from the WMT 2011 News Crawl, commonly used to measure language modeling ability in natural language processing. We train models solely on Common Crawl web scrapes partitioned by year, and demonstrate that they perform worse on this task over time due to distributional shift. Analysis of this corpus reveals that it contains several examples of ha…

    Submitted 24 October, 2021; originally announced October 2021.

  21. arXiv:2108.07790  [pdf, other]

    cs.CL cs.LG

    Mitigating harm in language models with conditional-likelihood filtration

    Authors: Helen Ngo, Cooper Raterink, João G. M. Araújo, Ivan Zhang, Carol Chen, Adrien Morisot, Nicholas Frosst

    Abstract: Language models trained on large-scale unfiltered datasets curated from the open web acquire systemic biases, prejudices, and harmful views from their training data. We present a methodology for programmatically identifying and removing harmful text from web-scale datasets. A pretrained language model is used to calculate the log-likelihood of researcher-written trigger phrases conditioned on a sp…

    Submitted 27 November, 2021; v1 submitted 4 August, 2021; originally announced August 2021.

  22. NDN4IVC: A Framework for Simulating and Testing of Applications in Vehicular Named Data Networking

    Authors: Guilherme B. Araujo, Maycon L. M. Peixoto, Leobino N. Sampaio

    Abstract: This paper presents a customized framework (NDN4IVC) for simulating and testing intelligent transportation systems and applications in vehicular named-data networking (V-NDN). The project uses two simulators that are popular in the literature for VANET simulation: ns-3, a discrete-event network simulator, with the ndnSIM module installed, and SUMO, a simulator for urban mobility. NDN4IVC allows bidi…

    Submitted 9 July, 2021; v1 submitted 1 July, 2021; originally announced July 2021.

    Report number: Computer Networks 1389-1286

    Journal ref: 2023

  23. arXiv:2103.10573  [pdf, other]

    cs.DC

    Enabling OpenMP Task Parallelism on Multi-FPGAs

    Authors: R. Nepomuceno, R. Sterle, G. Valarini, M. Pereira, H. Yviquel, G. Araujo

    Abstract: FPGA-based hardware accelerators have received increasing attention mainly due to their ability to accelerate deep pipelined applications, thus resulting in higher computational performance and energy efficiency. Nevertheless, the amount of resources available on even the most powerful FPGA is still not enough to speed up very large modern workloads. To achieve that, FPGAs need to be interconnecte…

    Submitted 21 March, 2021; v1 submitted 18 March, 2021; originally announced March 2021.

  24. Multicloud API binding generation from documentation

    Authors: Michał J. Gajda, Vitor Vitali Barrozzi, Gabriel Araujo

    Abstract: We present industry experience from implementing a retargetable cloud API binding generator. The analysis is implemented in Haskell, using type classes, types à la carte, and a code generation monad. It also targets Haskell, and allows us to bind cloud APIs on short notice and at unprecedented scale.

    Submitted 5 November, 2020; originally announced November 2020.

    Comments: Presented at the XP 2020: Agility in Microservices workshop

  25. arXiv:2007.14863  [pdf, other]

    cs.CV eess.IV

    Automatic Detection of Aedes aegypti Breeding Grounds Based on Deep Networks with Spatio-Temporal Consistency

    Authors: Wesley L. Passos, Gabriel M. Araujo, Amaro A. de Lima, Sergio L. Netto, Eduardo A. B. da Silva

    Abstract: Every year, the Aedes aegypti mosquito infects millions of people with diseases such as dengue, zika, chikungunya, and urban yellow fever. The main way to combat these diseases is to prevent mosquito reproduction by searching for and eliminating potential mosquito breeding grounds. In this work, we introduce a comprehensive dataset of aerial videos, acquired with an unmanned aerial vehicle, con…

    Submitted 27 November, 2021; v1 submitted 29 July, 2020; originally announced July 2020.

  26. Adaptive mixed norm optical flow estimation

    Authors: Vania V. Estrela, Matthias O. Franz, Ricardo T. Lopes, G. P. De Araujo

    Abstract: The pel-recursive computation of 2-D optical flow has been extensively studied in computer vision to estimate motion from image sequences, but it still raises a wealth of issues, such as the treatment of outliers, motion discontinuities and occlusion. It relies on spatio-temporal brightness variations due to motion. Our proposed adaptive regularized approach deals with these issues within a common…

    Submitted 3 November, 2016; originally announced November 2016.

    Comments: 8 pages, 4 figures. arXiv admin note: text overlap with arXiv:1403.7365

    Journal ref: Proc. SPIE 5960, Visual Communications and Image Processing 2005, 59603W, July 31, 2006, Beijing, China

  27. arXiv:1505.05135  [pdf]

    cs.PF

    Network Simulator - Visão Geral da Ferramenta de Simulação de Redes

    Authors: Marcos Portnoi, Rafael Gonçalves Bezerra de Araújo

    Abstract: This paper describes NS (Network Simulator), a computer network simulation tool. We offer an overview of NS and analyze its characteristics and functions. Finally, we present in detail all the steps for preparing a simulation of a simple model in NS.

    Submitted 27 April, 2015; originally announced May 2015.

    Comments: in Portuguese, Seminário Estudantil de Produção Acadêmica, 2002