Showing 1–20 of 20 results for author: Miranda, B

  1. arXiv:2406.06555  [pdf, other]

    cs.LG cs.AI cs.CL cs.PL

    An Evaluation Benchmark for Autoformalization in Lean4

    Authors: Aryan Gulati, Devanshu Ladsaria, Shubhra Mishra, Jasdeep Sidhu, Brando Miranda

    Abstract: Large Language Models (LLMs) hold the potential to revolutionize autoformalization. The introduction of Lean4, a mathematical programming language, presents an unprecedented opportunity to rigorously assess the autoformalization capabilities of LLMs. This paper introduces a novel evaluation benchmark designed for Lean4, applying it to test the abilities of state-of-the-art LLMs, including GPT-3.5,…

    Submitted 1 June, 2024; originally announced June 2024.

    Comments: To appear at ICLR 2024 as part of the Tiny Papers track

  2. arXiv:2406.04391  [pdf, other]

    cs.LG cs.AI cs.CL

    Why Has Predicting Downstream Capabilities of Frontier AI Models with Scale Remained Elusive?

    Authors: Rylan Schaeffer, Hailey Schoelkopf, Brando Miranda, Gabriel Mukobi, Varun Madan, Adam Ibrahim, Herbie Bradley, Stella Biderman, Sanmi Koyejo

    Abstract: Predictable behavior from scaling advanced AI systems is an extremely desirable property. Although a well-established literature exists on how pretraining performance scales, the literature on how particular downstream capabilities scale is significantly muddier. In this work, we take a step back and ask: why has predicting specific downstream capabilities with scale remained elusive? While many f…

    Submitted 6 June, 2024; originally announced June 2024.

  3. arXiv:2308.09013  [pdf, other]

    cs.LG eess.SP

    Deep-seeded Clustering for Unsupervised Valence-Arousal Emotion Recognition from Physiological Signals

    Authors: Antoine Dubois, Carlos Lima Azevedo, Sonja Haustein, Bruno Miranda

    Abstract: Emotions play a significant role in the cognitive processes of the human brain, such as decision making, learning and perception. The use of physiological signals has been shown to lead to more objective, reliable and accurate emotion recognition combined with raising machine learning methods. Supervised learning methods have dominated the attention of the research community, but the challenge in colle…

    Submitted 17 August, 2023; originally announced August 2023.

    Comments: 7 pages, 1 figure, 2 tables

  4. arXiv:2306.13841  [pdf, other]

    cs.LG cs.AI cs.CL cs.CV

    Is Pre-training Truly Better Than Meta-Learning?

    Authors: Brando Miranda, Patrick Yu, Saumya Goyal, Yu-Xiong Wang, Sanmi Koyejo

    Abstract: In the context of few-shot learning, it is currently believed that a fixed pre-trained (PT) model, along with fine-tuning the final layer during evaluation, outperforms standard meta-learning algorithms. We re-evaluate these claims under an in-depth empirical examination of an extensive set of formally diverse datasets and compare PT to Model Agnostic Meta-Learning (MAML). Unlike previous work, we…

    Submitted 23 June, 2023; originally announced June 2023.

    Journal ref: Proceedings of the 40th International Conference on Machine Learning 2023 DMLR Workshop

  5. arXiv:2306.13840  [pdf, other]

    cs.CL cs.AI cs.LG cs.NE

    Beyond Scale: the Diversity Coefficient as a Data Quality Metric Demonstrates LLMs are Pre-trained on Formally Diverse Data

    Authors: Alycia Lee, Brando Miranda, Sudharsan Sundar, Sanmi Koyejo

    Abstract: Current trends to pre-train capable Large Language Models (LLMs) mostly focus on scaling of model and dataset size. However, the quality of pre-training data is an important factor for training powerful LLMs, yet it is a nebulous concept that has not been fully characterized. Therefore, we use the recently proposed Task2Vec diversity coefficient to ground and understand formal aspects of data qual…

    Submitted 26 September, 2023; v1 submitted 23 June, 2023; originally announced June 2023.

    Journal ref: Proceedings of the 40th International Conference on Machine Learning DMLR 2023

  6. arXiv:2304.15004  [pdf, other]

    cs.AI cs.LG

    Are Emergent Abilities of Large Language Models a Mirage?

    Authors: Rylan Schaeffer, Brando Miranda, Sanmi Koyejo

    Abstract: Recent work claims that large language models display emergent abilities, abilities not present in smaller-scale models that are present in larger-scale models. What makes emergent abilities intriguing is two-fold: their sharpness, transitioning seemingly instantaneously from not present to present, and their unpredictability, appearing at seemingly unforeseeable model scales. Here, we present an…

    Submitted 22 May, 2023; v1 submitted 28 April, 2023; originally announced April 2023.

  7. arXiv:2304.10500  [pdf, other]

    cs.PL cs.AI cs.LG cs.LO cs.SC

    Transformer Models for Type Inference in the Simply Typed Lambda Calculus: A Case Study in Deep Learning for Code

    Authors: Brando Miranda, Avi Shinnar, Vasily Pestun, Barry Trager

    Abstract: Despite a growing body of work at the intersection of deep learning and formal languages, there has been relatively little systematic exploration of transformer models for reasoning about typed lambda calculi. This is an interesting area of inquiry for two reasons. First, typed lambda calculi are the lingua franca of programming languages. A set of heuristics that relate various typed lambda calcul…

    Submitted 15 March, 2023; originally announced April 2023.

    Comments: 22 pages

  8. arXiv:2208.01545  [pdf, other]

    cs.LG

    The Curse of Low Task Diversity: On the Failure of Transfer Learning to Outperform MAML and Their Empirical Equivalence

    Authors: Brando Miranda, Patrick Yu, Yu-Xiong Wang, Sanmi Koyejo

    Abstract: Recently, it has been observed that a transfer learning solution might be all we need to solve many few-shot learning benchmarks -- thus raising important questions about when and how meta-learning algorithms should be deployed. In this paper, we seek to clarify these questions by 1. proposing a novel metric -- the diversity coefficient -- to measure the diversity of tasks in a few-shot learning b…

    Submitted 2 August, 2022; originally announced August 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2112.13121

  9. arXiv:2112.13137  [pdf, other]

    cs.LG cs.AI cs.CV cs.NE

    Does MAML Only Work via Feature Re-use? A Data Centric Perspective

    Authors: Brando Miranda, Yu-Xiong Wang, Sanmi Koyejo

    Abstract: Recent work has suggested that a good embedding is all we need to solve many few-shot learning benchmarks. Furthermore, other work has strongly suggested that Model Agnostic Meta-Learning (MAML) also works via this same method - by learning a good embedding. These observations highlight our lack of understanding of what meta-learning algorithms are doing and when they work. In this work, we provid…

    Submitted 24 December, 2021; originally announced December 2021.

    Comments: 15 pages, 12 figures

  10. arXiv:2112.13121   

    cs.LG cs.AI cs.CV cs.NE

    The Curse of Zero Task Diversity: On the Failure of Transfer Learning to Outperform MAML and their Empirical Equivalence

    Authors: Brando Miranda, Yu-Xiong Wang, Sanmi Koyejo

    Abstract: Recently, it has been observed that a transfer learning solution might be all we need to solve many few-shot learning benchmarks -- thus raising important questions about when and how meta-learning algorithms should be deployed. In this paper, we seek to clarify these questions by proposing a novel metric -- the diversity coefficient -- to measure the diversity of tasks in a few-shot learning benc…

    Submitted 28 November, 2022; v1 submitted 24 December, 2021; originally announced December 2021.

    Comments: An updated and corrected version is at arXiv:2208.01545, and its accompanying NeurIPS submission is at https://brando90.github.io/brandomiranda/publications.html

  11. arXiv:2012.03759  [pdf, other]

    cs.SE

    Exposing Bugs in JavaScript Engines through Test Transplantation and Differential Testing

    Authors: Igor Lima, Jefferson Silva, Breno Miranda, Gustavo Pinto, Marcelo d'Amorim

    Abstract: Context. JavaScript is a popular programming language today with several implementations competing for market dominance. Although a specification document and a conformance test suite exist to guide engine development, bugs occur and have important practical consequences. Implementing correct engines is challenging because the spec is intentionally incomplete and evolves frequently. Objective. Thi…

    Submitted 7 December, 2020; originally announced December 2020.

    Comments: 32 pages, 2 figures

    Journal ref: Software Quality Journal 2021

  12. arXiv:1903.04991  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    Theory III: Dynamics and Generalization in Deep Networks

    Authors: Andrzej Banburski, Qianli Liao, Brando Miranda, Lorenzo Rosasco, Fernanda De La Torre, Jack Hidary, Tomaso Poggio

    Abstract: The key to generalization is controlling the complexity of the network. However, there is no obvious control of complexity -- such as an explicit regularization term -- in the training of deep networks for classification. We will show that a classical form of norm control -- but kind of hidden -- is present in deep networks trained with gradient descent techniques on exponential-type losses. In pa…

    Submitted 10 April, 2020; v1 submitted 12 March, 2019; originally announced March 2019.

    Comments: 47 pages, 11 figures. This replaces previous versions of Theory III, which appeared on arXiv [arXiv:1806.11379, arXiv:1801.00173] or on the CBMM site. v5: Changes throughout the paper to the presentation and tightening some of the statements

  13. arXiv:1807.09659  [pdf, other]

    cs.LG cs.AI cs.CV stat.ML

    A Surprising Linear Relationship Predicts Test Performance in Deep Networks

    Authors: Qianli Liao, Brando Miranda, Andrzej Banburski, Jack Hidary, Tomaso Poggio

    Abstract: Given two networks with the same training loss on a dataset, when would they have drastically different test losses and errors? Better understanding of this question of generalization may improve practical applications of deep networks. In this paper we show that with cross-entropy loss it is surprisingly simple to induce significantly different generalization performances for two networks that ha…

    Submitted 25 July, 2018; originally announced July 2018.

  14. arXiv:1806.11379  [pdf, other]

    cs.LG cs.AI cs.NE stat.ML

    Theory IIIb: Generalization in Deep Networks

    Authors: Tomaso Poggio, Qianli Liao, Brando Miranda, Andrzej Banburski, Xavier Boix, Jack Hidary

    Abstract: A main puzzle of deep neural networks (DNNs) revolves around the apparent absence of "overfitting", defined in this paper as follows: the expected error does not get worse when increasing the number of neurons or of iterations of gradient descent. This is surprising because of the large capacity demonstrated by DNNs to fit randomly labeled data and the absence of explicit regularization. Recent re…

    Submitted 29 June, 2018; originally announced June 2018.

    Comments: 38 pages, 7 figures

  15. arXiv:1801.02254  [pdf, other]

    cs.LG

    Theory of Deep Learning IIb: Optimization Properties of SGD

    Authors: Chiyuan Zhang, Qianli Liao, Alexander Rakhlin, Brando Miranda, Noah Golowich, Tomaso Poggio

    Abstract: In Theory IIb we characterize with a mix of theory and experiments the optimization of deep convolutional networks by Stochastic Gradient Descent. The main new result in this paper is theoretical and experimental evidence for the following conjecture about SGD: SGD concentrates in probability -- like the classical Langevin equation -- on large volume, "flat" minima, selecting flat minimizers which…

    Submitted 7 January, 2018; originally announced January 2018.

  16. arXiv:1801.00173  [pdf, other]

    cs.LG

    Theory of Deep Learning III: explaining the non-overfitting puzzle

    Authors: Tomaso Poggio, Kenji Kawaguchi, Qianli Liao, Brando Miranda, Lorenzo Rosasco, Xavier Boix, Jack Hidary, Hrushikesh Mhaskar

    Abstract: A main puzzle of deep networks revolves around the absence of overfitting despite large overparametrization and despite the large capacity demonstrated by zero training error on randomly labeled data. In this note, we show that the dynamics associated to gradient descent minimization of nonlinear networks is topologically equivalent, near the asymptotically stable minima of the empirical error, to…

    Submitted 16 January, 2018; v1 submitted 30 December, 2017; originally announced January 2018.

  17. arXiv:1711.05104  [pdf, other]

    cs.CV physics.data-an

    An optimized shape descriptor based on structural properties of networks

    Authors: Gisele H. B. Miranda, Jeaneth Machicao, Odemir M. Bruno

    Abstract: The structural analysis of shape boundaries leads to the characterization of objects as well as to the understanding of shape properties. The literature on graphs and networks has contributed to the structural characterization of shapes with different theoretical approaches. We performed a study on the relationship between the shape architecture and the network topology constructed over the shape…

    Submitted 14 November, 2017; originally announced November 2017.

    Comments: 19 pages, 13 figures

  18. arXiv:1611.00740  [pdf, other]

    cs.LG

    Why and When Can Deep -- but Not Shallow -- Networks Avoid the Curse of Dimensionality: a Review

    Authors: Tomaso Poggio, Hrushikesh Mhaskar, Lorenzo Rosasco, Brando Miranda, Qianli Liao

    Abstract: The paper characterizes classes of functions for which deep learning can be exponentially better than shallow learning. Deep convolutional networks are a special case of these conditions, though weight sharing is not the main reason for their exponential advantage.

    Submitted 4 February, 2017; v1 submitted 2 November, 2016; originally announced November 2016.

  19. Authorship Attribution Based on Life-Like Network Automata

    Authors: Jeaneth Machicao, Edilson A. Corrêa Jr., Gisele H. B. Miranda, Diego R. Amancio, Odemir M. Bruno

    Abstract: Authorship attribution is a problem of considerable practical and technical interest. Several methods have been designed to infer the authorship of disputed documents in multiple contexts. While traditional statistical methods based solely on word counts and related measurements have provided a simple yet effective solution in particular cases, they are prone to manipulation. Recently, texts…

    Submitted 20 October, 2016; originally announced October 2016.

    Journal ref: PLoS ONE 13(3): e0193703, 2018

  20. arXiv:1008.5387  [pdf, other]

    cs.AI astro-ph.CO q-bio.QM

    Pattern Recognition in Collective Cognitive Systems: Hybrid Human-Machine Learning (HHML) By Heterogeneous Ensembles

    Authors: Hesam T. Dashti, Adel Ardalan, Alireza F. Siahpirani, Jernej Tonejc, Ioan V. Uilecan, Tiago Simas, Bruno Miranda, Rita Ribeiro, Liya Wang, Amir H. Assadi

    Abstract: The ubiquitous role of the cyber-infrastructures, such as the WWW, provides myriad opportunities for machine learning and its broad spectrum of application domains taking advantage of digital communication. Pattern classification and feature extraction are among the first applications of machine learning that have received extensive attention. The most remarkable achievements have addressed data s…

    Submitted 31 August, 2010; originally announced August 2010.

    Comments: International Conference on Artificial Intelligence, WorldComp 2010

    ACM Class: I.2.6; J.2; J.3

    Journal ref: IC-AI CSREA Press (2010), p. 183-188