Showing 1–10 of 10 results for author: Ryder, N

  1. arXiv:2303.08774  [pdf, other]

    cs.CL cs.AI

    GPT-4 Technical Report

    Authors: OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, Jake Berdine, Gabriel Bernadett-Shapiro, Christopher Berner, Lenny Bogdonoff, Oleg Boiko, et al. (256 additional authors not shown)

    Abstract: We report the development of GPT-4, a large-scale, multimodal model which can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based mo…

    Submitted 4 March, 2024; v1 submitted 15 March, 2023; originally announced March 2023.

    Comments: 100 pages; updated authors list; fixed author names and added citation

  2. arXiv:2203.03466  [pdf, other]

    cs.LG cond-mat.dis-nn cs.NE

    Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer

    Authors: Greg Yang, Edward J. Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, Jakub Pachocki, Weizhu Chen, Jianfeng Gao

    Abstract: Hyperparameter (HP) tuning in deep learning is an expensive process, prohibitively so for neural networks (NNs) with billions of parameters. We show that, in the recently discovered Maximal Update Parametrization (muP), many optimal HPs remain stable even as model size changes. This leads to a new HP tuning paradigm we call muTransfer: parametrize the target model in muP, tune the HP indirectly on…

    Submitted 28 March, 2022; v1 submitted 7 March, 2022; originally announced March 2022.

    Comments: NeurIPS 2021

  3. arXiv:2109.13956  [pdf, ps, other]

    cs.DS cs.SC math.NA math.OC

    Bit Complexity of Jordan Normal Form and Spectral Factorization

    Authors: Papri Dey, Ravi Kannan, Nick Ryder, Nikhil Srivastava

    Abstract: We study the bit complexity of two related fundamental computational problems in linear algebra and control theory. Our results are: (1) An $\tilde{O}(n^{ω+3}a+n^4a^2+n^ω\log(1/ε))$ time algorithm for finding an $ε$-approximation to the Jordan Normal form of an integer matrix with $a$-bit entries, where $ω$ is the exponent of matrix multiplication. (2) An…

    Submitted 25 November, 2022; v1 submitted 28 September, 2021; originally announced September 2021.

    Comments: 19pp

    Journal ref: ITCS 2023

  4. arXiv:2107.03374  [pdf, other]

    cs.LG

    Evaluating Large Language Models Trained on Code

    Authors: Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, et al. (33 additional authors not shown)

    Abstract: We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J sol…

    Submitted 14 July, 2021; v1 submitted 7 July, 2021; originally announced July 2021.

    Comments: corrected typos, added references, added authors, added acknowledgements

  5. arXiv:2010.14701  [pdf, other]

    cs.LG cs.CL cs.CV

    Scaling Laws for Autoregressive Generative Modeling

    Authors: Tom Henighan, Jared Kaplan, Mor Katz, Mark Chen, Christopher Hesse, Jacob Jackson, Heewoo Jun, Tom B. Brown, Prafulla Dhariwal, Scott Gray, Chris Hallacy, Benjamin Mann, Alec Radford, Aditya Ramesh, Nick Ryder, Daniel M. Ziegler, John Schulman, Dario Amodei, Sam McCandlish

    Abstract: We identify empirical scaling laws for the cross-entropy loss in four domains: generative image modeling, video modeling, multimodal image$\leftrightarrow$text models, and mathematical problem solving. In all cases autoregressive Transformers smoothly improve in performance as model size and compute budgets increase, following a power-law plus constant scaling law. The optimal model size also depe…

    Submitted 5 November, 2020; v1 submitted 27 October, 2020; originally announced October 2020.

    Comments: 20+17 pages, 33 figures; added appendix with additional language results

  6. arXiv:2005.14165  [pdf, other]

    cs.CL

    Language Models are Few-Shot Learners

    Authors: Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, et al. (6 additional authors not shown)

    Abstract: Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few…

    Submitted 22 July, 2020; v1 submitted 28 May, 2020; originally announced May 2020.

    Comments: 40+32 pages

  7. arXiv:1906.09489  [pdf, other]

    cs.LG stat.ML

    Asymmetric Random Projections

    Authors: Nick Ryder, Zohar Karnin, Edo Liberty

    Abstract: Random projections (RP) are a popular tool for reducing dimensionality while preserving local geometry. In many applications the data set to be projected is given to us in advance, yet the current RP techniques do not make use of information about the data. In this paper, we provide a computationally light way to extract statistics from the data that allows designing a data dependent RP with super…

    Submitted 22 June, 2019; originally announced June 2019.

    Comments: 14 pages, 5 figures

  8. arXiv:1801.00843  [pdf, ps, other]

    cs.CC

    The geometry of rank decompositions of matrix multiplication II: $3\times 3$ matrices

    Authors: Grey Ballard, Christian Ikenmeyer, J. M. Landsberg, Nick Ryder

    Abstract: This is the second in a series of papers on rank decompositions of the matrix multiplication tensor. We present new rank $23$ decompositions for the $3\times 3$ matrix multiplication tensor $M_{\langle 3\rangle}$. All our decompositions have symmetry groups that include the standard cyclic permutation of factors but otherwise exhibit a range of behavior. One of them has 11 cubes as summands and ad…

    Submitted 2 January, 2018; originally announced January 2018.

    MSC Class: 68Q17; 14L30; 15A69

  9. arXiv:1711.11497  [pdf, other]

    math.OC cs.CC

    Exponential lower bounds on spectrahedral representations of hyperbolicity cones

    Authors: Prasad Raghavendra, Nick Ryder, Nikhil Srivastava, Benjamin Weitz

    Abstract: The Generalized Lax Conjecture asks whether every hyperbolicity cone is a section of a semidefinite cone of sufficiently high dimension. We prove that the space of hyperbolicity cones of hyperbolic polynomials of degree $d$ in $n$ variables contains $(n/d)^{Ω(d)}$ pairwise distant cones in a certain metric, and therefore that any semidefinite representation of such cones must have dimension at lea…

    Submitted 12 January, 2018; v1 submitted 30 November, 2017; originally announced November 2017.

    Comments: Fixed a mistake in the proof of Lemma 6. The statement is unchanged except for constant factors, and the main theorem is unaffected. Wrote a slightly stronger statement for the main theorem, emphasizing approximate representations (the proof is the same). Added one figure

  10. arXiv:1610.00209  [pdf, ps, other]

    cs.DS

    Real Stability Testing

    Authors: Prasad Raghavendra, Nick Ryder, Nikhil Srivastava

    Abstract: We give a strongly polynomial time algorithm which determines whether or not a bivariate polynomial is real stable. As a corollary, this implies an algorithm for testing whether a given linear transformation on univariate polynomials preserves real-rootedness. The proof exploits properties of hyperbolic polynomials to reduce real stability testing to testing nonnegativity of a finite number of pol…

    Submitted 1 October, 2016; originally announced October 2016.