Showing 1–6 of 6 results for author: Marshall, W

  1. arXiv:2308.16149  [pdf, other]

    cs.CL cs.AI cs.LG

    Jais and Jais-chat: Arabic-Centric Foundation and Instruction-Tuned Open Generative Large Language Models

    Authors: Neha Sengupta, Sunil Kumar Sahu, Bokang Jia, Satheesh Katipomu, Haonan Li, Fajri Koto, William Marshall, Gurpreet Gosal, Cynthia Liu, Zhiming Chen, Osama Mohammed Afzal, Samta Kamboj, Onkar Pandit, Rahul Pal, Lalit Pradhan, Zain Muhammad Mujahid, Massa Baali, Xudong Han, Sondos Mahmoud Bsharat, Alham Fikri Aji, Zhiqiang Shen, Zhengzhong Liu, Natalia Vassilieva, Joel Hestness, Andy Hock, et al. (7 additional authors not shown)

    Abstract: We introduce Jais and Jais-chat, new state-of-the-art Arabic-centric foundation and instruction-tuned open generative large language models (LLMs). The models are based on the GPT-3 decoder-only architecture and are pretrained on a mixture of Arabic and English texts, including source code in various programming languages. With 13 billion parameters, they demonstrate better knowledge and reasoning…

    Submitted 29 September, 2023; v1 submitted 30 August, 2023; originally announced August 2023.

    Comments: Arabic-centric, foundation model, large-language model, LLM, generative model, instruction-tuned, Jais, Jais-chat

    MSC Class: 68T50

    ACM Class: F.2.2; I.2.7

  2. arXiv:2304.03208  [pdf, other]

    cs.LG cs.CL

    Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster

    Authors: Nolan Dey, Gurpreet Gosal, Zhiming Chen, Hemant Khachane, William Marshall, Ribhu Pathria, Marvin Tom, Joel Hestness

    Abstract: We study recent research advances that improve large language models through efficient pre-training and scaling, and open datasets and tools. We combine these advances to introduce Cerebras-GPT, a family of open compute-optimal language models scaled from 111M to 13B parameters. We train Cerebras-GPT models on the Eleuther Pile dataset following DeepMind Chinchilla scaling rules for efficient pre-…

    Submitted 6 April, 2023; originally announced April 2023.

    Comments: 13 pages main text, 16 pages appendix, 13 figures

  3. arXiv:2303.10464  [pdf, other]

    cs.LG cs.CL

    SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models

    Authors: Vithursan Thangarasa, Abhay Gupta, William Marshall, Tianda Li, Kevin Leong, Dennis DeCoste, Sean Lie, Shreyas Saxena

    Abstract: The pre-training and fine-tuning paradigm has contributed to a number of breakthroughs in Natural Language Processing (NLP). Instead of directly training on a downstream task, language models are first pre-trained on large datasets with cross-domain knowledge (e.g., Pile, MassiveText, etc.) and then fine-tuned on task-specific data (e.g., natural language generation, text summarization, etc.). Sca…

    Submitted 29 July, 2023; v1 submitted 18 March, 2023; originally announced March 2023.

    Comments: Accepted to Uncertainty in Artificial Intelligence (UAI) 2023 Conference; 13 pages, 4 figures (Main Paper) + 5 pages (Supplementary Material)

  4. arXiv:2011.14929  [pdf, other]

    cs.DS cs.GT

    Windowed Prophet Inequalities

    Authors: William Marshall, Nolan Miranda, Albert Zuo

    Abstract: The prophet inequalities problem has received significant study over the past decades and has several applications such as to online auctions. In this paper, we study two variants of the i.i.d. prophet inequalities problem, namely the windowed prophet inequalities problem and the batched prophet inequalities problem. For the windowed prophet inequalities problem, we show that for window size…

    Submitted 27 November, 2020; originally announced November 2020.

    Comments: 18 pages

  5. arXiv:1712.09644  [pdf, other]

    q-bio.NC cs.AI q-bio.QM

    PyPhi: A toolbox for integrated information theory

    Authors: William G. P. Mayner, William Marshall, Larissa Albantakis, Graham Findlay, Robert Marchman, Giulio Tononi

    Abstract: Integrated information theory provides a mathematical framework to fully characterize the cause-effect structure of a physical system. Here, we introduce PyPhi, a Python software package that implements this framework for causal analysis and unfolds the full cause-effect structure of discrete dynamical systems of binary elements. The software allows users to easily study these structures, serves a…

    Submitted 27 June, 2018; v1 submitted 27 December, 2017; originally announced December 2017.

    Comments: 22 pages, 4 figures, 6 pages of appendices. Supporting information "S1 Calculating Phi" can be found in the ancillary files

    Journal ref: PLOS Computational Biology 14(7): e1006343. 2018

  6. arXiv:1708.06716  [pdf, other]

    cs.AI math.ST

    What caused what? A quantitative account of actual causation using dynamical causal networks

    Authors: Larissa Albantakis, William Marshall, Erik Hoel, Giulio Tononi

    Abstract: Actual causation is concerned with the question "what caused what?" Consider a transition between two states within a system of interacting elements, such as an artificial neural network, or a biological brain circuit. Which combination of synapses caused the neuron to fire? Which image features caused the classifier to misinterpret the picture? Even detailed knowledge of the system's causal netwo…

    Submitted 9 January, 2019; v1 submitted 22 August, 2017; originally announced August 2017.

    Comments: 43 pages, 16 figures, supplementary discussion, supplementary methods, supplementary proofs