Showing 1–1 of 1 results for author: GNVV, S S S N

  1. arXiv:2406.00894  [pdf, other]

    cs.LG cs.AI cs.CL

    Pretrained Hybrids with MAD Skills

    Authors: Nicholas Roberts, Samuel Guo, Zhiqi Gao, Satya Sai Srinath Namburi GNVV, Sonia Cromp, Chengjun Wu, Chengyu Duan, Frederic Sala

    Abstract: While Transformers underpin modern large language models (LMs), there is a growing list of alternative architectures with new capabilities, promises, and tradeoffs. This makes choosing the right LM architecture challenging. Recently-proposed $\textit{hybrid architectures}$ seek a best-of-all-worlds approach that reaps the benefits of all architectures. Hybrid design is difficult for two reasons: i…

    Submitted 2 June, 2024; originally announced June 2024.