Computer Science > Machine Learning

arXiv:2312.06881 (cs)

[Submitted on 11 Dec 2023]

Title: DYAD: A Descriptive Yet Abjuring Density efficient approximation to linear neural network layers

Authors: Sarin Chandy, Varun Gangal, Yi Yang, Gabriel Maggiotti

Abstract: We devise, implement and performance-assess DYAD, a layer which can serve as a faster and more memory-efficient approximate replacement for linear layers (nn.Linear() in PyTorch). These layers appear in common subcomponents, such as the ff module of Transformers. DYAD is based on a bespoke near-sparse matrix structure which approximates the dense "weight" matrix W that matrix-multiplies the input in the typical realization of such a layer, a.k.a. DENSE. Our alternative near-sparse matrix structure is decomposable into a sum of 2 matrices permutable to a block-sparse counterpart. These can be represented as 3D tensors, which in unison allow faster execution of matrix multiplication with the mini-batched input matrix X compared to DENSE (O(rows(W) x cols(W)) --> O(rows(W) x cols(W) / # of blocks)). As the crux of our experiments, we pretrain both DYAD and DENSE variants of 2 sizes of the OPT architecture and 1 size of the Pythia architecture, including at different token scales of the babyLM benchmark. We find DYAD to be competitive with DENSE, reaching >= 90% of its performance on zero-shot (e.g. BLIMP), few-shot (OPENLM) and finetuning (GLUE) benchmarks, while being >= 7-15% faster to train on GPU even at 125m scale, and showing larger speedups with increasing scale and model width.
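
The structure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the choice of a stride-style permutation, and the omission of a bias term are all assumptions made for illustration. It shows a weight matrix written as the sum of one block-diagonal matrix and one matrix that becomes block-diagonal after permuting rows and columns, with both components stored as 3D arrays and applied through a batched matrix multiplication.

```python
import numpy as np


def block_diag_matmul(blocks, x):
    """Apply a block-diagonal matrix stored as a 3D array.

    blocks: (n_blocks, out_b, in_b); x: (n_blocks * in_b, batch).
    """
    n_blocks, out_b, in_b = blocks.shape
    x3 = x.reshape(n_blocks, in_b, -1)   # one slice of input rows per block
    y3 = np.matmul(blocks, x3)           # batched matmul over the block axis
    return y3.reshape(n_blocks * out_b, -1)


def stride_permutation(size, n_blocks):
    """A stride-style permutation of range(size) (hypothetical choice)."""
    return np.arange(size).reshape(n_blocks, size // n_blocks).T.reshape(-1)


def two_component_forward(blocks_a, blocks_b, x):
    """Compute W_a x + W_b x without materializing either dense matrix."""
    n_blocks, out_b, in_b = blocks_a.shape
    perm_in = stride_permutation(n_blocks * in_b, n_blocks)
    perm_out = stride_permutation(n_blocks * out_b, n_blocks)
    y = block_diag_matmul(blocks_a, x)
    y_perm = block_diag_matmul(blocks_b, x[perm_in])
    y[perm_out] += y_perm                # scatter back to the original row order
    return y


def dense_equivalent(blocks_a, blocks_b):
    """Build the full dense matrix, for checking only."""
    n_blocks, out_b, in_b = blocks_a.shape
    rows, cols = n_blocks * out_b, n_blocks * in_b
    w_a = np.zeros((rows, cols))
    w_b_perm = np.zeros((rows, cols))
    for i in range(n_blocks):
        r = slice(i * out_b, (i + 1) * out_b)
        c = slice(i * in_b, (i + 1) * in_b)
        w_a[r, c] = blocks_a[i]
        w_b_perm[r, c] = blocks_b[i]
    w_b = np.zeros((rows, cols))
    w_b[np.ix_(stride_permutation(rows, n_blocks),
               stride_permutation(cols, n_blocks))] = w_b_perm
    return w_a + w_b


rng = np.random.default_rng(0)
n_blocks, out_b, in_b, batch = 4, 3, 2, 5
a = rng.standard_normal((n_blocks, out_b, in_b))
b = rng.standard_normal((n_blocks, out_b, in_b))
x = rng.standard_normal((n_blocks * in_b, batch))

y_fast = two_component_forward(a, b, x)
y_ref = dense_equivalent(a, b) @ x
```

In this sketch the two 3D arrays together hold 2 x rows(W) x cols(W) / n_blocks parameters, and each batched multiplication touches only the nonzero blocks, which is where the reduction by a factor of the number of blocks comes from.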

Comments:	Accepted at WANT workshop at NeurIPS 2023; code at this https URL
Subjects:	Machine Learning (cs.LG); Computation and Language (cs.CL)
Cite as:	arXiv:2312.06881 [cs.LG]
	(or arXiv:2312.06881v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2312.06881

Submission history

From: Sarin Chandy
[v1] Mon, 11 Dec 2023 23:04:48 UTC (3,951 KB)
