Skip to main content

Showing 1–23 of 23 results for author: Kjolstad, F

  1. arXiv:2406.18111  [pdf, other

    cs.DC

    Automatic Tracing in Task-Based Runtime Systems

    Authors: Rohan Yadav, Michael Bauer, David Broman, Michael Garland, Alex Aiken, Fredrik Kjolstad

    Abstract: Implicitly parallel task-based runtime systems often perform dynamic analysis to discover dependencies in and extract parallelism from sequential programs. Dependence analysis becomes expensive as task granularity drops below a threshold. Tracing techniques have been developed where programmers annotate repeated program fragments (traces) issued by the application, and the runtime system memoizes… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  2. arXiv:2406.18109  [pdf, other

    cs.DC

    Composing Distributed Computations Through Task and Kernel Fusion

    Authors: Rohan Yadav, Shiv Sundram, Wonchan Lee, Michael Garland, Michael Bauer, Alex Aiken, Fredrik Kjolstad

    Abstract: We introduce Diffuse, a system that dynamically performs task and kernel fusion in distributed, task-based runtime systems. The key component of Diffuse is an intermediate representation of distributed computation that enables the necessary analyses for the fusion of distributed tasks to be performed in a scalable manner. We pair task fusion with a JIT compiler to fuse together the kernels within… ▽ More

    Submitted 26 June, 2024; originally announced June 2024.

  3. arXiv:2405.16883  [pdf, other

    cs.LG cs.AI cs.MS cs.PL

    Scorch: A Library for Sparse Deep Learning

    Authors: Bobby Yan, Alexander J. Root, Trevor Gale, David Broman, Fredrik Kjolstad

    Abstract: The rapid growth in the size of deep learning models strains the capabilities of traditional dense computation paradigms. Leveraging sparse computation has become increasingly popular for training and deploying large-scale models, but existing deep learning frameworks lack extensive support for sparse operations. To bridge this gap, we introduce Scorch, a library that seamlessly integrates efficie… ▽ More

    Submitted 20 June, 2024; v1 submitted 27 May, 2024; originally announced May 2024.

    Comments: 25 pages, 8 figures

  4. arXiv:2404.04541  [pdf, other

    cs.PL

    Compilation of Modular and General Sparse Workspaces

    Authors: Genghan Zhang, Olivia Hsu, Fredrik Kjolstad

    Abstract: Recent years have seen considerable work on compiling sparse tensor algebra expressions. This paper addresses a shortcoming in that work, namely how to generate efficient code (in time and space) that scatters values into a sparse result tensor. We address this shortcoming through a compiler design that generates code that uses sparse intermediate tensors (sparse workspaces) as efficient adapters… ▽ More

    Submitted 6 April, 2024; originally announced April 2024.

    Comments: 30 pages, 27 figures, to be published in Proc. ACM Program. Lang., Vol. 8, No. PLDI, Article 196. Publication date: June 2024

  5. arXiv:2309.04660  [pdf, other

    cs.PL

    Compiling Recurrences over Dense and Sparse Arrays

    Authors: Shiv Sundram, Muhammad Usman Tariq, Fredrik Kjolstad

    Abstract: Recurrence equations lie at the heart of many computational paradigms including dynamic programming, graph analysis, and linear solvers. These equations are often expensive to compute and much work has gone into optimizing them for different situations. The set of recurrence implementations is a large design space across the set of all recurrences (e.g., the Viterbi and Floyd-Warshall algorithms),… ▽ More

    Submitted 8 September, 2023; originally announced September 2023.

  6. arXiv:2302.06124  [pdf, other

    cs.AR

    Revet: A Language and Compiler for Dataflow Threads

    Authors: Alexander Rucker, Shiv Sundram, Coleman Smith, Matthew Vilim, Raghu Prabhakar, Fredrik Kjolstad, Kunle Olukotun

    Abstract: Spatial dataflow architectures such as reconfigurable dataflow accelerators (RDA) can provide much higher performance and efficiency than CPUs and GPUs. In particular, vectorized reconfigurable dataflow accelerators (vRDA) in recent literature represent a design point that enhances the efficiency of dataflow architectures with vectorization. Today, vRDAs can be exploited using either hardcoded ker… ▽ More

    Submitted 30 January, 2024; v1 submitted 13 February, 2023; originally announced February 2023.

    Comments: To appear in HPCA 2024

  7. arXiv:2212.11142  [pdf, other

    cs.PL cs.LG cs.PF

    BaCO: A Fast and Portable Bayesian Compiler Optimization Framework

    Authors: Erik Hellsten, Artur Souza, Johannes Lenfers, Rubens Lacouture, Olivia Hsu, Adel Ejjeh, Fredrik Kjolstad, Michel Steuwer, Kunle Olukotun, Luigi Nardi

    Abstract: We introduce the Bayesian Compiler Optimization framework (BaCO), a general purpose autotuner for modern compilers targeting CPUs, GPUs, and FPGAs. BaCO provides the flexibility needed to handle the requirements of modern autotuning tasks. Particularly, it deals with permutation, ordered, and continuous parameter types along with both known and unknown parameter constraints. To reason about these… ▽ More

    Submitted 11 April, 2023; v1 submitted 1 December, 2022; originally announced December 2022.

  8. arXiv:2211.03251  [pdf, other

    cs.PL cs.AR

    Stardust: Compiling Sparse Tensor Algebra to a Reconfigurable Dataflow Architecture

    Authors: Olivia Hsu, Alexander Rucker, Tian Zhao, Kunle Olukotun, Fredrik Kjolstad

    Abstract: We introduce Stardust, a compiler that compiles sparse tensor algebra to reconfigurable dataflow architectures (RDAs). Stardust introduces new user-provided data representation and scheduling language constructs for mapping to resource-constrained accelerated architectures. Stardust uses the information provided by these constructs to determine on-chip memory placement and to lower to the Capstan… ▽ More

    Submitted 6 November, 2022; originally announced November 2022.

    Comments: 15 pages, 13 figures, 6 tables,

    ACM Class: D.3

  9. arXiv:2209.05250  [pdf

    cs.PL cs.MS

    Looplets: A Language For Structured Coiteration

    Authors: Willow Ahrens, Daniel Donenfeld, Fredrik Kjolstad, Saman Amarasinghe

    Abstract: Real world arrays often contain underlying structure, such as sparsity, runs of repeated values, or symmetry. Specializing for structure yields significant speedups. But automatically generating efficient code for structured data is challenging, especially when arrays with different structure interact. We show how to abstract over array structures so that the compiler can generate code to coiter… ▽ More

    Submitted 8 September, 2022; originally announced September 2022.

  10. The Sparse Abstract Machine

    Authors: Olivia Hsu, Maxwell Strange, Ritvik Sharma, Jaeyeon Won, Kunle Olukotun, Joel Emer, Mark Horowitz, Fredrik Kjolstad

    Abstract: We propose the Sparse Abstract Machine (SAM), an abstract machine model for targeting sparse tensor algebra to reconfigurable and fixed-function spatial dataflow accelerators. SAM defines a streaming dataflow abstraction with sparse primitives that encompass a large space of scheduled tensor algebra expressions. SAM dataflow graphs naturally separate tensor formats from algorithms and are expressi… ▽ More

    Submitted 23 March, 2023; v1 submitted 30 August, 2022; originally announced August 2022.

    Comments: 18 pages, 17 figures, 3 tables

    Journal ref: ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems Volume 3 (2023) 710-726

  11. arXiv:2207.13901  [pdf, other

    cs.DC cs.PL

    SpDISTAL: Compiling Distributed Sparse Tensor Computations

    Authors: Rohan Yadav, Alex Aiken, Fredrik Kjolstad

    Abstract: We introduce SpDISTAL, a compiler for sparse tensor algebra that targets distributed systems. SpDISTAL combines separate descriptions of tensor algebra expressions, sparse data structures, data distribution, and computation distribution. Thus, it enables distributed execution of sparse tensor algebra expressions with a wide variety of sparse data structures and data distributions. SpDISTAL is impl… ▽ More

    Submitted 28 July, 2022; originally announced July 2022.

  12. arXiv:2207.13291  [pdf, other

    cs.PL

    Correct Compilation of Semiring Contractions

    Authors: Scott Kovach, Fredrik Kjolstad

    Abstract: We introduce a formal operational semantics that describes the fused execution of variable contraction problems, which compute indexed arithmetic over a semiring and generalize sparse and dense tensor algebra, relational algebra, and graph algorithms. We prove that the model is correct with respect to a functional semantics. We also develop a compiler for variable contraction expressions and show… ▽ More

    Submitted 27 July, 2022; originally announced July 2022.

    ACM Class: F.3.1; F.3.2

  13. DISTAL: The Distributed Tensor Algebra Compiler

    Authors: Rohan Yadav, Alex Aiken, Fredrik Kjolstad

    Abstract: We introduce DISTAL, a compiler for dense tensor algebra that targets modern distributed and heterogeneous systems. DISTAL lets users independently describe how tensors and computation map onto target machines through separate format and scheduling languages. The combination of choices for data and computation distribution creates a large design space that includes many algorithms from both the pa… ▽ More

    Submitted 17 March, 2022; v1 submitted 15 March, 2022; originally announced March 2022.

  14. Compiler Support for Sparse Tensor Computations in MLIR

    Authors: Aart J. C. Bik, Penporn Koanantakool, Tatiana Shpeisman, Nicolas Vasilache, Bixia Zheng, Fredrik Kjolstad

    Abstract: Sparse tensors arise in problems in science, engineering, machine learning, and data analytics. Programs that operate on such tensors can exploit sparsity to reduce storage requirements and computational time. Developing and maintaining sparse software by hand, however, is a complex and error-prone task. Therefore, we propose treating sparsity as a property of tensors, not a tedious implementation… ▽ More

    Submitted 9 February, 2022; originally announced February 2022.

  15. arXiv:2111.14947  [pdf, other

    cs.MS cs.PL

    An Asymptotic Cost Model for Autoscheduling Sparse Tensor Programs

    Authors: Willow Ahrens, Fredrik Kjolstad, Saman Amarasinghe

    Abstract: While loop reordering and fusion can make big impacts on the constant-factor performance of dense tensor programs, the effects on sparse tensor programs are asymptotic, often leading to orders of magnitude performance differences in practice. Sparse tensors also introduce a choice of compressed storage formats that can have asymptotic effects. Research into sparse tensor compilers has led to sim… ▽ More

    Submitted 29 November, 2021; originally announced November 2021.

    Comments: 14 pages, 7 figures

  16. arXiv:2110.00186  [pdf, other

    cs.MS cs.PL

    An Attempt to Generate Code for Symmetric Tensor Computations

    Authors: Jessica Shi, Stephen Chou, Fredrik Kjolstad, Saman Amarasinghe

    Abstract: This document describes an attempt to develop a compiler-based approach for computations with symmetric tensors. Given a computation and the symmetries of its input tensors, we derive formulas for random access under a storage scheme that eliminates redundancies; construct intermediate representations to describe the loop structure; and translate this information, using the taco tensor algebra com… ▽ More

    Submitted 30 September, 2021; originally announced October 2021.

  17. arXiv:2105.12858  [pdf, other

    cs.AR

    Compiling Halide Programs to Push-Memory Accelerators

    Authors: Qiaoyi Liu, Dillon Huff, Jeff Setter, Maxwell Strange, Kathleen Feng, Kavya Sreedhar, Ziheng Wang, Keyi Zhang, Mark Horowitz, Priyanka Raina, Fredrik Kjolstad

    Abstract: Image processing and machine learning applications benefit tremendously from hardware acceleration, but existing compilers target either FPGAs, which sacrifice power and performance for flexible hardware, or ASICs, which rapidly become obsolete as applications change. Programmable domain-specific accelerators have emerged as a promising middle-ground between these two extremes, but such architectu… ▽ More

    Submitted 26 May, 2021; originally announced May 2021.

  18. Copy-and-Patch Compilation: A fast compilation algorithm for high-level languages and bytecode

    Authors: Haoran Xu, Fredrik Kjolstad

    Abstract: Fast compilation is important when compilation occurs at runtime, such as query compilers in modern database systems and WebAssembly virtual machines in modern browsers. We present copy-and-patch, an extremely fast compilation technique that also produces good quality code. It is capable of lowering both high-level languages and low-level bytecode programs to binary code, by stitching together cod… ▽ More

    Submitted 14 September, 2021; v1 submitted 26 November, 2020; originally announced November 2020.

  19. Sparse Tensor Transpositions

    Authors: Suzanne Mueller, Willow Ahrens, Stephen Chou, Fredrik Kjolstad, Saman Amarasinghe

    Abstract: We present a new algorithm for transposing sparse tensors called Quesadilla. The algorithm converts the sparse tensor data structure to a list of coordinates and sorts it with a fast multi-pass radix algorithm that exploits knowledge of the requested transposition and the tensors input partial coordinate ordering to provably minimize the number of parallel partial sorting passes. We evaluate bot… ▽ More

    Submitted 20 May, 2020; originally announced May 2020.

    Comments: This work will be the subject of a brief announcement at the 32nd ACM Symposium on Parallelism in Algorithms and Architectures (SPAA '20)

  20. Automatic Generation of Efficient Sparse Tensor Format Conversion Routines

    Authors: Stephen Chou, Fredrik Kjolstad, Saman Amarasinghe

    Abstract: This paper shows how to generate code that efficiently converts sparse tensors between disparate storage formats (data layouts) such as CSR, DIA, ELL, and many others. We decompose sparse tensor conversion into three logical phases: coordinate remapping, analysis, and assembly. We then develop a language that precisely describes how different formats group together and order a tensor's nonzeros in… ▽ More

    Submitted 29 June, 2020; v1 submitted 8 January, 2020; originally announced January 2020.

    Comments: Presented at PLDI 2020

  21. arXiv:2001.00532  [pdf, other

    cs.MS cs.PL

    A Unified Iteration Space Transformation Framework for Sparse and Dense Tensor Algebra

    Authors: Ryan Senanayake, Fredrik Kjolstad, Changwan Hong, Shoaib Kamil, Saman Amarasinghe

    Abstract: We address the problem of optimizing mixed sparse and dense tensor algebra in a compiler. We show that standard loop transformations, such as strip-mining, tiling, collapsing, parallelization and vectorization, can be applied to irregular loops over sparse iteration spaces. We also show how these transformations can be applied to the contiguous value arrays of sparse tensor data structures, which… ▽ More

    Submitted 28 December, 2019; originally announced January 2020.

  22. arXiv:1804.10112  [pdf, other

    cs.MS cs.PL

    Format Abstraction for Sparse Tensor Algebra Compilers

    Authors: Stephen Chou, Fredrik Kjolstad, Saman Amarasinghe

    Abstract: This paper shows how to build a sparse tensor algebra compiler that is agnostic to tensor formats (data layouts). We develop an interface that describes formats in terms of their capabilities and properties, and show how to build a modular code generator where new formats can be added as plugins. We then describe six implementations of the interface that compose to form the dense, CSR/CSF, COO, DI… ▽ More

    Submitted 11 November, 2018; v1 submitted 23 April, 2018; originally announced April 2018.

    Comments: Presented at OOPSLA 2018

    Journal ref: Proc. ACM Program. Lang. 2, OOPSLA, Article 123 (November 2018)

  23. arXiv:1802.10574  [pdf, other

    cs.MS cs.PL

    Sparse Tensor Algebra Optimizations with Workspaces

    Authors: Fredrik Kjolstad, Willow Ahrens, Shoaib Kamil, Saman Amarasinghe

    Abstract: This paper shows how to optimize sparse tensor algebraic expressions by introducing temporary tensors, called workspaces, into the resulting loop nests. We develop a new intermediate language for tensor operations called concrete index notation that extends tensor index notation. Concrete index notation expresses when and where sub-computations occur and what tensor they are stored into. We then… ▽ More

    Submitted 24 April, 2018; v1 submitted 28 February, 2018; originally announced February 2018.

    Comments: 25 pages