Skip to main content

Showing 1–24 of 24 results for author: Yelick, K

  1. RDMA-Based Algorithms for Sparse Matrix Multiplication on GPUs

    Authors: Benjamin Brock, Aydın Buluç, Katherine Yelick

    Abstract: Sparse matrix multiplication is an important kernel for large-scale graph processing and other data-intensive applications. In this paper, we implement various asynchronous, RDMA-based sparse times dense (SpMM) and sparse times sparse (SpGEMM) algorithms, evaluating their performance running in a distributed memory setting on GPUs. Our RDMA-based implementations use the NVSHMEM communication libra… ▽ More

    Submitted 31 May, 2024; v1 submitted 29 November, 2023; originally announced November 2023.

    Comments: To appear in ACM International Conference on Supercomputing (ICS) 2024

    Journal ref: Proceedings of the 38th ACM International Conference on Supercomputing (ICS 2024) 225-235

  2. arXiv:2311.02909  [pdf, other

    cs.LG cs.DC cs.PF

    Distributed Matrix-Based Sampling for Graph Neural Network Training

    Authors: Alok Tripathy, Katherine Yelick, Aydin Buluc

    Abstract: Graph Neural Networks (GNNs) offer a compact and computationally efficient way to learn embeddings and classifications on graph data. GNN models are frequently large, making distributed minibatch training necessary. The primary contribution of this paper is new methods for reducing communication in the sampling step for distributed GNN training. Here, we propose a matrix-based bulk sampling appr… ▽ More

    Submitted 19 April, 2024; v1 submitted 6 November, 2023; originally announced November 2023.

    Comments: Proceedings of Machine Learning and Systems

  3. arXiv:2303.01845  [pdf, other

    cs.DC cs.PF q-bio.GN

    Extreme-scale many-against-many protein similarity search

    Authors: Oguz Selvitopi, Saliya Ekanayake, Giulia Guidi, Muaaz G. Awan, Georgios A. Pavlopoulos, Ariful Azad, Nikos Kyrpides, Leonid Oliker, Katherine Yelick, Aydın Buluç

    Abstract: Similarity search is one of the most fundamental computations that are regularly performed on ever-increasing protein datasets. Scalability is of paramount importance for uncovering novel phenomena that occur at very large scales. We unleash the power of over 20,000 GPUs on the Summit system to perform all-vs-all protein similarity search on one of the largest publicly available datasets with 405… ▽ More

    Submitted 3 March, 2023; originally announced March 2023.

    Comments: 2022 ACM Gordon Bell Prize Finalist

    Journal ref: SC'22: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, 2022

  4. arXiv:2212.09005  [pdf, other

    cs.DC cs.DS

    High-Performance Filters For GPUs

    Authors: Hunter McCoy, Steven Hofmeyr, Katherine Yelick, Prashant Pandey

    Abstract: Filters approximately store a set of items while trading off accuracy for space-efficiency and can address the limited memory on accelerators, such as GPUs. However, there is a lack of high-performance and feature-rich GPU filters as most advancements in filter research has focused on CPUs. In this paper, we explore the design space of filters with a goal to develop massively parallel, high perf… ▽ More

    Submitted 17 December, 2022; originally announced December 2022.

    Comments: Published at PPOPP 2023

  5. Distributed-Memory Parallel Contig Generation for De Novo Long-Read Genome Assembly

    Authors: Giulia Guidi, Gabriel Raulet, Daniel Rokhsar, Leonid Oliker, Katherine Yelick, Aydin Buluc

    Abstract: De novo genome assembly, i.e., rebuilding the sequence of an unknown genome from redundant and erroneous short sequences, is a key but computationally intensive step in many genomics pipelines. The exponential growth of genomic data is increasing the computational demand and requires scalable, high-performance approaches. In this work, we present a novel distributed-memory algorithm that, from a s… ▽ More

    Submitted 9 July, 2022; originally announced July 2022.

    Comments: ICPP22, August 29-September 1, 2022, Bordeaux, France

    Journal ref: ICPP22, August 29-September 1, 2022, Bordeaux, France

  6. arXiv:2112.00132  [pdf, other

    cs.DC

    Atos: A Task-Parallel GPU Dynamic Scheduling Framework for Dynamic Irregular Computations

    Authors: Yuxin Chen, Benjamin Brock, Serban Porumbescu, Aydın Buluç, Katherine Yelick, John D. Owens

    Abstract: We present Atos, a task-parallel GPU dynamic scheduling framework that is especially suited to dynamic irregular applications. Compared to the dominant Bulk Synchronous Parallel (BSP) frameworks, Atos exposes additional concurrency by supporting task-parallel formulations of applications with relaxed dependencies, achieving higher GPU utilization, which is particularly significant for problems wit… ▽ More

    Submitted 30 November, 2021; originally announced December 2021.

    Comments: 12 pages, 4 figures

  7. 10 Years Later: Cloud Computing is Closing the Performance Gap

    Authors: Giulia Guidi, Marquita Ellis, Aydin Buluc, Katherine Yelick, David Culler

    Abstract: Can cloud computing infrastructures provide HPC-competitive performance for scientific applications broadly? Despite prolific related literature, this question remains open. Answers are crucial for designing future systems and democratizing high-performance computing. We present a multi-level approach to investigate the performance gap between HPC and cloud computing, isolating different variables… ▽ More

    Submitted 5 March, 2021; v1 submitted 1 November, 2020; originally announced November 2020.

    Journal ref: Companion of the 2021 ACM/SPEC International Conference on Performance Engineering (ICPE21 Companion)

  8. arXiv:2010.16027  [pdf, other

    q-bio.BM cs.LG math.AT

    PersGNN: Applying Topological Data Analysis and Geometric Deep Learning to Structure-Based Protein Function Prediction

    Authors: Nicolas Swenson, Aditi S. Krishnapriyan, Aydin Buluc, Dmitriy Morozov, Katherine Yelick

    Abstract: Understanding protein structure-function relationships is a key challenge in computational biology, with applications across the biotechnology and pharmaceutical industries. While it is known that protein structure directly impacts protein function, many functional prediction tasks use only protein sequence. In this work, we isolate protein structure to make functional annotations for proteins in… ▽ More

    Submitted 29 October, 2020; originally announced October 2020.

    Comments: The first two authors contributed equally to this work

  9. arXiv:2010.10055  [pdf, other

    cs.DC q-bio.GN

    Parallel String Graph Construction and Transitive Reduction for De Novo Genome Assembly

    Authors: Giulia Guidi, Oguz Selvitopi, Marquita Ellis, Leonid Oliker, Katherine Yelick, Aydin Buluc

    Abstract: One of the most computationally intensive tasks in computational biology is de novo genome assembly, the decoding of the sequence of an unknown genome from redundant and erroneous short sequences. A common assembly paradigm identifies overlapping sequences, simplifies their layout, and creates consensus. Despite many algorithms developed in the literature, the efficient assembly of large genomes i… ▽ More

    Submitted 20 October, 2020; originally announced October 2020.

  10. arXiv:2008.00023  [pdf

    cs.CY cs.AR

    Opportunities and Challenges for Next Generation Computing

    Authors: Gregory D. Hager, Mark D. Hill, Katherine Yelick

    Abstract: Computing has dramatically changed nearly every aspect of our lives, from business and agriculture to communication and entertainment. As a nation, we rely on computing in the design of systems for energy, transportation and defense; and computing fuels scientific discoveries that will improve our fundamental understanding of the world and help develop solutions to major challenges in health and t… ▽ More

    Submitted 31 July, 2020; originally announced August 2020.

    Comments: A Computing Community Consortium (CCC) white paper, 7 pages

  11. arXiv:2005.03300  [pdf, other

    cs.LG cs.DC stat.ML

    Reducing Communication in Graph Neural Network Training

    Authors: Alok Tripathy, Katherine Yelick, Aydin Buluc

    Abstract: Graph Neural Networks (GNNs) are powerful and flexible neural networks that use the naturally sparse connectivity information of the data. GNNs represent this connectivity as sparse matrices, which have lower arithmetic intensity and thus higher communication costs compared to dense matrices, making GNNs harder to scale to high concurrencies than convolutional or fully-connected neural networks.… ▽ More

    Submitted 2 September, 2020; v1 submitted 7 May, 2020; originally announced May 2020.

    Comments: To appear in International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'20)

  12. arXiv:2002.05200  [pdf, other

    q-bio.GN cs.DC

    LOGAN: High-Performance GPU-Based X-Drop Long-Read Alignment

    Authors: Alberto Zeni, Giulia Guidi, Marquita Ellis, Nan Ding, Marco D. Santambrogio, Steven Hofmeyr, Aydın Buluç, Leonid Oliker, Katherine Yelick

    Abstract: Pairwise sequence alignment is one of the most computationally intensive kernels in genomic data analysis, accounting for more than 90% of the runtime for key bioinformatics applications. This method is particularly expensive for third-generation sequences due to the high computational cost of analyzing sequences of length between 1Kb and 1Mb. Given the quadratic overhead of exact pairwise algorit… ▽ More

    Submitted 12 February, 2020; originally announced February 2020.

    Journal ref: 34th IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020

  13. diBELLA: Distributed Long Read to Long Read Alignment

    Authors: Marquita Ellis, Giulia Guidi, Aydın Buluç, Leonid Oliker, Katherine Yelick

    Abstract: We present a parallel algorithm and scalable implementation for genome analysis, specifically the problem of finding overlaps and alignments for data from "third generation" long read sequencers. While long sequences of DNA offer enormous advantages for biological analysis and insight, current long read sequencing instruments have high error rates and therefore require different approaches to anal… ▽ More

    Submitted 27 January, 2020; originally announced January 2020.

    Comments: This is the authors' preprint of the article that appears in the proceedings of ICPP 2019, the 48th International Conference on Parallel Processing

    Journal ref: In Proceedings of the 48th International Conference on Parallel Processing (ICPP 2019). Association for Computing Machinery, New York, NY, USA, Article 70, 1-11

  14. The Parallelism Motifs of Genomic Data Analysis

    Authors: Katherine Yelick, Aydin Buluc, Muaaz Awan, Ariful Azad, Benjamin Brock, Rob Egan, Saliya Ekanayake, Marquita Ellis, Evangelos Georganas, Giulia Guidi, Steven Hofmeyr, Oguz Selvitopi, Cristina Teodoropol, Leonid Oliker

    Abstract: Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some of these genomic data analysis problems require large scale computational platforms to meet both the memory and computational requirements. These applications differ from… ▽ More

    Submitted 20 January, 2020; originally announced January 2020.

  15. arXiv:1910.02158  [pdf, other

    cs.DC

    RDMA vs. RPC for Implementing Distributed Data Structures

    Authors: Benjamin Brock, Yuxin Chen, Jiakun Yan, John D. Owens, Aydın Buluç, Katherine Yelick

    Abstract: Distributed data structures are key to implementing scalable applications for scientific simulations and data analysis. In this paper we look at two implementation styles for distributed data structures: remote direct memory access (RDMA) and remote procedure call (RPC). We focus on operations that require individual accesses to remote portions of a distributed data structure, e.g., accessing a ha… ▽ More

    Submitted 14 October, 2019; v1 submitted 4 October, 2019; originally announced October 2019.

  16. BCL: A Cross-Platform Distributed Container Library

    Authors: Benjamin Brock, Aydın Buluç, Katherine Yelick

    Abstract: One-sided communication is a useful paradigm for irregular parallel applications, but most one-sided programming environments, including MPI's one-sided interface and PGAS programming languages, lack application level libraries to support these applications. We present the Berkeley Container Library, a set of generic, cross-platform, high-performance data structures for irregular applications, inc… ▽ More

    Submitted 15 December, 2023; v1 submitted 30 October, 2018; originally announced October 2018.

    Journal ref: Proceedings of the 48th International Conference on Parallel Processing (ICPP 2019) 1-10

  17. arXiv:1809.07014  [pdf, other

    cs.DC q-bio.GN

    Extreme Scale De Novo Metagenome Assembly

    Authors: Evangelos Georganas, Rob Egan, Steven Hofmeyr, Eugene Goltsman, Bill Arndt, Andrew Tritt, Aydin Buluc, Leonid Oliker, Katherine Yelick

    Abstract: Metagenome assembly is the process of transforming a set of short, overlapping, and potentially erroneous DNA segments from environmental samples into the accurate representation of the underlying microbiomes's genomes. State-of-the-art tools require big shared memory machines and cannot handle contemporary metagenome datasets that exceed Terabytes in size. In this paper, we introduce the MetaHipM… ▽ More

    Submitted 19 September, 2018; originally announced September 2018.

    Comments: Accepted to SC18

  18. arXiv:1710.10769  [pdf, other

    stat.ML cs.DC cs.LG

    Communication-Avoiding Optimization Methods for Distributed Massive-Scale Sparse Inverse Covariance Estimation

    Authors: Penporn Koanantakool, Alnur Ali, Ariful Azad, Aydin Buluc, Dmitriy Morozov, Leonid Oliker, Katherine Yelick, Sang-Yun Oh

    Abstract: Across a variety of scientific disciplines, sparse inverse covariance estimation is a popular tool for capturing the underlying dependency relationships in multivariate data. Unfortunately, most estimators are not scalable enough to handle the sizes of modern high-dimensional data sets (often on the order of terabytes), and assume Gaussian samples. To address these deficiencies, we introduce HP-CO… ▽ More

    Submitted 8 April, 2018; v1 submitted 30 October, 2017; originally announced October 2017.

    Comments: Main paper: 15 pages, appendix: 24 pages

    Journal ref: Artificial Intelligence and Statistics vol. 84 1376-1386 (2018)

  19. arXiv:1707.00599  [pdf

    cs.CY

    Advanced Cyberinfrastructure for Science, Engineering, and Public Policy

    Authors: Vasant G. Honavar, Katherine Yelick, Klara Nahrstedt, Holly Rushmeier, Jennifer Rexford, Mark D. Hill, Elizabeth Bradley, Elizabeth Mynatt

    Abstract: Progress in many domains increasingly benefits from our ability to view the systems through a computational lens, i.e., using computational abstractions of the domains; and our ability to acquire, share, integrate, and analyze disparate types of data. These advances would not be possible without the advanced data and computational cyberinfrastructure and tools for data capture, integration, analys… ▽ More

    Submitted 30 June, 2017; originally announced July 2017.

    Comments: A Computing Community Consortium (CCC) white paper, 9 pages. arXiv admin note: text overlap with arXiv:1604.02006

  20. arXiv:1705.11147  [pdf, other

    cs.DC

    Extreme-Scale De Novo Genome Assembly

    Authors: Evangelos Georganas, Steven Hofmeyr, Rob Egan, Aydin Buluc, Leonid Oliker, Daniel Rokhsar, Katherine Yelick

    Abstract: De novo whole genome assembly reconstructs genomic sequence from short, overlapping, and potentially erroneous DNA segments and is one of the most important computations in modern genomics. This work presents HipMER, a high-quality end-to-end de novo assembler designed for extreme scale analysis, via efficient parallelization of the Meraculous code. Genome assembly software has many components, ea… ▽ More

    Submitted 31 May, 2017; originally announced May 2017.

    Comments: To appear as a chapter in Exascale Scientific Applications: Programming Approaches for Scalability, Performance, and Portability, Straatsma, Antypas, Williams (editors), CRC Press, 2017

  21. arXiv:1609.06756  [pdf

    cs.CY

    21st Century Computer Architecture

    Authors: Mark D. Hill, Sarita Adve, Luis Ceze, Mary Jane Irwin, David Kaeli, Margaret Martonosi, Josep Torrellas, Thomas F. Wenisch, David Wood, Katherine Yelick

    Abstract: Because most technology and computer architecture innovations were (intentionally) invisible to higher layers, application and other software developers could reap the benefits of this progress without engaging in it. Higher performance has both made more computationally demanding applications feasible (e.g., virtual assistants, computer vision) and made less demanding applications easier to devel… ▽ More

    Submitted 21 September, 2016; originally announced September 2016.

    Comments: A Computing Community Consortium (CCC) white paper, 16 pages

  22. arXiv:1608.00044  [pdf, other

    cs.MS

    An Asynchronous Task-based Fan-Both Sparse Cholesky Solver

    Authors: Mathias Jacquelin, Yili Zheng, Esmond Ng, Katherine Yelick

    Abstract: Systems of linear equations arise at the heart of many scientific and engineering applications. Many of these linear systems are sparse; i.e., most of the elements in the coefficient matrix are zero. Direct methods based on matrix factorizations are sometimes needed to ensure accurate solutions. For example, accurate solution of sparse linear systems is needed in shift-invert Lanczos to compute in… ▽ More

    Submitted 23 August, 2016; v1 submitted 29 July, 2016; originally announced August 2016.

  23. arXiv:1604.02006  [pdf

    cs.CY cs.AI cs.DC cs.HC

    Accelerating Science: A Computing Research Agenda

    Authors: Vasant G. Honavar, Mark D. Hill, Katherine Yelick

    Abstract: The emergence of "big data" offers unprecedented opportunities for not only accelerating scientific advances but also enabling new modes of discovery. Scientific progress in many disciplines is increasingly enabled by our ability to examine natural phenomena through the computational lens, i.e., using algorithmic or information processing abstractions of the underlying processes; and our ability t… ▽ More

    Submitted 6 April, 2016; originally announced April 2016.

    Comments: Computing Community Consortium (CCC) white paper, 17 pages

  24. arXiv:1308.0068  [pdf, other

    math.CA cs.CC cs.DS

    Communication lower bounds and optimal algorithms for programs that reference arrays -- Part 1

    Authors: Michael Christ, James Demmel, Nicholas Knight, Thomas Scanlon, Katherine Yelick

    Abstract: The movement of data (communication) between levels of a memory hierarchy, or between parallel processors on a network, can greatly dominate the cost of computation, so algorithms that minimize communication are of interest. Motivated by this, attainable lower bounds for the amount of communication required by algorithms were established by several groups for a variety of algorithms, including mat… ▽ More

    Submitted 31 July, 2013; originally announced August 2013.

    MSC Class: 68W40; 26D15