Showing 1–2 of 2 results for author: Strzodka, R
-
An implementation of tensor product patch smoothers on GPU
Authors:
Cu Cui,
Paul Grosse-Bley,
Guido Kanschat,
Robert Strzodka
Abstract:
We present a GPU implementation of vertex-patch smoothers for higher order finite element methods in two and three dimensions. Analysis shows that they are not memory bound with respect to GPU DRAM, but with respect to on-chip scratchpad memory. Multigrid operations are optimized through localization and reorganized local operations in on-chip memory, achieving minimal global data transfer and a c…
▽ More
We present a GPU implementation of vertex-patch smoothers for higher order finite element methods in two and three dimensions. Analysis shows that they are not memory bound with respect to GPU DRAM, but with respect to on-chip scratchpad memory. Multigrid operations are optimized through localization and reorganized local operations in on-chip memory, achieving minimal global data transfer and a conflict free memory access pattern. Performance tests demonstrate that the optimized kernel is at least 2 times faster than the straightforward implementation for the Poisson problem, across various polynomial degrees in 2D and 3D, achieving up to 36% of the peak performance in both single and double precision on Nvidia A100 GPU.
△ Less
Submitted 30 May, 2024; v1 submitted 29 May, 2024;
originally announced May 2024.
-
Parallel Performance of Algebraic Multigrid Domain Decomposition (AMG-DD)
Authors:
Wayne B. Mitchell,
Robert Strzodka,
Robert D. Falgout
Abstract:
Algebraic multigrid (AMG) is a widely used scalable solver and preconditioner for large-scale linear systems resulting from the discretization of a wide class of elliptic PDEs. While AMG has optimal computational complexity, the cost of communication has become a significant bottleneck that limits its scalability as processor counts continue to grow on modern machines. This paper examines the desi…
▽ More
Algebraic multigrid (AMG) is a widely used scalable solver and preconditioner for large-scale linear systems resulting from the discretization of a wide class of elliptic PDEs. While AMG has optimal computational complexity, the cost of communication has become a significant bottleneck that limits its scalability as processor counts continue to grow on modern machines. This paper examines the design, implementation, and parallel performance of a novel algorithm, Algebraic Multigrid Domain Decomposition (AMG-DD), designed specifically to limit communication. The goal of AMG-DD is to provide a low-communication alternative to standard AMG V-cycles by trading some additional computational overhead for a significant reduction in communication cost. Numerical results show that AMG-DD achieves superior accuracy per communication cost compared to AMG, and speedup over AMG is demonstrated on a large GPU cluster.
△ Less
Submitted 21 January, 2020; v1 submitted 25 June, 2019;
originally announced June 2019.