Showing 1–50 of 59 results for author: Markidis, S

  1. arXiv:2407.00394  [pdf, other]

    physics.plasm-ph cs.DC cs.PF physics.comp-ph

    Understanding Large-Scale Plasma Simulation Challenges for Fusion Energy on Supercomputers

    Authors: Jeremy J. Williams, Ashish Bhole, Dylan Kierans, Matthias Hoelzl, Ihor Holod, Weikang Tang, David Tskhakaya, Stefan Costea, Leon Kos, Ales Podolnik, Jakub Hromadka, JOREK Team, Erwin Laure, Stefano Markidis

    Abstract: Understanding plasma instabilities is essential for achieving sustainable fusion energy, with large-scale plasma simulations playing a crucial role in both the design and development of next-generation fusion energy devices and the modelling of industrial plasmas. To achieve sustainable fusion energy, it is essential to accurately model and predict plasma behavior under extreme conditions, requiri…

    Submitted 29 June, 2024; originally announced July 2024.

    Comments: Accepted by EPS PLASMA 2024 (50th European Physical Society Conference on Plasma Physics), prepared in the standardized EPS conference proceedings format and consists of 4 pages, which includes the main text, references, and figures

  2. arXiv:2406.19058  [pdf, other]

    physics.comp-ph cs.DC cs.PF physics.plasm-ph

    Understanding the Impact of openPMD on BIT1, a Particle-in-Cell Monte Carlo Code, through Instrumentation, Monitoring, and In-Situ Analysis

    Authors: Jeremy J. Williams, Stefan Costea, Allen D. Malony, David Tskhakaya, Leon Kos, Ales Podolnik, Jakub Hromadka, Kevin Huck, Erwin Laure, Stefano Markidis

    Abstract: Particle-in-Cell Monte Carlo simulations on large-scale systems play a fundamental role in understanding the complexities of plasma dynamics in fusion devices. Efficient handling and analysis of vast datasets are essential for advancing these simulations. Previously, we addressed this challenge by integrating openPMD with BIT1, a Particle-in-Cell Monte Carlo code, streamlining data streaming and s…

    Submitted 27 June, 2024; originally announced June 2024.

    Comments: Accepted by the Euro-Par 2024 workshops (PHYSHPC 2024), prepared in the standardized Springer LNCS format and consists of 12 pages, which includes the main text, references, and figures

  3. arXiv:2406.14297  [pdf, other]

    cs.AI astro-ph.IM

    AI in Space for Scientific Missions: Strategies for Minimizing Neural-Network Model Upload

    Authors: Jonah Ekelund, Ricardo Vinuesa, Yuri Khotyaintsev, Pierre Henri, Gian Luca Delzanno, Stefano Markidis

    Abstract: Artificial Intelligence (AI) has the potential to revolutionize space exploration by delegating several spacecraft decisions to an onboard AI instead of relying on ground control and predefined procedures. It is likely that there will be an AI/ML Processing Unit onboard the spacecraft running an inference engine. The neural-network will have pre-installed parameters that can be updated onboard by…

    Submitted 20 June, 2024; originally announced June 2024.

  4. arXiv:2405.07222  [pdf, other]

    quant-ph cs.DC

    What is Quantum Parallelism, Anyhow?

    Authors: Stefano Markidis

    Abstract: Central to the power of quantum computing is the concept of quantum parallelism: quantum systems can explore and process multiple computational paths simultaneously. In this paper, we discuss the elusive nature of quantum parallelism, drawing parallels with classical parallel computing models to elucidate its fundamental characteristics and implications for algorithmic performance. We begin by def…

    Submitted 12 May, 2024; originally announced May 2024.

    Comments: Accepted at ISC HPC 2024 conference

  5. arXiv:2405.05640  [pdf, other]

    cs.DC cs.MS physics.flu-dyn

    Experience and Analysis of Scalable High-Fidelity Computational Fluid Dynamics on Modular Supercomputing Architectures

    Authors: Martin Karp, Estela Suarez, Jan H. Meinke, Måns I. Andersson, Philipp Schlatter, Stefano Markidis, Niclas Jansson

    Abstract: The never-ending computational demand from simulations of turbulence makes computational fluid dynamics (CFD) a prime application use case for current and future exascale systems. High-order finite element methods, such as the spectral element method, have been gaining traction as they offer high performance on both multicore CPUs and modern GPU-based accelerators. In this work, we assess how high…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 13 pages, 5 figures, 3 tables, preprint

    ACM Class: J.2; C.1.4; G.4

  6. arXiv:2405.05639  [pdf, other]

    cs.DC

    Supercomputers as a Continous Medium

    Authors: Martin Karp, Niclas Jansson, Philipp Schlatter, Stefano Markidis

    Abstract: As supercomputers' complexity has grown, the traditional boundaries between processor, memory, network, and accelerators have blurred, making a homogeneous computer model, in which the overall computer system is modeled as a continuous medium with homogeneously distributed computational power, memory, and data movement transfer capabilities, an intriguing and powerful abstraction. By applying a ho…

    Submitted 9 May, 2024; originally announced May 2024.

    Comments: 10 pages, 8 figures, 3 tables

    ACM Class: F.1; F.2; I.6

  7. arXiv:2405.04161  [pdf, other]

    cs.LG cs.AI

    Opportunities for machine learning in scientific discovery

    Authors: Ricardo Vinuesa, Jean Rabault, Hossein Azizpour, Stefan Bauer, Bingni W. Brunton, Arne Elofsson, Elias Jarlebring, Hedvig Kjellstrom, Stefano Markidis, David Marlevi, Paola Cinnella, Steven L. Brunton

    Abstract: Technological advancements have substantially increased computational power and data availability, enabling the application of powerful machine-learning (ML) techniques across various fields. However, our ability to leverage ML methods for scientific discovery, i.e. to obtain fundamental and formalized knowledge about natural processes, is still in its infancy. In this review, we explore how…

    Submitted 7 May, 2024; originally announced May 2024.

  8. arXiv:2404.10270  [pdf, other]

    cs.DC cs.PF physics.comp-ph

    Optimizing BIT1, a Particle-in-Cell Monte Carlo Code, with OpenMP/OpenACC and GPU Acceleration

    Authors: Jeremy J. Williams, Felix Liu, David Tskhakaya, Stefan Costea, Ales Podolnik, Stefano Markidis

    Abstract: On the path toward developing the first fusion energy devices, plasma simulations have become indispensable tools for supporting the design and development of fusion machines. Among these critical simulation tools, BIT1 is an advanced Particle-in-Cell code with Monte Carlo collisions, specifically designed for modeling plasma-material interaction and, in particular, analyzing the power load distri…

    Submitted 15 April, 2024; originally announced April 2024.

    Comments: Accepted by ICCS 2024 (The 24th International Conference on Computational Science), prepared in English, formatted according to the Springer LNCS templates and consists of 15 pages, which includes the main text, references, and figures

  9. arXiv:2401.15661  [pdf, other]

    cs.CE

    Brain-Inspired Physics-Informed Neural Networks: Bare-Minimum Neural Architectures for PDE Solvers

    Authors: Stefano Markidis

    Abstract: Physics-Informed Neural Networks (PINNs) have emerged as a powerful tool for solving partial differential equations (PDEs) in various scientific and engineering domains. However, traditional PINN architectures typically rely on large, fully connected multilayer perceptrons (MLPs), lacking the sparsity and modularity inherent in many traditional numerical solvers. An unsolved and critical question…

    Submitted 19 April, 2024; v1 submitted 28 January, 2024; originally announced January 2024.

    Comments: Accepted at the 24th International Conference on Computational Science (ICCS)

  10. arXiv:2401.14576  [pdf]

    cs.DC cs.PF

    Accelerating Scientific Application through Transparent I/O Interposition

    Authors: Steven W. D. Chien, Kento Sato, Artur Podobas, Niclas Jansson, Stefano Markidis, Michio Honda

    Abstract: The ability to handle a large volume of data generated by scientific applications is crucial. We have seen an increase in the heterogeneity of storage technologies available to scientific applications, such as burst buffers, local temporary block storage, managed cloud parallel file systems (PFS), and non-POSIX object stores. However, scientific applications designed for traditional HPC systems ca…

    Submitted 25 January, 2024; originally announced January 2024.

    Comments: Submitted to HPDC 2024

  11. arXiv:2308.00763  [pdf, other]

    cs.DC

    Boosting the Performance of Object Tracking with a Half-Precision Particle Filter on GPU

    Authors: Gabin Schieffer, Nattawat Pornthisan, Daniel Araújo de Medeiros, Stefano Markidis, Jacob Wahlgren, Ivy Peng

    Abstract: High-performance GPU-accelerated particle filter methods are critical for object detection applications, ranging from autonomous driving, robot localization, to time-series prediction. In this work, we investigate the design, development and optimization of particle-filter using half-precision on CUDA cores and compare their performance and accuracy with single- and double-precision baselines on N…

    Submitted 1 August, 2023; originally announced August 2023.

    Comments: 12 pages, 8 figures, conference. To be published in The 21st International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms (HeteroPar2023)

  12. arXiv:2308.00637  [pdf, other]

    math.OC cs.DC

    Krylov Solvers for Interior Point Methods with Applications in Radiation Therapy and Support Vector Machines

    Authors: Felix Liu, Albin Fredriksson, Stefano Markidis

    Abstract: Interior point methods are widely used for different types of mathematical optimization problems. Many implementations of interior point methods in use today rely on direct linear solvers to solve systems of equations in each iteration. The need to solve ever larger optimization problems more efficiently and the rise of hardware accelerators for general purpose computing has led to a large interes…

    Submitted 26 February, 2024; v1 submitted 1 August, 2023; originally announced August 2023.

  13. arXiv:2308.00497  [pdf, other]

    cs.MS

    Leveraging MLIR for Loop Vectorization and GPU Porting of FFT Libraries

    Authors: Yifei He, Artur Podobas, Stefano Markidis

    Abstract: FFTc is a Domain-Specific Language (DSL) for designing and generating Fast Fourier Transforms (FFT) libraries. The FFTc uniqueness is that it leverages and extends Multi-Level Intermediate Representation (MLIR) dialects to optimize FFT code generation. In this work, we present FFTc extensions and improvements such as the possibility of using different data layout for complex-value arrays, and spars…

    Submitted 1 August, 2023; originally announced August 2023.

  14. arXiv:2307.14860  [pdf, other]

    cs.PF

    Quantum Computer Simulations at Warp Speed: Assessing the Impact of GPU Acceleration

    Authors: Jennifer Faj, Ivy Peng, Jacob Wahlgren, Stefano Markidis

    Abstract: Quantum computer simulators are crucial for the development of quantum computing. In this work, we investigate the suitability and performance impact of GPU and multi-GPU systems on a widely used simulation tool - the state vector simulator Qiskit Aer. In particular, we evaluate the performance of both Qiskit's default Nvidia Thrust backend and the recent Nvidia cuQuantum backend on Nvidia A100 GP…

    Submitted 27 July, 2023; originally announced July 2023.

  15. Leveraging HPC Profiling & Tracing Tools to Understand the Performance of Particle-in-Cell Monte Carlo Simulations

    Authors: Jeremy J. Williams, David Tskhakaya, Stefan Costea, Ivy B. Peng, Marta Garcia-Gasulla, Stefano Markidis

    Abstract: Large-scale plasma simulations are critical for designing and developing next-generation fusion energy devices and modeling industrial plasmas. BIT1 is a massively parallel Particle-in-Cell code designed for specifically studying plasma material interaction in fusion devices. Its most salient characteristic is the inclusion of collision Monte Carlo models for different plasma species. In this work…

    Submitted 28 June, 2023; originally announced June 2023.

    Comments: Accepted by the Euro-Par 2023 workshops (TDLPP 2023), prepared in the standardized Springer LNCS format and consists of 12 pages, which includes the main text, references, and figures

  16. QHDL: a Low-Level Circuit Description Language for Quantum Computing

    Authors: Gilbert Netzer, Stefano Markidis

    Abstract: This paper proposes a descriptive language called QHDL, akin to VHDL, to program gate-based quantum computing systems. Unlike other popular quantum programming languages, QHDL targets low-level quantum computing programming and aims to provide a common framework for programming FPGAs and gate-based quantum computing systems. The paper presents an initial implementation and design principles of the…

    Submitted 16 May, 2023; originally announced May 2023.

    Comments: 4 pages, 7 figures, to be published in Proceedings of the 20th ACM International Conference on Computing Frontiers, May 9-11, 2023, Bologna, Italy

  17. arXiv:2305.04635  [pdf, other]

    math.NA cs.DC

    Parallel Cholesky Factorization for Banded Matrices using OpenMP Tasks

    Authors: Felix Liu, Albin Fredriksson, Stefano Markidis

    Abstract: Cholesky factorization is a widely used method for solving linear systems involving symmetric, positive-definite matrices, and can be an attractive choice in applications where a high degree of numerical stability is needed. One such application is numerical optimization, where direct methods for solving linear systems are widely used and often a significant performance bottleneck. An example wher…

    Submitted 8 May, 2023; originally announced May 2023.

  18. Making Applications Faster by Asynchronous Execution: Slowing Down Processes or Relaxing MPI Collectives

    Authors: Ayesha Afzal, Georg Hager, Stefano Markidis, Gerhard Wellein

    Abstract: Comprehending the performance bottlenecks at the core of the intricate hardware-software interactions exhibited by highly parallel programs on HPC clusters is crucial. This paper sheds light on the issue of automatically asynchronous MPI communication in memory-bound parallel programs on multicore clusters and how it can be facilitated. For instance, slowing down MPI processes by deliberate inject…

    Submitted 24 February, 2023; v1 submitted 23 February, 2023; originally announced February 2023.

    Comments: 18 pages, 14 figures, 7 tables. Corrected Fig. 4 layout

  19. On Physics-Informed Neural Networks for Quantum Computers

    Authors: Stefano Markidis

    Abstract: Physics-Informed Neural Networks (PINN) emerged as a powerful tool for solving scientific computing problems, ranging from the solution of Partial Differential Equations to data assimilation tasks. One of the advantages of using PINN is to leverage the usage of Machine Learning computational frameworks relying on the combined usage of CPUs and co-processors, such as accelerators, to achieve maximu…

    Submitted 17 October, 2022; v1 submitted 28 September, 2022; originally announced September 2022.

    Comments: Updated the previous work section and abstract, fixed typos, and changed the title

    Journal ref: https://www.frontiersin.org/articles/10.3389/fams.2022.1036711/full

  20. arXiv:2208.13658  [pdf, other]

    physics.comp-ph cs.PF

    Breaking Down the Parallel Performance of GROMACS, a High-Performance Molecular Dynamics Software

    Authors: Måns I. Andersson, N. Arul Murugan, Artur Podobas, Stefano Markidis

    Abstract: GROMACS is one of the most widely used HPC software packages using the Molecular Dynamics (MD) simulation technique. In this work, we quantify GROMACS parallel performance using different configurations, HPC systems, and FFT libraries (FFTW, Intel MKL FFT, and FFT PACK). We break down the cost of each GROMACS computational phase and identify non-scalable stages, such as MPI communication during th…

    Submitted 29 August, 2022; originally announced August 2022.

  21. Distributed Objective Function Evaluation for Optimization of Radiation Therapy Treatment Plans

    Authors: Felix Liu, Måns I. Andersson, Albin Fredriksson, Stefano Markidis

    Abstract: The modern workflow for radiation therapy treatment planning involves mathematical optimization to determine optimal treatment machine parameters for each patient case. The optimization problems can be computationally expensive, requiring iterative optimization algorithms to solve. In this work, we investigate a method for distributing the calculation of objective functions and gradients for radia…

    Submitted 24 August, 2022; originally announced August 2022.

    Comments: Accepted for publication at the PPAM22 conference

    Journal ref: Parallel Processing and Applied Mathematics. PPAM 2022. Lecture Notes in Computer Science, vol 13826. Springer, Cham

  22. arXiv:2207.07098  [pdf, other]

    cs.MS cs.CE cs.DC physics.flu-dyn

    Large-Scale Direct Numerical Simulations of Turbulence Using GPUs and Modern Fortran

    Authors: Martin Karp, Daniele Massaro, Niclas Jansson, Alistair Hart, Jacob Wahlgren, Philipp Schlatter, Stefano Markidis

    Abstract: We present our approach to making direct numerical simulations of turbulence with applications in sustainable shipping. We use modern Fortran and the spectral element method to leverage and scale on supercomputers powered by the Nvidia A100 and the recent AMD Instinct MI250X GPUs, while still providing support for user software developed in Fortran. We demonstrate the efficiency of our approach by…

    Submitted 23 June, 2022; originally announced July 2022.

    Comments: 13 pages, 7 figures

    ACM Class: G.4; J.2

  23. arXiv:2207.06803  [pdf, other]

    cs.MS cs.CL

    FFTc: An MLIR Dialect for Developing HPC Fast Fourier Transform Libraries

    Authors: Yifei He, Artur Podobas, Måns I. Andersson, Stefano Markidis

    Abstract: Discrete Fourier Transform (DFT) libraries are one of the most critical software components for scientific computing. Inspired by FFTW, a widely used library for DFT HPC calculations, we apply compiler technologies for the development of HPC Fourier transform libraries. In this work, we introduce FFTc, a domain-specific language, based on Multi-Level Intermediate Representation (MLIR), for express…

    Submitted 26 July, 2022; v1 submitted 14 July, 2022; originally announced July 2022.

  24. arXiv:2206.14103  [pdf, other]

    cs.DC

    Workflows to driving high-performance interactive supercomputing for urgent decision making

    Authors: Nick Brown, Rupert Nash, Gordon Gibb, Evgenij Belikov, Artur Podobas, Wei Der Chien, Stefano Markidis, Markus Flatken, Andreas Gerndt

    Abstract: Interactive urgent computing is a small but growing user of supercomputing resources. However there are numerous technical challenges that must be overcome to make supercomputers fully suited to the wide range of urgent workloads which could benefit from the computational power delivered by such instruments. An important question is how to connect the different components of an urgent workload; na…

    Submitted 28 June, 2022; originally announced June 2022.

    Comments: Pre-print of paper accepted to the InteractiveHPC workshop of ISC2022

  25. Exploring Techniques for the Analysis of Spontaneous Asynchronicity in MPI-Parallel Applications

    Authors: Ayesha Afzal, Georg Hager, Gerhard Wellein, Stefano Markidis

    Abstract: This paper studies the utility of using data analytics and machine learning techniques for identifying, classifying, and characterizing the dynamics of large-scale parallel (MPI) programs. To this end, we run microbenchmarks and realistic proxy applications with the regular compute-communicate structure on two different supercomputing platforms and choose the per-process performance and MPI time p…

    Submitted 27 May, 2022; originally announced May 2022.

    Comments: 12 pages, 9 figures, 1 table

  26. arXiv:2112.00116  [pdf, ps, other]

    cs.DC

    A Review on Parallel Virtual Screening Softwares for High Performance Computers

    Authors: Natarajan Arul Murugan, Artur Podobas, Davide Gadioli, Emanuele Vitali, Gianluca Palermo, Stefano Markidis

    Abstract: Drug discovery is the most expensive, time demanding and challenging project in biopharmaceutical companies which aims at the identification and optimization of lead compounds from large-sized chemical libraries. The lead compounds should have high affinity binding and specificity for a target associated with a disease and in addition they should have favorable pharmacodynamic and pharmacokinetic…

    Submitted 30 November, 2021; originally announced December 2021.

    Comments: Submitted to Pharmaceuticals, MDPI journal

  27. arXiv:2111.05654  [pdf, other]

    cs.DC

    Utilising urgent computing to tackle the spread of mosquito-borne diseases

    Authors: Nick Brown, Rupert Nash, Piero Poletti, Giorgio Guzzetta, Mattia Manica, Agnese Zardini, Markus Flatken, Jules Vidal, Charles Gueunet, Evgenij Belikov, Julien Tierny, Artur Podobas, Wei Der Chien, Stefano Markidis, Andreas Gerndt

    Abstract: It is estimated that around 80% of the world's population live in areas susceptible to at-least one major vector borne disease, and approximately 20% of global communicable diseases are spread by mosquitoes. Furthermore, the outbreaks of such diseases are becoming more common and widespread, with much of this driven in recent years by socio-demographic and climatic factors. These trends are causi…

    Submitted 10 November, 2021; originally announced November 2021.

    Comments: Preprint of paper in 2021 IEEE/ACM HPC for Urgent Decision Making (UrgentHPC)

  28. arXiv:2109.03592  [pdf, ps, other]

    cs.DC

    Strong Scaling of OpenACC enabled Nek5000 on several GPU based HPC systems

    Authors: Jonathan Vincent, Jing Gong, Martin Karp, Adam Peplinski, Niclas Jansson, Artur Podobas, Andreas Jocksch, Jie Yao, Fazle Hussain, Stefano Markidis, Matts Karlsson, Dirk Pleiter, Erwin Laure, Philipp Schlatter

    Abstract: We present new results on the strong parallel scaling for the OpenACC-accelerated implementation of the high-order spectral element fluid dynamics solver Nek5000. The test case considered consists of a direct numerical simulation of fully-developed turbulent flow in a straight pipe, at two different Reynolds numbers $Re_τ=360$ and $Re_τ=550$, based on friction velocity and pipe radius. The strong…

    Submitted 4 November, 2021; v1 submitted 8 September, 2021; originally announced September 2021.

    Comments: 9 pages, 8 figures. Submitted to HPC-Asia 2022 conference, updated to address reviewers comments

    ACM Class: G.4; J.2; C.1

  29. A High-Fidelity Flow Solver for Unstructured Meshes on Field-Programmable Gate Arrays

    Authors: Martin Karp, Artur Podobas, Tobias Kenter, Niclas Jansson, Christian Plessl, Philipp Schlatter, Stefano Markidis

    Abstract: The impending termination of Moore's law motivates the search for new forms of computing to continue the performance scaling we have grown accustomed to. Among the many emerging Post-Moore computing candidates, perhaps none is as salient as the Field-Programmable Gate Array (FPGA), which offers the means of specializing and customizing the hardware to the computation at hand. In this work, we de…

    Submitted 2 November, 2021; v1 submitted 27 August, 2021; originally announced August 2021.

    Comments: 12 pages, 3 figures, 3 tables, Accepted to HPC Asia 2022

    ACM Class: G.4; J.2; C.1

  30. arXiv:2107.06676  [pdf, other]

    cs.LG cs.CE cs.DC cs.NE

    Higgs Boson Classification: Brain-inspired BCPNN Learning with StreamBrain

    Authors: Martin Svedin, Artur Podobas, Steven W. D. Chien, Stefano Markidis

    Abstract: One of the most promising approaches for data analysis and exploration of large data sets is Machine Learning techniques that are inspired by brain models. Such methods use alternative learning rules potentially more efficiently than established learning rules. In this work, we focus on the potential of brain-inspired ML for exploiting High-Performance Computing (HPC) resources to solve ML problem…

    Submitted 17 August, 2021; v1 submitted 14 July, 2021; originally announced July 2021.

    Comments: Accepted for publication at The 2nd Workshop on Artificial Intelligence and Machine Learning for Scientific Applications (AI4S 2021)

  31. arXiv:2107.02232  [pdf, other]

    physics.plasm-ph cs.LG physics.comp-ph

    A Deep Learning-Based Particle-in-Cell Method for Plasma Simulations

    Authors: Xavier Aguilar, Stefano Markidis

    Abstract: We design and develop a new Particle-in-Cell (PIC) method for plasma simulations using Deep-Learning (DL) to calculate the electric field from the electron phase space. We train a Multilayer Perceptron (MLP) and a Convolutional Neural Network (CNN) to solve the two-stream instability test. We verify that the DL-based MLP PIC method produces the correct results using the two-stream instability: the…

    Submitted 5 July, 2021; originally announced July 2021.

    Comments: Submitted to AI4S Workshop at Cluster Conference

  32. arXiv:2107.01243  [pdf]

    cs.MS

    Neko: A Modern, Portable, and Scalable Framework for High-Fidelity Computational Fluid Dynamics

    Authors: Niclas Jansson, Martin Karp, Artur Podobas, Stefano Markidis, Philipp Schlatter

    Abstract: Recent trends and advancement in including more diverse and heterogeneous hardware in High-Performance Computing is challenging software developers in their pursuit for good performance and numerical stability. The well-known maxim "software outlives hardware" may no longer necessarily hold true, and developers are today forced to re-factor their codebases to leverage these powerful new systems. C…

    Submitted 2 July, 2021; originally announced July 2021.

  33. arXiv:2106.05373  [pdf, other]

    cs.DC cs.LG cs.NE

    StreamBrain: An HPC Framework for Brain-like Neural Networks on CPUs, GPUs and FPGAs

    Authors: Artur Podobas, Martin Svedin, Steven W. D. Chien, Ivy B. Peng, Naresh Balaji Ravichandran, Pawel Herman, Anders Lansner, Stefano Markidis

    Abstract: The modern deep learning method based on backpropagation has surged in popularity and has been used in multiple domains and application areas. At the same time, there are other -- less-known -- machine learning algorithms with a mature and solid theoretical foundation whose performance remains unexplored. One such example is the brain-like Bayesian Confidence Propagation Neural Network (BCPNN). In…

    Submitted 9 June, 2021; originally announced June 2021.

    Comments: Accepted for publication at the International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies (HEART 2021)

  34. arXiv:2103.09683  [pdf, other]

    cs.DC

    Accelerating Radiation Therapy Dose Calculation with Nvidia GPUs

    Authors: Felix Liu, Niclas Jansson, Artur Podobas, Albin Fredriksson, Stefano Markidis

    Abstract: Radiation Treatment Planning (RTP) is the process of planning the appropriate external beam radiotherapy to combat cancer in human patients. RTP is a complex and compute-intensive task, which often takes a long time (several hours) to compute. Reducing this time allows for higher productivity at clinics and more sophisticated treatment planning, which can materialize in better treatments. The stat…

    Submitted 19 September, 2021; v1 submitted 17 March, 2021; originally announced March 2021.

  35. arXiv:2103.09655  [pdf, other]

    math.NA cs.DC physics.comp-ph

    The Old and the New: Can Physics-Informed Deep-Learning Replace Traditional Linear Solvers?

    Authors: Stefano Markidis

    Abstract: Physics-Informed Neural Networks (PINN) are neural networks encoding the problem governing equations, such as Partial Differential Equations (PDE), as a part of the neural network. PINNs have emerged as a new essential tool to solve various challenging problems, including computing linear systems arising from PDEs, a task for which several traditional methods exist. In this work, we focus first on…

    Submitted 5 July, 2021; v1 submitted 11 March, 2021; originally announced March 2021.

    Comments: Preprint - submitted to Frontiers

  36. arXiv:2010.13463  [pdf]

    cs.DC

    High-Performance Spectral Element Methods on Field-Programmable Gate Arrays

    Authors: Martin Karp, Artur Podobas, Niclas Jansson, Tobias Kenter, Christian Plessl, Philipp Schlatter, Stefano Markidis

    Abstract: Improvements in computer systems have historically relied on two well-known observations: Moore's law and Dennard's scaling. Today, both these observations are ending, forcing computer users, researchers, and practitioners to abandon the general-purpose architectures' comforts in favor of emerging post-Moore systems. Among the most salient of these post-Moore systems is the Field-Programmable Gate…

    Submitted 4 May, 2021; v1 submitted 26 October, 2020; originally announced October 2020.

    Comments: 10 pages, IEEE International Parallel and Distributed Processing Symposium 2021 (IPDPS'21)

    ACM Class: G.4; J.2; C.1

  37. arXiv:2010.05348  [pdf, other]

    physics.comp-ph cs.LG

    Automatic Particle Trajectory Classification in Plasma Simulations

    Authors: Stefano Markidis, Ivy Peng, Artur Podobas, Itthinat Jongsuebchoke, Gabriel Bengtsson, Pawel Herman

    Abstract: Numerical simulations of plasma flows are crucial for advancing our understanding of microscopic processes that drive the global plasma dynamics in fusion devices, space, and astrophysical systems. Identifying and classifying particle trajectories allows us to determine specific on-going acceleration mechanisms, shedding light on essential plasma processes. Our overall goal is to provide a gener…

    Submitted 11 October, 2020; originally announced October 2020.

    Comments: Accepted for publication at AI4S: Workshop on Artificial Intelligence and Machine Learning for Scientific Applications

  38. sputniPIC: an Implicit Particle-in-Cell Code for Multi-GPU Systems

    Authors: Steven W. D. Chien, Jonas Nylund, Gabriel Bengtsson, Ivy B. Peng, Artur Podobas, Stefano Markidis

    Abstract: Large-scale simulations of plasmas are essential for advancing our understanding of fusion devices, space, and astrophysical systems. Particle-in-Cell (PIC) codes have demonstrated their success in simulating numerous plasma phenomena on HPC systems. Today, flagship supercomputers feature multiple GPUs per compute node to achieve unprecedented computing power at high power efficiency. PIC codes re…

    Submitted 10 August, 2020; originally announced August 2020.

    Comments: Accepted for publication at the 32nd International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD 2020)

  39. tf-Darshan: Understanding Fine-grained I/O Performance in Machine Learning Workloads

    Authors: Steven W. D. Chien, Artur Podobas, Ivy B. Peng, Stefano Markidis

    Abstract: Machine Learning applications on HPC systems have been gaining popularity in recent years. The upcoming large scale systems will offer tremendous parallelism for training through GPUs. However, another heavy aspect of Machine Learning is I/O, and this can potentially be a performance bottleneck. TensorFlow, one of the most popular Deep-Learning platforms, now offers a new profiler interface and al…

    Submitted 11 August, 2020; v1 submitted 10 August, 2020; originally announced August 2020.

    Comments: Accepted for publication at the 2020 International Conference on Cluster Computing (CLUSTER 2020)

  40. arXiv:2005.13425  [pdf]

    cs.DC

    Optimization of Tensor-product Operations in Nekbone on GPUs

    Authors: Martin Karp, Niclas Jansson, Artur Podobas, Philipp Schlatter, Stefano Markidis

    Abstract: In the CFD solver Nek5000, the computation is dominated by the evaluation of small tensor operations. Nekbone is a proxy app for Nek5000 and has previously been ported to GPUs with a mixed OpenACC and CUDA approach. In this work, we continue this effort and optimize the main tensor-product operation in Nekbone further. Our optimization is done in CUDA and uses a different, 2D, thread structure to…

    Submitted 27 May, 2020; originally announced May 2020.

    Comments: 4 pages, 4 figures

    ACM Class: G.4; J.2

  41. Performance Evaluation of Advanced Features in CUDA Unified Memory

    Authors: Steven W. D. Chien, Ivy B. Peng, Stefano Markidis

    Abstract: CUDA Unified Memory improves the GPU programmability and also enables GPU memory oversubscription. Recently, two advanced memory features, memory advises and asynchronous prefetch, have been introduced. In this work, we evaluate the new features on two platforms that feature different CPUs, GPUs, and interconnects. We derive a benchmark suite for the experiments and stress the memory system to eva…

    Submitted 21 October, 2019; originally announced October 2019.

    Comments: Accepted for publication at Workshop on Memory Centric High Performance Computing (MCHPC'19) in SC19

  42. arXiv:1908.05715  [pdf, other]

    physics.space-ph cs.LG eess.IV

    Automated classification of plasma regions using 3D particle energy distributions

    Authors: Vyacheslav Olshevsky, Yuri V. Khotyaintsev, Ahmad Lalti, Andrey Divin, Gian Luca Delzanno, Sven Anderzen, Pawel Herman, Steven W. D. Chien, Levon Avanov, Andrew P. Dimmock, Stefano Markidis

    Abstract: We investigate the properties of the ion sky maps produced by the Dual Ion Spectrometers (DIS) from the Fast Plasma Investigation (FPI). We have trained a convolutional neural network classifier to predict four regions crossed by the MMS on the dayside magnetosphere: solar wind, ion foreshock, magnetosheath, and magnetopause using solely DIS spectrograms. The accuracy of the classifier is >98%. We…

    Submitted 21 September, 2021; v1 submitted 15 August, 2019; originally announced August 2019.

    Comments: Accepted to JGR: Space Physics

  43. Posit NPB: Assessing the Precision Improvement in HPC Scientific Applications

    Authors: Steven W. D. Chien, Ivy B. Peng, Stefano Markidis

    Abstract: Floating-point operations can significantly impact the accuracy and performance of scientific applications on large-scale parallel systems. Recently, an emerging floating-point format called Posit has attracted attention as an alternative to the standard IEEE floating-point formats because it could enable higher precision than IEEE formats using the same number of bits. In this work, we first expl…

    Submitted 12 July, 2019; originally announced July 2019.

    Comments: Accepted for publication in PPAM 2019 conference

  44. Multi-GPU Acceleration of the iPIC3D Implicit Particle-in-Cell Code

    Authors: Chaitanya Prasad Sishtla, Steven W. D. Chien, Vyacheslav Olshevsky, Erwin Laure, Stefano Markidis

    Abstract: iPIC3D is a widely used massively parallel Particle-in-Cell code for the simulation of space plasmas. However, its current implementation does not support execution on multiple GPUs. In this paper, we describe the porting of iPIC3D particle mover to GPUs and the optimization steps to increase the performance and parallel scaling on multiple GPUs. We analyze the strong scaling of the mover on two G…

    Submitted 7 April, 2019; originally announced April 2019.

    Comments: Accepted for publication in ICCS 2019

  45. TensorFlow Doing HPC

    Authors: Steven W. D. Chien, Stefano Markidis, Vyacheslav Olshevsky, Yaroslav Bulatov, Erwin Laure, Jeffrey S. Vetter

    Abstract: TensorFlow is a popular emerging open-source programming framework supporting the execution of distributed applications on heterogeneous hardware. While TensorFlow has been initially designed for developing Machine Learning (ML) applications, in fact TensorFlow aims at supporting the development of a much broader range of application kinds that are outside the ML domain and can possibly include HP…

    Submitted 11 March, 2019; originally announced March 2019.

    Comments: Accepted for publication at The Ninth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES'19)

  46. arXiv:1810.04150  [pdf, other]

    cs.DC

    Exploring the Vision Processing Unit as Co-processor for Inference

    Authors: Sergio Rivas-Gomez, Antonio J. Peña, David Moloney, Erwin Laure, Stefano Markidis

    Abstract: The success of the exascale supercomputer is largely debated to remain dependent on novel breakthroughs in technology that effectively reduce the power consumption and thermal dissipation requirements. In this work, we consider the integration of co-processors in high-performance computing (HPC) to enable low-power, seamless computation offloading of certain operations. In particular, we explore t…

    Submitted 9 October, 2018; originally announced October 2018.

  47. arXiv:1810.04146  [pdf, other]

    cs.DC

    Decoupled Strategy for Imbalanced Workloads in MapReduce Frameworks

    Authors: Sergio Rivas-Gomez, Sai Narasimhamurthy, Keeran Brabazon, Oliver Perks, Erwin Laure, Stefano Markidis

    Abstract: In this work, we consider the integration of MPI one-sided communication and non-blocking I/O in HPC-centric MapReduce frameworks. Using a decoupled strategy, we aim to overlap the Map and Reduce phases of the algorithm by allowing processes to communicate and synchronize using solely one-sided operations. Hence, we effectively increase the performance in situations where the workload per process…

    Submitted 9 October, 2018; originally announced October 2018.

  48. arXiv:1810.04110  [pdf, other]

    cs.DC

    MPI Windows on Storage for HPC Applications

    Authors: Sergio Rivas-Gomez, Roberto Gioiosa, Ivy Bo Peng, Gokcen Kestor, Sai Narasimhamurthy, Erwin Laure, Stefano Markidis

    Abstract: Upcoming HPC clusters will feature hybrid memories and storage devices per compute node. In this work, we propose to use the MPI one-sided communication model and MPI windows as a unique interface for programming memory and storage. We describe the design and implementation of MPI storage windows, and present its benefits for out-of-core execution, parallel I/O and fault-tolerance. In addition, we e…

    Submitted 9 October, 2018; originally announced October 2018.

  49. Characterizing Deep-Learning I/O Workloads in TensorFlow

    Authors: Steven W. D. Chien, Stefano Markidis, Chaitanya Prasad Sishtla, Luis Santos, Pawel Herman, Sai Narasimhamurthy, Erwin Laure

    Abstract: The performance of Deep-Learning (DL) computing frameworks relies on the performance of data ingestion and checkpointing. In fact, during the training, a considerably high number of relatively small files are first loaded and pre-processed on CPUs and then moved to an accelerator for computation. In addition, checkpointing and restart operations are carried out to allow DL computing frameworks to resta…

    Submitted 6 October, 2018; originally announced October 2018.

    Comments: Accepted for publication at PDSW-DISCS 2018

  50. The SAGE Project: a Storage Centric Approach for Exascale Computing

    Authors: Sai Narasimhamurthy, Nikita Danilov, Sining Wu, Ganesan Umanesan, Steven Wei-der Chien, Sergio Rivas-Gomez, Ivy Bo Peng, Erwin Laure, Shaun de Witt, Dirk Pleiter, Stefano Markidis

    Abstract: SAGE (Percipient StorAGe for Exascale Data Centric Computing) is a European Commission funded project towards the era of Exascale computing. Its goal is to design and implement a Big Data/Extreme Computing (BDEC) capable infrastructure with an associated software stack. The SAGE system follows a "storage centric" approach as it is capable of storing and processing large data volumes at the Exascale r…

    Submitted 6 July, 2018; originally announced July 2018.

    Comments: Submitted to Computing Frontiers 2018. arXiv admin note: substantial text overlap with arXiv:1805.00556