-
The LBNL Superfacility Project Report
Authors:
Deborah Bard,
Cory Snavely,
Lisa Gerhardt,
Jason Lee,
Becci Totzke,
Katie Antypas,
William Arndt,
Johannes Blaschke,
Suren Byna,
Ravi Cheema,
Shreyas Cholia,
Mark Day,
Bjoern Enders,
Aditi Gaur,
Annette Greiner,
Taylor Groves,
Mariam Kiran,
Quincey Koziol,
Tom Lehman,
Kelly Rowland,
Chris Samuel,
Ashwin Selvarajan,
Alex Sim,
David Skinner,
Laurie Stephey,
et al. (2 additional authors not shown)
Abstract:
The Superfacility model is designed to leverage HPC for experimental science. It is more than simply a model of connected experiment, network, and HPC facilities; it encompasses the full ecosystem of infrastructure, software, tools, and expertise needed to make connected facilities easy to use. The three-year Lawrence Berkeley National Laboratory (LBNL) Superfacility project was initiated in 2019 to coordinate work being performed at LBNL to support this model, and to provide a coherent and comprehensive set of science requirements to drive existing and new work.
A key component of the project was a set of in-depth engagements with eight science teams that represent challenging use cases across the DOE Office of Science. By the close of the project, we had met our goal of enabling these science engagements to demonstrate automated pipelines that analyze data from remote facilities at large scale, without routine human intervention. In several cases, we have gone beyond demonstrations and now provide production-level services. To achieve this goal, the Superfacility team developed tools, infrastructure, and policies for near-real-time computing support, dynamic high-performance networking, data management and movement tools, API-driven automation, HPC-scale notebooks via Jupyter, authentication using Federated Identity, and support for container-based edge services.
The lessons we learned during this project provide a valuable model for future large, complex, cross-disciplinary collaborations. There is a pressing need for a coherent computing infrastructure across national facilities, and LBNL's Superfacility project is a unique model for success in tackling the challenges that will be faced in hardware, software, policies, and services across multiple science domains.
Submitted 27 June, 2022; v1 submitted 23 June, 2022;
originally announced June 2022.
-
Meta-modeling strategy for data-driven forecasting
Authors:
Dominic J. Skinner,
Romit Maulik
Abstract:
Accurately forecasting the weather is a key requirement for climate change mitigation. Data-driven methods offer the ability to make more accurate forecasts, but they lack interpretability and can be expensive to train and deploy if models are not carefully developed. Here, we use two historical climate data sets and tools from machine learning to accurately predict temperature fields. Furthermore, we use low-fidelity models that are cheap to train and evaluate to selectively avoid expensive high-fidelity function evaluations and to uncover seasonal variations in predictive power. This allows for an adaptive training strategy for computationally efficient geophysical emulation.
Submitted 14 November, 2020;
originally announced December 2020.
-
Universal Differential Equations for Scientific Machine Learning
Authors:
Christopher Rackauckas,
Yingbo Ma,
Julius Martensen,
Collin Warner,
Kirill Zubov,
Rohit Supekar,
Dominic Skinner,
Ali Ramadhan,
Alan Edelman
Abstract:
In the context of science, the well-known adage "a picture is worth a thousand words" might well be "a model is worth a thousand datasets." In this manuscript we introduce the SciML software ecosystem as a tool for mixing the information of physical laws and scientific models with data-driven machine learning approaches. We describe a mathematical object, which we denote universal differential equations (UDEs), as the unifying framework connecting the ecosystem. We show how a wide variety of applications, from automatically discovering biological mechanisms to solving high-dimensional Hamilton-Jacobi-Bellman equations, can be phrased and efficiently handled through the UDE formalism and its tooling. We demonstrate the generality of the software tooling to handle stochasticity, delays, and implicit constraints. This funnels the wide variety of SciML applications into a core set of training mechanisms which are highly optimized, stabilized for stiff equations, and compatible with distributed parallelism and GPU accelerators.
Submitted 2 November, 2021; v1 submitted 13 January, 2020;
originally announced January 2020.
-
Standing Together for Reproducibility in Large-Scale Computing: Report on reproducibility@XSEDE
Authors:
Doug James,
Nancy Wilkins-Diehr,
Victoria Stodden,
Dirk Colbry,
Carlos Rosales,
Mark Fahey,
Justin Shi,
Rafael F. Silva,
Kyo Lee,
Ralph Roskies,
Laurence Loewe,
Susan Lindsey,
Rob Kooper,
Lorena Barba,
David Bailey,
Jonathan Borwein,
Oscar Corcho,
Ewa Deelman,
Michael Dietze,
Benjamin Gilbert,
Jan Harkes,
Seth Keele,
Praveen Kumar,
Jong Lee,
Erika Linke,
et al. (30 additional authors not shown)
Abstract:
This is the final report on reproducibility@xsede, a one-day workshop held in conjunction with XSEDE14, the annual conference of the Extreme Science and Engineering Discovery Environment (XSEDE). The workshop's discussion-oriented agenda focused on reproducibility in large-scale computational research. Two important themes capture the spirit of the workshop submissions and discussions: (1) organizational stakeholders, especially supercomputer centers, are in a unique position to promote, enable, and support reproducible research; and (2) individual researchers should conduct each experiment as though someone will replicate that experiment. Participants documented numerous issues, questions, technologies, practices, and potentially promising initiatives emerging from the discussion, but also highlighted four areas of particular interest to XSEDE: (1) documentation and training that promotes reproducible research; (2) system-level tools that provide build- and run-time information at the level of the individual job; (3) the need to model best practices in research collaborations involving XSEDE staff; and (4) continued work on gateways and related technologies. In addition, an intriguing question emerged from the day's interactions: would there be value in establishing an annual award for excellence in reproducible research?
Submitted 2 January, 2015; v1 submitted 17 December, 2014;
originally announced December 2014.
-
Making QCD Lattice Data Accessible and Organized through Advanced Web Interfaces
Authors:
Massimo Di Pierro,
James Hetrick,
Shreyas Cholia,
David Skinner
Abstract:
The Gauge Connection at qcd.nersc.gov is one of the most popular repositories of QCD lattice ensembles. It is used to access 16 TB of archived QCD data from the High Performance Storage System (HPSS) at the National Energy Research Scientific Computing Center (NERSC). Here, we present a new web interface for qcd.nersc.gov that allows physicists to browse and search the data, as well as download individual files or entire ensembles in batch. Our system distinguishes itself from others through its ease of use and web-based workflow.
Submitted 9 December, 2011;
originally announced December 2011.