A Perspective on Foundation Models for the Electric Power Grid

Authors: Hendrik F. Hamann, Thomas Brunschwiler, Blazhe Gjorgiev, Leonardo S. A. Martins, Alban Puech, Anna Varbella, Jonas Weiss, Juan Bernabe-Moreno, Alexandre Blondin Massé, Seong Choi, Ian Foster, Bri-Mathias Hodge, Rishabh Jain, Kibaek Kim, Vincent Mai, François Mirallès, Martin De Montigny, Octavio Ramos-Leaños, Hussein Suprême, Le Xie, El-Nasser S. Youssef, Arnaud Zinflou, Alexander J. Belvi, Ricardo J. Bessa, Bishnu Prasad Bhattari, et al. (2 additional authors not shown)

Abstract: Foundation models (FMs) currently dominate news headlines. They employ advanced deep learning architectures to extract structural information autonomously from vast datasets through self-supervision. The resulting rich representations of complex systems and dynamics can be applied to many downstream applications. Therefore, FMs can find uses in electric power grids, challenged by the energy transition and climate change. In this paper, we call for the development of, and state why we believe in, the potential of FMs for electric grids. We highlight their strengths and weaknesses amidst the challenges of a changing grid. We argue that an FM learning from diverse grid data and topologies could unlock transformative capabilities, pioneering a new approach in leveraging AI to redefine how we manage complexity and uncertainty in the electric grid. Finally, we discuss a power grid FM concept, namely GridFM, based on graph neural networks and show how different downstream tasks benefit.

Submitted 12 July, 2024; originally announced July 2024.

Comments: Lead contact: H.F.H.; Major equal contributors: H.F.H., T.B., B.G., L.S.A.M., A.P., A.V., J.W.; Significant equal contributors: J.B., A.B.M., S.C., I.F., B.H., R.J., K.K., V.M., F.M., M.D.M., O.R., H.S., L.X., E.S.Y., A.Z.; Other equal contributors: A.J.B., R.J.B., B.P.B., J.S., S.S.

arXiv:2407.01764 [pdf, other]

Object Proxy Patterns for Accelerating Distributed Applications

Authors: J. Gregory Pauloski, Valerie Hayot-Sasson, Logan Ward, Alexander Brace, André Bauer, Kyle Chard, Ian Foster

Abstract: Workflow and serverless frameworks have empowered new approaches to distributed application design by abstracting compute resources. However, their typically limited or one-size-fits-all support for advanced data flow patterns leaves optimization to the application programmer -- optimization that becomes more difficult as data become larger. The transparent object proxy, which provides wide-area references that can resolve to data regardless of location, has been demonstrated as an effective low-level building block in such situations. Here we propose three high-level proxy-based programming patterns -- distributed futures, streaming, and ownership -- that make the power of the proxy pattern usable for more complex and dynamic distributed program structures. We motivate these patterns via careful review of application requirements and describe implementations of each pattern. We evaluate our implementations through a suite of benchmarks and by applying them in three substantial scientific applications, in which we demonstrate substantial improvements in runtime, throughput, and memory usage.

Submitted 1 July, 2024; originally announced July 2024.

arXiv:2405.15828 [pdf, other]

Oil & Water? Diffusion of AI Within and Across Scientific Fields

Authors: Eamon Duede, William Dolan, André Bauer, Ian Foster, Karim Lakhani

Abstract: This study empirically investigates claims of the increasing ubiquity of artificial intelligence (AI) within roughly 80 million research publications across 20 diverse scientific fields, by examining the change in scholarly engagement with AI from 1985 through 2022. We observe exponential growth, with AI-engaged publications increasing approximately thirteenfold (13x) across all fields, suggesting a dramatic shift from niche to mainstream. Moreover, we provide the first empirical examination of the distribution of AI-engaged publications across publication venues within individual fields, with results that reveal a broadening of AI engagement within disciplines. While this broadening engagement suggests a move toward greater disciplinary integration in every field, increased ubiquity is associated with a semantic tension between AI-engaged research and more traditional disciplinary research. Through an analysis of tens of millions of document embeddings, we observe a complex interplay between AI-engaged and non-AI-engaged research within and across fields, suggesting that increasing ubiquity is something of an oil-and-water phenomenon -- AI-engaged work is spreading out over fields, but not mixing well with non-AI-engaged work.

Submitted 23 May, 2024; originally announced May 2024.

arXiv:2404.19717 [pdf, other]

Automated, Reliable, and Efficient Continental-Scale Replication of 7.3 Petabytes of Climate Simulation Data: A Case Study

Authors: Lukasz Lacinski, Lee Liming, Steven Turoscy, Cameron Harr, Kyle Chard, Eli Dart, Paul Durack, Sasha Ames, Forrest M. Hoffman, Ian T. Foster

Abstract: We report on our experiences replicating 7.3 petabytes (PB) of Earth System Grid Federation (ESGF) climate simulation data from Lawrence Livermore National Laboratory (LLNL) in California to Argonne National Laboratory (ANL) in Illinois and Oak Ridge National Laboratory (ORNL) in Tennessee. This movement of some 29 million files, twice, undertaken in order to establish new ESGF nodes at ANL and ORNL, was performed largely automatically by a simple replication tool, a script that invoked Globus to transfer large bundles of files while tracking progress in a database. Under the covers, Globus organized transfers to make efficient use of the high-speed Energy Sciences network (ESnet) and the data transfer nodes deployed at participating sites, and also addressed security, integrity checking, and recovery from a variety of transient failures. This success demonstrates the considerable benefits that can accrue from the adoption of performant data replication infrastructure.

Submitted 30 April, 2024; originally announced April 2024.

arXiv:2404.15668 [pdf, other]

doi 10.1145/3629526.3645035

MalleTrain: Deep Neural Network Training on Unfillable Supercomputer Nodes

Authors: Xiaolong Ma, Feng Yan, Lei Yang, Ian Foster, Michael E. Papka, Zhengchun Liu, Rajkumar Kettimuthu

Abstract: First-come first-serve scheduling can leave a substantial fraction (up to 10%) of supercomputer nodes transiently idle. Recognizing that such unfilled nodes are well-suited for deep neural network (DNN) training, due to the flexible nature of DNN training tasks, Liu et al. proposed that the re-scaling of DNN training tasks to fit gaps in schedules be formulated as a mixed-integer linear programming (MILP) problem, and demonstrated via simulation the potential benefits of the approach. Here, we introduce MalleTrain, a system that provides the first practical implementation of this approach and that furthermore generalizes it by allowing its use even for DNN training applications for which model information is unknown before runtime. Key to this latter innovation is the use of a lightweight online job profiling advisor (JPA) to collect critical scalability information for DNN jobs -- information that it then employs to optimize resource allocations dynamically, in real time. We describe the MalleTrain architecture and present the results of a detailed experimental evaluation on a supercomputer GPU cluster and several representative DNN training workloads, including neural architecture search and hyperparameter optimization. Our results not only confirm the practical feasibility of leveraging idle supercomputer nodes for DNN training but improve significantly on prior results, improving training throughput by up to 22.3% without requiring users to provide job scalability information.

Submitted 24 April, 2024; originally announced April 2024.

arXiv:2403.19257 [pdf, other]

UniFaaS: Programming across Distributed Cyberinfrastructure with Federated Function Serving

Authors: Yifei Li, Ryan Chard, Yadu Babuji, Kyle Chard, Ian Foster, Zhuozhao Li

Abstract: Modern scientific applications are increasingly decomposable into individual functions that may be deployed across distributed and diverse cyberinfrastructure such as supercomputers, clouds, and accelerators. Such applications call for new approaches to programming, distributed execution, and function-level management. We present UniFaaS, a parallel programming framework that relies on a federated function-as-a-service (FaaS) model to enable composition of distributed, scalable, and high-performance scientific workflows, and to support fine-grained function-level management. UniFaaS provides a unified programming interface to compose dynamic task graphs with transparent wide-area data management. UniFaaS exploits an observe-predict-decide approach to efficiently map workflow tasks to target heterogeneous and dynamic resources. We propose a dynamic heterogeneity-aware scheduling algorithm that employs a delay mechanism and a re-scheduling mechanism to accommodate dynamic resource capacity. Our experiments show that UniFaaS can efficiently execute workflows across computing resources with minimal scheduling overhead. We show that UniFaaS can improve the performance of a real-world drug screening workflow by as much as 22.99% when employing an additional 19.48% of resources and a montage workflow by 54.41% when employing an additional 47.83% of resources across multiple distributed clusters, in contrast to using a single cluster.

Submitted 28 March, 2024; originally announced March 2024.

Comments: 13 pages, 13 figures, IPDPS2024

arXiv:2403.06077 [pdf, other]

Steering a Fleet: Adaptation for Large-Scale, Workflow-Based Experiments

Authors: Jim Pruyne, Valerie Hayot-Sasson, Weijian Zheng, Ryan Chard, Justin M. Wozniak, Tekin Bicer, Kyle Chard, Ian T. Foster

Abstract: Experimental science is increasingly driven by instruments that produce vast volumes of data and thus a need to manage, compute, describe, and index this data. High performance and distributed computing provide the means of addressing the computing needs; however, in practice, the variety of actions required and the distributed set of resources involved require sophisticated "flows" defining the steps to be performed on data. As each scan or measurement is performed by an instrument, a new instance of the flow is initiated, resulting in a "fleet" of concurrently running flows, with the overall goal to process all the data collected during a potentially long-running experiment. During the course of the experiment, each flow may need to adapt its execution due to changes in the environment, such as computational or storage resource availability, or based on the progress of the fleet as a whole, such as completion or discovery of an intermediate result leading to a change in subsequent flows' behavior. We introduce a cloud-based decision engine, Braid, which flows consult during execution to query their run-time environment and coordinate with other flows within their fleet. Braid accepts streams of measurements taken from the run-time environment or from within flow runs which can then be statistically aggregated and compared to other streams to determine a strategy to guide flow execution. For example, queue lengths in execution environments can be used to direct a flow to run computations in one environment or another, or experiment progress as measured by individual flows can be aggregated to determine the progress and subsequent direction of the flows within a fleet. We describe Braid, its interface, implementation and performance characteristics. We further show through examples and experience modifying an existing scientific flow how Braid is used to make adaptable flows.

Submitted 9 March, 2024; originally announced March 2024.

arXiv:2312.06592 [pdf, other]

Flexible visual prompts for in-context learning in computer vision

Authors: Thomas Foster, Ioana Croitoru, Robert Dorfman, Christoffer Edlund, Thomas Varsavsky, Jon Almazán

Abstract: In this work, we address in-context learning (ICL) for the task of image segmentation, introducing a novel approach that adapts a modern Video Object Segmentation (VOS) technique for visual in-context learning. This adaptation is inspired by the VOS method's ability to efficiently and flexibly learn objects from a few examples. Through evaluations across a range of support set sizes and on diverse segmentation datasets, our method consistently surpasses existing techniques. Notably, it excels with data containing classes not encountered during training. Additionally, we propose a technique for support set selection, which involves choosing the most relevant images to include in this set. By employing support set selection, the performance increases for all tested methods without the need for additional training or prompt tuning. The code can be found at https://github.com/v7labs/XMem_ICL/.

Submitted 11 December, 2023; originally announced December 2023.

arXiv:2312.03989 [pdf, other]

Rapid detection of rare events from in situ X-ray diffraction data using machine learning

Authors: Weijian Zheng, Jun-Sang Park, Peter Kenesei, Ahsan Ali, Zhengchun Liu, Ian T. Foster, Nicholas Schwarz, Rajkumar Kettimuthu, Antonino Miceli, Hemant Sharma

Abstract: High-energy X-ray diffraction methods can non-destructively map the 3D microstructure and associated attributes of metallic polycrystalline engineering materials in their bulk form. These methods are often combined with external stimuli such as thermo-mechanical loading to take snapshots over time of the evolving microstructure and attributes. However, the extreme data volumes and the high costs of traditional data acquisition and reduction approaches pose a barrier to quickly extracting actionable insights and improving the temporal resolution of these snapshots. Here we present a fully automated technique capable of rapidly detecting the onset of plasticity in high-energy X-ray microscopy data. Our technique is computationally faster by at least 50 times than the traditional approaches and works for data sets that are up to 9 times sparser than a full data set. This new technique leverages self-supervised image representation learning and clustering to transform massive data into compact, semantic-rich representations of visually salient characteristics (e.g., peak shapes). These characteristics can be a rapid indicator of anomalous events such as changes in diffraction peak shapes. We anticipate that this technique will provide just-in-time actionable information to drive smarter experiments that effectively deploy multi-modal X-ray diffraction methods that span many decades of length scales.

Submitted 6 December, 2023; originally announced December 2023.

arXiv:2312.03876 [pdf, other]

Scaling transformer neural networks for skillful and reliable medium-range weather forecasting

Authors: Tung Nguyen, Rohan Shah, Hritik Bansal, Troy Arcomano, Sandeep Madireddy, Romit Maulik, Veerabhadra Kotamarthi, Ian Foster, Aditya Grover

Abstract: Weather forecasting is a fundamental problem for anticipating and mitigating the impacts of climate change. Recently, data-driven approaches for weather forecasting based on deep learning have shown great promise, achieving accuracies that are competitive with operational systems. However, those methods often employ complex, customized architectures without sufficient ablation analysis, making it difficult to understand what truly contributes to their success. Here we introduce Stormer, a simple transformer model that achieves state-of-the-art performance on weather forecasting with minimal changes to the standard transformer backbone. We identify the key components of Stormer through careful empirical analyses, including weather-specific embedding, randomized dynamics forecast, and pressure-weighted loss. At the core of Stormer is a randomized forecasting objective that trains the model to forecast the weather dynamics over varying time intervals. During inference, this allows us to produce multiple forecasts for a target lead time and combine them to obtain better forecast accuracy. On WeatherBench 2, Stormer performs competitively at short to medium-range forecasts and outperforms current methods beyond 7 days, while requiring orders-of-magnitude less training data and compute. Additionally, we demonstrate Stormer's favorable scaling properties, showing consistent improvements in forecast accuracy with increases in model size and training tokens. Code and checkpoints will be made publicly available.

Submitted 6 December, 2023; originally announced December 2023.

arXiv:2310.18948 [pdf, other]

Multi-Path Long-Term Vessel Trajectories Forecasting with Probabilistic Feature Fusion for Problem Shifting

Authors: Gabriel Spadon, Jay Kumar, Derek Eden, Josh van Berkel, Tom Foster, Amilcar Soares, Ronan Fablet, Stan Matwin, Ronald Pelot

Abstract: This paper addresses the challenge of boosting the precision of multi-path long-term vessel trajectory forecasting on engineered sequences of Automatic Identification System (AIS) data using feature fusion for problem shifting. We have developed a deep auto-encoder model and a phased framework approach to predict the next 12 hours of vessel trajectories using 1 to 3 hours of AIS data as input. To this end, we fuse the spatiotemporal features from the AIS messages with probabilistic features engineered from historical AIS data referring to potential routes and destinations. As a result, we reduce the forecasting uncertainty by shifting the problem into a trajectory reconstruction problem. The probabilistic features have an F1-Score of approximately 85% and 75% for the vessel route and destination prediction, respectively. Under such circumstances, we achieved an R2 Score of over 98% with different layer structures and varying feature combinations; the high R2 Score is a natural outcome of the well-defined shipping lanes in the study region. However, our proposal stands out among competing approaches as it demonstrates the capability of complex decision-making during turnings and route selection. Furthermore, we have shown that our model achieves more accurate forecasting with average and median errors of 11km and 6km, respectively, a 25% improvement from the current state-of-the-art approaches. The resulting model from this proposal is deployed as part of a broader Decision Support System to safeguard whales by preventing the risk of vessel-whale collisions under the smartWhales initiative and acting on the Gulf of St. Lawrence in Atlantic Canada.

Submitted 10 July, 2024; v1 submitted 29 October, 2023; originally announced October 2023.

arXiv:2310.16270 [pdf, other]

Attention Lens: A Tool for Mechanistically Interpreting the Attention Head Information Retrieval Mechanism

Authors: Mansi Sakarvadia, Arham Khan, Aswathy Ajith, Daniel Grzenda, Nathaniel Hudson, André Bauer, Kyle Chard, Ian Foster

Abstract: Transformer-based Large Language Models (LLMs) are the state-of-the-art for natural language tasks. Recent work has attempted to decode, by reverse engineering the role of linear layers, the internal mechanisms by which LLMs arrive at their final predictions for text completion tasks. Yet little is known about the specific role of attention heads in producing the final token prediction. We propose Attention Lens, a tool that enables researchers to translate the outputs of attention heads into vocabulary tokens via learned attention-head-specific transformations called lenses. Preliminary findings from our trained lenses indicate that attention heads play highly specialized roles in language models. The code for Attention Lens is available at github.com/msakarvadia/AttentionLens.

Submitted 24 October, 2023; originally announced October 2023.

arXiv:2310.00510 [pdf, other]

Exploring Benchmarks for Self-Driving Labs using Color Matching

Authors: Tobias Ginsburg, Kyle Hippe, Ryan Lewis, Doga Ozgulbas, Aileen Cleary, Rory Butler, Casey Stone, Abraham Stroka, Ian Foster

Abstract: Self Driving Labs (SDLs) that combine automation of experimental procedures with autonomous decision making are gaining popularity as a means of increasing the throughput of scientific workflows. The task of identifying quantities of supplied colored pigments that match a target color, the color matching problem, provides a simple and flexible SDL test case, as it requires experiment proposal, sample creation, and sample analysis, three common components in autonomous discovery applications. We present a robotic solution to the color matching problem that allows for fully autonomous execution of a color matching protocol. Our solution leverages the WEI science factory platform to enable portability across different robotic hardware, the use of alternative optimization methods for continuous refinement, and automated publication of results for experiment tracking and post-hoc analysis.

Submitted 30 September, 2023; originally announced October 2023.

arXiv:2309.05605 [pdf, other]

Memory Injections: Correcting Multi-Hop Reasoning Failures during Inference in Transformer-Based Language Models

Authors: Mansi Sakarvadia, Aswathy Ajith, Arham Khan, Daniel Grzenda, Nathaniel Hudson, André Bauer, Kyle Chard, Ian Foster

Abstract: Answering multi-hop reasoning questions requires retrieving and synthesizing information from diverse sources. Large Language Models (LLMs) struggle to perform such reasoning consistently. Here we propose an approach to pinpoint and rectify multi-hop reasoning failures through targeted memory injections on LLM attention heads. First, we analyze the per-layer activations of GPT-2 models in response to single and multi-hop prompts. We then propose a mechanism that allows users to inject pertinent prompt-specific information, which we refer to as "memories," at critical LLM locations during inference. By thus enabling the LLM to incorporate additional relevant information during inference, we enhance the quality of multi-hop prompt completions. We show empirically that a simple, efficient, and targeted memory injection into a key attention layer can often increase the probability of the desired next token in multi-hop tasks, by up to 424%.

Submitted 28 February, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

Comments: Oral Presentation at BlackboxNLP Workshop at EMNLP 2023

arXiv:2308.13701 [pdf, other]

Linking the Dynamic PicoProbe Analytical Electron-Optical Beam Line / Microscope to Supercomputers

Authors: Alexander Brace, Rafael Vescovi, Ryan Chard, Nickolaus D. Saint, Arvind Ramanathan, Nestor J. Zaluzec, Ian Foster

Abstract: The Dynamic PicoProbe at Argonne National Laboratory is undergoing upgrades that will enable it to produce up to 100s of GB of data per day. While this data is highly important for both fundamental science and industrial applications, there is currently limited on-site infrastructure to handle these high-volume data streams. We address this problem by providing a software architecture capable of supporting large-scale data transfers to the neighboring supercomputers at the Argonne Leadership Computing Facility. To prepare for future scientific workflows, we implement two instructive use cases for hyperspectral and spatiotemporal datasets, which include: (i) off-site data transfer, (ii) machine learning/artificial intelligence and traditional data analysis approaches, and (iii) automatic metadata extraction and cataloging of experimental results. This infrastructure supports expected workloads and also provides domain scientists the ability to reinterrogate data from past experiments to yield additional scientific value and derive new insights.

Submitted 25 August, 2023; originally announced August 2023.

arXiv:2308.09793 [pdf, other]

Towards a Modular Architecture for Science Factories

Authors: Rafael Vescovi, Tobias Ginsburg, Kyle Hippe, Doga Ozgulbas, Casey Stone, Abraham Stroka, Rory Butler, Ben Blaiszik, Tom Brettin, Kyle Chard, Mark Hereld, Arvind Ramanathan, Rick Stevens, Aikaterini Vriza, Jie Xu, Qingteng Zhang, Ian Foster

Abstract: Advances in robotic automation, high-performance computing (HPC), and artificial intelligence (AI) encourage us to conceive of science factories: large, general-purpose computation- and AI-enabled self-driving laboratories (SDLs) with the generality and scale needed both to tackle large discovery problems and to support thousands of scientists. Science factories require modular hardware and software that can be replicated for scale and (re)configured to support many applications. To this end, we propose a prototype modular science factory architecture in which reconfigurable modules encapsulating scientific instruments are linked with manipulators to form workcells, that can themselves be combined to form larger assemblages, and linked with distributed computing for simulation, AI model training and inference, and related tasks. Workflows that perform sets of actions on modules can be specified, and various applications, comprising workflows plus associated computational and data manipulation steps, can be run concurrently. We report on our experiences prototyping this architecture and applying it in experiments involving 15 different robotic apparatus, five applications (one in education, two in biology, two in materials), and a variety of workflows, across four laboratories. We describe the reuse of modules, workcells, and workflows in different applications, the migration of applications between workcells, and the use of digital twins, and suggest directions for future work aimed at yet more generality and scalability. Code and data are available at https://ad-sdl.github.io/wei2023 and in the Supplementary Information.

Submitted 17 October, 2023; v1 submitted 18 August, 2023; originally announced August 2023.

arXiv:2306.08695 [pdf, other]

doi 10.1038/s42004-023-01090-2

A generative artificial intelligence framework based on a molecular diffusion model for the design of metal-organic frameworks for carbon capture

Authors: Hyun Park, Xiaoli Yan, Ruijie Zhu, E. A. Huerta, Santanu Chaudhuri, Donny Cooper, Ian Foster, Emad Tajkhorshid

Abstract: Metal-organic frameworks (MOFs) exhibit great promise for CO2 capture. However, finding the best performing materials poses computational and experimental grand challenges in view of the vast chemical space of potential building blocks. Here, we introduce GHP-MOFassemble, a generative artificial intelligence (AI), high performance framework for the rational and accelerated design of MOFs with high CO2 adsorption capacity and synthesizable linkers. GHP-MOFassemble generates novel linkers, assembled with one of three pre-selected metal nodes (Cu paddlewheel, Zn paddlewheel, Zn tetramer) into MOFs in a primitive cubic topology. GHP-MOFassemble screens and validates AI-generated MOFs for uniqueness, synthesizability, and structural validity; uses molecular dynamics simulations to study their stability and chemical consistency; and uses crystal graph neural networks and Grand Canonical Monte Carlo simulations to quantify their CO2 adsorption capacities. We present the top six AI-generated MOFs with CO2 capacities greater than 2 mmol/g, i.e., higher than 96.9% of structures in the hypothetical MOF dataset.

Submitted 12 March, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

Comments: 25 pages, 17 figures, 6 tables, accepted to Nature Communications Chemistry. This work was awarded the HPCwire 2023 Editors' Choice Awards for Best Use of High Performance Data Analytics & Artificial Intelligence; see https://www.hpcwire.com/2023-readers-editors-choice-data-analytics-ai/

ACM Class: I.2

Journal ref: Commun Chem 7, 21 (2024)

arXiv:2306.06283 [pdf, other]

doi 10.1039/D3DD00113J

14 Examples of How LLMs Can Transform Materials Science and Chemistry: A Reflection on a Large Language Model Hackathon

Authors: Kevin Maik Jablonka, Qianxiang Ai, Alexander Al-Feghali, Shruti Badhwar, Joshua D. Bocarsly, Andres M Bran, Stefan Bringuier, L. Catherine Brinson, Kamal Choudhary, Defne Circi, Sam Cox, Wibe A. de Jong, Matthew L. Evans, Nicolas Gastellu, Jerome Genzling, María Victoria Gil, Ankur K. Gupta, Zhi Hong, Alishba Imran, Sabine Kruschwitz, Anne Labarre, Jakub Lála, Tao Liu, Steven Ma, Sauradeep Majumdar, et al. (28 additional authors not shown)

Abstract: Large-language models (LLMs) such as GPT-4 caught the interest of many scientists. Recent studies suggested that these models could be useful in chemistry and materials science. To explore these possibilities, we organized a hackathon. This article chronicles the projects built as part of this hackathon. Participants employed LLMs for various applications, including predicting properties of molecules and materials, designing novel interfaces for tools, extracting knowledge from unstructured data, and developing new educational applications. The diverse topics and the fact that working prototypes could be generated in less than two days highlight that LLMs will profoundly impact the future of our fields. The rich collection of ideas and projects also indicates that the applications of LLMs are not limited to materials science and chemistry but offer potential benefits to a wide range of scientific disciplines.

Submitted 14 July, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

arXiv:2305.09593 [pdf, other]

Accelerating Communications in Federated Applications with Transparent Object Proxies

Authors: J. Gregory Pauloski, Valerie Hayot-Sasson, Logan Ward, Nathaniel Hudson, Charlie Sabino, Matt Baughman, Kyle Chard, Ian Foster

Abstract: Advances in networks, accelerators, and cloud services encourage programmers to reconsider where to compute -- such as when fast networks make it cost-effective to compute on remote accelerators despite added latency. Workflow and cloud-hosted serverless computing frameworks can manage multi-step computations spanning federated collections of cloud, high-performance computing (HPC), and edge systems, but passing data among computational steps via cloud storage can incur high costs. Here, we overcome this obstacle with a new programming paradigm that decouples control flow from data flow by extending the pass-by-reference model to distributed applications. We describe ProxyStore, a system that implements this paradigm by providing object proxies that act as wide-area object references with just-in-time resolution. This proxy model enables data producers to communicate data unilaterally, transparently, and efficiently to both local and remote consumers. We demonstrate the benefits of this model with synthetic benchmarks and real-world scientific applications, running across various computing platforms.

Submitted 29 August, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

Comments: Accepted for publication at the International Conference for High Performance Computing, Networking, Storage and Analysis (SC23)

arXiv:2210.08973 [pdf, ps, other]

doi 10.1038/s41597-023-02298-6

FAIR for AI: An interdisciplinary and international community building perspective

Authors: E. A. Huerta, Ben Blaiszik, L. Catherine Brinson, Kristofer E. Bouchard, Daniel Diaz, Caterina Doglioni, Javier M. Duarte, Murali Emani, Ian Foster, Geoffrey Fox, Philip Harris, Lukas Heinrich, Shantenu Jha, Daniel S. Katz, Volodymyr Kindratenko, Christine R. Kirkpatrick, Kati Lassila-Perini, Ravi K. Madduri, Mark S. Neubauer, Fotis E. Psomopoulos, Avik Roy, Oliver Rübel, Zhizhen Zhao, Ruike Zhu

Abstract: A foundational set of findable, accessible, interoperable, and reusable (FAIR) principles were proposed in 2016 as prerequisites for proper data management and stewardship, with the goal of enabling the reusability of scholarly data. The principles were also meant to apply to other digital assets, at a high level, and over time, the FAIR guiding principles have been re-interpreted or extended to include the software, tools, algorithms, and workflows that produce data. FAIR principles are now being adapted in the context of AI models and datasets. Here, we present the perspectives, vision, and experiences of researchers from different countries, disciplines, and backgrounds who are leading the definition and adoption of FAIR principles in their communities of practice, and discuss outcomes that may result from pursuing and incentivizing FAIR AI research. The material for this report builds on the FAIR for AI Workshop held at Argonne National Laboratory on June 7, 2022.

Submitted 1 August, 2023; v1 submitted 30 September, 2022; originally announced October 2022.

Comments: 10 pages, comments welcome!; v2: 12 pages, accepted to Scientific Data

ACM Class: I.2.0; E.0

Journal ref: Scientific Data 10, 487 (2023)

arXiv:2209.11631 [pdf, other]

doi 10.1109/TPDS.2022.3208767

funcX: Federated Function as a Service for Science

Authors: Zhuozhao Li, Ryan Chard, Yadu Babuji, Ben Galewsky, Tyler Skluzacek, Kirill Nagaitsev, Anna Woodard, Ben Blaiszik, Josh Bryan, Daniel S. Katz, Ian Foster, Kyle Chard

Abstract: funcX is a distributed function as a service (FaaS) platform that enables flexible, scalable, and high performance remote function execution. Unlike centralized FaaS systems, funcX decouples the cloud-hosted management functionality from the edge-hosted execution functionality. funcX's endpoint software can be deployed, by users or administrators, on arbitrary laptops, clouds, clusters, and supercomputers, in effect turning them into function serving systems. funcX's cloud-hosted service provides a single location for registering, sharing, and managing both functions and endpoints. It allows for transparent, secure, and reliable function execution across the federated ecosystem of endpoints--enabling users to route functions to endpoints based on specific needs. funcX uses containers (e.g., Docker, Singularity, and Shifter) to provide common execution environments across endpoints. funcX implements various container management strategies to execute functions with high performance and efficiency on diverse funcX endpoints. funcX also integrates with an in-memory data store and Globus for managing data that may span endpoints. We motivate the need for funcX, present our prototype design and implementation, and demonstrate, via experiments on two supercomputers, that funcX can scale to more than 130 000 concurrent workers. We show that funcX's container warming-aware routing algorithm can reduce the completion time for 3000 functions by up to 61% compared to a randomized algorithm and the in-memory data store can speed up data transfers by up to 3x compared to a shared file system.

Submitted 23 September, 2022; originally announced September 2022.

Comments: arXiv admin note: substantial text overlap with arXiv:2005.04215

arXiv:2209.09408 [pdf, other]

Deep learning at the edge enables real-time streaming ptychographic imaging

Authors: Anakha V Babu, Tao Zhou, Saugat Kandel, Tekin Bicer, Zhengchun Liu, William Judge, Daniel J. Ching, Yi Jiang, Sinisa Veseli, Steven Henke, Ryan Chard, Yudong Yao, Ekaterina Sirazitdinova, Geetika Gupta, Martin V. Holt, Ian T. Foster, Antonino Miceli, Mathew J. Cherukara

Abstract: Coherent microscopy techniques provide an unparalleled multi-scale view of materials across scientific and technological fields, from structural materials to quantum devices, from integrated circuits to biological cells. Driven by the construction of brighter sources and high-rate detectors, coherent X-ray microscopy methods like ptychography are poised to revolutionize nanoscale materials characterization. However, associated significant increases in data and compute needs mean that conventional approaches no longer suffice for recovering sample images in real-time from high-speed coherent imaging experiments. Here, we demonstrate a workflow that leverages artificial intelligence at the edge and high-performance computing to enable real-time inversion on X-ray ptychography data streamed directly from a detector at up to 2 kHz. The proposed AI-enabled workflow eliminates the sampling constraints imposed by traditional ptychography, allowing low dose imaging using orders of magnitude less data than required by traditional methods.

Submitted 19 September, 2022; originally announced September 2022.

arXiv:2208.09513 [pdf, other]

Globus Automation Services: Research process automation across the space-time continuum

Authors: Ryan Chard, Jim Pruyne, Kurt McKee, Josh Bryan, Brigitte Raumann, Rachana Ananthakrishnan, Kyle Chard, Ian Foster

Abstract: Research process automation -- the reliable, efficient, and reproducible execution of linked sets of actions on scientific instruments, computers, data stores, and other resources -- has emerged as an essential element of modern science. We report here on new services within the Globus research data management platform that enable the specification of diverse research processes as reusable sets of actions, flows, and the execution of such flows in heterogeneous research environments. To support flows with broad spatial extent (e.g., from scientific instrument to remote data center) and temporal extent (from seconds to weeks), these Globus automation services feature: 1) cloud hosting for reliable execution of even long-lived flows despite sporadic failures; 2) a simple specification and extensible asynchronous action provider API, for defining and executing a wide variety of actions and flows involving heterogeneous resources; 3) an event-driven execution model for automating execution of flows in response to arbitrary events; and 4) a rich security model enabling authorization delegation mechanisms for secure execution of long-running actions across distributed resources. These services permit researchers to outsource and automate the management of a broad range of research tasks to a reliable, scalable, and secure cloud platform. We present use cases for Globus automation services, describe their design and implementation, present microbenchmark studies, and review experiences applying the services in a range of applications.

Submitted 6 December, 2022; v1 submitted 19 August, 2022; originally announced August 2022.

arXiv:2207.00611 [pdf, other]

doi 10.1038/s41597-022-01712-9

FAIR principles for AI models with a practical application for accelerated high energy diffraction microscopy

Authors: Nikil Ravi, Pranshu Chaturvedi, E. A. Huerta, Zhengchun Liu, Ryan Chard, Aristana Scourtas, K. J. Schmidt, Kyle Chard, Ben Blaiszik, Ian Foster

Abstract: A concise and measurable set of FAIR (Findable, Accessible, Interoperable and Reusable) principles for scientific data is transforming the state-of-practice for data management and stewardship, supporting and enabling discovery and innovation. Learning from this initiative, and acknowledging the impact of artificial intelligence (AI) in the practice of science and engineering, we introduce a set of practical, concise, and measurable FAIR principles for AI models. We showcase how to create and share FAIR data and AI models within a unified computational framework combining the following elements: the Advanced Photon Source at Argonne National Laboratory, the Materials Data Facility, the Data and Learning Hub for Science, and funcX, and the Argonne Leadership Computing Facility (ALCF), in particular the ThetaGPU supercomputer and the SambaNova DataScale system at the ALCF AI Testbed. We describe how this domain-agnostic computational framework may be harnessed to enable autonomous AI-driven discovery.

Submitted 21 December, 2022; v1 submitted 1 July, 2022; originally announced July 2022.

Comments: 11 pages, 3 figures; Accepted to Scientific Data; for press release see https://www.anl.gov/article/argonne-scientists-promote-fair-standards-for-managing-artificial-intelligence-models and https://www.ncsa.illinois.edu/ncsa-student-researchers-lead-authors-on-award-winning-paper; Received 2022 HPCwire Readers' Choice Award on Best Use of High Performance Data Analytics & Artificial Intelligence

MSC Class: 68T01; 68T05 ACM Class: I.2; J.2

Journal ref: Scientific Data 9, 657 (2022)

arXiv:2205.11342 [pdf, other]

The Diminishing Returns of Masked Language Models to Science

Authors: Zhi Hong, Aswathy Ajith, Gregory Pauloski, Eamon Duede, Kyle Chard, Ian Foster

Abstract: Transformer-based masked language models such as BERT, trained on general corpora, have shown impressive performance on downstream tasks. It has also been demonstrated that the downstream task performance of such models can be improved by pretraining larger models for longer on more data. In this work, we empirically evaluate the extent to which these results extend to tasks in science. We use 14 domain-specific transformer-based models (including ScholarBERT, a new 770M-parameter science-focused masked language model pretrained on up to 225B tokens) to evaluate the impact of training data, model size, pretraining and finetuning time on 12 downstream scientific tasks. Interestingly, we find that increasing model sizes, training data, or compute time does not always lead to significant improvements (i.e., >1% F1), if at all, in scientific information extraction tasks, and we offer possible explanations for the surprising performance differences.

Submitted 3 May, 2023; v1 submitted 23 May, 2022; originally announced May 2022.

Comments: 12 pages. 3 figures. 5 tables. Accepted to the Findings of ACL 2023

ACM Class: I.2.7

arXiv:2204.05128 [pdf, other]

Linking Scientific Instruments and HPC: Patterns, Technologies, Experiences

Authors: Rafael Vescovi, Ryan Chard, Nickolaus Saint, Ben Blaiszik, Jim Pruyne, Tekin Bicer, Alex Lavens, Zhengchun Liu, Michael E. Papka, Suresh Narayanan, Nicholas Schwarz, Kyle Chard, Ian Foster

Abstract: Powerful detectors at modern experimental facilities routinely collect data at multiple GB/s. Online analysis methods are needed to enable the collection of only interesting subsets of such massive data streams, such as by explicitly discarding some data elements or by directing instruments to relevant areas of experimental space. Such online analyses require methods for configuring and running high-performance distributed computing pipelines--what we call flows--linking instruments, HPC (e.g., for analysis, simulation, AI model training), edge computing (for analysis), data stores, metadata catalogs, and high-speed networks. In this article, we review common patterns associated with such flows and describe methods for instantiating those patterns. We also present experiences with the application of these methods to the processing of data from five different scientific instruments, each of which engages HPC resources for data inversion, machine learning model training, or other purposes. We also discuss implications of these new methods for operators and users of scientific facilities.

Submitted 22 August, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

arXiv:2204.04312 [pdf]

doi 10.3233/978-1-60750-803-8-3

The History of the Grid

Authors: Ian Foster, Carl Kesselman

Abstract: With the widespread availability of high-speed networks, it becomes feasible to outsource computing to remote providers and to federate resources from many locations. Such observations motivated the development, from the mid-1990s onwards, of a range of innovative Grid technologies, applications, and infrastructures. We review the history, current status, and future prospects for Grid computing.

Submitted 8 April, 2022; originally announced April 2022.

Journal ref: High Performance Computing: From Grids and Clouds to Exascale, IOS Press, pages 3-30, 2011

arXiv:2202.01710 [pdf, other]

doi 10.1016/j.cma.2022.115041

Multi-Output Physics-Informed Neural Networks for Forward and Inverse PDE Problems with Uncertainties

Authors: Mingyuan Yang, John T. Foster

Abstract: Physics-informed neural networks (PINNs) have recently been used to solve various computational problems which are governed by partial differential equations (PDEs). In this paper, we propose a multi-output physics-informed neural network (MO-PINN) which can provide solutions with uncertainty distributions for both forward and inverse PDE problems with noisy data. In this framework, the uncertainty arising from the noisy data is first translated into multiple measurements regarding the prior noise distribution using the bootstrap method, and then the outputs of neural networks are designed to satisfy the measurements as well as the underlying physical laws. The posterior estimation of target parameters can be obtained at the end of training, which can be further used for uncertainty quantification and decision making. In this paper, MO-PINNs are demonstrated with a series of numerical experiments including both linear and nonlinear, forward and inverse problems. The results show that MO-PINN is able to provide accurate predictions with noisy data. In addition, we also demonstrate that the prediction and posterior distributions from MO-PINNs are consistent with the solutions from a traditional finite element method (FEM) solver and Monte Carlo methods given the same data and prior knowledge. Finally, we show that additional statistical knowledge can be incorporated into the training to improve the prediction if available.

Submitted 3 February, 2022; originally announced February 2022.

arXiv:2201.08296 [pdf, other]

doi 10.1109/MC.2022.3160876

CUF-Links: Continuous and Ubiquitous FAIRness Linkages for reproducible research

Authors: Ian Foster, Carl Kesselman

Abstract: Despite much creative work on methods and tools, reproducibility -- the ability to repeat the computational steps used to obtain a research result -- remains elusive. One reason for these difficulties is that extant tools for capturing research processes do not align well with the rich working practices of scientists. We advocate here for simple mechanisms that can be integrated easily with current work practices to capture basic information about every data product consumed or produced in a project. We argue that by thus extending the scope of findable, accessible, interoperable, and reusable (FAIR) data in both time and space to enable the creation of a continuous chain of continuous and ubiquitous FAIRness linkages (CUF-Links) from inputs to outputs, such mechanisms can provide a strong foundation for documenting the provenance linkages that are essential to reproducible research. We give examples of mechanisms that can achieve these goals, and review how they have been applied in practice.

Submitted 20 January, 2022; originally announced January 2022.

Journal ref: Computer, vol. 55, no. 8, pp. 20-30, Aug. 2022

arXiv:2201.06564 [pdf]

doi 10.1162/99608f92.44d21b86

Sharing Begins at Home

Authors: William Dempsey, Ian Foster, Scott Fraser, Carl Kesselman

Abstract: The broad sharing of research data is widely viewed as of critical importance for the speed, quality, accessibility, and integrity of science. Despite increasing efforts to encourage data sharing, both the quality of shared data, and the frequency of data reuse, remain stubbornly low. We argue here that a major reason for this unfortunate state of affairs is that the organization of research results in the findable, accessible, interoperable, and reusable (FAIR) form required for reuse is too often deferred to the end of a research project, when preparing publications, by which time essential details are no longer accessible. Thus, we propose an approach to research informatics that applies FAIR principles continuously, from the very inception of a research project, and ubiquitously, to every data asset produced by experiment or computation. We suggest that this seemingly challenging task can be made feasible by the adoption of simple tools, such as lightweight identifiers (to ensure that every data asset is findable), packaging methods (to facilitate understanding of data contents), data access methods, and metadata organization and structuring tools (to support schema development and evolution). We use an example from experimental neuroscience to illustrate how these methods can work in practice.

Submitted 8 July, 2022; v1 submitted 17 January, 2022; originally announced January 2022.

Journal ref: Harvard Data Science Review, Volume 4, Issue 3, 2022

arXiv:2111.11330 [pdf, other]

High-Performance Ptychographic Reconstruction with Federated Facilities

Authors: Tekin Bicer, Xiaodong Yu, Daniel J. Ching, Ryan Chard, Mathew J. Cherukara, Bogdan Nicolae, Rajkumar Kettimuthu, Ian T. Foster

Abstract: Beamlines at synchrotron light source facilities are powerful scientific instruments used to image samples and observe phenomena at high spatial and temporal resolutions. Typically, these facilities are equipped only with modest compute resources for the analysis of generated experimental datasets. However, high data rate experiments can easily generate data in volumes that take days (or even weeks) to process on those local resources. To address this challenge, we present a system that unifies leadership computing and experimental facilities by enabling the automated establishment of data analysis pipelines that extend from edge data acquisition systems at synchrotron beamlines to remote computing facilities; under the covers, our system uses Globus Auth authentication to minimize user interaction, funcX to run user-defined functions on supercomputers, and Globus Flows to define and execute workflows. We describe the application of this system to ptychography, an ultra-high-resolution coherent diffraction imaging technique that can produce 100s of gigabytes to terabytes in a single experiment. When deployed on the DGX A100 ThetaGPU cluster at the Argonne Leadership Computing Facility and a microscopy beamline at the Advanced Photon Source, our system performs analysis as an experiment progresses to provide timely feedback.

Submitted 22 November, 2021; originally announced November 2021.

Comments: 19 pages, 5 figures, to be published in Smoky Mountains Computational Sciences and Engineering Conference (SMC 2021)

arXiv:2110.02827 [pdf, other]

doi 10.1109/MLHPC54614.2021.00007

Colmena: Scalable Machine-Learning-Based Steering of Ensemble Simulations for High Performance Computing

Authors: Logan Ward, Ganesh Sivaraman, J. Gregory Pauloski, Yadu Babuji, Ryan Chard, Naveen Dandu, Paul C. Redfern, Rajeev S. Assary, Kyle Chard, Larry A. Curtiss, Rajeev Thakur, Ian Foster

Abstract: Scientific applications that involve simulation ensembles can be accelerated greatly by using experiment design methods to select the best simulations to perform. Methods that use machine learning (ML) to create proxy models of simulations show particular promise for guiding ensembles but are challenging to deploy because of the need to coordinate dynamic mixes of simulation and learning tasks. We present Colmena, an open-source Python framework that allows users to steer campaigns by providing just the implementations of individual tasks plus the logic used to choose which tasks to execute when. Colmena handles task dispatch, results collation, ML model invocation, and ML model (re)training, using Parsl to execute tasks on HPC systems. We describe the design of Colmena and illustrate its capabilities by applying it to electrolyte design, where it both scales to 65536 CPUs and accelerates the discovery rate for high-performance molecules by a factor of 100 over unguided searches.

Submitted 6 October, 2021; originally announced October 2021.

Comments: camera-ready version for ML in HPC Environments 2021

arXiv:2107.02841 [pdf, other]

doi 10.1109/CLUSTER.2015.74

Toward Interlanguage Parallel Scripting for Distributed-Memory Scientific Computing

Authors: Justin M. Wozniak, Timothy G. Armstrong, Ketan C. Maheshwari, Daniel S. Katz, Michael Wilde, Ian T. Foster

Abstract: Scripting languages such as Python and R have been widely adopted as tools for the productive development of scientific software because of the power and expressiveness of the languages and available libraries. However, deploying scripted applications on large-scale parallel computer systems such as the IBM Blue Gene/Q or Cray XE6 is a challenge because of issues including operating system limitations, interoperability challenges, parallel filesystem overheads due to the small file system accesses common in scripted approaches, and other issues. We present here a new approach to these problems in which the Swift scripting system is used to integrate high-level scripts written in Python, R, and Tcl, with native code developed in C, C++, and Fortran, by linking Swift to the library interfaces to the script interpreters. In this approach, Swift handles data management, movement, and marshaling among distributed-memory processes without direct user manipulation of low-level communication libraries such as MPI. We present a technique to efficiently launch scripted applications on large-scale supercomputers using a hierarchical programming model.

Submitted 6 July, 2021; originally announced July 2021.

Comments: 2015 IEEE International Conference on Cluster Computing

arXiv:2107.01739 [pdf, other]

doi 10.1145/3458817.3476152

KAISA: An Adaptive Second-Order Optimizer Framework for Deep Neural Networks

Authors: J. Gregory Pauloski, Qi Huang, Lei Huang, Shivaram Venkataraman, Kyle Chard, Ian Foster, Zhao Zhang

Abstract: Kronecker-factored Approximate Curvature (K-FAC) has recently been shown to converge faster in deep neural network (DNN) training than stochastic gradient descent (SGD); however, K-FAC's larger memory footprint hinders its applicability to large models. We present KAISA, a K-FAC-enabled, Adaptable, Improved, and ScAlable second-order optimizer framework that adapts the memory footprint, communication, and computation given specific models and hardware to improve performance and increase scalability. We quantify the tradeoffs between memory and communication cost and evaluate KAISA on large models, including ResNet-50, Mask R-CNN, U-Net, and BERT, on up to 128 NVIDIA A100 GPUs. Compared to the original optimizers, KAISA converges 18.1-36.3% faster across applications with the same global batch size. Under a fixed memory budget, KAISA converges 32.5% and 41.6% faster in ResNet-50 and BERT-Large, respectively. KAISA can balance memory and communication to achieve scaling efficiency equal to or better than the baseline optimizers. KAISA is open source and available at https://github.com/gpauloski/kfac_pytorch.

Submitted 20 September, 2021; v1 submitted 4 July, 2021; originally announced July 2021.

Comments: Accepted for publication at the International Conference for High Performance Computing, Networking, Storage and Analysis (SC21)

arXiv:2104.06953 [pdf]

A National Discovery Cloud: Preparing the US for Global Competitiveness in the New Era of 21st Century Digital Transformation

Authors: Ian Foster, Daniel Lopresti, Bill Gropp, Mark D. Hill, Katie Schuman

Abstract: The nature of computation and its role in our lives have been transformed in the past two decades by three remarkable developments: the emergence of public cloud utilities as a new computing platform; the ability to extract information from enormous quantities of data via machine learning; and the emergence of computational simulation as a research method on par with experimental science. Each development has major implications for how societies function and compete; together, they represent a change in technological foundations of society as profound as the telegraph or electrification. Societies that embrace these changes will lead in the 21st Century; those that do not, will decline in prosperity and influence. Nowhere is this stark choice more evident than in research and education, the two sectors that produce the innovations that power the future and prepare a workforce able to exploit those innovations, respectively. In this article, we introduce these developments and suggest steps that the US government might take to prepare the research and education system for its implications.

Submitted 19 April, 2021; v1 submitted 14 April, 2021; originally announced April 2021.

Comments: A Computing Community Consortium (CCC) white paper, 6 pages

Report number: ccc2021whitepaper_4

arXiv:2101.06813 [pdf, other]

Fast and accurate learned multiresolution dynamical downscaling for precipitation

Authors: Jiali Wang, Zhengchun Liu, Ian Foster, Won Chang, Rajkumar Kettimuthu, Rao Kotamarthi

Abstract: This study develops a neural network-based approach for emulating high-resolution modeled precipitation data with comparable statistical properties but at greatly reduced computational cost. The key idea is to use a combination of low- and high-resolution simulations to train a neural network to map from the former to the latter. Specifically, we define two types of CNNs, one that stacks variables directly and one that encodes each variable before stacking, and we train each CNN type both with a conventional loss function, such as mean square error (MSE), and with a conditional generative adversarial network (CGAN), for a total of four CNN variants. We compare the four new CNN-derived high-resolution precipitation results with precipitation generated from original high resolution simulations, a bilinear interpolator and the state-of-the-art CNN-based super-resolution (SR) technique. Results show that the SR technique produces results similar to those of the bilinear interpolator with smoother spatial and temporal distributions and smaller data variabilities and extremes than the original high resolution simulations. While the new CNNs trained by MSE generate better results over some regions than the interpolator and SR technique do, their predictions are still not as close as the original high resolution simulations. The CNNs trained by CGAN generate more realistic and physically reasonable results, better capturing not only data variability in time and space but also extremes such as intense and long-lasting storms.
The newly proposed CNN-based downscaling approach can downscale precipitation from 50 km to 12 km in 14 min for 30 years once the network is trained (training takes 4 hours using 1 GPU), while the conventional dynamical downscaling would take 1 month using 600 CPU cores to generate simulations at the resolution of 12 km over the contiguous United States.

Submitted 17 January, 2021; originally announced January 2021.

arXiv:2101.01284 [pdf]

Advancing Computing's Foundation of US Industry & Society

Authors: Thomas M. Conte, Ian T. Foster, William Gropp, Mark D. Hill

Abstract: While past information technology (IT) advances have transformed society, future advances hold even greater promise. For example, we have only just begun to reap the changes from artificial intelligence (AI), especially machine learning (ML). Underlying IT's impact are the dramatic improvements in computer hardware, which deliver performance that unlocks new capabilities. For example, recent successes in AI/ML required the synergy of improved algorithms and hardware architectures (e.g., general-purpose graphics processing units). However, unlike in the 20th Century and early 2000s, tomorrow's performance aspirations must be achieved without continued semiconductor scaling formerly provided by Moore's Law and Dennard Scaling. How will one deliver the next 100x improvement in capability at similar or less cost to enable great value? Can we make the next AI leap without 100x better hardware? This whitepaper argues for a multipronged effort to develop new computing approaches beyond Moore's Law to advance the foundation that computing provides to US industry, education, medicine, science, and government. This impact extends far beyond the IT industry itself, as IT is now central for providing value across society, for example in semi-autonomous vehicles, tele-education, health wearables, viral analysis, and efficient administration. Herein we draw upon considerable visioning work by CRA's Computing Community Consortium (CCC) and the IEEE Rebooting Computing Initiative (IEEE RCI), enabled by thought leader input from industry, academia, and the US government.

Submitted 4 January, 2021; originally announced January 2021.

Comments: A Computing Community Consortium (CCC) white paper, 4 pages

Report number: ccc2020whitepaper_17

arXiv:2012.08545 [pdf, other]

doi 10.1038/s41550-021-01405-0

Accelerated, Scalable and Reproducible AI-driven Gravitational Wave Detection

Authors: E. A. Huerta, Asad Khan, Xiaobo Huang, Minyang Tian, Maksim Levental, Ryan Chard, Wei Wei, Maeve Heflin, Daniel S. Katz, Volodymyr Kindratenko, Dawei Mu, Ben Blaiszik, Ian Foster

Abstract: The development of reusable artificial intelligence (AI) models for wider use and rigorous validation by the community promises to unlock new opportunities in multi-messenger astrophysics. Here we develop a workflow that connects the Data and Learning Hub for Science, a repository for publishing AI models, with the Hardware Accelerated Learning (HAL) cluster, using funcX as a universal distributed computing service. Using this workflow, an ensemble of four openly available AI models can be run on HAL to process an entire month's worth (August 2017) of advanced Laser Interferometer Gravitational-Wave Observatory data in just seven minutes, identifying all four binary black hole mergers previously identified in this dataset and reporting no misclassifications. This approach combines advances in AI, distributed computing, and scientific data infrastructure to open new pathways to conduct reproducible, accelerated, data-driven discovery.

Submitted 9 July, 2021; v1 submitted 15 December, 2020; originally announced December 2020.

Comments: 17 pages, 5 figures; v2: 12 pages, 6 figures. Accepted to Nature Astronomy. See also the Behind the Paper blog in Nature Astronomy "https://astronomycommunity.nature.com/posts/from-disruption-to-sustained-innovation-artificial-intelligence-for-gravitational-wave-astrophysics"

MSC Class: 68T01; 68T35; 83C35; 83C57

Journal ref: Nat Astron 5, 1062-1068 (2021)

arXiv:2012.06049 [pdf]

The Rise of AI-Driven Simulators: Building a New Crystal Ball

Authors: Ian Foster, David Parkes, Stephan Zheng

Abstract: The use of computational simulation is by now so pervasive in society that it is no exaggeration to say that continued U.S. and international prosperity, security, and health depend in part on continued improvements in simulation capabilities. What if we could predict weather two weeks out, guide the design of new drugs for new viral diseases, or manage new manufacturing processes that cut production costs and times by an order of magnitude? What if we could predict collective human behavior, for example, response to an evacuation request during a natural disaster, or labor response to fiscal stimulus? (See also the companion CCC Quad Paper on Pandemic Informatics, which discusses features that would be essential to solving large-scale problems like preparation for, and response to, the inevitable next pandemic.) The past decade has brought remarkable advances in complementary areas: in sensors, which can now capture enormous amounts of data about the world, and in AI methods capable of learning to extract predictive patterns from those data. These advances may lead to a new era in computational simulation, in which sensors of many kinds are used to produce vast quantities of data, AI methods identify patterns in those data, and new AI-driven simulators combine machine-learned and mathematical rules to make accurate and actionable predictions.
At the same time, there are new challenges: computers in some important regards are no longer getting faster, and in some areas we are reaching the limits of mathematical understanding, or at least of our ability to translate mathematical understanding into efficient simulation. In this paper, we lay out some themes that we envision forming part of a cohesive, multi-disciplinary, and application-inspired research agenda on AI-driven simulators.

Submitted 10 December, 2020; originally announced December 2020.

Comments: A Computing Community Consortium (CCC) white paper, 4 pages

Report number: ccc2020whitepaper_6

arXiv:2009.07226 [pdf, other]

Petascale XCT: 3D Image Reconstruction with Hierarchical Communications on Multi-GPU Nodes

Authors: Mert Hidayetoglu, Tekin Bicer, Simon Garcia de Gonzalo, Bin Ren, Vincent De Andrade, Doga Gursoy, Raj Kettimuthu, Ian T. Foster, Wen-mei W. Hwu

Abstract: X-ray computed tomography is a commonly used technique for noninvasive imaging at synchrotron facilities. Iterative tomographic reconstruction algorithms are often preferred for recovering high quality 3D volumetric images from 2D X-ray images; however, their use has been limited to small/medium datasets due to their computational requirements. In this paper, we propose a high-performance iterative reconstruction system for terabyte(s)-scale 3D volumes. Our design involves three novel optimizations: (1) optimization of (back)projection operators by extending the 2D memory-centric approach to 3D; (2) performing hierarchical communications by exploiting "fat-node" architecture with many GPUs; (3) utilization of mixed-precision types while preserving convergence rate and quality. We extensively evaluate the proposed optimizations and scaling on the Summit supercomputer. Our largest reconstruction is a mouse brain volume with 9Kx11Kx11K voxels, where the total reconstruction time is under three minutes using 24,576 GPUs, reaching 65 PFLOPS: 34% of Summit's peak performance.

Submitted 15 September, 2020; originally announced September 2020.

arXiv:2009.03190 [pdf, other]

doi 10.1145/3452007

Design and Evaluation of a Simple Data Interface for Efficient Data Transfer Across Diverse Storage

Authors: Zhengchun Liu, Rajkumar Kettimuthu, Joaquin Chung, Rachana Ananthakrishnan, Michael Link, Ian Foster

Abstract: Modern science and engineering computing environments often feature storage systems of different types, from parallel file systems in high-performance computing centers to object stores operated by cloud providers. To enable easy, reliable, secure, and performant data exchange among these different systems, we propose Connector, a pluggable data access architecture for diverse, distributed storage. By abstracting low-level storage system details, this abstraction permits a managed data transfer service (Globus in our case) to interact with a large and easily extended set of storage systems. Equally important, it supports third-party transfers: that is, direct data transfers from source to destination that are initiated by a third-party client but do not engage that third party in the data path. The abstraction also enables management of transfers for performance optimization, error handling, and end-to-end integrity. We present the Connector design, describe implementations for different storage services, evaluate tradeoffs inherent in managed vs. direct transfers, motivate recommended deployment options, and propose a performance model-based method that allows for easy characterization of performance in different contexts without exhaustive benchmarking.

Submitted 7 September, 2020; originally announced September 2020.

Journal ref: ACM Transactions on Modeling and Performance Evaluation of Computing Systems 2021

arXiv:2008.09591 [pdf, other]

Translating the Grid: How a Translational Approach Shaped the Development of Grid Computing

Authors: Ian Foster, Carl Kesselman

Abstract: A growing gap between progress in biological knowledge and improved health outcomes inspired the new discipline of translational medicine, in which the application of new knowledge is an explicit part of a research plan. Abramson and Parashar argue that a similar gap between complex computational technologies and ever-more-challenging applications demands an analogous discipline of translational computer science, in which the deliberate movement of research results into large-scale practice becomes a central research focus rather than an afterthought. We revisit from this perspective the development and application of grid computing from the mid-1990s onwards, and find that a translational framing is useful for understanding the technology's development and impact. We discuss how the development of grid computing infrastructure, and the Globus Toolkit, in particular, benefited from a translational approach. We identify lessons learned that can be applied to other translational computer science initiatives.

Submitted 21 August, 2020; originally announced August 2020.

arXiv:2008.08198 [pdf, other]

BraggNN: Fast X-ray Bragg Peak Analysis Using Deep Learning

Authors: Zhengchun Liu, Hemant Sharma, Jun-Sang Park, Peter Kenesei, Antonino Miceli, Jonathan Almer, Rajkumar Kettimuthu, Ian Foster

Abstract: X-ray diffraction based microscopy techniques such as High Energy Diffraction Microscopy rely on knowledge of the position of diffraction peaks with high precision. These positions are typically computed by fitting the observed intensities in area detector data to a theoretical peak shape such as pseudo-Voigt. As experiments become more complex and detector technologies evolve, the computational cost of such peak detection and shape fitting becomes the biggest hurdle to the rapid analysis required for real-time feedback during in-situ experiments. To this end, we propose BraggNN, a deep learning-based method that can determine peak positions much more rapidly than conventional pseudo-Voigt peak fitting. When applied to a test dataset, BraggNN gives errors of less than 0.29 and 0.57 pixels, relative to the conventional method, for 75% and 95% of the peaks, respectively. When applied to a real experimental dataset, a 3D reconstruction that used peak positions computed by BraggNN yields 15% better results on average as compared to a reconstruction obtained using peak positions determined using conventional 2D pseudo-Voigt fitting. Recent advances in deep learning method implementations and special-purpose model inference accelerators allow BraggNN to deliver enormous performance improvements relative to the conventional method, running, for example, more than 200 times faster than a conventional method on a consumer-class GPU card with out-of-the-box software.

Submitted 2 June, 2021; v1 submitted 18 August, 2020; originally announced August 2020.

arXiv:2008.06991 [pdf, other]

In-situ Workflow Auto-tuning via Combining Performance Models of Component Applications

Authors: Tong Shu, Yanfei Guo, Justin Wozniak, Xiaoning Ding, Ian Foster, Tahsin Kurc

Abstract: In-situ parallel workflows couple multiple component applications, such as simulation and analysis, via streaming data transfer in order to avoid data exchange via shared file systems. Such workflows are challenging to configure for optimal performance due to the large space of possible configurations. Expert experience is rarely sufficient to identify optimal configurations, and existing empirical auto-tuning approaches are inefficient due to the high cost of obtaining training data for machine learning models. It is also infeasible to optimize individual components independently, due to component interactions. We propose here a new auto-tuning method, Component-based Ensemble Active Learning (CEAL), that combines machine learning techniques with knowledge of in-situ workflow structure to enable automated workflow configuration with a limited number of performance measurements.

Submitted 16 August, 2020; originally announced August 2020.

arXiv:2007.00784 [pdf, other]

Convolutional Neural Network Training with Distributed K-FAC

Authors: J. Gregory Pauloski, Zhao Zhang, Lei Huang, Weijia Xu, Ian T. Foster

Abstract: Training neural networks with many processors can reduce time-to-solution; however, it is challenging to maintain convergence and efficiency at large scales. The Kronecker-factored Approximate Curvature (K-FAC) was recently proposed as an approximation of the Fisher Information Matrix that can be used in natural gradient optimizers. We investigate here a scalable K-FAC design and its applicability in convolutional neural network (CNN) training at scale. We study optimization techniques such as layer-wise distribution strategies, inverse-free second-order gradient evaluation, and dynamic K-FAC update decoupling to reduce training time while preserving convergence. We use residual neural networks (ResNet) applied to the CIFAR-10 and ImageNet-1k datasets to evaluate the correctness and scalability of our K-FAC gradient preconditioner. With ResNet-50 on the ImageNet-1k dataset, our distributed K-FAC implementation converges to the 75.9% MLPerf baseline in 18-25% less time than does the classic stochastic gradient descent (SGD) optimizer across scales on a GPU cluster.

Submitted 1 July, 2020; originally announced July 2020.

Comments: To be published in the proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC20)

arXiv:2006.02431 [pdf, other]

Targeting SARS-CoV-2 with AI- and HPC-enabled Lead Generation: A First Data Release

Authors: Yadu Babuji, Ben Blaiszik, Tom Brettin, Kyle Chard, Ryan Chard, Austin Clyde, Ian Foster, Zhi Hong, Shantenu Jha, Zhuozhao Li, Xuefeng Liu, Arvind Ramanathan, Yi Ren, Nicholaus Saint, Marcus Schwarting, Rick Stevens, Hubertus van Dam, Rick Wagner

Abstract: Researchers across the globe are seeking to rapidly repurpose existing drugs or discover new drugs to counter the novel coronavirus disease (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). One promising approach is to train machine learning (ML) and artificial intelligence (AI) tools to screen large numbers of small molecules. As a contribution to that effort, we are aggregating numerous small molecules from a variety of sources, using high-performance computing (HPC) to compute diverse properties of those molecules, using the computed properties to train ML/AI models, and then using the resulting models for screening. In this first data release, we make available 23 datasets collected from community sources representing over 4.2 B molecules enriched with pre-computed: 1) molecular fingerprints to aid similarity searches, 2) 2D images of molecules to enable exploration and application of image-based deep learning methods, and 3) 2D and 3D molecular descriptors to speed development of machine learning models. This data release encompasses structural information on the 4.2 B molecules and 60 TB of pre-computed data. Future releases will expand the data to include more detailed molecular simulations, computed models, and other products.

Submitted 27 May, 2020; originally announced June 2020.

Comments: 11 pages, 5 figures

arXiv:2005.11300 [pdf, other]

Model Evidence with Fast Tree Based Quadrature

Authors: Thomas Foster, Chon Lok Lei, Martin Robinson, David Gavaghan, Ben Lambert

Abstract: High dimensional integration is essential to many areas of science, ranging from particle physics to Bayesian inference. Approximating these integrals is hard, due in part to the difficulty of locating and sampling from regions of the integration domain that make significant contributions to the overall integral. Here, we present a new algorithm called Tree Quadrature (TQ) that separates this sampling problem from the problem of using those samples to produce an approximation of the integral. TQ places no qualifications on how the samples provided to it are obtained, allowing it to use state-of-the-art sampling algorithms that are largely ignored by existing integration algorithms. Given a set of samples, TQ constructs a surrogate model of the integrand in the form of a regression tree, with a structure optimised to maximise integral precision. The tree divides the integration domain into smaller containers, which are individually integrated and aggregated to estimate the overall integral. Any method can be used to integrate each individual container, so existing integration methods, like Bayesian Monte Carlo, can be combined with TQ to boost their performance. On a set of benchmark problems, we show that TQ provides accurate approximations to integrals in up to 15 dimensions; and in dimensions 4 and above, it outperforms simple Monte Carlo and the popular Vegas method.

Submitted 22 May, 2020; originally announced May 2020.

arXiv:2003.05520 [pdf, ps, other]

Deriving peridynamic influence functions for one-dimensional elastic materials with periodic microstructure

Authors: Xiao Xu, John T. Foster

Abstract: The influence function in peridynamic material models has a large effect on the dynamic behavior of elastic waves and in turn can greatly affect dynamic simulations of fracture propagation and material failure. Typically, the influence functions that are used in peridynamic models are selected for their numerical properties without regard to physical considerations. In this work, we present a method of deriving the peridynamic influence function for a one-dimensional initial/boundary value problem in a material with periodic microstructure. Starting with the linear local elastodynamic equation of motion in the microscale, we first use polynomial ansatzes to approximate microstructural displacements and then derive the homogenized nonlocal dynamic equation of motion for the macroscopic displacements, which is easily reformulated as a linear peridynamic equation with a discrete influence function. The shape and localization of the discrete influence function are completely determined by microstructural mechanical properties and length scales. By comparison with a highly resolved microstructural finite element model and the standard linear peridynamic model with a linearly decaying influence function, we demonstrate that the influence function derived from microstructural considerations is more accurate in predicting time dependent displacements and wave dynamics.

Submitted 4 March, 2020; originally announced March 2020.

arXiv:1912.12371 [pdf]

Open Source Software Sustainability Models: Initial White Paper from the Informatics Technology for Cancer Research Sustainability and Industry Partnership Work Group

Authors: Y. Ye, R. D. Boyce, M. K. Davis, K. Elliston, C. Davatzikos, A. Fedorov, J. C. Fillion-Robin, I. Foster, J. Gilbertson, M. Heiskanen, J. Klemm, A. Lasso, J. V. Miller, M. Morgan, S. Pieper, B. Raumann, B. Sarachan, G. Savova, J. C. Silverstein, D. Taylor, J. Zelnis, G. Q. Zhang, M. J. Becich

Abstract: The Sustainability and Industry Partnership Work Group (SIP-WG) is a part of the National Cancer Institute Informatics Technology for Cancer Research (ITCR) program. The charter of the SIP-WG is to investigate options of long-term sustainability of open source software (OSS) developed by the ITCR, in part by developing a collection of business model archetypes that can serve as sustainability plans for ITCR OSS development initiatives. The workgroup assembled models from the ITCR program, from other studies, and via engagement of its extensive network of relationships with other organizations (e.g., Chan Zuckerberg Initiative, Open Source Initiative and Software Sustainability Institute). This article reviews existing sustainability models and describes ten OSS use cases disseminated by the SIP-WG and others, and highlights five essential attributes (alignment with unmet scientific needs, dedicated development team, vibrant user community, feasible licensing model, and sustainable financial model) to assist academic software developers in achieving best practice in software sustainability.

Submitted 1 January, 2020; v1 submitted 27 December, 2019; originally announced December 2019.

Comments: 21-page main manuscript, 43-page supplemental file

arXiv:1911.05878 [pdf, other]

Scientific Image Restoration Anywhere

Authors: Vibhatha Abeykoon, Zhengchun Liu, Rajkumar Kettimuthu, Geoffrey Fox, Ian Foster

Abstract: The use of deep learning models within scientific experimental facilities frequently requires low-latency inference, so that, for example, quality control operations can be performed while data are being collected. Edge computing devices can be useful in this context, as their low cost and compact form factor permit them to be co-located with the experimental apparatus. Can such devices, with their limited resources, perform neural network feed-forward computations efficiently and effectively? We explore this question by evaluating the performance and accuracy of a scientific image restoration model, for which both model input and output are images, on edge computing devices. Specifically, we evaluate deployments of TomoGAN, an image-denoising model based on generative adversarial networks developed for low-dose x-ray imaging, on the Google Edge TPU and NVIDIA Jetson. We adapt TomoGAN for edge execution, evaluate model inference performance, and propose methods to address the accuracy drop caused by model quantization. We show that these edge computing devices can deliver accuracy comparable to that of a full-fledged CPU or GPU model, at speeds that are more than adequate for use in the intended deployments, denoising a 1024 x 1024 image in less than a second. Our experiments also show that the Edge TPU models can provide 3x faster inference response than a CPU-based model and 1.5x faster than an edge GPU-based model. This combination of high speed and low cost permits image restoration anywhere.

Submitted 12 November, 2019; originally announced November 2019.

Comments: 6 pages, 8 figures, 1 table

Showing 1–50 of 56 results for author: Foster, T