
Showing 1–50 of 56 results for author: Foster, T

  1. arXiv:2407.09434  [pdf, other]

    cs.LG cs.AI cs.CE eess.SY

    A Perspective on Foundation Models for the Electric Power Grid

    Authors: Hendrik F. Hamann, Thomas Brunschwiler, Blazhe Gjorgiev, Leonardo S. A. Martins, Alban Puech, Anna Varbella, Jonas Weiss, Juan Bernabe-Moreno, Alexandre Blondin Massé, Seong Choi, Ian Foster, Bri-Mathias Hodge, Rishabh Jain, Kibaek Kim, Vincent Mai, François Mirallès, Martin De Montigny, Octavio Ramos-Leaños, Hussein Suprême, Le Xie, El-Nasser S. Youssef, Arnaud Zinflou, Alexander J. Belvi, Ricardo J. Bessa, Bishnu Prasad Bhattari, et al. (2 additional authors not shown)

    Abstract: Foundation models (FMs) currently dominate news headlines. They employ advanced deep learning architectures to extract structural information autonomously from vast datasets through self-supervision. The resulting rich representations of complex systems and dynamics can be applied to many downstream applications. Therefore, FMs can find uses in electric power grids, challenged by the energy transi…

    Submitted 12 July, 2024; originally announced July 2024.

    Comments: Lead contact: H.F.H.; Major equal contributors: H.F.H., T.B., B.G., L.S.A.M., A.P., A.V., J.W.; Significant equal contributors: J.B., A.B.M., S.C., I.F., B.H., R.J., K.K., V.M., F.M., M.D.M., O.R., H.S., L.X., E.S.Y., A.Z.; Other equal contributors: A.J.B., R.J.B., B.P.B., J.S., S.S

  2. arXiv:2407.01764  [pdf, other]

    cs.DC

    Object Proxy Patterns for Accelerating Distributed Applications

    Authors: J. Gregory Pauloski, Valerie Hayot-Sasson, Logan Ward, Alexander Brace, André Bauer, Kyle Chard, Ian Foster

    Abstract: Workflow and serverless frameworks have empowered new approaches to distributed application design by abstracting compute resources. However, their typically limited or one-size-fits-all support for advanced data flow patterns leaves optimization to the application programmer -- optimization that becomes more difficult as data become larger. The transparent object proxy, which provides wide-area r…

    Submitted 1 July, 2024; originally announced July 2024.

  3. arXiv:2405.15828  [pdf, other]

    cs.DL cs.AI

    Oil & Water? Diffusion of AI Within and Across Scientific Fields

    Authors: Eamon Duede, William Dolan, André Bauer, Ian Foster, Karim Lakhani

    Abstract: This study empirically investigates claims of the increasing ubiquity of artificial intelligence (AI) within roughly 80 million research publications across 20 diverse scientific fields, by examining the change in scholarly engagement with AI from 1985 through 2022. We observe exponential growth, with AI-engaged publications increasing approximately thirteenfold (13x) across all fields, suggesting…

    Submitted 23 May, 2024; originally announced May 2024.

  4. arXiv:2404.19717  [pdf, other]

    cs.DC

    Automated, Reliable, and Efficient Continental-Scale Replication of 7.3 Petabytes of Climate Simulation Data: A Case Study

    Authors: Lukasz Lacinski, Lee Liming, Steven Turoscy, Cameron Harr, Kyle Chard, Eli Dart, Paul Durack, Sasha Ames, Forrest M. Hoffman, Ian T. Foster

    Abstract: We report on our experiences replicating 7.3 petabytes (PB) of Earth System Grid Federation (ESGF) climate simulation data from Lawrence Livermore National Laboratory (LLNL) in California to Argonne National Laboratory (ANL) in Illinois and Oak Ridge National Laboratory (ORNL) in Tennessee. This movement of some 29 million files, twice, undertaken in order to establish new ESGF nodes at ANL and OR…

    Submitted 30 April, 2024; originally announced April 2024.

  5. MalleTrain: Deep Neural Network Training on Unfillable Supercomputer Nodes

    Authors: Xiaolong Ma, Feng Yan, Lei Yang, Ian Foster, Michael E. Papka, Zhengchun Liu, Rajkumar Kettimuthu

    Abstract: First-come first-serve scheduling can result in a substantial fraction (up to 10%) of transiently idle nodes on supercomputers. Recognizing that such unfilled nodes are well-suited for deep neural network (DNN) training, due to the flexible nature of DNN training tasks, Liu et al. proposed that the re-scaling of DNN training tasks to fit gaps in schedules be formulated as a mixed-integer linear programming (MIL…

    Submitted 24 April, 2024; originally announced April 2024.

  6. arXiv:2403.19257  [pdf, other]

    cs.DC

    UniFaaS: Programming across Distributed Cyberinfrastructure with Federated Function Serving

    Authors: Yifei Li, Ryan Chard, Yadu Babuji, Kyle Chard, Ian Foster, Zhuozhao Li

    Abstract: Modern scientific applications are increasingly decomposable into individual functions that may be deployed across distributed and diverse cyberinfrastructure such as supercomputers, clouds, and accelerators. Such applications call for new approaches to programming, distributed execution, and function-level management. We present UniFaaS, a parallel programming framework that relies on a federated…

    Submitted 28 March, 2024; originally announced March 2024.

    Comments: 13 pages, 13 figures, IPDPS2024

  7. arXiv:2403.06077  [pdf, other]

    cs.DC

    Steering a Fleet: Adaptation for Large-Scale, Workflow-Based Experiments

    Authors: Jim Pruyne, Valerie Hayot-Sasson, Weijian Zheng, Ryan Chard, Justin M. Wozniak, Tekin Bicer, Kyle Chard, Ian T. Foster

    Abstract: Experimental science is increasingly driven by instruments that produce vast volumes of data and thus a need to manage, compute, describe, and index this data. High performance and distributed computing provide the means of addressing the computing needs; however, in practice, the variety of actions required and the distributed set of resources involved require sophisticated "flows" defining the…

    Submitted 9 March, 2024; originally announced March 2024.

  8. arXiv:2312.06592  [pdf, other]

    cs.CV

    Flexible visual prompts for in-context learning in computer vision

    Authors: Thomas Foster, Ioana Croitoru, Robert Dorfman, Christoffer Edlund, Thomas Varsavsky, Jon Almazán

    Abstract: In this work, we address in-context learning (ICL) for the task of image segmentation, introducing a novel approach that adapts a modern Video Object Segmentation (VOS) technique for visual in-context learning. This adaptation is inspired by the VOS method's ability to efficiently and flexibly learn objects from a few examples. Through evaluations across a range of support set sizes and on diverse…

    Submitted 11 December, 2023; originally announced December 2023.

  9. arXiv:2312.03989  [pdf, other]

    cs.LG cond-mat.mtrl-sci eess.IV physics.data-an

    Rapid detection of rare events from in situ X-ray diffraction data using machine learning

    Authors: Weijian Zheng, Jun-Sang Park, Peter Kenesei, Ahsan Ali, Zhengchun Liu, Ian T. Foster, Nicholas Schwarz, Rajkumar Kettimuthu, Antonino Miceli, Hemant Sharma

    Abstract: High-energy X-ray diffraction methods can non-destructively map the 3D microstructure and associated attributes of metallic polycrystalline engineering materials in their bulk form. These methods are often combined with external stimuli such as thermo-mechanical loading to take snapshots over time of the evolving microstructure and attributes. However, the extreme data volumes and the high costs o…

    Submitted 6 December, 2023; originally announced December 2023.

  10. arXiv:2312.03876  [pdf, other]

    physics.ao-ph cs.AI cs.LG

    Scaling transformer neural networks for skillful and reliable medium-range weather forecasting

    Authors: Tung Nguyen, Rohan Shah, Hritik Bansal, Troy Arcomano, Sandeep Madireddy, Romit Maulik, Veerabhadra Kotamarthi, Ian Foster, Aditya Grover

    Abstract: Weather forecasting is a fundamental problem for anticipating and mitigating the impacts of climate change. Recently, data-driven approaches for weather forecasting based on deep learning have shown great promise, achieving accuracies that are competitive with operational systems. However, those methods often employ complex, customized architectures without sufficient ablation analysis, making it…

    Submitted 6 December, 2023; originally announced December 2023.

  11. arXiv:2310.18948  [pdf, other]

    cs.LG cs.AI cs.DM math.PR

    Multi-Path Long-Term Vessel Trajectories Forecasting with Probabilistic Feature Fusion for Problem Shifting

    Authors: Gabriel Spadon, Jay Kumar, Derek Eden, Josh van Berkel, Tom Foster, Amilcar Soares, Ronan Fablet, Stan Matwin, Ronald Pelot

    Abstract: This paper addresses the challenge of boosting the precision of multi-path long-term vessel trajectory forecasting on engineered sequences of Automatic Identification System (AIS) data using feature fusion for problem shifting. We have developed a deep auto-encoder model and a phased framework approach to predict the next 12 hours of vessel trajectories using 1 to 3 hours of AIS data as input. To…

    Submitted 10 July, 2024; v1 submitted 29 October, 2023; originally announced October 2023.

  12. arXiv:2310.16270  [pdf, other]

    cs.CL cs.AI cs.LG

    Attention Lens: A Tool for Mechanistically Interpreting the Attention Head Information Retrieval Mechanism

    Authors: Mansi Sakarvadia, Arham Khan, Aswathy Ajith, Daniel Grzenda, Nathaniel Hudson, André Bauer, Kyle Chard, Ian Foster

    Abstract: Transformer-based Large Language Models (LLMs) are the state-of-the-art for natural language tasks. Recent work has attempted to decode, by reverse engineering the role of linear layers, the internal mechanisms by which LLMs arrive at their final predictions for text completion tasks. Yet little is known about the specific role of attention heads in producing the final token prediction. We propose…

    Submitted 24 October, 2023; originally announced October 2023.

  13. arXiv:2310.00510  [pdf, other]

    cs.RO

    Exploring Benchmarks for Self-Driving Labs using Color Matching

    Authors: Tobias Ginsburg, Kyle Hippe, Ryan Lewis, Doga Ozgulbas, Aileen Cleary, Rory Butler, Casey Stone, Abraham Stroka, Ian Foster

    Abstract: Self Driving Labs (SDLs) that combine automation of experimental procedures with autonomous decision making are gaining popularity as a means of increasing the throughput of scientific workflows. The task of identifying quantities of supplied colored pigments that match a target color, the color matching problem, provides a simple and flexible SDL test case, as it requires experiment proposal, sam…

    Submitted 30 September, 2023; originally announced October 2023.

  14. arXiv:2309.05605  [pdf, other]

    cs.CL cs.AI cs.LG

    Memory Injections: Correcting Multi-Hop Reasoning Failures during Inference in Transformer-Based Language Models

    Authors: Mansi Sakarvadia, Aswathy Ajith, Arham Khan, Daniel Grzenda, Nathaniel Hudson, André Bauer, Kyle Chard, Ian Foster

    Abstract: Answering multi-hop reasoning questions requires retrieving and synthesizing information from diverse sources. Large Language Models (LLMs) struggle to perform such reasoning consistently. Here we propose an approach to pinpoint and rectify multi-hop reasoning failures through targeted memory injections on LLM attention heads. First, we analyze the per-layer activations of GPT-2 models in response…

    Submitted 28 February, 2024; v1 submitted 11 September, 2023; originally announced September 2023.

    Comments: Oral Presentation at BlackboxNLP Workshop at EMNLP 2023

  15. arXiv:2308.13701  [pdf, other]

    cs.DC cs.AI

    Linking the Dynamic PicoProbe Analytical Electron-Optical Beam Line / Microscope to Supercomputers

    Authors: Alexander Brace, Rafael Vescovi, Ryan Chard, Nickolaus D. Saint, Arvind Ramanathan, Nestor J. Zaluzec, Ian Foster

    Abstract: The Dynamic PicoProbe at Argonne National Laboratory is undergoing upgrades that will enable it to produce up to 100s of GB of data per day. While this data is highly important for both fundamental science and industrial applications, there is currently limited on-site infrastructure to handle these high-volume data streams. We address this problem by providing a software architecture capable of s…

    Submitted 25 August, 2023; originally announced August 2023.

  16. arXiv:2308.09793  [pdf, other]

    cs.RO

    Towards a Modular Architecture for Science Factories

    Authors: Rafael Vescovi, Tobias Ginsburg, Kyle Hippe, Doga Ozgulbas, Casey Stone, Abraham Stroka, Rory Butler, Ben Blaiszik, Tom Brettin, Kyle Chard, Mark Hereld, Arvind Ramanathan, Rick Stevens, Aikaterini Vriza, Jie Xu, Qingteng Zhang, Ian Foster

    Abstract: Advances in robotic automation, high-performance computing (HPC), and artificial intelligence (AI) encourage us to conceive of science factories: large, general-purpose computation- and AI-enabled self-driving laboratories (SDLs) with the generality and scale needed both to tackle large discovery problems and to support thousands of scientists. Science factories require modular hardware and softwa…

    Submitted 17 October, 2023; v1 submitted 18 August, 2023; originally announced August 2023.

  17. arXiv:2306.08695  [pdf, other]

    cond-mat.mtrl-sci cs.AI

    A generative artificial intelligence framework based on a molecular diffusion model for the design of metal-organic frameworks for carbon capture

    Authors: Hyun Park, Xiaoli Yan, Ruijie Zhu, E. A. Huerta, Santanu Chaudhuri, Donny Cooper, Ian Foster, Emad Tajkhorshid

    Abstract: Metal-organic frameworks (MOFs) exhibit great promise for CO2 capture. However, finding the best performing materials poses computational and experimental grand challenges in view of the vast chemical space of potential building blocks. Here, we introduce GHP-MOFassemble, a generative artificial intelligence (AI), high performance framework for the rational and accelerated design of MOFs with high…

    Submitted 12 March, 2024; v1 submitted 14 June, 2023; originally announced June 2023.

    Comments: 25 pages, 17 figures, 6 tables, accepted to Nature Communications Chemistry. This work was awarded the HPCwire 2023 Editors' Choice Awards for Best Use of High Performance Data Analytics & Artificial Intelligence; see https://www.hpcwire.com/2023-readers-editors-choice-data-analytics-ai/

    ACM Class: I.2

    Journal ref: Commun Chem 7, 21 (2024)

  18. arXiv:2306.06283  [pdf, other]

    cond-mat.mtrl-sci cs.LG physics.chem-ph

    14 Examples of How LLMs Can Transform Materials Science and Chemistry: A Reflection on a Large Language Model Hackathon

    Authors: Kevin Maik Jablonka, Qianxiang Ai, Alexander Al-Feghali, Shruti Badhwar, Joshua D. Bocarsly, Andres M Bran, Stefan Bringuier, L. Catherine Brinson, Kamal Choudhary, Defne Circi, Sam Cox, Wibe A. de Jong, Matthew L. Evans, Nicolas Gastellu, Jerome Genzling, María Victoria Gil, Ankur K. Gupta, Zhi Hong, Alishba Imran, Sabine Kruschwitz, Anne Labarre, Jakub Lála, Tao Liu, Steven Ma, Sauradeep Majumdar, et al. (28 additional authors not shown)

    Abstract: Large-language models (LLMs) such as GPT-4 caught the interest of many scientists. Recent studies suggested that these models could be useful in chemistry and materials science. To explore these possibilities, we organized a hackathon. This article chronicles the projects built as part of this hackathon. Participants employed LLMs for various applications, including predicting properties of mole…

    Submitted 14 July, 2023; v1 submitted 9 June, 2023; originally announced June 2023.

  19. arXiv:2305.09593  [pdf, other]

    cs.DC

    Accelerating Communications in Federated Applications with Transparent Object Proxies

    Authors: J. Gregory Pauloski, Valerie Hayot-Sasson, Logan Ward, Nathaniel Hudson, Charlie Sabino, Matt Baughman, Kyle Chard, Ian Foster

    Abstract: Advances in networks, accelerators, and cloud services encourage programmers to reconsider where to compute -- such as when fast networks make it cost-effective to compute on remote accelerators despite added latency. Workflow and cloud-hosted serverless computing frameworks can manage multi-step computations spanning federated collections of cloud, high-performance computing (HPC), and edge syste…

    Submitted 29 August, 2023; v1 submitted 16 May, 2023; originally announced May 2023.

    Comments: Accepted for publication at the International Conference for High Performance Computing, Networking, Storage and Analysis (SC23)

  20. arXiv:2210.08973  [pdf, ps, other]

    cs.CY cs.HC cs.LG hep-ex

    FAIR for AI: An interdisciplinary and international community building perspective

    Authors: E. A. Huerta, Ben Blaiszik, L. Catherine Brinson, Kristofer E. Bouchard, Daniel Diaz, Caterina Doglioni, Javier M. Duarte, Murali Emani, Ian Foster, Geoffrey Fox, Philip Harris, Lukas Heinrich, Shantenu Jha, Daniel S. Katz, Volodymyr Kindratenko, Christine R. Kirkpatrick, Kati Lassila-Perini, Ravi K. Madduri, Mark S. Neubauer, Fotis E. Psomopoulos, Avik Roy, Oliver Rübel, Zhizhen Zhao, Ruike Zhu

    Abstract: A foundational set of findable, accessible, interoperable, and reusable (FAIR) principles were proposed in 2016 as prerequisites for proper data management and stewardship, with the goal of enabling the reusability of scholarly data. The principles were also meant to apply to other digital assets, at a high level, and over time, the FAIR guiding principles have been re-interpreted or extended to i…

    Submitted 1 August, 2023; v1 submitted 30 September, 2022; originally announced October 2022.

    Comments: 10 pages, comments welcome!; v2: 12 pages, accepted to Scientific Data

    ACM Class: I.2.0; E.0

    Journal ref: Scientific Data 10, 487 (2023)

  21. funcX: Federated Function as a Service for Science

    Authors: Zhuozhao Li, Ryan Chard, Yadu Babuji, Ben Galewsky, Tyler Skluzacek, Kirill Nagaitsev, Anna Woodard, Ben Blaiszik, Josh Bryan, Daniel S. Katz, Ian Foster, Kyle Chard

    Abstract: funcX is a distributed function as a service (FaaS) platform that enables flexible, scalable, and high performance remote function execution. Unlike centralized FaaS systems, funcX decouples the cloud-hosted management functionality from the edge-hosted execution functionality. funcX's endpoint software can be deployed, by users or administrators, on arbitrary laptops, clouds, clusters, and superc…

    Submitted 23 September, 2022; originally announced September 2022.

    Comments: arXiv admin note: substantial text overlap with arXiv:2005.04215

  22. arXiv:2209.09408  [pdf, other]

    cs.LG eess.IV

    Deep learning at the edge enables real-time streaming ptychographic imaging

    Authors: Anakha V Babu, Tao Zhou, Saugat Kandel, Tekin Bicer, Zhengchun Liu, William Judge, Daniel J. Ching, Yi Jiang, Sinisa Veseli, Steven Henke, Ryan Chard, Yudong Yao, Ekaterina Sirazitdinova, Geetika Gupta, Martin V. Holt, Ian T. Foster, Antonino Miceli, Mathew J. Cherukara

    Abstract: Coherent microscopy techniques provide an unparalleled multi-scale view of materials across scientific and technological fields, from structural materials to quantum devices, from integrated circuits to biological cells. Driven by the construction of brighter sources and high-rate detectors, coherent X-ray microscopy methods like ptychography are poised to revolutionize nanoscale materials charact…

    Submitted 19 September, 2022; originally announced September 2022.

  23. arXiv:2208.09513  [pdf, other]

    cs.DC cs.AI

    Globus Automation Services: Research process automation across the space-time continuum

    Authors: Ryan Chard, Jim Pruyne, Kurt McKee, Josh Bryan, Brigitte Raumann, Rachana Ananthakrishnan, Kyle Chard, Ian Foster

    Abstract: Research process automation -- the reliable, efficient, and reproducible execution of linked sets of actions on scientific instruments, computers, data stores, and other resources -- has emerged as an essential element of modern science. We report here on new services within the Globus research data management platform that enable the specification of diverse research processes as reusable sets of…

    Submitted 6 December, 2022; v1 submitted 19 August, 2022; originally announced August 2022.

  24. arXiv:2207.00611  [pdf, other]

    cs.AI cond-mat.mtrl-sci cs.LG

    FAIR principles for AI models with a practical application for accelerated high energy diffraction microscopy

    Authors: Nikil Ravi, Pranshu Chaturvedi, E. A. Huerta, Zhengchun Liu, Ryan Chard, Aristana Scourtas, K. J. Schmidt, Kyle Chard, Ben Blaiszik, Ian Foster

    Abstract: A concise and measurable set of FAIR (Findable, Accessible, Interoperable and Reusable) principles for scientific data is transforming the state-of-practice for data management and stewardship, supporting and enabling discovery and innovation. Learning from this initiative, and acknowledging the impact of artificial intelligence (AI) in the practice of science and engineering, we introduce a set o…

    Submitted 21 December, 2022; v1 submitted 1 July, 2022; originally announced July 2022.

    Comments: 11 pages, 3 figures; Accepted to Scientific Data; for press release see https://www.anl.gov/article/argonne-scientists-promote-fair-standards-for-managing-artificial-intelligence-models and https://www.ncsa.illinois.edu/ncsa-student-researchers-lead-authors-on-award-winning-paper; Received 2022 HPCwire Readers' Choice Award on Best Use of High Performance Data Analytics & Artificial Intelligence

    MSC Class: 68T01; 68T05

    ACM Class: I.2; J.2

    Journal ref: Scientific Data 9, 657 (2022)

  25. arXiv:2205.11342  [pdf, other]

    cs.CL cs.LG

    The Diminishing Returns of Masked Language Models to Science

    Authors: Zhi Hong, Aswathy Ajith, Gregory Pauloski, Eamon Duede, Kyle Chard, Ian Foster

    Abstract: Transformer-based masked language models such as BERT, trained on general corpora, have shown impressive performance on downstream tasks. It has also been demonstrated that the downstream task performance of such models can be improved by pretraining larger models for longer on more data. In this work, we empirically evaluate the extent to which these results extend to tasks in science. We use 14…

    Submitted 3 May, 2023; v1 submitted 23 May, 2022; originally announced May 2022.

    Comments: 12 pages. 3 figures. 5 tables. Accepted to the Findings of ACL 2023

    ACM Class: I.2.7

  26. arXiv:2204.05128  [pdf, other]

    cs.DC

    Linking Scientific Instruments and HPC: Patterns, Technologies, Experiences

    Authors: Rafael Vescovi, Ryan Chard, Nickolaus Saint, Ben Blaiszik, Jim Pruyne, Tekin Bicer, Alex Lavens, Zhengchun Liu, Michael E. Papka, Suresh Narayanan, Nicholas Schwarz, Kyle Chard, Ian Foster

    Abstract: Powerful detectors at modern experimental facilities routinely collect data at multiple GB/s. Online analysis methods are needed to enable the collection of only interesting subsets of such massive data streams, such as by explicitly discarding some data elements or by directing instruments to relevant areas of experimental space. Such online analyses require methods for configuring and running hi…

    Submitted 22 August, 2022; v1 submitted 11 April, 2022; originally announced April 2022.

  27. The History of the Grid

    Authors: Ian Foster, Carl Kesselman

    Abstract: With the widespread availability of high-speed networks, it becomes feasible to outsource computing to remote providers and to federate resources from many locations. Such observations motivated the development, from the mid-1990s onwards, of a range of innovative Grid technologies, applications, and infrastructures. We review the history, current status, and future prospects for Grid computing.

    Submitted 8 April, 2022; originally announced April 2022.

    Journal ref: High Performance Computing: From Grids and Clouds to Exascale, IOS Press, pages 3-30, 2011

  28. Multi-Output Physics-Informed Neural Networks for Forward and Inverse PDE Problems with Uncertainties

    Authors: Mingyuan Yang, John T. Foster

    Abstract: Physics-informed neural networks (PINNs) have recently been used to solve various computational problems which are governed by partial differential equations (PDEs). In this paper, we propose a multi-output physics-informed neural network (MO-PINN) which can provide solutions with uncertainty distributions for both forward and inverse PDE problems with noisy data. In this framework, the uncertaint…

    Submitted 3 February, 2022; originally announced February 2022.

  29. CUF-Links: Continuous and Ubiquitous FAIRness Linkages for reproducible research

    Authors: Ian Foster, Carl Kesselman

    Abstract: Despite much creative work on methods and tools, reproducibility -- the ability to repeat the computational steps used to obtain a research result -- remains elusive. One reason for these difficulties is that extant tools for capturing research processes do not align well with the rich working practices of scientists. We advocate here for simple mechanisms that can be integrated easily with curren…

    Submitted 20 January, 2022; originally announced January 2022.

    Journal ref: Computer, vol. 55, no. 8, pp. 20-30, Aug. 2022

  30. Sharing Begins at Home

    Authors: William Dempsey, Ian Foster, Scott Fraser, Carl Kesselman

    Abstract: The broad sharing of research data is widely viewed as of critical importance for the speed, quality, accessibility, and integrity of science. Despite increasing efforts to encourage data sharing, both the quality of shared data, and the frequency of data reuse, remain stubbornly low. We argue here that a major reason for this unfortunate state of affairs is that the organization of research resul…

    Submitted 8 July, 2022; v1 submitted 17 January, 2022; originally announced January 2022.

    Journal ref: Harvard Data Science Review, Volume 4, Issue 3, 2022

  31. arXiv:2111.11330  [pdf, other]

    cs.DC

    High-Performance Ptychographic Reconstruction with Federated Facilities

    Authors: Tekin Bicer, Xiaodong Yu, Daniel J. Ching, Ryan Chard, Mathew J. Cherukara, Bogdan Nicolae, Rajkumar Kettimuthu, Ian T. Foster

    Abstract: Beamlines at synchrotron light source facilities are powerful scientific instruments used to image samples and observe phenomena at high spatial and temporal resolutions. Typically, these facilities are equipped only with modest compute resources for the analysis of generated experimental datasets. However, high data rate experiments can easily generate data in volumes that take days (or even week…

    Submitted 22 November, 2021; originally announced November 2021.

    Comments: 19 pages, 5 figures, to be published in Smoky Mountains Computational Sciences and Engineering Conference (SMC 2021)

  32. arXiv:2110.02827  [pdf, other]

    cs.DC cond-mat.mtrl-sci cs.LG

    Colmena: Scalable Machine-Learning-Based Steering of Ensemble Simulations for High Performance Computing

    Authors: Logan Ward, Ganesh Sivaraman, J. Gregory Pauloski, Yadu Babuji, Ryan Chard, Naveen Dandu, Paul C. Redfern, Rajeev S. Assary, Kyle Chard, Larry A. Curtiss, Rajeev Thakur, Ian Foster

    Abstract: Scientific applications that involve simulation ensembles can be accelerated greatly by using experiment design methods to select the best simulations to perform. Methods that use machine learning (ML) to create proxy models of simulations show particular promise for guiding ensembles but are challenging to deploy because of the need to coordinate dynamic mixes of simulation and learning tasks. We…

    Submitted 6 October, 2021; originally announced October 2021.

    Comments: camera-ready version for ML in HPC Environments 2021

  33. Toward Interlanguage Parallel Scripting for Distributed-Memory Scientific Computing

    Authors: Justin M. Wozniak, Timothy G. Armstrong, Ketan C. Maheshwari, Daniel S. Katz, Michael Wilde, Ian T. Foster

    Abstract: Scripting languages such as Python and R have been widely adopted as tools for the productive development of scientific software because of the power and expressiveness of the languages and available libraries. However, deploying scripted applications on large-scale parallel computer systems such as the IBM Blue Gene/Q or Cray XE6 is a challenge because of issues including operating system limitat…

    Submitted 6 July, 2021; originally announced July 2021.

    Comments: 2015 IEEE International Conference on Cluster Computing

  34. KAISA: An Adaptive Second-Order Optimizer Framework for Deep Neural Networks

    Authors: J. Gregory Pauloski, Qi Huang, Lei Huang, Shivaram Venkataraman, Kyle Chard, Ian Foster, Zhao Zhang

    Abstract: Kronecker-factored Approximate Curvature (K-FAC) has recently been shown to converge faster in deep neural network (DNN) training than stochastic gradient descent (SGD); however, K-FAC's larger memory footprint hinders its applicability to large models. We present KAISA, a K-FAC-enabled, Adaptable, Improved, and ScAlable second-order optimizer framework that adapts the memory footprint, communicat…

    Submitted 20 September, 2021; v1 submitted 4 July, 2021; originally announced July 2021.

    Comments: Accepted for publication at the International Conference for High Performance Computing, Networking, Storage and Analysis (SC21)

  35. arXiv:2104.06953  [pdf]

    cs.CY

    A National Discovery Cloud: Preparing the US for Global Competitiveness in the New Era of 21st Century Digital Transformation

    Authors: Ian Foster, Daniel Lopresti, Bill Gropp, Mark D. Hill, Katie Schuman

    Abstract: The nature of computation and its role in our lives have been transformed in the past two decades by three remarkable developments: the emergence of public cloud utilities as a new computing platform; the ability to extract information from enormous quantities of data via machine learning; and the emergence of computational simulation as a research method on par with experimental science. Each dev…

    Submitted 19 April, 2021; v1 submitted 14 April, 2021; originally announced April 2021.

    Comments: A Computing Community Consortium (CCC) white paper, 6 pages

    Report number: ccc2021whitepaper_4

  36. arXiv:2101.06813  [pdf, other]

    cs.LG cs.AI stat.AP

    Fast and accurate learned multiresolution dynamical downscaling for precipitation

    Authors: Jiali Wang, Zhengchun Liu, Ian Foster, Won Chang, Rajkumar Kettimuthu, Rao Kotamarthi

    Abstract: This study develops a neural network-based approach for emulating high-resolution modeled precipitation data with comparable statistical properties but at greatly reduced computational cost. The key idea is to use a combination of low- and high-resolution simulations to train a neural network to map from the former to the latter. Specifically, we define two types of CNNs, one that stacks variables…

    Submitted 17 January, 2021; originally announced January 2021.

  37. arXiv:2101.01284  [pdf]

    cs.CY cs.AR

    Advancing Computing's Foundation of US Industry & Society

    Authors: Thomas M. Conte, Ian T. Foster, William Gropp, Mark D. Hill

    Abstract: While past information technology (IT) advances have transformed society, future advances hold even greater promise. For example, we have only just begun to reap the changes from artificial intelligence (AI), especially machine learning (ML). Underlying IT's impact are the dramatic improvements in computer hardware, which deliver performance that unlocks new capabilities. For example, recent succes…

    Submitted 4 January, 2021; originally announced January 2021.

    Comments: A Computing Community Consortium (CCC) white paper, 4 pages

    Report number: ccc2020whitepaper_17

  38. arXiv:2012.08545  [pdf, other]

    gr-qc astro-ph.IM cs.AI cs.DC

    Accelerated, Scalable and Reproducible AI-driven Gravitational Wave Detection

    Authors: E. A. Huerta, Asad Khan, Xiaobo Huang, Minyang Tian, Maksim Levental, Ryan Chard, Wei Wei, Maeve Heflin, Daniel S. Katz, Volodymyr Kindratenko, Dawei Mu, Ben Blaiszik, Ian Foster

    Abstract: The development of reusable artificial intelligence (AI) models for wider use and rigorous validation by the community promises to unlock new opportunities in multi-messenger astrophysics. Here we develop a workflow that connects the Data and Learning Hub for Science, a repository for publishing AI models, with the Hardware Accelerated Learning (HAL) cluster, using funcX as a universal distributed…

    Submitted 9 July, 2021; v1 submitted 15 December, 2020; originally announced December 2020.

    Comments: 17 pages, 5 figures; v2: 12 pages, 6 figures. Accepted to Nature Astronomy. See also the Behind the Paper blog in Nature Astronomy "https://astronomycommunity.nature.com/posts/from-disruption-to-sustained-innovation-artificial-intelligence-for-gravitational-wave-astrophysics"

    MSC Class: 68T01; 68T35; 83C35; 83C57

    Journal ref: Nat Astron 5, 1062-1068 (2021)

  39. arXiv:2012.06049  [pdf]

    cs.CY cs.AI

    The Rise of AI-Driven Simulators: Building a New Crystal Ball

    Authors: Ian Foster, David Parkes, Stephan Zheng

    Abstract: The use of computational simulation is by now so pervasive in society that it is no exaggeration to say that continued U.S. and international prosperity, security, and health depend in part on continued improvements in simulation capabilities. What if we could predict weather two weeks out, guide the design of new drugs for new viral diseases, or manage new manufacturing processes that cut product…

    Submitted 10 December, 2020; originally announced December 2020.

    Comments: A Computing Community Consortium (CCC) white paper, 4 pages

    Report number: ccc2020whitepaper_6

  40. arXiv:2009.07226  [pdf, other]

    cs.DC

    Petascale XCT: 3D Image Reconstruction with Hierarchical Communications on Multi-GPU Nodes

    Authors: Mert Hidayetoglu, Tekin Bicer, Simon Garcia de Gonzalo, Bin Ren, Vincent De Andrade, Doga Gursoy, Raj Kettimuthu, Ian T. Foster, Wen-mei W. Hwu

    Abstract: X-ray computed tomography is a commonly used technique for noninvasive imaging at synchrotron facilities. Iterative tomographic reconstruction algorithms are often preferred for recovering high-quality 3D volumetric images from 2D X-ray images; however, their use has been limited to small/medium datasets due to their computational requirements. In this paper, we propose a high-performance iterativ…

    Submitted 15 September, 2020; originally announced September 2020.

  41. Design and Evaluation of a Simple Data Interface for Efficient Data Transfer Across Diverse Storage

    Authors: Zhengchun Liu, Rajkumar Kettimuthu, Joaquin Chung, Rachana Ananthakrishnan, Michael Link, Ian Foster

    Abstract: Modern science and engineering computing environments often feature storage systems of different types, from parallel file systems in high-performance computing centers to object stores operated by cloud providers. To enable easy, reliable, secure, and performant data exchange among these different systems, we propose Connector, a pluggable data access architecture for diverse, distributed storage…

    Submitted 7 September, 2020; originally announced September 2020.

    Journal ref: ACM Transactions on Modeling and Performance Evaluation of Computing Systems 2021

  42. arXiv:2008.09591  [pdf, other]

    cs.DC

    Translating the Grid: How a Translational Approach Shaped the Development of Grid Computing

    Authors: Ian Foster, Carl Kesselman

    Abstract: A growing gap between progress in biological knowledge and improved health outcomes inspired the new discipline of translational medicine, in which the application of new knowledge is an explicit part of a research plan. Abramson and Parashar argue that a similar gap between complex computational technologies and ever-more-challenging applications demands an analogous discipline of translational c…

    Submitted 21 August, 2020; originally announced August 2020.

  43. arXiv:2008.08198  [pdf, other]

    eess.IV cs.LG

    BraggNN: Fast X-ray Bragg Peak Analysis Using Deep Learning

    Authors: Zhengchun Liu, Hemant Sharma, Jun-Sang Park, Peter Kenesei, Antonino Miceli, Jonathan Almer, Rajkumar Kettimuthu, Ian Foster

    Abstract: X-ray diffraction based microscopy techniques such as High Energy Diffraction Microscopy rely on knowledge of the position of diffraction peaks with high precision. These positions are typically computed by fitting the observed intensities in area detector data to a theoretical peak shape such as pseudo-Voigt. As experiments become more complex and detector technologies evolve, the computational c…

    Submitted 2 June, 2021; v1 submitted 18 August, 2020; originally announced August 2020.

  44. arXiv:2008.06991  [pdf, other]

    cs.DC cs.PF

    In-situ Workflow Auto-tuning via Combining Performance Models of Component Applications

    Authors: Tong Shu, Yanfei Guo, Justin Wozniak, Xiaoning Ding, Ian Foster, Tahsin Kurc

    Abstract: In-situ parallel workflows couple multiple component applications, such as simulation and analysis, via streaming data transfer in order to avoid data exchange via shared file systems. Such workflows are challenging to configure for optimal performance due to the large space of possible configurations. Expert experience is rarely sufficient to identify optimal configurations, and existing empiric…

    Submitted 16 August, 2020; originally announced August 2020.

  45. arXiv:2007.00784  [pdf, other]

    cs.LG cs.DC stat.ML

    Convolutional Neural Network Training with Distributed K-FAC

    Authors: J. Gregory Pauloski, Zhao Zhang, Lei Huang, Weijia Xu, Ian T. Foster

    Abstract: Training neural networks with many processors can reduce time-to-solution; however, it is challenging to maintain convergence and efficiency at large scales. The Kronecker-factored Approximate Curvature (K-FAC) was recently proposed as an approximation of the Fisher Information Matrix that can be used in natural gradient optimizers. We investigate here a scalable K-FAC design and its applicability…

    Submitted 1 July, 2020; originally announced July 2020.

    Comments: To be published in the proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC20)

  46. arXiv:2006.02431  [pdf, other]

    q-bio.BM cs.LG q-bio.QM stat.ML

    Targeting SARS-CoV-2 with AI- and HPC-enabled Lead Generation: A First Data Release

    Authors: Yadu Babuji, Ben Blaiszik, Tom Brettin, Kyle Chard, Ryan Chard, Austin Clyde, Ian Foster, Zhi Hong, Shantenu Jha, Zhuozhao Li, Xuefeng Liu, Arvind Ramanathan, Yi Ren, Nicholaus Saint, Marcus Schwarting, Rick Stevens, Hubertus van Dam, Rick Wagner

    Abstract: Researchers across the globe are seeking to rapidly repurpose existing drugs or discover new drugs to counter the novel coronavirus disease (COVID-19) caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). One promising approach is to train machine learning (ML) and artificial intelligence (AI) tools to screen large numbers of small molecules. As a contribution to that effort,…

    Submitted 27 May, 2020; originally announced June 2020.

    Comments: 11 pages, 5 figures

  47. arXiv:2005.11300  [pdf, other]

    stat.ML cs.LG cs.MS stat.CO

    Model Evidence with Fast Tree Based Quadrature

    Authors: Thomas Foster, Chon Lok Lei, Martin Robinson, David Gavaghan, Ben Lambert

    Abstract: High dimensional integration is essential to many areas of science, ranging from particle physics to Bayesian inference. Approximating these integrals is hard, due in part to the difficulty of locating and sampling from regions of the integration domain that make significant contributions to the overall integral. Here, we present a new algorithm called Tree Quadrature (TQ) that separates this samp…

    Submitted 22 May, 2020; originally announced May 2020.

  48. arXiv:2003.05520  [pdf, ps, other]

    cs.CE

    Deriving peridynamic influence functions for one-dimensional elastic materials with periodic microstructure

    Authors: Xiao Xu, John T. Foster

    Abstract: The influence function in peridynamic material models has a large effect on the dynamic behavior of elastic waves and in turn can greatly affect dynamic simulations of fracture propagation and material failure. Typically, the influence functions that are used in peridynamic models are selected for their numerical properties without regard to physical considerations. In this work, we present a meth…

    Submitted 4 March, 2020; originally announced March 2020.

  49. arXiv:1912.12371  [pdf]

    q-bio.OT cs.SE

    Open Source Software Sustainability Models: Initial White Paper from the Informatics Technology for Cancer Research Sustainability and Industry Partnership Work Group

    Authors: Y. Ye, R. D. Boyce, M. K. Davis, K. Elliston, C. Davatzikos, A. Fedorov, J. C. Fillion-Robin, I. Foster, J. Gilbertson, M. Heiskanen, J. Klemm, A. Lasso, J. V. Miller, M. Morgan, S. Pieper, B. Raumann, B. Sarachan, G. Savova, J. C. Silverstein, D. Taylor, J. Zelnis, G. Q. Zhang, M. J. Becich

    Abstract: The Sustainability and Industry Partnership Work Group (SIP-WG) is a part of the National Cancer Institute Informatics Technology for Cancer Research (ITCR) program. The charter of the SIP-WG is to investigate options of long-term sustainability of open source software (OSS) developed by the ITCR, in part by developing a collection of business model archetypes that can serve as sustainability plan…

    Submitted 1 January, 2020; v1 submitted 27 December, 2019; originally announced December 2019.

    Comments: 21-page main manuscript, 43-page supplemental file

  50. arXiv:1911.05878  [pdf, other]

    eess.IV cs.CV cs.LG

    Scientific Image Restoration Anywhere

    Authors: Vibhatha Abeykoon, Zhengchun Liu, Rajkumar Kettimuthu, Geoffrey Fox, Ian Foster

    Abstract: The use of deep learning models within scientific experimental facilities frequently requires low-latency inference, so that, for example, quality control operations can be performed while data are being collected. Edge computing devices can be useful in this context, as their low cost and compact form factor permit them to be co-located with the experimental apparatus. Can such devices, with thei…

    Submitted 12 November, 2019; originally announced November 2019.

    Comments: 6 pages, 8 figures, 1 table