
Showing 1–7 of 7 results for author: Ross, R B

  1. Union: An Automatic Workload Manager for Accelerating Network Simulation

    Authors: Xin Wang, Misbah Mubarak, Yao Kang, Robert B. Ross, Zhiling Lan

    Abstract: With the rapid growth of the machine learning applications, the workloads of future HPC systems are anticipated to be a mix of scientific simulation, big data analytics, and machine learning applications. Simulation is a great research vehicle to understand the performance implications of co-running scientific applications with big data and machine learning workloads on large-scale systems. In thi…

    Submitted 3 April, 2024; v1 submitted 24 March, 2024; originally announced March 2024.

  2. arXiv:2402.07877  [pdf, other]

    cs.AI

    WildfireGPT: Tailored Large Language Model for Wildfire Analysis

    Authors: Yangxinyu Xie, Tanwi Mallick, Joshua David Bergerson, John K. Hutchison, Duane R. Verner, Jordan Branham, M. Ross Alexander, Robert B. Ross, Yan Feng, Leslie-Anne Levy, Weijie Su

    Abstract: The recent advancement of large language models (LLMs) represents a transformational capability at the frontier of artificial intelligence (AI) and machine learning (ML). However, LLMs are generalized models, trained on extensive text corpus, and often struggle to provide context-specific information, particularly in areas requiring specialized knowledge such as wildfire details within the broader…

    Submitted 12 February, 2024; originally announced February 2024.

  3. arXiv:2310.16996  [pdf, other]

    cs.LG cs.DC

    Towards Continually Learning Application Performance Models

    Authors: Ray A. O. Sinurat, Anurag Daram, Haryadi S. Gunawi, Robert B. Ross, Sandeep Madireddy

    Abstract: Machine learning-based performance models are increasingly being used to build critical job scheduling and application optimization decisions. Traditionally, these models assume that data distribution does not change as more samples are collected over time. However, owing to the complexity and heterogeneity of production HPC systems, they are susceptible to hardware degradation, replacement, and/o…

    Submitted 25 October, 2023; originally announced October 2023.

    Comments: Presented at Workshop on Machine Learning for Systems at 36th Conference on Neural Information Processing Systems (NeurIPS 2022)

  4. arXiv:2204.08180  [pdf, other]

    cs.DC cs.PF

    A Taxonomy of Error Sources in HPC I/O Machine Learning Models

    Authors: Mihailo Isakov, Mikaela Currier, Eliakin del Rosario, Sandeep Madireddy, Prasanna Balaprakash, Philip Carns, Robert B. Ross, Glenn K. Lockwood, Michel A. Kinsy

    Abstract: I/O efficiency is crucial to productivity in scientific computing, but the increasing complexity of the system and the applications makes it difficult for practitioners to understand and optimize I/O behavior at scale. Data-driven machine learning-based I/O throughput models offer a solution: they can be used to identify bottlenecks, automate I/O tuning, or optimize job scheduling with minimal hum…

    Submitted 18 April, 2022; originally announced April 2022.

    Report number: STAM01

  5. arXiv:2001.09399  [pdf, other]

    cs.DC cs.HC cs.LG cs.PF

    A Visual Analytics Framework for Reviewing Streaming Performance Data

    Authors: Suraj P. Kesavan, Takanori Fujiwara, Jianping Kelvin Li, Caitlin Ross, Misbah Mubarak, Christopher D. Carothers, Robert B. Ross, Kwan-Liu Ma

    Abstract: Understanding and tuning the performance of extreme-scale parallel computing systems demands a streaming approach due to the computational cost of applying offline algorithms to vast amounts of performance log data. Analyzing large streaming data is challenging because the rate of receiving data and limited time to comprehend data make it difficult for the analysts to sufficiently examine the data…

    Submitted 25 January, 2020; originally announced January 2020.

    Comments: This is the author's preprint version that will be published in Proceedings of IEEE Pacific Visualization Symposium, 2020

  6. arXiv:1510.02135  [pdf, other]

    cs.DC

    A Remote Procedure Call Approach for Extreme-scale Services

    Authors: Jerome Soumagne, Philip H. Carns, Dries Kimpe, Quincey Koziol, Robert B. Ross

    Abstract: When working at exascale, the various constraints imposed by the extreme scale of the system bring new challenges for application users and software/middleware developers. In that context, and to provide best performance, resiliency and energy efficiency, software may be provided as a service oriented approach, adjusting resource utilization to best meet facility and user requirements. Remote proc…

    Submitted 5 October, 2015; originally announced October 2015.

    Comments: CSESSP 2015

  7. arXiv:1509.05492  [pdf, other]

    cs.DC

    Challenges and Considerations for Utilizing Burst Buffers in High-Performance Computing

    Authors: Melissa Romanus, Robert B. Ross, Manish Parashar

    Abstract: As high-performance computing (HPC) moves into the exascale era, computer scientists and engineers must find innovative ways of transferring and processing unprecedented amounts of data. As the scale and complexity of the applications running on these machines increases, the cost of their interactions and data exchanges (in terms of latency, energy, runtime, etc.) can increase exponentially. In or…

    Submitted 29 September, 2015; v1 submitted 17 September, 2015; originally announced September 2015.

    Comments: 18 pages, 2 figures

    ACM Class: B.4.3; D.4.2; C.1.4